News API: Extracting News Headlines and Articles

News plays an essential role in our daily life. Whether you want to create your own news website or carry out a data analysis project, you often need to fetch different types of news articles or headlines, either to aggregate news from different sources in one place or to analyze it. The applications are many, and fortunately there is a way to retrieve news articles from the web, from different sources, at the same time.

In this tutorial you will learn how to extract news headlines and articles using the News API and save them to a CSV file.

Prerequisites

Get API key

First, get an API key by registering on the News API website (newsapi.org) and filling out the registration form. Be sure to register as an individual, which grants free usage of the API. The API key should be in your inbox shortly.

Install Libraries

Use the Requests Python library to interact with the API. To install Requests, run pip install requests from the Terminal/Command Prompt.

Apart from Requests, use the Pandas library to save the articles in a CSV file. To install Pandas, run pip install pandas from the Terminal/Command Prompt.

News API Project

Import Libraries

Be sure to import the Requests, Pandas and json libraries in your code.
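As a minimal sketch, the imports for this tutorial could look like the block below. The alias pd is only a common convention, not something the News API requires.

```python
import json  # pretty-print and parse the JSON response

import pandas as pd  # build a data frame and write the CSV
import requests  # send HTTP requests to the News API
```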

API Key

To interact with the API, you must provide the API key.

There are two ways to do that:

1. Provide the API key in the request URL (e.g. https://newsapi.org/v2/everything?q=amazon&apiKey=YOUR_API_KEY)

2. Provide the API key as a header while making the request (we will use this approach here).

Headers

It is not good practice to include authentication-related information in the request URL, because URLs tend to end up in logs and browser history. Include it in the headers of the request instead. Create a dictionary to hold the header parameters.
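A header dictionary might be sketched as follows. This assumes the News API accepts the key in an X-Api-Key header, as its documentation describes at the time of writing. The environment variable name NEWS_API_KEY is a hypothetical choice, and YOUR_API_KEY is a placeholder.

```python
import os

# NEWS_API_KEY is a hypothetical environment variable name;
# the fallback string is only a placeholder, not a working key.
API_KEY = os.environ.get("NEWS_API_KEY", "YOUR_API_KEY")

# Header parameters sent with every request (assumed header name: X-Api-Key)
headers = {"X-Api-Key": API_KEY}
```

Reading the key from the environment keeps it out of the source file, which helps if the script is ever shared or committed to version control.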

API Endpoints

After setting the header information, create variables to hold the API endpoints.
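The endpoint variables could look something like this. The everything URL appears earlier in this tutorial. The other two paths are assumptions based on the News API v2 documentation and should be checked against the current docs. The variable names are illustrative.

```python
# Base URL shared by all News API v2 endpoints
BASE_URL = "https://newsapi.org/v2"

# Assumed endpoint paths; verify against the News API documentation
top_headlines_url = f"{BASE_URL}/top-headlines"
everything_url = f"{BASE_URL}/everything"
sources_url = f"{BASE_URL}/top-headlines/sources"
```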

Payloads

The next step is to create the payloads that need to be sent to the API. A payload is simply the information that accompanies the request, such as news category, country or language. Create dictionaries to hold the payload information.

In everything_payload, the parameter ‘q’ stands for the keyword search value. The news articles matching the keyword search are returned in the response. The ‘sortBy’ parameter sets the order in which the returned articles are sorted. More information about the request parameters for the different endpoints can be found in the News API documentation.
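As an illustration, the payload dictionaries might look like this. Only everything_payload, ‘q’ and ‘sortBy’ are named in the text above. The other dictionary names and all the values are assumptions chosen for the example.

```python
# Top headlines: filter by category and country (illustrative values)
headlines_payload = {"category": "business", "country": "us"}

# Everything: keyword search, restricted by language and sorted by date
everything_payload = {"q": "amazon", "language": "en", "sortBy": "publishedAt"}

# Sources: list the available sources for a category, language and country
sources_payload = {"category": "general", "language": "en", "country": "us"}
```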

Requests

Now make a request to the API.

To get the top headlines:

 

Make the request using the get() method of the Requests library. In the ‘url’ parameter, specify the API endpoint to call. Pass the dictionary that contains the header information to the ‘headers’ parameter, and the payload dictionary to the ‘params’ parameter. Collect the response in a variable. The response contains the status code as well as the response body.
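The request described above can be sketched as follows. The header name, endpoint URL, payload values, function name and timeout are assumptions for illustration. The call is wrapped in a function so that nothing is sent until you invoke it.

```python
import requests

headers = {"X-Api-Key": "YOUR_API_KEY"}  # placeholder key, assumed header name
top_headlines_url = "https://newsapi.org/v2/top-headlines"  # assumed endpoint
headlines_payload = {"category": "business", "country": "us"}  # illustrative


def get_top_headlines():
    """Send the GET request and return the requests.Response object."""
    return requests.get(
        url=top_headlines_url,
        headers=headers,
        params=headlines_payload,
        timeout=10,  # seconds; an arbitrary choice so the call cannot hang
    )
```

Calling response = get_top_headlines() sends the request. After that, response.status_code holds the status code and response.json() returns the parsed body.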

To get the news articles:

The structure of the request remains the same. The ‘headers’ parameter will remain the same throughout. Update the ‘url’ and the ‘params’ parameters. The news articles are returned based on the request parameters.
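A sketch of the articles request, reusing the same headers and swapping only the URL and the payload. The function name, payload values and timeout are assumptions. raise_for_status() is added here so that an error response fails loudly rather than silently.

```python
import requests

headers = {"X-Api-Key": "YOUR_API_KEY"}  # placeholder key, assumed header name
everything_url = "https://newsapi.org/v2/everything"
everything_payload = {"q": "amazon", "language": "en", "sortBy": "publishedAt"}


def get_articles(payload):
    """Request articles matching the payload and return the response."""
    response = requests.get(
        url=everything_url,
        headers=headers,  # unchanged from the top-headlines request
        params=payload,
        timeout=10,
    )
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    return response
```

With a valid key, response = get_articles(everything_payload) would return the articles that match the keyword search.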

 

You can also request the news sources available to the API. Once you have retrieved the sources, you can use one of them to obtain news from that particular source only.

Start off by making any of the requests mentioned earlier.
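One way to sketch the sources workflow is shown below. It assumes the sources endpoint path, a response body with a ‘sources’ array whose items carry an ‘id’, and a ‘sources’ request parameter on the top-headlines endpoint. All function names are hypothetical.

```python
import requests

headers = {"X-Api-Key": "YOUR_API_KEY"}  # placeholder key, assumed header name
sources_url = "https://newsapi.org/v2/top-headlines/sources"  # assumed path
top_headlines_url = "https://newsapi.org/v2/top-headlines"
sources_payload = {"language": "en", "country": "us"}  # illustrative values


def get_sources():
    """Request the list of news sources available to the API."""
    return requests.get(
        url=sources_url, headers=headers, params=sources_payload, timeout=10
    )


def source_ids(sources_json):
    """Collect the 'id' of every source in a parsed sources response."""
    return [source["id"] for source in sources_json.get("sources", [])]


def headlines_from(source_id):
    """Request top headlines from a single source only."""
    return requests.get(
        url=top_headlines_url,
        headers=headers,
        params={"sources": source_id},
        timeout=10,
    )
```

For example, source_ids(get_sources().json()) would give the available identifiers, and any one of them could then be passed to headlines_from().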

Response

To print the response on your console, pass the response body to json.dumps() and print the result.

Note the indent argument passed to json.dumps(). It specifies the indentation of the JSON response body. If this value is not specified, the returned JSON is on a single line, which is hard to read. To view the JSON response in a human-readable form, provide this argument.
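The effect of the indentation value can be seen in a small sketch. The dictionary below is a made-up stand-in for response.json(), shaped like a News API response but with invented values, so the example runs without a network call.

```python
import json

# Made-up stand-in for response.json(); the values are invented
response_body = {
    "status": "ok",
    "totalResults": 1,
    "articles": [
        {
            "source": {"id": None, "name": "Example News"},
            "title": "Example headline",
        }
    ],
}

# indent=4 spreads the JSON over multiple lines, four spaces per level
pretty_json = json.dumps(response_body, indent=4)
print(pretty_json)
```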

Save Response to CSV

More often than not, the JSON response needs to be saved in a CSV for further processing. Save all the meaningful information out of the JSON response in a CSV. To do so, follow these steps:

Convert the response to a pure JSON string format.

 

Load the JSON response in a Python dictionary for further processing. A JSON object is equivalent to a dictionary in Python.

 

The response contains different objects. Of these, only the article-related information is relevant to us. In the response, a JSON array called ‘articles’ contains this information. A JSON array is equivalent to a list in Python. Extract the ‘articles’ array from the response into a variable.

For more information on the response values, see the News API documentation.
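The first three steps can be sketched as below. The sample body is invented and stands in for response.json(). Apart from articles_list, the variable names are assumptions.

```python
import json

# Made-up stand-in for response.json(); the values are invented
response_body = {
    "status": "ok",
    "totalResults": 2,
    "articles": [
        {"source": {"id": None, "name": "Example News"},
         "title": "First example headline"},
        {"source": {"id": None, "name": "Sample Daily"},
         "title": "Second example headline"},
    ],
}

# Step 1: convert the response to a JSON string
response_json_string = json.dumps(response_body)

# Step 2: load the JSON string into a Python dictionary
response_dict = json.loads(response_json_string)

# Step 3: extract the 'articles' array, which becomes a Python list
articles_list = response_dict["articles"]
print(len(articles_list))
```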

Next, convert the ‘articles_list’ to a JSON string and then convert that JSON string to a data frame. The data frame is a data structure that is part of the Pandas library and can hold many kinds of tabular data. Finally, write the data frame to a CSV file.
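A minimal sketch of this last step follows, using invented articles and a hypothetical file name. The JSON string is wrapped in io.StringIO because recent Pandas versions deprecate passing a literal JSON string to read_json().

```python
import io
import json

import pandas as pd

# Made-up articles standing in for the list extracted from the response
articles_list = [
    {"source": {"id": None, "name": "Example News"},
     "title": "First example headline",
     "url": "https://example.com/first"},
    {"source": {"id": None, "name": "Sample Daily"},
     "title": "Second example headline",
     "url": "https://example.com/second"},
]

# Convert the list to a JSON string, then the JSON string to a data frame
articles_json = json.dumps(articles_list)
df = pd.read_json(io.StringIO(articles_json))

# Write the data frame to a CSV file (hypothetical file name)
df.to_csv("news_articles.csv", index=False)
```

Note that the nested ‘source’ object ends up as a single column holding a dictionary per row. If separate columns are preferable, pd.json_normalize(articles_list) is one alternative worth trying.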

 


Complete Project Code
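The steps of this tutorial can be combined into one script, sketched below. It is not the original project code. The header name, function names, payload values and file name are assumptions, and the API key is a placeholder.

```python
import io
import json

import pandas as pd
import requests

API_KEY = "YOUR_API_KEY"  # placeholder; replace with your own key
headers = {"X-Api-Key": API_KEY}  # assumed header name
everything_url = "https://newsapi.org/v2/everything"
everything_payload = {"q": "amazon", "language": "en", "sortBy": "publishedAt"}


def fetch_articles(url, payload):
    """Call the API and return the 'articles' list from the response."""
    response = requests.get(url=url, headers=headers, params=payload, timeout=10)
    response.raise_for_status()  # stop early on 4xx/5xx responses
    # Mirror the tutorial steps: response -> JSON string -> dictionary
    response_dict = json.loads(json.dumps(response.json()))
    return response_dict["articles"]


def articles_to_csv(articles_list, csv_path):
    """Convert the articles to a data frame and write it to csv_path."""
    articles_json = json.dumps(articles_list)
    df = pd.read_json(io.StringIO(articles_json))
    df.to_csv(csv_path, index=False)
    return df


def main():
    """Fetch the articles and save them to a CSV file."""
    articles_list = fetch_articles(everything_url, everything_payload)
    df = articles_to_csv(articles_list, "news_articles.csv")
    print(f"Saved {len(df)} articles")
```

Call main() to run the whole pipeline once a valid API key is in place.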

 

