Tuhunga Blog - Capturing data from a multipage API

Capturing data from a multipage API

Written by Jason on April 23, 2012

The Internet is becoming increasingly friendly when it comes to data accessibility. More and more sites are providing APIs that make it easier for users to access data.

Easier, however, doesn't always mean easy. If you want to analyze data coming from an API, it can be challenging to get it into a useful form. APIs vary from one another, with different input parameters and output formats. However, the biggest problem for many people is that the output often spans multiple pages.

We've shown you how Tuhunga can handle different parameters and formats in our tutorials. In this post, we'll show you how to handle the third challenge - using Tuhunga to capture API output that spans multiple pages in a single import.

Keep reading to see an example in action using the World Bank API.

Let's say we want to retrieve the fertilizer consumption data series (it has a WB code of AG.CON.FERT.MT). We can use the Bank's API generator to create our call. The key question is how we handle multiple pages of data, as the Bank returns a maximum of 100 results per page, and our query will generate more than that.

http://api.worldbank.org/countries/indicators/AG.CON.FERT.MT?per_page=100&page=1&date=1960:2012&format=json

The first thing to do is to test the URL above. Take a look at the first part of the page that gets returned:

[{"page":1,"pages":31,"per_page":"100","total":3009} ...

Even with 100 results per page, there are 31 pages of data. No problem for Tuhunga - the important thing to note is that you now know at least how many pages of data you need to request.
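The page count follows from the other two fields in that header: 3009 results at 100 per page rounds up to 31 pages. Here is a minimal Python sketch of that check, using only the header fragment shown above. The variable names are ours, chosen for illustration.

```python
import json
import math

# Header object copied from the start of the World Bank response shown above.
header = json.loads('{"page":1,"pages":31,"per_page":"100","total":3009}')

# per_page arrives as a string, so convert it before doing arithmetic.
per_page = int(header["per_page"])
total = int(header["total"])

# Round up: a partially filled last page still counts as a page.
expected_pages = math.ceil(total / per_page)

print(expected_pages)                      # 31
print(expected_pages == header["pages"])   # True
```

If you change per_page in the URL, the same arithmetic tells you what upper bound to expect for the page range.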

Let's see how to use this URL to get multiple pages in Tuhunga. We'll use the *tuhungapage* parameter. The parameter steps a single number in the URL through the range of values you specify. Its basic format is as follows:

*tuhungapage>={start point}<={end point}[OPTIONAL ++{increment}]*

Since we want to capture the pages between 1 and 31, we'll use the following *tuhungapage* parameter:

*tuhungapage>=1<=31*

We omitted the increment setting since we want to advance one page at a time. The URL we'll use to capture data is:

http://api.worldbank.org/countries/indicators/AG.CON.FERT.MT?per_page=100&page=*tuhungapage>=1<=31*&date=1960:2012&format=json
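To make the range concrete, here is a rough Python sketch of the list of URLs that the parameter stands for. It is only an illustration of the idea, not Tuhunga's own code. The function name expand_pages is hypothetical, and we are assuming that the optional increment is written directly after the end point (for example ++2) and that the surrounding asterisks may or may not be present.

```python
import re

# Assumed token shape, based on the format described above:
#   tuhungapage>={start}<={end} with an optional ++{increment}.
# The surrounding asterisks are treated as optional.
TOKEN = re.compile(r"\*?tuhungapage>=(\d+)<=(\d+)(?:\+\+(\d+))?\*?")


def expand_pages(url_template):
    """Return one URL per page value, illustrating what the range stands for."""
    match = TOKEN.search(url_template)
    if match is None:
        return [url_template]  # no paging token: a single URL
    start, end = int(match.group(1)), int(match.group(2))
    step = int(match.group(3)) if match.group(3) else 1
    if step < 1:
        raise ValueError("increment must be at least 1")
    head, tail = url_template[:match.start()], url_template[match.end():]
    # The end point is inclusive, hence end + 1.
    return [f"{head}{n}{tail}" for n in range(start, end + 1, step)]


template = (
    "http://api.worldbank.org/countries/indicators/AG.CON.FERT.MT"
    "?per_page=100&page=*tuhungapage>=1<=31*&date=1960:2012&format=json"
)
urls = expand_pages(template)
print(len(urls))   # 31
print(urls[0])     # ends with page=1&date=1960:2012&format=json
print(urls[-1])    # ends with page=31&date=1960:2012&format=json
```

The first URL in the list is identical to the test URL from earlier in the post, which is a quick way to confirm the range starts where you expect.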

We can use this URL as our source. Tuhunga will retrieve the first page of data, and you'll configure the import normally. Once you've confirmed your import, Tuhunga will iteratively capture each of the 31 pages.

If you specify more pages than the data contains, Tuhunga will stop importing at the first page that does not meet the parameters you've set in the import. Of course, it will make multiple attempts first to ensure that the failure wasn't a one-time issue at the source.
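The stop-and-retry behaviour can be pictured with a short Python sketch. This is our own illustration under stated assumptions, not how Tuhunga is implemented: capture_pages and fetch_page are hypothetical names, the retry count of 3 is an assumed value, and a stand-in function replaces the real network call so the example runs on its own.

```python
def capture_pages(fetch_page, first, last, attempts=3):
    """Collect records page by page, stopping at the first page that keeps failing.

    fetch_page(n) is a hypothetical callable returning a list of records for
    page n, or None when the page has no usable data. attempts=3 is an assumed
    value; the post does not say how many times Tuhunga retries.
    """
    records = []
    for page in range(first, last + 1):
        rows = None
        for _ in range(attempts):
            rows = fetch_page(page)
            if rows:          # got data, no need to retry
                break
        if not rows:          # still nothing after every attempt: stop here
            break
        records.extend(rows)
    return records


# Stand-in source with 3 pages of data; page 2 fails once before succeeding.
calls = {}


def fake_fetch(page):
    calls[page] = calls.get(page, 0) + 1
    if page == 2 and calls[page] == 1:
        return None           # simulated one-time glitch
    if page > 3:
        return None           # past the end of the data
    return [f"page{page}-row{i}" for i in range(2)]


result = capture_pages(fake_fetch, first=1, last=10)
print(len(result))   # 6
print(calls)         # {1: 1, 2: 2, 3: 1, 4: 3}
```

In this run the glitch on page 2 is absorbed by a retry, page 4 is tried three times and then treated as the end of the data, and pages 5 through 10 are never requested.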

Note that in our sample URL above we're using the JSON format, but multi-page imports work with XML as well.

If you're interested, our reference guide contains more detailed information on the *tuhungapage* parameter.

Tags: examples, features, imports