
Pandas_Datareader and how to use it (World Bank example)



Data is the bread and butter of our field; without it, we have nothing to work with. Real-time, real-life data is something we all seek, and pandas_datareader is one package that helps us capture it from the internet. It is a Python package that builds a pandas DataFrame from various remote data sources. It is mainly used for stock market analysis, but there are numerous other data sources that cover different kinds of data.

Different sources support different kinds of data, so not all sources implement the same methods, and the data elements they return may also differ.

Currently supported data sources include:

  • Tiingo: Tracing platform with price data on equities, mutual funds, and ETFs. An API key is needed.

  • IEX: Investors Exchange data, with historical stock prices going back up to 15 years. An API key is needed.

  • OECD: Statistics published by the OECD.

  • World Bank: Thousands of panel data series from the World Bank's World Development Indicators, accessed through the wb I/O functions.

  • Yahoo Finance (the most commonly used): Real-time stock market data, as well as historical data.

  • Econdb: Economic data from 90+ statistical agencies.

  • Eurostat: Statistics about Europe.

  • Naver: Data on Korean stock markets.

There are other sources as well. Many of them need an API key, which you can get after a free registration. The web interfaces behind these sources evolve constantly, so the package keeps changing along with them.

Let's see how to get data with pandas_datareader and put it to use. To add the package to our arsenal, we first need to install it:

!pip install pandas_datareader

I will use the World Bank source to find insights into forest coverage.

Let’s begin with basic libraries:


import pandas as pd
import matplotlib.pyplot as plt
from pandas_datareader import wb

After doing this, we need to find the indicators for the information we want in our data. We can look them up on the World Bank site or use the search function. Let's search for GDP per capita in constant dollars, using a search like this:


matches = wb.search('gdp.*capita.*const')


The result lists the ids of the different datasets related to GDP per capita. We can download the data for any of these ids with the download function.


dat = wb.download(indicator='NY.GDP.PCAP.KD', country=['US', 'RUS', 'IN'], start=2015, end=2020)

Here we pass the indicator id and the countries' ISO codes, along with the start and end years. You can find a country's ISO code with a quick web search. We took the US, Russia, and India into account for the years 2015 to 2020.



So let's explore this data further. We will use the groupby function.
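One way this groupby step could look is sketched below. It assumes, as far as I understand the library, that wb.download returns a frame indexed by country and year with one column per indicator. The numbers are made-up placeholders, not real World Bank values, so the snippet runs offline; with the real `dat` you would group in the same way.

```python
import pandas as pd

# Placeholder numbers, NOT real World Bank values. Only the shape mimics what
# wb.download is assumed to return: a (country, year) MultiIndex and one
# column per indicator.
index = pd.MultiIndex.from_product(
    [["India", "Russian Federation", "United States"], ["2016", "2015"]],
    names=["country", "year"],
)
sample = pd.DataFrame(
    {"NY.GDP.PCAP.KD": [110.0, 100.0, 210.0, 200.0, 330.0, 300.0]},
    index=index,
)

# Group on the 'country' index level and summarise each country's series.
summary = sample.groupby(level="country")["NY.GDP.PCAP.KD"].agg(["mean", "min", "max"])
print(summary)
```

Swapping `sample` for `dat` should give one row per country with the mean, minimum, and maximum GDP per capita over the selected years.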



We can use search to find the ids of other indicators as well. For example, let's look for the forest area as a percentage.

wb.search('Forest.*%').iloc[:, :5]

This lists the indicators related to forests; the .iloc[:, :5] slice keeps only the first five columns of the result so the table stays readable. If you want to check whether a particular indicator exists, you can also browse the development indicators on the World Bank site.
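To make the search patterns concrete, here is a small offline sketch. As far as I know, wb.search runs a case-insensitive regular expression against the indicator names; the snippet imitates that with pandas' str.contains on a tiny hand-copied catalog of four indicators. The `catalog` frame and the `search_catalog` helper are my own stand-ins and not part of pandas_datareader, and the indicator names are as I know them at the time of writing.

```python
import pandas as pd

# Hand-copied stand-in for the World Bank indicator catalog (hypothetical data;
# the real search result has thousands of rows and more columns).
catalog = pd.DataFrame(
    {
        "id": ["NY.GDP.PCAP.KD", "NY.GDP.PCAP.CD", "AG.LND.FRST.ZS", "AG.LND.FRST.K2"],
        "name": [
            "GDP per capita (constant 2015 US$)",
            "GDP per capita (current US$)",
            "Forest area (% of land area)",
            "Forest area (sq. km)",
        ],
    }
)


def search_catalog(pattern, frame=catalog):
    """Return rows whose name matches the regex, ignoring case (hypothetical helper)."""
    mask = frame["name"].str.contains(pattern, case=False, regex=True)
    return frame[mask]


gdp_matches = search_catalog("gdp.*capita.*const")
forest_matches = search_catalog("Forest.*%")
print(gdp_matches)
print(forest_matches)
```

The `.*` parts let any text sit between the keywords, which is why "gdp.*capita.*const" picks up the constant-dollar series but skips the current-dollar one.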



We need the id of Forest area (% of land area), which is AG.LND.FRST.ZS.


dat = wb.download(indicator=['NY.GDP.PCAP.KD', 'AG.LND.FRST.ZS'], country='all', start=2020, end=2020).dropna()
dat.columns = ['gdp', 'Forest']
dat = dat.sort_values(by=['gdp'], ascending=False)

I wanted to check the 10 highest GDP per capita values for the year 2020 and the forest cover that goes with them.
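A minimal sketch of picking the top rows follows. The `top_n_by_gdp` helper is a hypothetical name of mine, and the demo rows are made-up placeholders used only to show the call; it assumes a frame with a 'gdp' column like the one built above.

```python
import pandas as pd


def top_n_by_gdp(frame, n=10):
    """Return the n rows with the highest 'gdp' value (hypothetical helper)."""
    return frame.nlargest(n, "gdp")


# Placeholder rows with made-up values, only to show the call.
demo = pd.DataFrame(
    {"gdp": [5.0, 40.0, 12.0, 33.0], "Forest": [30.0, 10.0, 55.0, 20.0]},
    index=["A", "B", "C", "D"],
)
top2 = top_n_by_gdp(demo, n=2)
print(top2)
```

Since `dat` is already sorted in descending order, `dat.head(10)` should give the same rows as `top_n_by_gdp(dat)`. As far as I know, country='all' can also return regional and income-group aggregates next to individual countries, so it is worth checking the top rows by eye.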



Using seaborn, I am going to draw a scatterplot that shows the relation between these two variables:
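A scatterplot along these lines could be sketched as follows. The `plot_gdp_vs_forest` helper, the figure size, and the axis labels are my own choices for illustration; the function only assumes a frame with 'gdp' and 'Forest' columns.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def plot_gdp_vs_forest(frame):
    """Scatter forest cover against GDP per capita (hypothetical helper)."""
    fig, ax = plt.subplots(figsize=(8, 5))
    # One point per row: GDP per capita on x, forest share on y.
    sns.scatterplot(data=frame, x="gdp", y="Forest", ax=ax)
    ax.set_xlabel("GDP per capita (constant US$)")
    ax.set_ylabel("Forest area (% of land area)")
    ax.set_title("GDP per capita vs. forest cover, 2020")
    return ax


# With the frame built earlier: ax = plot_gdp_vs_forest(dat); plt.show()
```

Returning the Axes object makes it easy to tweak the plot afterwards, for example by switching the x axis to a log scale with `ax.set_xscale("log")` if the low-GDP points are bunched together.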



We can see that most countries have a low GDP per capita, and their forest cover does not appear to be strongly related to it. Countries with a higher GDP per capita, on the other hand, sit somewhere around the mean forest coverage. The data look roughly normally distributed.
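To put a number on "not very much related", one option is a Pearson correlation between the two columns. This is only a sketch: the `gdp_forest_correlation` helper is a hypothetical name of mine, and the demo values are made-up placeholders, not World Bank data.

```python
import pandas as pd


def gdp_forest_correlation(frame):
    """Pearson correlation between the 'gdp' and 'Forest' columns (hypothetical helper)."""
    return frame["gdp"].corr(frame["Forest"])


# Made-up placeholder values: a perfectly linear pair gives a correlation of 1.0.
demo = pd.DataFrame({"gdp": [1.0, 2.0, 3.0, 4.0], "Forest": [10.0, 20.0, 30.0, 40.0]})
r = gdp_forest_correlation(demo)
print(round(r, 3))
```

Calling `gdp_forest_correlation(dat)` on the real frame returns a value between -1 and 1; a result close to 0 would back up the impression from the plot that the linear relation is weak.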


So this was a little showcase of how to use pandas_datareader. It is mostly used for real-time stock market data through Yahoo Finance, though I find playing with World Bank data more fun. So I would like to ask you to go ahead and use the World Bank data to find insights of your own.

I hope this article helped someone find their data. This simple tool helps a lot when you are trying to pull data from the internet and build different data frames. So go ahead, make a new dataset, and please let me know what kind of dataset you came up with.

Share and spread love.

Happy coding!!

 
 
 
