Retrieving OECD data via SDMX

The OECD website offers numerous datasets that we can access directly from R via the SDMX standard. Once we have loaded the data, we are free to analyze and visualize them. It all starts with the data we are interested in. Innovation economists like to study patent data, which reflect the outcome of an inventive process. For instance, we might be interested in studying the patent output of OECD countries in the biotech industry. The respective data can be accessed from the following OECD website: https://stats.oecd.org/Index.aspx?DataSetCode=PATS_IPC. There we go to Export and select “SDMX (XML)”. Here we copy the “SDMX DATA URL”. This URL includes not only biotech patents but also nanotech etc., and applicant as well as inventor data. In my experience so far, the query does not work if we use the full URL that we copied. That is why we have to manually change (reduce) it to get what we are interested in, for instance in the following way to focus on biotech and inventors: "http://stats.oecd.org/restsdmx/sdmx.ashx/GetData/PATS_IPC/EPO_A.INVENTORS.AUS+AUT+BEL+CAN+CHL+CZE+DNK+EST+FIN+FRA+DEU+GRC+HUN+ISL+IRL+ISR+ITA+JPN+KOR+LUX+MEX+NLD+NZL+NOR+POL+PRT+SVK+SVN+ESP+SWE+CHE+TUR+GBR+USA+EU28+WLD+NMEC+DZA+AND+ARG+ARM+BLR+BMU+BIH+BRA+BGR+CYM+CHN+COL+CRI+HRV+CUB+CYP+DJI+ECU+EGY+SLV+GEO+GTM+HKG+IND+IDN+IRN+JAM+JOR+KAZ+KEN+PRK+KWT+LVA+LBN+LIE+LTU+MKD+MYS+MLT+MDA+MCO+MNG+MAR+NGA+PAK+PAN+PER+PHL+PRI+ROU+RUS+SAU+SYC+SGP+ZAF+LKA+TWN+THA+TTO+TUN+UKR+ARE+URY+UZB+VEN+ZWE+FRME+YUG.BIOTECH.PRIORITY/all?startTime=1999&endTime=2014"
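
Editing such a long string by hand is error-prone, so it may help to assemble the reduced URL from its parts. The following is only a sketch: build_oecd_url is a hypothetical helper (it is not part of rsdmx), and it assumes that the query key keeps the layout visible in the URL above, i.e. dimensions separated by "." and several values within one dimension joined by "+". The three countries are chosen purely for illustration.

```r
# Hypothetical helper (not part of rsdmx): assemble an OECD SDMX query URL
# from its dimension values instead of editing the copied string by hand.
# Assumption: the key layout matches the URL shown above.
build_oecd_url <- function(dataset, dims, start, end,
                           base = "http://stats.oecd.org/restsdmx/sdmx.ashx/GetData") {
  # Values within one dimension are joined by "+", dimensions by "."
  key <- paste(vapply(dims, paste, character(1), collapse = "+"), collapse = ".")
  paste0(base, "/", dataset, "/", key, "/all?startTime=", start, "&endTime=", end)
}

# A reduced query with only three countries (illustrative selection)
url_small <- build_oecd_url(
  dataset = "PATS_IPC",
  dims    = list("EPO_A", "INVENTORS", c("DEU", "FRA", "USA"), "BIOTECH", "PRIORITY"),
  start   = 1999,
  end     = 2014
)
print(url_small)
```

The resulting string can be passed to readSDMX() in the same way as the hand-edited URL. Whether the OECD service accepts a particular combination of values still has to be checked against the live service.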

Now we can work with this URL in R. Here is an example:


# First we have to load the required libraries

library ("rsdmx")
library ("ggplot2")
library ("dplyr")

# Then we have to identify the url to access the dataset
# by the SDMX standard

url <- "http://stats.oecd.org/restsdmx/sdmx.ashx/GetData/PATS_IPC/EPO_A.INVENTORS.AUS+AUT+BEL+CAN+CHL+CZE+DNK+EST+FIN+FRA+DEU+GRC+HUN+ISL+IRL+ISR+ITA+JPN+KOR+LUX+MEX+NLD+NZL+NOR+POL+PRT+SVK+SVN+ESP+SWE+CHE+TUR+GBR+USA+EU28+WLD+NMEC+DZA+AND+ARG+ARM+BLR+BMU+BIH+BRA+BGR+CYM+CHN+COL+CRI+HRV+CUB+CYP+DJI+ECU+EGY+SLV+GEO+GTM+HKG+IND+IDN+IRN+JAM+JOR+KAZ+KEN+PRK+KWT+LVA+LBN+LIE+LTU+MKD+MYS+MLT+MDA+MCO+MNG+MAR+NGA+PAK+PAN+PER+PHL+PRI+ROU+RUS+SAU+SYC+SGP+ZAF+LKA+TWN+THA+TTO+TUN+UKR+ARE+URY+UZB+VEN+ZWE+FRME+YUG.BIOTECH.PRIORITY/all?startTime=1999&endTime=2014"

# We read the data and transform the RSDMX object into
# a data frame:

dat <- readSDMX(url)
dat.f <- as.data.frame(dat)

# Then we can filter and order the data, for instance
# to visualize the top seven countries which had the
# largest number of patents in the year 2010

sub <- filter(dat.f, obsTime == "2010")
# Select the LOCATION column by name rather than by position
sub.top <- sub$LOCATION[order(-sub$obsValue)][1:7]
dat.f.top <- filter(dat.f, LOCATION %in% sub.top)

# Finally, we plot the data as a line graph

ggplot(data=dat.f.top, aes(x=obsTime, y=obsValue, group=LOCATION, colour=LOCATION)) +
 geom_line()+geom_point()+expand_limits(y=0)+
 xlab("Years") + ylab("Number of patents") +
 ggtitle("Biotech patents") +
 theme_bw() +
 theme(legend.position=c(.7, .4))
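
To check the selection step without querying the OECD server, we can run the same logic on a small made-up data frame. This is only a sketch: the column names LOCATION, obsTime and obsValue are taken from the script above, the values and country codes are invented, top_n_locations is a hypothetical helper, and slice_max() assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Toy data with the column names used above; the values are made up
# for illustration only.
toy <- data.frame(
  LOCATION = rep(c("AAA", "BBB", "CCC", "DDD"), each = 2),
  obsTime  = rep(c("2009", "2010"), times = 4),
  obsValue = c(10, 12, 50, 40, 5, 7, 30, 45),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the n locations with the largest
# obsValue in the given year, in descending order.
top_n_locations <- function(df, year, n) {
  df %>%
    filter(obsTime == year) %>%
    slice_max(obsValue, n = n, with_ties = FALSE) %>%
    pull(LOCATION)
}

top2 <- top_n_locations(toy, "2010", 2)

# Keep all years for the selected locations, as in the script above
toy.top <- filter(toy, LOCATION %in% top2)
print(toy.top)
```

Note that the query above also requests codes such as EU28 and WLD, which appear to be aggregates rather than single countries, so it may be worth excluding them before ranking.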
