Mining PLOS and PubMed data


This post was inspired by Jon Tennant’s post on his blog (here). He was looking at the number of paleontology papers published in PLOS ONE. His post was mainly based on his code using the `rplos` package (Github repo/CRAN repo) from the `ropensci` community. Jon’s post kind of fired up my R life again, especially in the field of text mining. So in this post, I will connect his original post with my research on analyzing the biodiversity of the Cikapundung riverbank area (on Figshare).

How I did it

This post taps paper data from PLOS and PubMed Central (PMC) using R, RStudio, the rplos package, and the plotly package. Here is how it goes:
  • First install R, then RStudio (not the other way around),
  • Fire up RStudio,
  • Install and load the libraries rplos (Github/CRAN) and plotly (Github/CRAN),
  • PLOS data: tap PLOS data directly and plot the result,
  • PMC data: there is no package for tapping it directly (yet), but here is a workaround pieced together from several sources. Go to the Flink system and run some queries based on keywords. Be sure to choose your database from the pull-down menu,
  • Export the query result,
  • Then do some analysis and plotting in RStudio.
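The PLOS step can be sketched roughly as follows. This is not the code from my Github repo, just a minimal illustration. It assumes that `rplos::searchplos()` returns a list whose `data` element has a `publication_date` column of ISO timestamps. The helper names `fetch_plos_dates()` and `count_by_month()` are made up for this sketch, and the example dates are invented so the counting part runs offline.

```r
# install.packages(c("rplos", "plotly"))  # run once

# Hypothetical helper: fetch publication dates for a search term from PLOS.
# It needs the rplos package and a network connection, so it is only
# defined here and not called.
fetch_plos_dates <- function(term, limit = 500) {
  res <- rplos::searchplos(q = term, fl = "publication_date", limit = limit)
  res$data$publication_date
}

# Hypothetical helper: turn ISO timestamps such as "2016-06-23T00:00:00Z"
# into a data frame of paper counts per month.
count_by_month <- function(dates) {
  months <- substr(dates, 1, 7)  # keep "YYYY-MM"
  counts <- as.data.frame(table(month = months), stringsAsFactors = FALSE)
  names(counts) <- c("month", "papers")
  counts[order(counts$month), ]
}

# Offline example with made-up dates (not real PLOS results)
example_dates <- c("2016-06-23T00:00:00Z", "2016-06-30T00:00:00Z",
                   "2016-11-08T00:00:00Z")
monthly <- count_by_month(example_dates)
print(monthly)
```

With rplos installed and a connection available, something like `count_by_month(fetch_plos_dates("hydrogeology"))` should give a similar table for real data, which can then be handed to a plotting function.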

The Flink screenshots

Opening page and database menu


The query window, where you can search by PMID or keywords


The search result, which you can save as CSV


Then I wrote some code following Jon’s code from his blog and added some of my own (on Github, containing: plos_analysis.R, pmc.csv, and pmc2.csv)
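For the PMC side, the analysis boils down to reading the exported CSV and counting papers per year. Here is a minimal sketch of that idea, not the actual contents of plos_analysis.R. The column names `PMID`, `Title`, and `Year` are assumptions about the export format, and the rows are made up so the example runs without the file.

```r
# Made-up rows standing in for an exported file such as pmc.csv;
# the column names "PMID", "Title" and "Year" are assumptions.
csv_text <- "PMID,Title,Year
1,Paper A,2014
2,Paper B,2015
3,Paper C,2015"
pmc <- read.csv(text = csv_text, stringsAsFactors = FALSE)
# With a real export: pmc <- read.csv("pmc.csv", stringsAsFactors = FALSE)

# Count papers per year; aggregate() returns the years in ascending order
per_year <- aggregate(PMID ~ Year, data = pmc, FUN = length)
names(per_year) <- c("year", "papers")
print(per_year)
```

If the real export uses different column names, only the formula in `aggregate()` needs to change.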

And here are the results

From PLOS data

Using the keyword ‘political science’. I don’t know what happened at the spike. Was it the US election and Brexit?


Using the keyword ‘earth science’. It would be interesting to dig into what happened at the spike.


Using the keyword ‘hydrogeology’. Kind of weird, huh? As if it was controlled by the wet and dry seasons. And there were only five, yes five, papers at every spike. Compare that number with the hundreds of papers in earth science and political science.
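One way to start digging into the spikes is to list which months reach the maximum count, then look up the papers published in those months. A minimal sketch, assuming a data frame with `month` and `papers` columns; `find_peaks()` is a made-up helper and the counts below are invented, not the real PLOS numbers.

```r
# Hypothetical helper: return the rows whose count equals the maximum.
find_peaks <- function(counts) {
  counts[counts$papers == max(counts$papers), , drop = FALSE]
}

# Made-up monthly counts shaped like the hydrogeology plot (not real data)
monthly <- data.frame(
  month  = c("2015-01", "2015-02", "2015-03", "2015-04"),
  papers = c(5, 0, 1, 5),
  stringsAsFactors = FALSE
)
peaks <- find_peaks(monthly)
print(peaks$month)
```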


From PMC data

Using keyword ‘hydrogeology’


Using keyword ‘groundwater – river water interaction’



