Quick wordclouds from PubMed abstracts – using PMID lists in R

Wordclouds are one of the most visually straightforward, compelling ways of displaying text info in a graph.

Of course, we have a lot of web pages (and even apps) that, given an input text, will plot you some nice tagclouds. However, when you need reproducible results, or getting done complex tasks -like combined wordclouds from several files-, a programming environment may be the best option.

In R, there are (as always), several alternatives to get this done, such as tagcloud and wordcloud.

For this script I used the following packages:

  • RCurl” to retrieve a PMID list, stored in my GitHub account as a .csv file.
  • RefManageR” and plyr to retrieve and arrange PM records. To fetch the info from the inets, we’ll be using the PubMed API (free version, with some limitations). 
  • Finally, tm, SnowballC” to prepare the data and wordcloud” to plot the wordcloud. This part of the script is based on this from Georeferenced.

One of the advantages of using RefManageR is that you can easily change the field which you are importing from, and it usually works flawlessly with the PubMed API.

My biggest problem sources when running this script: download caps, busy hours, and firewalls!.

At the beginning of the gist, there is also a handy function that automagically downloads all needed packages for you.

To source the script, simply type in the R console:

This script creates two directories in your working directory: ‘corpus1‘ for the abstracts file, and ‘wordcloud‘ to store the plot.


And there is the code:



R/Shiny for clinical trials: simple randomization tables

One of the things I most like from R + Shiny is that it enables me to serve the power and flexibility of R in small “chunks” to cover different needs, allowing people not used to R to benefit from it. However, what I like most is that’s really fun and easy to program those utilities for a person without any specific programming background.

Here’s a small hack done in R/Shiny: it covered an urgent need for a study involving patient randomisation to two branches of treatment, in what is commonly known as a clinical trial. This task posed some challenges:

  • First, this trial was not financed in any way (at least initially). It was a small, independent study comparing two approved techniques for chronic pain, so the sponsor had to avoid expensive software or services.
  • Another reason for software customization is that treatment groups were partially ‘blind’: for people who assessed effectiveness and… also for statistical analysis (treatment administration was open-label). This means that the person in charge of data analysis must know which group is assigned to a patient, but doesn’t know what treatment is assigned to either group.

To tackle the points above, my app should have two main features:

  • The sponsor (here, a medical doctor) must be able to effectively control study blindness and also provide emergency blind disclosure. This control should extend to data analysis to minimize bias favoring either treatment.
  • R has tools to create random samples, but the MD in charge of the study sponsoring doesn’t know how to use R. We needed a friendly interface for random table creation.

Here’s how I got it to work:

  • The very core of this Shiny app is a combination between the set.seed and sample R functions. The PIN number (the set.seed argument) works like a secret passcode that links to a given random table. E.g., every time I enter ‘5432’, the random tables will look the same. This protects from accidental blindness disclosure, as nobody can find the correct random table without the proper PIN, even if they can access the app’s source code.
  • The tables are created column by column, ordered at first. Then we proceed to randomize (via the sample function) both the treatment column (in the random table) and the Group column (in the PIN table).
  • Once the tables are created they can be downloaded as .CSV files, printed, signed and dated to document the randomization procedure. The app’s open source code and the PIN number will provide reproducibility to the procedure for many years.

Unfortunately I wasn’t able to insert iframes to embed the app, so I posted a screenshot:

Random table generator for clinical trials

The app is far from perfect, but it covers the basic needs for the trial. You can test it here:


And the GitHub repo is available here. Feel free to use/ adapt/ fork it to your needs!


Also, you can cite it if it’s been useful for your study!






Install R in Android, via GNURoot -no root required!

Playing with my tablet some time ago, I wondered if installing R could be possible. You know, a small android device “to the power of R”…

After searching on Google from time to time, I came across some interesting possibilities:

  • R Instructor, created “to bridge the gap between authoritative (but expensive) reference textbooks and free but often technical and difficult to understand help files“.
  • R Console Free. provides the necessary C, C++ and Fortran compilers to build and install R packages.
  • There’s always possible to root your device and install a Linux distribution for Android, which will let you install any repository/package, just like in any linux console.
  • Running R from your dedicated R server or from an external one (see R-fiddle), using your own browser. I see this option as particularly useful for those who want maximum performance.
  • Some additional thoughts on this topic are also stored in these Stack Overflow pages.
  • Without needing to root my device, I found GNURoot, an app that “provides a method for you to install and use GNU/Linux distributions and their associated applications/packages alongside Android“.

Finally, my preferred solution came with GNURoot (see this tutorial), and here’s how I managed to install the newest CRAN repositories! (NOTE: It should work “out of the box” but, as problems might appear, some experience with Linux is always advisable).

1. Install the .apk of GNURoot in your Android device. Don’t forget to donate if you like it! 🙂

2. Following the app instructions, download and install a linux distribution to run. In my case, I chose the .apk GNURoot Wheezy (a Debian Wheezy distro without Xterms). EDIT: Just be sure of having enough memory for it in your device

3. Once installed, just follow the steps to launch the Rootfs (Wheezy) as Fake Root. You will see a bash prompt, from which you can access a complete linux directory tree. This is the same as if you were in a computer (however, if you aren’t root you won’t be able to access the directories via your file browser from Android)


4. Now, we just have to update and upgrade:

apt-get update
apt-get upgrade

5. Then, update the sources.list file. We don’t have any graphical text editor (like gedit or kate)… but we have nano!:

nano /etc/apt/sources.list


Using the volume up + “W/S/A/D” you can move between the lines. Or alternatively, you can install a convenient keyboard with arrow buttons, like Hacker’s Keyboard! (thanks to JTT!)

Following instructions from CRAN, I added the following line to sources.list:

deb http://<favorite-cran-mirror>/bin/linux/debian wheezy-cran3/

Exit saving changes. But before “update and upgrade” again, don’t forget to add the key for the repository running the following:

apt-key adv --keyserver keys.gnupg.net --recv-key 381BA480

5. Update and upgrade…. voilà!

apt-get update
apt-get upgrade
apt-get install r-base r-base-dev

6. Now, you only have to run R just like in any bash console:



With this method you only have a prompt, without any graphical interface. ¿How do I make and see plots here?. If R runs from “inside” Android one option is to connect your Linux to an X-server app (thanks, J. Liebig). However, due to memory issues, I couldn’t put in practice this idea and see what happens. Try at your own risk! 🙂

Fortunately, there’s always possible to print R graphs in various formats, with the inconvenient that you have to browse to the plot’s location in Android -every time you need to check the output.


Here I leave a small script to begin playing with R on Android. Hope you enjoy it!

Using gdata, for MS Windows users

I use both GNU-Linux and Windows systems on a regular basis… so I’m aware of the advantages (more for GNU-Linux in my case) and disadvantages of both.

Recently I needed to analyse a database from a remote location, an Excel (*xlsx) file.

The problem was that I couldn’t put my gdata library to work… some weird errors about a missing Perl interpreter… just needed to install one. Based on this tutorial in CRAN, I downloaded ActivePerl.

Then, I followed the instructions to install it, leaving the default options. The program sets the PATH to the interpreter, so R can finally find Perl…

Just start then a R session… Done!