Quick wordclouds from PubMed abstracts – using PMID lists in R

Wordclouds are one of the most straightforward and visually compelling ways of displaying text information in a plot.

Of course, there are plenty of web pages (and even apps) that will plot a nice tag cloud from any input text. However, when you need reproducible results, or have to carry out more complex tasks (such as combined wordclouds from several files), a programming environment may be the best option.

In R there are, as always, several alternatives to get this done, such as tagcloud and wordcloud.

For this script I used the following packages:

  • RCurl to retrieve a PMID list, stored in my GitHub account as a .csv file.
  • RefManageR and plyr to retrieve and arrange PubMed records. To fetch the information from the internet, we’ll be using the PubMed API (free version, with some limitations).
  • Finally, tm and SnowballC to prepare the data, and wordcloud to plot the wordcloud. This part of the script is based on this from Georeferenced.
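
The preparation step in the last bullet boils down to counting words. Here is a minimal base-R sketch of that idea, not the code of the script itself: the two "abstracts" are made-up toy sentences, the stop-word list is deliberately tiny, and `word_frequencies` is a hypothetical helper name. In the real script, tm and SnowballC do this job far more thoroughly (stemming, full stop-word lists).

```r
# Toy stand-ins for downloaded abstracts (made-up text, not real PubMed records)
abstracts <- c(
  "Chronic pain treatment and pain management in adults.",
  "Mitochondrial DNA haplogroups and chronic disease."
)

# A deliberately tiny stop-word list, only for this illustration
stop_words <- c("and", "in", "the", "of", "a")

# Hypothetical helper: lower-case, split on non-letters, drop stop words, count
word_frequencies <- function(texts, stop_words = character(0)) {
  words <- unlist(strsplit(tolower(texts), "[^a-z]+"))
  words <- words[nzchar(words) & !(words %in% stop_words)]
  sort(table(words), decreasing = TRUE)
}

freqs <- word_frequencies(abstracts, stop_words)
print(freqs)

# Draw the cloud only in an interactive session with wordcloud installed
if (interactive() && requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(names(freqs), as.numeric(freqs), min.freq = 1)
}
```

The frequency table is the only thing the plotting function needs, so the same counts can be built from one file or from several files pooled together.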

One of the advantages of using RefManageR is that you can easily change the field which you are importing from, and it usually works flawlessly with the PubMed API.

My biggest sources of trouble when running this script: download caps, busy hours, and firewalls!

At the beginning of the gist, there is also a handy function that automagically downloads all needed packages for you.
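
A helper of that kind could look roughly like the sketch below. This is an assumption about how such a function may be written, not a copy of the one in the gist: `missing_packages` and `install_missing` are hypothetical names, and the mirror URL is just the generic CRAN cloud mirror.

```r
# Return the packages in `pkgs` that cannot be loaded in this R installation
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Install only what is missing (the mirror is an assumption; pick your own)
install_missing <- function(pkgs, repos = "https://cloud.r-project.org") {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install, repos = repos)
  }
  invisible(to_install)
}

needed <- c("RCurl", "RefManageR", "plyr", "tm", "SnowballC", "wordcloud")
print(missing_packages(needed))
# install_missing(needed)  # uncomment to actually download the packages
```

Keeping the check separate from the installation makes it possible to see what would be downloaded before anything touches the network.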

To source the script, simply type in the R console:

This script creates two directories in your working directory: ‘corpus1‘ for the abstracts file, and ‘wordcloud‘ to store the plot.


And here is the code:



R/Shiny for clinical trials: simple randomization tables

One of the things I like most about R + Shiny is that it lets me serve the power and flexibility of R in small “chunks” to cover different needs, so that people not used to R can benefit from it. What I like even more is that programming those utilities is really fun and easy, even for a person without any specific programming background.

Here’s a small hack done in R/Shiny: it covered an urgent need for a study involving patient randomisation to two branches of treatment, in what is commonly known as a clinical trial. This task posed some challenges:

  • First, this trial was not financed in any way (at least initially). It was a small, independent study comparing two approved techniques for chronic pain, so the sponsor had to avoid expensive software or services.
  • Another reason for software customization is that treatment groups were partially ‘blind’: blinding applied to the people who assessed effectiveness and also to the statistical analysis, while treatment administration was open-label. This means that the person in charge of data analysis knows which group each patient is assigned to, but not which treatment corresponds to each group.

To tackle the points above, my app should have two main features:

  • The sponsor (here, a medical doctor) must be able to effectively control study blindness and also provide emergency blind disclosure. This control should extend to data analysis to minimize bias favoring either treatment.
  • R has tools to create random samples, but the MD sponsoring the study doesn’t know how to use R. We needed a friendly interface for random table creation.

Here’s how I got it to work:

  • The very core of this Shiny app is a combination of the set.seed and sample R functions. The PIN number (the set.seed argument) works like a secret passcode that links to a given random table. E.g., every time I enter ‘5432’, the random tables will look the same. This protects against accidental blindness disclosure, as nobody can find the correct random table without the proper PIN, even if they can access the app’s source code.
  • The tables are created column by column, ordered at first. Then we proceed to randomize (via the sample function) both the treatment column (in the random table) and the Group column (in the PIN table).
  • Once the tables are created they can be downloaded as .CSV files, printed, signed and dated to document the randomization procedure. The app’s open source code and the PIN number will provide reproducibility to the procedure for many years.
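
The steps above can be sketched in a few lines of plain R. This is a minimal illustration of the set.seed + sample idea, not the app's actual code: the function name `make_random_tables`, the column names, the group labels "A"/"B", the treatment names and the balanced allocation are all assumptions made for the example.

```r
# Hypothetical helper: build a random table and a PIN table from a numeric PIN
make_random_tables <- function(pin, n_patients,
                               treatments = c("Technique 1", "Technique 2")) {
  set.seed(pin)  # the same PIN always reproduces the same tables
  groups <- c("A", "B")

  # PIN table: which treatment hides behind each group label
  pin_table <- data.frame(Group = sample(groups), Treatment = treatments)

  # Random table: start from an ordered, balanced column, then shuffle it
  random_table <- data.frame(
    Patient = seq_len(n_patients),
    Group = sample(rep(groups, length.out = n_patients))
  )

  list(random_table = random_table, pin_table = pin_table)
}

tables <- make_random_tables(pin = 5432, n_patients = 10)
print(tables$random_table)

# Export for printing, signing and dating (written to a temporary folder here)
out_file <- file.path(tempdir(), "random_table.csv")
write.csv(tables$random_table, out_file, row.names = FALSE)
```

In this layout the analyst can be given the random table alone, while the PIN table stays with the sponsor until the blind is broken.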

Unfortunately I wasn’t able to insert iframes to embed the app, so I posted a screenshot:

Random table generator for clinical trials

The app is far from perfect, but it covers the basic needs for the trial. You can test it here:


And the GitHub repo is available here. Feel free to use, adapt or fork it to suit your needs!


Also, you can cite it if it’s been useful for your study!






Happening just now… 6th Conference of the R Spanish User Community

The R-Spain Conferences have been taking place since 2009 as an expression of the growing interest that R elicits in many fields. The organisers are the Comunidad R Hispano (R-es). The community supports many groups and initiatives aimed at developing R knowledge and widening its use.

To follow the talks via streaming (they are in Spanish) you must register.

There is also a scientific programme with the presentations (some in English) here.

Ro Conferences 2014

Install R on Android via GNURoot (no root required!)

Playing with my tablet some time ago, I wondered if installing R could be possible. You know, a small android device “to the power of R”…

After searching on Google from time to time, I came across some interesting possibilities:

  • R Instructor, created “to bridge the gap between authoritative (but expensive) reference textbooks and free but often technical and difficult to understand help files“.
  • R Console Free, which provides the necessary C, C++ and Fortran compilers to build and install R packages.
  • It’s always possible to root your device and install a Linux distribution for Android, which will let you install any repository/package, just like in any Linux console.
  • Running R from your dedicated R server or from an external one (see R-fiddle), using your own browser. I see this option as particularly useful for those who want maximum performance.
  • Some additional thoughts on this topic are also stored in these Stack Overflow pages.
  • Without needing to root my device, I found GNURoot, an app that “provides a method for you to install and use GNU/Linux distributions and their associated applications/packages alongside Android“.

Finally, my preferred solution came with GNURoot (see this tutorial), and here’s how I managed to install the newest CRAN repositories! (NOTE: It should work “out of the box” but, as problems might appear, some experience with Linux is always advisable).

1. Install the .apk of GNURoot on your Android device. Don’t forget to donate if you like it! 🙂

2. Following the app instructions, download and install a Linux distribution to run. In my case, I chose the .apk GNURoot Wheezy (a Debian Wheezy distro without Xterms). EDIT: Just make sure your device has enough memory for it.

3. Once installed, just follow the steps to launch the Rootfs (Wheezy) as Fake Root. You will see a bash prompt, from which you can access a complete Linux directory tree. This is the same as if you were on a computer (however, if you aren’t root you won’t be able to access the directories via your file browser from Android).


4. Now, we just have to update and upgrade:

apt-get update
apt-get upgrade

5. Then, update the sources.list file. We don’t have any graphical text editor (like gedit or kate)… but we have nano!

nano /etc/apt/sources.list


Using the volume up + “W/S/A/D” you can move between the lines. Or alternatively, you can install a convenient keyboard with arrow buttons, like Hacker’s Keyboard! (thanks to JTT!)

Following instructions from CRAN, I added the following line to sources.list:

deb http://<favorite-cran-mirror>/bin/linux/debian wheezy-cran3/

Exit, saving the changes. But before you “update and upgrade” again, don’t forget to add the key for the repository by running the following:

apt-key adv --keyserver keys.gnupg.net --recv-key 381BA480

6. Update and upgrade… voilà!

apt-get update
apt-get upgrade
apt-get install r-base r-base-dev

7. Now, you only have to run R just like in any bash console:

R

With this method you only have a prompt, without any graphical interface. How do I make and see plots here? If R runs from “inside” Android, one option is to connect your Linux to an X-server app (thanks, J. Liebig). However, due to memory issues, I couldn’t put this idea into practice and see what happens. Try at your own risk! 🙂

Fortunately, it’s always possible to print R graphs in various formats, with the inconvenience that you have to browse to the plot’s location in Android every time you need to check the output.
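
As a minimal sketch of that workaround, the lines below send a plot to a PDF file through the standard pdf() device. The file name and the plotted data are arbitrary choices for the example; the file is written to the current working directory, which is assumed to be writable.

```r
# Where the plot will be written (current working directory, arbitrary file name)
plot_file <- file.path(getwd(), "first_plot.pdf")

pdf(plot_file, width = 6, height = 4)  # open a file device instead of a window
hist(rnorm(100), main = "100 random normal values", xlab = "value")
invisible(dev.off())                   # close the device so the file is complete

cat("Plot saved to:", plot_file, "\n")
```

Printing the full path at the end saves some guessing when you later look for the file.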


Here I leave a small script to begin playing with R on Android. Hope you enjoy it!

…start using R, from scratch!

Some time ago, once I was able to use R by myself, I found some colleagues and other people who wanted to learn R as well. I pointed them to help pages and to CRAN repositories… but in some cases they said they didn’t know how to start using those resources. Obviously, the main self-perceived limitation for non-programmers is the use of “commands”, although many of the 80’s kids will remember typing command lines to launch games such as PacMan, Frogger… :).

At the same time, they also wanted to refresh some basic statistics, acquiring a general knowledge of their data before asking for a statistician’s help. An idea to quickly help them was to make some scripts to guide them through basic commands, seeing results in real time, and being able to recycle them for their own data.

If you have just started using R, maybe they can be useful for you. However, I recommend that you use some open “plain text” file(s) to paste your favorite commands and clone/modify them to suit your needs. Remember to store the files where you can access them later!

  • Tip: you can change the extension of your mytext.txt file to mytext.R and tell Windows to keep opening it with Notepad. It will still be a plain text document, but some text editors will recognize it as an “R script” and will highlight its content accordingly.
  • Apart from Notepad in Windows, there are plenty of other text/code editors which are more pleasant to use. See for example RStudio and Notepad++.

Copy the Gists below into your own text files, and begin playing with R!

Power and sample size calculator for mitochondrial DNA association studies (Shiny)

The functions detailed in the piece of code below (in a Gist) have been useful to me when I had to calculate many possible scenarios of statistical power and sample size. The formulae were taken from the article by Samuels et al., AJHG 2006, and the script even proved useful for making a variety of comparative plots.

This is intended for estimating power/sample size in association studies involving mitochondrial DNA haplogroups (which are categories whose frequencies depend on each other), on a Chi-square test basis. The problem with scripts is that sometimes they aren’t as friendly to many people as GUIs are. There are many ways to solve this but, as I don’t have a programming background (apart from R), the most straightforward for me was Shiny.
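
To give a flavour of what a Chi-square based power calculation looks like, here is a generic sketch that uses the textbook noncentral Chi-square approximation (Cohen's effect size w). It is not the set of formulae from Samuels et al. and may give different numbers from the Gist; `chisq_power` is a hypothetical function name and the haplogroup frequencies are made up.

```r
# Approximate power of a chi-square test on a 2 x k table (cases/controls by
# haplogroup), via the noncentral chi-square distribution.
chisq_power <- function(p_cases, p_controls, n_cases, n_controls, alpha = 0.05) {
  stopifnot(length(p_cases) == length(p_controls),
            abs(sum(p_cases) - 1) < 1e-8,
            abs(sum(p_controls) - 1) < 1e-8)
  n_total <- n_cases + n_controls
  row_share <- c(n_cases, n_controls) / n_total

  # Joint cell probabilities under the alternative hypothesis
  p_alt <- rbind(p_cases * row_share[1], p_controls * row_share[2])
  # Cell probabilities if haplogroup and case status were independent
  p_null <- outer(rowSums(p_alt), colSums(p_alt))

  w2 <- sum((p_alt - p_null)^2 / p_null)  # squared effect size
  df <- length(p_cases) - 1
  crit <- qchisq(1 - alpha, df)
  pchisq(crit, df, ncp = n_total * w2, lower.tail = FALSE)
}

# Made-up frequencies for three haplogroup categories
p_controls <- c(0.45, 0.15, 0.40)
p_cases    <- c(0.35, 0.20, 0.45)

for (n in c(100, 300, 500)) {
  cat(n, "cases and", n, "controls: power =",
      round(chisq_power(p_cases, p_controls, n, n), 3), "\n")
}
```

Because all haplogroup frequencies enter the same test, changing one frequency forces the others to change too, which is why the calculation takes the whole frequency vector rather than a single proportion.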

Shiny is a friendly interface which allows for great interactive features (see its Tutorial), and it loads onto the web browser from an open R console, just by clicking:


This Gist displays a simple graph using two power/number-of-cases values (it was hard for me to get the graph to show, and I managed mostly thanks to Stack Overflow and to MadScone):

shiny::runGist('5895082')

Where 5895082 is the ID of the Gist. Here is the source:

To work with files inside your computer, just run R from the directory that contains the files ui.R and server.R, and launch the app with the command:


If this doesn’t work, you can paste the complete path to the ui and server files:
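
For instance, a minimal sketch assuming the standard shiny::runApp() entry point. The folder path is a made-up placeholder and `has_app_files` is a hypothetical helper; the app is only launched in an interactive session where both files and the shiny package are present.

```r
# Hypothetical helper: does the folder contain both Shiny app files?
has_app_files <- function(app_dir) {
  all(file.exists(file.path(app_dir, c("ui.R", "server.R"))))
}

app_dir <- "~/shiny/mtdna_power"  # placeholder path, replace with your own folder

if (interactive() && has_app_files(app_dir) &&
    requireNamespace("shiny", quietly = TRUE)) {
  shiny::runApp(app_dir)  # opens the app in the web browser
} else {
  message("Nothing launched: check the path, the two files and the shiny package.")
}
```

Checking for ui.R and server.R first gives a clearer message than the error Shiny raises when the folder is wrong.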

Structure of the human mitochondrial genome. (Photo credit: Wikipedia)

An .EPS to .PDF converter (using LaTeX!)

I am about to go on a short holiday, so I was tidying the code lines I had scattered around before leaving… And I found this: a minimal EPS to PDF converter, which is barely a LaTeX template.

It is intended for transforming an .EPS graph to the .PDF format. You can copy & paste this whole code into a blank text file (but with .TEX extension) and run it with a TeX editor. To install and use LaTeX, here is a previous post about it.

When you have compiled it, look in the same directory as the file for the newly created PDF graph!