An introduction to joint modeling in R

By J Espasandin, O Lado, C Díaz, A Bouzas, I Guler, A Baluja. 

You can also check this post, written in #blogdown, here: intro-joint-modeling-r.

These days, between the 19th and 21st of February, has taken place the learning activity titled “An Introduction to the Joint Modeling of Longitudinal and Survival Data, with Applications in R” organized by the Interdisciplinary Group of Biostatistics (ICBUSC), directed by Professor Carmen Cadarso-Suárez, from the University of Santiago de Compostela.

The international nature of this scientific activity has been marked by the presence of researchers from different European countries such as Germany, Portugal, Holland, Greece or Turkey. It also emphasizes its interdisciplinary nature, with attendees from different fields of research, such as statistics, biology, medicine, ecology or bioinformatics, belonging to different universities, biomedical institutions or the industry.

The training activity has been taught by the professor Dimitris Rizopoulos of the Erasmus University Medical Center in Rotterdam, specialist in joint-modeling techniques. Professor Rizopoulos is the author of a book on joint modeling, as well as numerous publications and two related R packages: JM and JMbayes.

The Joint Modeling techniques presented during the scientific meeting allow for the simultaneous study of longitudinal and time-to-event data. Longitudinal data includes repeated measurements of individuals over time, and time-to event data represent the expected time before an event occurs (like death, an asthma crisis or a transplant). That combination of data frequently arises in the biomedical sciences, where it is common to analyze the evolution of a sick person over time.

This novel statistical tool is especially useful in the field of biomedicine. For instance, in patient follow-up studies after surgery; to design a personalised pattern of medical visits; to carry out predictions of survival based on the evolution of a patient, or updating those predictions in light of new data; identification of useful biomarkers; prediction of patient outcome with different chronic diseases such as diabetes, some types of cancer or cardiovascular disease.

The applicability of these models has been illustrated through the JM and JMBayes R packages (by D Rizopoulos), as well as the packages joineR (by Philipson et al.), and lcmm (by Proust-Lima et al.)

An overview of joint modeling

It basically combines (joins) the probability distributions from a linear mixed-effects model with random effects (which takes care of the longitudinal data) and a survival Cox model (which calculates the hazard ratio for an event from the censored data). The whole model and its parts can be extended in several ways:

  • To find latent population heterogeneity (latent class joint models).
  • Allow for multiple longitudinal markers.
  • Allow for the analysis of multiple failure times. This is the case of competing risks and recurrent events (for instance, when a child develops asthma attacks, to find the risk of recurrence).
  • Time-Dependent accelerated failure time (AFT) Models.
  • Dynamic predictions when new values are added for the longitudinal variable, using Maximum Likelihood Estimates and empirical Bayes estimates.

Also, the JM package has functions for discrimination and callibration, (of a single marker and between models): sensitivity & specificity, time-dependent ROCs and AUC.

Applications for joint modeling

Citing D. Rizopoulos:

Joint models for longitudinal and time-to-event data have become a valuable tool in the analysis of follow-up data. These models are applicable mainly in two settings: First, when the focus is on the survival outcome and we wish to account for the effect of an endogenous time-dependent covariate measured with error, and second, when the focus is on the longitudinal outcome and we wish to correct for nonrandom dropout.


When we need joint models for longitudinal and survival outcomes?

  • To handle endogenous time-varying covariates in a survival analysis context
  • To account for nonrandom dropout in a longitudinal data analysis context

How joint models work?

  • A mixed model for the longitudinal outcome
  • A relative risk model for the event process
  • Explain interrelationships with shared random effects

Last but not least… a dynamic predicion GIF!

# Animation example 
# Mixed-effects model fit
lmeFit.p1 <- lme(log(pro) ~ time + time:treat, data = prothro,
    random = ~ time | id)  

# Cox survival model fit
survFit.p1 <- coxph(Surv(Time, death) ~ treat, data = prothros, x = TRUE)  

# Joint model
jointFit.p1 <- jointModel(lmeFit.p1, survFit.p1, timeVar = "time",
    method = "piecewise-PH-aGH")

# We are interested in producing predictions of survival probabilities for Patient 155
dataP155 <- prothro[prothro$id == 155, ]
len_id <- nrow(dataP155)

# We can plot the data
sfit3 <- survfitJM(jointFit.p1, newdata = dataP155[1:3, ]) 
sfit4 <- survfitJM(jointFit.p1, newdata = dataP155[1:4, ]) 

plotfit3 <- plot(sfit3, estimator="mean", include.y = TRUE,, fill.area=TRUE, col.area="lightblue", main="Patient 155")
plotfit4 <- plot(sfit4, estimator="mean", include.y = TRUE,, fill.area=TRUE, col.area="lightblue", main="Patient 155")

  for(i in c(1:len_id)){
      sfit <- survfitJM(jointFit.p1, newdata = dataP155[1:i, ]) 
      plot(sfit, estimator="mean", include.y = TRUE,, fill.area=TRUE, col.area="lightblue", main="Patient 1")
},ani.width = 400, ani.height=400)

A great crowd over there!



A minimal Project Tree in R

You can also check this post, written in #blogdown, here: minimal-project-tree-r.


The last two days arrived at my twitter feed some discussions on how bad are the following sentences at the beginning of your R script/notebook, sparked by @JennyBryan’s slides at the IASC-ARS/NZSA Conference:



rm(list = ls())


Jenny Bryan offered a detailed explanation for this, as well as some fixes, in her tidyverse blog post. The main idea was:

  • To ensure reproducibility within a stable working directory tree. She proposes the very concise here::here() but other methods are available such as the template or the ProjectTemplate packages.
  • To avoid break havoc in other’s computers with rm(list = ls())!.

All of this buzz around project self-containment and reproducibility motivated me to finish a minimal directory tree that (with some variations) I have been using for this year’s data analysis endeavours.

It is a extremely simple tree which separates a /data, a /plot and an /img directory inside the main folder (root)

  • The data folder contains both raw data and processed data files saved by R.
  • The plot folder contains all the plots saved during the workflow.
  • The img folder has every other image (logos, etc) that R takes as an input to build the results.
  • Inside the root folder I store the main .R or .Rmd scripts.

This ensures that every folder has an unidirectional relationship with the root folder (except the data dir in this case). But the important thing is that the paths in the scripts are set relative to the root folder, so the entire tree can be copied elsewhere and still work as expected.

I also added some more features to the tree:

  • An .Rproj file
  • Parametrize the .Rmd file
  • Git repository so the tree can be conveniently cloned or downloaded, with a .gitignore file:

Here is a sketch of how it works:

And here is the actual code of the notebook/script. I have not included regular markdown text outside the R chunks, as this template is intended to be changed and filled with new text each time:

Script code

# Installs missing libraries on render!
list.of.packages <- c("rmarkdown", "dplyr", "ggplot2", "Rcpp", "knitr", "Hmisc", "readxl")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages, repos='')

Working directories

# directory where the notebook is
wdir <- getwd() 
# directory where data are imported from & saved to
datadir <- file.path(wdir, "data") # better than datadir <- paste(wdir, "/data", sep="")
# directory where external images are imported from
imgdir <- file.path(wdir, "img")
# directory where plots are saved to
plotdir <- file.path(wdir, "plot")
# the folder immediately above root
Up <- paste("\\", basename(wdir), sep="") 
wdirUp <- gsub(Up, "", wdir) 

Data import

# Data name (stored as a parameter in the Rmarkdown notebook)
params <- NULL
params$dataname <- "cars"
dataname <- params$dataname # archive name
routexl <- paste(datadir, "/", dataname, ".xlsx", sep="")  # complete route to archive

mydata <- read_excel(routexl, sheet = 1)  # imports first sheet
# CSV / TSV (separated by tabs in this example)
dataname <- params$dataname # archive name
routecsv <- paste(datadir, "/", dataname, ".csv", sep="")  # complete route to archive

mydata <- read.csv(paste(routecsv, sep=""), 
         header = TRUE, 
         sep = "\t",
         dec = ".")

Data operations

# Hmisc::describe(mydata)
     speed dist
   1     4    2
   2     4   10
   3     7    4
   4     7   22
   5     8   16
   6     9   10
p1 <- ggplot(mydata, aes(x=speed, y=dist)) + geom_point()

Save plots

plotname1 <- "p1.pdf"
plotname2 <- "p1.png"

routeplot1 <- file.path(plotdir, plotname1)
routeplot2 <- file.path(plotdir, plotname2)
ggsave(routeplot1)  # (see

Save data

save(mydata, file="data/mydata.RData")
# MSEXCEL # not run
dataname2 <- "mydata"  # name we will give to file
routexl2 <- paste(datadir, "/", dataname2, ".xlsx", sep="")   # complete route to future archive

write.xlsx(mydata, routexl2) # creates archive in specified route
# CSV / TSV (separated by tabs in this example)
dataname2 <- "mydata"  # name we will give to file
routecsv2 <- paste(datadir, "/", dataname2, ".csv", sep="")  # complete route to future archive

write.table(mydata, file = routecsv2, append = FALSE, quote = FALSE, sep = "\t ",
            eol = "\n", na = "NA", dec = ".", row.names = FALSE,
            col.names = TRUE)


This script -and the dir tree that contains it- is saving me a lot of time and headaches (where I’ve put that data?….), I hope it can be also useful for people out there!.

Future improvements

Taming exam results in pdf with pdftools

You can also check this post, written in #blogdown, here: taming-exam-results-with-pdf.


There are several ways to mine tables and other content from a pdf, using R. After a lot of trial & error, here’s how I managed to extract global exam results from an international, massive, yearly examination, the EDAIC.

This is my first use case of “pdf mining” with R, and also a fairly simple one. However, more complex and very fine examples of this can be found elsewhere, using both pdftools and tabulizer packages.

As can be seen from the original pdf, exam results are anonymous. They consist on a numeric, 6-digit code and a binary result: “FAIL / PASS”. I was particularly interested into seeing how many of them passed the exam, as some indirect measure of how “hard” it can be.

Mining the table

In this case I preferred pdftools as it allowed me to extract the whole content from the pdf:

txt <- pdf_text("EDAIC.pdf") 
  [1] "EDAIC Part I 2017                                                  Overall Results\n                                         Candidate N°       Result\n                                            107131            FAIL\n                                            119233            PASS\n                                            123744            FAIL\n                                            127988            FAIL\n                                            133842            PASS\n                                            135692            PASS\n                                            140341            FAIL\n                                            142595            FAIL\n                                            151479            PASS\n                                            151632            PASS\n                                            152787            PASS\n                                            157691            PASS\n                                            158867            PASS\n                                            160211            PASS\n                                            161970            FAIL\n                                            162536            PASS\n                                            163331            PASS\n                                            164442            FAIL\n                                            164835            PASS\n                                            165734            PASS\n                                            165900            PASS\n                                            166469            PASS\n                                            167241            FAIL\n                                            167740            PASS\n                                            168151            FAIL\n                                            168331            PASS\n                                            168371            FAIL\n                                            168711            FAIL\n                                            169786            PASS\n                                            170721            FAIL\n                                            170734            FAIL\n                                            170754            PASS\n                                            170980            PASS\n                                            171894            PASS\n                                            171911            PASS\n                                            172047            FAIL\n                                            172128            PASS\n                                            172255            FAIL\n                                            172310            PASS\n                                            172706            PASS\n                                            173136            FAIL\n                                            173229            FAIL\n                                            174336            PASS\n                                            174360            PASS\n                                            175177            FAIL\n                                            175180            FAIL\n                                            175184            FAIL\nYour candidate number is indicated on your admission document        Page 1 of 52\n"
  [1] "character"

These commands return a lenghty blob of text. Fortunately, there are some \n symbols that signal the new lines in the original document.

We will use these to split the blob into something more approachable, using tidyversal methods…

  • Split the blob.
  • Transform the resulting list into a character vector with unlist.
  • Trim leading white spaces with stringr::str_trim.
tx2 <- strsplit(txt, "\n") %>% # divide by carriage returns
  unlist() %>% 
  str_trim(side = "both") # trim white spaces
   [1] "EDAIC Part I 2017                                                  Overall Results"
   [2] "Candidate N°       Result"                                                         
   [3] "107131            FAIL"                                                            
   [4] "119233            PASS"                                                            
   [5] "123744            FAIL"                                                            
   [6] "127988            FAIL"                                                            
   [7] "133842            PASS"                                                            
   [8] "135692            PASS"                                                            
   [9] "140341            FAIL"                                                            
  [10] "142595            FAIL"
  • Remove the very first row.
  • Transform into a tibble.
tx3 <- tx2[-1] %>% 
  # A tibble: 2,579 x 1
   1 Candidate N°       Result
   2    107131            FAIL
   3    119233            PASS
   4    123744            FAIL
   5    127988            FAIL
   6    133842            PASS
   7    135692            PASS
   8    140341            FAIL
   9    142595            FAIL
  10    151479            PASS
  # ... with 2,569 more rows
  • Use tidyr::separate to split each row into two columns.
  • Remove all spaces.
tx4 <- separate(tx3, ., c("key", "value"), " ", extra = "merge") %>%  
  mutate(key = gsub('\\s+', '', key)) %>%
  mutate(value = gsub('\\s+', '', value)) 
  # A tibble: 2,579 x 2
           key    value
         <chr>    <chr>
   1 Candidate N°Result
   2    107131     FAIL
   3    119233     PASS
   4    123744     FAIL
   5    127988     FAIL
   6    133842     PASS
   7    135692     PASS
   8    140341     FAIL
   9    142595     FAIL
  10    151479     PASS
  # ... with 2,569 more rows
  • Remove rows that do not represent table elements.
tx5 <- tx4[grep('^[0-9]', tx4[[1]]),] 
  # A tibble: 2,424 x 2
        key value
      <chr> <chr>
   1 107131  FAIL
   2 119233  PASS
   3 123744  FAIL
   4 127988  FAIL
   5 133842  PASS
   6 135692  PASS
   7 140341  FAIL
   8 142595  FAIL
   9 151479  PASS
  10 151632  PASS
  # ... with 2,414 more rows

Extracting the results

We already have the table! now it’s time to get to the summary:

tx5 %>%
  group_by(value) %>%
  summarise (count = n()) %>%
  mutate(percent = paste( round( (count / sum(count)*100) , 1), "%" )) %>% 
value count percent
FAIL 1017 42 %
PASS 1407 58 %

From these results we see that the EDAIC-Part1 exam doesn’t have a particularly high clearance rate. It is currently done by medical specialists, but its dificulty relies in a very broad list of subjects covered, ranging from topics in applied physics, the entire human physiology, pharmacology, clinical medicine and latest guidelines.

Despite being a hard test to pass -and also the exam fee-, it’s becoming increasingly popular among anesthesiologists and critical care specialists that wish to stay up-to date with the current medical knowledge and practice.



Starting a Rmarkdown Blog with Blogdown + Hugo + Github

Finally, -after 24h of failed attempts-, I could get my favourite Hugo theme up and running with R Studio and Blogdown.

All the steps I followed are detailed in my new Blogdown entry, which is also a GitHub repo.

After exploring some alternatives, like Shirin’s (with Jekyll), and Amber Thomas advice (which involved Git skills beyond my basic abilities), I was able to install Yihui’s hugo-lithium-theme in a new repository.

However, I wanted to explore other blog templates, hosted in GiHub, like:

The three first themes are currently linked in the blogdown documentation as being most simple and easy to set up for unexperienced blog programmers, but I hope the list will grow in the following months. For those who are willing to experiment, the complete list is here.

Finally I chose the hugo-tranquilpeak theme, by Thibaud Leprêtre, for which I mostly followed Tyler Clavelle’s entry on the topic. This approach turned out to be easy and good, given some conditions:

  • Contrary to Yihui Xie’s advice, I chose to host my blog, instead of Netlify (I love my desktop integration with GitHub, so it was interesting for me not to move to another service for my static content).
  • In my machine, I installed Blogdown & Hugo using R studio (v 1.1.336).
  • In GiHub, it was easier for me to host the blog directly in my main github pages repository (always named [USERNAME], in the master branch, following Tyler’s tutorial.
  • I have basic knowledge of html, css and javascript, so I didn’t mind to tinker around with the theme.
  • My custom styles didn’t involve theme rebuilding. At this moment they’re simple cosmetic tricks.

The steps I followed were:

Git & GitHub repos

  • Setting a GitHub repo with the name [USERNAME] (in my case See this and this.
  • Create a git repo in your machine:
    • Create manually a new directory called [USERNAME]
    • Run in the terminal (Windows users have to install git first):
    cd /Git/[USERNAME] # your path may be different
    git init # initiates repo in the directory
    git remote add origin[USERNAME]/[USERNAME] # connects git local repo to remote Github repo
    git pull origin master # in case you have LICENSE and files in the GitHub repo, they're downloaded
  • For now, your repo is ready. We will now focus in creating & customising our Blogdown.

RStudio and blogdown

  • We will open RStudio (v 1.1.336, development version as of today).
    • First, you may need to install Blogdown in R:
    • In RStudio, select the Menu > File > New Project following the lower half of these instructions. The wizard for setting up a Hugo Blogdown project may not be yet available in your RStudio version (not for much longer probably).

Creating new Project

Creating new Project

Selecting Hugo Blogdown format

Selecting Hugo Blogdown format

Selecting Hugo Blogdown theme

Selecting Hugo Blogdown theme

A config.toml file appears


config.toml file appears

Customising paths and styles

Before we build and serve our site, we need to tweak a couple of things in advance, if we want to smoothly deploy our blog into GitHub pages.

Modify config.toml file

To integrate with GiHub pages, there are the essential modifications at the top of our config.toml file:

  • We need to set up the base URL to the “root” of the web page (https://[USERNAME] in this case)
  • By default, the web page is published in the “public” folder. We need it to be published in the root of the repository, to match the structure of the GitHub masterbranch:
baseurl = "/./" 
publishDir = "./"
  • Other useful global settings:
ignoreFiles = ["\\.Rmd$", "\\.Rmarkdown$", "_files$", "_cache$"]
enableEmoji = true

Images & styling paths

We can revisit the config.toml file to make changes to the default settings.

The logo that appears in the corner must be in the root folder. To modify it in the config.toml:

picture = "logo.png" # the path to the logo

The cover (background) image must be located in /themes/hugo-tranquilpeak-theme/static/images . To modify it in the config.toml:

coverImage = "myimage.jpg"

We want some custom css and js. We need to locate it in /static/css and in /static/jsrespectively.

# Custom CSS. Put here your custom CSS files. They are loaded after the theme CSS;
# they have to be referred from static root. Example
customCSS = ["css/my-style.css"]

# Custom JS. Put here your custom JS files. They are loaded after the theme JS;
# they have to be referred from static root. Example
customJS = ["js/myjs.js"]

Custom css

We can add arbitrary classes to our css file (see above).

Since I started writing in Bootstrap, I miss it a lot. Since this theme already has bootstrap classes, I brought some others I didn’t find in the theme (they’re available for .md files, but currently not for .Rmd)

Here is my custom css file to date:

/* @import url(''); may conflict with default theme*/
@import url(''); /*google icons*/
@import url(''); /*font awesome icons*/

.input-lg {
  font-size: 30px;
.input {
  font-size: 20px;
.font-sm {
    font-size: 0.7em;
.texttt {
  font-family: monospace;
.alert {
padding: 15px;
margin-bottom: 20px;
border: 1px solid transparent;
border-radius: 4px;
.alert-success {
color: #3c763d;
background-color: #dff0d8;
border-color: #d6e9c6;
.alert-error {
  color: #b94a48;
  background-color: #f2dede;
  border-color: #eed3d7;
.alert-info {
  color: #3a87ad;
  background-color: #d9edf7;
  border-color: #bce8f1;
.alert-gray {
  background-color: #f2f3f2;
  border-color: #f2f3f2;

/*style for printing*/
@media print {
    .noPrint {

/*link formatting*/
a:link {
    color: #478ca7;
    text-decoration: none;
a:visited {
    color: #478ca7;
    text-decoration: none;
a:hover {
    color: #82b5c9;
    text-decoration: none;

Also, we have font-awesome icons!

Site build with blogdown

Once we have ready our theme, we can add some content, modifying or deleting the various examples we will find in /content/post .

We need to make use of Blogdown & Hugo to compile our .Rmd file and create our html post:


In the viewer, at the right side of the IDE you can examine the resulting html and see if something didn’t go OK.

Deploying the site

Updating the local git repository

This can be done with simple git commands:

cd /Git/[USERNAME] # your path to the repo may be different
git add . # indexes all files that wil be added to the local repo
git commit -m "Starting my Hugo blog" # adds all files to the local repo, with a commit message

Pushing to GitHub

git push origin master # we push the changes from the local git repo to the remote repo (GitHub repo)

Just go to the page https://[USERNAME] and enjoy your blog!

R code

Works just the same as in Rmarkdown. R code is compiled into an html and published as static web content in few steps. Welcome to the era of reproducible blogging!

The figure 1 uses the ggplot2 library:

ggplot(diamonds, aes(x=carat, y=price, color=clarity)) + geom_point()

diamonds plot with ggplot2.

Figure 1: diamonds plot with ggplot2.

Rmd source code

You can download it from here

I, for one, welcome the new era of reproducible blogging!


Quick wordclouds from PubMed abstracts – using PMID lists in R

Wordclouds are one of the most visually straightforward, compelling ways of displaying text info in a graph.

Of course, we have a lot of web pages (and even apps) that, given an input text, will plot you some nice tagclouds. However, when you need reproducible results, or getting done complex tasks -like combined wordclouds from several files-, a programming environment may be the best option.

In R, there are (as always), several alternatives to get this done, such as tagcloud and wordcloud.

For this script I used the following packages:

  • RCurl” to retrieve a PMID list, stored in my GitHub account as a .csv file.
  • RefManageR” and plyr to retrieve and arrange PM records. To fetch the info from the inets, we’ll be using the PubMed API (free version, with some limitations). 
  • Finally, tm, SnowballC” to prepare the data and wordcloud” to plot the wordcloud. This part of the script is based on this from Georeferenced.

One of the advantages of using RefManageR is that you can easily change the field which you are importing from, and it usually works flawlessly with the PubMed API.

My biggest problem sources when running this script: download caps, busy hours, and firewalls!.

At the beginning of the gist, there is also a handy function that automagically downloads all needed packages for you.

To source the script, simply type in the R console:

This script creates two directories in your working directory: ‘corpus1‘ for the abstracts file, and ‘wordcloud‘ to store the plot.


And there is the code:



R/Shiny for clinical trials: simple randomization tables

One of the things I most like from R + Shiny is that it enables me to serve the power and flexibility of R in small “chunks” to cover different needs, allowing people not used to R to benefit from it. However, what I like most is that’s really fun and easy to program those utilities for a person without any specific programming background.

Here’s a small hack done in R/Shiny: it covered an urgent need for a study involving patient randomisation to two branches of treatment, in what is commonly known as a clinical trial. This task posed some challenges:

  • First, this trial was not financed in any way (at least initially). It was a small, independent study comparing two approved techniques for chronic pain, so the sponsor had to avoid expensive software or services.
  • Another reason for software customization is that treatment groups were partially ‘blind’: for people who assessed effectiveness and… also for statistical analysis (treatment administration was open-label). This means that the person in charge of data analysis must know which group is assigned to a patient, but doesn’t know what treatment is assigned to either group.

To tackle the points above, my app should have two main features:

  • The sponsor (here, a medical doctor) must be able to effectively control study blindness and also provide emergency blind disclosure. This control should extend to data analysis to minimize bias favoring either treatment.
  • R has tools to create random samples, but the MD in charge of the study sponsoring doesn’t know how to use R. We needed a friendly interface for random table creation.

Here’s how I got it to work:

  • The very core of this Shiny app is a combination between the set.seed and sample R functions. The PIN number (the set.seed argument) works like a secret passcode that links to a given random table. E.g., every time I enter ‘5432’, the random tables will look the same. This protects from accidental blindness disclosure, as nobody can find the correct random table without the proper PIN, even if they can access the app’s source code.
  • The tables are created column by column, ordered at first. Then we proceed to randomize (via the sample function) both the treatment column (in the random table) and the Group column (in the PIN table).
  • Once the tables are created they can be downloaded as .CSV files, printed, signed and dated to document the randomization procedure. The app’s open source code and the PIN number will provide reproducibility to the procedure for many years.

Unfortunately I wasn’t able to insert iframes to embed the app, so I posted a screenshot:

Random table generator for clinical trials

The app is far from perfect, but it covers the basic needs for the trial. You can test it here:

And the GitHub repo is available here. Feel free to use/ adapt/ fork it to your needs!

Also, you can cite it if it’s been useful for your study methods!






Compile BayesX from source code – via Fink in OSX 10.10

My PC just passed away. My good old one since 2009… so I decided to buy a desktop computer running OSX 10.10 (although I cannot exclude the possibility of partitioning the disk and installing also Linux…). For now, I missed some apps, which I installed with Fink (a popular debian-based distro ported to Mac).

One of the programs I was fiddling around with was BayesX: “Bayesian Inference in Structured Additive Regression Models”, authored by great people at Muenchen University, (specially Nadja and Thomas, whom I’m glad to know personally).

The problem is that I used to have both BayesX binaries for Linux and Windows, but now I needed one for my Mac. Here’s how I did (pretty easy once I figured out how to do it!):


Xcode, Xquartz and Java are the first tools you will need in order to build Fink, and hence BayesX.


Fink is a popular port of GNU-Linux for MacOS. You can install it easily following the steps in this page.

Because my OS was 10.10, I had to follow instructions to compile Fink from source. Fortunately, the page provides a handy helper script to run in the terminal (for OSX 10.10). This script goes through all the steps. Open the Terminal and just either run the script or the commands one by one, and follow the instructions. It may take a while to download and install.


Go to the Download page and get the source. I suggest that you store the .zip file in a dedicated directory such as /Users/me/Downloads/source/code (changing “me” for your user). Then, open the Terminal and type:

cd /Users/me/Downloads/source/code
unzip -a # or whatever the name is


If you want to compile this source for Mac, you’ll see you need two more components (at least in this tutorial): cmake and the GNU Scientific Library (GSL).

Cmake is used to create a custom Makefile which will be used in compilation. You can insytall it typing in the Terminal:

sudo apt-get install cmake

Or you can use a (very useful for debugging) graphical user interface for Cmake.

To install GSL, simply type:

sudo apt-get install gsl


Once everything is installed, you can start to compile. Run in the Terminal

cd /Users/me/Downloads/source/code
cmake . # this will create the Makefile
cd /Users/Auri/Downloads/source/ # locate the Makefile

You will obtain an Unix Executable File in /Users/me/Downloads/source called BayesX. Double-clicking on it should open the BayesX prompt!.


Please, go to README.BayesX in the source for more information and issue solving.

I have tested this tutorial in another Mac machine -OSX 10.10- without any of this software installed, and it worked well. However, you may experience (mostly unknown) issues that may depend on different versions of Xcode, Fink, Java, GSL… etc. Despite this, I hope these steps will help you.

Also, from here you can only compile a mere console version of BayesX. I’m not sure how to run the Java user interface, but I will update this post to include any progress.

Hope you enjoy!


Society for Social Medicine

Advancing Knowledge for Population Health

Mad (Data) Scientist

Musings, useful code etc. on R and data science

My Blog

A topnotch site


Just another site


A blog on all things Geo, Data, Technology & the interconnected world. Occasionally off-piste.

Retraction Watch

Tracking retractions as a window into the scientific process

"R" you ready?

My advances in R - a learner's diary

TRinker's R Blog

Experiments & Experiences in R

What You're Doing Is Rather Desperate

Notes from the life of a [data] scientist

On unicorns and genes

Martin Johnsson's blog

vet epi

Denis Haine


Young Researchers in Biostatistics


...messing around with free code

TRinker's R Blog

...messing around with free code


...messing around with free code

Learning R

Finding my way around R


R news and tutorials contributed by (750) R bloggers