An introduction to joint modeling in R

By J Espasandin, O Lado, C Díaz, A Bouzas, I Guler, A Baluja. 

You can also check this post, written in #blogdown, here: intro-joint-modeling-r.

These days, between the 19th and 21st of February, has taken place the learning activity titled “An Introduction to the Joint Modeling of Longitudinal and Survival Data, with Applications in R” organized by the Interdisciplinary Group of Biostatistics (ICBUSC), directed by Professor Carmen Cadarso-Suárez, from the University of Santiago de Compostela.

The international nature of this scientific activity has been marked by the presence of researchers from different European countries such as Germany, Portugal, Holland, Greece or Turkey. It also emphasizes its interdisciplinary nature, with attendees from different fields of research, such as statistics, biology, medicine, ecology or bioinformatics, belonging to different universities, biomedical institutions or the industry.

The training activity has been taught by the professor Dimitris Rizopoulos of the Erasmus University Medical Center in Rotterdam, specialist in joint-modeling techniques. Professor Rizopoulos is the author of a book on joint modeling, as well as numerous publications and two related R packages: JM and JMbayes.

The Joint Modeling techniques presented during the scientific meeting allow for the simultaneous study of longitudinal and time-to-event data. Longitudinal data includes repeated measurements of individuals over time, and time-to event data represent the expected time before an event occurs (like death, an asthma crisis or a transplant). That combination of data frequently arises in the biomedical sciences, where it is common to analyze the evolution of a sick person over time.

This novel statistical tool is especially useful in the field of biomedicine. For instance, in patient follow-up studies after surgery; to design a personalised pattern of medical visits; to carry out predictions of survival based on the evolution of a patient, or updating those predictions in light of new data; identification of useful biomarkers; prediction of patient outcome with different chronic diseases such as diabetes, some types of cancer or cardiovascular disease.

The applicability of these models has been illustrated through the JM and JMBayes R packages (by D Rizopoulos), as well as the packages joineR (by Philipson et al.), and lcmm (by Proust-Lima et al.)

An overview of joint modeling

It basically combines (joins) the probability distributions from a linear mixed-effects model with random effects (which takes care of the longitudinal data) and a survival Cox model (which calculates the hazard ratio for an event from the censored data). The whole model and its parts can be extended in several ways:

  • To find latent population heterogeneity (latent class joint models).
  • Allow for multiple longitudinal markers.
  • Allow for the analysis of multiple failure times. This is the case of competing risks and recurrent events (for instance, when a child develops asthma attacks, to find the risk of recurrence).
  • Time-Dependent accelerated failure time (AFT) Models.
  • Dynamic predictions when new values are added for the longitudinal variable, using Maximum Likelihood Estimates and empirical Bayes estimates.

Also, the JM package has functions for discrimination and callibration, (of a single marker and between models): sensitivity & specificity, time-dependent ROCs and AUC.

Applications for joint modeling

Citing D. Rizopoulos:

Joint models for longitudinal and time-to-event data have become a valuable tool in the analysis of follow-up data. These models are applicable mainly in two settings: First, when the focus is on the survival outcome and we wish to account for the effect of an endogenous time-dependent covariate measured with error, and second, when the focus is on the longitudinal outcome and we wish to correct for nonrandom dropout.


When we need joint models for longitudinal and survival outcomes?

  • To handle endogenous time-varying covariates in a survival analysis context
  • To account for nonrandom dropout in a longitudinal data analysis context

How joint models work?

  • A mixed model for the longitudinal outcome
  • A relative risk model for the event process
  • Explain interrelationships with shared random effects

Last but not least… a dynamic predicion GIF!

# Animation example 
# Mixed-effects model fit
lmeFit.p1 <- lme(log(pro) ~ time + time:treat, data = prothro,
    random = ~ time | id)  

# Cox survival model fit
survFit.p1 <- coxph(Surv(Time, death) ~ treat, data = prothros, x = TRUE)  

# Joint model
jointFit.p1 <- jointModel(lmeFit.p1, survFit.p1, timeVar = "time",
    method = "piecewise-PH-aGH")

# We are interested in producing predictions of survival probabilities for Patient 155
dataP155 <- prothro[prothro$id == 155, ]
len_id <- nrow(dataP155)

# We can plot the data
sfit3 <- survfitJM(jointFit.p1, newdata = dataP155[1:3, ]) 
sfit4 <- survfitJM(jointFit.p1, newdata = dataP155[1:4, ]) 

plotfit3 <- plot(sfit3, estimator="mean", include.y = TRUE,, fill.area=TRUE, col.area="lightblue", main="Patient 155")
plotfit4 <- plot(sfit4, estimator="mean", include.y = TRUE,, fill.area=TRUE, col.area="lightblue", main="Patient 155")

  for(i in c(1:len_id)){
      sfit <- survfitJM(jointFit.p1, newdata = dataP155[1:i, ]) 
      plot(sfit, estimator="mean", include.y = TRUE,, fill.area=TRUE, col.area="lightblue", main="Patient 1")
},ani.width = 400, ani.height=400)

A great crowd over there!



SPSS, SAS or R…which one to choose?

A very interesting post!


Performing statistical analyses by hand in the era of new technologies would seem crazy.  Nowadays, there are three main statistical programs for doing statistics: IBM SPSS, SAS  and R, as it can be read in a more extensive post of this site. Sometimes, biostatisticians need to use more than one package to carry out their analyses.  This means that users of these programs have to move from one to another environment, from front-end to back-end, using different wizard and graphical interfaces, wasting in most occasions an important amount of time. Because of that, in this post I would like to address the differences between the aforementioned programs, pointing out the advantages and the most suitable usage of each software from my personal point of view.

For a good choice of a statistical program, one should take into account several issues such as the following:

1. Habit: It refers to how…

Ver la entrada original 576 palabras más

The complex structure of the longitudinal models


Two weeks ago, we started to talk in this blog about longitudinal data with the post by Urko Agirre. This type of data involves complex structure models called longitudinal models.

Longitudinal studies have two important characteristics:

  1. They are multivariate because for each studied individual many temporal measurements from the response variable (and covariates) are collected.

  2. They are multilevel as the variables measured are nested within the subjects under study, therefore resulting in layers.

These characteristics allow us to make inference about the general trend of the population as well as about the specific differences between subjects that can evolve in another way regarding the overall average behavior.

At the beginning of the 20th century this type of data started to be modelled. Different proposals appeared such as ANOVA models (Fisher, 1918), MANOVA models (generalised from ANOVA models to multivariate) or growth curves (Grizzle and Allen, 1969

Ver la entrada original 286 palabras más

Do we need Spatial Statistics? When?


Spatial Statistics, What is it? and Why use it?

The approach taken in this post is to offer an introduction of basic and main concepts of spatial data analysis as well as the importance of its utilization in some areas like epidemiology or ecology among others. But, before to introduce a definition of spatial statistics and some concepts about this field, I consider relevant to mention some situations in which our data need to be seen as ‘spatial data’.

It is possible that people associate spatial statistics with analysis that contain numerous maps. However, it goes beyond creating these, in fact spatial data analysis is subject to internal structure of the observed data. We therefore have to be careful with the questions not directly answered by looking at the data.

We could make a long list of areas where we can apply spatial statistics: epidemiology, agriculture, ecology, environmental science, geology…

Ver la entrada original 498 palabras más

Dealing with strings in R


As I mentioned in previous posts, I often have to work with Next Generation Sequencing data. This implies dealing with several variables that are text data or sequences of characters that might also contain spaces or numbers, e.g. gene names, functional categories or amino acid change annotations. This type of data is called string in programming language.

Finding matches is one of the most common tasks involving strings. In doing so, it is sometimes necessary to format or recode this kind of variables, as well as search for patterns.

Some R functions I have found quite useful when handling this data include the following ones:

  • colsplit ( ) in the reshape package. It allows to split up a column based on a regular expression
  • grepl ( ) for subsetting based on string values that match a given pattern. Here again we use regular expressions to describe the pattern

As you…

Ver la entrada original 151 palabras más