Clinical datasets in R packages

When teaching stats to non-statisticians, I am reluctant to use example data that is completely unrelated to the students’ field of interest. As my teaching is nearly always addressed to health care professionals (mainly doctors and nurses), I am always looking for clinical datasets. Unfortunately, the example datasets frequently used in R tutorials (such as iris and cars), while very handy, are of little interest to clinicians. That’s why I decided to look for clinical datasets included in R packages.

To build the collection of clinical datasets, I started from this previous collection of datasets (thanks to Vincent Arel-Bundock) and from an extremely useful script that finds the datasets available in all installed packages (thanks to Saghir Bashir). From these, I retained a dataset if the data:

  • refers to a medical area/topic (or is otherwise familiar to clinicians), and
  • is observed or measured at the patient (or person) level, and
  • is provided as a data frame
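
For readers who want to try a similar search themselves, here is a minimal sketch in base R. It is not Saghir Bashir's script: the helper names list_datasets() and is_data_frame() are made up for this illustration. It only automates the listing step and the third criterion (data frame or not); judging whether a dataset is clinical and measured at the patient level still has to be done by reading the titles and help pages.

```r
# Hypothetical helper: list the datasets shipped with the given packages.
# By default it looks at every installed package, which can take a while.
list_datasets <- function(pkgs = .packages(all.available = TRUE)) {
  # data(package = ...) returns an object whose $results matrix has the
  # columns "Package", "LibPath", "Item" and "Title".
  res <- suppressWarnings(data(package = pkgs)$results)
  as.data.frame(res[, c("Package", "Item", "Title"), drop = FALSE],
                stringsAsFactors = FALSE)
}

# Hypothetical helper: does a listed item load as a data frame?
is_data_frame <- function(item, package) {
  # Items may be listed as "object (file)", e.g. "beaver1 (beavers)".
  # The object name comes first; the name to pass to data() is in brackets.
  object <- sub(" .*$", "", item)
  file <- if (grepl("(", item, fixed = TRUE)) {
    sub("^.*\\((.*)\\)$", "\\1", item)
  } else {
    object
  }
  env <- new.env()  # load into a scratch environment, not the workspace
  tryCatch({
    suppressWarnings(data(list = file, package = package, envir = env))
    is.data.frame(get(object, envir = env))
  }, error = function(e) FALSE)  # anything that fails to load is skipped
}

# Example restricted to the built-in "datasets" package to keep it quick.
ds <- list_datasets("datasets")
ds$is_df <- mapply(is_data_frame, ds$Item, ds$Package)
head(ds[ds$is_df, c("Package", "Item", "Title")])
```

Calling list_datasets() with no argument scans all installed packages, which is closer to what is described above but noticeably slower.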

This resulted in a collection of 78 selected datasets, which are listed in the table below. Please be aware that this collection is by no means exhaustive, since I did not explore all R packages (not even all those on CRAN!). However, I thought it might be of interest to those who, like me, teach stats with R to clinicians.