EDA

Exploratory Data Analysis

Last Updated on 16 January 2024

This document enables ReSECT users to carry out, on their own, an automated exploratory analysis of all their registered patients. Before following the steps below, each user must have completed the unification of forms described in the corresponding tutorial.

The exploratory analysis is carried out with the R software and its integrated development environment, RStudio. Both must be downloaded and installed beforehand.

Steps

  1. Open RStudio and go to File > New File > R Markdown. This will generate a file in the top left panel, called “Untitled”.

  2. Save this file: File > Save As.

  3. Delete the automatically generated text except for the header, which is delimited by a line of three hyphens above and below it. These delimiter lines must also be kept.

  4. Create three chunks of code: Code > Insert Chunk. By default, the chunks created will be interpreted by the R language.

  5. Copy the following code snippets and paste them into the body of the chunks, one snippet per chunk (keep the three backticks at the beginning and at the end of each chunk).

  6. Replace, inside the code of the second pasted chunk, the path “../data/datosClinicos_regPersonal.xlsx” with the path on your computer to the .xlsx file containing your personal records.

  7. Execute all the code: Code > Run Region > Run All.

  8. Click the Knit button in the toolbar and choose the desired output format. This generates a file containing your full exploratory analysis.
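
Step 8 can also be run from the R console instead of the Knit button. The sketch below assumes that the rmarkdown package is installed and that the file from step 2 was saved as "eda.Rmd", a hypothetical name used only for illustration. It renders the file to HTML when the file is found and prints a message otherwise.

```r
# Hypothetical file name: replace it with the name chosen in step 2.
rmd_file <- "eda.Rmd"

if (file.exists(rmd_file)) {
  # output_format = "html_document" corresponds to choosing "Knit to HTML".
  rmarkdown::render(rmd_file, output_format = "html_document")
} else {
  message("File not found: ", rmd_file)
}
```

Other output formats, such as "pdf_document" or "word_document", can be passed in the same way. PDF output additionally requires a LaTeX installation.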

Code

# Installation of packages

list.of.packages <- c("tidyverse", "readxl", "lubridate", "stringr", "DT", "DataExplorer")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)
lapply(list.of.packages, require, character.only = TRUE)

# Import the files with the personal registries

datosClinicos_regPersonal <- readRDS("../data/datosClinicos_regPersonal.RDS") 
datosClinicos_rpa <- readRDS("../data/datosClinicos_rpa.RDS") 

# Variable mutation

vars1 <- c("edad_del_paciente_a_la_fecha_de_intervencion", "estancia_postoperatoria_dias", "numero_de_puertos")

datosClinicos_regPersonal <- datosClinicos_regPersonal %>% mutate(across(all_of(vars1),as.numeric))

vars2 <- c("altura_m" , "peso_kg", "indice_de_masa_corporal", "fev1_ml", "fev1_percent", "cvf_ml", "cvf_percent", 
           "dlco_percent", "dlco_va_percent", "edad_del_paciente_a_la_fecha_de_intervencion", "estancia_postoperatoria")

datosClinicos_rpa <- datosClinicos_rpa %>% mutate(across(all_of(vars2),as.numeric))
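
To see what the conversion in the second chunk does, the following sketch applies the same `mutate(across(all_of(...), as.numeric))` pattern to a small invented data frame. The column names and values are made up for illustration and are not ReSECT fields. It assumes that dplyr (part of tidyverse) is installed.

```r
library(dplyr)

# Toy data with invented column names, for illustration only.
toy <- data.frame(
  id   = c("a", "b", "c"),
  edad = c("64", "71", "unknown"),
  stringsAsFactors = FALSE
)

vars <- c("edad")

# Same pattern as in the second chunk. suppressWarnings() hides the
# "NAs introduced by coercion" warning raised for the entry "unknown".
toy_num <- suppressWarnings(
  toy %>% mutate(across(all_of(vars), as.numeric))
)

# Number of entries that could not be converted and became NA.
sum(is.na(toy_num$edad))
```

Entries that are not valid numbers become NA, so they show up later as missing values in the output of `plot_missing`.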

# EDA of personal registries

introduce(datosClinicos_regPersonal)
plot_intro(datosClinicos_regPersonal)
plot_missing(datosClinicos_regPersonal)
plot_bar(datosClinicos_regPersonal, ncol = 1)
plot_histogram(datosClinicos_regPersonal[,vars1], ncol = 1)

# EDA of anatomical lung resections

introduce(datosClinicos_rpa)
plot_intro(datosClinicos_rpa)
plot_missing(datosClinicos_rpa)
plot_bar(datosClinicos_rpa, ncol = 1)
plot_histogram(datosClinicos_rpa[,vars2], ncol = 1)
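
If your personal records are stored in an .xlsx file, as step 6 describes, the import could be sketched with `read_excel` from the readxl package, which is already in the package list. The path is the placeholder from step 6, and reading the first sheet is an assumption about how the unified file is laid out.

```r
# Placeholder path from step 6: replace it with the location of your own file.
xlsx_path <- "../data/datosClinicos_regPersonal.xlsx"

if (file.exists(xlsx_path)) {
  # Assumption: the records are on the first sheet, which read_excel() reads by default.
  datosClinicos_regPersonal <- readxl::read_excel(xlsx_path)
  # Quick check of what was imported: number of rows and columns.
  print(dim(datosClinicos_regPersonal))
} else {
  message("File not found: ", xlsx_path)
}
```

If the data sit on another sheet, the `sheet` argument of `read_excel` accepts a sheet name or position.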

For more information about the available options, see the documentation of DataExplorer, the automated exploratory analysis library for R.