The data exported from the ReSECT platform are organised on the basis of the forms that make up the registry. Each form corresponds to a spreadsheet of the exported xlsx file. The forms are related by a series of key fields that allow us to correctly unify them.
The following document aims to produce a homogeneous data structure, for the purpose of further analysis, from the data exported by each department from the ReSECT platform. It describes the steps to follow to obtain two independent databases in excel format (Personal Records and Anatomical Lung Resection Process). To carry out this process we use the programming language R and its most recognised development environment, RStudio.
Step by Step
Export ALL of your service data from the ReSECT platform, by the person in charge of your Centre. The file, with extension .xlsx, should be named
data.xlsx
and saved in the directory of your choice on your computer.Download the free R software from its official website R-CRAN and install it.
Download the free development environment RStudio and install it.
Open RStudio and create a code file: File > New File > R Script.
Save it: File > Save as with the name “unificacion_formularios”. This file will have the extension .R.
Copy the code fragment of this document by clicking on the top right corner of the grey rectangle containing it, and paste it into the file you have created.
Once you have pasted the content, you must adapt the directory paths, contained in the initial part of the pasted text, to those of your local computer. The paths should be enclosed in inverted commas, as shown in the figure in the next step.
Select the text, which will make it appear in a blue background, and run it (Button -> Run). Affirm if in the course of the execution you are asked for permission to install other dependencies.
- At the end of all the steps you should have two files
datosClinicos_regPersonal.xlsx
anddatosClinicos_rpa.xlsx
in the directories of your computer that you have chosen asruta2
andruta3
.
Code
rm(list=ls())
<- "../data/data.xlsx"
ruta1
<- "../data/datosClinicos_regPersonal.xlsx"
ruta2
<- "../data/datosClinicos_rpa.xlsx"
ruta3
<- c(
list.of.packages "tidyverse",
"readxl",
"lubridate",
"janitor",
"writexl",
"openxlsx",
"googlesheets4")
<- list.of.packages[
new.packages !(list.of.packages %in% installed.packages()[,"Package"])
]
if(length(new.packages)) install.packages(new.packages)
lapply(list.of.packages, require, character.only = TRUE)
<- read_xlsx(ruta1, sheet = "Form1")
datosClinicos <- datosClinicos %>%
datosClinicos mutate("Patient Id" = `Patient Number`) %>%
select("Patient Id", everything())
<- read_xlsx(ruta1, sheet = "Form2")
estadoVital <- estadoVital %>%
estadoVital mutate("Patient Id" = `Patient Number`) %>%
select("Patient Id", everything())
<- read_xlsx(ruta1, sheet = "Form3")
regPersonal <- regPersonal %>%
regPersonal mutate("Patient Id" = `Patient Number`) %>%
select("Patient Id", everything())
<- read_xlsx(ruta1, sheet = "Form4")
rpaPreop <- rpaPreop %>%
rpaPreop mutate("Patient Id" = `Patient Number`) %>%
select("Patient Id", everything())
<- read_xlsx(ruta1, sheet = "Form5")
rpaCirugia <- rpaCirugia %>%
rpaCirugia mutate("Patient Id" = `Patient Number`) %>%
select("Patient Id", everything())
<- read_xlsx(ruta1, sheet = "Form6")
rpaPostop <- rpaPostop %>%
rpaPostop mutate("Patient Id" = `Patient Number`) %>%
select("Patient Id", everything())
<- read_xlsx(ruta1, sheet = "Form7")
rpaDx <- rpaDx %>%
rpaDx mutate("Patient Id" = `Patient Number`) %>%
select("Patient Id", everything())
<- read_xlsx(ruta1, sheet = "Form8")
rpaFollow <- rpaFollow %>%
rpaFollow mutate("Patient Id" = `Patient Number`) %>%
select("Patient Id", everything())
# Left Join Clinical Form with Vital Status (key variable "Patient Id")
<- left_join(
datosClinicos by = "Patient Id") |>
datosClinicos, estadoVital, select(-c("Patient Number.y", "Hospital.y")) |>
rename(Hospital=Hospital.x, "Patient Number"= "Patient Number.x")
# Removal of duplicated patients due to more than one vital status.
# The registries with an earlier vital status are kept
<- datosClinicos |>
datosClinicos arrange(`Patient Id`,
desc(as.Date(`Fecha del último contacto con el paciente`))
)
<- which(duplicated(datosClinicos$`Patient Id`))
duplicados
<- datosClinicos[-duplicados,]
datosClinicos
# Right Join Clinical Data with Personal Registry (key variable "Patient Id")
<- right_join(
datosClinicos_regPersonal by = "Patient Id") |>
datosClinicos, regPersonal, select(-c("Patient Number.y", "Hospital.y")) |>
rename("Hospital"="Hospital.x", "Patient Number"= "Patient Number.x")
# Variable names to "snake" case
<- datosClinicos_regPersonal |> clean_names("snake")
datosClinicos_regPersonal
# Saving xlsx file to ruta2
<- as.data.frame(datosClinicos_regPersonal)
datosClinicos_regPersonal write_xlsx(datosClinicos_regPersonal, path = ruta2)
# Full Join of ALR forms (key variable "Patients Id" and "Identificador del
# Procedimiento Principal")
<- rpaPreop |>
rpa full_join(rpaCirugia,
by = c("Patient Id", "Identificador del Procedimiento Principal")) |>
full_join(rpaPostop,
by = c("Patient Id", "Identificador del Procedimiento Principal")) |>
full_join(rpaDx,
by = c("Patient Id", "Identificador del Procedimiento Principal")) |>
full_join(rpaFollow,
by = c("Patient Id", "Identificador del Procedimiento Principal"))
<- rpa |> filter(!is.na(`Patient Number.x`) | !is.na(`Patient Number.y`))
rpa
# Right Join Clinical Data and ALR (key variable "Patients Id")
<- right_join(datosClinicos, rpa, by = "Patient Id")
datosClinicos_rpa rm(rpa)
# Removing duplicated variables after merging
<- datosClinicos_rpa |>
datosClinicos_rpa select (-c(
"Patient Number.x", "Hospital.x", "Version Id.x",
"Patient Number.y", "Hospital.y","Version Id.y",
"Patient Number.x.x", "Hospital.x.x", "Version Id.x.x",
"Patient Number.y.y","Hospital.y.y", "Version Id.y.y",
"Patient Number.y.y.y","Hospital.y.y.y", "Version Id")) |>
rename("Patient Number" ="Patient Number.x.x.x", "Hospital" = "Hospital.x.x.x")
# Removing patients operated on after 01/01/2023 because of prospective nature
# of ALR process. (patients without date of surgery are kept)
<- datosClinicos_rpa |> filter(
datosClinicos_rpa as.Date(`Fecha de Intervención Quirúrgica`) >= "2023-01-01" |
is.na(as.Date(`Fecha de Intervención Quirúrgica`)))
# Variable names to "snake" case
<- datosClinicos_rpa |> clean_names("snake")
datosClinicos_rpa
# Saving xlsx file to ruta3
<- as.data.frame(datosClinicos_rpa)
datosClinicos_rpa write_xlsx(datosClinicos_rpa,path = ruta3)