Joining Forms

Operative Log & Anatomical Lung Resection Process

Published

Last Updated on 31 October 2024

The data exported from the ReSECT platform are organised on the basis of the forms that make up the registry. Each form corresponds to a spreadsheet of the exported xlsx file. The forms are related by a series of key fields that allow us to correctly unify them.

The following document aims to produce a homogeneous data structure, for the purpose of further analysis, from the data exported by each department from the ReSECT platform. It describes the steps to follow to obtain two independent databases in excel format (Personal Records and Anatomical Lung Resection Process). To carry out this process we use the programming language R and its most recognised development environment, RStudio.

Step by Step

  1. Export ALL of your service data from the ReSECT platform, by the person in charge of your Centre. The file, with extension .xlsx, should be named data.xlsx and saved in the directory of your choice on your computer.

  2. Download the free R software from its official website R-CRAN and install it.

  3. Download the free development environment RStudio and install it.

  4. Open RStudio and create a code file: File > New File > R Script.

  5. Save it: File > Save as with the name “unificacion_formularios”. This file will have the extension .R.

  6. Copy the code fragment of this document by clicking on the top right corner of the grey rectangle containing it, and paste it into the file you have created.

  7. Once you have pasted the content, you must adapt the directory paths, contained in the initial part of the pasted text, to those of your local computer. The paths should be enclosed in inverted commas, as shown in the figure in the next step.

  8. Select the text, which will make it appear in a blue background, and run it (Button -> Run). Affirm if in the course of the execution you are asked for permission to install other dependencies.

  1. At the end of all the steps you should have two files datosClinicos_regPersonal.xlsx and datosClinicos_rpa.xlsx in the directories of your computer that you have chosen as ruta2 and ruta3.

Code

rm(list=ls())

ruta1 <- "../data/data.xlsx"                                                             

ruta2 <- "../data/datosClinicos_regPersonal.xlsx"                   
 
ruta3 <- "../data/datosClinicos_rpa.xlsx"                        

list.of.packages <- c(                                              
  "tidyverse", 
  "readxl", 
  "lubridate", 
  "janitor", 
  "writexl", 
  "openxlsx", 
  "googlesheets4")

new.packages <- list.of.packages[                                    
  !(list.of.packages %in% installed.packages()[,"Package"])
  ]

if(length(new.packages)) install.packages(new.packages)
lapply(list.of.packages, require, character.only = TRUE)
datosClinicos <- read_xlsx(ruta1, sheet = "Form1")    
datosClinicos <- datosClinicos %>% 
  mutate("Patient Id" = `Patient Number`) %>% 
  select("Patient Id", everything())

estadoVital <- read_xlsx(ruta1, sheet = "Form2") 
estadoVital <- estadoVital %>% 
  mutate("Patient Id" = `Patient Number`) %>% 
  select("Patient Id", everything())

regPersonal <- read_xlsx(ruta1, sheet = "Form3") 
regPersonal <- regPersonal %>% 
  mutate("Patient Id" = `Patient Number`) %>% 
  select("Patient Id", everything())

rpaPreop <- read_xlsx(ruta1, sheet = "Form4") 
rpaPreop <- rpaPreop %>% 
  mutate("Patient Id" = `Patient Number`) %>% 
  select("Patient Id", everything())

rpaCirugia <- read_xlsx(ruta1, sheet = "Form5") 
rpaCirugia <- rpaCirugia %>% 
  mutate("Patient Id" = `Patient Number`) %>% 
  select("Patient Id", everything())

rpaPostop <- read_xlsx(ruta1, sheet = "Form6") 
rpaPostop <- rpaPostop %>% 
  mutate("Patient Id" = `Patient Number`) %>% 
  select("Patient Id", everything())

rpaDx <- read_xlsx(ruta1, sheet = "Form7") 
rpaDx <- rpaDx %>% 
  mutate("Patient Id" = `Patient Number`) %>% 
  select("Patient Id", everything())

rpaFollow <- read_xlsx(ruta1, sheet = "Form8") 
rpaFollow <- rpaFollow %>% 
  mutate("Patient Id" =  `Patient Number`) %>% 
  select("Patient Id", everything())
# Left Join Clinical Form with Vital Status (key variable "Patient Id") 

datosClinicos <- left_join(
  datosClinicos, estadoVital, by = "Patient Id") |>
  select(-c("Patient Number.y", "Hospital.y")) |>
  rename(Hospital=Hospital.x, "Patient Number"= "Patient Number.x")

# Removal of duplicated patients due to more than one vital status. 
# The registries with an earlier vital status are kept

datosClinicos <- datosClinicos |>
  arrange(`Patient Id`, 
          desc(as.Date(`Fecha del último contacto con el paciente`))
          )


duplicados <- which(duplicated(datosClinicos$`Patient Id`))

datosClinicos <- datosClinicos[-duplicados,]

# Right Join Clinical Data with Personal Registry (key variable "Patient Id")

datosClinicos_regPersonal <- right_join(
  datosClinicos, regPersonal, by = "Patient Id") |>
  select(-c("Patient Number.y", "Hospital.y")) |>
  rename("Hospital"="Hospital.x", "Patient Number"= "Patient Number.x")


# Variable names to "snake" case
datosClinicos_regPersonal <- datosClinicos_regPersonal |> clean_names("snake")

# Saving xlsx file to ruta2 
datosClinicos_regPersonal <- as.data.frame(datosClinicos_regPersonal)
write_xlsx(datosClinicos_regPersonal, path = ruta2)
# Full Join of ALR forms (key variable "Patients Id" and "Identificador del
# Procedimiento Principal")

rpa <- rpaPreop |>
  full_join(rpaCirugia,
            by = c("Patient Id", "Identificador del Procedimiento Principal")) |>
  full_join(rpaPostop,
            by = c("Patient Id", "Identificador del Procedimiento Principal")) |>
  full_join(rpaDx,
            by = c("Patient Id", "Identificador del Procedimiento Principal")) |>
  full_join(rpaFollow,
            by = c("Patient Id", "Identificador del Procedimiento Principal"))

rpa <- rpa |> filter(!is.na(`Patient Number.x`) | !is.na(`Patient Number.y`))

# Right Join Clinical Data and ALR (key variable "Patients Id")
datosClinicos_rpa <- right_join(datosClinicos, rpa, by = "Patient Id")
rm(rpa)

# Removing duplicated variables after merging
datosClinicos_rpa <- datosClinicos_rpa |>
  select (-c(
    "Patient Number.x", "Hospital.x", "Version Id.x",
    "Patient Number.y", "Hospital.y","Version Id.y", 
    "Patient Number.x.x", "Hospital.x.x", "Version Id.x.x", 
    "Patient Number.y.y","Hospital.y.y", "Version Id.y.y",
    "Patient Number.y.y.y","Hospital.y.y.y", "Version Id")) |>
  rename("Patient Number" ="Patient Number.x.x.x", "Hospital" = "Hospital.x.x.x")

# Removing patients operated on after 01/01/2023 because of prospective nature
# of ALR process. (patients without date of surgery are kept)

datosClinicos_rpa <- datosClinicos_rpa |> filter(
  as.Date(`Fecha de Intervención Quirúrgica`) >= "2023-01-01" | 
    is.na(as.Date(`Fecha de Intervención Quirúrgica`)))


# Variable names to "snake" case
datosClinicos_rpa <- datosClinicos_rpa |> clean_names("snake")


# Saving xlsx file to ruta3
datosClinicos_rpa <- as.data.frame(datosClinicos_rpa)
write_xlsx(datosClinicos_rpa,path = ruta3)
Back to top