A. Górska bioinformatics

Introduction

I've carried out several projects analysing large and complex clinical datasets. Such datasets are routinely evaluated first with univariate and then with multivariate logistic regression. However, I'm interested in applying data-mining tools to understand these datasets, including visualisation and machine learning models such as Support Vector Machines, Random Forests and Decision Trees.

These methods often require more data, but their underlying algorithms offer an alternative view of a dataset's inner structure and, in turn, alternative clinical algorithms. Such work always requires deep domain knowledge and therefore a tight communication loop with the clinical collaborators.
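The routine first step mentioned above, univariate screening before a multivariate logistic regression, can be sketched as follows. This is a minimal illustration under stated assumptions: binary exposure variables summarised as 2×2 tables, and a hypothetical selection rule that keeps a variable when the 95% Woolf confidence interval of its odds ratio excludes 1. The rule actually used in these projects is not stated here, and the multivariate model itself is not shown.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-scale Wald) confidence interval.

    2x2 table: a = exposed with outcome, b = exposed without outcome,
               c = unexposed with outcome, d = unexposed without outcome.
    """
    if 0 in (a, b, c, d):
        # Haldane-Anscombe correction: add 0.5 to every cell to avoid log(0)
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


def univariate_screen(tables):
    """Return the variables whose confidence interval excludes OR = 1.

    tables: dict mapping a (hypothetical) variable name to its (a, b, c, d) counts.
    The selected variables would then enter the multivariate model.
    """
    selected = []
    for name, cells in tables.items():
        _, low, high = odds_ratio_ci(*cells)
        if low > 1 or high < 1:
            selected.append(name)
    return selected


# Hypothetical counts, for illustration only
print(univariate_screen({"var_a": (20, 10, 10, 20), "var_b": (10, 10, 10, 10)}))
# prints ['var_a']
```

Screening on confidence intervals is only one convention; p-value thresholds (for example p < 0.2) are also common for pre-selecting variables.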

Current work

MISTRA eCRF standardization

I am currently leading efforts to standardize the existing electronic case report form (eCRF) for collecting data on patients with sexually transmitted infections (STIs), with a particular focus on Mpox. At present, there is no standardized eCRF specifically designed for this field.

Clinical algorithms

Across these projects, I've developed a toolkit that rapidly runs the ML pipeline on any new dataset. I plan to extend it, first to compare the performance of the classifiers with multivariate classification and logistic regression methods, and then to alternative tree-based clinical algorithms.
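The toolkit itself is not reproduced here. The sketch below only illustrates the core idea of such a comparison: every model is trained and scored on identical stratified cross-validation folds through a shared `fit`/`predict` interface, the same convention scikit-learn estimators follow. The two toy classes are hypothetical, standard-library stand-ins for the Random Forests, SVMs, Decision Trees and logistic regression named above.

```python
import random
from statistics import mean


class MajorityClass:
    """Baseline that always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.label for _ in X]


class NearestCentroid:
    """Toy classifier: assigns each sample to the class with the closest mean vector."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [mean(col) for col in zip(*rows)]
        return self

    def predict(self, X):
        def sq_dist(x, c):
            return sum((xi - ci) ** 2 for xi, ci in zip(x, c))

        return [min(self.centroids, key=lambda k: sq_dist(x, self.centroids[k])) for x in X]


def stratified_folds(y, k, seed=0):
    """Split sample indices into k folds, dealing each class out round-robin."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def compare_models(factories, X, y, k=5, seed=0):
    """Mean k-fold accuracy per model; every model is scored on the same folds."""
    scores = {name: [] for name in factories}
    for test_idx in stratified_folds(y, k, seed):
        if not test_idx:
            continue  # empty fold: k exceeds the size of the largest class
        test = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in test]
        X_tr, y_tr = [X[i] for i in train_idx], [y[i] for i in train_idx]
        X_te, y_te = [X[i] for i in test_idx], [y[i] for i in test_idx]
        for name, make in factories.items():
            pred = make().fit(X_tr, y_tr).predict(X_te)
            scores[name].append(sum(p == t for p, t in zip(pred, y_te)) / len(y_te))
    return {name: mean(s) for name, s in scores.items()}
```

Scoring all models on the same folds keeps the comparison paired, so differences reflect the models rather than different random splits.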

Clinical data standardization and harmonization

All clinical studies rely on a data collection tool, which can range from a simple Excel database to more sophisticated platforms such as REDCap. The content of the data capture tool, such as the variables and their definitions, establishes the framework for downstream data analysis. The encoding, including the ontology used, plays a crucial role in ensuring the FAIRness (Findability, Accessibility, Interoperability and Reusability) of the resulting dataset. Across various projects, I have been actively engaged in structuring, harmonizing and standardizing electronic case report forms (eCRFs) to promote consistency and data quality.
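To make the role of variable definitions concrete, here is a minimal sketch of a variable catalogue (data dictionary) and a check of one record against it. The field names, the example variables and the `HYPOTHETICAL:0001` code are placeholders, not taken from any of the eCRFs or ontologies mentioned here; a real catalogue would reference codes from a controlled vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    name: str                # machine-readable identifier used in the data export
    label: str               # human-readable definition shown to data collectors
    dtype: type              # expected Python type of the value
    allowed: tuple = ()      # permitted values for categorical variables (empty = any)
    ontology_code: str = ""  # placeholder for a controlled-vocabulary code


def validate(record, catalogue):
    """Return a list of problems found when checking one record (a dict) against the catalogue."""
    problems = []
    for var in catalogue:
        value = record.get(var.name)
        if value is None:
            problems.append(f"{var.name}: missing")
        elif not isinstance(value, var.dtype):
            problems.append(f"{var.name}: expected {var.dtype.__name__}")
        elif var.allowed and value not in var.allowed:
            problems.append(f"{var.name}: {value!r} not in {var.allowed}")
    known = {var.name for var in catalogue}
    problems += [f"{name}: not in catalogue" for name in sorted(record) if name not in known]
    return problems


catalogue = [
    Variable("age_years", "Age at enrolment, in years", int),
    Variable("sex", "Sex at birth", str, ("female", "male"), "HYPOTHETICAL:0001"),
]
print(validate({"age_years": "34", "sex": "f", "ward": 3}, catalogue))
```

Keeping the definition, type, permitted values and ontology code in one place means the same catalogue can drive both the data capture form and the downstream checks.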

2024, Clinical Microbiology and Infection

Górska A., Canziani L.M., Rinaldi E., Pana Z.D., Beale S., Bai F.M., Boxma-de Klerk B.M., de Bruijn S., Donà D., Ekkelenkamp M.B., Incardona D., Mallon P., Marchetti G.C., Puhan M., Riva A., Simensen V.C., Vaillant M., van der Zalm M.M., van Kuijk S.M.J., van Wingerden S., Judd A., Tacconelli E., Peñalvo J.L.
Learning from Post COVID-19 condition for epidemic preparedness: a variable catalogue for future Post-Acute Infection Syndromes

2022, The Lancet Regional Health - Europe

Tacconelli E., Górska A., Carrara E., Davis R. J., Bonten M., Sartor A., Friedrich A. W., Glasner C., Goossens H., Hasenauer J., Haro J. M., Peñalvo J. L., Sanchez-Niubo A., Sialm A., Scipione G., Soriano G., Yazdanpanah Y., Vorstenbosch E., Jaenisch T.
Challenges of data sharing in European Covid-19 projects: A learning opportunity for advancing pandemic preparedness and response

2024, Digital Health and Informatics Innovations for Sustainable Health Care Systems, MIE2024 conference

Puskaric M., Chandramouli B., Osmo T., Gusinow R., Dellacasa C., Rossi E., Cataudella S., Górska A., Rinaldi E.
Privacy-Preserving Workflow for the Cross-Border Federated Analysis of Clinical Data

ORCHESTRA

ORCHESTRA is a large, international, interdisciplinary project funded by the EU with almost 30 million euros, in which our group forms a coordination work package. The call was opened as an emergency measure in March 2020, as part of the EU response to the pandemic. I was involved from the proposal-writing stage, creating visualisations and reviewing the sections related to data management.

Since the project started in October 2020, I've devoted a considerable amount of my time to ORCHESTRA. I represent WP1 in the WP7 data-management package (Figure), helping with data flow and communication across the project.

Figure: ORCHESTRA work-package structure. The arrows represent communication and data transfer. The project includes four work packages that bring together participant cohorts, WP6 (molecular analysis), WP7 (maintaining, protecting and transferring data to WP8), WP8 (statistical analysis) and WP10 (visualisation and communication).

Within the ORCHESTRA project, I managed the Post-COVID-19 working group, contributing to data management, quality control and visualisation. I also supported the work of WP6. Our publication in *Eurosurveillance* presents an in-depth bioinformatics analysis of SARS-CoV-2 sequences from infected patients, examining the within-patient evolution of viral strains. We demonstrated that an infection is not limited to a single infecting strain but also gives rise to viral subpopulations (quasi-species).

Publications:

2025, Eurosurveillance

Berkell M., Górska A., Smet M., Bachelet D., Gentilotti E., Guedes M., Franco-Yusti M. A., Mazzaferri F., Lizarazo Forero E., Matheeussen V., Visseaux B., Palacios-Baena Z. R., Caroccia N., Florence A.M., Charpentier C., van Leer C., Giannella M., Friedrich A. W., Rodríguez-Baño J., Ghosn J., ORCHESTRA working group, Kumar-Singh S., Laouénan C., Tacconelli E., Malhotra-Kumar S.
Quasi-species prevalence and clinical impact of evolving SARS-CoV-2 lineages in European COVID-19 cohorts, January 2020 to February 2022

2024, Scientific Reports

Gentilotti E., Górska A., Cecchini M. P., Mirandola M., Meroi M., De Nardo P., Sartori A., Konishi De Toffoli C., Kumar-Singh S., Zanusso G., Monaco S., Tacconelli E. & the ORCHESTRA-UNIVR Study Group
Chemosensory assessment and impact on quality of life in neurosensorial cluster of the post COVID 19 syndrome

2023, eClinicalMedicine

Gentilotti E., Górska A., Tami A., Gusinow R., Mirandola M., Baño R., Baena R., ..., Tacconelli E.
Clinical phenotypes and quality of life to define post-COVID-19 syndrome: a cluster analysis of the multinational, prospective ORCHESTRA cohort

COVID-19

The COVID-19 pandemic added a great deal of work to the collective plate of the Infectious Diseases department in Verona, as Northern Italy was one of the most affected regions in the first months of 2020. Drawing on the department's experience, we quickly updated the data-collection tools to study the new virus. The resulting dataset was too small for ML-based prediction; nevertheless, we tracked how clinical parameters developed in patients with positive and negative outcomes (Figure).

Publication:

2021, BMC Infectious Diseases

Gentilotti E., Savoldi A., Compri M., Górska A., De Nardo P., Visentin A., Be G., Razzaboni E., Soriolo N., Meneghin D., Girelli D., Micheletto C., Mehrabi S., Righi E., Tacconelli E.
Assessment of COVID-19 progression on day 5 from symptoms onset

BLOOMY

The BLOOMY project was a multicentre collaboration of German hospitals aiming to develop a scoring system that predicts the progression of sepsis as early as possible. The dataset included ~2,500 patients with sepsis, followed longitudinally, including a post-hospitalisation follow-up with quality-of-life scores. It had thousands of features describing patients' demographics and comorbidities, hospital stay, treatment, antibiotics and microbiological information such as the infecting bacteria. The dataset was complex and riddled with missing values, which had to be imputed.
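The imputation method used for BLOOMY is not stated here. As one simple illustration, the sketch below fills missing numeric values with the column median and adds a 0/1 missingness flag per column, so that the models can still use the fact that a value was absent. Medians are fitted on the training data and then applied to any split, which avoids leaking test information; the column names and values are hypothetical.

```python
from statistics import median


def fit_medians(rows, columns):
    """Column medians over observed (non-None) values; fit on the training split only."""
    medians = {}
    for col in columns:
        observed = [r[col] for r in rows if r.get(col) is not None]
        medians[col] = median(observed) if observed else None
    return medians


def impute(rows, medians):
    """Fill missing values with the fitted medians and add a 0/1 '<col>_missing' flag."""
    imputed = []
    for r in rows:
        new = dict(r)
        for col, med in medians.items():
            missing = r.get(col) is None
            new[col + "_missing"] = int(missing)
            if missing:
                new[col] = med
        imputed.append(new)
    return imputed


# Hypothetical columns and values, for illustration only
train = [{"age": 60, "crp": None}, {"age": None, "crp": 10}, {"age": 70, "crp": 30}]
medians = fit_medians(train, ["age", "crp"])
print(impute([{"age": None, "crp": 12}], medians))
# prints [{'age': 65.0, 'crp': 12, 'age_missing': 1, 'crp_missing': 0}]
```

With thousands of partly missing features, multiple imputation or model-based methods are common alternatives to this single-value approach.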

We approached the dataset in two ways: by analysing regression coefficients and by applying machine learning methods. I built sequential models at different time points of the treatment (day 3, day 7 and end of therapy), mainly using Random Forests (RF), SVMs and AdaBoost (Figure). These models were subsequently used to select variables for Cox proportional hazards, logistic regression and parametric survival regression models.

Figure: Performance of various models at the end of therapy for the mortality outcome. Each model was trained 100 times on random training/testing splits of the dataset, undersampled so that both classes were equally represented for binary classification.
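The repeated, class-balanced training described in the caption above can be sketched as follows. This is a minimal illustration, not the original pipeline: undersampling the whole dataset before the split, the 30% test fraction and the caller-supplied `train_and_score` function are all assumptions.

```python
import random


def undersample(X, y, rng):
    """Randomly drop samples so that every class is as large as the smallest one."""
    by_class = {}
    for x, label in zip(X, y):
        by_class.setdefault(label, []).append(x)
    n = min(len(rows) for rows in by_class.values())
    X_bal, y_bal = [], []
    for label, rows in by_class.items():
        for x in rng.sample(rows, n):
            X_bal.append(x)
            y_bal.append(label)
    return X_bal, y_bal


def repeated_balanced_runs(X, y, train_and_score, n_runs=100, test_frac=0.3, seed=0):
    """Undersample, split at random into training/testing, train and score; repeat n_runs times.

    train_and_score(X_train, y_train, X_test, y_test) -> float is supplied by the caller,
    e.g. a (hypothetical) function that fits a Random Forest and returns its test AUC.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_runs):
        X_bal, y_bal = undersample(X, y, rng)
        idx = list(range(len(y_bal)))
        rng.shuffle(idx)
        cut = int(len(idx) * (1 - test_frac))
        train, test = idx[:cut], idx[cut:]
        scores.append(train_and_score(
            [X_bal[i] for i in train], [y_bal[i] for i in train],
            [X_bal[i] for i in test], [y_bal[i] for i in test],
        ))
    return scores
```

Each run draws a different subset of the majority class, so the 100 scores describe both split-to-split and subsample-to-subsample variability, which is what a per-model performance distribution summarises.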

Publication:

2022, The Lancet Infectious Diseases

Tacconelli E., Göpel S., Gladstone B. P., Eisenbeis S., Hölzl F., Buhl M., Górska A., Cattaneo C., Mischnik A., Rieg S., Rohde A. M., Kohlmorgen B., Falgenhauer J., Trauth J., Käding N., Kramme E., Biehl L. M., Walker S. V., Peter S., Gastmeier P., Chakraborty T., Vehreschild M. J. G. T., Seifert H., Rupp J., Kern W. V.
Development and validation of BLOOMY prediction scores for 14-day and 6-month mortality in hospitalised adults with bloodstream infections: a multicentre, prospective, cohort study

SATURN

The SATURN project investigated the relationship between antibiotic prescribing and colonization with ESBL-producing bacteria and MRSA, two groups of multidrug-resistant bacteria common in nosocomial settings. The dataset included ~10,000 patients whose data were collected in three countries. Participants were screened at admission to ensure that any colonization detected had been acquired during the study.

The Figure shows the basic results. As expected, many more patients became colonized with ESBL-producing bacteria than with MRSA. Among patients who received antibiotics, colonization with both ESBL producers and MRSA was about three times higher than among patients who were not treated with antibiotics.
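Reading "about three times higher" as a risk ratio (an interpretation, not a method stated above), the comparison can be written with a 2×2 table in which a and b are the colonized and non-colonized patients among those who received antibiotics, and c and d are the same counts among those who did not:

```latex
\mathrm{RR} = \frac{a / (a + b)}{c / (c + d)} \approx 3
```

Under this reading, the ratio is computed separately for ESBL producers and for MRSA.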

To analyse this dataset, I deployed a MongoDB instance to store the patient records and access them quickly. Next, I implemented an encoder that produces a numerical vector representing each patient, which was then fed to Random Forest, AdaBoost, Gradient Boosting and other machine learning methods. To encode the often complex antibiotic therapy, I first included the length of treatment for each antibiotic. Second, for each antibiotic combination, I included the number of days on which the two antibiotics were taken together. Finally, I included a binary vector that, again for each combination, encoded whether one antibiotic was prescribed after the other. In this way, the patient vector contains a complete description of the antibiotic therapy. The 10,000 patients did not capture the diversity of antibiotic therapies fully or evenly; nevertheless, the machine learning methods were able to distinguish more selective from less selective antibiotic combinations.
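The three-part therapy encoding described above can be sketched as follows. This is a minimal reconstruction under stated assumptions, not the original encoder: each course is represented as a set of study days, combinations are unordered pairs taken in a fixed list order, and "prescribed after the other" is read as the two courses being sequential without overlap. The antibiotic names are a hypothetical example, and the MongoDB storage layer is not shown.

```python
from itertools import combinations


def encode_therapy(courses, antibiotics):
    """Encode one patient's antibiotic therapy as a fixed-length numeric vector.

    courses: dict mapping an antibiotic name to the set of study days it was given.
    antibiotics: ordered list of every antibiotic in the dataset, so that all
    patients share the same vector layout.
    """
    days = {ab: set(courses.get(ab, ())) for ab in antibiotics}
    pairs = list(combinations(antibiotics, 2))

    # 1) length of treatment (number of days) per antibiotic
    vector = [len(days[ab]) for ab in antibiotics]
    # 2) number of days on which both antibiotics of a pair were taken together
    vector += [len(days[a] & days[b]) for a, b in pairs]
    # 3) 1 if one antibiotic of the pair was started only after the other had stopped
    for a, b in pairs:
        da, db = days[a], days[b]
        sequential = bool(da and db) and (min(db) > max(da) or min(da) > max(db))
        vector.append(int(sequential))
    return vector


antibiotics = ["amoxicillin", "ciprofloxacin", "vancomycin"]  # hypothetical subset
patient = {"amoxicillin": {1, 2, 3}, "ciprofloxacin": {3, 4}, "vancomycin": {6, 7}}
print(encode_therapy(patient, antibiotics))
# prints [3, 2, 2, 1, 0, 0, 0, 1, 1]
```

For n antibiotics the vector has n + 2·n(n−1)/2 = n² entries, so it grows quadratically. This helps explain why even ~10,000 patients cover many combinations only sparsely.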

Publication:

2019, Journal of Antimicrobial Chemotherapy

Tacconelli E., Górska A., Angelis G., Lammens C., Restuccia G., Huson D. H., Carević B., Preoţescu L., Carmeli Y., Kazma M., Spanu T., Carrara E., Malhotra-Kumar S., Gladstone B. P.
Estimating the Association between Antibiotic Exposure and Colonisation with Antibiotic-resistant Bacteria using Machine-learning Methods