Applicants & Short Reports
Mr Oliver Aasmets
Mr Paul-Stefan Popescu
Dr Mikhail Kolev
Prof. Malik Yousef
Dates: 12/07/2021 to 25/07/2021
To group the microbiome species, we considered three taxonomic levels: 1) Order; 2) Family; and 3) Genus.
Based on this grouping, we identified 84 groups for Order, 177 for Family and 447 for Genus.
Input: D, a two-class microbiome dataset
Create groups (D)
G = groups created from the taxonomy (Family, Genus or Order)
[G = {group_i = [f_i1, f_i2, ..., f_ik]}, i = 1, …, nt]
return G
Rank groups (G)
For each group t in G
R = {rank(t)}
[R is the collection of the groups and their ranks]
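The two steps above can be sketched in Python as follows. This is only an illustrative sketch: the report does not say how rank(t) is computed, so the score used here (absolute difference between the two classes in mean summed group abundance) is an assumption, as are the toy taxonomy, the sample counts and all function names.

```python
from collections import defaultdict
from statistics import mean


def create_groups(taxonomy, level):
    """Group feature names by their taxon at the chosen level."""
    groups = defaultdict(list)
    for feature, lineage in taxonomy.items():
        groups[lineage[level]].append(feature)
    return dict(groups)


def score_group(features, samples, labels):
    """ASSUMED score (not from the report): absolute difference between
    the two classes in the mean summed abundance of the group."""
    totals = {0: [], 1: []}
    for sample, label in zip(samples, labels):
        totals[label].append(sum(sample[f] for f in features))
    return abs(mean(totals[1]) - mean(totals[0]))


def rank_groups(groups, samples, labels):
    """Return (group name, score) pairs, best-scoring group first."""
    scored = [(name, score_group(feats, samples, labels))
              for name, feats in groups.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy data: four species, three families, two classes.
taxonomy = {
    "sp_a": {"order": "O1", "family": "F1", "genus": "G1"},
    "sp_b": {"order": "O1", "family": "F1", "genus": "G2"},
    "sp_c": {"order": "O2", "family": "F2", "genus": "G3"},
    "sp_d": {"order": "O2", "family": "F3", "genus": "G4"},
}
samples = [
    {"sp_a": 50, "sp_b": 30, "sp_c": 10, "sp_d": 10},
    {"sp_a": 40, "sp_b": 40, "sp_c": 10, "sp_d": 10},
    {"sp_a": 10, "sp_b": 10, "sp_c": 60, "sp_d": 20},
    {"sp_a": 20, "sp_b": 10, "sp_c": 50, "sp_d": 20},
]
labels = [0, 0, 1, 1]

groups = create_groups(taxonomy, "family")
ranking = rank_groups(groups, samples, labels)
print(ranking)
```

Changing the level argument to "order" or "genus" regroups the same features at a different taxonomic level; any other scoring function can be substituted for score_group without touching the rest.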
Dr Gianvito Pio
Gianvito Pio, PhD, Early Career Investigator (ECI), is an Assistant Professor at the Department of Computer Science, University of Bari, Italy. He has published 36 papers, including 18 in journals with high impact factors, such as Machine Learning, Data Mining and Knowledge Discovery, Bioinformatics, BMC Bioinformatics, Information Sciences, and IEEE Transactions on Knowledge and Data Engineering. He is on the editorial board of Springer Medical & Biological Engineering & Computing, has served as a reviewer for several international journals and has been a member of the scientific committee of several international conferences. His research interests include Machine Learning, Big Data Analytics, Bioinformatics and Blockchain.
STSM Title:
Standardizing the pipeline for the analysis of high dimensional noisy Microbiome data
Host: Prof. Sašo Džeroski, Jozef Stefan Institute, Ljubljana (Slovenia)
Dates: 01/03/2020 to 31/03/2020
Microbiome studies make use of different types of data with different characteristics. However, these data usually share one characteristic: they are high-dimensional and noisy. Identifying a standard process for analysing this kind of data is fundamental, since it would enable the design of tools for performing clinical analyses without requiring specific machine learning expertise. In this context, the purpose of this STSM was to study existing works on the analysis of microbiome data in order to identify commonalities in the workflows they follow. An ontology tailored for defining and reasoning on data mining tasks (OntoDM) was considered during this analysis, with the aim of defining a set of guidelines and, possibly, a limited set of pipelines according to the classification provided by the ontology. Particular attention was paid to strategies for handling the high dimensionality of the data and the presence of noise. These included introducing specific pre-processing approaches for feature reduction/extraction into the pipelines, and adopting learning methods that are inherently able to work with high-dimensional noisy data.
The results confirmed that microbiome data can be handled in the same way as high-dimensional noisy data from other application domains, with no domain-specific steps other than centred log-ratio (CLR) normalization. However, it should be noted that these conclusions apply to the specific task considered during the pilot study, namely multi-target regression for the prediction of Type II diabetes.
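For readers unfamiliar with the CLR step, a minimal sketch is given below. The CLR transform of a composition x with D parts is clr(x)_i = ln(x_i) - (1/D) * sum_j ln(x_j). The pseudocount added to handle zero counts, its default value of 0.5 and the function name are assumptions made for illustration; the report does not state how zeros were treated in the pilot study.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centred log-ratio transform of one sample.

    A pseudocount (ASSUMED here, default 0.5) is added to every count
    so that zero counts do not produce log(0).
    """
    shifted = [c + pseudocount for c in counts]
    logs = [math.log(v) for v in shifted]
    mean_log = sum(logs) / len(logs)
    # Subtracting the mean log equals dividing by the geometric mean.
    return [v - mean_log for v in logs]


# Hypothetical sample with four taxa, one of them unobserved.
sample = [10, 0, 5, 85]
transformed = clr(sample)
print(transformed)
```

By construction the transformed values of a sample sum to zero, and without a pseudocount the result does not depend on the sequencing depth of the sample.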
Prof Vladimir Trajkovik
Vladimir Trajkovik was born in Skopje, R.N. Macedonia in 1971. He received a PhD degree in 2003 from Ss. Cyril and Methodius University in Skopje. He is currently a professor at the Faculty of Computer Science and Engineering. He has published 4 books as author or editor, more than 60 journal papers, and more than 160 conference papers. He has more than 1400 citations, with an h-index of 21.
His research interests include Information Systems Analysis and Design, Distributed Systems, ICT-based Collaboration Systems and Mobile Services, with a special focus on Connected Health.
STSM Title:
Evaluation of state-of-the-art research on the application of machine learning in human microbiome studies
Host: Prof. Tatjana Loncar Turukalo, University of Novi Sad, Serbia
Dates: 01/03/2020 – 07/03/2020
Thanks to the development of sequencing technology, microbiome studies with a large number of samples allow more sophisticated modelling, using machine learning approaches to study relationships between the microbiome and various health-related traits. In this STSM, we analysed how machine learning is applied in studies of the human microbiome, from the perspective of the machine learning paradigms that can be used.
We were interested in the preprocessing pipeline from 16S rRNA data to OTUs, the origin of the human data sources (e.g. skin, gut, saliva), their sample size, the main purpose of using machine learning techniques from a microbiological point of view, how feature reduction is performed, the machine learning methods used, the results and performance obtained, and how these methods are combined with statistical methods.
Microbiome profiling is typically conducted using 16S rRNA amplicon sequencing or shotgun sequencing. The 16S rRNA gene sequences are clustered into groups called operational taxonomic units (OTUs), using workflows such as QIIME or mothur. Once the OTU clusters are determined, taxonomic information is assigned to the representative sequence of each OTU. Although analytical methods that use the representative 16S rRNA sequences directly have recently been introduced, OTU cluster-based variables are still the most frequently used input features in microbiome analysis.
Supervised learning methods are typically used to build a model that predicts reported categorical outcomes, such as disease status. The most popular supervised learning methods used on OTU tables are support vector machines, Naïve Bayes, random forests, and k-nearest neighbours.
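As an illustration of supervised learning on an OTU table, the sketch below implements a plain k-nearest-neighbour classifier using only the Python standard library. The OTU counts, the class labels, the choice of Euclidean distance on relative abundances and the function names are all assumptions made for this example; it does not reproduce any method from the reviewed studies.

```python
import math
from collections import Counter


def relative_abundance(counts):
    """Convert the raw OTU counts of one sample to proportions."""
    total = sum(counts)
    return [c / total for c in counts]


def knn_predict(train_rows, train_labels, query, k=3):
    """Predict the majority label among the k nearest training samples."""
    distances = sorted(
        (math.dist(row, query), label)
        for row, label in zip(train_rows, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical OTU table: rows are samples, columns are three OTUs.
otu_counts = [
    [90, 5, 5], [80, 15, 5], [85, 10, 5],      # labelled "healthy"
    [10, 70, 20], [5, 80, 15], [15, 60, 25],   # labelled "disease"
]
train_labels = ["healthy"] * 3 + ["disease"] * 3
train_rows = [relative_abundance(row) for row in otu_counts]

new_sample = relative_abundance([12, 68, 20])
print(knn_predict(train_rows, train_labels, new_sample))
```

In practice an OTU table has far more features than samples, which is why the feature reduction step mentioned above usually precedes a distance-based classifier such as this one.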
Papers published thanks to the work carried out during this STSM:
- Tonkovic, Petar, Slobodan Kalajdziski, Eftim Zdravevski, Petre Lameski, Roberto Corizzo, Ivan Miguel Pires, Nuno M. Garcia, Tatjana Loncar-Turukalo, and Vladimir Trajkovik. “Literature on Applied Machine Learning in Metagenomic Classification: A Scoping Review.” Biology 9, no. 12 (2020): 453. https://doi.org/10.3390/biology9120453
- Marcos-Zambrano, Laura Judith, Kanita Karaduzovic-Hadziabdic, Tatjana Loncar Turukalo, Piotr Przymus, Vladimir Trajkovik, Oliver Aasmets, Magali Berland et al. “Applications of machine learning in human microbiome studies: a review on feature selection, biomarker identification, disease prediction and treatment.” Frontiers in Microbiology 12 (2021): 313. https://doi.org/10.3389/fmicb.2021.634511
Dr Monika Simjanoska
Unfortunately cancelled.
Miodrag Cekic - 01/02/2021 - 07/03/2021 - Creating and utilizing ML and Deep Learning models to understand the role of the microbiome in relation to cancer therapeutics and diagnostics
Bio & Affiliation
Professional software developer and architect working with cutting-edge technologies from and around the Microsoft .NET ecosystem. Machine learning and data science researcher using different tools and technologies for data visualisation, exploration, feature engineering and modelling. He is currently in the final stage of his PhD in Computer Science and Engineering, in the area of Bioinformatics, at the Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University in Skopje, Macedonia. His research aims to understand the role of the microbiome in cancer diagnostics and therapeutics by creating and using ML models and working with large sets of microbiome-related data.
STSM Title: Creating and utilizing ML and Deep Learning models to understand the role of the microbiome in relation to cancer therapeutics and diagnostics
Host: Prof. Ugur Osman Sezerman, Acibadem University – Biostatistics and Medical Informatics Department, Kayışdağı Caddesi 32, 34752 Ataşehir, Istanbul, Turkey
Dates: 01/02/2021 – 07/03/2021
Cancer is one of the leading causes of death worldwide. Colorectal cancer is among the most malignant tumours, and its burden can be reduced only through early detection and appropriate treatment. Increasing evidence indicates that the intestinal microbiota is related to, and can affect, colorectal carcinogenesis. The study in this STSM proposes a multidisciplinary, two-phase methodology for modelling and interpreting the key biomarkers that can play a significant role in understanding the drug-resistance mechanisms and carcinogenesis in patients diagnosed with colorectal cancer. The proposed methodology was evaluated on a publicly accessible dataset and may serve clinicians as a complementary analysis tool in colorectal cancer diagnostics and therapeutics. The STSM work contributes to predictive modelling in healthcare and personalized medicine.
General Description
Recent studies have highlighted that the gut microbiota can alter colorectal cancer susceptibility and progression through its impact on colorectal carcinogenesis. It can also influence metabolic pathways and modulate the efficacy of anticancer drugs. This STSM work presents a comprehensive technical approach to modelling and interpreting drug-resistance mechanisms and carcinogenesis from clinical data for patients diagnosed with colorectal cancer. To this end, we developed a methodology based on evaluating high-performance machine learning models, among which a Python-based random forest classifier provided the best performance metrics, with an overall accuracy of more than 90%. Our approach identified and interpreted the most significant genera for the resistant groups and for cancer progression and susceptibility. Thus far, many studies have pointed out the importance of individual genera present in the microbiome and tend to treat them separately. The symbiotic bacterial analysis generated different sets of joint feature combinations, providing a combined overview of the model's predictiveness and uncovering additional correlations in the data, where the joint impact of different genera supports the therapy-resistant effect. This STSM work offers a different perspective on treatment: our aggregate analysis gives precise results for the genera that are often found together in a resistant group of patients, meaning that resistance is due not to the presence of one pathogenic genus in the patient microbiome, but rather to several bacterial genera that live in symbiosis.
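The idea of looking at genera that occur together, rather than one genus at a time, can be illustrated with a small co-occurrence count. This sketch is not the STSM methodology, which relied on machine learning models and joint feature combinations whose details are not given here. The genus names, the abundance values, the presence threshold and the function name are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurring_pairs(samples, threshold=0.0):
    """Count, over all samples, how often each pair of genera is present
    together (abundance strictly above the threshold)."""
    pair_counts = Counter()
    for abundances in samples:
        present = sorted(g for g, value in abundances.items()
                         if value > threshold)
        pair_counts.update(combinations(present, 2))
    return pair_counts


# Hypothetical relative abundances for three therapy-resistant patients.
resistant = [
    {"genus_A": 0.3, "genus_B": 0.2, "genus_C": 0.0},
    {"genus_A": 0.1, "genus_B": 0.4, "genus_C": 0.1},
    {"genus_A": 0.2, "genus_B": 0.3, "genus_C": 0.0},
]

pair_counts = cooccurring_pairs(resistant)
print(pair_counts.most_common())
```

Comparing such counts between resistant and non-resistant groups gives a first, descriptive view of which genera tend to appear jointly, before any predictive model is trained on the combined features.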
The findings concur with other related publications, indicating that the study within this STSM further establishes a novel methodology for a more effective and scientific approach to understanding colorectal cancer therapy-resistance mechanisms and carcinogenesis. In general, the aggregate analysis shows that resistance is associated not with a single pathogenic genus in the patient microbiome but with several bacterial genera that live in symbiosis. This approach can be used as a complementary analysis tool in colorectal cancer diagnostics and therapeutics, including on unseen microbiome data, and can help oncologists decide on treatment and post-treatment strategies with respect to immunotherapy and drug resistance.