Developing and demonstrating data mining and A.I. tools to better understand patient heterogeneity and assist patient stratification


PhD awarding institution: University of Ljubljana

Lead Supervisor: V Martins dos Santos

Objectives: Instead of creating new data, it is often easier, more cost-effective and in many cases even more productive to make use of data that already exist and are just waiting to be collected, harmonized and analyzed from a different viewpoint. Many public databases, e.g. those dealing with omics data or clinical studies, provide so-called Application Programming Interfaces (APIs) for fast, easy and, most importantly, automated access to their data. The objectives are to exploit this “hidden” potential in existing data by 1) structuring the database for related samples; 2) designing and developing a data mining tool that accesses, collects and harmonizes data via those APIs and makes it easily usable for further downstream interpretation/analysis; 3) implementing an artificial intelligence algorithm that automatically classifies a specific sample or detects a potential misclassification; 4) undertaking proof-of-concept case studies using Decipher CNV data of patients suffering from developmental neurological malformations, where the CNVs might overlap with the Encode data, and also using metabolic liver pathologies. Interpretation focuses on patient similarity and heterogeneity, aiming at data re-use for personalized medicine. To this end, the data collected from the various open-access sources will be integrated to contribute to a better understanding of patient similarity and/or heterogeneity.
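As an illustration of the kind of analysis meant in objective 4, the following is a minimal sketch, not the project's actual method. It assumes that patient CNVs and annotated regions have already been collected and harmonized into simple genomic intervals. All names (`Interval`, `hit_regions`, `jaccard`), the half-open coordinate convention and the toy coordinates are hypothetical and do not come from Decipher or Encode. The sketch finds which annotated regions each patient's CNVs overlap and then scores patient similarity as the Jaccard index of those region sets.

```python
from typing import Dict, List, NamedTuple, Set


class Interval(NamedTuple):
    """A genomic interval; half-open coordinates [start, end) are an assumption."""
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def hit_regions(cnvs: List[Interval], regions: Dict[str, Interval]) -> Set[str]:
    """Names of the annotated regions overlapped by at least one of the CNVs."""
    return {
        name
        for name, region in regions.items()
        if any(overlaps(cnv, region) for cnv in cnvs)
    }


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index of two sets; defined here as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Toy data, invented for illustration only.
regions = {
    "enhancer_A": Interval("chr1", 100, 200),
    "enhancer_B": Interval("chr1", 500, 600),
    "promoter_C": Interval("chr2", 50, 150),
}
patients = {
    "P1": [Interval("chr1", 150, 550)],
    "P2": [Interval("chr1", 180, 220), Interval("chr2", 100, 120)],
    "P3": [Interval("chr2", 140, 300)],
}

# Map each patient to the set of regions hit by their CNVs.
hits = {pid: hit_regions(cnvs, regions) for pid, cnvs in patients.items()}

for first, second in [("P1", "P2"), ("P2", "P3"), ("P1", "P3")]:
    score = jaccard(hits[first], hits[second])
    print(f"{first} vs {second}: {score:.2f}")
```

The pairwise scan over all CNVs and regions is only suitable for small inputs; for genome-scale data an interval tree or a sorted sweep would be the more usual choice. The Jaccard index is likewise just one simple candidate for a patient similarity measure.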