This week's Data Science Seminar at Ciências will be presented by LASIGE researcher Sara Madeira. The seminar will cover biclustering techniques for identifying objects and their patterns in large datasets, ranging from genomics to social data. It will take place on the 21st of February at 14h in room C6.3.38.
Biclustering, the discovery of sets of objects with coherent values/patterns on subsets of features, has been shown to be key to unravelling and characterizing informative regions (biclusters) within matrix, time series and network data, in a wide set of applications in biomedical and social data analysis. This is particularly true in biomedical problems, where groups of genes or patients tend to be meaningfully related only on a subset of the sampled/monitored conditions. The challenging combinatorial nature of the biclustering problem has led to the development of several approaches, which vary in the allowed type, number, positioning and quality of biclusters. The state of the art relies on efficient string processing and mining techniques in the case of biclustering temporal data, and on pattern mining algorithms in the general case of biclustering matrix and network data.
This talk introduces the biclustering problem, comparing it with the traditional clustering problem, provides an overview of the state of the art in biclustering matrix and network data, and then focuses on the problem of biclustering temporal data, in particular gene expression time series obtained from transcriptomics. Ongoing work on new triclustering algorithms is also discussed. These algorithms simultaneously analyse multiple gene expression time series (three-way time series) and multiple multivariate time series collected at clinical follow-up, with applications in biomedical problems such as the identification of disease progression patterns in the NEUROCLINOMICS2 project (PTDC/EEI-SII/1937/2014).
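To make the contrast with traditional clustering concrete, here is a minimal Python sketch. It is an illustration only, not one of the algorithms covered in the talk: the function name, the toy matrix and the brute-force search over column subsets are all assumptions chosen for readability, and the search is exponential in the number of columns. It looks for the simplest kind of bicluster, a set of rows that share identical values on a subset of columns.

```python
from itertools import combinations


def find_shared_pattern_biclusters(matrix, min_rows=2, min_cols=2):
    """Brute-force search (hypothetical helper, for illustration only).

    For every subset of columns, group the rows that have identical values
    on those columns. Each group with enough rows is reported as a
    bicluster: (row indices, column indices, shared pattern).
    """
    n_cols = len(matrix[0])
    found = []
    for size in range(min_cols, n_cols + 1):
        for cols in combinations(range(n_cols), size):
            groups = {}
            for r, row in enumerate(matrix):
                pattern = tuple(row[c] for c in cols)
                groups.setdefault(pattern, []).append(r)
            for pattern, rows in groups.items():
                if len(rows) >= min_rows:
                    found.append((tuple(rows), cols, pattern))
    return found


# Toy data: rows could be genes or patients, columns conditions.
# No two rows are equal across ALL columns, so comparing whole rows finds
# nothing, yet rows 0, 2 and 3 agree on columns 1 and 3.
data = [
    [5, 1, 9, 7],
    [2, 8, 3, 4],
    [6, 1, 0, 7],
    [3, 1, 4, 7],
]

biclusters = find_shared_pattern_biclusters(data)
for rows, cols, pattern in biclusters:
    print(f"rows {rows} share pattern {pattern} on columns {cols}")
```

The rows in this toy matrix are related only on a subset of the columns, which is the situation described above for genes or patients. Exhaustive enumeration of this kind is only feasible for tiny matrices, which is one reason the combinatorial nature of the problem motivates the more efficient approaches mentioned in the abstract.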
SARA C. MADEIRA has been an Associate Professor at the Department of Informatics of the Faculty of Sciences, University of Lisbon (FCUL), since mid-February 2017, where she teaches graduate courses on data mining, machine learning and foundations of data science, and an undergraduate course on intelligent systems. She is also a senior researcher at LASIGE, where she is a member of the Data and Systems Intelligence and the Health and Biomedical Informatics research lines. Her research interests include data mining, machine learning, bioinformatics and medical informatics. In this context, she was the PI of “NEUROCLINOMICS – Understanding NEUROdegenerative diseases through CLINical and OMICS data” (PTDC/EIA-EIA/111239/2009), a research project that addressed the challenges of studying complex diseases and developing efficient and effective mining algorithms for biomedical data, using Amyotrophic Lateral Sclerosis and Alzheimer’s disease as case studies. It was followed by the ongoing project “NEUROCLINOMICS2 – Unravelling Prognostic Markers in NEUROdegenerative diseases through CLINical and OMICS data integration” (PTDC/EEI-SII/1937/2014). Her survey on “Biclustering Algorithms for Biological Data Analysis” was considered an ESI Hot Paper in Computer Science in November 2006. Biclustering algorithms and their applications in biomedical data analysis remain her main research topics.