PhD in Informatics Seminar #6 2020/2021 | DI Ciências ULisboa

Title: Recommender systems for scientific items: a Sequential Enrichment (SeEn) approach

Speaker: Márcia Barros, LASIGE – DI/FCUL

When: May 13 (Thursday) at 12:00


Databases for scientific entities, such as chemical compounds, diseases and astronomical objects, are growing in size and complexity, reaching billions of items per database. Researchers need new tools to help them choose relevant items. In this work, we propose the use of Recommender Systems (RS) approaches coupled with scientific literature processing and deep learning to address this challenge.
In previous work, we developed a methodology called LIBRETTI (LIterature Based RecommEndaTion of scienTific Items), whose goal is the creation of <user, item, rating> datasets for scientific fields. These datasets are built from the main source of knowledge in science: the scientific literature. The first case studies conducted with LIBRETTI were in the fields of Astronomy and Chemistry, with open clusters of stars and chemical compounds as items, respectively. More recently, the LIBRETTI methodology was applied to phenotypes, diseases, and gene terms, particularly those related to COVID-19. With these datasets available, we developed a hybrid recommender model for recommending chemical compounds, suitable for implicit feedback datasets and focused on retrieving a list of items ranked by relevance.
However, science changes over time, and items that were relevant in the past may no longer be relevant for a user. Thus, we are now working on the recommendation of scientific items taking into account the time when each preference was published. Instead of <user, item, rating>, we consider the sequence of scientific entities a user showed interest in over a time period and try to predict the best next entity for this user.
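The move from <user, item, rating> triples to time-ordered sequences can be illustrated with a minimal sketch. It assumes each preference is available as a hypothetical (user, item, timestamp) record, for example with the publication year as timestamp; the function names and the leave-last-out split are illustrative choices, not necessarily the setup used in this work.

```python
from collections import defaultdict


def build_sequences(records):
    """Group (user, item, timestamp) records into per-user item lists ordered by time."""
    by_user = defaultdict(list)
    for user, item, timestamp in records:
        by_user[user].append((timestamp, item))
    # Sort each user's events by timestamp only; ties keep their input order.
    return {
        user: [item for _, item in sorted(events, key=lambda event: event[0])]
        for user, events in by_user.items()
    }


def leave_last_out(sequences):
    """Split each sequence into (history, next_item).

    Users with fewer than two items are skipped, since they have no history
    from which a next item could be predicted.
    """
    return {
        user: (seq[:-1], seq[-1])
        for user, seq in sequences.items()
        if len(seq) >= 2
    }


# Hypothetical records: (user, item, publication year).
records = [
    ("u1", "aspirin", 2015),
    ("u1", "ibuprofen", 2012),
    ("u2", "caffeine", 2018),
    ("u1", "paracetamol", 2019),
]
sequences = build_sequences(records)
splits = leave_last_out(sequences)
```

Here the history of each user is the model input and the held-out last item is the "best next entity" the recommender should rank highly.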
To this end, we are working on a sequential enrichment method, called SeEn, which consists of introducing into the sequence of each user the n most similar items to the ones the user has already seen. This enriched sequence is then used as the input for state-of-the-art collaborative-filtering sequence-aware recommendation algorithms, such as BERT4Rec, improving the results when compared with the original sequence.
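The enrichment step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how similarity is computed or where the similar items are placed, so the sketch assumes a hypothetical precomputed mapping `similar_items` (item to a list of similar items, most similar first), inserts the new items immediately after the item they resemble, and skips candidates already present in the sequence.

```python
def enrich_sequence(sequence, similar_items, n):
    """Return a copy of `sequence` with up to n similar items added after each item.

    `similar_items` is a hypothetical mapping from an item to its similar
    items, ordered from most to least similar. Candidates that are already
    in the sequence (original or previously inserted) are skipped, which is
    an assumption of this sketch.
    """
    seen = set(sequence)
    enriched = []
    for item in sequence:
        enriched.append(item)
        added = 0
        for candidate in similar_items.get(item, []):
            if added == n:
                break
            if candidate in seen:
                continue
            enriched.append(candidate)
            seen.add(candidate)
            added += 1
    return enriched


# Hypothetical similarity lists for two items.
similar_items = {"a": ["b", "x", "y"], "c": ["x", "z"]}
enriched = enrich_sequence(["a", "c"], similar_items, n=2)
unchanged = enrich_sequence(["a", "c"], similar_items, n=0)
```

The enriched list would then replace the original sequence as input to a sequence-aware model such as BERT4Rec; with n = 0 the sequence is returned unchanged, which gives the baseline for comparison.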