Our webinar "Plankton Genomics: Multidisciplinary data mining to assess plankton distributions" took place on 23 April 2021 and was attended by more than 70 participants. It gave the Blue-Cloud consortium an opportunity to present the Plankton Genomics demonstrator, which is developing a Virtual Lab to showcase an in-depth assessment of plankton distributions by mining data across the biomolecular, imaging and environmental domains.
The webinar was moderated by Sara Pittonet Gaiarin, Senior Project Manager at Trust-IT Services and Blue-Cloud coordinator, who gave an overview of the Blue-Cloud project, its thematic Virtual Research Environments and the objectives of the webinar. She also highlighted what Blue-Cloud can bring to the European Open Science environment: FAIR data, interoperable marine-community services and the development of the Blue-Cloud strategic roadmap to 2030.
She was followed by Stéphane Pesant, Senior Marine Biology Curator at EMBL-EBI, who introduced the global Tara Oceans expedition. The expedition ran from 2009 to 2013 and investigated the role of plankton ecosystems in the context of climate change; plankton represents 60% of the biomass in the global ocean. Two main groups of analysis were conducted during the expedition, using ocean instruments to collect and analyse samples:
Pavla Debeljak, Researcher at Sorbonne Université, presented the state of the art of the Plankton Genomics demonstrator. The dataset used to build it is composed of metagenomic (total environmental DNA) and metatranscriptomic (total environmental RNA) data collected during the Tara Oceans expedition. Because only half of the Tara Oceans data can currently be classified, much of the plankton genomics information remains unknown, and the current common practice is to discard all data that are not classifiable. To overcome this limitation, the main goal of the Plankton Genomics demonstrator is to develop two notebooks:
In particular, Debeljak showed examples of practical applications of the first notebook: a comparison of known and unknown metagenomic sequences of picoplankton; the correlation between metagenomic (MetaG) and metatranscriptomic (MetaT) data for nanoplankton in surface waters; and the exploration of knowns and unknowns by sampling site and environmental parameters.
The last speaker was Jean-Olivier Irisson, Computational Ecologist at Sorbonne Université, who gave an overview of the second notebook and outlined the next steps in the development of the Plankton Genomics demonstrator, focusing on extrapolation to unsampled parts of the ocean. The main goal of the second notebook is to derive information for the open ocean as a whole and not only for the Tara Oceans sampling sites, since the Tara Oceans data come from individual stations and do not provide complete coverage. This will be done through habitat modelling.
In addition, the Plankton Genomics demonstrator is expected to bring innovations in predicting multiple entities simultaneously, exploiting knowledge of their relative concentrations and using deep learning to better summarise the environmental context.
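For readers unfamiliar with habitat modelling, the general idea of extrapolating from sampled stations to unsampled parts of the ocean can be sketched in a few lines of Python. This is only an illustration under strong assumptions, not the demonstrator's method: it replaces the deep learning and multi-entity prediction mentioned above with a single-predictor least-squares fit. The station values, the grid cell names and all function names are hypothetical and invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Station:
    """One sampled station (values below are invented for illustration)."""
    temperature_c: float       # environmental predictor measured at the station
    relative_abundance: float  # observed relative abundance of one plankton group


def fit_linear_habitat_model(stations):
    """Least-squares fit of: abundance ~ intercept + slope * temperature."""
    n = len(stations)
    mean_t = sum(s.temperature_c for s in stations) / n
    mean_a = sum(s.relative_abundance for s in stations) / n
    cov = sum((s.temperature_c - mean_t) * (s.relative_abundance - mean_a)
              for s in stations)
    var = sum((s.temperature_c - mean_t) ** 2 for s in stations)
    slope = cov / var
    intercept = mean_a - slope * mean_t
    return intercept, slope


def predict_abundance(model, temperature_c):
    """Predict relative abundance at a location, clamped to [0, 1]."""
    intercept, slope = model
    return min(1.0, max(0.0, intercept + slope * temperature_c))


# Hypothetical sampled stations (made-up numbers, not Tara Oceans data).
stations = [
    Station(5.0, 0.10),
    Station(10.0, 0.20),
    Station(15.0, 0.30),
    Station(20.0, 0.40),
]
model = fit_linear_habitat_model(stations)

# Hypothetical unsampled grid cells, described only by their temperature.
unsampled_grid = {"cell_A": 12.5, "cell_B": 25.0}
predictions = {cell: predict_abundance(model, t)
               for cell, t in unsampled_grid.items()}

for cell, value in predictions.items():
    print(f"{cell}: predicted relative abundance {value:.2f}")
```

The model is fitted only where samples exist and then applied wherever the environmental predictor is known, which is what makes extrapolation to unsampled areas possible. A realistic habitat model would use several environmental variables and a more flexible learner.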
Below we have collected some of the most relevant questions and answers from the Q&A session.