Assessing plankton distributions through data mining: the Plankton Genomics demonstrator

2 May 2021

Our webinar "Plankton Genomics: Multidisciplinary data mining to assess plankton distributions" took place on 23 April 2021 and was attended by more than 70 participants. It was an opportunity for the Blue-Cloud consortium to present the Plankton Genomics demonstrator, which is developing a Virtual Lab to showcase a deep assessment of plankton distributions by mining data across biomolecular, imaging and environmental domains.

Webinar highlights

Discovering the Plankton Genomics demonstrator
Q&A from the session
Watch the recordings and download the slides

Discovering the Plankton Genomics demonstrator

Webinar objectives and introduction

The webinar was moderated by Sara Pittonet Gaiarin, Senior Project Manager at Trust-IT Services and Blue-Cloud coordinator, who provided an overview of the Blue-Cloud project, its thematic Virtual Research Environments and the objectives of the webinar. She also highlighted the potential brought by Blue-Cloud to the European Open Science environment in terms of FAIR data, interoperable marine-community services and the development of the Blue-Cloud strategic roadmap to 2030.

Tara Oceans expedition and its use in Blue-Cloud

She was followed by Stéphane Pesant, Senior Marine Biology Curator at EMBL-EBI, who introduced the global Tara Oceans expedition. The expedition was conducted from 2009 to 2013 to investigate the role of plankton ecosystems, which represent 60% of the biomass in the global ocean, in the context of climate change. Two main groups of analyses were carried out during the expedition, using ocean instruments to collect and analyse samples:

High Throughput Sequencing (HTS), which has the twofold objective of studying the genomics of single cells and organisms, and community DNA and RNA.
High Throughput Imaging (HTI), used to analyse plankton organisms across their full spectrum.

1st Notebook - Discovery of known and unknown genetic entities

Pavla Debeljak, Researcher at Sorbonne Université, presented the state of the art of the Plankton Genomics demonstrator. The dataset used to build it is composed of metagenomic (total environmental DNA) and metatranscriptomic (total environmental RNA) data collected during the Tara Oceans expedition. Only half of the Tara Oceans data correspond to known entities, so a large part of plankton genomics information is still unknown, and the current common practice is to discard all data that cannot be classified. To overcome this limitation, the main goal of the Plankton Genomics demonstrator is to develop two notebooks:

The first one aims to discover the unknown biodiversity from unassigned genetic signals, comparing it with known biodiversity and with the environmental context.
The second one plans to model the habitat and biogeography of the unknown biodiversity.

In particular, Debeljak showed examples of the 1st notebook with practical applications, such as the comparison of known and unknown metagenomic sequences of picoplankton; the correlation between metagenomic (MetaG) and metatranscriptomic (MetaT) data for nanoplankton in surface waters; and the exploration of knowns and unknowns by sampling site and environmental parameters.
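To give a feel for the kind of analysis described above, here is a minimal Python sketch. It is not the notebook's actual code: the function names, the record layout and all numbers are assumptions invented for illustration, and the values are not Tara Oceans measurements. It computes the share of unassigned reads per station and a Pearson correlation between MetaG and MetaT abundances.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def unknown_fraction(known_reads, unknown_reads):
    """Share of reads that could not be assigned to a known entity."""
    total = known_reads + unknown_reads
    return unknown_reads / total if total else 0.0


# Hypothetical per-station records (toy values, for illustration only).
stations = [
    {"id": "A", "metag": 10.0, "metat": 21.0, "known": 600, "unknown": 400},
    {"id": "B", "metag": 14.0, "metat": 27.0, "known": 450, "unknown": 550},
    {"id": "C", "metag": 7.0, "metat": 16.0, "known": 700, "unknown": 300},
]

for s in stations:
    share = unknown_fraction(s["known"], s["unknown"])
    print(f"station {s['id']}: {share:.0%} of reads unassigned")

r = pearson([s["metag"] for s in stations], [s["metat"] for s in stations])
print(f"MetaG vs MetaT Pearson r = {r:.2f}")
```

A rank-based measure such as Spearman correlation may be a better fit for skewed abundance data; Pearson is used here only to keep the sketch short.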

2nd Notebook - Next steps: extrapolation to unsampled parts of the ocean

The last speaker was Jean-Olivier Irisson, Computational Ecologist at Sorbonne Université, who provided an overview of the 2nd notebook and outlined the next steps of the Plankton Genomics demonstrator development, focussing on extrapolation to unsampled parts of the ocean. The main goal of the 2nd notebook is to extract information about the open ocean as a whole and not only at the Tara Oceans sampling sites, since the Tara Oceans data come from local stations and do not cover the entire ocean. This is going to be performed through habitat modelling.

In addition, the Plankton Genomics demonstrator is expected to bring innovations in predicting multiple entities simultaneously, exploiting knowledge of their relative concentrations, and in using deep learning to better summarise the environmental context.
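The idea of predicting several entities at once from their relative concentrations can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: it swaps in a simple inverse-distance-weighted nearest-neighbour average rather than the deep learning approach mentioned in the webinar, and the function name, the environmental variables and all values are hypothetical.

```python
import math


def predict_composition(train_env, train_comp, query_env, k=2):
    """Jointly predict the relative abundances of several entities.

    Averages the compositions observed at the k sampled stations that are
    closest to the query point in environmental space, weighted by inverse
    distance. Because each training composition sums to 1 and the weights
    are normalised, the prediction also sums to 1.
    """
    ranked = sorted(
        ((math.dist(env, query_env), comp)
         for env, comp in zip(train_env, train_comp)),
        key=lambda pair: pair[0],
    )[:k]
    # Small constant avoids division by zero when the query matches a station.
    weights = [1.0 / (dist + 1e-9) for dist, _ in ranked]
    total = sum(weights)
    n_entities = len(train_comp[0])
    return [
        sum(w * comp[j] for w, (_, comp) in zip(weights, ranked)) / total
        for j in range(n_entities)
    ]


# Hypothetical sampled stations: (temperature in degrees C, salinity).
# In practice the variables would be standardised before computing distances.
env = [(25.0, 36.0), (18.0, 35.0), (4.0, 34.0)]
# Relative abundances of three hypothetical entities at each station.
comp = [[0.6, 0.3, 0.1], [0.4, 0.4, 0.2], [0.1, 0.3, 0.6]]

# Extrapolate to an unsampled location described only by its environment.
prediction = predict_composition(env, comp, (20.0, 35.5), k=2)
print([round(p, 3) for p in prediction])
```

Predicting the whole composition in one step, instead of fitting one model per entity, keeps the predicted relative abundances consistent with each other.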

Discover the potential of the Plankton Genomics Demonstrator!

Q&A from the session

Below we have collected some of the most relevant questions and answers from the Q&A session.

Is this only a training tool? Or can analysis tools be offered and eventually implemented on computing capacity?
- All Blue-Cloud demos are live tools open for public use. The Plankton Genomics Virtual Lab is not public yet but will be in June 2021.
Considering the AI work, are there any "tasks" that can be of particular interest to propose for a challenge (in the Mediterranean Sea for example)?
- Yes, but there will be few stations if we restrict it to the Mediterranean Sea (and Pacific) within 200 nautical miles of the coast.
Do the next steps also include the possibility to have more diverse time periods / variations / seasonality for models?
- Yes, but at a seasonal level. It is not possible to have more because of the lack of data and the fact that plankton reacts to longer-term conditions than just the local ones.

Watch the recording and download the slides

YouTube | Twitter | LinkedIn
