Plankton Genomics

The “Plankton Genomics” demonstrator is led by the European Bioinformatics Institute (EMBL-EBI) and created by the Faculty of Sciences at Sorbonne University, with contributions from the Flanders Marine Institute (VLIZ).

The aim of the Plankton Genomics demonstrator is to showcase an in-depth assessment of plankton distributions by mining data across the biomolecular, imaging and environmental domains. It will draw on the outputs of initiatives such as Tara Oceans and will focus on two key objectives:

  • Notebook 1: Exploring genetic data & identifying clusters containing unknown genes
    • Discovery of as-yet-undescribed biodiversity from genetic and morphological signals, by characterising their geographical distributions, co-occurrences/exclusions and correlations with environmental contexts.
  • Notebook 2: Mapping the geographic distribution of plankton functional gene clusters using habitat prediction models
    • Exploration of genetic and morphological markers of plankton diversity and abundance, in particular the new ones discovered above, to predict their spatiotemporal distributions and to serve as high-resolution Essential Ocean Variables (EOVs) for biological processes.
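The co-occurrence/exclusion analysis named in the first objective can be illustrated with a minimal sketch. It assumes a hypothetical table of gene-cluster abundances (one value per sampling station) and uses Spearman rank correlation with an arbitrary ±0.6 cut-off to label pairs as co-occurring or mutually exclusive. Neither the correlation measure nor the threshold is taken from the demonstrator itself.

```python
from itertools import combinations


def rank(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


def classify_pairs(abundance, threshold=0.6):
    """abundance: hypothetical dict cluster_id -> abundances per station.

    The 0.6 threshold is an illustrative assumption, not a published value.
    """
    results = {}
    for a, b in combinations(sorted(abundance), 2):
        rho = spearman(abundance[a], abundance[b])
        if rho >= threshold:
            label = "co-occurrence"
        elif rho <= -threshold:
            label = "exclusion"
        else:
            label = "none"
        results[(a, b)] = (rho, label)
    return results
```

The same `spearman` function could also be applied to one cluster's abundances and an environmental variable measured at the same stations (temperature, for example) to probe correlation with environmental context. A real analysis would also need to account for compositional data and multiple testing, which this sketch ignores.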

The initial users of the Plankton Genomics demonstrator are primarily scientific researchers, including taxonomists, computational ecologists and bioinformaticians with extensive knowledge of the data collected during the Tara Oceans expedition. In the short term, we expect significant uptake of the demonstrator by European initiatives such as the H2020 Blue Growth project AtlantECO, the Ocean Sampling Day initiative and the Marine Genomic Observatories, in close collaboration with EMBRC and ASSEMBLE Plus.

The end users include a broad base of scientists seeking to identify unknown sequences in the ocean environment, as well as those interested in fields such as plankton biogeography, marine biogeochemistry, ecosystem health and climate science.


A service enabling the discovery of unknown marine plankton genes (from annotation files) in the Tara Oceans expedition dataset, and the building of gene clusters based on sequence similarity and on larger metabolic pathways.
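The two steps this service describes, finding unknown genes in annotation files and grouping them by sequence similarity, can be sketched as follows. This is a minimal illustration under several assumptions: the annotation file is taken to be a tab-separated table with hypothetical `gene_id` and `annotation` columns, "unknown" is defined by a hypothetical set of labels, and clustering uses k-mer Jaccard similarity with single linkage as a simple stand-in for dedicated sequence-clustering tools such as MMseqs2 or CD-HIT. Metabolic pathways are not modelled.

```python
import csv
import io

# Hypothetical labels treated as "unknown"; real annotation files may differ.
UNKNOWN_LABELS = {"", "unknown", "hypothetical protein"}


def unknown_gene_ids(annotation_tsv):
    """Return IDs of genes whose annotation is empty or an 'unknown' label.

    Assumes hypothetical tab-separated columns 'gene_id' and 'annotation'.
    """
    reader = csv.DictReader(io.StringIO(annotation_tsv), delimiter="\t")
    return [row["gene_id"] for row in reader
            if row["annotation"].strip().lower() in UNKNOWN_LABELS]


def kmers(seq, k):
    """Set of all overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two k-mer sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_by_similarity(seqs, k=3, min_sim=0.5):
    """Single-linkage clustering: genes with k-mer Jaccard similarity
    >= min_sim are linked; clusters are connected components (union-find).
    k and min_sim are illustrative defaults, not tuned values."""
    ids = sorted(seqs)
    parent = {i: i for i in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    profiles = {i: kmers(seqs[i], k) for i in ids}
    for idx, a in enumerate(ids):
        for b in ids[idx + 1:]:
            if jaccard(profiles[a], profiles[b]) >= min_sim:
                parent[find(a)] = find(b)

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

With toy sequences, `cluster_by_similarity({"g2": "MKTAYIAKQR", "g3": "MKTAYIAKQL", "g4": "GGSWLPEEVN"})` groups g2 and g3 (7 of 9 distinct 3-mers shared) and leaves g4 as a singleton. Single linkage keeps the sketch short, but it can chain dissimilar sequences together through intermediates, which is one reason production pipelines rely on more careful clustering criteria.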