Dr Susanna-Assunta Sansone

Associate Director (Life, Natural and BioMedical Sciences)

http://uk.linkedin.com/in/sasansone - twitter: @SusannaASansone

I am Associate Director at the University of Oxford e-Research Centre. I am also a consultant at Springer Nature and Founding Honorary Academic Editor of Scientific Data, an open access data publication platform.

I hold a PhD in Molecular Biology from Imperial College of Science, Technology and Medicine, London, UK. After a few years working on vaccine genetics in an Imperial spin-off (now known as Emergent BioSolutions, Inc.), I moved to the European Bioinformatics Institute (EBI, Cambridge), where I worked for nine years as a Project and Team Coordinator and Principal Investigator, before moving to the Oxford e-Research Centre in 2010.


My interests and activities are in the areas of knowledge and information management, and interoperability of applications. Since 2000, my projects have focused on improving the collection, curation and representation of multi-dimensional life, environmental and biomedical sciences data, to support data reproducibility and the evolution of scholarly publishing, which drive science and discoveries. I collaborate with a variety of stakeholders in academic, governmental and commercial settings to promote the creation and uptake of community-developed data representation standards, such as ontologies and semantic web methodologies, and to develop standards-driven resources for a coherent representation of experimental information. My work supports meta-analysis and facilitates data interoperability and the analytical workflow.

I lead the Centre in several research and infrastructure projects funded by the UK Research Councils, the European Commission, the US National Institutes of Health (NIH) and pharma. Among these projects, I am a co-Investigator of the ELIXIR UK Node, where I am responsible for the standards and curation areas; I am also the international partner in two NIH Big Data to Knowledge Centers of Excellence.

I am a founding and core member of several international grass-roots standards and advocacy groups, and sit on the boards of a few not-for-profit efforts, including Dryad, the Research Data Alliance and Force11, which work on promoting and supporting the data reproducibility agenda.


There is a lack of teaching and training material and courses in my areas, so I focus on contributing to their creation. I give lectures, including in the newly established NIH Big Data to Knowledge Guide to the Fundamentals of Data Science Series, and provide tutorials, mentoring and supervision to high-achieving undergraduate and graduate students, including a DPhil graduate. I am co-leader of the newly established “Data Management, Analysis and Statistics” mandatory foundation module, designed for the BBSRC Oxford Interdisciplinary Bioscience Doctoral Training Programme and the EPSRC and BBSRC Synthetic Biology Centre for Doctoral Training. In an international context, I am co-founder of the nascent Data Carpentry for Life Science, an initiative under the ELIXIR UK umbrella, designed to teach basic concepts, skills and tools for working more effectively with data.

  • Nature Publishing Group - Open Data; Consultant
  • ELIXIR-UK Node; Executive Board Member
  • Dryad; Board of Directors Vice-Chair
  • Elsevier Research Data Management; Advisory Board Member
  • Research Data Alliance; Technical Advisory Board Member
  • Research File Service Project Board (formerly Storage as a Service), University of Oxford; Chair
  • IT Architecture Advisory Group, University of Oxford; Member
  • Research Data Management Delivery Group, University of Oxford; Member
  • Data Intensive Bioscience Expert Working Group, BBSRC; Member
  • BBSRC Oxford Interdisciplinary Bioscience DTP - Data Management Analysis & Statistics module; Co-lead


BioSharing and ISA are long-standing, mature infrastructures and resources that I run. They serve a variety of stakeholder communities in the life sciences, providing them with access to registries of information on open community standards and to a suite of software for the collection, curation and storage of data and its provenance, along with semantic technologies and data publication methods.

ISA infrastructure and ISA Commons

Embedded in several funded projects

Provides a toolkit and a community-driven format, implemented by a growing community of service providers, institutional projects and data journals, to facilitate standards-compliant collection, curation, sharing and publication of experiments in the life, natural and biomedical sciences.

Resource approved by the ELIXIR UK Node and an ELIXIR Service, part of the ELIXIR interoperability platform.

BioSharing Information Resources

Embedded in several funded projects

A curated, informative and educational resource on inter-related data standards, databases and policies in the life, environmental and biomedical sciences, working with and for researchers, standard and database developers, funders, journal editors, librarians and data managers looking to make informed decisions.

Resource approved by the ELIXIR UK Node and an ELIXIR Service, part of the ELIXIR interoperability platform.

StatO and OBI - Ontologies for Statistics Results and BioMedical Investigation

Embedded in several funded projects

The Ontology for Biomedical Investigations (OBI) project is an international, collaborative effort to build an integrated ontology for the description of biological and clinical investigations.

Digital platforms for scholarly publishing


Collaborations with scientific, technical and medical publishers, including Springer Nature (Scientific Data) and BioMedCentral, soon Oxford University Press (GigaScience), on novel data platforms and ways to track and publish scholarly outputs.


NIH BD2K CEDAR - Centre for Expanded Data Annotation and Retrieval

Funds and duration: NIH, 2014-2018

CEDAR works to facilitate the use of metadata in the analysis of Big Data sets, contributing to the implementation of the vision of the NIH Big Data to Knowledge (BD2K) initiative. We work with colleagues at Stanford and Yale Universities to create a unified framework that researchers can use to create consistent, easily searchable, standards-compliant metadata. As a partner in the centre, I also sit on the Steering Committee, bringing in ISA, BioSharing and our ontology activities.

NIH BD2K BioCADDIE - Biomedical and healthCAre Data Discovery and Indexing Ecosystem

Funds and duration: NIH, 2014-2017

BioCADDIE engages a broad community of stakeholders to create the NIH Big Data to Knowledge (BD2K) Data Discovery Index (DDI). The DDI will do for data what PubMed (and PubMed Central) did for the literature. I sit on its Executive and Steering Committee and lead several working groups, bridging our BioSharing activities on standards and metadata.


ELIXIR UK Node

Funds and duration: BBSRC, MRC, NERC, 2014-2017 (phase 1); EC, 2015-2018

The UK Node contributes the country’s substantial bioinformatics expertise for researchers, computer scientists and data managers in the Life, Natural and Medical Sciences. We lead on the standards and curation areas. The UK Node is also funded as part of the larger ELIXIR EXCELERATE grant, set to better integrate activities across all nodes.

IMI eTRIKS - European Translational Information and Knowledge Management Services

Funds and duration: Roche, 2014-2017

eTRIKS develops the knowledge management platform and services to support data-intensive translational research for the Innovative Medicines Initiative (IMI), Europe’s largest public-private initiative. Funded by Roche, we bring to this project ISA, BioSharing and our expertise on community standards.

COPO - Collaboratively Open Plant Omics

Funds and duration: BBSRC, 2015-2018

A collaboration with the Earlham Institute, Warwick and EMBL-EBI, COPO develops a framework that uses existing services to facilitate the description, deposition and publication of datasets, and to enable the identification and citation of datasets, thereby increasing transparency and reproducibility.

UK-China collaboration on omics data publication and curation

Funds and duration: BBSRC, 2012-2015 (phase 1), 2015-2018 (phase 2)

Collaboration with GigaScience, a joint BioMedCentral and BGI data journal with an associated database, to define common curation practices for omics-based datasets.

Metagenomics Data Infrastructure

Funds and duration: BBSRC, 2012-2015 (Completed)

Coordinated by EMBL-EBI, the Metagenomics service is being developed to be an automated pipeline for the curation, archiving and analysis of metagenomic data.

COSMOS - COordination Of Standards In MetabOlomicS

Funds and duration: EC FP7, 2012-2015 (Completed)

A collaboration with EBI and a variety of other European partners, COSMOS (Coordination of Standards in Metabolomics) has brought together European metabolomics data providers to set and promote community standards.

PhenoMeNal: Infrastructure for phenome and metabolome analysis

Funds and duration: EC H2020, 2015-2018

A collaboration with a variety of European partners to develop a data processing and analysis infrastructure (and related services) for molecular phenotype data generated by metabolomics applications, set to improve the understanding of the causes and mechanisms underlying health, healthy ageing and diseases.

MultiMot: Infrastructure for cell migration data

Funds and duration: EC H2020, 2015-2018

A coordinated action with a variety of European partners, linked to international efforts to set and promote community standards and infrastructure to report and share cell migration data.


All my publications are here.

One selected publication, showing the diversity of the work we do:

Standardizing data. Nature Nanotechnology.