Dr Philippe Rocca-Serra describes importance of ISA tools to life sciences

The open source ISA framework and tools help to manage an increasingly diverse set of life science, environmental and biomedical experiments in which one technology or a combination of technologies is employed.

Built around the 'Investigation' (project context), 'Study' (unit of research) and 'Assay' (analytical measurement) data model and serialisations (tabular, JSON and RDF), the ISA framework helps researchers provide rich descriptions of experimental metadata (i.e. sample characteristics, technology and measurement types, sample-to-data relationships) so that the resulting data and discoveries are reproducible and reusable.
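
As a rough illustration of the Investigation/Study/Assay hierarchy described above, the Python sketch below nests the three levels and writes them out as JSON. The class names, field names and sample values are invented for illustration only; they are assumptions and do not reproduce the official ISA-JSON schema or the API of the ISA tools.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class Assay:
    """An analytical measurement (hypothetical, simplified structure)."""
    measurement_type: str
    technology_type: str
    # Sample-to-data relationships: sample name -> data file name.
    sample_to_data: Dict[str, str] = field(default_factory=dict)


@dataclass
class Study:
    """A unit of research holding samples and the assays run on them."""
    identifier: str
    # Sample characteristics: sample name -> {characteristic: value}.
    sample_characteristics: Dict[str, Dict[str, str]] = field(default_factory=dict)
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    """The project context that groups related studies."""
    identifier: str
    title: str
    studies: List[Study] = field(default_factory=list)


def to_json(investigation: Investigation) -> str:
    """Serialise the nested description to a JSON string."""
    return json.dumps(asdict(investigation), indent=2)


# Made-up example values, used only to show the nesting.
assay = Assay(
    measurement_type="metabolite profiling",
    technology_type="mass spectrometry",
    sample_to_data={"sample-1": "sample-1.raw"},
)
study = Study(
    identifier="S1",
    sample_characteristics={"sample-1": {"organism": "Homo sapiens"}},
    assays=[assay],
)
investigation = Investigation(identifier="I1", title="Example project", studies=[study])

serialised = to_json(investigation)
print(serialised)
```

Keeping the sample names as keys in both the study and the assay is one simple way to make the sample-to-data link explicit; the real ISA serialisations are considerably richer than this sketch.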

Dr Philippe Rocca-Serra, the Centre's Senior Research Lecturer and co-lead of the ISA-tools project for ELIXIR UK, recently explained to the EMBL Australia Bioinformatics Resource why ISA is important to life science research.

What is ISA?

ISA is both a model and a format to support the description of experimental studies in the field of biology.

Why does it matter for those of us in life science research?

It matters since it is part of the answer to the reproducibility problem, which has hit the headlines.

Awareness of data formats and specifications helps to create data management plans, which are key to moving data management from a retrospective activity to a prospective one.

Who is it for?

It is for scientists who need to look after their data, for students who need to learn about managing their digital output, for librarians and data managers who handle digital assets, and for bioinformaticians.

Who is using it?

Data repositories and publishers now rely on this format. At EMBL-EBI (the European Bioinformatics Institute), the MetaboLights repository uses the format to collect study descriptions for metabolomics studies. It is also at the core of the Horizon 2020 PhenoMeNal project. Publishers such as Nature Publishing Group and BioMed Central have chosen ISA to back new titles such as Scientific Data and GigaScience.

The Harvard Stem Cell Discovery Engine has chosen ISA for managing datasets because of its distinctive ability to support multiple data acquisition modalities. This feature sets it apart from many formats, which are often tied to a data silo. At the same time, ISA maintains a mapping process to link to other formats, allowing conversion and possible deposition of datasets to those public repositories.

How is it relevant to bioinformatics?

Genomics and metabolomics techniques are becoming increasingly important to clinical applications. This is also true in translational research, biotechnology and the food industry, which are key areas for countries such as Australia. ISA has also been extended by a number of groups. One is caNanoLab, a data sharing portal designed to facilitate information sharing across the international biomedical nanotechnology research community, to expedite and validate the use of nanotechnology in biomedicine. Another is MIAPPE, which is devising a Minimum Information document listing the attributes that might be necessary to fully describe a plant phenotyping experiment. It is important for researchers to be aware of such initiatives.

How do we get involved?

ISA forum (isaforum@googlegroups.com) – email for help and support

User community – join the discussion

ISA GitHub repository – contribute code and find the code repositories for the tools and specifications, as well as the Docker containers for the micro-services being developed for Galaxy integration under H2020 PhenoMeNal.

More information

The European Bioinformatics Institute (EMBL-EBI) is an academic research institute based in the UK and part of the European Molecular Biology Laboratory (EMBL). Established in 1994, EMBL-EBI grew out of EMBL's commitment to making biological data and information accessible to life scientists in all disciplines.

EMBL Australia Bioinformatics Resource (EMBL-ABR) is a distributed national research infrastructure providing bioinformatics support to life science researchers in Australia. It was set up as a collaboration with the EMBL-EBI to maximise Australia's bioinformatics capability.

Dr Rocca-Serra received a PhD in Molecular Biology from the University of Bordeaux and moved into bioinformatics upon joining the Microarray Informatics Team at the EMBL-EBI, Cambridge. There, while working to establish ArrayExpress, he became an active member of several standardisation efforts aimed at promoting the vision for open data and open science. As part of several EU projects in toxicogenomics and nutrigenomics, he coordinated the development of the ISA project, which now continues at the e-Research Centre under Dr Susanna-Assunta Sansone.