Bioinformatics

Leader: Emmanuel Barillot

Keywords: tumour, DNA chips, molecular profiles, classification, statistical analysis, databases, cell phenotyping, high throughput sequencing

The Bioinformatics platform plays a two-fold role. On the one hand, we integrate the data generated by the Institut Curie's Biotechnology platforms: the genome, transcriptome and proteome array platforms, the mass spectrometry proteomics platform, the large-scale sequencing platform and the cell phenotyping platform. To this end, we develop and manage the databases, tools and interfaces required for data integration. On the other hand, we provide collaborative bioinformatics and biostatistics data analysis support for the projects of our biologist and clinician colleagues.

The Institut Curie Bioinformatics platform is located on the Paris campus, in the Developmental Biology and Cancer building. Our work relies on a substantial IT infrastructure, managed by the Institut Curie's Systems team (Jean-Gabriel Dick and Camille Barette). It comprises a 50-terabyte SAN storage system, Sun Opteron 8-processor servers (2 based on dual-core processors with 32 GB RAM and 2 based on quad-core processors with 256 GB RAM), along with workstations with dual quad-core processors and 16 GB RAM, for a total computing power of 400 logical processors.

- BioIT: database development and monitoring (Philippe La Rosa)


The many high-throughput molecular approaches generate unprecedented data flows. These data must be structured and presented in a unified view through an integrative bioinformatics platform. This is the mission of the BioIT group, which is in charge of developing, maintaining, administering and upgrading the databases, processing pipelines and interfaces that make up the platform.
The integration covers both the clinical and biological data generated by the Institute and the vast amounts of related data that are publicly available within the scientific community.

Our browsing and viewing tools provide a global overview of the data collected, thus facilitating the formulation of working hypotheses, a critical step in the transition from data collection to knowledge discovery. These tools are either software solutions available within the scientific community or, when necessary, tools developed by the BioIT group.

We also develop automated data processing pipelines. The systematic nature of this approach facilitates traceability, guarantees homogeneous results and allows analyses to be repeated rapidly. The BioIT group is responsible for developing these pipelines.
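As an illustration of how an automated pipeline can support traceability and repeatability, here is a minimal Python sketch. It is not the platform's actual implementation: the function names (`fingerprint`, `run_pipeline`) and the two toy steps are hypothetical, chosen only to show the idea of logging a digest of each step's input and output.

```python
import hashlib
import json


def fingerprint(data):
    """Return a short, stable SHA-256 digest of a JSON-serialisable value."""
    payload = json.dumps(data, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def run_pipeline(data, steps):
    """Apply each (name, function) step in order and log what went in and out."""
    log = []
    for name, func in steps:
        result = func(data)
        log.append({
            "step": name,
            "input": fingerprint(data),
            "output": fingerprint(result),
        })
        data = result
    return data, log


# Hypothetical processing steps, for illustration only.
def drop_missing(values):
    """Remove missing measurements (represented here as None)."""
    return [v for v in values if v is not None]


def centre(values):
    """Subtract the mean so that the values are centred on zero."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


STEPS = [("drop_missing", drop_missing), ("centre", centre)]

if __name__ == "__main__":
    result, log = run_pipeline([1.0, None, 3.0, 5.0], STEPS)
    print(result)
    for entry in log:
        print(entry)
```

Because every step is a plain function and every intermediate result is fingerprinted, running the same steps on the same input yields an identical log, which makes it easy to check that a repeated analysis really is the same analysis.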

- Biostatistics and data analysis (Philippe Hupé)


This second line of work consists of providing bioinformatics and biostatistics expertise through collaborations with our biologist and clinician colleagues, from the Institute or elsewhere. High-throughput data analysis requires both a command of cutting-edge statistical and bioinformatics tools and concepts, and a thorough understanding of the biological and clinical questions to be resolved.
The analysis is conducted at the request of our colleagues, in close collaboration with them, and must start with the definition of the experimental design. Once the data have been generated, the first step consists of quality control and extraction of the biological signal, frequently referred to as normalization. At this stage, it may be necessary to define ad hoc corrective models, and the maturity of the experiment is established. Next comes an exploratory analysis phase, with no prior hypotheses, in which the main message of the experiment is sought, for example the biological pathways involved. This stage may lead to the formulation of hypotheses, the identification of experimental perspectives, or the definition of new experiments. Following the exploratory phase, an analysis is carried out to answer the clinical or biological question posed, for example the comparison of two tumour types or the creation of methods capable of predicting the occurrence of metastases.
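The normalization and comparison steps described above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: median-centring is used as a deliberately simple stand-in for normalization, Welch's t statistic stands in for the comparison of two tumour types, and the function names, labels and measurements are all hypothetical. Real analyses rely on corrective models adapted to each technology.

```python
from math import sqrt
from statistics import mean, median, variance


def median_centre(sample):
    """Subtract the sample's median so that samples share a common baseline."""
    m = median(sample)
    return [x - m for x in sample]


def welch_t(group_a, group_b):
    """Welch's t statistic for two groups of measurements (unequal variances)."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    if se == 0:
        # No variation in either group: report no evidence of a difference.
        return 0.0
    return (mean(group_a) - mean(group_b)) / se


def compare_groups(samples, labels, group_a, group_b):
    """Normalise each sample, then compute one t statistic per feature."""
    normalised = [median_centre(s) for s in samples]
    stats = []
    for j in range(len(normalised[0])):
        a = [s[j] for s, lab in zip(normalised, labels) if lab == group_a]
        b = [s[j] for s, lab in zip(normalised, labels) if lab == group_b]
        stats.append(welch_t(a, b))
    return stats


# Made-up measurements: 6 samples (rows) x 5 features (columns).
SAMPLES = [
    [8.0, 3.0, 2.0, 4.0, 1.0],
    [9.5, 4.1, 3.0, 5.2, 2.0],
    [8.6, 3.4, 2.5, 4.3, 1.6],
    [3.5, 3.0, 2.0, 4.0, 1.0],
    [4.4, 4.0, 3.1, 5.0, 2.2],
    [3.8, 3.5, 2.4, 4.6, 1.5],
]
LABELS = ["type_A", "type_A", "type_A", "type_B", "type_B", "type_B"]

if __name__ == "__main__":
    for j, t in enumerate(compare_groups(SAMPLES, LABELS, "type_A", "type_B")):
        print(f"feature {j}: t = {t:.2f}")
```

In this made-up data set only the first feature differs markedly between the two groups, so it receives by far the largest statistic; the other features stay close to zero once each sample has been centred.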

- Pheno-informatics (Alexandre Hamburger)


Many now-standard technologies (micro-arrays, two-hybrid, MS-MS, etc.) have led to the generation of large volumes of data related to cell components (genes, proteins, RNA, etc.) and their interactions. More recently, major breakthroughs in image analysis and robotics have made it possible to observe the cell as a global entity presenting a "phenotype", rather than as a collection of individual elements.
Pheno-informatics concerns the acquisition, manipulation and analysis of such data: the behaviour of a cell, or population of cells, is quantified according to its type (cell line), the perturbations applied and the experimental context.
The resulting data may be used as an additional data source, enriching and complementing existing models, or as an autonomous source that could significantly enhance our understanding of cell behaviour. Many applications can be imagined, both for the development of biological knowledge and from a therapeutic standpoint.
In all cases, this new data type, radically different from standard ones, requires adapted analyses capable of making full use of the data and of intelligently managing their inherent complexity.

- Large-scale sequencing data analysis (Emmanuel Barillot)

The new sequencing technologies (454, Solexa, SOLiD) provide the ability to sequence DNA at an unprecedented rate of up to 10 gigabases per week. The Institut Curie recently acquired a SOLiD sequencer, now used for studies involving the sequencing of complete genomes, genetic mutations and transcripts (mRNA and small RNA), or the mapping of genomic rearrangements, protein-DNA binding sites, histone modifications, etc. For each experiment, this technology can produce in excess of 100 million sequences of 35 to 50 bases. It requires new tools for managing the large volumes of data, along with adapted analytical strategies and methods. Within this line of work, we collaborate with the SOLiD platform team and its biologist users to define projects, devise bioinformatics and biostatistical solutions and conduct the data analyses.
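To give a sense of the data volumes involved, the figures quoted above can be turned into a quick back-of-the-envelope calculation. The short Python sketch below takes 100 million reads as a lower bound (the text says "in excess of") and uses the two quoted read lengths; the constant and function names are illustrative only.

```python
READS_PER_EXPERIMENT = 100_000_000  # lower bound quoted in the text
READ_LENGTHS = (35, 50)             # bases per read, as quoted in the text


def bases_per_experiment(reads, read_length):
    """Total number of sequenced bases for one experiment."""
    return reads * read_length


def to_gigabases(bases):
    """Convert a number of bases to gigabases (1 gigabase = 10**9 bases)."""
    return bases / 1e9


if __name__ == "__main__":
    for length in READ_LENGTHS:
        total = bases_per_experiment(READS_PER_EXPERIMENT, length)
        print(f"{length}-base reads: at least {to_gigabases(total)} gigabases")
    # 35-base reads: at least 3.5 gigabases
    # 50-base reads: at least 5.0 gigabases
```

A single experiment therefore yields at least 3.5 to 5 gigabases of raw sequence, before quality values, alignments or other derived files are taken into account.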

- Key publications

2008

    * Volpe E, Servant N, Zollinger R, Bogiatzi SI, Hupé P, Barillot E, Soumelis V
      A critical function for transforming growth factor-beta, interleukin 23 and proinflammatory cytokines in driving and modulating human T(H)-17 responses
      Nat Immunol. Jun, 9(6):650-7

2007

    * P.Poullet, S.Carpentier, E.Barillot
      myProMS, a web server for management and validation of mass spectrometry-based proteomic data
      Proteomics, Aug;7(15):2553-6
    * Ph. Hupé, Ph. La Rosa, S. Liva, S. Lair, N. Servant, E. Barillot
      ACTuDB, a new database for the integrated analysis of array-CGH and clinical data for tumors
      Oncogene, Oct. 11;26(46):6641-52

2006

    * La Rosa P, Viara E, Hupe P, Pierron G, Liva S, Neuvial P, Brito I, Lair S, Servant N, Robine N, Manie E, Brennetot C, Janoueix-Lerosey I, Raynal V, Gruel N, Rouveirol C, Stransky N, Stern MH, Delattre O, Aurias A, Radvanyi F, Barillot E.
      VAMP: visualization and analysis of array-CGH, transcriptome and other molecular profiles
      Bioinformatics, Sep 1;22(17):2066-73
    * A. Elfilali, S. Lair, C. Verbeke, Ph. La Rosa, F. Radvanyi and E. Barillot
      ITTACA: a new database for integrated tumor transcriptome array and clinical data analysis
      Nucleic Acids Research, Jan 1;34
 


Institut Curie
14/09/2010