abstracts book - Campus des laboratoires de Villejuif

TRANSCRIPTOME 2000
From Functional Genomics to Systems Biology
Paris, 6-9 November 2000 - Pasteur Institute
De la Génomique fonctionnelle à la Biologie des systèmes
Paris, 6-9 Novembre 2000 - Institut Pasteur
ABSTRACTS / RESUMES
Organizing Committee / Comité d'organisation
Charles Auffray (CNRS, France)
Bento Soares (University of Iowa, USA)
Sumio Sugano (University of Tokyo, Japan)
TRANSCRIPTOME 2000
FROM FUNCTIONAL GENOMICS TO SYSTEMS BIOLOGY
(DE LA GENOMIQUE FONCTIONNELLE A LA BIOLOGIE DES SYSTEMES)
NOVEMBER 6-9, 2000 - INSTITUT PASTEUR, PARIS, FRANCE
TABLE OF CONTENTS
General Information
Introduction
Scientific Program
Speakers Abstracts
Index
GENERAL INFORMATION (INFORMATION GENERALE)
SCIENTIFIC COMMITTEE (COMITE SCIENTIFIQUE)
Charles Auffray, Genexpress, CNRS, Villejuif, France
Bento Soares, University of Iowa, Iowa City, USA
Sumio Sugano, University of Tokyo, Tokyo, Japan
LOCAL ORGANIZING COMMITTEE (COMITE D’ORGANISATION LOCAL)
Genexpress, CNRS, Villejuif, France
Institut Pasteur, Paris, France
General Secretariat (Secrétariat Général)
Odile Brasier
Flavie Brocher
Bénédicte Ecoutin
Web site and Poster (Site web et affiche)
Bertrand Bed’hom
Scientific and Social Program (Programme scientifique et social)
Charles Auffray
Sylvie Bortoli
Charles Decraene
Nicole Adeline Fayein
Sandrine Imbeaud
Betina Porcel-Setterblad
Béatrice Soury-Ségurens
Patrick Zaborski
Rima Zoorob
Philippe Glaser
ACKNOWLEDGMENTS (REMERCIEMENTS)
The organizers of the conference gratefully acknowledge the following
institutions, charities and companies for their support:
Les organisateurs de la conférence remercient vivement les institutions, associations et
sociétés suivantes pour leur soutien :
For their patronage (Pour leur patronage)
Le Ministre de la Recherche
L’Académie des Sciences
The European Union
For their support (Pour leur soutien)
Amersham Pharmacia Biotech
Association Française contre les Myopathies
Biospace Instruments et Mesures
Centre National de la Recherche Scientifique
Compaq
Department of Energy
InforMax
Institut National de la Santé et de la Recherche Médicale
Institut Pasteur
Ligue Nationale contre le Cancer
Ministère de l’Economie, des Finances, et de l’Industrie
Ministère de la Recherche
National Cancer Institute
Novartis
CHECK-IN (ENREGISTREMENT)
During the scientific sessions the conference registration desk will be located in the hall of the
C.I.S. (Centre d’Information Scientifique) and staff will be available to assist you with any
requests or questions during session hours.
Pendant les séances scientifiques, le bureau d’enregistrement sera situé dans le hall du
C.I.S. (Centre d’Information Scientifique) et le personnel sera disponible pour répondre à vos
questions durant les sessions.
SESSIONS (SEANCES)
All sessions will be held in the C.I.S. Auditorium. A name badge is required for all sessions.
Toutes les séances se tiendront dans l’Auditorium du C.I.S. Un badge nominatif sera
indispensable pour assister aux sessions.
EXHIBITS (STANDS)
All booths will be located close to the C.I.S. Auditorium.
Les stands seront situés près de l’Auditorium du C.I.S.
MEALS (REPAS)
As shown on the program schedule, lunches and coffee breaks are included with registration
for conference participants, as is the « Wine and Cheese » cocktail on Monday,
November 6. Other dinners are not included, but the schedule allows ample time to have
meals on your own. A Gala Evening will be held on Tuesday, November 7, aboard « Le Grand
Pavois » for a dinner cruise, an unforgettable experience in the heart of Paris. Advance
registration and additional payment were required for the gala evening, and gala evening tickets
are distributed with conference materials at registration check-in. Transportation for gala evening
registrants will leave the Institut Pasteur promptly after the last talk on Tuesday
evening.
Due to space limitations, we are unable to accommodate guests for lunches or coffee breaks.
Guests are not permitted access to the scientific sessions.
Comme indiqué dans le programme, les déjeuners et les pauses-café sont compris dans le
montant des droits d’inscription, ainsi que le cocktail d’inauguration du lundi 6 novembre. Les
autres dîners ne sont pas compris, mais le programme laisse le temps nécessaire pour dîner
dans Paris selon le choix de chacun. Une soirée de gala est prévue le mardi 7 novembre à
bord du « Grand Pavois » pour un dîner-croisière, un moment inoubliable au cœur de Paris.
Les droits d’inscription de ce dîner ont été acquittés avant le congrès, et les tickets remis
avec les autres documents lors de l’enregistrement. Le transport sera organisé pour la soirée
de gala à 19h15 précises devant l’Institut Pasteur le mardi soir.
En raison du nombre limité de places, nous ne sommes pas en mesure de prendre en
charge les accompagnants lors des déjeuners ou pauses-café. Les accompagnants n’ont
pas accès aux séances scientifiques.
INTRODUCTION
TRANSCRIPTOME 2000 is part of a series of conferences and coordination workshops
initiated by the founders of the IMAGE Consortium (Charles Auffray, Greg Lennon, Mihael
Polymeropoulos, Bento Soares) with active support from the DOE (Marvin Stodolsky) and
numerous public and private organizations. TRANSCRIPTOME 2000 follows the conference
organized last year in Japan by Nobuo Nomura and Michio Oishi at the Kazusa DNA
Research Institute.
Speakers from all over the world will discuss and debate the most recent advances in the
emerging field of functional genomics, the study of biological systems based on global
knowledge of genomes, transcriptomes and proteomes.
Transcription of DNA into RNA followed by translation of messenger RNA into proteins are
the fundamental mechanisms underlying the functioning of living organisms. The discovery of
reverse transcription of mRNA into DNA allowed the development of cDNA cloning, one of
the fundamental techniques of genetic engineering described for the first time 25 years ago.
Some of the pioneers who contributed to the elucidation of these mechanisms will present a
historical overview of this great endeavour and their vision of the future.
***
TRANSCRIPTOME 2000 s'inscrit dans la série de conférences et de réunions de
coordination amorcée par les fondateurs du Consortium IMAGE (Charles Auffray, Greg
Lennon, Mihael Polymeropoulos, Bento Soares) avec le soutien actif du Department of
Energy américain (Marvin Stodolsky) et de nombreuses organisations publiques et privées.
TRANSCRIPTOME 2000 fait suite à la conférence organisée l’an dernier au Japon par
Nobuo Nomura et Michio Oishi au Kazusa DNA Research Institute.
Lors de TRANSCRIPTOME 2000, des conférenciers venus du monde entier discuteront et
débattront des avancées les plus récentes du champ émergent de la génomique
fonctionnelle, l'étude des systèmes biologiques fondée sur la connaissance globale des
génomes, des transcriptomes et des protéomes.
La transcription de l'ADN en ARN, suivie de la traduction des ARN messagers en protéines
sont les mécanismes fondamentaux qui sous-tendent le fonctionnement des êtres vivants. La
découverte de la rétro transcription d'ARN en ADN a permis le développement du clonage
d'ADNc, l'une des techniques fondamentales du génie génétique décrite pour la première fois
il y a 25 ans. Certains des pionniers qui ont contribué à élucider ces mécanismes vous
présenteront une rétrospective historique de cette grande aventure et leur vision du futur.
Charles Auffray, Genexpress, CNRS, Villejuif, France
Bento Soares, Université d’Iowa, Iowa City, USA
Sumio Sugano, Université de Tokyo, Tokyo, Japon
SCIENTIFIC PROGRAM (PROGRAMME SCIENTIFIQUE)
MONDAY, NOVEMBER 6, 2000
2:00 - 5:00 pm
(14:00 - 17:00)
LUNDI 6 NOVEMBRE 2000
Reception and Registration of Participants
(Accueil et inscription des participants)
Hall of C.I.S. Auditorium
(Auditorium, Hall du C.I.S.)
5:00 - 7:00 pm
(17:00 - 19:00)
Opening Session
(Séance d'ouverture)
5:00 - 5:30 pm
(17:00 - 17:30)
Welcome Addresses
(Allocutions de bienvenue)
Philippe Kourilsky, Director General of Institut Pasteur, Paris, France
Francis Galibert, on Behalf of the Director of the Life Sciences Department (représentant la
Directrice du Département des Sciences de la Vie), CNRS, Paris, France
Hervé Chneiweiss, on Behalf of the French Minister of Research (représentant le Ministre
de la Recherche), Paris, France
5:30 - 7:00 pm
(17:30 - 19:00)
Session 1 : 25 Years of cDNA Research
(1ere séance: 25 ans de recherche avec les ADNc)
Chairperson (Modérateur) : Federico Mayor, Fundacion para una Cultura de Paz, Madrid,
Spain
5:30 - 6:00 pm
(17:30 - 18:00)
The Human Genome, Health and Bioethics
(Le génome humain, la santé et la bioéthique)
Federico Mayor, Fundacion para una Cultura de Paz, Madrid, Spain
6:00 - 6:30 pm
(18:00 - 18:30)
From the « Messenger » Saga to the Transcriptome Era
(De la saga du « messager » à l’ère du transcriptome)
François Gros, Académie des Sciences, Paris, France
6:30 - 7:00 pm
(18:30 - 19:00)
TRANSCRIPTOME 2000 : From Functional Genomics to Systems Biology
(TRANSCRIPTOME 2000 : De la génomique fonctionnelle à la biologie des systèmes)
Charles Auffray, Genexpress, CNRS, Villejuif, France
7:00 - 8:30 pm
(19:00 - 20:30)
Welcome Cocktail
(Cocktail de bienvenue)
TUESDAY, NOVEMBER 7, 2000
8:30 am - 1:00 pm
(8:30 - 13:00)
MARDI 7 NOVEMBRE 2000
Session 2 : cDNA Cloning and Sequencing
(2e séance : Clonage et séquençage d’ADNc)
Chairpersons (Modérateurs) : Bento Soares, University of Iowa, Iowa City, USA,
and Sumio Sugano, University of Tokyo, Tokyo, Japan
8:30 - 9:00 am
Novel Approaches for Gene Discovery and Selection of Full-Length cDNAs for the
Mammalian Gene Collection Program (MGC)
(Nouvelles approches pour la découverte de gènes et la sélection d’ADNc complets
pour le programme « collection de gènes mammaliens » MGC)
Maria de Fatima Bonaldo, University of Iowa, Iowa City, IA, USA
9:00 - 9:30 am
Analysis of Newly Identified Human cDNAs encoding Large Proteins : Integration of
the Genomic and cDNA Sequence Data to Move Beyond the Identification of
Transcribed Sequences
(Analyse d’ADNc humains nouvellement identifiés encodant de grandes protéines :
Intégration des données génomiques et des séquences d’ADNc pour dépasser le
stade de l’identification des séquences transcrites)
Osamu Ohara, Kazusa DNA Research Institute, Kisarazu, Japan
9:30 - 10:00 am
Over 2.4 Million Expressed Sequence Tags (ESTs) and Counting…
(Plus de 2,4 millions d’étiquettes de séquences exprimées, comptage en cours…)
Sandra Clifton, Washington University, St Louis, MO, USA
10:00 - 10:30 am
Sequencing and Analysis of Full-Length cDNAs in the German cDNA Network
(Séquençage et analyse d’ADNc complets dans le réseau allemand)
Stefan Wiemann, DKFZ, Heidelberg, Germany
10:30 - 11:00 am
Coffee Break
(Pause café)
11:00 - 11:30 am
NEDO cDNA Sequencing Project
(Le projet NEDO de séquençage d’ADNc)
Sumio Sugano, University of Tokyo, Tokyo, Japan
11:30 - 12:00 pm
From Leukemia Patient to Full-Length cDNA Sequence
(Du patient leucémique à la séquence complète d’ADNc)
Judy Margolin, Baylor College of Medicine, Houston, TX, USA
12:00 - 12:30 pm
RIKEN Mouse cDNA Encyclopedia Project
(Le projet d’encyclopédie des ADNc de Souris au RIKEN)
Jun Kawai, RIKEN, Tsukuba, Japan
12:30 - 1:00 pm
(12:30 - 13:00)
Shotgun Sequencing the Human Transcriptome with Open Reading Frame ESTs
(ORESTES)
(Séquençage par mitraillage du transcriptome humain à l’aide d’étiquettes des
cadres ouverts de lecture de séquences exprimées)
Andrew Simpson, Ludwig Institute for Cancer Research, Sao Paulo, Brazil
1:00 - 2:30 pm
(13:00 - 14:30)
Lunch buffet
(Déjeuner buffet)
2:30 - 6:30 pm
(14:30 - 18:30)
Session 3 : cDNA Clustering and Genome Annotation
(3e Séance : Regroupement d’ADNc et annotation du génome)
Chairpersons (Modérateurs) : Winston Hide, SANBI, Cape Town, South Africa
and Doron Lancet, Weizmann Institute, Rehovot, Israel
2:30 - 3:00 pm
(14:30 - 15:00)
Clustering Enriches the I.M.A.G.E. Collection
(Le regroupement enrichit la collection I.M.A.G.E)
Peg Folta, Department of Energy, Livermore, CA, USA
3:00 - 3:30 pm
(15:00 - 15:30)
An Alternate Transcription Map of Chromosome 22 Based on Verified Transcript
Variation
(Une carte transcriptionnelle alternative du chromosome 22 fondée sur la
vérification de la variabilité des transcrits)
Winston Hide, SANBI, Cape Town, South Africa
3:30 - 4:00 pm
(15:30 - 16:00)
UniGene, the Genome, and the Transcriptome
(UniGene, le génome et le transcriptome)
Lukas Wagner, NCBI, Bethesda, MD, USA
4:00 - 4:30 pm
(16:00 - 16:30)
The TIGR Gene Indices : Reconstruction and Annotation of Transcribed Sequences
(Les index de TIGR : Reconstruction et annotation de séquences transcrites)
John Quackenbush, TIGR, Gaithersburg, MD, USA
4:30 - 5:00 pm
(16:30 - 17:00)
Coffee Break
(Pause café)
5:00 - 5:30 pm
(17:00 - 17:30)
Searching for the Protein Coding Genes on the Human Genome Sequence
(Recherche des gènes encodant des protéines dans la séquence du génome
humain)
William Saurin, Génoscope, Evry, France
5:30 - 6:00 pm
(17:30 - 18:00)
Harvesting the Human Genome : A World-Wide Endeavor
(Moissonner le génome humain : Un effort international)
Doron Lancet, Weizmann Institute, Rehovot, Israel
6:00 - 6:30 pm
(18:00 - 18:30)
Reconstructing the Human Transcriptome from the 3' End
(Reconstruction du transcriptome humain à partir de l'extrémité 3’)
Philipp Bucher, Swiss Institute of Bioinformatics, Lausanne, Switzerland
8:30 pm - 12:00 am
(20:30 - minuit)
Gala Evening
(Soirée de Gala)
WEDNESDAY, NOVEMBER 8, 2000
8:30 am - 1:00 pm
(8:30 - 13:00)
MERCREDI 8 NOVEMBRE 2000
Session 4 : Transcriptome Analysis
(4e séance : L’analyse de transcriptomes)
Chairpersons (Modérateurs) : Bertrand Jordan, Génopole, CNRS/INSERM, Marseille,
France and Roger Bumgarner, University of Washington, Seattle, USA
8:30 - 9:00 am
Gene Expression Profiling of Primary Breast Carcinomas Using Nylon Arrays of
Candidate Genes
(Profilage de l'expression génique de carcinomes primaires du sein en utilisant des
réseaux de gènes candidats sur Nylon)
Catherine Nguyen, Génopole, CNRS/INSERM, Marseille, France
9:00 - 9:30 am
Statistical Analysis, Normalization and Reproducibility of Microarray Data
(Analyse statistique, normalisation et reproductibilité des données collectées avec
des microréseaux)
Roger Bumgarner, University of Washington, Seattle, WA, USA
9:30 - 10:00 am
DNA Array Applications in a Diverse Academic Setup
(Applications des réseaux d’ADN dans un environnement académique divers)
Shirley Horn-Saban, Weizmann Institute, Rehovot, Israel
10:00 - 10:30 am
Explaining Gene Expression Clusters Through Integration of Genome Annotation
and Microarray Data
(Explication des regroupements de gènes exprimés par l'intégration de l'annotation
du génome avec les données collectées avec des microréseaux)
Terry Gaasterland, Rockefeller University, New York, NY, USA
10:30 - 11:00 am
Coffee Break
(Pause café)
11:00 - 11:30 am
Exploring Human Transcriptomes Using cDNA Macro and Microarray Technologies
(Exploration de transcriptomes humains à l’aide des technologies de macro et de
microréseaux d’ADNc)
Sandrine Imbeaud, Genexpress, CNRS, Villejuif, France
11:30 - 12:00 pm
Analysis of Gene Expression in Xenopus Embryos Identifies Metabolic Pathways,
Predicts Gene Function and Provides a Global View of Embryonic Patterning
(L’analyse de l’expression des gènes chez les embryons de Xénope identifie des
voies métaboliques, prédit la fonction des gènes et fournit une vue globale de la
structuration de l’embryon)
Nicolas Pollet, DKFZ, Heidelberg, Germany
12:00 - 12:30 pm
Adapter-Tagged Competitive PCR and its Application to the Mammalian Central
Nervous System
(L’amplification par PCR compétitive à l’aide d’adapteurs étiquetés et son
application à l’étude du système nerveux central mammalien)
Kikuya Kato, Nara Institute of Science and Technology, Nara, Japan
12:30 - 1:00 pm
(12:30 - 13:00)
Storing, Managing and Analyzing Microarray Data
(Stockage, gestion et analyse des données collectées avec des microréseaux)
Alvis Brazma, European Bioinformatics Institute, Hinxton, UK
1:00 - 2:30 pm
(13:00 - 14:30)
Lunch Buffet
(Déjeuner buffet)
2:30 - 7:00 pm
(14:30 - 19:00)
Session 5 : Transcriptomes, Proteomes and Systems Biology
(5e séance : Transcriptomes, protéomes et biologie des systèmes)
Chairperson (Modérateur) : Margaret Buckingham, Institut Pasteur, Paris, France
2:30 - 3:00 pm
(14:30 - 15:00)
Analysis of Genomes and Transcriptomes in Terms of the Occurrence of
Protein Parts and Features
(Analyse de génomes et de transcriptomes en terme d’occurrence des
caractéristiques et des éléments des protéines)
Mark Gerstein, Yale University, New Haven, CT, USA
3:00 - 3:30 pm
(15:00 - 15:30)
Bridging Genomics with Proteomics: DNA and Protein Analysis on Arrays
(Relier la génomique à la protéomique : analyse d’ADN et de protéines sur des
réseaux)
Holger Eickhoff, Max-Planck Institute, Berlin, Germany
3:30 - 4:00 pm
(15:30 - 16:00)
Proteomics and the Challenge of Hydrophobic Membrane Proteins : the Example of
Chloroplast Membranes
(La protéomique et le défi des protéines de membrane hydrophobes : l’exemple des
membranes des chloroplastes)
Jacques Joyard, Génopole, CEA, Grenoble, France
4:00 - 4:30 pm
(16:00 - 16:30)
Tools for Functional Genomics Using Transcript Profiles and Proteomics
(Outils pour la génomique fonctionnelle utilisant les profils d’expression et la
protéomique)
Joakim Lundeberg, Royal Institute of Technology, Stockholm, Sweden
4:30 - 5:00 pm
(16:30 - 17:00)
Coffee Break
(Pause café)
5:00 - 5:30 pm
(17:00 - 17:30)
Proteomic Strategies in Cancer
(Stratégies protéomiques en cancérologie)
Julio Celis, University of Aarhus, Denmark
5:30 - 6:00 pm
(17:30 - 18:00)
Proteomics Databases
(Bases de données protéomiques)
Amos Bairoch, Swiss Institute of Bioinformatics, Geneva, Switzerland
6:00 - 6:30 pm
(18:00 - 18:30)
The Physiome Project : Integrating from Genomics to Function or vice versa
(Le projet physiome : Intégration de la génomique à la fonction ou vice versa)
James B. Bassingthwaighte, University of Washington, Seattle, WA, USA
6:30 - 7:00 pm
(18:30 - 19:00)
DOE Genome Scale Expression Efforts
(Les projets d’étude d’expression à l’échelle du génome au DOE)
Marvin Stodolsky, Department of Energy, Germantown, MD, USA
THURSDAY, NOVEMBER 9, 2000
8:30 am - 1:00 pm
JEUDI 9 NOVEMBRE 2000
Session 6 : Applications in Biology, Biotechnology and Medicine
(6e séance : Applications en biologie, biotechnologie et médecine)
Chairpersons (Modérateurs) : Michel Caboche, Génoplante, INRA, Evry, France
and Greg Lennon, VeraGene, Potomac, MD, USA
8:30 - 9:00 am
Production and Quality Assessment of Full-Length-Enriched cDNA Libraries and
Their Use in Transcriptome Profiling Using Microarrays
(Production et contrôle-qualité de banques d’ADNc enrichies en clones complets et
leur utilisation pour le profilage de transcriptomes avec des microréseaux)
Claudio Schneider, Laboratorio Nazionale CIB, Trieste, Italy
9:00 - 9:30 am
Gene Expression Profiling of 3 Solid Tumors
(Profilage de l'expression génique de 3 tumeurs solides)
Annemarie Poustka, DKFZ, Heidelberg, Germany
9:30 - 10:00 am
The Cancer Gene Anatomy Program (CGAP) and the Mammalian Gene Collection
(MGC) : cDNA Resources for the Community
(Le programme « anatomie des gènes et cancer » (CGAP) et la collection de gènes
mammaliens (MGC) : Ressources d'ADNc pour la communauté)
Robert Strausberg, National Cancer Institute, Bethesda, MD, USA
10:00 - 10:30 am
Expression Profiling in Patient Tissue for Insights into Etiology and Pathophysiology
of Progressive Disease
(Le profilage d'expression dans les tissus de patients pour obtenir un aperçu de
l'étiologie et de la physiopathologie des maladies progressives)
Eric Hoffman, George Washington University, Washington, DC, USA
10:30 - 11:00 am
Coffee Break
(Pause café)
11:00 - 11:30 am
Molecular Immunology 2000: Transcriptional Profiling of Regulatory Vα24JαQ T
cells from Identical Twins Discordant for Type I Diabetes, and a New Mechanism for
Regulation of the Immune Response
(Immunologie moléculaire 2000: profilage transcriptionnel de cellules T régulatrices
Vα24JαQ de jumeaux identiques discordants pour le diabète de type 1, et un
nouveau mécanisme pour la régulation de la réponse immune)
Jack Strominger, Harvard University, Cambridge, MA, USA
11:30 - 12:00 pm
A new highly sensitive microarray approach for differential screening using
radioactive probes
(Une nouvelle approche hautement sensible utilisant des microréseaux pour le
criblage différentiel à l’aide de sondes radioactives)
Jacques Mallet, CNRS, Paris, France
12:00 - 12:30 pm
High Throughput SNP scoring using Rolling Circle Amplification
(Enregistrement de SNP à haut débit en utilisant l'amplification par cercle
déroulant)
Tony Smith, Amersham Pharmacia Biotech, Amersham, Buckinghamshire, UK
12:30 - 1:00 pm
(12:30 - 13:00)
Populations, SNPs and Chips in Common Disease Mapping
(Populations, SNP et biopuces pour la cartographie des maladies communes)
Andres Metspalu, University of Tartu, Tartu, Estonia
1:00 - 2:30 pm
(13:00 - 14:30)
Lunch Buffet
(Déjeuner buffet)
2:30 - 4:30 pm
(14:30 - 16:30)
Session 7 : Future Perspectives
(7e séance : Futures perspectives)
Chairperson (Modérateur) : Zhu Chen, Shanghai University, Shanghai, China
2:30 - 3:00 pm
(14:30 - 15:00)
Gene Identification Projects at TIGEM
(Projets d’identification des gènes au TIGEM)
Giuseppe Borsani, TIGEM, Napoli, Italy
3:00 - 3:30 pm
(15:00 - 15:30)
The Human Genome, Transcriptome Analysis, Medicine and Cancer
(Le génome humain, l'analyse de transcriptomes, la médecine et le cancer)
Gert-Jan van Ommen, Leiden University, Leiden, The Netherlands
3:30 - 4:00 pm
(15:30 - 16:00)
Acute Promyelocytic Leukemia: A Model for Gene Transcriptional Regulation-based Therapy
(Leucémies aiguës promyélocytaires : un modèle de thérapie fondée sur la
régulation transcriptionnelle des gènes)
Zhu Chen, Shanghai University, Shanghai, China
4:00 - 4:30 pm
(16:00 - 16:30)
Transcriptional Regulation of Cell Cycle Regulatory and Apoptosis Genes by DNA
Damage Induced by Camptothecin: Microarray Analysis of Dose- and Time-Dependent Effects
(Régulation transcriptionnelle de gènes impliqués dans la régulation du cycle
cellulaire et l’apoptose sous l’effet de lésions de l’ADN induites par la
camptothécine : Analyse des effets temporels et dose-dépendants à l’aide de
microréseaux)
Yves Pommier, National Cancer Institute, Bethesda, MD, USA
4:30 - 5:00 pm
(16:30 - 17:00)
Coffee Break
(Pause Café)
5:00 - 7:00 pm
(17:00 - 19:00)
Session 8 : Ethical, Legal and Economic Issues
(8e séance : Questions éthiques, légales et économiques)
Chairperson (Modérateur) : Rebecca Eisenberg, University of Michigan, Ann Arbor, MI,
USA
5:00 - 5:30 pm
(17:00 - 17:30)
Patentability of Life and Ethics
(Brevetabilité de la vie et éthique)
Noëlle Lenoir, Conseil Constitutionnel, Paris, France
5:30 - 6:00 pm
(17:30 - 18:00)
Patenting Genome Research Tools and the Law
(Le brevetage des outils de la recherche génomique et le droit)
Rebecca Eisenberg, University of Michigan, Ann Arbor, MI, USA
6:00 - 6:30 pm
(18:00 - 18:30)
Legal Problems Related to Gene Patents
(Problèmes légaux liés aux brevets sur les gènes)
Joseph Straus, MPI for Foreign and International Patent, Munich, Germany
6:30 - 7:00 pm
(18:30 - 19:00)
From Functional Genomics to Integrated Economy in Biotechnology
(De la génomique fonctionnelle à l’intégration économique en biotechnologie)
Bernard Pau, CNRS, Montpellier, France
7:00 - 7:30 pm
(19:00 - 19:30)
Closing Address
(Allocution de clôture)
Geneviève Berger, Director General of CNRS, Paris, France
SPEAKERS ABSTRACTS
MONDAY, NOVEMBER 6, 2000
SESSION 1: 25 YEARS OF cDNA RESEARCH
The Human Genome, Health and Bioethics
Federico Mayor
Fundacion para una Cultura de Paz, Madrid, Spain
Federico Mayor, President of « Fundacion para una cultura de paz » studied Pharmacy in Madrid, graduating
with a Doctorate. He obtained the Chair of Biochemistry at the Faculty of Pharmacy of the University of
Granada, where he was Rector from 1968 to 1972.
In 1974 he co-founded and directed the Centre for Molecular Biology « Severo Ochoa » of the Higher Council
for Scientific Research until he was appointed Deputy Director-General of UNESCO in 1987 and elected
Director General in 1993.
His scientific work includes more than 80 articles on brain metabolism, peri-natal biochemistry and molecular
pathologies of the new-born. He has directed and supervised more than 40 doctoral theses and is a member of
a score of international scientific academies and associations. He has also published three volumes of poetry.
From the « Messenger » Saga to the Transcriptome Era
François Gros
Académie des Sciences, Paris, France
François Gros is Secrétaire perpétuel of the Academy of Sciences and Honorary Professor at both the Collège
de France and the Pasteur Institute in Paris. His scientific work has been devoted to molecular biology. Since
the very beginning of his research, he has been interested in the way genes function and in gene regulation, but
also in protein biosynthesis. He has been awarded the Gold Medal of the Pontifical Academy, and a number of
prizes: Lacassagne Foundation, Charles Leopold Meyer Prize and von Humboldt Foundation.
TRANSCRIPTOME 2000 : From Functional Genomics to Systems Biology
Charles Auffray
Genexpress, CNRS, Villejuif, France
Over the past decade, large-scale systematic sequencing of cDNA libraries has provided an initial description of
the transcriptome, the entire set of gene transcripts of man and several animal and plant organisms. Speakers
will discuss progress in full-length cDNA cloning and quality control in large-scale sequencing programs. They
will also address the challenges of clustering the information collected to help genome annotation at a time
when complete or working-draft genome sequences are becoming available.
Differential hybridization using arrays of cDNA clones is as old as cDNA cloning. Recent advances in materials,
optics, electronics, robotics, chemistry, genetic engineering and informatics have permitted the development of
integrated platforms allowing the parallel study of tens of thousands of transcripts in a variety of normal and
pathological conditions. Speakers will discuss the challenges in quality assessment, formatting, comparing and
validating the large amount of data collected using various platforms, the need for a public repository of cDNA
array and in situ hybridization data, and similar problems that are arising in the study of proteomes, the entire
sets of proteins that govern the functioning of cells, organs and organisms.
The emergence of functional genomics represents a transition from mostly analytical, hypothesis-driven
research to a complementary global, exploratory mode that will ultimately bridge understanding of chemistry and
physiology by integrating knowledge of the fine details of all molecular structures and mechanisms together with
their natural or pathological variations. Speakers will illustrate the impact of this trend in the study of the biology
of the immune, muscular and nervous systems, and that of cancer and cardiovascular diseases.
Advances in genome research provide deeper insight every day into the mechanisms of life, thereby promising
to change our vision of the world and of ourselves, and to speed the understanding and treatment of diseases.
Public outreach programs and intense media coverage are triggering both growing public awareness and
concern. There is a need for both open, universal dissemination of genomics knowledge and for promotion of
innovation through mechanisms ensuring the sustainable development of new diagnostics, drugs and
treatments. International experts will discuss the ethical, legal, and economic issues involved.
TUESDAY, NOVEMBER 7, 2000
SESSION 2: cDNA CLONING AND SEQUENCING
Novel Approaches for Gene Discovery and Selection of Full-Length cDNAs for the Mammalian Gene
Collection Program (MGC)
Maria de Fatima Bonaldo¹, Sergey Malchenko¹, Brian Berger¹, Irina Koroleva¹, Einat Snir¹, Tamara Kucaba¹,
Chad Roberts², Todd Scheetz¹, Tom Casavant² & Marcelo Bento Soares¹,³
Departments of ¹Pediatrics, ²Electrical and Computer Engineering, and ³Physiology and Biophysics, The University
of Iowa, Iowa City, Iowa 52242, USA
In the last three years, we have identified approximately 50,000 unique rat cDNAs/ESTs (“Rat Gene Discovery
and Mapping Program”), 36,000 unique mouse cDNAs (Brain Molecular Anatomy Project), and 25,000 human
cDNAs (Cancer Genome Anatomy Project). Our gene discovery strategy, based on the generation of ESTs from
serially subtracted normalized libraries, has proven most successful and it has enabled us to achieve
unprecedented rates of EST discovery. However, novel technologies are now needed for identification of the
rarest mRNAs, often not represented in conventional cDNA libraries, and for the synthesis, cloning and
selection of complete collections of full-length mammalian cDNAs.
To facilitate discovery of rare mRNAs, with support from the U. S. Department of Energy, we have developed
technology for preferential cloning of mRNAs not represented (or under-represented) in normalized libraries
derived from the same starting RNA population. We have applied this method to construct a mouse
hippocampus cDNA library enriched for rare mRNAs. Preliminary characterization of this library by sequencing
and by microarray hybridization indicates that a significant enrichment has been achieved.
The development of a comprehensive collection of full-length cDNAs by the Mammalian Gene Collection
Program will require not only construction of full-length-enriched libraries but also the development of
technology for (a) identification/selection of full-length cDNAs in full-length-enriched libraries, and (b)
construction of subtracted libraries enriched for novel full-length cDNAs (i.e., those not yet represented in the
Mammalian Gene Collection Program). We will present results of our preliminary attempts (supported by NIH) to
select full-length clones from full-length-enriched libraries and to generate subtracted full-length-enriched
libraries.
Analysis of Newly Identified Human cDNAs encoding Large Proteins : Integration of the Genomic and
cDNA Sequence Data to Move Beyond the Identification of Transcribed Sequences
Osamu Ohara
Department of Human Gene Research, Kazusa DNA Research Institute, Kisarazu, Japan
Over the past six years, we have been studying the protein-coding sequences of unidentified human genes. Our
cDNA analysis is unique in that we have focused our sequencing efforts on large cDNA clones (>4 kb) encoding
large proteins (>50 kDa). This approach has been taken because large cDNAs are not extensively analyzed and
large proteins are often encoded by large cDNAs and frequently involved in various mammalian cellular
processes. For this purpose, we constructed a set of strictly size-selected cDNA libraries which enabled us to
isolate clones with insert sizes of interest on a random sampling basis. These clones were then further selected
according to novel sequences at their 5’ and 3’ ends and by their protein-coding potentials, prior to complete
sequencing. The cDNA sequencing was done by a shotgun method with 5- to 10-fold sequence redundancy.
Our study has concentrated on cDNAs isolated from the human brain and the number of cDNA sequences thus
identified and designated by a systematic gene code containing KIAA plus a four-digit number has reached
2000 to date, with an average size of approximately 5 kb. Since the number of genes encoding large proteins is
expected to be only about 10% of the total number of human genes, the number of KIAA genes in the public
databases (1643 entries, August 2000) is quite significant considering it represents genes expressed only in the
brain.
As the human genome sequencing project enters the last phase in which the draft sequences are finalized, it
has become even more evident that cDNA sequence data will serve an important complementary role for the
interpretation of the sequence of the human genome. Furthermore, the cDNA data offers a variety of information
regarding post-transcriptional events, such as alternative splicing and RNA editing. On the other hand, the
genome sequence can help considerably with the resolution of problems in cDNA technology, most of which
originate from the fact that cDNAs are nothing but artificial copies of mRNAs. Therefore, integration of our cDNA
sequence data with the publicly available genomic sequence data is an urgent and crucial task for us, and will
enable us to move beyond the identification of transcribed sequences.
Over 2.4 Million Expressed Sequence Tags (ESTs) and Counting...
Sandra W. Clifton1, Deana Pape1, Marco Marra2, LaDeana Hillier1, Zhengyan Kan3, Jarrett Glasscock1,
Raymond Yeh1, Warren Gish1, and the Washington University GSC EST Sequencing Group1
1. Washington University School of Medicine, St. Louis, MO, USA
2. British Columbia Cancer Research Centre, Vancouver, British Columbia, Canada
3. Washington University, St. Louis, MO, USA
The Genome Sequencing Center at Washington University School of Medicine has contributed over 2.4 million
single pass sequences to the public databases, as a result of our participation in several expressed sequence
tag (EST) projects. The projects include three human, one mouse, one soybean, two parasitic nematodes, two
SAGE (Serial Analysis of Gene Expression), one moss, one Toxoplasma gondii (including Sarcocystis neurona
and Neospora caninum), and one Eimeria tenella. We also have completed a Genome Sequence Survey
(GSS) project for Leishmania major.
The Human ESTs have been used, in conjunction with the human genome mapping effort, also being performed
at Washington University, as one of the tools to anchor contigs for the human genome fingerprint map. In
addition, many ESTs have been aligned to the nearly completed human genomic sequence, using software
tools developed at Washington University, the Transcript Assembly Program (TAP) and Eugene. High
throughput SNP mining of the human transcriptome has been aided by another automated software pipeline
using the tool POLYBAYES, also developed at Washington University School of Medicine. The mouse ESTs are
being used for fingerprint mapping of the mouse genome.
Zebrafish (Danio rerio) is likely to be one of the next model organisms chosen for genome sequencing. The
zebrafish ESTs being produced at the Genome Sequencing Center (GSC) are already being used, in addition to
other resources, to build a marker dense physical map to provide candidate genes for use in positional cloning
and for ORF recognition in cloning of insertion sites.
Identification of mouse and zebrafish homologs of human genes by EST sequencing will facilitate the functional
analysis of these genes not feasible in humans. The mapping progress and software tools will be discussed.
For further information, see: http://genome.wustl.edu/gsc/esthmpg.html
Sequencing and Analysis of Full-Length cDNAs in the German cDNA Network
Stefan Wiemann1, Bernd Weil, Ruth Wellenreuther, Sabine Krieger, Wilhelm Ansorge, Michael Böcher, Helmut
Blöcker, Helmut Blum, Andreas Düsterhöft, Jürgen Lauber, Andreas Beyer, Karl Köhrer, Christian Gruber,
Hans-Werner Mewes, Brigitte Obermaier, Birgit Ottenwälder, Dagmar Heubner, Rolf Wambutt, Jeremy
Simpson, Rainer Pepperkok, Annemarie Poustka
1Molecular Genome Analysis, German Cancer Research Center, Im Neuenheimer Feld 506, D-69120
Heidelberg, Germany, and the German cDNA Network
We have formed a network within the framework of the German Genome Project aiming at the generation and
sequencing of novel full-length cDNAs, and the comprehensive functional analysis of the deduced proteins. The
project started in September 1997. Over 3,600 cDNAs (> 8.8 Mb) have been sequenced since.
We use the set of fully sequenced clones in combination with the EST-sequenced clones to generate a master
set of full-length clones for employment in subsequent functional analysis. All sequences are first analyzed for
possible protein function in silico. To systematically characterize function of the encoded proteins in vivo, we
initially determine the subcellular localization of the proteins. A progress report of the network activities and the
achievements will be presented.
In future projects cellular assays will be applied to comprehensively unravel the role of most or all proteins in
view of their function(s), the pathways they are involved in and possible disease relations.
NEDO cDNA Sequencing Project
Sumio Sugano
Laboratory of Genome Structure Analysis, Human Genome Center, The Institute of Medical Science, The
University of Tokyo, Japan
cDNA is an important resource for identifying exons or new genes, and expressed sequence tags (ESTs) have been
used extensively to annotate the finished sequences of human chromosomes 21 and 22 and the “draft”
sequence of the human genome. Despite the great usefulness of the EST resource, ESTs have some limitations:
they are essentially partial sequences with limited accuracy. The entire sequences of full-length
cDNAs are useful for determining the exact mRNA start site, the exact pattern of splicing and the entire coding
region. Full-length cDNAs are also an important resource for functional analysis.
The Cap targeted selection procedure of full-length cDNAs developed by us and others improved the content of
full-length cDNA clones within cDNA libraries. These libraries together with the improvement of sequencing
techniques allowed us to start a project of collecting and sequencing putative full-length cDNA clones. Here
we report our results on the first 8000 sequences.
From Leukemia Patient to Full-Length cDNA Sequence
Margolin J, Villalon, D.K., Luna R.A., Tsang Y, Yu W., Bouck J., Wu G., Hale S., Richard Gibbs
Human Genome Sequencing Center and Department of Pediatrics, Baylor College of Medicine, Houston,
Texas, USA
We have constructed a scalable pathway extending from pathologic specimens to full-length sequenced cDNA
clones. This begins with very high-risk pediatric leukemia patients who have required pheresis in order to
debulk their tumor prior to the initiation of chemotherapy. This provides a large (often > 10e10 leukemic cells)
specimen from which mRNA is purified. The Cap-Trapping procedure is used for isolating full-length (FL)
transcripts from this mRNA pool. To avoid the loss of the longer inserts, a lambda cre-lox replacement vector
was used for cloning and EST sequencing. The patients are cared for and the libraries constructed at the Texas
Children’s Cancer Center, Baylor College of Medicine. The sequencing and informatics are performed at the
Baylor College of Medicine-Human Genome Sequencing Center (BCM-HGSC). The EST data are processed
through a series of new informatic tools which include: vector trimming, analysis of sequence quality, BLAST
searches and pipeline redundancy checks. Clones are re-arrayed and sent for concatenated cDNA library
construction and sequencing at the BCM-HGSC. To date 26,280 ESTs and >1,000 FL-cDNA sequences have
been completed. An additional 1300 FL-clones are in sequence assembly and 2,000 FL clones in concatenated
cDNA sequencing (CCS) library construction. Current plans to increase the production pipeline to >1000 FL-cDNA sequences per month are in place. Efforts to further automate EST analysis, selection of clones for CCS,
and pre-processing of CCS reads will be described. These efforts represent a new way to characterize a
disease with high throughput cDNA sequencing, which in addition to the sequence information, results in the
generation of full-length cDNAs arrayed in a manner suitable for future applications.
RIKEN Mouse cDNA Encyclopedia Project
Jun Kawai, Piero Carninci, and Yoshihide Hayashizaki
Genome Exploration Research Group, RIKEN, Genomic Sciences Center (GSC) and Genome Science
Laboratory, RIKEN, Tsukuba Institute, Core Research of Evolutional Science and Technology (CREST), Japan
Science and Technology Corporation (JST), Tsukuba-shi, Ibaraki, 305-0074, Japan
RIKEN is carrying out the mouse encyclopedia project, which consists of three phases: (1) collection of full-length cDNAs, (2) sequencing them, and (3) mapping them on the chromosomes. Full-length cDNA allows all
subsequent functional characterization such as prediction of protein and function, protein-protein interaction,
structural investigation, expression analysis, protein expression, etc.
Gene discovery based on one-pass sequencing of Cap-trapper full-length cDNA libraries has been very demanding
over the years. In fact, short cDNAs are cloned much more efficiently than long cDNAs. Cloning
rarely expressed genes and genes specifically expressed in restricted tissues constituted a technical challenge.
To address these problems, we have developed techniques for the construction of large insert size,
normalized/subtracted full-length cDNA libraries even from microdissected tissues that do not involve the use of
PCR. We estimate that our clones cover about 60% of mouse genes. At the moment, more than 90% of
Unigene clones contain a sequence from our clones, and a large part is constituted only from our clones. Our
analysis based on the currently available data suggests also that the number of mouse genes exceeds 100,000.
To determine the full sequences efficiently, we are applying three sequencing strategies: one-pass sequencing for short
clones (less than 1.5 kb), primer walking for the middle-size clones (1.5 - 2.5 kb), and shotgun sequencing
for the long clones (more than 2.5 kb). So far, about 20,000 cDNA clones have been sequenced, with an
average size of around 1.2 kb. The average ORF length is 660 bp.
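The size-based choice of sequencing strategy described above can be sketched as follows. This is only an illustration: the function name is hypothetical, and assigning clones of exactly 1.5 kb or 2.5 kb to the middle class is an assumption, since the abstract does not say which strategy applies at the boundaries.

```python
def choose_strategy(insert_size_kb: float) -> str:
    """Pick a sequencing strategy from the clone insert size (in kb).

    Thresholds follow the abstract; treating the boundary values
    1.5 kb and 2.5 kb as "middle size" is an assumption.
    """
    if insert_size_kb < 0:
        raise ValueError("insert size must be non-negative")
    if insert_size_kb < 1.5:
        return "one-pass"
    if insert_size_kb <= 2.5:
        return "primer walking"
    return "shotgun"


# Hypothetical clone sizes (kb) routed to a strategy.
for size in (1.2, 2.0, 4.0):
    print(size, choose_strategy(size))
```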
Furthermore, we made an attempt to map in silico our cDNA sequences onto the human genome draft
sequence. Results are suggesting novel gene candidates, which have not yet been predicted by the analysis of
human genome sequences with aid of the exon predict programs. These genes could not be identified by any
other method to date, and brought to our attention only by mapping the RIKEN clones to the human genome.
The statistical features found from the comparative studies are also presented.
Shotgun Sequencing the Human Transcriptome with Open Reading Frame ESTs (ORESTES)
Andrew J.G. Simpson and the members of the FAPESP/LICR-Human Cancer Genome Project consortium of ONSA, São Paulo, Brazil
The high throughput partial sequencing of expressed human genes has generated a database of fundamental
importance for defining the human transcriptome and annotating the human genome. The multiple coverage
afforded by ESTs is also permitting the identification of alternatively spliced transcript variants and single
nucleotide polymorphisms. Traditional ESTs provide data from the extremities of transcripts. It is clear that we
need a similar level of coverage from the central portions of human transcripts to exploit the EST concept as
fully as possible. To this end, we have been producing ESTs that are known as Open Reading Frame ESTs, or
ORESTES, which are generated using a low stringency PCR strategy. The technique both biases for the centre
of transcripts and also partially normalizes the transcript population, resulting in a pattern of
transcript sequences quite distinct from that produced by conventional ESTs. Thus, in conjunction with conventional ESTs,
ORESTES allow considerable progress to be made in covering the transcriptome by an essentially shotgun
approach. To date we have generated in excess of 500,000 sequences derived from human tumours within the
FAPESP/LICR-Human Cancer Genome Project that are being systematically deposited in GenBank. We are
also actively pursuing the construction and validation of contigs, composed of ORESTES and conventional
ESTs, which represent individual transcripts as our institutional contribution to the compilation of the complete
human transcriptome. The work will be continued until 1-million ORESTES have been generated. We expect to
achieve this benchmark by the end of 2000. Thereafter, we intend to continue to pursue transcript definition, by
contig construction and validation, until all human transcripts have been characterized. The FAPESP/LICR-Human Cancer Genome Project is being pursued with financing from FAPESP and the Ludwig Institute for
Cancer Research and is being undertaken by a consortium of more than 30 laboratories in the state of São
Paulo, Brazil. The project is the largest ever in the history of life sciences research in Brazil.
SESSION 3: CDNA CLUSTERING AND GENOME ANNOTATION
Clustering Enriches the I.M.A.G.E. Collection
Peg Folta
Lawrence Livermore National Laboratory, Livermore, CA, USA
The Integrated Molecular Analysis of Genomes and their Expression (I.M.A.G.E.) Consortium's underlying goal has been to provide the public with resources to advance
the discovery of genes. At approximately three million clones, the Consortium manages the largest public
collection of cDNAs, the gene-containing segments of DNA. Expressed Sequence Tags (ESTs) from the
I.M.A.G.E. collection represent approximately 75% of the human dbEST database stored at the National Center
for Biotechnology Information (NCBI).
In order to reduce redundancy in the collection, discover new genes, and identify the best representative clone
for each gene, the IMAGEne clustering tool was created. IMAGEne uses all EST and full-insert sequence
originating from I.M.A.G.E. clones, along with EST sequences from The Institute for Genomic Research.
IMAGEne first generates known gene clusters seeded by the NCBI RefSeq. Candidate gene clusters are then
created based on remaining sequence homology and clone membership. Members of the clusters are ranked
primarily according to size.
Sophisticated query and display tools significantly increase the value of the clusters. The user can query against
the clusters with a keyword, clone id, cluster id, GenBank accession number, or sequence. The user can select
from the resulting clusters to see a Java display of all cluster members aligned to their associated known gene
or consensus sequence(s) and an associated table of information (e.g. ids, links to GenBank, and library, size,
and sequence verification). Candidate gene clusters often contain several contigs within a cluster, which
represent 3' and 5' ends or alternative splice variants.
IMAGEne currently reduces 1.65 million human sequences into 5,379 known gene clusters, 54,605 candidate
gene clusters, and 147,970 viable singletons. Initial clustering was done on Lawrence Livermore National
Laboratory's Massively Parallel Processing computers, but updates to the clusters are performed on smaller
computer systems.
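As a quick check of the scale of this reduction, the figures quoted above can be combined as follows. The variable names are illustrative, and the "reduction factor" is simply a ratio computed here for orientation, not a statistic reported in the abstract.

```python
# Figures quoted in the abstract.
input_sequences = 1_650_000
known_gene_clusters = 5_379
candidate_gene_clusters = 54_605
viable_singletons = 147_970

# Total number of groups (clusters plus singletons) after clustering.
total_groups = known_gene_clusters + candidate_gene_clusters + viable_singletons

# Average number of input sequences per resulting group.
reduction_factor = input_sequences / total_groups

print(total_groups)                # 207954
print(round(reduction_factor, 1))  # 7.9
```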
IMAGEne continues to be enhanced as the needs and resources of the community change. A non-collection,
non-species specific version of IMAGEne will soon be available for collaborators to aid in the analysis of various
organisms. Initial targets are white rot fungus, rice, and chicken. Sequence from the NIH's Mammalian Gene
Collection Project will be used to augment the NCBI's RefSeq listing of known genes, creating a better
representative view of known or putative genes.
IMAGEne maintains a listing of clones that best represent each cluster. To promote gene discovery, the
I.M.A.G.E. Consortium has significantly ramped up its re-arraying efforts in order to distribute full-length and putative
full-length re-arrays in both mouse and human.
For further information contact Peg Folta ([email protected]). This work was performed by Lawrence Livermore
National Laboratory under the auspices of the U.S. Department of Energy Contract Number W-7405-Eng-48.
An Alternate Transcription Map of Chromosome 22 Based on Verified Transcript Variation
Win Hide, Janet Kelso, Tzu-Ming Chern, Peter van Heusden and Vladimir Babenko
South African National Bioinformatics Institute, Bellville, South Africa
The first pass analysis of data being generated by the genome projects has revealed that alternate transcription
is a ubiquitous event, but reliable and verified transcript products are not yet available for complete
understanding of the biology of alternate transcription. An analysis of transcript
variation as verified by relative comparison of clustered, processed ESTs with SwissProt, Chromosome 22
exons, mouse expressed transcripts and literature will be presented. A high-fidelity assessment of
internal exon skipping, exon repetition, and exon boundary variation will be discussed, together with a combined
expression-based, and genome-based gene total estimate.
UniGene, the Genome, and the Transcriptome
Lukas Wagner
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health,
Bethesda, MD, USA
I will discuss the UniGene dataset in detail, with particular attention to artifacts which may arise in the course of
preparing cDNA libraries and to insights and improvements obtained from the Human Genome. I will present
results from large-scale comparison of ESTs with human genomic sequence, which will include estimates of the
rate of alternative splicing and of the frequency of processed pseudogenes in the human genome.
I will discuss methods for identifying cDNA clones containing the complete coding sequence. I will provide an
overview of the Mammalian Gene Collection, an effort of the National Institutes of Health to produce such
clones for public distribution.
Relevant URLs:
http://www.ncbi.nlm.nih.gov/UniGene/
http://www.ncbi.nlm.nih.gov/MGC/
The TIGR Gene Indices: Reconstruction and Annotation of Transcribed Sequences
Ingeborg Holt, Feng Liang, Geo Pertea, Svetlana Karamycheva and John Quackenbush
The Institute for Genomic Research, Rockville, MD 20850, USA
A goal of the Genome Project is identification of the complete set of genes within each organism and the role
played by these genes in development and disease. The sequencing of Expressed Sequence Tags (ESTs) has
provided a first glimpse of the collection of transcribed sequences in a variety of organisms, but significant
additional information can be obtained by a thorough analysis of the EST data. TIGR’s analysis of the world’s
collection of EST sequence data, captured in the TIGR Gene Indices (http://www.tigr.org/tdb/tgi.shtml),
provides assembled consensus sequences that are of high confidence and represent our best estimate of the
collection of transcribed sequences underlying the ESTs. We maintain Gene Indices for a variety of species,
including human, mouse, rat, Drosophila, zebrafish, rice, tomato, maize, soybean, and Arabidopsis. Collectively,
the Gene Indices represent a unique resource for the comparative analysis of eukaryotic genes and may
provide insight into gene function, regulation, and evolution.
Using the Tentative Consensus (TC) sequences in the Gene Indices, we recently developed the TIGR
Orthologous Gene Alignment (TOGA) database. TOGA is designed to identify orthologous genes, including a
significant number of novel orthologs, and to serve as a “cross-reference” between genomes, linking the Gene
Indices for the surveyed species. This database represents the most extensive catalog of eukaryotic
orthologues available, providing a valuable resource for gene identification, elucidation of functional domains,
and analysis of gene and genome evolution.
We have also developed a variety of tools to integrate EST, genomic, and mapping data across species,
allowing us to begin to realize the promise inherent in the completion of the sequencing of a variety of genomes.
Searching for the Protein Coding Genes on the Human Genome Sequence
William Saurin, Hugues Roest-Crollius, Olivier Jaillon, Alain Bernot, Lucie Friedlander, Abel Ureta-Vidal, Gabor
Gyapay, Jean Weissenbach
Genoscope and CNRS-FRE 2231, Evry, France
Most of the human genome sequence is now available, but the identification of genes on the DNA sequence
remains a difficult task. Various tools including similarity searches and exon/gene prediction programs are used
for this purpose. Similarity searches are limited because the vertebrate protein/cDNA set is not yet complete.
The use of a compact vertebrate genome (the pufferfish, Fugu rubripes) in protein similarity searches has
proven to be extremely valuable for the identification of mammalian coding sequences. We built a search tool
(called Exofish, for Exon FInding by Sequence Homology) that combines a specific setting of TBLASTX and a
collection of random DNA sequence reads representing at present a third of the genome of the pufferfish
Tetraodon nigroviridis (closely related to Fugu). Exofish detects sequence matches in 2/3 of several sets of
human genes with a background of false positive matches below 1%.
Exofish has been successively applied to the December 99 and June 2000 versions of the « working draft » of
the human genome. The latter analysis indicates that the protein coding gene number is now around 27,000,
somewhat below our earlier estimates of 28,000-30,000. Exofish analysis of the Unigene set of human ESTs
indicates that about 50% of the coding fraction of the human genome is still missing in the public sequence
databanks.
About 15% of the total number of exons detected by Exofish on human chromosome 22 fall outside annotated
genes. A more detailed analysis of this annotation has been performed using new full length cDNA sequences.
The results suggest, however, that (1) most of the Exofish-detected exons falling outside annotations actually
belong to real genes, (2) many of the annotated genes are not yet accurately delimited and (3) a number of
these genes will merge together.
All these observations indicate that a valuable annotation of the human genome sequence still requires
enlarged sets of additional sequence data (cDNAs, related genomes) for comparison purposes. In addition,
since any sequence analysis method suffers some limitations, it is essential to rely on a panel of tools that are
as diverse as possible.
Harvesting the Human Genome : A World-Wide Endeavor
Doron Lancet
Head, the Crown Human Genome Center, The Weizmann Institute of Science, Rehovot 76100, Israel
As the Human Genome Project progresses, and the First Draft is nearly finished, a most urgent need is data
integration and utilization. At our Genome Center, serving as a national laboratory for Israel, we have devised a
model for a role for smaller projects worldwide, in which we combine gene discovery, integrated database
development and a focused effort in the field of DNA microarrays. All three activities revolve around the
development of a strong capacity in computational genomics and bioinformatics.
For gene discovery, our main effort is through collaborations with the Israeli medical community. We strive to
utilize the unique attributes of populations in Israel, constituting many genetically-defined ethnic groups. Six
gene discovery projects are underway, and some have already culminated in the identification of a gene underlying a
monogenic disease. We start from a linkage map, and utilize the power of sequence data mining and
integration to identify gene candidates. Large-scale DNA sequencing is used to reveal mutations within all
identified exons.
In the field of data integration, we have developed three software tools. The first is GeneCards, a novel
functional genomics compendium combining automated data mining and context-related navigation support
(Trends in Genetics 13: 163 (1997); Bioinformatics 14: 656-664 (1998); URL: bioinfo.weizmann.ac.il/cards). It
automatically identifies new HUGO approved gene symbols, extracts relevant information from multiple public
databases, and creates a Card for each gene. The second tool is Unified DataBase (UDB), in which novel
concepts of genome-wide map and sequence integration are implemented (Genome Digest 4(3): 15 (1997),
URL: bioinfo.weizmann.ac.il/udb). It merges method-specific genome maps with genomic sequence information
(Sequence Based Repositioning). The third tool is GESTALT, (Bioinformatics 2000 May;16(5):482-483) a
GEnomic Sequence Total Analysis and Lookup Tool (http://bioinfo.weizmann.ac.il/GESTALT). It constitutes a
workbench for automatic integration and visualization of large-scale genomic sequence analyses.
For DNA arrays, we have acquired an Affymetrix GeneChip system, which has already been successfully used
with human, mouse and yeast expression arrays, as well as with the P53 mutation array. In parallel, we have
established the complementary array spotting and scanning technology, to meet additional species and gene
group requirements. The data are integrated with our in-house developed GeneCards.
Reconstructing the Human Transcriptome from the 3' End
Philipp Bucher
Swiss Institute of Bioinformatics, Lausanne, Switzerland
We present results from a systematic effort to reconstruct the structure of all human mRNAs from EST data. Our
approach is different from others in that: (i) we use whenever possible the chromatograms rather than the EST
sequences in Genbank as input data, (ii) we assemble genes from a collection of genomic exon sequences
defined by ESTs rather than directly from error-prone EST sequences, (iii) we use a collection of well
documented poly-adenylation sites as anchor points for gene assembly and bona fide gene delimiters, (iv) we
use the mapping of ESTs to genomic sequences as a means to detect chimeric ESTs and other artifacts. Based
on an analysis of the partial transcriptome obtained in this way, we present new estimates of the number of
human genes, the frequency of alternative splicing and poly-adenylation, and the frequency of genes occurring
in introns of other genes.
WEDNESDAY, NOVEMBER 8, 2000
MERCREDI 8 NOVEMBRE 2000
SESSION 4: TRANSCRIPTOME ANALYSIS
Gene Expression Profiling of Primary Breast Carcinomas Using Nylon Arrays of Candidate Genes
François Bertucci1,2, Rémi Houlgatte2, Daniel Birnbaum1,3 and Catherine Nguyen2
1. Laboratoire de Biologie des Tumeurs, Institut Paoli-Calmettes (IPC), IFR57, Marseille, France
2. TAGC, CIML Luminy U136 INSERM-CNRS UMR 145, IFR57, Marseille, France
3. Laboratoire d'Oncologie Moléculaire, U.119 Inserm, IFR57, Marseille, France
Breast cancer is characterized by an important histoclinical heterogeneity requiring the identification of new
parameters to predict the natural history of the disease and its sensitivity to treatment. A large-scale molecular
characterization of breast cancer could help in this context. Analysis of gene expression on a large-scale is an
increasingly recognized method for functional and clinical investigations based on the now extensive catalogue
of known or partially sequenced genes. The accessibility of this approach can be enhanced by using readily
available technology (cDNA arrays on Nylon with radioactive detection) and the IMAGE resource to assemble
sets of targets.
Using similar cDNA arrays and from only 5 micrograms of total RNA from each tumor sample, we then studied
the quantitative expression levels of 176 candidate genes in 34 random primary breast carcinomas. Analysis of
results was done along three directions: comparison of tumor samples, gene correlations, and correlations of
molecular data with conventional histoclinical prognostic features. The study revealed extensive heterogeneity
of breast tumors at the transcriptional level. Hierarchical clustering identified two molecularly distinct groups of
tumors with a different outcome not predicted by histoclinical parameters. Gene correlations were detected,
suggesting a degree of organization of gene expression in breast tumors. No correlation was found with the age
of patients, tumor size, histological type and grade. However, expression of genes was differential in tumors
with lymph node metastasis and according to the estrogen receptor status ; ERBB2 expression was strongly
correlated with the lymph node status and that of GATA3 with the presence of estrogen receptors.
Our results identified new potential targets of carcinogenesis and ways to group tumors according to outcome.
They show that the systematic use of cDNA array testing holds great promise to improve the prediction of
prognosis and chemosensitivity of breast cancer and to provide new therapeutic targets.
Statistical Analysis, Normalization and Reproducibility of Microarray Data
Roger E. Bumgarner
Department of Microbiology, University of Washington, Seattle WA, 98195 USA
Over the past several years there has been a flood of literature containing microarray data. However, very few
papers contain estimates of the error in the measurements. There are several reasons for this ranging from the
expense associated with replicate measurements, to inexperience with the technology, to the lack of error
estimates in many software packages. During the past three years we have investigated the reproducibility of
microarray data and developed methods for data normalization and error analysis. In addition, we have
developed a software package for the analysis of microarray data that:
1) Estimates the error in microarray measurements from single or multiple measurements,
2) Normalizes the two-color data using a unique algorithm that accounts for non-linearities in the Cy3-Cy5 ratio,
3) Allows one to select genes which are differentially expressed by a statistically significant amount, and
4) Provides links between the data and publicly available data repositories (e.g., GeneCards).
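The package's own normalization algorithm is not described in this abstract. As a rough illustration of the general idea behind items 1 and 2 only, the sketch below applies a simple global median centering to log2(Cy5/Cy3) ratios and estimates a per-gene standard error from replicate measurements. It is a stand-in, not the author's method: it does not model the non-linearities mentioned above, and all function names and intensity values are hypothetical.

```python
import math
import statistics


def log_ratios(cy5, cy3):
    """Return log2(Cy5/Cy3) for paired, strictly positive intensities."""
    return [math.log2(r / g) for r, g in zip(cy5, cy3)]


def median_center(ratios):
    """Global normalization: shift log ratios so that their median is zero."""
    m = statistics.median(ratios)
    return [x - m for x in ratios]


def standard_error(replicates):
    """Standard error of the mean across replicate log ratios for one gene.

    Requires at least two replicates.
    """
    return statistics.stdev(replicates) / math.sqrt(len(replicates))


# Hypothetical intensities for four spots on one array.
cy5 = [200.0, 400.0, 800.0, 1600.0]
cy3 = [100.0, 100.0, 100.0, 100.0]
normalized = median_center(log_ratios(cy5, cy3))
print(normalized)  # [-1.5, -0.5, 0.5, 1.5]
```

A real two-color pipeline would typically fit an intensity-dependent correction rather than a single global shift; the global median is used here only because it is the simplest normalization that can be shown in a few lines.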
This talk will cover the experimental and data manipulation methodologies we have developed, the
reproducibility of microarray data and the necessity of error analysis in the successful use of microarrays.
DNA Array Applications in a Diverse Academic Setup
Shirley Horn-Saban (1), Doron Ginsberg (3), Opher Gileadi (2), Orly Reiner (2), Tsviya Olender (2), Marilyn Safran (1),
Naama Barkai (2), Doron Lancet (2) and Menachem Rubinstein (1)
The Crown Human Genome Center, (1) Department of Biological Services and (2) Department of Molecular
Genetics, (3) Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 76100, Israel
The Genome Center at the Weizmann Institute of Science focuses on genome methodologies and provides
genome-related practical know-how as well as computing-intensive tools. The DNA array unit is part of this
infrastructure. It aims at giving solutions to any research project involving DNA arrays. For this purpose the unit
harbors both state-of-the-art technologies, namely an Affymetrix GeneChip system as well as microarraying
facilities (arrayer and scanner) to meet the needs of a broad range of Israeli scientists.
The GeneChip system has proven successful for diverse research projects using human, mouse, rat and yeast
expression arrays, as well as p53 re-sequencing arrays. Exemplary projects involved the search for transcription
factor target genes, such as the E2F in rat, crucial for G1/S transition regulation, and the TFIIH factor in yeast.
Microarraying was recently integrated into the infrastructure, and has already been used for printing various “home-made” libraries on both membranes and glass matrices, including a project attempting to detect genes that are
differentially expressed in the dorsal telencephalon of Lis1 mutants compared to wild-type, using cDNA
subtraction libraries obtained from Lis1 mutant mice.
The data accumulated by DNA array experiments are integrated with additional bioinformatics software.
Dedicated software was developed to link gene expression data with our in-house developed GeneCards
(Bioinformatics 1998;14(8):656-64) and Unified Database (UDB) packages, as well as to EST tissue sources
(UniGene). Additional methods of analysis, including deterministic annealing-based clustering are being used
and further developed for system level analysis of expression data (PNAS USA 1999; 96(12):6745-50).
To enhance microarraying facilities, we are currently in the process of amplifying the full yeast ORF library, as a
first step in accumulating whole-genome commercial libraries for various organisms, to be used as genomic
resources in a university research setting.
Explaining Gene Expression Clusters Through Integration of Genome Annotation and Microarray Data
Terry Gaasterland, Alexander Sczyrba, Jie Qin
Rockefeller University, New York, NY, USA
Gene expression studies elucidate potentially co-regulated genes. Sequence annotation of the corresponding
clones and their gene products provides information to explain clusters of genes with patterns of gene
expression of interest to the user. We evaluate clusters of co-expressed genes through gene ontology (GO)
terms, metabolic pathway information, and genomic location of prokaryotic homologs. Putative operons,
functional categories, and pathways that are implicated in a gene cluster are in turn evaluated for gene
expression patterns of their corresponding genes. This bi-directional analysis using comparative genome
annotations and gene expression data is a first step toward enabling users to evaluate the gene expression
patterns of molecular sub-systems. We have prototyped the method as the « Cluster Explorer » module of the
TANGO (Transcriptome Analysis of Genomes) system.
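As an illustration of how a cluster of co-expressed genes can be evaluated against an annotation term, the sketch below scores the over-representation of one term in a cluster with a hypergeometric tail probability. This statistic is an assumption made for the example: the abstract does not say which measure the Cluster Explorer module uses, and the function name and all counts are hypothetical.

```python
from math import comb

def enrichment_p_value(genome_size, annotated_in_genome, cluster_size, annotated_in_cluster):
    """Probability of finding at least `annotated_in_cluster` annotated genes
    when `cluster_size` genes are drawn at random without replacement."""
    total = comb(genome_size, cluster_size)
    upper = min(cluster_size, annotated_in_genome)
    tail = sum(
        comb(annotated_in_genome, k)
        * comb(genome_size - annotated_in_genome, cluster_size - k)
        for k in range(annotated_in_cluster, upper + 1)
    )
    return tail / total

# Hypothetical counts: 6000 genes in total, 60 of them carry the term,
# and a cluster of 40 genes contains 5 of those 60.
p = enrichment_p_value(6000, 60, 40, 5)
print(f"p = {p:.2e}")
```

A small probability suggests that the term is more frequent in the cluster than expected by chance; when many terms are tested, a correction for multiple testing would also be needed.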
Exploring Human Transcriptomes Using cDNA Macro and Microarray Technologies
Sandrine Imbeaud
Genexpress, CNRS ERS1984, 19, rue Guy Mocquet, 94801 Villejuif Cedex, France
Exploration means looking around, observing, describing and mapping undiscovered territory, not testing
theories or models. The fact that the entire human genome sequence is becoming available has been an
exhilarating reminder that much of the natural world remains to be explored at the molecular level. The vast
majority of the sequences generated code for genes of unknown function with, as yet, unknown role in disease.
To understand gene function, it is helpful to know when and where it is expressed, and under what
circumstances the expression level is affected. The goal is to discover things we neither knew nor expected,
and to see relationships and connections among the elements. Beyond questions of individual gene functions
are questions concerning functional pathways and how cellular components work together to regulate and carry
out cellular processes. During the last half of the 20th century, the analysis of the regulation and function of
genes has largely been driven by step-by-step studies of individual genes and proteins. In the past decade, a
paradigm shift has emerged in which we are now able to produce large amounts of data about many genes in a
highly parallel and rapidly serialized manner.
For several years, our team has investigated human transcriptomes through high-throughput gene expression
profiling, developing high-density array approaches. A set of “hybridization signatures” is collected by quantitative
analysis of signal intensities of thousands of arrayed cDNA clones, dotted onto membrane (macroarray) and
glass slide (microarray), hybridized with complex cDNA targets derived by reverse transcription of mRNA from
various tissues and processed using software tools specifically adapted. This made it possible to identify new
collections of genes specifically or preferentially expressed in human brain, skeletal and cardiac muscle tissues,
to explore muscular aging and differentiation and to dissect the molecular processes that could be defective in
cancer cells. High-density array technology has proved to be a powerful tool capable of producing large gene
expression data sets, thus immediately raising a number of questions: What is the validity and quantitative
accuracy of the observed changes? Considerable attention has been focused on standardisation, quality control
assessment and authentication procedures. Which genes should be prioritized for further study? How does one
determine whether a given gene is a cause rather than a consequence of diseased states? How can information
collected from multiple samples and many people be organized so that biological questions can be asked and
answered? Together with the scientific community, we now have to determine how to manage and share these
massive amounts of data.
Analysis of Gene Expression in Xenopus Embryos Identifies Metabolic Pathways, Predicts Gene
Function and Provides a Global View of Embryonic Patterning
Nicolas Pollet and Christof Niehrs
Department of Molecular Embryology - DKFZ - Im Neuenheimer Feld 280 - D-69120 Heidelberg - Germany
In multicellular eukaryotes, the genetic programme is expressed in complex and ever-changing temporal and
spatial patterns throughout development and differentiation. The description and analysis of these patterns is
crucial to elucidate the biological roles of genes and to understand the network of genetic interactions that
underlies the process of normal development.
To explore the molecular anatomy of the vertebrate embryo, we have systematically analysed gene expression
during early development of the Xenopus frog using whole-mount in situ hybridization.
About 25 % of cDNAs analysed represent differentially expressed genes and about 5 % show highly
regionalized expression. Among the genes identified, we found novel cell-type specific « marker » genes and
potential developmental regulators. A cluster analysis was made by comparing gene expression patterns to
derive a novel parameter, « tissue relatedness ». Partial cDNA sequences and expression patterns are
documented and assembled into a database, « Axeldb », publicly available at the URL <http://www.dkfzheidelberg.de/abt0135/axeldb.htm>.
Four « synexpression groups » representing genes with shared, complex expression patterns that predict
molecular pathways involved in patterning and differentiation were identified. According to their probable
functional significance these groups are designated as Delta1, Bmp4, ER-import and Chromatin group. Within
synexpression groups, a likely function of genes without sequence similarity can be predicted. The results
indicate that synexpression groups have strong prognostic value. These sets of co-regulated genes show a
striking parallel to the operon, and may be a key determinant facilitating evolutionary change leading to animal
diversity.
In conclusion, our study describes a functional genomics approach to investigate genes expressed during early
development, provides global insight into embryonic patterning and highlights the modular genetic architecture
of eukaryotic genomes.
Adapter-Tagged Competitive PCR and its Application to the Mammalian Central Nervous System
Kikuya Kato
Taisho Laboratory of Functional Genomics, Nara Institute of Science and Technology, 8916-5 Takayama,
Ikoma, Nara, Japan
Adapter-tagged competitive PCR is an advanced form of quantitative competitive PCR. Internal standards and
calibration curves are unnecessary, and consequently the assay can be conducted in a single tube. With the aid
of a capillary sequencer, our laboratory can routinely quantitate the expression levels of 1000
genes per day. This technique is ideal for cases where RNA preparations are too complex for microarrays. We
first applied the technique to the development of mouse cerebellar cortex.
Expression patterns of 1869 genes were determined using ATAC-PCR at 6 time points during mouse postnatal
cerebellar development. The expression patterns were classified into 12 clusters that were further assembled
into three groups by hierarchical cluster analysis. Among the 1869 genes, 1053 known genes were assigned to
90 functional categories. A statistically significant correlation was found between the clusters or groups of gene
expression and the functional categories. Genes involved in oncogenesis or protein synthesis were highly
expressed during the earlier stages of development. Those responsible for brain functions such as
neurotransmitter receptor and synapse components were more active during the later stages of development.
Many other genes also showed expression patterns reported in the literature. The gene expression patterns and
the inferred functions were in good agreement with anatomical as well as physiological observations made
during the developmental process.
The analysis was further extended to a microscopic level. The developing cerebellar cortex consists of four
layers of granule cells: the proliferating zone of external germinal layer, differentiating zone of external germinal
layer, outer granule cell layer, and inner granule cell layer. Each layer was isolated by laser capture
microdissection, and expression levels of more than 400 genes were assayed by ATAC-PCR. A detailed
overview of gene expression at structures defined at the microscopic level was obtained.
Storing, Managing and Analyzing Microarray Data
Alvis Brazma
European Bioinformatics Institute, Hinxton, UK
Microarrays allow monitoring the gene expression levels for tens of thousands of genes in parallel and are
already producing huge amounts of valuable data. Analysis and handling of such data is becoming one of the
major bottlenecks in the utilization of the technology. The raw microarray data are images, which have to be
transformed into gene expression matrices -- tables where rows represent genes, columns represent various
samples such as different tissues, and values at each position characterize the expression level of the particular
gene in the particular sample. These matrices have to be analyzed further, if any knowledge about the
underlying biological processes is to be extracted. Storing and annotating these data is also a nontrivial
problem. We will discuss all of these aspects of gene expression data management and analysis, as well
as our efforts to establish international standards for microarray data representation and annotation, and a
public repository for such data.
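The gene expression matrix described above can be pictured with a small sketch. The gene names, sample names and values below are invented for illustration only, and the two helper functions are hypothetical.

```python
# Toy gene expression matrix: rows are genes, columns are samples.
genes = ["geneA", "geneB", "geneC"]
samples = ["brain", "muscle", "liver"]
matrix = [
    [8.0, 1.0, 0.5],   # geneA
    [0.2, 6.0, 0.3],   # geneB
    [1.0, 1.1, 0.9],   # geneC
]

def expression(gene, sample):
    """Look up the expression level of one gene in one sample."""
    return matrix[genes.index(gene)][samples.index(sample)]

def top_sample(gene):
    """Return the sample in which the gene has its highest value."""
    row = matrix[genes.index(gene)]
    return samples[row.index(max(row))]

print(expression("geneB", "muscle"))
print(top_sample("geneA"))
```

Keeping gene and sample labels alongside the numeric table is the minimum annotation needed before any further analysis of the rows or columns.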
SESSION 5: TRANSCRIPTOMES, PROTEOMES AND SYSTEMS BIOLOGY
Analysis of Genomes and Transcriptomes in Terms of the Occurrence of Protein Parts and Features
Mark Gerstein
Molecular Biophysics & Biochemistry Department, Yale University, New Haven, CT 06520, USA
My talk will focus on analyzing genomes and gene-expression data in terms of the finite list of protein 'parts'.
Depending on context, a part could be a structural fold or sequence superfamily. I will touch on the following
topics:
* How one can compare different genomes in terms of the occurrence of various parts in them. And how this idea can
be extended to compare the representation of parts in the genome versus the transcriptome. In particular, this
allows one to see what protein features are enriched in highly expressed proteins.
* How one can analyze the relationship between where a part is located and its transcriptome occurrence -- i.e.
between a protein's subcellular localization and its level of gene expression. We extend this work to develop a
formal Bayesian system for predicting subcellular localization, partially based on gene expression data.
* To what degree are protein function and protein-protein interactions related to similarities in the level of gene
expression. Based on developing a statistical significance formalism, I will argue that while there is a definite
relationship for certain classes of protein functions and protein-protein interactions, the relationship is not
general and global. The absence of correlation is principally due to the inconsistent way protein function is
defined.
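The idea of a Bayesian predictor of subcellular localization based on expression data, mentioned in the second topic above, can be sketched with Bayes' rule applied to a single feature. The compartments, prior probabilities and likelihoods below are invented for illustration and are not taken from the cited work, which combines many more features.

```python
# Hypothetical prior over compartments, and hypothetical likelihood of observing
# a "high" or "low" expression level given the compartment.
prior = {"cytoplasm": 0.5, "nucleus": 0.3, "membrane": 0.2}
likelihood = {
    "cytoplasm": {"high": 0.6, "low": 0.4},
    "nucleus": {"high": 0.2, "low": 0.8},
    "membrane": {"high": 0.3, "low": 0.7},
}

def posterior(expression_level):
    """Bayes' rule: P(loc | level) is proportional to P(level | loc) * P(loc)."""
    unnormalised = {
        loc: likelihood[loc][expression_level] * prior[loc] for loc in prior
    }
    total = sum(unnormalised.values())
    return {loc: value / total for loc, value in unnormalised.items()}

post = posterior("high")
print(max(post, key=post.get))
```

With several independent features, the same update can be applied repeatedly, using the posterior from one feature as the prior for the next.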
REFERENCES
http://bioinfo.mbb.yale.edu
M Gerstein & R Jansen (2000). "The current excitement in bioinformatics, analysis of whole-genome expression
data: How does it relate to protein structure and function?" Curr. Opin. Struc. Biol. (in press).
A Drawid, R Jansen & M Gerstein (2000). "Gene Expression Levels are Correlated with Protein Subcellular
Localization," Trends in Genetics 16: 426-430.
A Drawid & M Gerstein (2000). "A Bayesian System Integrating Expression Data with Sequence Patterns for
Localizing Proteins: Comprehensive Application to the Yeast Genome," J. Mol. Biol. 301:1059-75
R Jansen & M Gerstein (2000). "Analysis of the Yeast Transcriptome with Broad Structural and Functional
Categories: Characterizing Highly Expressed Proteins," Nuc. Acids Res. 28:1481-1488
M Gerstein (1998). "Patterns of Protein-Fold Usage in Eight Microbial Genomes: A Comprehensive Structural
Census," Proteins 33: 518-534.
Bridging Genomics with Proteomics: DNA and Protein Analysis on Arrays
Holger Eickhoff, Wilfried Nietfeld, Arif Malik, Neeraj Tandon, Lajos Nyarsik, Martin Horn, Thomas Przewieslik,
Elke Rohlfs, Eryk-Witold Wolski, Angelika Lüking, Johannes Schuchardt and Hans Lehrach
Max-Planck Institut für Molekulare Genetik, Abteilung Lehrach, Ihnestr. 73, 14195 Berlin, Germany. Tel: **49-30-84131405, Fax: **49-30-84131380
(1) Innovationskolleg Theoretische Biologie, Humboldt-Universität zu Berlin, Invalidenstraße 43, 10115 Berlin,
Germany
In order to fully understand any complex network of gene interactions it is necessary to screen many genetic
samples in parallel as rapidly as possible. We will describe the steps that have been automated and
miniaturised in our laboratory to enable a high throughput and highly parallel hybridisation based approach to
genome analysis.
In co-operation with the Resource Centre in the German Genome Project (http://www.rzpd.de) we have
clustered EST sequences from various sequencing projects (man, mouse, arabidopsis) according to the
available sequence data. At present, we work with rearrayed UNIGENE sets of up to 10,000 clones on glass
slides for RNA expression analysis. For this purpose, the clones are PCR amplified and bound covalently to glass.
Experimental series were done by hybridisation of labelled RNAs to arrays of the selected gene fragments.
Images from RNA-hybridisations of tissues derived from different developmental stages were compared and
analysed with in-house developed software for spot recognition and spot quantitation. In addition to the EST
clustering we use oligonucleotide fingerprinting to identify new cDNA clones from new libraries. These libraries
are now routinely cloned into expression vectors, permitting direct expression of the corresponding proteins.
HIS-tagged and expressed proteins were immobilised as lysates and purified products as arrays on membranes
and glass, where they allow the study of protein-protein and protein-DNA interactions. The whole set-up is highly
automated and uses state of the art automation technology from PCR and microarraying to automated
hybridisation chambers with online detection and data analysis.
In current proteomic approaches, proteins isolated from two-dimensional gels are identified in sequence databases
using mass spectrometrically determined peptide maps or sequence tags of individual proteolytic cleavage
products. Obviously, this approach is limited to known proteins. To overcome this limitation, we have developed
a novel concept: each protein is specified by a minimal set of structural information readily accessible by mass
spectrometry, which we have hereby designated as ‘minimal protein identifier’ (MPI). MPIs contain accurate
molecular masses of enzymatic cleavage products in conjunction with fragment-ion data, and are recorded by
MALDI-MS. MPIs can be generated from excised 2-DE gel spots and from recombinant proteins, such as from a
Uniprotein set, and can be used to identify the homologous proteins on 2-DE gels and vice-versa.
Once recorded, MPIs allow rapid recognition of known, as well as unknown, gene products. At the same time,
MPIs allow identification of proteins in sequence databases. Equipped with these features, MPIs enable safe
comparison of 2-DE gels run with different biological samples independent of their format, resolution, and
applied separation technology. This approach results in more reliable protein identification as measured MPIs
from 2-DE gel spots are compared with measured MPIs from the expressed proteins, instead of DNA sequence-predicted MPIs. The availability of the cDNA clones of the recombinant proteins allows the direct linkage of
these 2D results to RNA expression analysis on microarrays.
In addition to arraying applications the recombinant proteins are used for large-scale crystallisation. The
crystallisation set-up is a core facility of the protein structure factory in Berlin, which is funded by the BMBF
(http://www.fu-berlin.de/psf). The crystallisation-store has a capacity of 10,000 crystallisation plates with
automated online crystal detection for hanging and sitting drops.
We are currently working on a further miniaturisation in arraying and crystallisation. As a result we have adopted
a highly parallel piezoelectric drop-on-demand technology to dispense biological samples. We have
implemented a multihead piezo-jet microarraying system that is able to aspirate genetic samples out of
microtitre plates. A linear 16-nozzle and a 4 x 4 nozzle multi-head permit the construction of large microarrays on
a variety of surfaces. Since the spot density obtained by this system is more than 4000 clones/cm², we have
developed higher resolution detection systems based on a laser scanning principle. The detection system scans
areas of 22 cm x 22 cm with 20-micrometer resolution for two colors at a sensitivity of less than 1 attomole
fluorescent dye per spot. The system is used for RNA expression- and SNP (Single Nucleotide Polymorphism)
analysis.
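The figures quoted above for the scanner and the spot density can be related by a short calculation. The sketch assumes that "20-micrometer resolution" means a square pixel of 20 micrometres and that spots lie on a square grid; neither assumption is stated in the abstract, and the variable names are illustrative.

```python
from math import sqrt

side_cm = 22               # scanned area is 22 cm x 22 cm
pixel_um = 20              # assumed pixel size in micrometres
density_per_cm2 = 4000     # quoted lower bound on spot density

pixels_per_side = side_cm * 10_000 // pixel_um   # 1 cm = 10,000 micrometres
pixels_per_channel = pixels_per_side ** 2        # one image per colour
spot_pitch_um = 10_000 / sqrt(density_per_cm2)   # centre-to-centre distance on a square grid
pixels_per_pitch = spot_pitch_um / pixel_um

print(pixels_per_side, pixels_per_channel)
print(round(spot_pitch_um, 1), round(pixels_per_pitch, 1))
```

Under these assumptions each colour channel holds 121 million pixels, and neighbouring spot centres are a little under eight pixels apart, which indicates why a finer detection system is needed as spot density rises.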
The high throughput data generation mode has increasingly to be complemented by corresponding high
throughput bioinformatics systems. These bioinformatics systems have to be able to extract much more
information and insights into biological processes from the rapidly accumulating data, than possible by purely
manual techniques. These tools are used for complex data analysis focussing on the analysis of many
thousands of genes simultaneously.
One scientific goal of our projects is to create new computational approaches for the investigation of molecular
mechanisms of differential gene expression and to apply them for prediction of expression profiling of any
genome. The knowledge deduced will be incorporated in a general network-model of gene expression
regulation, giving insight into crucial pathways in development and maintenance of organisms. An integral part
of our approach is the development of mathematical/statistical techniques and algorithms for pattern recognition
in very large data sets.
Proteomics and the Challenge of Hydrophobic Membrane Proteins: the Example of Chloroplast
Membranes
Rolland N., Seigneurin-Berny D., Ferro M.*, Garin J.* and Joyard J.
Laboratoire de Physiologie Cellulaire Végétale, UMR 5019 CNRS/CEA/Université Joseph Fourier, and
*Laboratoire de Chimie des Protéines, CEA-Grenoble, F-38054 Grenoble-cedex 9, France
As a complementary approach to genome projects, proteomic analyses have been set up to identify new gene
products. One of the major challenges in proteomics concerns membrane proteins, especially the minor ones.
Although 2D-PAGE remains the most efficient way of separating protein mixtures, almost no hydrophobic
proteins are found on 2D-gel separations of membrane proteins. Using chloroplast membranes (thylakoids and
envelope membranes) as a model, we have optimized a procedure, based on the differential solubilization of
membrane proteins in chloroform/methanol mixtures, to extract and concentrate the most hydrophobic
membrane proteins. Propensity of hydrophobic proteins to partition in chloroform/methanol mixtures was directly
correlated with the Res/TM ratio (number of amino acid residues/number of putative transmembrane regions).
This was shown to be valid for thylakoids as well as for envelope membranes, thus demonstrating the versatility
of the procedure (Seigneurin-Berny et al., 1999, Plant J. 19, 217-228 ; Ferro et al., 2000, Electrophoresis, in
press). In both cases, chloroform/methanol extraction of membrane proteins (a) eliminates peripheral proteins
as well as soluble contaminants from membrane fractions and (b) limits protein pattern complexity. Indeed,
when compared to the generally used differential solubilization in detergents, chloroform/methanol extraction
seems to be the best compromise to combine enrichment of highly hydrophobic proteins and complete
elimination of the hydrophilic ones.
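The Res/TM ratio defined above is simple to compute once the number of putative transmembrane regions has been predicted by some other tool. The sketch below uses a hypothetical function name and invented proteins; the abstract states only that partitioning was correlated with this ratio, so no threshold is implied.

```python
def res_tm_ratio(sequence, n_transmembrane):
    """Number of amino acid residues per putative transmembrane region."""
    if n_transmembrane < 1:
        raise ValueError("Res/TM is undefined without a transmembrane region")
    return len(sequence) / n_transmembrane

# Invented examples: (placeholder sequence, predicted transmembrane regions).
proteins = {
    "toy_carrier": ("M" * 330, 6),
    "toy_single_span": ("M" * 400, 1),
}
ratios = {name: res_tm_ratio(seq, n) for name, (seq, n) in proteins.items()}

for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name}: {ratio:.0f} residues per transmembrane region")
```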
Combining the use of classical SDS-PAGE and MS/MS, our procedure enables identification of hydrophobic
proteins, whatever their isoelectric point, including minor membrane components. It complements
classical proteomic studies of membrane fractions separated by 2D-PAGE, which provide mostly information
about peripheral membrane polypeptides. Chloroform/methanol extraction is thus likely to become a versatile
tool to recover the hydrophobic proteome from other membrane systems. For this reason, this subcellular
specific proteomic tool is particularly well adapted to eukaryote subcellular proteomic studies.
Tools for Functional Genomics Using Transcript Profiles and Proteomics
Joakim Lundeberg
Department of Biotechnology, Royal Institute of Technology, Stockholm, Sweden
High-throughput systems for analysis of genetic variability, transcript profiling and affinity-reagent based
proteomics have been established based on “in-house” experience and robotics. Tag sequencing projects
involving EST and SNP analysis have been developed based on pyrosequencing, a novel sequencing by
synthesis technique. To allow large-scale applications using this technology a semi-automated production line
has been established that enables rapid analysis of SNPs (MTP/10 min) and ESTs (MTP/40 min). Furthermore
systems for analysis of minute amounts of samples have been developed for both transcript profiling and SNP
purposes. To allow for comparative studies of experimental and public domain data a visualization tool has been
developed that allows for “datamining” of accumulated information leading to new starting points for subsequent
proteome analysis. For these latter purposes, efficient bacterial expression systems have been established for
analysis of cDNAs or exons predicted from genomic data. The obtained protein products are used to create
affinity reagents for functional characterisation of gene products.
Proteomic Strategies in Cancer
Julio E. Celis
Department of Medical Biochemistry and Danish Centre for Human Genome Research, University of Aarhus,
Ole Worms Allé build. 170, DK-8000 Aarhus C, Denmark
Proteomics is an emerging area of the post-genomic era that uses a plethora of techniques to resolve, quantitate,
rapidly identify and annotate proteins, as well as to identify their interacting partners. In combination
with DNA microarrays, these technologies promise to revolutionise biology as they are expected to reveal gene
regulation events involved in disease progression as well as to generate potential targets for drug discovery and
diagnostics. Here, I will highlight the potential of proteomics for the study of bladder cancer progression using
biopsy specimens.
Proteomics Databases
Amos Bairoch
Swiss Institute of Bioinformatics, Geneva, Switzerland
SWISS-PROT is an annotated protein sequence database established in 1986 and maintained collaboratively,
since 1987, by the Department of Medical Biochemistry of the University of Geneva and the EMBL Data Library
(now the EMBL Outstation - The European Bioinformatics Institute (EBI)). The SWISS-PROT protein sequence
data bank consists of sequence entries. Sequence entries are composed of different line-types, each with their
own format. For standardization purposes the format of SWISS-PROT follows as closely as possible that of the
EMBL Nucleotide Sequence Database.
The SWISS-PROT database distinguishes itself from other protein sequence databases by three distinct
criteria:
(i) Annotation: in SWISS-PROT, as in most other sequence databases, two classes of data can be
distinguished: the core data and the annotation. For each sequence entry the core data consists of the
sequence data; the citation information (bibliographical references) and the taxonomic data (description of the
biological source of the protein) while the annotation consists of the description of the following items:
function(s) of the protein; post-translational modification(s), for example carbohydrates, phosphorylation,
acetylation, GPI-anchor, etc.; domains and sites, for example calcium binding regions, ATP-binding sites, zinc
fingers, homeobox, kringle, etc.; secondary structure; quaternary structure, for example homodimer,
heterotrimer, etc.; similarities to other proteins; disease(s) associated with deficiencie(s) in the protein;
sequence conflicts, variants, etc.
We try to include as much annotation information as possible in SWISS-PROT. To obtain this information we
use, in addition to the publications that report new sequence data, review articles to periodically update the
annotations of families or groups of proteins. We also make use of external experts, who have been recruited to
send us their comments and updates concerning specific groups of proteins.
We believe that our having systematic recourse both to publications other than those reporting the core data
and to subject referees represents a unique and beneficial feature of SWISS-PROT. In SWISS-PROT,
annotation is mainly found in the comment lines (CC), in the feature table (FT) and in the keyword lines (KW).
Most comments are classified by `topics'; this approach permits the easy retrieval of specific categories of data
from the database.
(ii) Minimal redundancy: many sequence databases contain, for a given protein sequence, separate entries which
correspond to different literature reports. In SWISS-PROT we try as much as possible to merge all these data
so as to minimize the redundancy of the database. If conflicts exist between various sequencing reports, they
are indicated in the feature table of the corresponding entry.
(iii) Integration with other databases: it is important to provide the users of biomolecular databases with a
degree of integration between the three types of sequence-related databases (nucleic acid sequences, protein
sequences and protein tertiary structures) as well as with specialized data collections. SWISS-PROT is
currently cross-referenced with 30 different databases. Cross-references are provided in the form of pointers to
information related to SWISS-PROT entries and found in data collections other than SWISS-PROT.
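The line-type organisation described under (i) can be illustrated by grouping the lines of an entry by their line code. The sketch assumes the flat-file layout in which each line starts with a two-letter code followed by three spaces, and in which `//` ends the entry. The entry shown is an invented toy, not a real SWISS-PROT record, and the function name is hypothetical.

```python
from collections import defaultdict

# Invented toy entry; real entries contain many more line types.
ENTRY = """\
ID   TOY_HUMAN      STANDARD;      PRT;   120 AA.
CC   -!- FUNCTION: Invented comment for illustration.
FT   DOMAIN       10     50       Invented domain.
KW   Invented keyword; Another keyword.
//
"""

def group_by_line_type(text):
    """Group line contents by their two-letter line code."""
    groups = defaultdict(list)
    for line in text.splitlines():
        if line.startswith("//"):      # entry terminator
            break
        code, content = line[:2], line[5:]
        groups[code].append(content)
    return dict(groups)

entry = group_by_line_type(ENTRY)
# Annotation is mainly carried by the CC, FT and KW lines.
annotation = {code: entry.get(code, []) for code in ("CC", "FT", "KW")}
print(annotation["KW"])
```

Grouping by line code in this way is what makes it straightforward to retrieve one category of annotation, such as keywords, across many entries.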
I will also describe other ongoing efforts of the Swiss Institute of Bioinformatics performed in collaboration with
the European Bioinformatics Institute, such as TREMBL, a computer-annotated supplement of SWISS-PROT
that contains all the translations of EMBL nucleotide sequence entries not yet integrated in SWISS-PROT.
Other proteomics databases available from SIB (http://www.expasy.ch) are:
- PROSITE, a database of protein families and domains which consists of biologically significant sites, patterns
and profiles that help to reliably identify to which known protein family (if any) a new sequence belongs;
- SWISS-2DPAGE, which contains data on proteins identified on various 2-D PAGE reference maps. You can
locate these proteins on the 2-D PAGE maps or display the region of a 2-D PAGE map where one might expect
to find a protein from SWISS-PROT;
- SWISS-3DIMAGE, an image database which strives to provide high quality pictures of biological
macromolecules with known three-dimensional structure. The database contains mostly images of
experimentally elucidated structures, but also provides views of well accepted theoretical protein models.
The images are provided in several useful formats; both mono and stereo pictures are generally available.
The Physiome Project: Integrating from Genomics to Function or Vice Versa
James B. Bassingthwaighte
University of Washington, Seattle, WA, 98195-7962, USA
The Physiome is the quantitative description of the functioning organism in normal and pathophysiological
states. It is built upon the morphome, the quantitative description of anatomical structure, chemical and
biochemical composition, and material properties of an intact organism, including its genome, proteome, and the
structures of cells, tissues and organs, up to the whole intact organism. The Physiome Project is beginning as a
program to design, develop, implement, test and document, archive and disseminate quantitative information
and integrative models of the functional behavior of the components and of intact organisms from bacteria to
man. A fundamental and major feature of the program is the databasing of the basic observations for retrieval
and evaluation. Given Genome, Transcriptome, Proteome, then Chance, Necessity, and Environment influence
the phenotypic form and the functional entity, the Physiome. Prediction from the genome is still more often
empirical than not, and straightforward logic commonly fails. The problem is not simply that a gene gives rise to
several proteins, or that mRNA/protein ratios are widely scattered, but that control and function are governed
through a multiplicity of interacting systems. These can be sorted out only through integrative modeling.
There are many problems in developing large scale systems descriptions in biology. Integrative models have
many submodels. The submodels are themselves complex, requiring consideration of spatial and temporal
events and processes; they have to be linked to one another while preserving mass balance and giving
accurate representation of the variables within non-linear complex biochemical networks with many signalling
and controlling pathways. Micro-compartmentalization vitiates the use of simplified model structures. The wide
range of rate constants in the equations makes computation costly.
A most serious problem is the current lack of databases on physiological information. While the genomics and
proteomics communities have organized large database systems, at higher levels of biology there are almost
none. While we are nearly drowning in new information being published each day, the data are not entered into
databases. "Simple" things like tissue composition, material properties and mechanical behavior of cells and
tissues are not generally available.
Technologies allowing many groups to work together are being rapidly developed. Internet II will facilitate this
immensely. When problems are complex, a particular working group can be expert in only a small part of the
overall program. The strategies to be worked out must therefore include how to pull models composed of many
submodels together even when the expertise in each is scattered amongst diverse institutions. The technologies
of bioinformatics will contribute greatly to this effort.
The successful development of comprehensive models of biological systems is a key to strategizing toward
interventional genomics, pharmaceutics and drug design. Carefully integrated models will become gradually
better predictors of the results of interventions. When they are very good, which will take some time, models will
be useful in predicting the side effects and long term effects of drugs and toxins, and to predict where genomic
intervention will be effective and where the multiple redundancies in our biological systems will render a
proposed intervention useless. The Physiome Project will provide the integrating scientific basis for the Genes
to Health initiative, and make physiological genomics a reality applicable to whole organisms, from bacteria to
man.
DOE Genome Scale Expression Efforts
Marvin Stodolsky
US Department of Energy, Office of Science, Genome Task Group, Germantown, MD, USA
The Human Genome Initiative of the US Department of Energy was begun in 1986 and dedicated support for the
expression/cDNA related projects began in 1990. This support has included the cDNA clone management
services of I.M.A.G.E at the Lawrence Livermore National Laboratory, many smaller cDNA projects and the
recent series of Workshops on Complete cDNA Sequencing (WCCS). A core facet of DOE planning is to forsee
bottlenecks in R&D and to implement plans to alleviate them. This meeting coincides with a substantial on-going
transition in expression studies. With the completion of the draft sequence of the human genome, almost all of
the ESTs with their messenger RNAs they represent now have candidate locations on the chromosomes.
Models for the source chromosomal genes are generated both through both experimental studies and the
computational methods, including those of the Annotation Consortium centered at the Oak Ridge National
Laboratory. A core continuing task is to provide improved tools and resources for genome scale, multi-tissue
and multi-condition analyses of gene expression. DOE plans in this arena will be related.
THURSDAY, NOVEMBER 9, 2000
JEUDI 9 NOVEMBRE 2000
SESSION 6: APPLICATIONS IN BIOLOGY, BIOTECHNOLOGY AND MEDICINE
Production and Quality Assessment of Full-Length-Enriched cDNA Libraries and their use in
transcriptome profiling using microarrays
Claudio Schneider
LNCIB (Laboratorio Nazionale CIB), AREA Science Park, Padriciano 99, 34012 Trieste, Italy
Technologies for obtaining full-length-enriched cDNA libraries will be presented along with their quality
assessment. The advantage of using full-length cDNAs deposited in microarray format, rather than partial
cDNAs, for hybridization in transcriptome profiling will be considered within a case study. The requirement for
increased detection sensitivity to uncover low-abundance transcripts will be addressed. Transcriptome profiling
associated with in vitro growth arrest and the p53 response will be analyzed in relation to specific examples.
Integration of the obtained transcriptome profiles with the associated protein-interaction profiles will be
highlighted as a way to link pathway dissection to phenotypic response.
Gene Expression Profiling of 3 Solid Tumors
H. Sültmann (1), W. Huber (1), J. Boer (1,4), F. Wilmer (1), R. Wittig (1), B. Korn (5), L. Füzegi (3), B. Gunawan (3), S. Haas (1,2),
A. v. Heydebreck (2), M. Vingron (2), A. Poustka (1)
(1) Abt. Molekulare Genomanalyse and (2) Theoretische Bioinformatik, Deutsches Krebsforschungszentrum, Im
Neuenheimer Feld 280, D-69120 Heidelberg; (3) Zentrum Pathologie, Georg-August-Universität, Robert-Koch-Str.
40, D-37075 Göttingen; (4) Leiden Univ. Medical Center, Wassenarseweg 72, Leiden 2333AL, NL;
(5) Resourcenzentrum im Deutschen Humangenomprojekt, Im Neuenheimer Feld 506, D-69120 Heidelberg.
Cancer cells show altered gene expression compared to normal cells. Knowledge of the changes in gene
expression for certain types and stages of tumors can give insight into the molecular changes involved in tumor
development and progression and provide molecular markers for tumor diagnosis and prognosis. We use the
cDNA array hybridization technology as a high throughput method to determine the expression levels of 32,000
different human genes and ESTs spotted in duplicate onto nylon membranes. Normal tissue and primary tumor
tissue of the same patient are used to isolate poly(A)+ RNA, which is reverse transcribed into 33P-labelled single
stranded cDNA. The hybridization of both cDNA populations is performed on different membranes using a
standardized protocol. Phosphoimage plates are exposed to the membranes, and expression profiles are
calculated through spotwise quantification of the signal distribution.
We have collected array expression data for 37 renal cell carcinoma samples (predominantly clear cell types) of
different tumor stages and differentiation grades, and for the corresponding normal tissues of the same patients.
More than 1700 genes were identified with statistical significance to be expressed at different levels between
normal and tumor tissues. Among these were several genes which had been known to be differentially
expressed in renal carcinoma, e.g. vimentin, VEGF, haptoglobin, metallothionein, and kininogen. In addition to
the genes known to be associated with kidney cancer, many other genes and ESTs were found. Our data allow
the definition of genes that are significantly transcribed only in certain tumor stages (e.g. in metastases). A
detailed analysis of the correlation of gene expression with tumor progression in the renal cell carcinoma is
currently being performed. Similar experiments for brain and breast tumors are in progress.
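The abstract does not state which statistical test was used to call genes differentially expressed. As a minimal sketch, assuming a paired comparison of log2 spot intensities (duplicate spots averaged, tumor versus normal tissue of the same patient), the per-gene calculation might look like the following. The function names and the example intensities are hypothetical and are not taken from the study.

```python
import math
from statistics import mean, stdev


def spot_log_intensity(duplicate_spots):
    """Average the duplicate spots of one gene on one membrane, then take log2."""
    return math.log2(mean(duplicate_spots))


def paired_log_ratio_test(tumor_spots, normal_spots):
    """Return (mean log2 tumor/normal ratio, paired t statistic) for one gene.

    tumor_spots and normal_spots are lists with one entry per patient;
    each entry holds the duplicate spot intensities for that patient.
    """
    diffs = [
        spot_log_intensity(t) - spot_log_intensity(n)
        for t, n in zip(tumor_spots, normal_spots)
    ]
    if len(diffs) < 2:
        raise ValueError("at least two patients are needed for a paired test")
    mean_diff = mean(diffs)
    sd = stdev(diffs)
    if sd == 0:
        # All patients show the identical ratio; the statistic is unbounded.
        t_stat = math.copysign(math.inf, mean_diff) if mean_diff else 0.0
    else:
        t_stat = mean_diff / (sd / math.sqrt(len(diffs)))
    return mean_diff, t_stat


# Hypothetical intensities for one gene in three patients (duplicate spots).
example_tumor = [[1200, 1100], [900, 1000], [1500, 1400]]
example_normal = [[300, 280], [250, 260], [400, 380]]
example_mean_diff, example_t = paired_log_ratio_test(example_tumor, example_normal)
print(f"mean log2 ratio = {example_mean_diff:.2f}, t = {example_t:.1f}")
```

With 32,000 spotted sequences, any such per-gene statistic would need a correction for multiple testing before a list of significant genes is reported; that step is omitted here.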
The renal cell carcinoma specific genes, as well as a selection of genes which are known to have oncogenic
potential in other cancer types, have been amplified by PCR and spotted onto glass slides to build a kidney
tumor specific gene array. With these, we will conduct a further focused investigation on the differential
transcription of genes in renal cell carcinoma.
A queryable database combining expression data for all genes on the array with histopathological and clinical
follow-up information for the tumor material, and tools to mine these large data sets, are under
development.
The Cancer Genome Anatomy Project (CGAP) and the Mammalian Gene Collection (MGC): cDNA
Resources for the Community
Robert Strausberg
National Cancer Institute, Bethesda, MD, USA
Over the past three years the NCI has established a Tumor Gene Index of sequence tags derived from cancers
and their normal precursors. The collection now includes over one million EST tags and three million SAGE
tags. Through the CGAP web site (cgap.nci.nih.gov), informatics tools are provided to facilitate application of
the sequences and clones to cancer research. A summary of the current resource and its uses will be
discussed. The NIH Mammalian Gene Collection (MGC) seeks to identify and sequence full open reading frame
clones derived from human and mouse genes. The clones are first sequenced from the 5' end to identify
potentially full-ORF clones and non-redundant clones are then subjected to full-insert sequencing. At present
(September 2000) the collection includes approximately 160,000 5'-end reads, of which 102,000 are from human
cDNA libraries. These human clones include about 43,000 that appear to have the full N-terminal coding region,
and these form a non-redundant set of clones derived from about 6,300 genes. Full-insert sequencing for the
non-redundant set is currently underway. A detailed description of the MGC approach and results will be
presented.
Expression Profiling in Patient Tissue for Insights into Etiology and Pathophysiology of Progressive
Disease
Eric P Hoffman, Yi-Wen Chen, Po Zhao, Rehannah Borup
Research Center for Genetic Medicine, Children’s National Medical Center, Washington DC 20010, USA
Genome-wide expression profiling of patient tissues using Affymetrix or cDNA microarrays is widely believed to
hold promise for understanding disease etiology, pathophysiology, and monitoring of therapeutics. The use of
diseased patient tissues introduces variables which must be considered in experimental design, such as
background genetic differences, heterogeneity in tissue sample, and others.
Muscle may be an ideal tissue in which to conduct expression profiling. It is typically biopsied in disease
patients, biopsies are flash frozen in a manner ideal for RNA isolation in adequate quantities, cell content of
muscle is relatively simple, and frozen sections are easily interpreted for immunohistochemistry verification of
expression array differences in gene expression. Also, normal muscle responds to a series of environmental
stimuli, such as exercise, atrophy, and tissue damage, and is again typically biopsied in clinical studies of
normal populations. Genetic polymorphic variants in muscle expressed genes dictate responsiveness to
exercise, including skill at specific types of sport (such as strength, endurance, speed).
Here, we present a series of experiments using expression profiling of muscular dystrophy patient muscle
biopsies. We present methods that control for tissue heterogeneity, and genetic background differences
between individuals. We also show data on the reproducibility of the method, which underscores the need for
multiple experiments. We report the use of expression profiling to define the pathophysiological cascades
involved in the progression of two muscular dystrophies with known primary biochemical defects: dystrophin deficiency (Duchenne muscular dystrophy) and β-sarcoglycan deficiency (a dystrophin-associated protein). We
employed a novel protocol for expression profiling in human patient tissues using mixed samples of multiple
patients and iterative comparisons of duplicate datasets. Using this approach with patient muscle biopsies, we
successfully define novel aspects of the molecular pathophysiology, both cell autonomous and non-cell
autonomous, which explain downstream histological and clinical consequences of these biochemical
deficiencies. We found evidence for both incomplete differentiation of patient muscle, and for de-differentiation
of myofibers to alternative lineages with advancing age. One such differentially expressed gene that we
characterized in detail, α cardiac actin, showed persistent expression after birth in 60% of myofibers despite the
absence of degeneration/regeneration in the muscle. The majority (80%) of myofibers remained strongly
positive for this protein throughout the course of the disease. Other developmentally-regulated genes that
showed widespread overexpression in these muscular dystrophies included embryonic myosin heavy chain,
versican, acetylcholine receptor, SPARC/osteonectin, and thrombospondin 4. We hypothesize that the
abnormal Ca2+ influx in dystrophin- and β-sarcoglycan-deficient myofibers leads to altered developmental
programming of developing and regenerating myofibers. The finding of upregulation of HLA-DR and Factor XIII
led to the novel identification of dendritic cell infiltration in dystrophic muscle; these cells likely mediate immune
responses and microenvironmental changes in muscle. Finally, we document a general metabolic crisis in
dystrophic muscle, with large-scale down-regulation in mitochondrial gene expression.
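The abstract mentions iterative comparisons of duplicate datasets without giving the selection rule. One plausible reading, sketched below as an assumption rather than as the authors' protocol, is to compare every patient replicate with every control replicate and to retain only genes that change in the same direction, by at least a chosen fold, in all of those comparisons. The gene names, signal values and the 2-fold threshold are hypothetical.

```python
from itertools import product


def consistent_changes(patient_reps, control_reps, min_fold=2.0):
    """Return {gene: "up" | "down"} for genes changed in every pairing.

    patient_reps and control_reps are lists of dicts mapping gene name to a
    positive expression signal, one dict per replicate profile.
    """
    shared_genes = set.intersection(
        *(set(rep) for rep in patient_reps + control_reps)
    )
    calls = {}
    for gene in sorted(shared_genes):
        # One ratio per (patient replicate, control replicate) pairing.
        ratios = [p[gene] / c[gene] for p, c in product(patient_reps, control_reps)]
        if all(r >= min_fold for r in ratios):
            calls[gene] = "up"
        elif all(r <= 1.0 / min_fold for r in ratios):
            calls[gene] = "down"
    return calls


# Hypothetical duplicate profiles from pooled patient and pooled control samples.
patient_a = {"geneA": 800, "geneB": 100, "geneC": 300}
patient_b = {"geneA": 900, "geneB": 90, "geneC": 150}
control_a = {"geneA": 200, "geneB": 400, "geneC": 140}
control_b = {"geneA": 250, "geneB": 380, "geneC": 160}

example_calls = consistent_changes([patient_a, patient_b], [control_a, control_b])
print(example_calls)
```

Requiring agreement across all four pairings discards genes such as the hypothetical geneC, which passes the threshold in some comparisons but not in others, and so reduces calls that arise from poor reproducibility of a single hybridization.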
Molecular Immunology 2000: Transcriptional Profiling of Regulatory Vα24JαQ T cells from Identical
Twins Discordant for Type I Diabetes, and a New Mechanism for Regulation of the Immune Response
S. Brian Wilson, Michael C. Byrne and Jack L. Strominger
Dana Farber Cancer Institute, Boston MA 02115, Genetics Institute, Cambridge MA 02140, and Department of
Molecular and Cellular Biology, Harvard University, Cambridge MA 02138, USA
In a study of identical twins discordant for Type I diabetes, the diabetic probands were observed to have a
marked deficiency in the number of regulatory Vα24JαQ T cells as well as a defect in secretion of IL-4 from
these T cells (1). The question whether the effect on IL-4 secretion was unique was examined by transcriptional
profiling of the Vα24JαQ T cells from a diabetic proband and from her identical non-diabetic twin. Since 226
transcripts were altered by activation of the T cells from the normal twin and only 86 in the clone derived from
the diabetic twin, the deficit in IL-4 secretion was not unique. Furthermore, the observed transcriptional profiles
strongly suggested a role for these regulatory T cells in the recruitment and activation of cells in the myeloid
lineage (2). Stimulation of Vα24JαQ T cells through their T cell receptor resulted in the activation of a number of
transcripts important for recruitment and differentiation of myeloid cells (i.e. MIP-1α, GM-CSF, IL-4) and several
involved in cytolysis (perforin, granzymes, and granulysin). Moreover, myeloid dendritic cells (DC) were found to
express CD1d, the ligand for the invariant TCR of Vα24JαQ T cells. Myeloid dendritic cells both activated
Vα24JαQ T cells, and were susceptible to lysis by these same regulatory T cells in a CD1d-restricted fashion.
Since myeloid dendritic cells are a major source of IL-12 that is required for Th1 cell differentiation, their
elimination by lysis is a mechanism for limiting the generation of Th1 cells and thus regulating the balance of
Th1 and Th2 responses (3).
(1) Wilson S.B., Kent S.C., Patton K.T., Orban T., Jackson R.A., Exley M., Porcelli S., Schatz D.A., Atkinson
M.A., Balk S.P., Strominger J.L., Hafler D.A. Extreme Th1 bias of invariant Valpha24JalphaQ T cells in type 1
diabetes. Nature 391:177-181 (1998).
(2) Wilson S.B., Kent S.C., Horton H.F., Hill A.A., Bollyky P.L., Hafler D.A., Strominger J.L., Byrne M.C.
Multiple differences in gene expression in regulatory Vα24JαQ T cells from identical twins discordant for type I
diabetes. Proc. Natl. Acad. Sci. 97:7411-7416 (2000).
(3) Yang O.O., Racke F.K., Nguyen P.T., Gausling R., Severino M.E., Horton H.F., Byrne M.C., Strominger J.L.,
Wilson S.B. CD1d on myeloid dendritic cells stimulates cytokine secretion from and cytolytic activity of
Valpha24JalphaQ T cells: a feedback mechanism for immune regulation. J. Immunol. 165:3756-3762 (2000).
A new highly sensitive microarray approach for differential screening using radioactive probes
S. Dumas, T. Vujasinovic, H. Salin, S. Maitrejean (1), C. Menini, and J. Mallet
LGN, UMR 9923, CNRS, Hôpital Pitié Salpêtrière, 75013 Paris, France
(1) Biospace Mesures, 10 rue Mercoeur, 75011 Paris, France
We have developed experimental procedures and signal filtering algorithms making radioactive labelling highly
suitable for gene expression screenings on microarrays. These procedures make it possible to perform
simultaneous hybridisation of two differently-labelled radioactive probes on a given microarray. This approach
gives the highest sensitivity currently available in signal detection (compared with fluorescent labelling, the gain
in signal detection sensitivity is >10^2). It allows expression profiling experiments using sub-microgram
amounts of unamplified messenger RNA from small biological samples. We were able to detect very low-expressed
mRNAs (<1 mRNA copy / 10^5 mRNA molecules), even when starting from very low amounts of
sample (100 ng of poly-A RNA, corresponding to approximately 5 mg of tissue per hybridisation experiment).
We show that 3H-labelling is fully detected on glass-support microarrays. In the present state of art,
simultaneous hybridisation procedures can be performed by comparing 3H with either 33P or 35S (or 32P). The
5-µm pixel size of the MicroImager (Biospace Mesures, Paris, France) is satisfactory for microarray analysis.
About 10,000 spots could be analysed on a given array with radioactive labelling. Considering the high absolute
sensitivity in signal detection and the low background of this technique, it should theoretically make it possible to
detect reproducibly less than 2-fold gene expression modulations of low-expressed genes.
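To give a feel for the quoted sensitivity, the following back-of-the-envelope sketch estimates how many copies of a transcript present at one copy per 100,000 mRNA molecules are contained in 100 ng of poly-A RNA. The average mRNA length (2,000 nucleotides) and the average mass per nucleotide (330 g/mol) are assumptions for illustration and are not taken from the abstract.

```python
AVOGADRO = 6.022e23  # molecules per mole


def molecules_in_sample(mass_ng, mean_length_nt=2000, g_per_mol_per_nt=330.0):
    """Estimate the number of mRNA molecules in a given mass of poly-A RNA.

    mean_length_nt and g_per_mol_per_nt are assumed average values.
    """
    mass_g = mass_ng * 1e-9
    molar_mass = mean_length_nt * g_per_mol_per_nt  # g/mol for an average mRNA
    return mass_g / molar_mass * AVOGADRO


total_molecules = molecules_in_sample(100)   # 100 ng of poly-A RNA
rare_copies = total_molecules / 1e5          # transcript at 1 copy per 10^5
print(f"total mRNA molecules: {total_molecules:.2e}")
print(f"copies of the rare transcript: {rare_copies:.2e}")
```

Under these assumptions the sample holds roughly 9 x 10^10 mRNA molecules, so a transcript at that abundance is represented by somewhat less than a million copies per hybridisation.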
High Throughput SNP scoring using Rolling Circle Amplification
Tony Smith
Amersham Pharmacia Biotech, Amersham Laboratories, White Lion Road, Amersham, Buckinghamshire, UK
The increasing availability of a dense map of single nucleotide polymorphism (SNP) markers makes possible
genome-wide scans for genotype-phenotype associations. The density of markers required is the subject of
considerable discussion, but it is clear that such experiments will require SNP scoring technology capable of
very high throughput and therefore issues of cost per assay and ease of automation are important.
SNiPer is an SNP scoring system that has been developed to address these issues. It is based on allele-specific
amplification of polymorphic loci directly from genomic DNA. The technology combines ligation of SNP-specific
open circle probes and Rolling Circle Amplification in a single tube process (1). Detection is carried out
using two generic primers labelled with FRET dye pairs in a homogeneous microtitre plate format. Results will
be presented demonstrating the accuracy and sensitivity of this system.
(1) Lizardi, P.M. et al., Nature Genetics 19, pp. 225-231, 1999.
Populations, SNPs and Chips in Common Disease Mapping
Andres Metspalu
Institute of Molecular and Cell Biology, University of Tartu, Estonian Biocenter, Estonia and International Agency
for Research on Cancer, Lyon, France
Many common diseases like cardiovascular disease, diabetes, cancer, asthma etc. are the result of a complex
interaction between environmental factors and susceptibility alleles of multiple genes. Traditional linkage
analysis is not suitable for identifying these alleles (all are potential drug targets or drug candidates) because
they do not segregate in a Mendelian fashion and because it can find only alleles with substantial effects on disease
predisposition. Therefore alternative strategies are needed. One way to succeed is to use genome-wide scans of
cases and controls and perform association studies based on linkage disequilibrium. The questions then are: what
population should be used for the study? What is the sample size? How many and what type of SNP markers do we need?
What is the best genotyping technology with regard to throughput, fidelity and cost? We propose the Estonian
population for large-scale association studies using high-density SNP mapping with 60 to 100 thousand SNP
markers. Before genotyping, the disease status (phenotype) of individuals voluntarily participating in the project
will be recorded. We hope to collect up to one million phenotypes, 75% of the population (www.genomics.ee).
We have developed a single nucleotide polymorphism scoring system for high throughput SNP analysis. The
method is based upon an array of oligonucleotides immobilized via a 5’-end amino linker on an amino-coated
glass slide surface. Oligonucleotides are selected from the sense and antisense genomic sequence so that their
3’-ends are one base pair upstream of the SNP. A dsPCR product containing the SNP is used as a template.
This dsPCR product is fragmented with UNG and treated with AP to inactivate the dNTPs before primer
extension with fluorescent ddNTPs. A four channel imaging system (Asper FD-003) has been developed
consisting of a total internal reflection fluorescence excitation mechanism combined with a high-resolution CCD. A
software package (Genorama 3.0) has been developed for SNP scoring (Kurg et al. 2000, Genetic Testing 4, 17). Oligonucleotide design and quality, DNA polymerase, dye terminators, template DNA quality and special
software tools are all critical for optimal results. Once developed for each specific set of oligonucleotides,
these SNP chips work reliably and are competitive with other SNP scoring platforms. This advantage becomes
even more pronounced as the number of SNPs to be analyzed increases from hundreds to many thousands.
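The probe design rule described above (oligonucleotides taken from both strands, with the 3'-end stopping one base short of the SNP) can be sketched as follows. This is an illustrative sketch only: the function names, the 5-base primer length used in the example and the example sequence are hypothetical, and the abstract does not specify the primer length actually used.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def design_extension_primers(sequence, snp_index, primer_length):
    """Return (sense_primer, antisense_primer), both written 5'->3'.

    sequence is the sense strand; snp_index is the 0-based position of the
    SNP. Each primer ends immediately next to the SNP, so that a single
    labelled ddNTP added by the polymerase reports the variable base.
    """
    if snp_index < primer_length or snp_index + primer_length >= len(sequence):
        raise ValueError("not enough flanking sequence for this primer length")
    sense = sequence[snp_index - primer_length:snp_index]
    downstream = sequence[snp_index + 1:snp_index + 1 + primer_length]
    antisense = reverse_complement(downstream)
    return sense, antisense


def expected_terminators(sense_allele):
    """Return the ddNTP expected on the (sense, antisense) primers for an allele."""
    return sense_allele, sense_allele.translate(COMPLEMENT)


# Hypothetical 21-base sense-strand fragment with the SNP (here G) at index 10.
example_sequence = "ACGTACGTAC" + "G" + "TTGCATGCAA"
example_primers = design_extension_primers(example_sequence, 10, primer_length=5)
print(example_primers, expected_terminators("G"))
```

Reading the same SNP from both strands gives two independent signals per allele, which is one reason to place primers on the sense and antisense sequence.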
SESSION 7: FUTURE PERSPECTIVES
Gene Identification Projects at TIGEM
Giuseppe Borsani
Tigem - Telethon Institute of Genetics and Medicine, Via Pietro Castellino 111, 80131 Napoli, Italy
The mission of our Institute is to study human inherited diseases. Since its beginning, Tigem's strengths have
been in the field of disease gene identification and in the characterization of the molecular defects underlying
human genetic diseases. A number of disease genes have in fact been identified by Tigem researchers during
the last three years: Opitz syndrome (Quaderi et al., Nature Genetics, 1997), hereditary spastic paraplegia
(Casari et al., Cell, 1998), lysinuric protein intolerance (Borsani et al., Nature Genetics, 1999), non-type I
cystinuria (Feliubadalo et al., Nature Genetics, 1999), and mucolipidosis type IV (Bassi et al., AJHG, 2000).
Tigem researchers have also been involved in systematic (“genome-wide”) projects with an emphasis on the
experimental approach, rather than on the biological problem. These projects have taken advantage of the
presence of core facilities, such as Sequencing, Bioinformatics, and cDNA Library Screening, among others.
One of the most successful of this type was the Drosophila Related Expressed Sequences (DRES) project,
which led to the identification of many novel human genes homologous to Drosophila mutant genes (Banfi et al.,
Nature Genetics, 1996). This effort generated several additional, more focused projects on candidate
disease genes. Another important systematic gene identification project in our laboratory is represented by the
analysis of the distal short arm of the X chromosome (Xp22), a study started more than ten years ago.
We believe that a mixture of systematic and focused projects is of great advantage. Systematic approaches
allow researchers to benefit from the resources generated by the Human Genome Project. In addition, they
generate novel ideas and tools for more “in-depth” projects. On the other hand, only the latter type of projects
will provide us with detailed and reliable information on gene function and dysfunction.
The Human Genome, Transcriptome Analysis, Medicine and Cancer
Gert-Jan van Ommen
Leiden University, Leiden, The Netherlands
Our past and future projects in genomics entail YAC and MAC (re)construction and transgenics, FISH
development, high-throughput robotics and DNA-chip expression array research, in the context of the Leiden
Genome Technology Center (GTC) and in close collaboration with the LUMC department of Molecular and
Cellular Biology.
Our cancer genetics program aims to further insights into the cell biology and etiology of common cancers and to
improve diagnosis and prevention. Breast, colon and skin cancer are studied at all levels, from clinical-epidemiological
and cellular studies to mouse models. Together with the LUMC departments of Pathology,
Gastro-Enterology and Dermatology, we have made widely recognised contributions to the diagnosis,
epidemiology and genotype-phenotype correlation in breast cancer (BRCA1 and BRCA2), colorectal cancer
(FAP and HNPCC) and melanoma (p16/CDK2). The mouse colorectal cancer genetics program focuses on the
control by the APC/ß-catenin signalling pathway of cell adhesion, migration and differentiation in development
and tumorigenesis. Ongoing research: From the study of knockout mice with different levels of truncated APC,
and crosses with repair gene knockouts, we conclude that the remaining capacity of APC to downregulate ß-catenin
is a key factor in tumorigenesis. Also for BRCA1 we have recently established a mouse model for
further mechanistic study. Our current melanoma work shows a strong risk-modifying effect of melanocortin
receptor-1 polymorphism, which also affects skin type. Future work in cancer genetics aims at risk- and
phenotype-modifying factors, using epidemiological and animal studies.
Acute Promyelocytic Leukemia: A Model for Gene Transcriptional Regulation-based Therapy
Sai-Juan Chen, Qing-Hua Zhang, Zhen-Yi Wang, Zhu Chen
Shanghai Institute of Hematology, Rui Jin Hospital affiliated to Shanghai Second Medical University, 197 Rui Jin
Road II, Shanghai 200025, China
Acute promyelocytic leukemia (APL) is characterized by the specific chromosomal translocation t(15;17), which
fuses the RARα gene to the PML gene in the great majority of patients, and by the variant translocations t(11;17)(q23;q21),
t(5;17) and t(11;17)(q13;q21), resulting in PLZF-RARα, NPM-RARα and NuMA-RARα fusion genes, in a small
subset of patients. Current data suggest that PML-RARα and other fusion receptors play a key role in APL
pathogenesis through antagonizing the retinoic acid (RA) signalling and the regulatory pathways mediated by
fusion partners, as well as decreasing the sensitivity to RA in the receptor's interaction with the nuclear corepressor
complex. The leukemogenic effects of the fusion genes were confirmed in transgenic animals by several groups
including our own. APL is the first human cancer which responds to the differentiation-inducing effect of all-trans
RA (ATRA) and arsenic trioxide. The therapeutic effect of ATRA has been associated with the direct modulation
of PML-RARα and its interaction with corepressor, the restoration of the wild-type RAR/RXR regulatory pathway
and the regulation of the transcriptional expression of genes downstream of RAR/RXR. Since transcriptional
regulation represents the link between APL pathogenesis and ATRA differentiation therapy, analyzing gene
expression patterns in APL cells before and after ATRA treatment is a useful approach to identify genes whose
functions are involved in this new cancer treatment. Using the APL cell line NB4 as an in vitro model and three
techniques for gene expression scanning, namely cDNA array, differential display-PCR and suppression
subtractive hybridization, we have recognized 169 genes including eight novel ones to be modulated by ATRA.
A chronologically well-coordinated regulation of these genes seems to constitute a balanced functional network
governing the decreased cellular proliferation ability, the initiation and progression of maturation, and the
maintenance of cell survival before terminal differentiation. Accordingly, several signal pathways such as MAPK,
cAMP/PKA, interferon/STAT and AP-1 are implicated in the choreography. Cycloheximide inhibition test
revealed that the transcriptional regulation of 8 induced and 24 repressed genes appeared to be protein
synthesis-independent. By comparing the expression pattern of NB4 cells with that of NB4-R1, an
ATRA-resistant subclone which can be reinduced into differentiation in the presence of cAMP, we identified a
group of genes which are in fact responsive to cAMP. This suggests that ATRA-triggered maturation requires
cooperation between the retinoic acid pathway and cytosolic signaling. Recently, we also found that arsenic
trioxide can modulate the PML-RARα oncoprotein and induce its degradation. Moreover, arsenic trioxide is able
to regulate the transcriptional expression of a number of genes important for the control of cell differentiation
and apoptosis. The effect of this drug on cellular gene expression profiles seems to be related to its ability to
modulate the histone acetylation status. Taken together, our work suggests that ATRA- and arsenic trioxide-induced differentiation of APL cells represents a new model in cancer therapy through targeting the cellular
machinery in regulating gene expression. This concept may be extended to other human cancers with the better
understanding of the human genome, transcriptome and proteome.
Transcriptional Regulation of Cell Cycle Regulatory and Apoptosis Genes by DNA Damage Induced by
Camptothecin: Microarray Analysis of Dose- and Time-Dependent Effects
Yves Pommier (1), Yi Zhou (1), William C. Reinhold (1), Lance Miller (2), Fuad G. Gwadry (1), Lawrence H. Smith (1), E. Liu (2),
Kurt W. Kohn (1), John N. Weinstein (1*)
(1) Laboratory of Molecular Pharmacology, Division of Basic Sciences, Building 37/5D-02, National Cancer
Institute (NCI), National Institutes of Health (NIH), Bethesda, Maryland, USA. (2) National Cancer Institute (NCI)
Microarray Facility, Advanced Technology Center, Gaithersburg, Maryland, USA.
cDNA microarray technology holds the promise of genome-wide gene expression profiling. Even at its current
level, this technology allows us to establish associations between characteristic gene expression patterns and
molecular responses to therapy. In this study, we used cDNA microarrays of 1,694 genes of interest in cancer to
monitor the gene expression consequences of the treatment of HCT116 human colon cancer cells with the
topoisomerase I inhibitor camptothecin (CPT). We did so as a function of time and concentration because
otherwise one would be likely to miss or misunderstand major portions of the downstream molecular
consequences of the treatment. CPT generates double strand DNA breaks during DNA replication and delays or
arrests cell cycle progression. To obtain a homogenous cellular response, we used aphidicolin to synchronize
the cells in S-phase prior to CPT treatment. Treatment with 20 nM CPT caused reversible (temporary) G2 delay,
whereas treatment with 1000 nM CPT caused irreversible (permanent and lethal) G2 arrest. Thirty-three genes,
divided into 3 groups, showed characteristic changes as a consequence of treatment. Group I genes consisted
of mitosis-related genes, including cyclin B1 and centrosome-related genes that were upregulated after 20 nM
treatment during the extended G2/M transition, whereas they were decreased during permanent G2 arrest at
high CPT concentration. In contrast, group III genes included a group of p53-activated stress response genes,
including p21, 14-3-3, Fas, and wip1, which were up-regulated after 1000 nM CPT treatment but remained
unchanged in the cells treated with 20 nM CPT. Group II genes, many of them involved in cellular metabolism,
were downregulated during the cell cycle delay of cells treated with 20 nM CPT. These findings suggest that
DNA damage can disrupt cell cycle-regulated gene expression. The gene expression changes identified in this
work reveal remarkably coordinated DNA-damage responses, apoptosis, and cell cycle events at the
transcription level.
SESSION 8: ETHICAL, LEGAL AND ECONOMICAL ISSUES
Patentability of Life and Ethics
Noëlle Lenoir
Conseil Constitutionnel, Paris, France
The title of my talk illustrates the originality of the European approach to patenting.
For Europeans, intellectual property has been approached for a long time not only from a technical but also a
« moral » point of view. Exclusion from patenting of inventions which are considered « contrary to ordre public
and good morality » was already part of the first European treaties on patenting in the 19th century. The same
approach is today transcribed in the more modern form of ethics in the 1998 European directive on the legal
protection of biotechnology inventions, which prohibits for example the patenting of inventions related to
procedures for human cloning or those including the commercial and industrial use of human embryos. How can
we ensure the effective implementation of the very precise prescriptions of the 1998 directive in the context of
the accelerated evolution of research based on human body parts (genes, proteins, cells, particularly embryonic
stem cells) ?
Another difficulty arises: how to reconcile the traditional principle of common as well as civil law of « non
commercialization of the human body » together with systems designed to ensure the profitable exploitation of
the results of research on human living matter. Thus, some argue in Europe that commercialization of genetic
sequences by private companies involved in sequencing the human genome worldwide constitutes an infringement of
the non commercialization of the human body. Under this assumption, should we consider sharing the profits
derived from these research activities or their consequences, if not with individuals, at least with patient
organizations ? The example of Iceland which has licensed through a specific law its genetics data to DeCode
should stimulate our thinking.
As fundamental as the first two topics is the protection of privacy for persons whose genetic data or biological
samples are stored and used in research and industry. This is a question which is not yet directly addressed,
but which must be studied urgently at an international level, i.e. at least through a concerted action between the
United States, Europe and Japan.
Biology and human genetics, as a whole, are without borders. Thus all these questions must be addressed at
this international level.
Patenting Genome Research Tools and the Law
Rebecca Eisenberg
University of Michigan Law School, Ann Arbor, MI, USA
Over the past 15 years, a number of legal and commercial developments have converged to make intellectual
property issues particularly salient in biomedical research.
A series of judicial and administrative decisions has expanded the categories of patentable subject matter in the
life sciences. For many years it appeared that patents on living subject matter would violate the longstanding
principle that one may not patent products or phenomena of nature. But in 1980 the US Supreme Court held in
the case of Diamond v. Chakrabarty that a living, genetically altered organism may qualify for patent protection
as a new manufacture or composition of matter under Section 101 of the US Patent Code. Characterizing
Chakrabarty's invention as « a new bacterium with markedly different characteristics from any found in nature »
and « not nature's handiwork, but his own », the Court indicated that Congress intended the patent laws to
cover « anything under the sun that is made by man ». With this broad directive from the Supreme Court, the
US Patent and Trademark Office (PTO) expanded the categories of living subject matter that it considered
eligible for patent protection to include plants and animals.
During the same time period, the explosion of commercial interest in the field, and the concomitant emergence
of commercial biotechnology companies, have amplified the importance of intellectual property in the biomedical
sciences. Many biotechnology firms have found a market niche somewhere between the fundamental research
that typifies the work of university and government laboratories and the end product development that occurs in
more established commercial firms. To survive financially in this niche, biotechnology firms need intellectual
property rights in discoveries that arise considerably upstream from commercial product markets. This creates
pressure to patent discoveries that are closer to the work of research scientists than to ultimate consumer
products.
Another contemporaneous development that has contributed to the prevalence of intellectual property in
biomedical research is the passage of the Bayh-Dole Act and the Stevenson-Wydler Act in 1980, and a series of
subsequent acts that refine those statutes and expand their reach. These statutes encourage research
institutions to patent discoveries made in the course of government-sponsored research. For some institutions
involved in health-related research, this represented a 180° shift in policy. A generation ago, the prevailing
wisdom was that the best way to assure full utilization of publicly-sponsored research results for the public good
was to make them freely available to the public. Today, federal policy reflects the opposite assumption. The
current belief is that if research results are made widely available to anyone who wants them, they will languish
in government and university archives, unable to generate commercial interest in picking up where the
government leaves off and using the results to develop commercial products. To make government-sponsored
research discoveries attractive candidates for commercial development, institutions performing the research are
encouraged to obtain patents and to offer licenses to the private sector. As a result, institutions that perform
fundamental research have an incentive to patent the sorts of early stage discoveries that in an earlier era
would have been dedicated to the public domain. A big part of the resulting increase in patenting activity among
public sector research institutions has been in the life sciences.
Taken together, these factors have created a research environment in which early stage discoveries are
increasingly likely to be patented, and access to patented discoveries is increasingly likely to be significant to
the ongoing work of research laboratories.
I will address how these changes apply specifically to the field of genome research by discussing: (i) patents as
a strategy for protection of intellectual property; (ii) the benefits and costs of patents; (iii) requirements for patent
protection; (iv) the significance of the experimental use exemption.
Legal Problems Related to Gene Patents
Joseph Straus
Max-Planck Institute for Foreign and International Patent, Copyright and Competition Law, Munich, Germany
The impact of patents on researchers' work, differences in propensity toward patenting. Key legal issues:
Eligibility for patent protection - DNA sequences a discovery or an invention? Patentability requirements and
scope of protection - dependency. TRIPS Agreement as mandatory international standard and the EU Directive
on the legal protection of biotechnological inventions. From Erythropoietin (EPO) patent to the US Patent on
Human Kinase Homologs - a retrospect on patent granting practice of the US Patent Office and the European
Patent Office. HUGO's position on patenting in genomics (1992-2000) in view of the Blair-Clinton Statement.
The current response of the US and EU authorities to patent applications for, inter alia, Expressed Sequence
Tags (ESTs) and Single Nucleotide Polymorphisms (SNPs). Effects and Scope of Protection of Gene Patents
and the European attempt to ease the dependency problem. Is there a need for further legislative action?
From Functional Genomics to Integrated Economy in Biotechnology
Bernard Pau
Director CNRS UMR 5094, Institute of Biotechnology and Pharmacology, Montpellier, France
Functional genomics is a key piece in the puzzle of the life sciences and technologies. It emerges as a « golden
gate » between genetics (from molecular dissection to a view of genome organisation) and physiology (from
dynamics to the specialisation of organs and cells). It is thus no longer possible to distinguish, at every step
along this sophisticated path to discovery, between pure knowledge acquisition and fully dedicated application:
fundamental research in the life sciences is today closer than ever to medical consequences. Drug discovery and
the development of « blockbusters », for instance, depend fully on the quality of integration that pharmaceutical
biotechnology can achieve from basic research to clinical trials. Functional genomics is clearly a key player
here, taking a major part in the exploration of pathologies (e.g. identification of effector genes in invasive
infectious agents), pharmacology at the signalling level (e.g. characterization of the metabolic pathways
involved in the drug-resistance phenotype), and developments in animal and human toxicology (e.g.
toxicogenomics at an early stage of clinical trials).
This integration also deeply concerns the quality of the partnerships that public research can set up with
industry, including small and medium-sized enterprises. It will have a profound impact on their intellectual
property strategy and strength, on their economic capacity (in terms of return on investment) and even on their
position in the field of knowledge acquisition. France has clearly identified this challenge and decided to make
genomics a national priority in its innovation policy.
Closing Address
Geneviève Berger
Director General of the CNRS, Paris, France
Geneviève Berger has been appointed Director-General of the CNRS.
Geneviève Berger was born in 1955 in Moselle, France. From an early age her education followed an
interdisciplinary path. In 1974 she entered the "École Normale Supérieure" of Cachan, where she graduated
after having passed the « agrégation », the highest competitive examination for teachers in France, in the
Physical Sciences. While there, she developed a deep interest in applied physics. She went on to complete her
studies by writing a doctoral thesis in the physical sciences and earning a doctorate in medicine. She also
completed a specialization in nuclear medicine and the application of radioelements to medicine and biology. In
addition, she wrote a doctoral thesis in human biology and was given the authorization to supervise research.
Greatly attached to a multidisciplinary approach, in 1991 Geneviève Berger founded an interdisciplinary
laboratory, the Parametric Imaging Laboratory. It was set up in the teaching hospital center « Broussais Hôtel-Dieu » (Paris), where she was appointed professor and interim head of the department of biophysics and
nuclear medicine. This laboratory conducts research on medical imaging and the medical and biological
applications of ultrasound.
Geneviève Berger helped give this laboratory a strong bias towards technology transfer as an extension of basic
research. Her first success was the design and international marketing of the world's first ultrasound bone
imaging device, used mainly for osteoporosis. The laboratory has also investigated the use of ultrasound for
very high resolution, non-destructive visualization and characterization of ocular structures, cartilage, and even
arteries. Geneviève Berger has been invited to a number of countries to discuss her work, particularly the
United States. She received the CNRS Silver Medal for the Life Sciences in 1994 and the Yves Rocard Prize
from the French Physics Society in 1997.
In 1995, she was elected president of the « Treatment and Drugs: Design and Resources » section of the
National Committee for Scientific Research. She left this position in 1998 in order to head the Bio-Engineering
Department of the Directorate for Technology of the Ministry of National Education, Research and Technology.
In 1998, she was honored as Chevalier of the "Palmes académiques" (a decoration for service to education in
France) and as Chevalier of the Legion of Honor.
She was appointed Director of Technology at the Ministry of Research in January 2000.
ABSTRACT AUTHORS INDEX
Auffray, C. ………………………………………… 15
Bairoch, A. ………………………………………… 31
Bassingthwaighte, J.B. ………………………………………… 32
Berger, G. ………………………………………… 42
Bonaldo, M.F. ………………………………………… 16
Borsani, G. ………………………………………… 37
Brazma, A. ………………………………………… 27
Bucher, P. ………………………………………… 23
Bumgarner, R. ………………………………………… 24
Celis, J. ………………………………………… 30
Chen, Z. ………………………………………… 38
Clifton, S. ………………………………………… 17
Eickhoff, H. ………………………………………… 28
Eisenberg, R. ………………………………………… 40
Folta, P. ………………………………………… 20
Gaasterland, T. ………………………………………… 25
Gerstein, M. ………………………………………… 28
Gros, F. ………………………………………… 15
Hide, W. ………………………………………… 21
Hoffman, E. ………………………………………… 34
Horn-Saban, S. ………………………………………… 25
Imbeaud, S. ………………………………………… 25
Joyard, J. ………………………………………… 30
Kato, K. ………………………………………… 27
Kawai, J. ………………………………………… 19
Lancet, D. ………………………………………… 23
Lenoir, N. ………………………………………… 40
Lundeberg, J. ………………………………………… 30
Mallet, J. ………………………………………… 36
Margolin, J. ………………………………………… 19
Mayor, F. ………………………………………… 15
Metspalu, A. ………………………………………… 37
Nguyen, C. ………………………………………… 24
Ohara, O. ………………………………………… 17
Pau, B. ………………………………………… 42
Pollet, N. ………………………………………… 26
Pommier, Y. ………………………………………… 39
Poustka, A. ………………………………………… 33
Quackenbush, J. ………………………………………… 22
Saurin, W. ………………………………………… 22
Schneider, C. ………………………………………… 33
Simpson, A. ………………………………………… 20
Smith, T. ………………………………………… 36
Stodolsky, M. ………………………………………… 33
Straus, J. ………………………………………… 41
Strausberg, R. ………………………………………… 34
Strominger, J. ………………………………………… 35
Sugano, S. ………………………………………… 18
van Ommen, G.J. ………………………………………… 38
Wagner, L. ………………………………………… 21
Wiemann, S. ………………………………………… 18