abstracts book - Campus des laboratoires de Villejuif
TRANSCRIPTOME 2000
From Functional Genomics to Systems Biology
Paris, 6-9 November 2000 - Pasteur Institute

De la Génomique fonctionnelle à la Biologie des systèmes
Paris, 6-9 Novembre 2000 - Institut Pasteur

ABSTRACTS / RESUMES

Organizing Committee / Comité d'organisation
Charles Auffray (CNRS, France)
Bento Soares (University of Iowa, USA)
Sumio Sugano (University of Tokyo, Japan)

TRANSCRIPTOME 2000
FROM FUNCTIONAL GENOMICS TO SYSTEMS BIOLOGY
(DE LA GENOMIQUE FONCTIONNELLE A LA BIOLOGIE DES SYSTEMES)
NOVEMBER 6-9, 2000 - INSTITUT PASTEUR, PARIS, FRANCE

TABLE OF CONTENTS
General Information
Introduction
Scientific Program
Speakers Abstracts
Index

GENERAL INFORMATION (INFORMATION GENERALE)

SCIENTIFIC COMMITTEE (COMITE SCIENTIFIQUE)
Charles Auffray, Genexpress, CNRS, Villejuif, France
Bento Soares, University of Iowa, Iowa City, USA
Sumio Sugano, University of Tokyo, Tokyo, Japan

LOCAL ORGANIZING COMMITTEE (COMITE D'ORGANISATION LOCAL)
Genexpress, CNRS, Villejuif, France
Institut Pasteur, Paris, France

General Secretariat (Secrétariat Général): Odile Brasier, Flavie Brocher, Bénédicte Ecoutin
Web site and Poster (Site web et affiche): Bertrand Bed'hom
Scientific and Social Program (Programme scientifique et social): Charles Auffray, Sylvie Bortoli, Charles Decraene, Nicole Adeline Fayein, Sandrine Imbeaud, Betina Porcel-Setterblad, Béatrice Soury-Ségurens, Patrick Zaborski, Rima Zoorob, Philippe Glaser

ACKNOWLEDGMENTS (REMERCIEMENTS)
The organizers of the conference gratefully acknowledge the following institutions, charities and companies for their support:
Les organisateurs de la conférence remercient vivement les institutions, associations et sociétés suivantes pour leur soutien :

For their patronage (Pour leur patronage):
Le Ministre de la Recherche
L'Académie des Sciences
The European Union

For their support (Pour leur soutien):
Amersham Pharmacia Biotech
Association Française contre les Myopathies
Biospace Instruments et Mesures
Centre National de la Recherche Scientifique
Compaq
Department of Energy
InforMax
Institut National de la Santé et de la Recherche Médicale
Institut Pasteur
Ligue Nationale contre le Cancer
Ministère de l'Economie, des Finances, et de l'Industrie
Ministère de la Recherche
National Cancer Institute
Novartis

CHECK-IN (ENREGISTREMENT)
During the scientific sessions, the conference registration desk will be located in the hall of the C.I.S. (Centre d'Information Scientifique), and staff will be available to assist you with any requests or questions during session hours.
Pendant les séances scientifiques, le bureau d'enregistrement sera situé dans le hall du C.I.S. (Centre d'Information Scientifique) et le personnel sera disponible pour répondre à vos questions durant les sessions.

SESSIONS (SEANCES)
All sessions will be held in the C.I.S. Auditorium. A name badge is required for all sessions.
Toutes les séances se tiendront dans l'Auditorium du C.I.S. Un badge nominatif sera indispensable pour assister aux sessions.

EXHIBITS (STANDS)
All booths will be located close to the C.I.S. Auditorium.
Les stands seront situés près de l'Auditorium du C.I.S.

MEALS (REPAS)
As shown on the program schedule, lunches and coffee breaks are included with registration for conference participants, as is the « Wine and cheese » cocktail on Monday, November 6. Other dinners are not included, but the schedule allows ample time to have meals on your own. A Gala Evening will be held on Tuesday, November 7, aboard « Le Grand Pavois » for a dinner cruise, an unforgettable experience in the heart of Paris. Advance registration and additional payment were required for the gala evening, and gala evening tickets were distributed with conference materials at registration check-in.
Transportation for gala evening registrants will leave the Institut Pasteur promptly after the last talk on Tuesday evening. Due to space limitations, we are unable to accommodate guests at lunches or coffee breaks. Guests are not permitted access to the scientific sessions.
Comme indiqué dans le programme, les déjeuners et les pauses-café sont compris dans le montant des droits d'inscription, ainsi que le cocktail d'inauguration du lundi 6 novembre. Les autres dîners ne sont pas compris, mais le programme laisse le temps nécessaire pour dîner dans Paris selon le choix de chacun. Une soirée de gala est prévue le mardi 7 novembre à bord du « Grand Pavois » pour un dîner-croisière, un moment inoubliable au cœur de Paris. Les droits d'inscription de ce dîner ont été acquittés avant le congrès, et les tickets remis avec les autres documents lors de l'enregistrement. Le transport sera organisé pour la soirée de gala à 19h15 précises devant l'Institut Pasteur le mardi soir. En raison du nombre limité de places, nous ne sommes pas en mesure de prendre en charge les accompagnants lors des déjeuners ou pauses-café. Les accompagnants n'ont pas accès aux séances scientifiques.

INTRODUCTION
TRANSCRIPTOME 2000 is part of a series of conferences and coordination workshops initiated by the founders of the IMAGE Consortium (Charles Auffray, Greg Lennon, Mihael Polymeropoulos, Bento Soares) with active support from the DOE (Marvin Stodolsky) and numerous public and private organizations. TRANSCRIPTOME 2000 follows the conference organized last year in Japan by Nobuo Nomura and Michio Oishi at the Kazusa DNA Research Institute. Speakers from all over the world will discuss and debate the most recent advances in the emerging field of functional genomics: the study of biological systems based on global knowledge of genomes, transcriptomes and proteomes.
Transcription of DNA into RNA, followed by translation of messenger RNA into proteins, constitutes the fundamental mechanism underlying the functioning of living organisms. The discovery of reverse transcription of mRNA into DNA allowed the development of cDNA cloning, one of the fundamental techniques of genetic engineering, first described 25 years ago. Some of the pioneers who contributed to elucidating these mechanisms will present a historical overview of this great endeavour and their vision of the future.

***

TRANSCRIPTOME 2000 s'inscrit dans la série de conférences et de réunions de coordination amorcée par les fondateurs du Consortium IMAGE (Charles Auffray, Greg Lennon, Mihael Polymeropoulos, Bento Soares) avec le soutien actif du Department of Energy américain (Marvin Stodolsky) et de nombreuses organisations publiques et privées. TRANSCRIPTOME 2000 fait suite à la conférence organisée l'an dernier au Japon par Nobuo Nomura et Michio Oishi au Kazusa DNA Research Institute. Lors de TRANSCRIPTOME 2000, des conférenciers venus du monde entier discuteront et débattront des avancées les plus récentes du champ émergent de la génomique fonctionnelle, l'étude des systèmes biologiques fondée sur la connaissance globale des génomes, des transcriptomes et des protéomes. La transcription de l'ADN en ARN, suivie de la traduction des ARN messagers en protéines, sont les mécanismes fondamentaux qui sous-tendent le fonctionnement des êtres vivants. La découverte de la rétrotranscription d'ARN en ADN a permis le développement du clonage d'ADNc, l'une des techniques fondamentales du génie génétique, décrite pour la première fois il y a 25 ans. Certains des pionniers qui ont contribué à élucider ces mécanismes vous présenteront une rétrospective historique de cette grande aventure et leur vision du futur.
Charles Auffray, Genexpress, CNRS, Villejuif, France
Bento Soares, Université d'Iowa, Iowa City, USA
Sumio Sugano, Université de Tokyo, Tokyo, Japon

SCIENTIFIC PROGRAM (PROGRAMME SCIENTIFIQUE)

MONDAY, NOVEMBER 6, 2000 (LUNDI 6 NOVEMBRE 2000)

2:00 - 5:00 pm (14:00 - 17:00)
Reception and Registration of Participants (Accueil et inscription des participants)
Hall of C.I.S. Auditorium (Auditorium, Hall du C.I.S.)

5:00 - 7:00 pm (17:00 - 19:00)
Opening Session (Séance d'ouverture)

5:00 - 5:30 pm (17:00 - 17:30)
Welcome Addresses (Allocutions de bienvenue)
Philippe Kourilsky, Director General of Institut Pasteur, Paris, France
Francis Galibert, on Behalf of the Director of the Life Sciences Department (représentant la Directrice du Département des Sciences de la Vie), CNRS, Paris, France
Hervé Chneiweiss, on Behalf of the French Minister of Research (représentant le Ministre de la Recherche), Paris, France

5:30 - 7:00 pm (17:30 - 19:00)
Session 1: 25 Years of cDNA Research (1re séance : 25 ans de recherche avec les ADNc)
Chairperson (Modérateur): Federico Mayor, Fundacion para una Cultura de Paz, Madrid, Spain

5:30 - 6:00 pm (17:30 - 18:00)
The Human Genome, Health and Bioethics (Le génome humain, la santé et la bioéthique)
Federico Mayor, Fundacion para una Cultura de Paz, Madrid, Spain

6:00 - 6:30 pm (18:00 - 18:30)
From the « Messenger » Saga to the Transcriptome Era (De la saga du « messager » à l'ère du transcriptome)
François Gros, Académie des Sciences, Paris, France

6:30 - 7:00 pm (18:30 - 19:00)
TRANSCRIPTOME 2000: From Functional Genomics to Systems Biology (TRANSCRIPTOME 2000 : De la génomique fonctionnelle à la biologie des systèmes)
Charles Auffray, Genexpress, CNRS, Villejuif, France

7:00 - 8:30 pm (19:00 - 20:30)
Welcome Cocktail (Cocktail de bienvenue)

TUESDAY, NOVEMBER 7, 2000 (MARDI 7 NOVEMBRE 2000)

8:30 am - 1:00 pm (8:30 - 13:00)
Session 2: cDNA Cloning and Sequencing (2e séance : Clonage et séquençage d'ADNc)
Chairpersons (Modérateurs): Bento Soares, University of Iowa, Iowa City, USA, and Sumio Sugano, University of Tokyo, Tokyo, Japan

8:30 - 9:00 am
Novel Approaches for Gene Discovery and Selection of Full-Length cDNAs for the Mammalian Gene Collection Program (MGC) (Nouvelles approches pour la découverte de gènes et la sélection d'ADNc complets pour le programme « collection de gènes mammaliens » MGC)
Maria de Fatima Bonaldo, University of Iowa, Iowa City, IA, USA

9:00 - 9:30 am
Analysis of Newly Identified Human cDNAs Encoding Large Proteins: Integration of the Genomic and cDNA Sequence Data to Move Beyond the Identification of Transcribed Sequences (Analyse d'ADNc humains nouvellement identifiés encodant de grandes protéines : Intégration des données génomiques et des séquences d'ADNc pour dépasser le stade de l'identification des séquences transcrites)
Osamu Ohara, Kazusa DNA Research Institute, Kisarazu, Japan

9:30 - 10:00 am
Over 2.4 Million Expressed Sequence Tags (ESTs) and Counting... (Plus de 2,4 millions d'étiquettes de séquences exprimées, comptage en cours...)
Sandra Clifton, Washington University, St. Louis, MO, USA

10:00 - 10:30 am
Sequencing and Analysis of Full-Length cDNAs in the German cDNA Network (Séquençage et analyse d'ADNc complets dans le réseau allemand)
Stefan Wiemann, DKFZ, Heidelberg, Germany

10:30 - 11:00 am
Coffee Break (Pause café)

11:00 - 11:30 am
NEDO cDNA Sequencing Project (Le projet NEDO de séquençage d'ADNc)
Sumio Sugano, University of Tokyo, Tokyo, Japan

11:30 am - 12:00 pm
From Leukemia Patient to Full-Length cDNA Sequence (Du patient leucémique à la séquence complète d'ADNc)
Judy Margolin, Baylor College of Medicine, Houston, TX, USA

12:00 - 12:30 pm
RIKEN Mouse cDNA Encyclopedia Project (Le projet d'encyclopédie des ADNc de souris au RIKEN)
Jun Kawai, RIKEN, Tsukuba, Japan

12:30 - 1:00 pm (12:30 - 13:00)
Shotgun Sequencing the Human Transcriptome with Open Reading Frame ESTs (ORESTES) (Séquençage par mitraillage du transcriptome humain à l'aide d'étiquettes des cadres ouverts de lecture de séquences exprimées)
Andrew Simpson, Ludwig Institute for Cancer Research, Sao Paulo, Brazil

1:00 - 2:30 pm (13:00 - 14:30)
Lunch Buffet (Déjeuner buffet)

2:30 - 6:30 pm (14:30 - 18:30)
Session 3: cDNA Clustering and Genome Annotation (3e séance : Regroupement d'ADNc et annotation du génome)
Chairpersons (Modérateurs): Winston Hide, SANBI, Cape Town, South Africa, and Doron Lancet, Weizmann Institute, Rehovot, Israel

2:30 - 3:00 pm (14:30 - 15:00)
Clustering Enriches the I.M.A.G.E. Collection (Le regroupement enrichit la collection I.M.A.G.E.)
Peg Folta, Department of Energy, Livermore, CA, USA

3:00 - 3:30 pm (15:00 - 15:30)
An Alternate Transcription Map of Chromosome 22 Based on Verified Transcript Variation (Une carte transcriptionnelle alternative du chromosome 22 fondée sur la vérification de la variabilité des transcrits)
Winston Hide, SANBI, Cape Town, South Africa

3:30 - 4:00 pm (15:30 - 16:00)
UniGene, the Genome, and the Transcriptome (UniGene, le génome et le transcriptome)
Lukas Wagner, NCBI, Bethesda, MD, USA

4:00 - 4:30 pm (16:00 - 16:30)
The TIGR Gene Indices: Reconstruction and Annotation of Transcribed Sequences (Les index de TIGR : Reconstruction et annotation de séquences transcrites)
John Quackenbush, TIGR, Gaithersburg, MD, USA

4:30 - 5:00 pm (16:30 - 17:00)
Coffee Break (Pause café)

5:00 - 5:30 pm (17:00 - 17:30)
Searching for the Protein Coding Genes on the Human Genome Sequence (Recherche des gènes encodant des protéines dans la séquence du génome humain)
William Saurin, Génoscope, Evry, France

5:30 - 6:00 pm (17:30 - 18:00)
Harvesting the Human Genome: A World-Wide Endeavor (Moissonner le génome humain : Un effort international)
Doron Lancet, Weizmann Institute, Rehovot, Israel

6:00 - 6:30 pm (18:00 - 18:30)
Reconstructing the Human Transcriptome from the 3' End (Reconstruction du transcriptome humain à partir de l'extrémité 3')
Philipp Bucher, Swiss Institute of Bioinformatics, Lausanne, Switzerland

8:30 pm - midnight (20:30 - 24:00)
Gala Evening (Soirée de Gala)

WEDNESDAY, NOVEMBER 8, 2000 (MERCREDI 8 NOVEMBRE 2000)

8:30 am - 1:00 pm (8:30 - 13:00)
Session 4: Transcriptome Analysis (4e séance : L'analyse de transcriptomes)
Chairpersons (Modérateurs): Bertrand Jordan, Génopole, CNRS/INSERM, Marseille, France, and Roger Bumgarner, University of Washington, Seattle, USA

8:30 - 9:00 am
Gene Expression Profiling of Primary Breast Carcinomas Using Nylon Arrays of Candidate Genes (Profilage de l'expression génique de carcinomes primaires du sein en utilisant des réseaux de gènes candidats sur Nylon)
Catherine Nguyen, Génopole, CNRS/INSERM, Marseille, France

9:00 - 9:30 am
Statistical Analysis, Normalization and Reproducibility of Microarray Data (Analyse statistique, normalisation et reproductibilité des données collectées avec des microréseaux)
Roger Bumgarner, University of Washington, Seattle, WA, USA

9:30 - 10:00 am
DNA Array Applications in a Diverse Academic Setup (Applications des réseaux d'ADN dans un environnement académique divers)
Shirley Horn-Saban, Weizmann Institute, Rehovot, Israel

10:00 - 10:30 am
Explaining Gene Expression Clusters Through Integration of Genome Annotation and Microarray Data (Explication des regroupements de gènes exprimés par l'intégration de l'annotation du génome avec les données collectées avec des microréseaux)
Terry Gaasterland, Rockefeller University, New York, NY, USA

10:30 - 11:00 am
Coffee Break (Pause café)

11:00 - 11:30 am
Exploring Human Transcriptomes Using cDNA Macro and Microarray Technologies (Exploration de transcriptomes humains à l'aide des technologies de macro et de microréseaux d'ADNc)
Sandrine Imbeaud, Genexpress, CNRS, Villejuif, France

11:30 am - 12:00 pm
Analysis of Gene Expression in Xenopus Embryos Identifies Metabolic Pathways, Predicts Gene Function and Provides a Global View of Embryonic Patterning (L'analyse de l'expression des gènes chez les embryons de Xénope identifie des voies métaboliques, prédit la fonction des gènes et fournit une vue globale de la structuration de l'embryon)
Nicolas Pollet, DKFZ, Heidelberg, Germany

12:00 - 12:30 pm
Adapter-Tagged Competitive PCR and its Application to the Mammalian Central Nervous System (L'amplification par PCR compétitive à l'aide d'adaptateurs étiquetés et son application à l'étude du système nerveux central mammalien)
Kikuya Kato, Nara Institute of Science and Technology, Nara, Japan

12:30 - 1:00 pm (12:30 - 13:00)
Storing, Managing and Analyzing Microarray Data (Stockage, gestion et analyse des données collectées avec des microréseaux)
Alvis Brazma, European Bioinformatics Institute, Hinxton, UK

1:00 - 2:30 pm (13:00 - 14:30)
Lunch Buffet (Déjeuner buffet)

2:30 - 7:00 pm (14:30 - 19:00)
Session 5: Transcriptomes, Proteomes and Systems Biology (5e séance : Transcriptomes, protéomes et biologie des systèmes)
Chairperson (Modérateur): Margaret Buckingham, Institut Pasteur, Paris, France

2:30 - 3:00 pm (14:30 - 15:00)
Analysis of Genomes and Transcriptomes in Terms of the Occurrence of Protein Parts and Features (Analyse de génomes et de transcriptomes en termes d'occurrence des caractéristiques et des éléments des protéines)
Mark Gerstein, Yale University, New Haven, CT, USA

3:00 - 3:30 pm (15:00 - 15:30)
Bridging Genomics with Proteomics: DNA and Protein Analysis on Arrays (Relier la génomique à la protéomique : analyse d'ADN et de protéines sur des réseaux)
Holger Eickhoff, Max-Planck Institute, Berlin, Germany

3:30 - 4:00 pm (15:30 - 16:00)
Proteomics and the Challenge of Hydrophobic Membrane Proteins: the Example of Chloroplast Membranes (La protéomique et le défi des protéines de membrane hydrophobes : l'exemple des membranes des chloroplastes)
Jacques Joyard, Génopole, CEA, Grenoble, France

4:00 - 4:30 pm (16:00 - 16:30)
Tools for Functional Genomics Using Transcript Profiles and Proteomics (Outils pour la génomique fonctionnelle utilisant les profils d'expression et la protéomique)
Joakim Lundeberg, Royal Institute of Technology, Stockholm, Sweden

4:30 - 5:00 pm (16:30 - 17:00)
Coffee Break (Pause café)

5:00 - 5:30 pm (17:00 - 17:30)
Proteomic Strategies in Cancer (Stratégies protéomiques en cancérologie)
Julio Celis, University of Aarhus, Denmark

5:30 - 6:00 pm (17:30 - 18:00)
Proteomics Databases (Bases de données protéomiques)
Amos Bairoch, Swiss Institute of Bioinformatics, Geneva, Switzerland

6:00 - 6:30 pm (18:00 - 18:30)
The Physiome Project: Integrating from Genomics to Function or Vice Versa (Le projet physiome : Intégration de la génomique à la fonction ou vice versa)
James B. Bassingthwaighte, University of Washington, Seattle, WA, USA

6:30 - 7:00 pm (18:30 - 19:00)
DOE Genome Scale Expression Efforts (Les projets d'étude d'expression à l'échelle du génome au DOE)
Marvin Stodolsky, Department of Energy, Germantown, MD, USA

THURSDAY, NOVEMBER 9, 2000 (JEUDI 9 NOVEMBRE 2000)

8:30 am - 1:00 pm (8:30 - 13:00)
Session 6: Applications in Biology, Biotechnology and Medicine (6e séance : Applications en biologie, biotechnologie et médecine)
Chairpersons (Modérateurs): Michel Caboche, Génoplante, INRA, Evry, France, and Greg Lennon, VeraGene, Potomac, MD, USA

8:30 - 9:00 am
Production and Quality Assessment of Full-Length-Enriched cDNA Libraries and Their Use in Transcriptome Profiling Using Microarrays (Production et contrôle-qualité de banques d'ADNc enrichies en clones complets et leur utilisation pour le profilage de transcriptomes avec des microréseaux)
Claudio Schneider, Laboratorio Nazionale CIB, Trieste, Italy

9:00 - 9:30 am
Gene Expression Profiling of 3 Solid Tumors (Profilage de l'expression génique de 3 tumeurs solides)
Annemarie Poustka, DKFZ, Heidelberg, Germany

9:30 - 10:00 am
The Cancer Genome Anatomy Project (CGAP) and the Mammalian Gene Collection (MGC): cDNA Resources for the Community (Le programme « anatomie des gènes et cancer » (CGAP) et la collection de gènes mammaliens (MGC) : Ressources d'ADNc pour la communauté)
Robert Strausberg, National Cancer Institute, Bethesda, MD, USA

10:00 - 10:30 am
Expression Profiling in Patient Tissue for Insights into Etiology and Pathophysiology of Progressive Disease (Le profilage d'expression dans les tissus de patients pour obtenir un aperçu de l'étiologie et de la physiopathologie des maladies progressives)
Eric Hoffman, George Washington University, Washington, DC, USA

10:30 - 11:00 am
Coffee Break (Pause café)

11:00 - 11:30 am
Molecular Immunology 2000: Transcriptional Profiling of Regulatory Vα24JαQ T Cells from Identical Twins Discordant for Type I Diabetes, and a New Mechanism for Regulation of the Immune Response (Immunologie moléculaire 2000 : profilage transcriptionnel de cellules T régulatrices Vα24JαQ de jumeaux identiques discordants pour le diabète de type 1, et un nouveau mécanisme pour la régulation de la réponse immune)
Jack Strominger, Harvard University, Cambridge, MA, USA

11:30 am - 12:00 pm
A New Highly Sensitive Microarray Approach for Differential Screening Using Radioactive Probes (Une nouvelle approche hautement sensible utilisant des microréseaux pour le criblage différentiel à l'aide de sondes radioactives)
Jacques Mallet, CNRS, Paris, France

12:00 - 12:30 pm
High-Throughput SNP Scoring Using Rolling Circle Amplification (Enregistrement de SNP à haut débit en utilisant l'amplification par cercle déroulant)
Tony Smith, Amersham Pharmacia Biotech, Amersham, Buckinghamshire, UK

12:30 - 1:00 pm (12:30 - 13:00)
Populations, SNPs and Chips in Common Disease Mapping (Populations, SNP et biopuces pour la cartographie des maladies communes)
Andres Metspalu, University of Tartu, Tartu, Estonia

1:00 - 2:30 pm (13:00 - 14:30)
Lunch Buffet (Déjeuner buffet)

2:30 - 4:30 pm (14:30 - 16:30)
Session 7: Future Perspectives (7e séance : Futures perspectives)
Chairperson (Modérateur): Zhu Chen, Shanghai University, Shanghai, China

2:30 - 3:00 pm (14:30 - 15:00)
Gene Identification Projects at TIGEM (Projets d'identification des gènes au TIGEM)
Giuseppe Borsani, TIGEM, Napoli, Italy

3:00 - 3:30 pm (15:00 - 15:30)
The Human Genome, Transcriptome Analysis, Medicine and Cancer (Le génome humain, l'analyse de transcriptomes, la médecine et le cancer)
Gert-Jan van Ommen, Leiden University, Leiden, The Netherlands

3:30 - 4:00 pm (15:30 - 16:00)
Acute Promyelocytic Leukemia: A Model for Gene Transcriptional Regulation-Based Therapy (Leucémies aiguës promyélocytaires : un modèle de thérapie fondée sur la régulation transcriptionnelle des gènes)
Zhu Chen, Shanghai University, Shanghai, China

4:00 - 4:30 pm (16:00 - 16:30)
Transcriptional Regulation of Cell Cycle Regulatory and Apoptosis Genes by DNA Damage Induced by Camptothecin: Microarray Analysis of Dose- and Time-Dependent Effects (Régulation transcriptionnelle de gènes impliqués dans la régulation du cycle cellulaire et l'apoptose sous l'effet de lésions de l'ADN induites par la camptothécine : Analyse des effets temporels et dose-dépendants à l'aide de microréseaux)
Yves Pommier, National Cancer Institute, Bethesda, MD, USA

4:30 - 5:00 pm (16:30 - 17:00)
Coffee Break (Pause café)

5:00 - 7:00 pm (17:00 - 19:00)
Session 8: Ethical, Legal and Economical Issues (8e séance : Questions éthiques, légales et économiques)
Chairperson (Modérateur): Rebecca Eisenberg, University of Michigan, Ann Arbor, MI, USA

5:00 - 5:30 pm (17:00 - 17:30)
Patentability of Life and Ethics (Brevetabilité de la vie et éthique)
Noëlle Lenoir, Conseil Constitutionnel, Paris, France

5:30 - 6:00 pm (17:30 - 18:00)
Patenting Genome Research Tools and the Law (Le brevetage des outils de la recherche génomique et le droit)
Rebecca Eisenberg, University of Michigan, Ann Arbor, MI, USA

6:00 - 6:30 pm (18:00 - 18:30)
Legal Problems Related to Gene Patents (Problèmes légaux liés aux brevets sur les gènes)
Joseph Straus, MPI for Foreign and International Patent, Munich, Germany

6:30 - 7:00 pm (18:30 - 19:00)
From Functional Genomics to Integrated Economy in Biotechnology (De la génomique fonctionnelle à l'intégration économique en biotechnologie)
Bernard Pau, CNRS, Montpellier, France

7:00 - 7:30 pm (19:00 - 19:30)
Closing Address (Allocution de clôture)
Geneviève Berger, Director General of CNRS, Paris, France

SPEAKERS ABSTRACTS

MONDAY, NOVEMBER 6, 2000
SESSION 1: 25 YEARS OF CDNA RESEARCH

The Human Genome, Health and Bioethics
Federico Mayor
Fundacion para una Cultura de Paz, Madrid, Spain

Federico Mayor, President of the « Fundacion para una cultura de paz », studied Pharmacy in Madrid, where he obtained his Doctorate. He held the Chair of Biochemistry at the Faculty of Pharmacy of the University of Granada and was Rector of the University from 1968 to 1972. In 1974 he co-founded the Centre for Molecular Biology « Severo Ochoa » of the Higher Council for Scientific Research and directed it until he was appointed Deputy Director-General of UNESCO in 1987; he was elected Director-General in 1993. His scientific work includes more than 80 articles on brain metabolism, perinatal biochemistry and molecular pathologies of the newborn. He has directed and supervised more than 40 doctoral theses and is a member of a score of international scientific academies and associations. He has also published three volumes of poetry.

From the « Messenger » Saga to the Transcriptome Era
François Gros
Académie des Sciences, Paris, France

François Gros is Permanent Secretary (Secrétaire perpétuel) of the Academy of Sciences and Honorary Professor at both the Collège de France and the Pasteur Institute in Paris. His scientific work has been devoted to molecular biology. Since the very beginning of his research, he has been interested in how genes function and are regulated, as well as in protein biosynthesis. He has been awarded the Gold Medal of the Pontifical Academy and a number of prizes, from the Lacassagne Foundation, the Charles Leopold Meyer Prize and the von Humboldt Foundation.
TRANSCRIPTOME 2000: From Functional Genomics to Systems Biology
Charles Auffray
Genexpress, CNRS, Villejuif, France

Over the past decade, large-scale systematic sequencing of cDNA libraries has provided an initial description of the transcriptome, the entire set of gene transcripts of humans and of several animal and plant organisms. Speakers will discuss progress in full-length cDNA cloning and quality control in large-scale sequencing programs. They will also address the challenges of clustering the information collected to help annotate genomes, at a time when complete or working-draft genome sequences are becoming available.

Differential hybridization using arrays of cDNA clones is as old as cDNA cloning. Recent advances in materials, optics, electronics, robotics, chemistry, genetic engineering and informatics have permitted the development of integrated platforms that allow the parallel study of tens of thousands of transcripts in a variety of normal and pathological conditions. Speakers will discuss the challenges of assessing the quality of, formatting, comparing and validating the large amount of data collected with the various platforms; the need for a public repository of cDNA array and in situ hybridization data; and similar problems arising in the study of proteomes, the entire sets of proteins that govern the functioning of cells, organs and organisms.

The emergence of functional genomics represents a transition from mostly analytical, hypothesis-driven research to a complementary global, exploratory mode that will ultimately bridge the understanding of chemistry and physiology by integrating knowledge of the fine details of all molecular structures and mechanisms together with their natural or pathological variations. Speakers will illustrate the impact of this trend on the study of the immune, muscular and nervous systems, and of cancer and cardiovascular diseases.
Every day, advances in genome research provide deeper insight into the mechanisms of life, promising to change our vision of the world and of ourselves and to speed the understanding and treatment of diseases. Public outreach programs and intense media coverage are triggering both growing public awareness and growing concern. There is a need both for open, universal dissemination of genomics knowledge and for the promotion of innovation through mechanisms that ensure the sustainable development of new diagnostics, drugs and treatments. International experts will discuss the ethical, legal and economic issues involved.

TUESDAY, NOVEMBER 7, 2000
SESSION 2: CDNA CLONING AND SEQUENCING

Novel Approaches for Gene Discovery and Selection of Full-Length cDNAs for the Mammalian Gene Collection Program (MGC)
Maria de Fatima Bonaldo1, Sergey Malchenko1, Brian Berger1, Irina Koroleva1, Einat Snir1, Tamara Kucaba1, Chad Roberts2, Todd Scheetz1, Tom Casavant2 & Marcelo Bento Soares1,3
Departments of 1Pediatrics, 2Electrical and Computer Engineering and 3Physiology and Biophysics, The University of Iowa, Iowa City, Iowa 52242, USA

In the last three years, we have identified approximately 50,000 unique rat cDNAs/ESTs ("Rat Gene Discovery and Mapping Program"), 36,000 unique mouse cDNAs (Brain Molecular Anatomy Project) and 25,000 human cDNAs (Cancer Genome Anatomy Project). Our gene discovery strategy, based on the generation of ESTs from serially subtracted normalized libraries, has proven highly successful and has enabled us to achieve unprecedented rates of EST discovery. However, novel technologies are now needed to identify the rarest mRNAs, which are often not represented in conventional cDNA libraries, and to synthesize, clone and select complete collections of full-length mammalian cDNAs.

To facilitate discovery of rare mRNAs, with support from the U.S. Department of Energy, we have developed technology for preferential cloning of mRNAs not represented (or under-represented) in normalized libraries derived from the same starting RNA population. We have applied this method to construct a mouse hippocampus cDNA library enriched for rare mRNAs. Preliminary characterization of this library by sequencing and by microarray hybridization indicates that significant enrichment has been achieved.

The development of a comprehensive collection of full-length cDNAs by the Mammalian Gene Collection Program will require not only the construction of full-length-enriched libraries but also the development of technology for (a) identification/selection of full-length cDNAs in full-length-enriched libraries, and (b) construction of subtracted libraries enriched for novel full-length cDNAs (i.e., those not yet represented in the Mammalian Gene Collection Program). We will present results of our preliminary attempts (supported by NIH) to select full-length clones from full-length-enriched libraries and to generate subtracted full-length-enriched libraries.

Analysis of Newly Identified Human cDNAs Encoding Large Proteins: Integration of the Genomic and cDNA Sequence Data to Move Beyond the Identification of Transcribed Sequences
Osamu Ohara
Department of Human Gene Research, Kazusa DNA Research Institute, Kisarazu, Japan

Over the past six years, we have been studying the protein-coding sequences of unidentified human genes. Our cDNA analysis is unique in that we have focused our sequencing efforts on large cDNA clones (>4 kb) encoding large proteins (>50 kDa). We took this approach because large cDNAs have not been extensively analyzed, and large proteins, which are often encoded by large cDNAs, are frequently involved in various mammalian cellular processes. For this purpose, we constructed a set of strictly size-selected cDNA libraries, which enabled us to isolate clones with insert sizes of interest on a random sampling basis. These clones were then further selected according to novel sequences at their 5' and 3' ends and by their protein-coding potential, prior to complete sequencing. The cDNAs were sequenced by a shotgun method with 5- to 10-fold sequence redundancy.

Our study has concentrated on cDNAs isolated from the human brain. The number of cDNA sequences thus identified and designated by a systematic gene code consisting of KIAA plus a four-digit number has reached 2000 to date, with an average size of approximately 5 kb. Since genes encoding large proteins are expected to account for only about 10% of all human genes, the number of KIAA genes in the public databases (1643 entries, August 2000) is quite significant, considering that it represents only genes expressed in the brain.

As the human genome sequencing project enters its last phase, in which the draft sequences are finalized, it has become even more evident that cDNA sequence data will play an important complementary role in interpreting the human genome sequence. Furthermore, cDNA data offer a variety of information on post-transcriptional events, such as alternative splicing and RNA editing. Conversely, the genome sequence can help considerably in resolving problems in cDNA technology, most of which originate from the fact that cDNAs are merely artificial copies of mRNAs. Integrating our cDNA sequence data with the publicly available genomic sequence data is therefore an urgent and crucial task for us, and will enable us to move beyond the identification of transcribed sequences.

Over 2.4 Million Expressed Sequence Tags (ESTs) and Counting...
Sandra W. Clifton1, Deana Pape1, Marco Marra2, LaDeana Hillier1, Zhengyan Kan3, Jarrett Glasscock1, Raymond Yeh1, Warren Gish1, and the Washington University GSC EST Sequencing Group1
1Washington University School of Medicine, St. Louis, MO, USA; 2British Columbia Cancer Research Centre, Vancouver, British Columbia, Canada; 3Washington University, St. Louis, MO, USA

The Genome Sequencing Center at Washington University School of Medicine has contributed over 2.4 million single-pass sequences to the public databases as a result of our participation in several expressed sequence tag (EST) projects. These include three human, one mouse, one soybean, two parasitic nematode, two SAGE (Serial Analysis of Gene Expression), one moss, one Toxoplasma gondii (including Sarcocystis neurona and Neospora caninum) and one Eimeria tenella project. We have also completed a Genome Sequence Survey (GSS) project for Leishmania major.

The human ESTs have been used, in conjunction with the human genome mapping effort also being performed at Washington University, as one of the tools to anchor contigs for the human genome fingerprint map. In addition, many ESTs have been aligned to the nearly completed human genomic sequence using software tools developed at Washington University, the Transcript Assembly Program (TAP) and Eugene. High-throughput SNP mining of the human transcriptome has been aided by another automated software pipeline using the tool POLYBAYES, also developed at Washington University School of Medicine. The mouse ESTs are being used for fingerprint mapping of the mouse genome.

Zebrafish (Danio rerio) is likely to be one of the next model organisms chosen for genome sequencing. The zebrafish ESTs being produced at the Genome Sequencing Center (GSC) are already being used, together with other resources, to build a marker-dense physical map that will provide candidate genes for positional cloning and support ORF recognition in the cloning of insertion sites. Identification of mouse and zebrafish homologs of human genes by EST sequencing will facilitate functional analyses of these genes that are not feasible in humans. Mapping progress and the software tools will be discussed.
For further information, see: http://genome.wustl.edu/gsc/esthmpg.html

Sequencing and Analysis of Full-Length cDNAs in the German cDNA Network
Stefan Wiemann1, Bernd Weil, Ruth Wellenreuther, Sabine Krieger, Wilhelm Ansorge, Michael Böcher, Helmut Blöcker, Helmut Blum, Andreas Düsterhöft, Jürgen Lauber, Andreas Beyer, Karl Köhrer, Christian Gruber, Hans-Werner Mewes, Brigitte Obermaier, Birgit Ottenwälder, Dagmar Heubner, Rolf Wambutt, Jeremy Simpson, Rainer Pepperkok, Annemarie Poustka
1Molecular Genome Analysis, German Cancer Research Center, Im Neuenheimer Feld 506, D-69120 Heidelberg, Germany, and the German cDNA Network

We have formed a network within the framework of the German Genome Project aimed at the generation and sequencing of novel full-length cDNAs and the comprehensive functional analysis of the deduced proteins. The project started in September 1997. Over 3,600 cDNAs (> 8.8 Mb) have been sequenced since then. We use the set of fully sequenced clones in combination with the EST-sequenced clones to generate a master set of full-length clones for use in subsequent functional analysis. All sequences are first analyzed in silico for possible protein function. To characterize the function of the encoded proteins systematically in vivo, we first determine their subcellular localization. A progress report on the network's activities and achievements will be presented. In future projects, cellular assays will be applied to comprehensively unravel the role of most or all proteins in terms of their function(s), the pathways they are involved in, and possible disease associations.
NEDO cDNA Sequencing Project
Sumio Sugano
Laboratory of Genome Structure Analysis, Human Genome Center, The Institute of Medical Science, The University of Tokyo, Japan

cDNA is an important resource for identifying exons or new genes, and expressed sequence tags (ESTs) have been used extensively to annotate the finished sequences of human chromosomes 21 and 22 and the “draft” sequence of the human genome. Despite the great usefulness of the EST resource, ESTs have limitations: they are essentially partial sequences of limited accuracy. The entire sequences of full-length cDNAs are useful for determining the exact mRNA start site, the exact pattern of splicing and the entire coding region. Full-length cDNAs are also an important resource for functional analysis. The cap-targeted selection procedure for full-length cDNAs developed by us and others has improved the content of full-length cDNA clones in cDNA libraries. These libraries, together with improvements in sequencing techniques, allowed us to start a project to collect and sequence putative full-length cDNA clones. Here we report our results on the first 8000 sequences.

From Leukemia Patient to Full-Length cDNA Sequence
J. Margolin, D.K. Villalon, R.A. Luna, Y. Tsang, W. Yu, J. Bouck, G. Wu, S. Hale and Richard Gibbs
Human Genome Sequencing Center and Department of Pediatrics, Baylor College of Medicine, Houston, Texas, USA

We have constructed a scalable pathway extending from pathologic specimens to fully sequenced full-length cDNA clones. It begins with very high-risk pediatric leukemia patients who have required pheresis to debulk their tumor prior to the initiation of chemotherapy. This provides a large specimen (often > 10^10 leukemic cells) from which mRNA is purified. The Cap-Trapping procedure is used to isolate full-length (FL) transcripts from this mRNA pool.
To avoid losing longer inserts, a lambda cre-lox replacement vector was used for cloning and EST sequencing. The patients are cared for and the libraries constructed at the Texas Children’s Cancer Center, Baylor College of Medicine. The sequencing and informatics are performed at the Baylor College of Medicine Human Genome Sequencing Center (BCM-HGSC). The EST data are processed through a series of new informatics tools, which include vector trimming, sequence quality analysis, BLAST searches and pipeline redundancy checks. Clones are re-arrayed and sent for concatenated cDNA library construction and sequencing at the BCM-HGSC. To date, 26,280 ESTs and >1,000 FL-cDNA sequences have been completed. An additional 1,300 FL clones are in sequence assembly and 2,000 FL clones are in concatenated cDNA sequencing (CCS) library construction. Plans are in place to increase the production pipeline to >1,000 FL-cDNA sequences per month. Efforts to further automate EST analysis, selection of clones for CCS, and pre-processing of CCS reads will be described. These efforts represent a new way to characterize a disease by high-throughput cDNA sequencing, which, in addition to sequence information, yields full-length cDNAs arrayed in a manner suitable for future applications.

RIKEN Mouse cDNA Encyclopedia Project
Jun Kawai, Piero Carninci, and Yoshihide Hayashizaki
Genome Exploration Research Group, RIKEN Genomic Sciences Center (GSC) and Genome Science Laboratory, RIKEN Tsukuba Institute; Core Research for Evolutional Science and Technology (CREST), Japan Science and Technology Corporation (JST), Tsukuba-shi, Ibaraki, 305-0074, Japan

RIKEN is pursuing the mouse encyclopedia project, which consists of three phases: (1) collection of full-length cDNAs, (2) their sequencing, and (3) their mapping onto the chromosomes.
Full-length cDNAs enable all subsequent functional characterization, such as prediction of the protein and its function, protein-protein interaction, structural investigation, expression analysis, protein expression, etc. Gene discovery based on one-pass sequencing of Cap-trapper full-length cDNA libraries has been very demanding over the years. In fact, short cDNAs are cloned far more efficiently than long cDNAs, and cloning rarely expressed genes and genes expressed only in restricted tissues has been a technical challenge. To address these problems, we have developed techniques, which do not involve PCR, for constructing large-insert, normalized/subtracted full-length cDNA libraries, even from microdissected tissues. We estimate that our clones cover about 60% of mouse genes. At the moment, more than 90% of UniGene clones contain a sequence from our clones, and a large part consists of our clones alone. Our analysis of the currently available data also suggests that the number of mouse genes exceeds 100,000. To determine full sequences efficiently, we apply three sequencing strategies: one-pass sequencing for short clones (less than 1.5 kb), primer walking for medium-sized clones (1.5-2.5 kb), and shotgun sequencing for long clones (more than 2.5 kb). So far, about 20,000 cDNA clones have been sequenced; their average size is around 1.2 kb and their average ORF length is 660 bp. Furthermore, we attempted to map our cDNA sequences in silico onto the human genome draft sequence. The results suggest novel gene candidates that have not yet been predicted by analysis of the human genome sequence with exon prediction programs. These genes could not be identified by any other method to date and came to our attention only through mapping the RIKEN clones to the human genome. The statistical features found in the comparative studies are also presented.
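The size-based choice among the three sequencing strategies above can be expressed as a simple rule. The sketch below is a hypothetical illustration, not RIKEN's pipeline: the function names are invented, and placing clones of exactly 1.5 kb and 2.5 kb in the primer-walking class is an assumption based on the abstract's "1.5 - 2.5 kb" middle range.

```python
from collections import Counter

# Hypothetical sketch: the kb thresholds follow the abstract; names are illustrative.
ONE_PASS_MAX_KB = 1.5     # clones shorter than this: one-pass sequencing
PRIMER_WALK_MAX_KB = 2.5  # clones up to and including this: primer walking; longer: shotgun


def sequencing_strategy(insert_kb: float) -> str:
    """Choose a sequencing strategy from a clone's insert size in kilobases."""
    if insert_kb <= 0:
        raise ValueError("insert size must be positive")
    if insert_kb < ONE_PASS_MAX_KB:
        return "one-pass"
    if insert_kb <= PRIMER_WALK_MAX_KB:
        return "primer walking"
    return "shotgun"


def plan_batch(insert_sizes_kb):
    """Count how many clones in a batch fall into each strategy."""
    return Counter(sequencing_strategy(size) for size in insert_sizes_kb)


if __name__ == "__main__":
    batch = [0.8, 1.2, 1.5, 2.0, 2.5, 3.6, 4.1]
    print(plan_batch(batch))
```

Routing clones this way reserves the more laborious strategies for the long inserts that actually need them.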
Shotgun Sequencing the Human Transcriptome with Open Reading Frame ESTs (ORESTES)
Andrew J.G. Simpson and the members of the FAPESP/LICR-Human Cancer Genome Project consortium of ONSA, São Paulo, Brazil

High-throughput partial sequencing of expressed human genes has generated a database of fundamental importance for defining the human transcriptome and annotating the human genome. The multiple coverage afforded by ESTs is also permitting the identification of alternatively spliced transcript variants and single nucleotide polymorphisms. Traditional ESTs provide data from the extremities of transcripts. To exploit the EST concept as fully as possible, we clearly need a similar level of coverage of the central portions of human transcripts. To this end, we have been producing Open Reading Frame ESTs, or ORESTES, which are generated using a low-stringency PCR strategy. The technique both biases toward the centre of transcripts and partially normalizes the transcript population, resulting in a pattern of transcript sequences quite distinct from that produced by conventional ESTs. Thus, in conjunction with conventional ESTs, ORESTES allow considerable progress to be made in covering the transcriptome by an essentially shotgun approach. To date we have generated in excess of 500,000 sequences derived from human tumours within the FAPESP/LICR-Human Cancer Genome Project, which are being systematically deposited in GenBank. As our institutional contribution to the compilation of the complete human transcriptome, we are also actively constructing and validating contigs, composed of ORESTES and conventional ESTs, that represent individual transcripts. The work will continue until 1 million ORESTES have been generated, a benchmark we expect to reach by the end of 2000.
Thereafter, we intend to continue pursuing transcript definition, by contig construction and validation, until all human transcripts have been characterized. The FAPESP/LICR-Human Cancer Genome Project is financed by FAPESP and the Ludwig Institute for Cancer Research and is being undertaken by a consortium of more than 30 laboratories in the state of São Paulo, Brazil. It is the largest project ever undertaken in the history of life sciences research in Brazil.

SESSION 3: CDNA CLUSTERING AND GENOME ANNOTATION

Clustering Enriches the I.M.A.G.E. Collection
Peg Folta
Lawrence Livermore National Laboratory, Livermore, CA, USA

The underlying goal of the Integrated Molecular Analysis of Genomes and their Expression (I.M.A.G.E.) Consortium has been to provide the public with resources to advance gene discovery. With approximately three million clones, the Consortium manages the largest public collection of cDNAs, the gene-containing segments of DNA. Expressed Sequence Tags (ESTs) from the I.M.A.G.E. collection represent approximately 75% of the human dbEST database maintained at the National Center for Biotechnology Information (NCBI). To reduce redundancy in the collection, discover new genes, and identify the best representative clone for each gene, the IMAGEne clustering tool was created. IMAGEne uses all EST and full-insert sequences originating from I.M.A.G.E. clones, along with EST sequences from The Institute for Genomic Research. IMAGEne first generates known-gene clusters seeded by NCBI RefSeq sequences. Candidate gene clusters are then created based on the remaining sequence homology and clone membership. Members of the clusters are ranked primarily according to size. Sophisticated query and display tools significantly increase the value of the clusters. The user can query the clusters with a keyword, clone ID, cluster ID, GenBank accession number, or sequence.
The user can select from the resulting clusters to see a Java display of all cluster members aligned to their associated known gene or consensus sequence(s), together with a table of associated information (e.g., IDs, links to GenBank, library, size, and sequence verification). Candidate gene clusters often contain several contigs, which represent 3' and 5' ends or alternative splice variants. IMAGEne currently reduces 1.65 million human sequences to 5,379 known gene clusters, 54,605 candidate gene clusters, and 147,970 viable singletons. Initial clustering was done on Lawrence Livermore National Laboratory's massively parallel processing computers, but updates to the clusters are performed on smaller computer systems. IMAGEne continues to be enhanced as the needs and resources of the community change. A version of IMAGEne that is specific neither to the collection nor to a species will soon be available to collaborators to aid in the analysis of various organisms; initial targets are white rot fungus, rice, and chicken. Sequences from the NIH's Mammalian Gene Collection Project will be used to augment NCBI's RefSeq listing of known genes, creating a more representative view of known or putative genes. IMAGEne maintains a list of the clones that best represent each cluster. To promote gene discovery, the I.M.A.G.E. Consortium has significantly ramped up its re-arraying efforts to distribute full-length and putative full-length re-arrays in both mouse and human. For further information, contact Peg Folta. This work was performed by Lawrence Livermore National Laboratory under the auspices of the U.S. Department of Energy, Contract Number W-7405-Eng-48.
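To make the ranking idea concrete, the sketch below groups clones by cluster and picks one representative per cluster, ranking members primarily by insert size as described above. It is not the IMAGEne implementation: the `Clone` record, its fields, and the tie-break (prefer sequence-verified clones, then clone ID) are hypothetical, and the homology-based assignment of clones to clusters is assumed to have been done already.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Clone:
    clone_id: str     # hypothetical identifier
    cluster_id: str   # cluster the clone was assigned to (assumed precomputed)
    insert_kb: float  # estimated insert size in kilobases
    verified: bool    # assumed flag: has the clone's sequence been verified?


def rank_clusters(clones):
    """Group clones by cluster and rank members, largest insert first.

    Size is the primary key; preferring verified clones on ties, then
    ordering by clone ID, is an illustrative assumption.
    """
    clusters = defaultdict(list)
    for clone in clones:
        clusters[clone.cluster_id].append(clone)
    return {
        cid: sorted(members, key=lambda c: (-c.insert_kb, not c.verified, c.clone_id))
        for cid, members in clusters.items()
    }


def representatives(clones):
    """Return the top-ranked clone ID for each cluster."""
    return {cid: members[0].clone_id for cid, members in rank_clusters(clones).items()}


if __name__ == "__main__":
    demo = [
        Clone("A1", "K1", 1.8, False),
        Clone("A2", "K1", 2.4, True),
        Clone("B1", "C7", 3.1, False),
        Clone("B2", "C7", 3.1, True),
    ]
    print(representatives(demo))  # {'K1': 'A2', 'C7': 'B2'}
```

Sorting on a tuple keeps the size criterion dominant while still making the choice deterministic when two clones have the same estimated size.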
An Alternate Transcription Map of Chromosome 22 Based on Verified Transcript Variation
Win Hide, Janet Kelso, Tzu-Ming Chern, Peter van Heusden and Vladimir Babenko
South African National Bioinformatics Institute, Bellville, South Africa

First-pass analysis of the data being generated by the genome projects has revealed that alternate transcription is a ubiquitous event, but reliable, verified transcript products are not yet available for a complete understanding of the biology of alternate transcription. We will present an analysis of transcript variation, verified by comparing clustered, processed ESTs with SwissProt, chromosome 22 exons, mouse expressed transcripts and the literature. A high-fidelity assessment of internal exon skipping, exon repetition, and exon boundary variation will be discussed, together with a combined expression-based and genome-based estimate of the total number of genes.

UniGene, the Genome, and the Transcriptome
Lukas Wagner
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA

I will discuss the UniGene dataset in detail, with particular attention to artifacts that may arise in the course of preparing cDNA libraries and to insights and improvements obtained from the human genome. I will present results from a large-scale comparison of ESTs with human genomic sequence, including estimates of the rate of alternative splicing and of the frequency of processed pseudogenes in the human genome. I will discuss methods for identifying cDNA clones containing the complete coding sequence, and provide an overview of the Mammalian Gene Collection, an effort of the National Institutes of Health to produce such clones for public distribution.
Relevant URLs: http://www.ncbi.nlm.nih.gov/UniGene/ http://www.ncbi.nlm.nih.gov/MGC/

The TIGR Gene Indices: Reconstruction and Annotation of Transcribed Sequences
Ingeborg Holt, Feng Liang, Geo Pertea, Svetlana Karamycheva and John Quackenbush
The Institute for Genomic Research, Rockville, MD 20850, USA

A goal of the Genome Project is to identify the complete set of genes within each organism and the roles these genes play in development and disease. Sequencing of Expressed Sequence Tags (ESTs) has provided a first glimpse of the collection of transcribed sequences in a variety of organisms, but significant additional information can be obtained by a thorough analysis of the EST data. TIGR’s analysis of the world’s collection of EST sequence data, captured in the TIGR Gene Indices (http://www.tigr.org/tdb/tgi.shtml), provides assembled consensus sequences that are of high confidence and represent our best estimate of the collection of transcribed sequences underlying the ESTs. We maintain Gene Indices for a variety of species, including human, mouse, rat, Drosophila, zebrafish, rice, tomato, maize, soybean, and Arabidopsis. Collectively, the Gene Indices represent a unique resource for the comparative analysis of eukaryotic genes and may provide insight into gene function, regulation, and evolution. Using the Tentative Consensus (TC) sequences in the Gene Indices, we recently developed the TIGR Orthologous Gene Alignment (TOGA) database. TOGA is designed to identify orthologous genes, including a significant number of novel orthologs, and to serve as a “cross-reference” between genomes, linking the Gene Indices for the surveyed species. This database represents the most extensive catalog of eukaryotic orthologs available, providing a valuable resource for gene identification, elucidation of functional domains, and analysis of gene and genome evolution.
We have also developed a variety of tools to integrate EST, genomic, and mapping data across species, allowing us to begin to realize the promise inherent in the completion of the sequencing of a variety of genomes.

Searching for the Protein Coding Genes on the Human Genome Sequence
William Saurin, Hugues Roest-Crollius, Olivier Jaillon, Alain Bernot, Lucie Friedlander, Abel Ureta-Vidal, Gabor Gyapay, Jean Weissenbach
Genoscope and CNRS-FRE 2231, Evry, France

Most of the human genome sequence is now available, but identifying genes in the DNA sequence remains a difficult task. Various tools, including similarity searches and exon/gene prediction programs, are used for this purpose. Similarity searches are limited because the vertebrate protein/cDNA set is not yet complete. The use of a compact vertebrate genome (the pufferfish, Fugu rubripes) in protein similarity searches has proven extremely valuable for identifying mammalian coding sequences. We built a search tool (called Exofish, for Exon FInding by Sequence Homology) that combines a specific setting of TBLASTX with a collection of random DNA sequence reads currently representing a third of the genome of the pufferfish Tetraodon nigroviridis (closely related to Fugu). Exofish detects sequence matches in two thirds of the genes in several sets of human genes, with a background of false positive matches below 1%. Exofish has been applied successively to the December 1999 and June 2000 versions of the « working draft » of the human genome. The latter analysis indicates that the number of protein-coding genes is now around 27,000, somewhat below our earlier estimates of 28,000-30,000. Exofish analysis of the UniGene set of human ESTs indicates that about 50% of the coding fraction of the human genome is still missing from the public sequence databanks. About 15% of all exons detected by Exofish on human chromosome 22 fall outside annotated genes.
A more detailed analysis of this annotation has been performed using new full-length cDNA sequences. The results suggest, however, that (1) most of the Exofish-detected exons falling outside annotations actually belong to real genes, (2) many of the annotated genes are not yet accurately delimited and (3) a number of these genes will merge together. All these observations indicate that a valuable annotation of the human genome sequence still requires larger sets of additional sequence data (cDNAs, related genomes) for comparison. In addition, since every sequence analysis method has some limitations, it is essential to rely on a panel of tools that are as diverse as possible.

Harvesting the Human Genome: A World-Wide Endeavor
Doron Lancet
Head, the Crown Human Genome Center, The Weizmann Institute of Science, Rehovot 76100, Israel

As the Human Genome Project progresses and the First Draft is nearly finished, a most urgent need is data integration and utilization. At our Genome Center, which serves as a national laboratory for Israel, we have devised a model for the role of smaller projects worldwide, in which we combine gene discovery, integrated database development and a focused effort in the field of DNA microarrays. All three activities revolve around the development of a strong capacity in computational genomics and bioinformatics. For gene discovery, our main effort is through collaborations with the Israeli medical community. We strive to use the unique attributes of populations in Israel, which comprise many genetically defined ethnic groups. Six gene discovery projects are underway, some of which have already culminated in the identification of a gene underlying a monogenic disease. We start from a linkage map and use the power of sequence data mining and integration to identify gene candidates. Large-scale DNA sequencing is used to reveal mutations within all identified exons.
In the field of data integration, we have developed three software tools. The first is GeneCards, a novel functional genomics compendium combining automated data mining and context-related navigation support (Trends in Genetics 13: 163 (1997); Bioinformatics 14: 656-664 (1998); URL: bioinfo.weizmann.ac.il/cards). It automatically identifies new HUGO-approved gene symbols, extracts relevant information from multiple public databases, and creates a Card for each gene. The second tool is the Unified DataBase (UDB), in which novel concepts of genome-wide map and sequence integration are implemented (Genome Digest 4(3): 15 (1997); URL: bioinfo.weizmann.ac.il/udb). It merges method-specific genome maps with genomic sequence information (Sequence Based Repositioning). The third tool is GESTALT (Bioinformatics 2000 May;16(5):482-483), a GEnomic Sequence Total Analysis and Lookup Tool (http://bioinfo.weizmann.ac.il/GESTALT). It constitutes a workbench for the automatic integration and visualization of large-scale genomic sequence analyses. For DNA arrays, we have acquired an Affymetrix GeneChip system, which has already been used successfully with human, mouse and yeast expression arrays, as well as with the p53 mutation array. In parallel, we have established complementary array spotting and scanning technology to meet the requirements of additional species and gene groups. The data are integrated with our in-house GeneCards.

Reconstructing the Human Transcriptome from the 3' End
Philipp Bucher
Swiss Institute of Bioinformatics, Lausanne, Switzerland

We present results from a systematic effort to reconstruct the structure of all human mRNAs from EST data.
Our approach differs from others in that: (i) whenever possible, we use the chromatograms rather than the EST sequences in GenBank as input data; (ii) we assemble genes from a collection of genomic exon sequences defined by ESTs rather than directly from error-prone EST sequences; (iii) we use a collection of well-documented polyadenylation sites as anchor points for gene assembly and as bona fide gene delimiters; (iv) we use the mapping of ESTs to genomic sequences as a means to detect chimeric ESTs and other artifacts. Based on an analysis of the partial transcriptome obtained in this way, we present new estimates of the number of human genes, the frequency of alternative splicing and polyadenylation, and the frequency of genes occurring in introns of other genes.

WEDNESDAY, NOVEMBER 8, 2000 (MERCREDI 8 NOVEMBRE 2000)

SESSION 4: TRANSCRIPTOME ANALYSIS

Gene Expression Profiling of Primary Breast Carcinomas Using Nylon Arrays of Candidate Genes
François Bertucci1,2, Rémi Houlgatte2, Daniel Birnbaum1,3 and Catherine Nguyen2
1. Laboratoire de Biologie des Tumeurs, Institut Paoli-Calmettes (IPC), IFR57, Marseille, France
2. TAGC, CIML Luminy U136 INSERM-CNRS UMR 145, IFR57, Marseille, France
3. Laboratoire d'Oncologie Moléculaire, U.119 Inserm, IFR57, Marseille, France

Breast cancer is characterized by marked histoclinical heterogeneity, which calls for the identification of new parameters to predict the natural history of the disease and its sensitivity to treatment. A large-scale molecular characterization of breast cancer could help in this context. Large-scale analysis of gene expression is an increasingly recognized method for functional and clinical investigations based on the now extensive catalogue of known or partially sequenced genes. The accessibility of this approach can be enhanced by using readily available technology (cDNA arrays on nylon with radioactive detection) and the IMAGE resource to assemble sets of targets.
Using similar cDNA arrays and only 5 micrograms of total RNA from each tumor sample, we then studied the quantitative expression levels of 176 candidate genes in 34 random primary breast carcinomas. The results were analyzed along three directions: comparison of tumor samples, gene correlations, and correlations of molecular data with conventional histoclinical prognostic features. The study revealed extensive heterogeneity of breast tumors at the transcriptional level. Hierarchical clustering identified two molecularly distinct groups of tumors with different outcomes not predicted by histoclinical parameters. Gene correlations were detected, suggesting a degree of organization of gene expression in breast tumors. No correlation was found with patient age, tumor size, histological type or grade. However, gene expression differed in tumors with lymph node metastasis and according to estrogen receptor status: ERBB2 expression was strongly correlated with lymph node status, and GATA3 expression with the presence of estrogen receptors. Our results identified new potential targets of carcinogenesis and ways to group tumors according to outcome. They show that the systematic use of cDNA array testing holds great promise for improving the prediction of prognosis and chemosensitivity in breast cancer and for providing new therapeutic targets.

Statistical Analysis, Normalization and Reproducibility of Microarray Data
Roger E. Bumgarner
Department of Microbiology, University of Washington, Seattle WA, 98195 USA

Over the past several years there has been a flood of literature containing microarray data. However, very few papers contain estimates of the error in the measurements. There are several reasons for this, ranging from the expense associated with replicate measurements, to inexperience with the technology, to the lack of error estimates in many software packages.
During the past three years we have investigated the reproducibility of microarray data and developed methods for data normalization and error analysis. In addition, we have developed a software package for the analysis of microarray data that: 1) estimates the error in microarray measurements from single or multiple measurements; 2) normalizes two-color data using a unique algorithm that accounts for non-linearities in the Cy3/Cy5 ratio; 3) allows one to select genes that are differentially expressed by a statistically significant amount; and 4) provides links between the data and publicly available data repositories (e.g., GeneCards). This talk will cover the experimental and data manipulation methodologies we have developed, the reproducibility of microarray data and the necessity of error analysis for the successful use of microarrays.

DNA Array Applications in a Diverse Academic Setup
Shirley Horn-Saban1, Doron Ginsberg3, Opher Gileadi2, Orly Reiner2, Tsviya Olender2, Marilyn Safran1, Naama Barkai2, Doron Lancet2 and Menachem Rubinstein1
1. The Crown Human Genome Center, Department of Biological Services
2. Department of Molecular Genetics
3. Department of Molecular Cell Biology
Weizmann Institute of Science, Rehovot 76100, Israel

The Genome Center at the Weizmann Institute of Science focuses on genome methodologies and provides genome-related practical know-how as well as computing-intensive tools. The DNA array unit is part of this infrastructure and aims to provide solutions for any research project involving DNA arrays. For this purpose, the unit houses both state-of-the-art technologies, namely an Affymetrix GeneChip system and microarraying facilities (arrayer and scanner), to meet the needs of a broad range of Israeli scientists. The GeneChip system has proven successful for diverse research projects using human, mouse, rat and yeast expression arrays, as well as p53 re-sequencing arrays.
Exemplary projects involved the search for target genes of transcription factors, such as E2F in rat, which is crucial for regulating the G1/S transition, and TFIIH in yeast. Microarraying was recently integrated into the infrastructure and has already been used to print various “homemade” libraries on both membranes and glass, including a project attempting to detect genes that are differentially expressed in the dorsal telencephalon of Lis1 mutants compared to wild type, using cDNA subtraction libraries obtained from Lis1 mutant mice. The data accumulated by DNA array experiments are integrated with additional bioinformatics software. Dedicated software was developed to link gene expression data with our in-house GeneCards (Bioinformatics 1998;14(8):656-64) and Unified Database (UDB) packages, as well as with EST tissue sources (UniGene). Additional methods of analysis, including deterministic annealing-based clustering, are being used and further developed for system-level analysis of expression data (PNAS USA 1999;96(12):6745-50). To enhance the microarraying facilities, we are currently amplifying the full yeast ORF library, as a first step in accumulating whole-genome commercial libraries for various organisms to be used as genomic resources in a university research setting.

Explaining Gene Expression Clusters Through Integration of Genome Annotation and Microarray Data
Terry Gaasterland, Alexander Sczyrba, Jie Qin
Rockefeller University, New York, NY, USA

Gene expression studies reveal potentially co-regulated genes. Sequence annotation of the corresponding clones and their gene products provides information to explain clusters of genes whose expression patterns are of interest to the user. We evaluate clusters of co-expressed genes through gene ontology (GO) terms, metabolic pathway information, and the genomic location of prokaryotic homologs.
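One common way to evaluate a co-expression cluster against GO terms is an over-representation test. The abstract does not say which statistic the Cluster Explorer uses, so the sketch below swaps in a standard one-sided hypergeometric test. The function names, the data layout (GO term mapped to a set of gene IDs) and the cutoff `alpha` are assumptions, and no correction for testing many terms is applied.

```python
from math import comb


def hypergeom_sf(k: int, population: int, annotated: int, drawn: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, annotated, drawn).

    population: genes on the array
    annotated:  genes on the array carrying the GO term
    drawn:      genes in the co-expression cluster
    k:          cluster genes carrying the GO term
    """
    upper = min(annotated, drawn)
    numerator = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(k, upper + 1)
    )
    return numerator / comb(population, drawn)


def enriched_terms(cluster, annotations, universe, alpha=0.01):
    """Return (term, k, p) tuples for GO terms over-represented in `cluster`.

    annotations: dict mapping a GO term to the set of gene IDs annotated with it
    universe:    set of all gene IDs on the array
    Results are sorted by p-value; no multiple-testing correction is applied.
    """
    cluster = set(cluster) & universe
    hits = []
    for term, genes in annotations.items():
        genes = genes & universe
        k = len(cluster & genes)
        if k == 0:
            continue
        p = hypergeom_sf(k, len(universe), len(genes), len(cluster))
        if p < alpha:
            hits.append((term, k, p))
    return sorted(hits, key=lambda hit: hit[2])
```

For example, if 5 of the 10 genes in a cluster carry a term that annotates only 20 of 1,000 arrayed genes, the test flags the term as strongly over-represented, whereas a single hit for a term covering 100 genes is unremarkable.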
Putative operons, functional categories, and pathways that are implicated in a gene cluster are in turn evaluated for the expression patterns of their corresponding genes. This bi-directional analysis using comparative genome annotations and gene expression data is a first step toward enabling users to evaluate the gene expression patterns of molecular subsystems. We have prototyped the method as the « Cluster Explorer » module of the TANGO (Transcriptome Analysis of Genomes) system.

Exploring Human Transcriptomes Using cDNA Macro and Microarray Technologies
Sandrine Imbeaud
Genexpress, CNRS ERS1984, 19, rue Guy Mocquet, 94801 Villejuif Cedex, France

Exploration means looking around, observing, describing and mapping undiscovered territory, not testing theories or models. The fact that the entire human genome sequence is becoming available has been an exhilarating reminder that much of the natural world remains to be explored at the molecular level. The vast majority of the sequences generated code for genes of unknown function and, as yet, unknown role in disease. To understand gene function, it is helpful to know when and where a gene is expressed, and under what circumstances its expression level is affected. The goal is to discover things we neither knew nor expected, and to see relationships and connections among the elements. Beyond questions of individual gene function are questions concerning functional pathways and how cellular components work together to regulate and carry out cellular processes. During the last half of the 20th century, the analysis of the regulation and function of genes was largely driven by step-by-step studies of individual genes and proteins. In the past decade, a paradigm shift has emerged: we are now able to produce large amounts of data about many genes in a highly parallel and rapidly serialized manner.
For several years, our team has investigated human transcriptomes through high-throughput gene expression profiling, developing high-density array approaches. Arrays of thousands of cDNA clones are dotted onto membranes (macroarrays) or glass slides (microarrays). They are then hybridized with complex cDNA targets made by reverse transcription of mRNA from various tissues. Sets of "hybridization signatures" are collected by quantitative analysis of the spot signal intensities, using specifically adapted software tools. This made it possible to identify new collections of genes specifically or preferentially expressed in human brain, skeletal muscle and cardiac muscle. It also allowed us to explore muscular aging and differentiation, and to dissect molecular processes that could be defective in cancer cells. High-density array technology has proved to be a powerful tool for producing large gene expression data sets, which immediately raises a number of questions. What is the validity and quantitative accuracy of the observed changes? Much attention has been focused on standardisation, quality control assessment and authentication procedures. Which genes should be prioritized for further study? How does one determine whether a given gene is a cause rather than a consequence of disease states? How can information collected from multiple samples and many people be organized so that biological questions can be asked and answered? Together with the scientific community, we must now determine how to manage and share these massive amounts of data.

Analysis of Gene Expression in Xenopus Embryos Identifies Metabolic Pathways, Predicts Gene Function and Provides a Global View of Embryonic Patterning
Nicolas Pollet and Christof Niehrs
Department of Molecular Embryology - DKFZ - Im Neuenheimer Feld 280 - D-69120 Heidelberg - Germany

In multicellular eukaryotes, the genetic programme is expressed in complex and ever-changing temporal and spatial patterns throughout development and differentiation.
The description and analysis of these patterns are crucial for elucidating the biological roles of genes. They are also needed to understand the network of genetic interactions that underlies normal development. To explore the molecular anatomy of the vertebrate embryo, we have systematically analysed gene expression during early development of the frog Xenopus using whole-mount in situ hybridization. About 25 % of the cDNAs analysed represent differentially expressed genes, and about 5 % show highly regionalized expression. Among the genes identified, we found novel cell-type-specific « marker » genes and potential developmental regulators. A cluster analysis comparing gene expression patterns was used to derive a novel parameter, « tissue relatedness ». Partial cDNA sequences and expression patterns are documented and assembled into a database, « Axeldb », publicly available at <http://www.dkfzheidelberg.de/abt0135/axeldb.htm>. We identified four « synexpression groups », sets of genes that share complex expression patterns and that predict molecular pathways involved in patterning and differentiation. According to their probable functional significance, these groups are designated the Delta1, Bmp4, ER-import and Chromatin groups. Within a synexpression group, a likely function can be predicted even for genes without sequence similarity. The results indicate that synexpression groups have strong prognostic value. These sets of co-regulated genes show a striking parallel to the operon, and may be a key determinant facilitating the evolutionary change that led to animal diversity. In conclusion, our study describes a functional genomics approach to investigate genes expressed during early development. It provides global insight into embryonic patterning and highlights the modular genetic architecture of eukaryotic genomes.
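The abstract does not define how « tissue relatedness » was computed or how synexpression groups were delimited. The sketch below works under stated assumptions. Each gene's whole-mount in situ result is encoded as the set of tissues where staining was scored. Genes sharing exactly the same set are treated as candidate synexpression partners. Two tissues are scored as related by the Jaccard overlap of the genes expressed in them. All gene and tissue names are hypothetical.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """|a & b| / |a | b|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def shared_pattern_groups(patterns: dict[str, set[str]]) -> list[list[str]]:
    """Group genes whose scored tissue sets are identical (candidate synexpression groups)."""
    groups: dict[frozenset, list[str]] = {}
    for gene, tissues in patterns.items():
        groups.setdefault(frozenset(tissues), []).append(gene)
    return [sorted(genes) for genes in groups.values() if len(genes) > 1]


def tissue_relatedness(patterns: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Score each pair of tissues by the Jaccard overlap of the genes expressed in them."""
    genes_by_tissue: dict[str, set[str]] = {}
    for gene, tissues in patterns.items():
        for tissue in tissues:
            genes_by_tissue.setdefault(tissue, set()).add(gene)
    return {
        (t1, t2): jaccard(genes_by_tissue[t1], genes_by_tissue[t2])
        for t1, t2 in combinations(sorted(genes_by_tissue), 2)
    }


patterns = {
    "geneA": {"notochord", "somites"},
    "geneB": {"notochord", "somites"},
    "geneC": {"neural plate"},
    "geneD": {"somites"},
}
print(shared_pattern_groups(patterns))
print(tissue_relatedness(patterns))
```

Exact set matching is the strictest possible grouping criterion. Graded or stage-dependent patterns would call for a similarity threshold instead.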
Adapter-Tagged Competitive PCR and its Application to the Mammalian Central Nervous System
Kikuya Kato
Taisho Laboratory of Functional Genomics, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara, Japan

Adapter-tagged competitive PCR (ATAC-PCR) is an advanced form of quantitative competitive PCR. Internal standards and calibration curves are unnecessary, so the assay can be conducted in a single tube. With the aid of a capillary sequencer, our laboratory can routinely quantify the expression levels of 1000 genes per day. This technique is ideal for cases where RNA preparations are too complex for microarrays. We first applied the technique to the development of the mouse cerebellar cortex. Expression patterns of 1869 genes were determined using ATAC-PCR at 6 time points during mouse postnatal cerebellar development. The expression patterns were classified into 12 clusters, which hierarchical cluster analysis further assembled into three groups. Among the 1869 genes, 1053 known genes were assigned to 90 functional categories. A statistically significant correlation was found between the clusters or groups of gene expression and the functional categories. Genes involved in oncogenesis or protein synthesis were highly expressed during the earlier stages of development. Genes responsible for brain functions, such as neurotransmitter receptors and synapse components, were more active during the later stages. Many other genes also showed expression patterns reported in the literature. The gene expression patterns and the inferred functions agreed well with anatomical and physiological observations made during the developmental process. The analysis was further extended to the microscopic level.
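The abstract does not state which distance or linkage rule was used to cluster the 1869 temporal profiles described above. The sketch assumes one common choice: 1 - Pearson correlation as the distance, with average linkage. It implements agglomerative clustering in plain Python on a toy set of six-time-point profiles. The gene names and values are made up for illustration.

```python
from statistics import mean, pstdev


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; profiles must not be constant (zero variance)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))


def average_linkage(profiles: dict[str, list[float]], n_clusters: int) -> list[set[str]]:
    """Merge the two closest clusters until n_clusters remain.

    Gene-to-gene distance is 1 - Pearson r; cluster-to-cluster distance is the
    mean of all gene-to-gene distances between the two clusters (average linkage).
    """
    names = list(profiles)
    dist = {(a, b): 1.0 - pearson(profiles[a], profiles[b])
            for a in names for b in names if a != b}
    clusters = [{name} for name in names]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = mean(dist[(a, b)] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] |= clusters.pop(j)  # j > i, so index i is unaffected by the pop
    return clusters


# Six postnatal time points per gene (arbitrary units).
profiles = {
    "early1": [10, 8, 6, 4, 2, 1],
    "early2": [9, 8, 5, 4, 3, 1],
    "late1": [1, 2, 4, 6, 8, 10],
    "late2": [1, 3, 3, 6, 9, 9],
}
print(average_linkage(profiles, 2))
```

A correlation-based distance groups genes by the shape of their time course rather than by absolute level. That shape is what distinguishes "early" from "late" developmental genes.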
The developing cerebellar cortex consists of four layers of granule cells: the proliferating zone of the external germinal layer, the differentiating zone of the external germinal layer, the outer granule cell layer, and the inner granule cell layer. Each layer was isolated by laser capture microdissection, and the expression levels of more than 400 genes were assayed by ATAC-PCR. This gave a detailed overview of gene expression in structures defined at the microscopic level.

Storing, Managing and Analyzing Microarray Data
Alvis Brazma
European Bioinformatics Institute, Hinxton, UK

Microarrays allow the expression levels of tens of thousands of genes to be monitored in parallel, and they are already producing huge amounts of valuable data. Analysis and handling of such data are becoming one of the major bottlenecks in using the technology. The raw microarray data are images, which have to be transformed into gene expression matrices. These are tables in which rows represent genes, columns represent samples such as different tissues, and each value characterizes the expression level of a particular gene in a particular sample. These matrices have to be analyzed further if any knowledge about the underlying biological processes is to be extracted. Storing and annotating these data is also a nontrivial problem. We will discuss all of these aspects of gene expression data management and analysis. We will also present our efforts to establish international standards for microarray data representation and annotation, and a public repository for such data.

SESSION 5: TRANSCRIPTOMES, PROTEOMES AND SYSTEMS BIOLOGY

Analysis of Genomes and Transcriptomes in Terms of the Occurrence of Protein Parts and Features
Mark Gerstein
Molecular Biophysics & Biochemistry Department, Yale University, New Haven, CT 06520, USA

My talk will focus on analyzing genomes and gene-expression data in terms of the finite list of protein ‘parts’.
Depending on context, a part could be a structural fold or a sequence superfamily. I will touch on the following topics:
* How one can compare different genomes in terms of the occurrence of various parts in them, and how this idea can be extended to compare the representation of parts in the genome versus the transcriptome. In particular, this shows which protein features are enriched in highly expressed proteins.
* How one can analyze the relationship between where a part is located and its occurrence in the transcriptome, i.e. between a protein's subcellular localization and its level of gene expression. We extend this work to develop a formal Bayesian system for predicting subcellular localization, partially based on gene expression data.
* To what degree protein function and protein-protein interactions are related to similarities in the level of gene expression. Using a statistical significance formalism that we have developed, I will argue that there is a definite relationship for certain classes of protein functions and protein-protein interactions. However, the relationship is not general and global. The absence of correlation is principally due to the inconsistent way protein function is defined.

REFERENCES
http://bioinfo.mbb.yale.edu
M Gerstein & R Jansen (2000). "The current excitement in bioinformatics, analysis of whole-genome expression data: How does it relate to protein structure and function?" Curr. Opin. Struct. Biol. (in press).
A Drawid, R Jansen & M Gerstein (2000). "Gene Expression Levels are Correlated with Protein Subcellular Localization," Trends in Genetics 16: 426-430.
A Drawid & M Gerstein (2000). "A Bayesian System Integrating Expression Data with Sequence Patterns for Localizing Proteins: Comprehensive Application to the Yeast Genome," J. Mol. Biol. 301: 1059-1075.
R Jansen & M Gerstein (2000). "Analysis of the Yeast Transcriptome with Broad Structural and Functional Categories: Characterizing Highly Expressed Proteins," Nucleic Acids Res. 28: 1481-1488.
M Gerstein (1998). "Patterns of Protein-Fold Usage in Eight Microbial Genomes: A Comprehensive Structural Census," Proteins 33: 518-534.

Bridging Genomics with Proteomics: DNA and Protein Analysis on Arrays
Holger Eickhoff, Wilfried Nietfeld, Arif Malik, Neeraj Tandon, Lajos Nyarsik, Martin Horn, Thomas Przewieslik, 1 Elke Rohlfs, Eryk-Witold Wolski, Angelika Lüking, Johannes Schuchardt and Hans Lehrach
Max-Planck Institut für Molekulare Genetik, Abteilung Lehrach, Ihnestr. 73, 14195 Berlin, Germany. Tel: **4930-84131405, Fax: **49-30-84131380
1 Innovationskolleg Theoretische Biologie, Humboldt-Universität zu Berlin, Invalidenstraße 43, 10115 Berlin, Germany

To fully understand any complex network of gene interactions, it is necessary to screen many genetic samples in parallel as rapidly as possible. We will describe the steps that have been automated and miniaturised in our laboratory to enable a high-throughput, highly parallel, hybridisation-based approach to genome analysis. In co-operation with the Resource Centre in the German Genome Project (http://www.rzpd.de), we have clustered EST sequences from various sequencing projects (human, mouse, Arabidopsis) according to the available sequence data. At present, we work with rearrayed UNIGENE sets of up to 10,000 clones on glass slides for RNA expression analysis. For this purpose, the clones are PCR-amplified and bound covalently to glass. Experimental series were carried out by hybridising labelled RNAs to arrays of the selected gene fragments. Images from RNA hybridisations of tissues from different developmental stages were compared and analysed with in-house software for spot recognition and spot quantitation. In addition to EST clustering, we use oligonucleotide fingerprinting to identify new cDNA clones from new libraries. These libraries are now routinely cloned into expression vectors, permitting direct expression of the corresponding proteins.
HIS-tagged proteins were expressed and immobilised in arrays on membranes and glass, both as lysates and as purified products. These arrays allow protein-protein and protein-DNA interactions to be studied. The whole set-up is highly automated and uses state-of-the-art automation technology, from PCR and microarraying to automated hybridisation chambers with online detection and data analysis. In current proteomic approaches, proteins isolated from two-dimensional gels are identified in sequence databases. This identification uses mass-spectrometrically determined peptide maps or sequence tags of individual proteolytic cleavage products. Obviously, this approach is limited to known proteins. To overcome this limitation, we have developed a novel concept. Each protein is specified by a minimal set of structural information readily accessible by mass spectrometry, which we have designated the 'minimal protein identifier' (MPI). MPIs contain accurate molecular masses of enzymatic cleavage products together with fragment-ion data, and are recorded by MALDI-MS. MPIs can be generated from excised 2-DE gel spots and from recombinant proteins, such as those of a Uniprotein set. They can then be used to identify the homologous proteins on 2-DE gels, and vice versa. Once recorded, MPIs allow rapid recognition of known as well as unknown gene products. At the same time, MPIs allow identification of proteins in sequence databases. With these features, MPIs enable reliable comparison of 2-DE gels run with different biological samples, independent of their format, resolution and separation technology. This approach results in more reliable protein identification, because measured MPIs from 2-DE gel spots are compared with measured MPIs from the expressed proteins rather than with MPIs predicted from DNA sequence. The availability of the cDNA clones of the recombinant proteins allows these 2-D results to be linked directly to RNA expression analysis on microarrays.
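The abstract compares measured MPIs from gel spots with measured MPIs from expressed proteins, but gives no scoring rule. The sketch below assumes the simplest one: the fraction of a spot's cleavage-product masses that fall within a parts-per-million tolerance of a reference MPI's masses. It ignores the fragment-ion data that real MPIs also contain. All masses and clone names are hypothetical.

```python
from bisect import bisect_left


def count_matches(query: list[float], reference: list[float], ppm: float = 50.0) -> int:
    """Number of query masses that have a reference mass within +/- ppm."""
    ref = sorted(reference)
    hits = 0
    for m in query:
        tol = m * ppm * 1e-6
        i = bisect_left(ref, m - tol)  # first reference mass >= m - tol
        if i < len(ref) and ref[i] <= m + tol:
            hits += 1
    return hits


def best_match(query: list[float], library: dict[str, list[float]],
               ppm: float = 50.0) -> tuple[str, float]:
    """Return the library entry sharing the largest fraction of the query's masses."""
    scores = {name: count_matches(query, masses, ppm) / len(query)
              for name, masses in library.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


# Hypothetical MPIs recorded from two recombinant proteins (peptide masses in Da).
library = {
    "clone_A": [842.51, 1045.56, 2211.10],
    "clone_B": [900.40, 1500.70],
}
spot = [842.52, 1045.55, 1800.00]  # masses measured from a 2-DE gel spot
print(best_match(spot, library))
```

Sorting each reference list once and using binary search keeps each lookup logarithmic. This matters when one spot is compared against a large library of expressed proteins.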
In addition to arraying applications, the recombinant proteins are used for large-scale crystallisation. The crystallisation set-up is a core facility of the Protein Structure Factory in Berlin, which is funded by the BMBF (http://www.fu-berlin.de/psf). The crystallisation store has a capacity of 10,000 crystallisation plates, with automated online crystal detection for hanging and sitting drops. We are currently working on further miniaturisation in arraying and crystallisation. To this end, we have adopted a highly parallel piezoelectric drop-on-demand technology to dispense biological samples. We have implemented a multihead piezo-jet microarraying system that can aspirate genetic samples out of microtitre plates. A linear 16-nozzle head and a 4 x 4 nozzle multihead permit the construction of large microarrays on a variety of surfaces. Because this system achieves spot densities of more than 4000 clones/cm², we have developed higher-resolution detection systems based on a laser scanning principle. The detection system scans areas of 22 cm x 22 cm at 20-micrometer resolution for two colors, with a sensitivity of less than 1 attomole of fluorescent dye per spot. The system is used for RNA expression and SNP (Single Nucleotide Polymorphism) analysis. High-throughput data generation increasingly has to be complemented by corresponding high-throughput bioinformatics systems. These systems have to extract far more information and insight into biological processes from the rapidly accumulating data than is possible with purely manual techniques. These tools are used for complex data analysis, focussing on the analysis of many thousands of genes simultaneously. One scientific goal of our projects is to create new computational approaches for investigating the molecular mechanisms of differential gene expression. A further goal is to apply them to predicting the expression profiles of any genome.
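The spot density and scanner resolution quoted above can be related with a little arithmetic. This sketch assumes spots on a square grid, a layout the abstract does not specify. On that assumption, 4000 spots/cm² implies a centre-to-centre pitch of about 158 µm, so a 20-micrometer pixel leaves only about 8 pixels across each pitch. A 22 cm x 22 cm scan at that resolution is 11,000 x 11,000 pixels per color channel.

```python
import math


def spot_pitch_um(spots_per_cm2: float) -> float:
    """Centre-to-centre spacing of a square grid with the given density, in micrometres."""
    return math.sqrt(1.0 / spots_per_cm2) * 1e4  # 1 cm = 10,000 um


def pixels_per_pitch(spots_per_cm2: float, pixel_um: float) -> float:
    """How many scanner pixels span one grid pitch."""
    return spot_pitch_um(spots_per_cm2) / pixel_um


def pixels_per_channel(side_cm: float, pixel_um: float) -> int:
    """Pixel count of a square scan area for one color channel."""
    per_side = round(side_cm * 1e4 / pixel_um)
    return per_side * per_side


print(round(spot_pitch_um(4000), 1))         # pitch at 4000 spots/cm2, in um
print(round(pixels_per_pitch(4000, 20), 1))  # pixels per pitch at 20 um
print(pixels_per_channel(22, 20))            # pixels per color for a 22 cm scan
```

With fewer than 8 pixels per grid pitch, and fewer still across each spot, detector resolution becomes a limiting factor. This is consistent with the abstract's point that denser arrays required higher-resolution detection.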
The knowledge gained will be incorporated into a general network model of gene expression regulation, giving insight into crucial pathways in the development and maintenance of organisms. An integral part of our approach is the development of mathematical/statistical techniques and algorithms for pattern recognition in very large data sets.

Proteomics and the Challenge of Hydrophobic Membrane Proteins: the Example of Chloroplast Membranes
Rolland N., Seigneurin-Berny D., Ferro M.*, Garin J.* and Joyard J.
Laboratoire de Physiologie Cellulaire Végétale, UMR 5019 CNRS/CEA/Université Joseph Fourier, and *Laboratoire de Chimie des Protéines, CEA-Grenoble, F-38054 Grenoble-cedex 9, France

As a complement to genome projects, proteomic analyses have been set up to identify new gene products. One of the major challenges in proteomics concerns membrane proteins, especially minor ones. Although 2D-PAGE remains the most efficient way of separating protein mixtures, almost no hydrophobic proteins are found on 2D-gel separations of membrane proteins. Using chloroplast membranes (thylakoids and envelope membranes) as a model, we have optimized a procedure to extract and concentrate the most hydrophobic membrane proteins. It is based on the differential solubilization of membrane proteins in chloroform/methanol mixtures. The propensity of hydrophobic proteins to partition into chloroform/methanol mixtures was directly correlated with the Res/TM ratio (number of amino acid residues / number of putative transmembrane regions). This held for thylakoids as well as for envelope membranes, demonstrating the versatility of the procedure (Seigneurin-Berny et al., 1999, Plant J. 19, 217-228; Ferro et al., 2000, Electrophoresis, in press). In both cases, chloroform/methanol extraction of membrane proteins (a) eliminates peripheral proteins as well as soluble contaminants from membrane fractions and (b) limits protein pattern complexity.
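The Res/TM ratio defined above is straightforward to compute once transmembrane regions have been predicted. The abstract does not name its predictor. This sketch assumes the classic Kyte-Doolittle hydropathy approach, which counts runs of 19-residue windows whose mean hydropathy exceeds 1.6. A protein with no predicted segment gets an infinite ratio. The test sequence is synthetic.

```python
# Kyte & Doolittle (1982) hydropathy scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def count_tm_segments(seq: str, window: int = 19, threshold: float = 1.6) -> int:
    """Count runs of consecutive windows whose mean hydropathy exceeds the threshold."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    segments, inside = 0, False
    for start in range(len(values) - window + 1):
        hydrophobic = sum(values[start:start + window]) / window > threshold
        if hydrophobic and not inside:
            segments += 1  # entering a new putative transmembrane region
        inside = hydrophobic
    return segments


def res_tm_ratio(seq: str) -> float:
    """Residues per putative transmembrane region (inf if none is predicted)."""
    n_tm = count_tm_segments(seq)
    return len(seq) / n_tm if n_tm else float("inf")


# Synthetic protein: two 22-residue leucine stretches separated by lysines.
protein = "K" * 30 + "L" * 22 + "K" * 30 + "L" * 22 + "K" * 30
print(count_tm_segments(protein), res_tm_ratio(protein))
```

A lower Res/TM value means more predicted membrane-spanning segments per residue, i.e. a more hydrophobic protein by this measure.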
Indeed, compared with the commonly used differential solubilization in detergents, chloroform/methanol extraction seems to be the best compromise. It combines enrichment of highly hydrophobic proteins with complete elimination of hydrophilic ones. By combining classical SDS-PAGE with MS/MS, our procedure identifies hydrophobic proteins whatever their isoelectric point, including minor membrane components. It complements classical proteomic studies of membrane fractions separated by 2D-PAGE, which mostly provide information about peripheral membrane polypeptides. Chloroform/methanol extraction is thus likely to become a versatile tool for recovering the hydrophobic proteome from other membrane systems. For this reason, this subcellular proteomic tool is particularly well adapted to eukaryotic subcellular proteomic studies.

Tools for Functional Genomics Using Transcript Profiles and Proteomics
Joakim Lundeberg
Department of Biotechnology, Royal Institute of Technology, Stockholm, Sweden

High-throughput systems for analysis of genetic variability, transcript profiling and affinity-reagent-based proteomics have been established, based on "in-house" experience and robotics. Tag sequencing projects involving EST and SNP analysis have been developed using pyrosequencing, a novel sequencing-by-synthesis technique. To allow large-scale applications of this technology, a semi-automated production line has been established. It enables rapid analysis of SNPs (one microtitre plate, MTP, per 10 min) and of ESTs (one MTP per 40 min). Furthermore, systems for analysing minute amounts of sample have been developed for both transcript profiling and SNP purposes. To allow comparative studies of experimental and public-domain data, a visualization tool has been developed for "data mining" of accumulated information. This mining provides new starting points for subsequent proteome analysis.
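Pyrosequencing, mentioned above, reads sequence by synthesis. Nucleotides are dispensed one at a time in a known order, and the light emitted after each dispensation is roughly proportional to the number of bases incorporated. A minimal base caller working under that idealised assumption rounds each signal to a homopolymer length. Real pyrograms also need background correction and compensation for incomplete extension, which this sketch omits. The dispensation order and signal values are made up.

```python
def call_bases(dispensation: str, signals: list[float], unit: float = 1.0) -> str:
    """Translate a pyrogram into a base sequence.

    dispensation: nucleotide added at each step, in order (e.g. "ACGTACGT")
    signals:      light intensity recorded after each dispensation
    unit:         intensity of a single incorporation (assumed pre-calibrated)
    """
    if len(dispensation) != len(signals):
        raise ValueError("one signal is needed per dispensation")
    bases = []
    for nucleotide, signal in zip(dispensation, signals):
        count = max(0, round(signal / unit))  # homopolymer length; 0 = no incorporation
        bases.append(nucleotide * count)
    return "".join(bases)


pyrogram = [1.05, 0.02, 1.90, 0.10, 0.00, 1.10, 0.05, 2.95]
print(call_bases("ACGTACGT", pyrogram))
```

Because signal scales with homopolymer length, read accuracy depends on how well `unit` is calibrated. Long homopolymers are the hardest case.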
For these latter purposes, efficient bacterial expression systems have been established for the analysis of cDNAs or of exons predicted from genomic data. The protein products obtained are used to create affinity reagents for the functional characterisation of gene products.

Proteomic Strategies in Cancer
Julio E. Celis
Department of Medical Biochemistry and Danish Centre for Human Genome Research, University of Aarhus, Ole Worms Allé build. 170, DK-8000 Aarhus C, Denmark

Proteomics is an emerging area of the post-genomic era that uses a plethora of techniques to resolve, quantitate and annotate proteins, to rapidly survey their identity, and to identify their interacting partners. In combination with DNA microarrays, these technologies promise to revolutionise biology. They are expected to reveal gene regulation events involved in disease progression and to generate potential targets for drug discovery and diagnostics. Here, I will highlight the potential of proteomics for the study of bladder cancer progression using biopsy specimens.

Proteomics Databases
Amos Bairoch
Swiss Institute of Bioinformatics, Geneva, Switzerland

SWISS-PROT is an annotated protein sequence database established in 1986. Since 1987 it has been maintained collaboratively by the Department of Medical Biochemistry of the University of Geneva and the EMBL Data Library (now the EMBL Outstation - The European Bioinformatics Institute (EBI)). The SWISS-PROT protein sequence data bank consists of sequence entries. Sequence entries are composed of different line types, each with its own format. For standardization purposes, the format of SWISS-PROT follows that of the EMBL Nucleotide Sequence Database as closely as possible. The SWISS-PROT database distinguishes itself from other protein sequence databases by three distinct criteria: (i) Annotation: in SWISS-PROT, as in most other sequence databases, two classes of data can be distinguished, the core data and the annotation.
For each sequence entry, the core data consist of the sequence data, the citation information (bibliographical references) and the taxonomic data (description of the biological source of the protein). The annotation describes the following items:
- function(s) of the protein;
- post-translational modification(s), for example carbohydrates, phosphorylation, acetylation, GPI-anchor, etc.;
- domains and sites, for example calcium-binding regions, ATP-binding sites, zinc fingers, homeobox, kringle, etc.;
- secondary structure;
- quaternary structure, for example homodimer, heterotrimer, etc.;
- similarities to other proteins;
- disease(s) associated with deficiencies in the protein;
- sequence conflicts, variants, etc.

We try to include as much annotation information as possible in SWISS-PROT. To obtain this information, we use review articles, in addition to the publications that report new sequence data, to periodically update the annotations of families or groups of proteins. We also make use of external experts, who have been recruited to send us their comments and updates concerning specific groups of proteins. We believe that our systematic recourse both to publications other than those reporting the core data and to subject referees represents a unique and beneficial feature of SWISS-PROT. In SWISS-PROT, annotation is mainly found in the comment lines (CC), in the feature table (FT) and in the keyword lines (KW). Most comments are classified by 'topics', which makes it easy to retrieve specific categories of data from the database. (ii) Minimal redundancy: many sequence databases contain, for a given protein sequence, separate entries corresponding to different literature reports. In SWISS-PROT we try as much as possible to merge all these data so as to minimize the redundancy of the database. If conflicts exist between various sequencing reports, they are indicated in the feature table of the corresponding entry.
(iii) Integration with other databases: it is important to give users of biomolecular databases a degree of integration between the three types of sequence-related databases (nucleic acid sequences, protein sequences and protein tertiary structures), as well as with specialized data collections. SWISS-PROT is currently cross-referenced with 30 different databases. Cross-references are provided as pointers to information that is related to SWISS-PROT entries but held in data collections other than SWISS-PROT. I will also describe other ongoing efforts of the Swiss Institute of Bioinformatics carried out in collaboration with the European Bioinformatics Institute. One example is TrEMBL, a computer-annotated supplement of SWISS-PROT that contains all the translations of EMBL nucleotide sequence entries not yet integrated in SWISS-PROT. Other proteomics databases available from SIB (http://www.expasy.ch) are:
- PROSITE, a database of protein families and domains consisting of biologically significant sites, patterns and profiles. These help to reliably identify to which known protein family (if any) a new sequence belongs;
- SWISS-2DPAGE, which contains data on proteins identified on various 2-D PAGE reference maps. You can locate these proteins on the 2-D PAGE maps, or display the region of a 2-D PAGE map where one might expect to find a protein from SWISS-PROT;
- SWISS-3DIMAGE, an image database that strives to provide high-quality pictures of biological macromolecules with known three-dimensional structure. The database contains mostly images of experimentally elucidated structures, but also provides views of well-accepted theoretical protein models. The images are provided in several useful formats; both mono and stereo pictures are generally available.

The Physiome Project: Integrating from Genomics to Function or vice versa
James B. Bassingthwaighte
University of Washington, Seattle, WA, 98195-7962, USA

The Physiome is the quantitative description of the functioning organism in normal and pathophysiological states. It is built upon the morphome, the quantitative description of an intact organism's anatomical structure, chemical and biochemical composition, and material properties. The morphome includes the organism's genome and proteome, and the structures of its cells, tissues and organs, up to the whole intact organism. The Physiome Project is beginning as a program to design, develop, implement, test, document, archive and disseminate quantitative information and integrative models. These models describe the functional behavior of the components and of intact organisms, from bacteria to man. A fundamental and major feature of the program is the databasing of the basic observations for retrieval and evaluation. Given the Genome, Transcriptome and Proteome, Chance, Necessity and Environment then shape the phenotypic form and the functional entity, the Physiome. Prediction from the genome is still more often empirical than not, and straightforward logic commonly fails. The problem is not simply that a gene gives rise to several proteins, or that mRNA/protein ratios are widely scattered. Rather, control and function are governed through a multiplicity of interacting systems. These can be sorted out only through integrative modeling. There are many problems in developing large-scale systems descriptions in biology. Integrative models have many submodels. The submodels are themselves complex, requiring consideration of spatial and temporal events and processes. They have to be linked to one another while preserving mass balance. The linked models must also represent accurately the variables within non-linear, complex biochemical networks with many signalling and controlling pathways. Micro-compartmentalization vitiates the use of simplified model structures. The wide range of rate constants in the equations makes computation costly.
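A toy model illustrates two of the difficulties listed above: preserving mass balance when submodels are linked, and the cost of widely ranging rate constants. The sketch assumes a hypothetical three-pool system (plasma, free tissue solute, bound tissue solute) with one slow exchange step and one fast binding step. The rate constants are chosen only for illustration. The fast step makes the equations stiff, so explicit Euler would need a time step below roughly 0.0002 to stay stable. The implicit (backward) Euler scheme used here stays stable at a much larger step. Because each column of the rate matrix sums to zero, it also conserves the total amount.

```python
def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Gaussian elimination with partial pivoting for a small dense system a x = b."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def rate_matrix(k1: float, k2: float, k3: float, k4: float) -> list[list[float]]:
    """dx/dt = M x for pools [plasma, free tissue, bound tissue].

    plasma <-> free (k1 forward, k2 back); free <-> bound (k3 forward, k4 back).
    Every column sums to zero, so the total amount is conserved.
    """
    return [[-k1, k2, 0.0],
            [k1, -(k2 + k3), k4],
            [0.0, k3, -k4]]


def backward_euler(x0: list[float], M: list[list[float]], h: float, steps: int) -> list[float]:
    """Implicit Euler: solve (I - h M) x_next = x at every step."""
    n = len(x0)
    A = [[(1.0 if i == j else 0.0) - h * M[i][j] for j in range(n)] for i in range(n)]
    x = list(x0)
    for _ in range(steps):
        x = solve(A, x)
    return x


M = rate_matrix(k1=1.0, k2=0.5, k3=1e4, k4=1e3)  # fast binding makes the system stiff
x = backward_euler([1.0, 0.0, 0.0], M, h=0.01, steps=2000)
print(x, sum(x))
```

At equilibrium k1·plasma = k2·free and k3·free = k4·bound. With these constants, that gives pools in the ratio 1 : 2 : 20, which the integration approaches while the total stays at 1.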
A most serious problem is the current lack of databases of physiological information. The genomics and proteomics communities have organized large database systems, but at higher levels of biology there are almost none. We are nearly drowning in new information published each day, yet the data are not entered into databases. "Simple" things like tissue composition, material properties and the mechanical behavior of cells and tissues are not generally available. Technologies allowing many groups to work together are being developed rapidly, and Internet II will facilitate this immensely. When problems are complex, a particular working group can be expert in only a small part of the overall program. The strategies to be worked out must therefore include how to pull together models composed of many submodels, even when the expertise for each is scattered among diverse institutions. The technologies of bioinformatics will contribute greatly to this effort. The successful development of comprehensive models of biological systems is key to strategizing toward interventional genomics, pharmaceutics and drug design. Carefully integrated models will gradually become better predictors of the results of interventions. Becoming very good will take some time. Once they are, models will be useful for predicting the side effects and long-term effects of drugs and toxins. They will also help predict where genomic intervention will be effective, and where the multiple redundancies in our biological systems will make a proposed intervention useless. The Physiome Project will provide the integrating scientific basis for the Genes to Health initiative. It will make physiological genomics a reality applicable to whole organisms, from bacteria to man.
DOE Genome Scale Expression Efforts
Marvin Stodolsky
US Department of Energy, Office of Science, Genome Task Group, Germantown, MD, USA

The Human Genome Initiative of the US Department of Energy began in 1986, and dedicated support for expression/cDNA-related projects began in 1990. This support has included the cDNA clone management services of I.M.A.G.E. at the Lawrence Livermore National Laboratory, many smaller cDNA projects, and the recent series of Workshops on Complete cDNA Sequencing (WCCS). A core facet of DOE planning is to foresee bottlenecks in R&D and to implement plans to alleviate them. This meeting coincides with a substantial ongoing transition in expression studies. With the completion of the draft sequence of the human genome, almost all ESTs, and the messenger RNAs they represent, now have candidate locations on the chromosomes. Models for the source chromosomal genes are generated through both experimental studies and computational methods, including those of the Annotation Consortium centered at the Oak Ridge National Laboratory. A core continuing task is to provide improved tools and resources for genome-scale, multi-tissue and multi-condition analyses of gene expression. DOE plans in this area will be described.

THURSDAY, NOVEMBER 9, 2000
JEUDI 9 NOVEMBRE 2000

SESSION 6: APPLICATIONS IN BIOLOGY, BIOTECHNOLOGY AND MEDICINE

Production and Quality Assessment of Full-Length-Enriched cDNA Libraries and their use in transcriptome profiling using microarrays
Claudio Schneider
LNCIB (Laboratorio Nazionale CIB), AREA Science Park, Padriciano 99, 34012 Trieste, Italy

Technologies for obtaining full-length-enriched cDNA libraries will be presented, along with their quality assessment. A case study will consider the advantage of using full-length cDNAs deposited in microarray format, rather than partial cDNAs, for hybridization in transcriptome profiling.
The need for increased detection sensitivity to uncover low-abundance transcripts will be addressed. Transcriptome profiling associated with in vitro growth arrest and the p53 response will be analyzed using specific examples. Integration of the transcriptome profiles obtained with the associated protein-interaction profiling will be highlighted as a way to link pathway dissection to phenotypic response.

Gene Expression Profiling of 3 Solid Tumors
H. Sültmann(1), W. Huber(1), J. Boer(1,4), F. Wilmer(1), R. Wittig(1), B. Korn(5), L. Füzegi(3), B. Gunawan(3), S. Haas(1,2), A. v. Heydebreck(2), M. Vingron(2), A. Poustka(1)
(1) Abt. Molekulare Genomanalyse, (2) Theoretische Bioinformatik, Deutsches Krebsforschungszentrum, Im Neuenheimer Feld 280, D-69120 Heidelberg; (3) Zentrum Pathologie, Georg-August-Universität, Robert-Koch-Str. 40, D-37075 Göttingen; (4) Leiden Univ. Medical Center, Wassenarseweg 72, Leiden 2333AL, NL; (5) Resourcenzentrum im Deutschen Humangenomprojekt, Im Neuenheimer Feld 506, D-69120 Heidelberg.

Cancer cells show altered gene expression compared to normal cells. Knowing how gene expression changes in particular types and stages of tumors can give insight into the molecular changes involved in tumor development and progression. It can also provide molecular markers for tumor diagnosis and prognosis. We use cDNA array hybridization as a high-throughput method to determine the expression levels of 32,000 different human genes and ESTs spotted in duplicate onto nylon membranes. Poly(A)+ RNA is isolated from normal tissue and from primary tumor tissue of the same patient, and reverse transcribed into ³³P-labelled single-stranded cDNA. The two cDNA populations are hybridized to different membranes using a standardized protocol. Phosphor imaging plates are exposed to the membranes, and expression profiles are calculated through spotwise quantification of the signal distribution.
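The abstract does not describe how tumor and normal membranes are compared statistically. Normal and tumor cDNA from the same patient go onto separate membranes with duplicate spots, so one plausible analysis is assumed here. It averages the duplicates, scales each membrane to its median signal, and takes per-patient log2(tumor/normal) ratios. Each gene is then summarized with a paired t statistic across patients. The gene names and intensities are invented.

```python
import math
from statistics import mean, median, stdev


def normalize(membrane: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Average each gene's duplicate spots, then divide by the membrane median."""
    averaged = {gene: (a + b) / 2 for gene, (a, b) in membrane.items()}
    scale = median(averaged.values())
    return {gene: value / scale for gene, value in averaged.items()}


def paired_t(normals: list[dict[str, float]], tumors: list[dict[str, float]], gene: str) -> float:
    """t statistic of per-patient log2(tumor / normal) ratios for one gene.

    Needs at least two patients and ratios that are not all identical.
    """
    ratios = [math.log2(t[gene] / n[gene]) for n, t in zip(normals, tumors)]
    return mean(ratios) / (stdev(ratios) / math.sqrt(len(ratios)))


background = {f"ref{i}": (100.0, 100.0) for i in range(4)}  # unchanged genes
normals = [normalize({**background, "geneX": (100.0, 100.0)}) for _ in range(3)]
tumors = [normalize({**background, "geneX": (v, v)}) for v in (400.0, 300.0, 500.0)]
print(round(paired_t(normals, tumors, "geneX"), 2))
```

Pairing each tumor with normal tissue from the same patient removes between-patient baseline differences. Applying the test to tens of thousands of genes would additionally require a multiple-testing correction.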
We have collected array expression data for 37 renal cell carcinoma samples (predominantly of the clear cell type) of different tumor stages and differentiation grades, and for the corresponding normal tissues of the same patients. More than 1,700 genes were found to be expressed at significantly different levels in normal and tumor tissues. Among these were several genes already known to be differentially expressed in renal carcinoma, e.g. vimentin, VEGF, haptoglobin, metallothionein and kininogen. In addition to the genes known to be associated with kidney cancer, many other genes and ESTs were found. Our data allow the definition of genes that are significantly transcribed only in certain tumor stages (e.g. in metastases). A detailed analysis of the correlation between gene expression and tumor progression in renal cell carcinoma is under way. Similar experiments for brain and breast tumors are in progress. The renal cell carcinoma-specific genes, together with a selection of genes known to have oncogenic potential in other cancer types, have been amplified by PCR and spotted onto glass slides to build a kidney tumor-specific gene array. With this array, we will conduct a more focused investigation of the differential transcription of genes in renal cell carcinoma. A queryable database combining expression data for all genes on the array with histopathological and clinical follow-up information for the tumor material, as well as tools to mine these large data sets, are under development.

The Cancer Gene Anatomy Program (CGAP) and the Mammalian Gene Collection (MGC): cDNA Resources for the Community
Robert Strausberg
National Cancer Institute, Bethesda, MD, USA
Over the past three years, the NCI has established a Tumor Gene Index of sequence tags derived from cancers and their normal precursors. The collection now includes over one million EST tags and three million SAGE tags.
Through the CGAP web site (cgap.nci.nih.gov), informatics tools are provided to facilitate application of the sequences and clones to cancer research. A summary of the current resource and its uses will be presented. The NIH Mammalian Gene Collection (MGC) seeks to identify and sequence full open reading frame (ORF) clones derived from human and mouse genes. The clones are first sequenced from the 5' end to identify potentially full-ORF clones, and non-redundant clones are then subjected to full-insert sequencing. At present (September 2000), the collection includes approximately 160,000 5'-end reads, of which 102,000 are from human cDNA libraries. These human clones include about 43,000 that appear to contain the full N-terminal coding region, and these collapse to a non-redundant set of clones representing about 6,300 genes. Full-insert sequencing of the non-redundant set is under way. A detailed description of the MGC approach and results will be presented.

Expression Profiling in Patient Tissue for Insights into Etiology and Pathophysiology of Progressive Disease
Eric P. Hoffman, Yi-Wen Chen, Po Zhao, Rehannah Borup
Research Center for Genetic Medicine, Children's National Medical Center, Washington DC 20010, USA
Genome-wide expression profiling of patient tissues using Affymetrix or cDNA microarrays is widely believed to hold promise for understanding disease etiology and pathophysiology, and for monitoring therapeutics. The use of diseased patient tissues introduces variables that must be considered in experimental design, such as differences in genetic background and heterogeneity of tissue samples. Muscle may be an ideal tissue in which to conduct expression profiling.
It is typically biopsied in patients with muscle disease; biopsies are flash-frozen in a manner ideal for isolating RNA in adequate quantities; the cellular composition of muscle is relatively simple; and frozen sections are easily interpreted for immunohistochemical verification of the gene expression differences detected on arrays. In addition, normal muscle responds to a range of environmental stimuli, such as exercise, atrophy and tissue damage, and is again typically biopsied in clinical studies of normal populations. Genetic polymorphic variants in muscle-expressed genes dictate responsiveness to exercise, including aptitude for specific types of sport (e.g. those relying on strength, endurance or speed). Here, we present a series of experiments using expression profiling of muscle biopsies from muscular dystrophy patients. We present methods that control for tissue heterogeneity and for differences in genetic background between individuals. We also show data on the reproducibility of the method, which underscore the need for multiple experiments. We report the use of expression profiling to define the pathophysiological cascades involved in the progression of two muscular dystrophies with known primary biochemical defects: dystrophin deficiency (Duchenne muscular dystrophy) and deficiency of β-sarcoglycan (a dystrophin-associated protein). We employed a novel protocol for expression profiling in human patient tissues that uses mixed samples from multiple patients and iterative comparisons of duplicate datasets. Using this approach with patient muscle biopsies, we define novel aspects of the molecular pathophysiology, both cell-autonomous and non-cell-autonomous, that explain the downstream histological and clinical consequences of these biochemical deficiencies. We found evidence both for incomplete differentiation of patient muscle and for de-differentiation of myofibers to alternative lineages with advancing age.
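The phrase "iterative comparisons of duplicate datasets" is not spelled out in the abstract. One possible reading, sketched here purely as an assumption and not as the authors' protocol, is that every duplicate patient dataset is compared with every duplicate control dataset (2 × 2 = 4 comparisons), and a gene is retained only if it passes a fold-change cutoff in the same direction in all of them. The 2-fold cutoff, the function name and the values are hypothetical; ACTC1 is the gene encoding α-cardiac actin.

```python
from itertools import product


def consistent_changes(patient_sets, control_sets, min_fold=2.0):
    """Each *_sets argument is a list of dicts mapping gene -> positive expression value,
    one dict per duplicate dataset. Returns gene -> "up"/"down" for genes that pass
    the fold cutoff in the same direction in every patient-vs-control comparison."""
    shared = set.intersection(*(set(d) for d in patient_sets + control_sets))
    calls = {}
    for gene in sorted(shared):
        directions = set()
        for p, c in product(patient_sets, control_sets):
            ratio = p[gene] / c[gene]
            if ratio >= min_fold:
                directions.add("up")
            elif ratio <= 1.0 / min_fold:
                directions.add("down")
            else:
                directions.add("unchanged")
        # Keep only genes with a single, non-neutral call across all comparisons.
        if len(directions) == 1 and "unchanged" not in directions:
            calls[gene] = directions.pop()
    return calls


if __name__ == "__main__":
    # Hypothetical values: duplicate pooled-patient and duplicate control datasets.
    patients = [{"ACTC1": 900, "GAPDH": 1000}, {"ACTC1": 800, "GAPDH": 1100}]
    controls = [{"ACTC1": 100, "GAPDH": 1000}, {"ACTC1": 150, "GAPDH": 950}]
    print(consistent_changes(patients, controls))
```

Requiring agreement across all pairwise comparisons trades sensitivity for reproducibility, which matches the abstract's emphasis on the need for multiple experiments.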
One such differentially expressed gene that we characterized in detail, α-cardiac actin, showed persistent expression after birth in 60% of myofibers despite the absence of degeneration/regeneration in the muscle. The majority (80%) of myofibers remained strongly positive for this protein throughout the course of the disease. Other developmentally regulated genes that showed widespread overexpression in these muscular dystrophies included embryonic myosin heavy chain, versican, acetylcholine receptor, SPARC/osteonectin and thrombospondin 4. We hypothesize that the abnormal Ca²⁺ influx in dystrophin- and β-sarcoglycan-deficient myofibers leads to altered developmental programming of developing and regenerating myofibers. The finding of upregulation of HLA-DR and Factor XIII led to the novel identification of dendritic cell infiltration in dystrophic muscle; these cells likely mediate immune responses and microenvironmental changes in muscle. Finally, we document a general metabolic crisis in dystrophic muscle, with large-scale downregulation of mitochondrial gene expression.

Molecular Immunology 2000: Transcriptional Profiling of Regulatory Vα24JαQ T cells from Identical Twins Discordant for Type I Diabetes, and a New Mechanism for Regulation of the Immune Response
S. Brian Wilson, Michael C. Byrne and Jack L. Strominger
Dana-Farber Cancer Institute, Boston MA 02115; Genetics Institute, Cambridge MA 02140; and Department of Molecular and Cellular Biology, Harvard University, Cambridge MA 02138, USA
In a study of identical twins discordant for Type I diabetes, the diabetic probands were observed to have a marked deficiency in the number of regulatory Vα24JαQ T cells, as well as a defect in IL-4 secretion by these T cells (1). Whether the effect on IL-4 secretion was unique was examined by transcriptional profiling of the Vα24JαQ T cells from a diabetic proband and from her identical non-diabetic twin.
Since 226 transcripts were altered by activation of the T cells from the normal twin and only 86 in the clone derived from the diabetic twin, the deficit in IL-4 secretion was not unique. Furthermore, the observed transcriptional profiles strongly suggested a role for these regulatory T cells in the recruitment and activation of cells of the myeloid lineage (2). Stimulation of Vα24JαQ T cells through their T cell receptor resulted in the activation of a number of transcripts important for the recruitment and differentiation of myeloid cells (e.g. MIP-1α, GM-CSF, IL-4) and of several involved in cytolysis (perforin, granzymes and granulysin). Moreover, myeloid dendritic cells (DC) were found to express CD1d, the ligand for the invariant TCR of Vα24JαQ T cells. Myeloid dendritic cells both activated Vα24JαQ T cells and were susceptible to lysis by these same regulatory T cells in a CD1d-restricted fashion. Since myeloid dendritic cells are a major source of the IL-12 that is required for Th1 cell differentiation, their elimination by lysis is a mechanism for limiting the generation of Th1 cells and thus for regulating the balance of Th1 and Th2 responses (3).
(1) Wilson S.B., Kent S.C., Patton K.T., Orban T., Jackson R.A., Exley M., Porcelli S., Schatz D.A., Atkinson M.A., Balk S.P., Strominger J.L., Hafler D.A. Extreme Th1 bias of invariant Valpha24JalphaQ T cells in type 1 diabetes. Nature 391:177-181 (1998).
(2) Wilson S.B., Kent S.C., Horton H.F., Hill A.A., Bollyky P.L., Hafler D.A., Strominger J.L., Byrne M.C. Multiple differences in gene expression in regulatory Vα24JαQ T cells from identical twins discordant for type I diabetes. Proc. Natl. Acad. Sci. USA 97:7411-7416 (2000).
(3) Yang O.O., Racke F.K., Nguyen P.T., Gausling R., Severino M.E., Horton H.F., Byrne M.C., Strominger J.L., Wilson S.B. CD1d on myeloid dendritic cells stimulates cytokine secretion from and cytolytic activity of Valpha24JalphaQ T cells: a feedback mechanism for immune regulation. J. Immunol. 165:3756-3762 (2000).

A new highly sensitive microarray approach for differential screening using radioactive probes
S. Dumas, T. Vujasinovic, H. Salin, S. Maitrejean (1), C. Menini and J. Mallet
LGN, UMR 9923, CNRS, Hôpital Pitié Salpêtrière, 75013 Paris, France; (1) Biospace Mesures, 10 rue Mercoeur, 75011 Paris, France
We have developed experimental procedures and signal-filtering algorithms that make radioactive labelling highly suitable for gene expression screening on microarrays. These procedures make it possible to hybridise two differently labelled radioactive probes simultaneously on a given microarray. This approach gives the highest signal detection sensitivity currently available (compared with fluorescent labelling, the gain in detection sensitivity is more than 10²-fold). It allows expression profiling experiments using sub-microgram amounts of unamplified messenger RNA from small biological samples. We were able to detect very low-expressed mRNAs (<1 mRNA copy per 10⁵ mRNA molecules), even when starting from very small amounts of sample (100 ng of poly(A) RNA, corresponding to approximately 5 mg of tissue per hybridisation experiment). We show that ³H labelling is fully detected on glass-support microarrays. At the present state of the art, simultaneous hybridisation can be performed by comparing ³H with either ³³P or ³⁵S (or ³²P). The 5-µm pixel size of the MicroImager (Biospace Mesures, Paris, France) is satisfactory for microarray analysis. About 10,000 spots could be analysed on a given array with radioactive labelling. Given the high absolute sensitivity of signal detection and the low background of this technique, it should in theory make it possible to reproducibly detect changes of less than 2-fold in the expression of low-expressed genes.
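To put "<1 mRNA copy per 10⁵ mRNA molecules" in absolute terms, the short calculation below estimates how many copies of such a rare transcript are present in 100 ng of poly(A) RNA. The average mRNA length (1,500 nt) and the average mass per nucleotide (about 320 g/mol) are our assumptions, not figures from the abstract, and the function name is hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def rare_transcript_copies(mass_ng, mean_length_nt=1500, g_per_mol_per_nt=320.0, frequency=1e-5):
    """Estimate the number of copies of a transcript present at `frequency`
    (fraction of all mRNA molecules) in `mass_ng` nanograms of mRNA.
    Mean length and mass per nucleotide are assumed averages."""
    grams = mass_ng * 1e-9
    molar_mass = mean_length_nt * g_per_mol_per_nt  # g/mol for an average mRNA
    total_molecules = grams / molar_mass * AVOGADRO
    return total_molecules * frequency


if __name__ == "__main__":
    copies = rare_transcript_copies(100)
    print(f"about {copies:.2e} copies ({copies / AVOGADRO:.2e} mol)")
```

Under these assumptions, 100 ng contains roughly 1.25 × 10¹¹ mRNA molecules, so a transcript at 1 in 10⁵ is present at about 1.25 × 10⁶ copies, i.e. about 2 attomoles per hybridisation.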
High Throughput SNP scoring using Rolling Circle Amplification
Tony Smith
Amersham Pharmacia Biotech, Amersham Laboratories, White Lion Road, Amersham, Buckinghamshire, UK
The increasing availability of a dense map of single nucleotide polymorphism (SNP) markers makes genome-wide scans for genotype-phenotype associations possible. The density of markers required is the subject of considerable discussion, but it is clear that such experiments will require SNP scoring technology capable of very high throughput, so cost per assay and ease of automation are important. SNiPer is an SNP scoring system that has been developed to address these issues. It is based on allele-specific amplification of polymorphic loci directly from genomic DNA. The technology combines ligation of SNP-specific open circle probes and Rolling Circle Amplification (1) in a single-tube process. Detection is carried out using two generic primers labelled with FRET dye pairs in a homogeneous microtitre plate format. Results will be presented demonstrating the accuracy and sensitivity of this system.
(1) Lizardi P.M. et al. Nature Genetics 19:225-232 (1998).

Populations, SNPs and Chips in Common Disease Mapping
Andres Metspalu
Institute of Molecular and Cell Biology, University of Tartu, Estonian Biocenter, Estonia; and International Agency for Research on Cancer, Lyon, France
Many common diseases, such as cardiovascular disease, diabetes, cancer and asthma, result from a complex interaction between environmental factors and susceptibility alleles of multiple genes. Traditional linkage analysis is not suitable for identifying these alleles (all of which are potential drug targets or drug candidates), because they do not segregate in a Mendelian fashion and linkage analysis can detect only alleles with a substantial effect on disease predisposition. Alternative strategies are therefore needed.
One way to succeed is to perform a genome-wide scan of cases and controls and to carry out association studies based on linkage disequilibrium. The questions then are: Which population should be used for the study? What sample size is needed? How many SNP markers, and of what type, are needed? What is the best genotyping technology in terms of throughput, fidelity and cost? We propose the Estonian population for large-scale association studies using high-density SNP mapping with 60,000 to 100,000 SNP markers. Before genotyping, the disease status (phenotype) of individuals voluntarily participating in the project will be recorded. We hope to collect up to one million phenotypes, about 75% of the population (www.genomics.ee). We have developed a single nucleotide polymorphism scoring system for high-throughput SNP analysis. The method is based on an array of oligonucleotides immobilized via a 5'-end amino linker on an amino-coated glass slide surface. Oligonucleotides are selected from the sense and antisense genomic sequence so that their 3' ends are one base upstream of the SNP. A double-stranded PCR (dsPCR) product containing the SNP is used as the template. This dsPCR product is fragmented with uracil N-glycosylase (UNG) and treated with alkaline phosphatase (AP) to inactivate the dNTPs before primer extension with fluorescent ddNTPs. A four-channel imaging system (Asper FD-003) has been developed, consisting of a total internal reflection fluorescence excitation mechanism combined with a high-resolution CCD. A software package (Genorama 3.0) has been developed for SNP scoring (Kurg et al. 2000, Genetic Testing 4, 17). Oligonucleotide design and quality, the DNA polymerase, the dye terminators, template DNA quality and special software tools are all critical for optimal results. Once developed for a specific set of oligonucleotides, these SNP chips work well and compare favourably with other SNP scoring platforms. This advantage becomes more pronounced as the number of SNPs to be analyzed increases from hundreds to many thousands.
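The primer-placement rule described above (3' ends immediately adjacent to the SNP on both strands, followed by single-base extension with a ddNTP) can be sketched as follows. This is an illustrative sketch, not the Genorama software: the 25-nt default primer length, the function names and the toy sequence are hypothetical, and the input is assumed to be an uppercase A/C/G/T sense-strand sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    # Reverse complement of an uppercase DNA sequence.
    return seq.translate(COMPLEMENT)[::-1]


def apex_primers(sense, snp_index, length=25):
    """Return (sense_primer, antisense_primer), both written 5'->3', whose 3' ends
    lie immediately before the SNP on their respective strands.
    snp_index is the 0-based position of the SNP in the sense sequence."""
    if snp_index < length or snp_index + 1 + length > len(sense):
        raise ValueError("not enough flanking sequence around the SNP")
    sense_primer = sense[snp_index - length:snp_index]
    antisense_primer = revcomp(sense[snp_index + 1:snp_index + 1 + length])
    return sense_primer, antisense_primer


def expected_ddntp(allele):
    """ddNTP incorporated by (sense_primer, antisense_primer) for a given sense-strand allele.
    The sense primer copies the antisense template, so it adds the allele itself;
    the antisense primer copies the sense template, so it adds the complement."""
    return allele, allele.translate(COMPLEMENT)


if __name__ == "__main__":
    seq = "GATTACAG" + "A" + "CCTGTAAT"  # toy sequence, SNP at index 8
    print(apex_primers(seq, 8, length=4))
    for allele in "AG":
        print(allele, expected_ddntp(allele))
```

Reading both strands gives two independent calls per SNP, which is one reason for selecting oligonucleotides from both the sense and antisense sequence.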
SESSION 7: FUTURE PERSPECTIVES

Gene Identification Projects at TIGEM
Giuseppe Borsani
Tigem - Telethon Institute of Genetics and Medicine, Via Pietro Castellino 111, 80131 Napoli, Italy
The mission of our Institute is to study human inherited diseases. Since its beginning, Tigem's strengths have been in the field of disease gene identification and in the characterization of the molecular defects underlying human genetic diseases. Indeed, a number of disease genes have been identified by Tigem researchers during the last three years: Opitz syndrome (Quaderi et al., Nature Genetics, 1997), hereditary spastic paraplegia (Casari et al., Cell, 1998), lysinuric protein intolerance (Borsani et al., Nature Genetics, 1999), non-type I cystinuria (Feliubadalo et al., Nature Genetics, 1999) and mucolipidosis type IV (Bassi et al., AJHG, 2000). Tigem researchers have also been involved in systematic ("genome-wide") projects with an emphasis on the experimental approach rather than on the biological problem. These projects have taken advantage of core facilities such as Sequencing, Bioinformatics and cDNA Library Screening, among others. One of the most successful projects of this type was the Drosophila Related Expressed Sequences (DRES) project, which led to the identification of many novel human genes homologous to Drosophila mutant genes (Banfi et al., Nature Genetics, 1996). This effort generated several additional, more focused projects on candidate disease genes. Another important systematic gene identification project in our laboratory is the analysis of the distal short arm of the X chromosome (Xp22), a study started more than ten years ago. We believe that a mixture of systematic and focused projects is of great advantage. Systematic approaches allow researchers to benefit from the resources generated by the Human Genome Project. In addition, they generate novel ideas and tools for more "in-depth" projects.
On the other hand, only the latter type of project will provide us with detailed and reliable information on gene function and dysfunction.

The Human Genome, Transcriptome Analysis, Medicine and Cancer
Gert-Jan van Ommen
Leiden University, Leiden, The Netherlands
Our past and future projects in genomics entail YAC and MAC (re)construction and transgenics, FISH development, high-throughput robotics and DNA-chip expression array research, in the context of the Leiden Genome Technology Center (GTC) and in close collaboration with the LUMC Department of Molecular and Cellular Biology. Our cancer genetics program aims to further insights into the cell biology and etiology of common cancers and to improve diagnosis and prevention. Breast, colon and skin cancer are studied at all levels, from clinical-epidemiological and cellular studies to mouse models. Together with the LUMC departments of Pathology, Gastroenterology and Dermatology, we have made widely recognised contributions to the diagnosis, epidemiology and genotype-phenotype correlation of breast cancer (BRCA1 and BRCA2), colorectal cancer (FAP and HNPCC) and melanoma (p16/CDK2). The mouse colorectal cancer genetics program addresses the control of cell adhesion, migration and differentiation by the APC/β-catenin signalling pathway in development and tumorigenesis. Ongoing research: from the study of knockout mice with different levels of truncated APC, and of crosses with repair-gene knockouts, we conclude that the remaining capacity of APC to downregulate β-catenin is a key factor in tumorigenesis. For BRCA1, we have also recently established a mouse model for further mechanistic study. Our current melanoma work shows a strong risk-modifying effect of melanocortin-1 receptor polymorphism, which also affects skin type. Future work in cancer genetics aims at risk- and phenotype-modifying factors, using epidemiological and animal studies.
Acute Promyelocytic Leukemia: A Model for Gene Transcriptional Regulation-based Therapy
Sai-Juan Chen, Qing-Hua Zhang, Zhen-Yi Wang, Zhu Chen
Shanghai Institute of Hematology, Rui Jin Hospital affiliated to Shanghai Second Medical University, 197 Rui Jin Road II, Shanghai 200025, China
Acute promyelocytic leukemia (APL) is characterized by the specific chromosomal translocation t(15;17), which fuses the RARα gene to the PML gene in the great majority of patients, and by the variant translocations t(11;17)(q23;q21), t(5;17) and t(11;17)(q13;q21), which result in the PLZF-RARα, NPM-RARα and NuMA-RARα fusion genes in a small subset of patients. Current data suggest that PML-RARα and the other fusion receptors play a key role in APL pathogenesis by antagonizing retinoic acid (RA) signalling and the regulatory pathways mediated by the fusion partners, as well as by decreasing sensitivity to RA through the receptor's interaction with the nuclear corepressor complex. The leukemogenic effect of the fusion genes was confirmed in transgenic animals by several groups, including our own. APL is the first human cancer shown to respond to the differentiation-inducing effect of all-trans RA (ATRA) and arsenic trioxide. The therapeutic effect of ATRA has been associated with the direct modulation of PML-RARα and its interaction with corepressors, the restoration of the wild-type RAR/RXR regulatory pathway, and the regulation of the transcriptional expression of genes downstream of RAR/RXR. Since transcriptional regulation is the link between APL pathogenesis and ATRA differentiation therapy, analyzing gene expression patterns in APL cells before and after ATRA treatment is a useful approach to identify genes whose functions are involved in this new cancer treatment.
Using the APL cell line NB4 as an in vitro model and three techniques for gene expression scanning, namely cDNA arrays, differential display-PCR and suppression subtractive hybridization, we have identified 169 genes, including eight novel ones, that are modulated by ATRA. A chronologically well-coordinated regulation of these genes seems to constitute a balanced functional network governing decreased cellular proliferation, the initiation and progression of maturation, and the maintenance of cell survival before terminal differentiation. Accordingly, several signalling pathways, such as MAPK, cAMP/PKA, interferon/STAT and AP-1, are implicated in this choreography. Cycloheximide inhibition tests revealed that the transcriptional regulation of 8 induced and 24 repressed genes appeared to be independent of protein synthesis. By comparing the expression pattern of NB4 cells with that of NB4-R1, an ATRA-resistant subclone that can be reinduced to differentiate in the presence of cAMP, we identified a group of genes that are in fact responsive to cAMP. This suggests that ATRA-triggered maturation requires cooperation between the retinoic acid pathway and cytosolic signalling. Recently, we also found that arsenic trioxide can modulate the PML-RARα oncoprotein and induce its degradation. Moreover, arsenic trioxide is able to regulate the transcriptional expression of a number of genes important for the control of cell differentiation and apoptosis. The effect of this drug on cellular gene expression profiles seems to be related to its ability to modulate histone acetylation status. Taken together, our work suggests that ATRA- and arsenic trioxide-induced differentiation of APL cells represents a new model in cancer therapy through targeting of the cellular machinery that regulates gene expression. This concept may be extended to other human cancers with a better understanding of the human genome, transcriptome and proteome.
Transcriptional Regulation of Cell Cycle Regulatory and Apoptosis Genes by DNA Damage Induced by Camptothecin: Microarray Analysis of Dose- and Time-Dependent Effects
Yves Pommier (1), Yi Zhou (1), William C. Reinhold (1), Lance Miller (2), Fuad G. Gwadry (1), Lawrence H. Smith (1), E. Liu (2), Kurt W. Kohn (1), John N. Weinstein (1)*
(1) Laboratory of Molecular Pharmacology, Division of Basic Sciences, Building 37/5D-02, National Cancer Institute (NCI), National Institutes of Health (NIH), Bethesda, Maryland, USA; (2) National Cancer Institute (NCI) Microarray Facility, Advanced Technology Center, Gaithersburg, Maryland, USA
cDNA microarray technology holds the promise of genome-wide gene expression profiling. Even at its current level, this technology allows us to establish associations between characteristic gene expression patterns and molecular responses to therapy. In this study, we used cDNA microarrays of 1,694 cancer-related genes to monitor the gene expression consequences of treating HCT116 human colon cancer cells with the topoisomerase I inhibitor camptothecin (CPT). We did so as a function of time and concentration because otherwise one would be likely to miss or misunderstand major portions of the downstream molecular consequences of the treatment. CPT generates double-strand DNA breaks during DNA replication and delays or arrests cell cycle progression. To obtain a homogeneous cellular response, we used aphidicolin to synchronize the cells in S phase prior to CPT treatment. Treatment with 20 nM CPT caused a reversible (temporary) G2 delay, whereas treatment with 1,000 nM CPT caused an irreversible (permanent and lethal) G2 arrest. Thirty-three genes, divided into three groups, showed characteristic changes as a consequence of treatment.
Group I consisted of mitosis-related genes, including cyclin B1 and centrosome-related genes, which were upregulated during the extended G2/M transition after 20 nM treatment, whereas they decreased during permanent G2 arrest at the high CPT concentration. In contrast, group III comprised p53-activated stress response genes, including p21, 14-3-3, Fas and wip1, which were upregulated after 1,000 nM CPT treatment but remained unchanged in cells treated with 20 nM CPT. Group II genes, many of them involved in cellular metabolism, were downregulated during the cell cycle delay of cells treated with 20 nM CPT. These findings suggest that DNA damage can disrupt cell cycle-regulated gene expression. The gene expression changes identified in this work reveal remarkably coordinated DNA damage responses, apoptosis and cell cycle events at the transcriptional level.

SESSION 8: ETHICAL, LEGAL AND ECONOMICAL ISSUES

Patentability of Life and Ethics
Noëlle Lenoir
Conseil Constitutionnel, Paris, France
The title of my talk illustrates the originality of the European approach to patenting. For Europeans, intellectual property has long been approached not only from a technical but also from a « moral » point of view. The exclusion from patenting of inventions considered « contrary to ordre public and good morality » was already part of the first European treaties on patenting in the 19th century. Today the same approach is expressed, in the more modern form of ethics, in the 1998 European directive on the legal protection of biotechnological inventions, which prohibits, for example, the patenting of inventions related to processes for human cloning or those involving the commercial and industrial use of human embryos.
How can we ensure effective implementation of the very precise prescriptions of the 1998 directive in the context of the rapid evolution of research based on human body parts (genes, proteins, cells, particularly embryonic stem cells)? Another difficulty arises: how to reconcile the traditional principle, in common law as well as civil law, of « non-commercialization of the human body » with systems designed to ensure the profitable exploitation of the results of research on human living matter. Thus, some argue in Europe that the commercialization of genetic sequences by private companies involved in sequencing the human genome would constitute an infringement of the principle of non-commercialization of the human body. Under this assumption, should we consider sharing the profits derived from these research activities or their consequences, if not with individuals, at least with patient organizations? The example of Iceland, which through a specific law has licensed its genetic data to deCODE, should stimulate our thinking. As fundamental as the first two topics is the protection of privacy for persons whose genetic data or biological samples are stored and used in research and industry. This question has not yet been directly addressed, but it must be studied urgently at an international level, i.e. at least through concerted action between the United States, Europe and Japan. Biology and human genetics, as a whole, are without borders. Thus all these questions must be addressed at the international level.

Patenting Genome Research Tools and the Law
Rebecca Eisenberg
University of Michigan Law School, Ann Arbor, MI, USA
Over the past 15 years, a number of legal and commercial developments have converged to make intellectual property issues particularly salient in biomedical research. A series of judicial and administrative decisions has expanded the categories of patentable subject matter in the life sciences.
For many years it appeared that patents on living subject matter would violate the longstanding principle that one may not patent products or phenomena of nature. But in 1980 the US Supreme Court held in the case of Diamond v. Chakrabarty that a living, genetically altered organism may qualify for patent protection as a new manufacture or composition of matter under Section 101 of the US Patent Code. Characterizing Chakrabarty's invention as « a new bacterium with markedly different characteristics from any found in nature » and « not nature's handiwork, but his own », the Court indicated that Congress intended the patent laws to cover « anything under the sun that is made by man ». With this broad directive from the Supreme Court, the US Patent and Trademark Office (PTO) expanded the categories of living subject matter that it considered eligible for patent protection to include plants and animals.
During the same time period, the explosion of commercial interest in the field, and the concomitant emergence of commercial biotechnology companies, have amplified the importance of intellectual property in the biomedical sciences. Many biotechnology firms have found a market niche somewhere between the fundamental research that typifies the work of university and government laboratories and the end product development that occurs in more established commercial firms. To survive financially in this niche, biotechnology firms need intellectual property rights in discoveries that arise considerably upstream from commercial product markets. This creates pressure to patent discoveries that are closer to the work of research scientists than to ultimate consumer products. Another contemporaneous development that has contributed to the prevalence of intellectual property in biomedical research is the passage of the Bayh-Dole Act and the Stevenson-Wydler Act in 1980, and a series of subsequent acts that refine those statutes and expand their reach.
These statutes encourage research institutions to patent discoveries made in the course of government-sponsored research. For some institutions involved in health-related research, this represented a 180° shift in policy. A generation ago, the prevailing wisdom was that the best way to assure full utilization of publicly sponsored research results for the public good was to make them freely available to the public. Today, federal policy reflects the opposite assumption. The current belief is that if research results are made widely available to anyone who wants them, they will languish in government and university archives, unable to generate commercial interest in picking up where the government leaves off and using the results to develop commercial products. To make government-sponsored research discoveries attractive candidates for commercial development, institutions performing the research are encouraged to obtain patents and to offer licenses to the private sector. As a result, institutions that perform fundamental research have an incentive to patent the sorts of early stage discoveries that in an earlier era would have been dedicated to the public domain. A large part of the resulting increase in patenting activity among public sector research institutions has been in the life sciences. Taken together, these factors have created a research environment in which early stage discoveries are increasingly likely to be patented, and access to patented discoveries is increasingly likely to be significant to the ongoing work of research laboratories. I will address how these changes apply specifically to the field of genome research by discussing: (i) patents as a strategy for protection of intellectual property; (ii) the benefits and costs of patents; (iii) requirements for patent protection; and (iv) the significance of the experimental use exemption.
Legal Problems Related to Gene Patents
Joseph Straus
Max-Planck Institute for Foreign and International Patent, Copyright and Competition Law, Munich, Germany
The impact of patents on researchers' work; differences in propensity toward patenting. Key legal issues: eligibility for patent protection (are DNA sequences a discovery or an invention?); patentability requirements and scope of protection (dependency). The TRIPS Agreement as a mandatory international standard, and the EU Directive on the legal protection of biotechnological inventions. From the Erythropoietin (EPO) patent to the US patent on Human Kinase Homologs: a retrospective of the patent-granting practice of the US Patent Office and the European Patent Office. HUGO's position on patenting in genomics (1992-2000) in view of the Blair-Clinton Statement. The current response of the US and EU authorities to patent applications for, inter alia, Expressed Sequence Tags (ESTs) and Single Nucleotide Polymorphisms (SNPs). Effects and scope of protection of gene patents, and the European attempt to ease the dependency problem. Is there a need for further legislative action?
From Functional Genomics to Integrated Economy in Biotechnology
Bernard Pau
Director, CNRS UMR 5094, Institute of Biotechnology and Pharmacology, Montpellier, France
Functional genomics is a key piece in the puzzle of the life sciences and technologies. It emerges as a « golden gate » between genetics (from molecular dissection to a view of genome organisation) and physiology (from dynamics to the specialisation of organs and cells). It is thus no longer possible to distinguish, at each step along this sophisticated path to discovery, between pure knowledge acquisition and dedicated application: fundamental research in the life sciences today is closer than ever to its medical consequences.
Drug discovery and the development of « blockbusters », for instance, depend entirely on the quality of integration that pharmaceutical biotechnology can achieve from basic research to clinical trials. Functional genomics is clearly a key player here, taking a major part in: the exploration of pathologies (e.g., identification of effector genes in invasive infectious agents), pharmacology at the signalling level (e.g., characterization of the metabolic pathways involved in the drug-resistance phenotype), and developments in animal and human toxicology (e.g., toxicogenomics at early stages of clinical trials). This integration also deeply concerns the quality of the partnerships that public research can establish with industry, including small and medium-sized enterprises. It will have a profound impact on their intellectual property strategy and potency, on their economic capacity (in terms of return on investment) and even on their position in the field of knowledge acquisition. France has clearly identified this challenge and has decided to make genomics a national priority in its innovation policy.
Closing Address
Geneviève Berger
Director-General of the CNRS, Paris, France
Geneviève Berger has been appointed Director-General of the CNRS. Geneviève Berger was born in 1955 in Moselle, France. From an early age her education followed an interdisciplinary path. In 1974 she entered the "École Normale Supérieure" of Cachan, where she passed the « agrégation » in the physical sciences, the highest competitive examination for teachers in France. While there, she developed a deep interest in applied physics. She went on to complete her studies by writing a doctoral thesis in the physical sciences and earning a doctorate in medicine. She also completed a specialization in nuclear medicine and the application of radioelements to medicine and biology.
In addition, she wrote a doctoral thesis in human biology and was given the authorization to supervise research. Strongly committed to a multidisciplinary approach, in 1991 Geneviève Berger founded an interdisciplinary laboratory, the Parametric Imaging Laboratory. It was set up at the « Broussais Hôtel-Dieu » teaching hospital (Paris), where she was appointed professor and interim head of the department of biophysics and nuclear medicine. This laboratory conducts research on medical imaging and on the medical and biological applications of ultrasound. Geneviève Berger gave this laboratory a strong orientation towards technology transfer as an extension of basic research. Her first success was the design and international marketing of the world's first ultrasound bone imaging device, used mainly for osteoporosis. The laboratory has also investigated the use of ultrasound for very high resolution, non-destructive visualization and characterization of ocular structures, cartilage, and even arteries. Geneviève Berger has been invited to discuss her work in a number of countries, particularly the United States. She received the CNRS Silver Medal for the Life Sciences in 1994 and the Yves Rocard Prize from the French Physics Society in 1997. In 1995, she was elected president of the « Treatment and Drugs: Design and Resources » section of the National Committee for Scientific Research. She left this position in 1998 to head the Bio-Engineering Department of the Directorate for Technology of the Ministry of National Education, Research and Technology. In 1998, she was named Chevalier of the "Palmes académiques" (a decoration for service to education in France) and Chevalier of the Legion of Honor. She was appointed Director of Technology at the Ministry of Research in January 2000.
ABSTRACT AUTHORS INDEX
Auffray, C.
Bairoch, A.
Bassingthwaighte, J.B.
Berger, G.
Bonaldo, M.F.
Borsani, G.
Brazma, A.
Bucher, P.
Bumgarner, R.
Celis, J.
Chen, Z.
Clifton, S.
Eickhoff, H.
Eisenberg, R.
Folta, P.
Gaasterland, T.
Gerstein, M.
Gros, F.
Hide, W.
Hoffman, E.
Horn-Saban, S.
Imbeaud, S.
Joyard, J.
Kato, K.
Kawai, J.
Lancet, D.
Lenoir, N.
Lundeberg, J.
Mallet, J.
Margolin, J.
Mayor, F.
Metspalu, A.
Nguyen, C.
Ohara, O.
Pau, B.
Pollet, N.
Pommier, Y.
Poustka, A.
Quackenbush, J.
Saurin, W.
Schneider, C.
Simpson, A.
Smith, T.
Stodolsky, M.
Straus, J.
Strausberg, R.
Strominger, J.
Sugano, S.
van Ommen, G.J.
Wagner, L.
Wiemann, S.