Agenda - Osthus GmbH
LIDER Roadmapping Workshop on: Content Analytics and Linked Data in Healthcare and Medicine
Munich, July 13th, 2015

Location of the Workshop:
Siemens AG
CT RTC BAM KMR
Otto-Hahn-Ring 6
81739 München
Deutschland
Room: Geb. 31 - Raum 31.280
Directions: https://www.realestate.siemens.com/hq/downloads/muenchen_perlach_de.pdf

Organizers
Philipp Cimiano (Bielefeld University)
Roman Klinger (Bielefeld University, Stuttgart University)
Ulli Waltinger (Siemens AG)

Confirmed Participants (in alphabetical order)
Markus Bundschuss (Roche Deutschland Holding GmbH)
Johannes Forster (IBM)
Juliane Fluck (Fraunhofer-Institut für Algorithmen und Wissenschaftliches Rechnen)
Stefan Geißler (TEMIS Deutschland GmbH)
Andis Lagzdiņš (TILDE)
Bettina Klimek (Universität Leipzig)
Stefano Marmonti (MarkLogic)
Hans Werner Müller (CNR Center for Neuronal Regeneration e.V. und HHU Düsseldorf)
Heiner Oberkampf (OSTHUS GmbH)
Karsten Quast (Boehringer Ingelheim Pharma GmbH & Co. KG)
Dietrich Rebholz-Schuhmann (University of Zurich)
Jasmin Saric (Boehringer Ingelheim Pharma GmbH & Co. KG)
Stephan Schindewolf (SAP AG)
Hinrich Schütze (Ludwig-Maximilians-Universität München)
Thorsten Schoedl (Roche Deutschland Holding GmbH)
Martin Sedlmayr (Friedrich-Alexander-Universität Erlangen-Nürnberg)
Daniel Sickert (Siemens Healthcare GmbH)
Daniel Sonntag (Deutsches Forschungszentrum für künstliche Intelligenz GmbH)
Luca Toldo (Merck)
Volker Tresp (Siemens AG, Ludwig-Maximilians-Universität München)

Workshop Program

10:00 – 11:30 Content Analytics in pre-clinical contexts and in support of translational medicine
Heiner Oberkampf: "Semantics for Integrated Laboratory Analytical Processes – the Allotrope Perspective"
Hans Werner Müller: "Preclinical decision support system to enhance the chances of success for experimental therapies in clinical translation"
Jasmin Saric: "Semantic Data Integration supporting translational medicine"
Dietrich Rebholz-Schuhmann: "Phenotype resources for medical use: translational and personalized medicine in action"
Karsten Quast: tba

11:30 – 13:00 Content Analytics in Healthcare
Daniel Sickert: "Learning from the clinical routine: how to retrieve the data treasure?"
Martin Sedlmayr: "On Secondary Use of Clinical Data"
Juliane Fluck: "Common challenges for information retrieval and extraction in scientific publications, patents or EHRs"
Volker Tresp: "Machine Learning with Healthcare Knowledge Graphs"

13:00 – 14:00 Lunch

14:00 – 15:00 Content Analytics in Healthcare (2)
Andis Lagzdiņš: "Multilingual solutions for content analytics in healthcare"
Daniel Sonntag: "Medical Cyber Physical Systems"
MarkLogic: "Creating profiles to support fraud detection"
MarkLogic: "Linking data across disparate data sources"

15:00 – 15:30 Coffee break

15:30 – 16:30 Text Mining and Data Mining Technologies
Luca Toldo: "Social media in Pharmacovigilance"
Markus Bundschuss: "Text Mining at Roche"
Hinrich Schütze: "Text representations for text mining"
Stefan Geißler: "Semi-automatic support for deriving structured vocabularies"

16:30 – 17:00
Closing discussion and final remarks

Abstracts

Common challenges for information retrieval and extraction in scientific publications, patents or EHRs
Juliane Fluck

Most of the knowledge in medicine and biology is not yet electronically available in a structured form; it resides mainly in scientific publications and other textual communications. To make it available for data analytics, several challenges have to be met - some directly connected to the field of text mining, others more global. The first challenge concerns data access, ranging from license restrictions on full-text articles to data protection in the area of personal medical data. The next hurdle is the data format: text visualisation formats such as HTML with embedded JavaScript and varying character encodings, Word documents, and PDF documents are common and cause difficulties for automatic extraction systems. Hence, the standardisation of text exchange formats would considerably improve extraction quality. Terminologies, hierarchies and ontologies are important for all areas of semantic integration and data analysis; however, they are often not available in the granularity needed for the application area. Another challenge is the complexity of biomedical relationship extraction, where support for further method development is necessary. Common bottlenecks for such improvements are missing or low-quality training data and inadequate evaluation standards and environments. Other important aspects are stable processing environments with flexible workflows for large-scale data processing. Finally, it is most challenging to satisfy our end users: the trade-off between simple and complex user interfaces has to be considered, and the usability and efficiency of the developed tools have to be taken into account.
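One of the format hurdles named above - HTML with embedded scripts - can be made concrete with a minimal sketch. This toy example uses Python's standard `html.parser` to strip markup and script content; it is an illustration only, and a real extraction pipeline would need far more robust handling of encodings, entities, and malformed markup:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text nodes, skipping <script> and <style> content."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # >0 while inside script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())

def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)

doc = ("<html><head><script>var x=1;</script></head>"
       "<body><h1>Title</h1><p>Body text.</p></body></html>")
print(html_to_text(doc))  # the script body is dropped, only visible text remains
```

The point of the sketch is that even this trivial case needs stateful parsing; Word and PDF inputs are harder still, which is why the abstract argues for standardised text exchange formats.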
Preclinical decision support system to enhance the chances of success for experimental therapies in clinical translation
Hans Werner Müller

Clinical translation in neurology faces a strong disproportion between the immense preclinical research effort and the lack of successful clinical therapies in disorders such as central nervous system (CNS) trauma, stroke or neurodegenerative diseases (Alzheimer's disease, Parkinson's disease). Focusing on brain and spinal cord trauma, thus far no clinical trial based on innovative preclinical therapeutic approaches has yielded functional recovery in human patients [1]. Instead, injury of the adult mammalian CNS results in lasting deficits such as permanent motor and sensory impairments due to the lack of profound neuronal regeneration. In particular, spinal cord injured patients (Fig. 1) remain paralyzed for the rest of their lives and often suffer from additional complications.

Fig. 1: MRI of a cervical spinal cord injury (SCI). Fig. 2: Development of the number of publications for central nervous system injury from the year 1950 to the year 2013, as retrieved from the database PubMed.

Fortunately, basic and preclinical research in the field of central nervous system trauma is advancing at a fast pace and yields over 8,000 new publications per year, growing at an exponential rate (Fig. 2), for a total of approximately 150,000 PubMed-listed papers today. The vast amount of newly published information exceeds by far the capacity of the individual scientist to absorb the relevant knowledge [2]. Therefore, the knowledge that underlies decision making in selecting the most promising therapeutic interventions for further research or clinical trials is notoriously incomplete and, consequently, valuable resources are wasted on numerous fruitless or redundant experiments [3, 4]. Until now, there is no objective measure or decision support system for translating an SCI therapy from the animal to the human patient.
In order to design promising clinical trials, two major requirements must be met:
- Firstly, an evidence-based grading approach for ranking preclinical therapies, to select the most promising candidates for clinical trials, is of utmost importance.
- Secondly, meta-studies are needed to combine results from different studies in order to obtain novel knowledge and to detect relations between experimental conditions and potential weaknesses of a therapy or an experimental model.

Meta-analyses on gene or protein expression are common, but meta-studies regarding outcomes of preclinical animal studies are rarely carried out. Thus there is an urgent need for a comprehensive knowledge base that aggregates the current state of research in traumatic brain (TBI, >100,000 papers) and spinal cord (>40,000 papers) injury and repair. This would require automatic information extraction for big textual data analytics. The information extraction approach should extract the relevant facts about experimental therapies from all relevant peer-reviewed scientific publications currently available. The facts should be collected in a database capable of aggregating and filtering them in order to objectively grade the prospective translational success of a particular therapeutic approach. This would certainly speed up the translation of the most promising innovative therapies into clinical application for the benefit of patients and, in addition, reduce the enormous socio-economic costs by saving personnel and financial resources.

References
[1] Linard Filli and Martin E. Schwab. The rocky road to translation in spinal cord repair. Ann Neurol, 72(4):491–501, 2012.
[2] Corie Lok. Literature mining: Speed reading. Nature, 463(7280):416–418, Jan 2010.
[3] Florian Prinz, Thomas Schlange, and Khusru Asadullah. Believe it or not: how much can we rely on published data on potential drug targets? Nat Rev Drug Discov, 10(9):712, Sep 2011.
[4] Oswald Steward, Phillip G. Popovich, W.
Dalton Dietrich, and Naomi Kleitman. Replication and reproducibility in spinal cord injury research. Exp Neurol, 233(2):597–605, Feb 2012.

Semantics for Integrated Laboratory Analytical Processes – the Allotrope Perspective
Heiner Oberkampf

The software environment currently found in the analytical community consists of a patchwork of incompatible software and proprietary, non-standardized file formats, further complicated by incomplete, inconsistent and potentially inaccurate metadata. To overcome these issues, the Allotrope Foundation is developing a comprehensive and innovative framework consisting of metadata dictionaries, data standards, and class libraries for managing analytical data throughout its lifecycle. The talk describes how laboratory data and their semantic metadata descriptions are brought together to ease the management of the vast amounts of data that underpin almost every aspect of drug discovery and development.

Machine Learning with Healthcare Knowledge Graphs
Volker Tresp

A number of successful graph-based knowledge representations, such as DBpedia, YAGO, or the Google Knowledge Graph, have recently been developed and are the basis of applications ranging from the support of search to the realization of question answering systems. Statistical machine learning can play an important role in knowledge graphs as well. By exploiting statistical relational patterns one can predict the likelihood of new facts, find entity clusters, and determine whether two entities refer to the same real-world object. Furthermore, one can analyze new entities and map them to existing entities (recognition) and predict likely relations for the new entity. These learning tasks can elegantly be approached by first transforming the knowledge graph into a 3-way tensor where two of the modes represent the entities in the domain and the third mode represents the relation type. Generalization is achieved by tensor factorization using, e.g., the RESCAL approach.
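The tensor formulation above can be sketched in a few lines of numpy. In the RESCAL model, each relation slice of the tensor is approximated as X_k ≈ A R_k Aᵀ, with a shared entity-embedding matrix A and one core matrix R_k per relation type. The toy graph, the rank, and the plain gradient-descent fit below are illustrative assumptions of this sketch, not the implementation used in the project:

```python
import numpy as np

# Toy knowledge graph: 4 entities, 2 relation types, as a 3-way tensor.
# X[k][i, j] = 1 if relation k holds between entity i and entity j.
n, m, rank = 4, 2, 2  # entities, relation types, latent rank
X = np.zeros((m, n, n))
X[0, 0, 1] = X[0, 1, 2] = 1.0  # relation 0, e.g. a hypothetical "treats"
X[1, 0, 3] = X[1, 2, 3] = 1.0  # relation 1, e.g. a hypothetical "locatedIn"

rng = np.random.default_rng(0)
A = rng.standard_normal((n, rank)) * 0.1     # shared entity embeddings
R = rng.standard_normal((m, rank, rank)) * 0.1  # one core matrix per relation

# Minimize 0.5*sum_k ||A R_k A^T - X_k||^2 + 0.5*lam*(||A||^2 + sum_k ||R_k||^2)
# by plain gradient descent (RESCAL proper uses an efficient ALS scheme).
lr, lam = 0.05, 0.01
for _ in range(3000):
    gA = lam * A
    gR = lam * R
    for k in range(m):
        E = A @ R[k] @ A.T - X[k]            # reconstruction error, slice k
        gA += E @ A @ R[k].T + E.T @ A @ R[k]
        gR[k] += A.T @ E @ A
    A -= lr * gA
    R -= lr * gR

# Reconstructed scores: high for observed facts, low for absent triples.
scores = np.stack([A @ R[k] @ A.T for k in range(m)])
print("known fact score:", float(scores[0, 0, 1]))
print("absent triple score:", float(scores[0, 2, 0]))
```

Because A is shared across all relation slices, evidence about an entity in one relation shapes its embedding everywhere, which is exactly the collective-learning effect the abstract refers to.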
A particular feature of RESCAL is that it exhibits collective learning, where information can propagate in the knowledge graph to support a learning task. I will present applications of our approach in our BMWi-funded project on Clinical Data Intelligence, where knowledge graphs integrate patient data and background knowledge from different sources and multi-way tensors are the basis for statistical inference and decision support.

Text representations for text mining
Hinrich Schütze

If we define text mining as the finding of patterns in large text collections, then the representation of the text is key: many important patterns can only be detected if the text is represented in a form that is conducive to generalization and reduces sparseness. I will review recent work in my group in two areas that can contribute to text representations that support text mining: morphological analysis and deep learning embeddings.

Semi-automatic support for deriving structured vocabularies
Stefan Geißler

Real-world use of text analytics software in industry settings relies to a large extent on the availability of structured vocabularies (thesauri, ontologies) that describe the domain of interest. These resources require constant maintenance, extension and updating, always under time pressure, and most of the time the respective work is done by domain experts who are typically not experts in ontology learning or text mining. We outline an environment, the Luxid WebStudio, that supports domain experts with some of the required functionality, and also outline where going beyond this would be a much welcomed step forward and would represent a fine area of research and development cooperation for experts in active learning, ontology acquisition, GUI design and related fields.

Medical Cyber Physical Systems
Daniel Sonntag

We explain the background of medical cyber-physical systems (MCPS) and discuss some recent promising directions.
MCPS are context-aware, networked systems of medical sensor and actuation devices. Human-in-the-loop MCPS are of particular interest: active input modes include digital pens, smartphones, and automatic handwriting recognition for a direct digitalisation of patient data.

On Secondary Use of Clinical Data
Martin Sedlmayr

Healthcare collects huge amounts of data during direct patient care. In parallel, big efforts are undertaken to learn from these data in clinical studies in order to determine effective treatments, discover new drugs or improve patient safety. Recent trends of "secondary use", e.g. in the context of translational research as well as personalized medicine, greatly increase the need to efficiently turn raw data into valuable medical knowledge. In order to unlock the value of collected data, we require two things: sufficient quantities of semantically rich data, and knowledge about the data in order to understand the findings and act upon them. Generating semantically rich data by annotation with standardized terminologies is, as of now, a very tedious task. Although there is no shortage of medical terminologies of varying quality, quantity and overlap, their practical use has not been widely realized. Free availability, translation into local languages, and usability-driven integration issues have to be solved in order to embed such concepts into the routine. Collecting sufficient quantities of high-quality data is also not an easy task. Not only in the case of rare diseases, many clinical studies suffer from a lack of study participants. Pharmaceutical companies lose millions of euros per day due to late recruitment. Sharing or publishing valuable data, however, is hindered by a diversified landscape of privacy regulations and interpretations at local, national, European and international levels.
The presentation will concentrate on aspects of secondary use of clinical data in the light of linked and open data, discussing immediate requirements as well as looking at international projects as examples of first steps.

Phenotype resources for medical use: translational and personalized medicine in action
Dietrich Rebholz-Schuhmann

Capturing single phenotype traits or the full phenotype description is a complex task due to the large number of traits that form the phenotype, and due to the different types of qualities linked to individual phenotypes (e.g., lack of an organ, insufficient function, increase/decrease of a physiological parameter). For human, mouse and other model organisms, specific resources have been produced to capture the description of a phenotype. This talk will focus on the use of public phenotype resources for biomedical data analytics.