Agenda
LIDER Roadmapping Workshop on Content Analytics and Linked Data in Healthcare and Medicine
Munich, July 13th, 2015
Location of the Workshop:
Siemens AG
CT RTC BAM KMR
Otto-Hahn-Ring 6
81739 München
Germany
Room: Building 31, Room 31.280
Directions:
https://www.realestate.siemens.com/hq/downloads/muenchen_perlach_de.pdf
Organizers
Philipp Cimiano (Bielefeld University)
Roman Klinger (Bielefeld University, Stuttgart University)
Ulli Waltinger (Siemens AG)
Confirmed Participants (in alphabetical order)
Markus Bundschuss (Roche Deutschland Holding GmbH)
Johannes Forster (IBM)
Juliane Fluck (Fraunhofer-Institut für Algorithmen und Wissenschaftliches Rechnen)
Stefan Geißler (TEMIS Deutschland GmbH)
Andis Lagzdiņš (TILDE)
Bettina Klimek (Universität Leipzig)
Stefano Marmonti (MarkLogic)
Hans Werner Müller (CNR Center for Neuronal Regeneration e.V. and HHU Düsseldorf)
Heiner Oberkampf (OSTHUS GmbH)
Karsten Quast (Boehringer Ingelheim Pharma GmbH & Co. KG)
Dietrich Rebholz-Schuhmann (University of Zurich)
Jasmin Saric (Boehringer Ingelheim Pharma GmbH & Co. KG)
Stephan Schindewolf (SAP AG)
Hinrich Schütze (Ludwig-Maximilians-Universität München)
Thorsten Schoedl (Roche Deutschland Holding GmbH)
Martin Sedlmayr (Friedrich-Alexander-Universität Erlangen-Nürnberg)
Daniel Sickert (Siemens Healthcare GmbH)
Daniel Sonntag (Deutsches Forschungszentrum für künstliche Intelligenz GmbH)
Luca Toldo (Merck)
Volker Tresp (Siemens AG, Ludwig-Maximilians-Universität München)
Workshop Program
10:00 – 11:30 Content Analytics in pre-clinical contexts and in support of
translational medicine
Heiner Oberkampf: "Semantics for Integrated Laboratory Analytical Processes – the
Allotrope Perspective"
Hans Werner Müller: "Preclinical decision support system to enhance the chances of
success for experimental therapies in clinical translation"
Jasmin Saric: "Semantic Data Integration supporting translational medicine"
Dietrich Rebholz-Schuhmann: "Phenotype resources for medical use: translational
and personalized medicine in action"
Karsten Quast: TBA
11:30 – 13:00 Content Analytics in Healthcare
Daniel Sickert: "Learning from the clinical routine: how to retrieve the data treasure?"
Martin Sedlmayr: "On Secondary Use of Clinical Data"
Juliane Fluck: "Common challenges for information retrieval and extraction in scientific
publications, patents or EHRs"
Volker Tresp: "Machine Learning with Healthcare Knowledge Graphs"
13:00 – 14:00 Lunch
14:00 – 15:00 Content Analytics in Healthcare (2)
Andis Lagzdiņš: "Multilingual solutions for content analytics in healthcare"
Daniel Sonntag: "Medical Cyber Physical Systems"
MarkLogic: "Creating profiles to support fraud detection"
MarkLogic: "Linking data across disparate data sources"
15:00 – 15:30 Coffee break
15:30 – 16:30 Text Mining and Data Mining Technologies
Luca Toldo: "Social media in Pharmacovigilance"
Markus Bundschuss: "Text Mining at Roche"
Hinrich Schütze: "Text representations for text mining"
Stefan Geißler: "Semi-automatic support for deriving structured vocabularies"
16:30 – 17:00 Closing discussion and final remarks
Abstracts
Common challenges for information retrieval and extraction in scientific
publications, patents or EHRs
Juliane Fluck
Most of the knowledge in medicine and biology is not yet electronically available in a
structured form but mainly in scientific publications or other textual communications. To
make it available for data analytics, several challenges have to be met – some are
directly connected to the field of text mining, others are more global.
The first challenge concerns data access, ranging from license restrictions for full-text
articles to data protection in the area of personal medical data. The next hurdle is the
data format: text visualisation formats such as HTML with embedded JavaScript and
varying encodings, Word documents, and PDF documents are common and cause
difficulties for automatic extraction systems. Hence, standardising text exchange
formats would considerably improve extraction quality. Terminologies, hierarchies and
ontologies are important for all areas of semantic integration and data analysis;
however, they are often not available in the granularity needed for the application area.
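To make the format hurdle concrete, here is a minimal, purely illustrative Python
sketch (not a tool from the talk) that strips markup and embedded scripts from an
HTML fragment before any text mining can start; real pipelines additionally have to
handle PDFs, Word files, and inconsistent encodings:

    from html.parser import HTMLParser

    # Minimal sketch: even well-formed HTML needs explicit handling of scripts
    # and markup before text mining; PDFs and Word files are harder still.
    class TextExtractor(HTMLParser):
        def __init__(self):
            super().__init__()
            self.parts, self._skip = [], False

        def handle_starttag(self, tag, attrs):
            if tag in ("script", "style"):
                self._skip = True

        def handle_endtag(self, tag):
            if tag in ("script", "style"):
                self._skip = False

        def handle_data(self, data):
            if not self._skip and data.strip():
                self.parts.append(data.strip())

    extractor = TextExtractor()
    extractor.feed("<p>EGFR mutations predict response.</p><script>track();</script>")
    print(" ".join(extractor.parts))  # -> EGFR mutations predict response.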
Another challenge is the complexity of biomedical relationship extraction, which
requires support for further method development. Common bottlenecks for such
improvements are missing or low-quality training data and inadequate evaluation
standards and environments.
Other important aspects are stable processing environments with flexible workflows for
large-scale data processing. Finally, it is most challenging to satisfy our end users:
the trade-off between simple and complex user interfaces has to be considered, and the
usability and efficiency of the developed tools have to be taken into account.
Preclinical decision support system to enhance the chances of success for
experimental therapies in clinical translation
Hans Werner Müller
Clinical translation in neurology faces a strong disproportion between the immense
preclinical research effort and the lack of successful clinical therapies in disorders
such as central nervous system (CNS) trauma, stroke, or neurodegenerative diseases
(e.g., Alzheimer's and Parkinson's disease).
Focusing on brain and spinal cord trauma, thus far no clinical trial based on innovative
preclinical therapeutic approaches has yielded functional recovery in human patients [1].
Instead, injury of the adult CNS of mammals results in lasting deficits like permanent
motor and sensory impairments due to lack of profound neuronal regeneration.
Especially, spinal cord injured patients (Fig. 1) remain paralyzed for the rest of their
lives and often suffer from additional complications.
Fig. 1: MRI of a cervical spinal cord injury (SCI).
Fig. 2: Development of the number of publications on central nervous system injury
from 1950 to 2013, as retrieved from the PubMed database.
Fortunately, basic and preclinical research in the field of central nervous system trauma
is advancing at a fast pace and yields over 8,000 new publications per year, growing at
an exponential rate (Fig. 2), for a total of approximately 150,000 PubMed-listed papers
today.
The vast amount of newly published information exceeds by far the capacity of the
individual scientist to absorb the relevant knowledge [2]. Therefore, the knowledge that
underlies decision making in selecting the most promising therapeutic interventions for
further research or clinical trials is notoriously incomplete and, consequently, valuable
resources are wasted on numerous fruitless or redundant
experiments [3, 4].
To date, there is no objective measure or decision-support system for translating an
SCI therapy from animal models to human patients. In order to design promising clinical
trials, two major requirements must be met:
- Firstly, an evidence-based grading approach for ranking preclinical therapies, to
select the most promising candidates for clinical trials, is of utmost importance.
- Secondly, meta-studies are needed to combine results from different studies in order
to obtain novel knowledge and to detect relations between experimental conditions and
potential weaknesses of a therapy or an experimental model. Meta-analyses of gene or
protein expression are common, but meta-studies regarding outcomes of preclinical
animal studies are rarely carried out.
Thus there is an urgent need for a comprehensive knowledge base which aggregates
the current state of research in traumatic brain (TBI, >100,000 papers) and spinal cord
(>40,000 papers) injury and repair. This would require automatic information extraction
for big textual data analytics. The information extraction approach should extract the
relevant facts about experimental therapies from all relevant peer-reviewed scientific
publications currently available. The facts should be collected in a database capable
of aggregating and filtering them in order to objectively grade the prospective
translational success of a particular therapeutic approach.
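As a purely illustrative sketch of this aggregate-and-grade idea (all field names,
therapy labels, and the weighting scheme below are hypothetical, not taken from the
talk), extracted facts could be stored as structured records and ranked by a simple
evidence score:

    from dataclasses import dataclass
    from statistics import mean

    # Hypothetical record type for one extracted preclinical finding.
    @dataclass
    class Finding:
        therapy: str
        animal_model: str
        effect_size: float  # standardized improvement versus control
        n_animals: int
        replicated: bool    # independently reproduced elsewhere?

    findings = [
        Finding("therapy_A", "rat_contusion", 0.8, 24, True),
        Finding("therapy_A", "mouse_hemisection", 0.5, 12, True),
        Finding("therapy_B", "rat_contusion", 1.2, 6, False),
    ]

    def grade(records):
        """Toy grade: mean effect, weighted by study size and replication."""
        total_n = sum(r.n_animals for r in records)
        replication_bonus = 1.5 if any(r.replicated for r in records) else 1.0
        return mean(r.effect_size for r in records) * replication_bonus * total_n ** 0.5

    by_therapy = {}
    for record in findings:
        by_therapy.setdefault(record.therapy, []).append(record)
    for therapy in sorted(by_therapy, key=lambda t: -grade(by_therapy[t])):
        print(f"{therapy}: grade = {grade(by_therapy[therapy]):.2f}")

A real grading scheme would of course have to weigh study quality, blinding, and
outcome measures in an evidence-based way; the point here is only the aggregation step.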
Such a system would certainly speed up the translation of the most promising innovative
therapies into clinical application for the benefit of patients and, in addition, reduce
the enormous socio-economic costs by saving personnel and financial resources.
References
[1] Linard Filli and Martin E. Schwab. The rocky road to translation in spinal cord repair.
Ann Neurol, 72(4):491–501, 2012.
[2] Corie Lok. Literature mining: Speed reading. Nature, 463(7280):416–418, Jan 2010.
[3] Florian Prinz, Thomas Schlange, and Khusru Asadullah. Believe it or not: how much
can we rely on published data on potential drug targets? Nat Rev Drug Discov,
10(9):712, Sep 2011.
[4] Oswald Steward, Phillip G. Popovich, W. Dalton Dietrich, and Naomi Kleitman.
Replication and reproducibility in spinal cord injury research. Exp Neurol, 233(2):597–
605, Feb 2012.
Semantics for Integrated Laboratory Analytical Processes – the Allotrope
Perspective
Heiner Oberkampf
The software environment currently found in the analytical community consists of a
patchwork of incompatible software and proprietary, non-standardized file formats,
further complicated by incomplete, inconsistent, and potentially inaccurate
metadata. To overcome these issues, the Allotrope Foundation is developing a
comprehensive and innovative Framework consisting of metadata dictionaries, data
standards, and class libraries for managing analytical data throughout its lifecycle. The
talk describes how laboratory data and their semantic metadata descriptions are
brought together to ease the management of the vast amounts of data that underpin
almost every aspect of drug discovery and development.
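As a rough illustration of pairing laboratory data with semantic metadata (the
namespace and property names below are invented for this sketch; the actual Allotrope
ontologies define their own URIs and a much richer model), one could describe a
measurement as RDF using the rdflib library:

    from rdflib import Graph, Literal, Namespace, RDF
    from rdflib.namespace import XSD

    # Hypothetical vocabulary; the real Allotrope Framework ships its own
    # metadata dictionaries and data standards.
    LAB = Namespace("http://example.org/lab#")

    g = Graph()
    run = LAB["hplc_run_42"]
    g.add((run, RDF.type, LAB.ChromatographyMeasurement))
    g.add((run, LAB.instrument, LAB["hplc_unit_3"]))
    g.add((run, LAB.analyte, Literal("ibuprofen")))
    g.add((run, LAB.retentionTimeMinutes, Literal(4.7, datatype=XSD.double)))
    g.add((run, LAB.rawDataFile, Literal("run42.raw")))  # pointer to binary payload

    print(g.serialize(format="turtle"))

Queries over such metadata can then locate and reuse raw data across instruments and
vendors without parsing each proprietary file format.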
Machine Learning with Healthcare Knowledge Graphs
Volker Tresp
A number of successful graph-based knowledge representations, such as DBpedia,
YAGO, or the Google Knowledge Graph, have recently been developed and are the
basis of applications ranging from the support of search to the realization of question
answering systems. Statistical machine learning can play an important role in
knowledge graphs as well. By exploiting statistical relational patterns one can predict
the likelihood of new facts, find entity clusters and determine if two entities refer to the
same real world object. Furthermore, one can analyze new entities and map them to
existing entities (recognition) and predict likely relations for the new entity. These
learning tasks can elegantly be approached by first transforming the knowledge graph
into a 3-way tensor where two of the modes represent the entities in the domain and the
third mode represents the relation type. Generalization is achieved by tensor
factorization using, e.g., the RESCAL approach. A particular feature of RESCAL is that
it exhibits collective learning, where information can propagate in the knowledge graph
to support a learning task. I will present applications of our approach in our
BMWi-funded project on Clinical Data Intelligence, where knowledge graphs integrate
patient data and background knowledge from different sources and multi-way tensors are
the basis for statistical inference and decision support.
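For readers unfamiliar with the tensor view, the following is a minimal sketch (toy
data, not the project's code) of how a knowledge graph becomes a 3-way tensor and how
a RESCAL-style factorization, fitted with alternating least-squares updates, scores an
unseen triple:

    import numpy as np

    # Toy knowledge graph: 4 entities, 2 relation types.
    # X[k, i, j] = 1 encodes the triple (entity_i, relation_k, entity_j).
    entities = ["patient_1", "patient_2", "diabetes", "metformin"]
    X = np.zeros((2, 4, 4))
    X[0, 0, 2] = X[0, 1, 2] = 1.0  # relation 0: has_diagnosis
    X[1, 0, 3] = 1.0               # relation 1: receives_drug

    n, r, lam = 4, 2, 0.1          # entities, latent rank, regularization
    rng = np.random.default_rng(0)
    A = rng.standard_normal((n, r))      # shared entity embeddings
    R = rng.standard_normal((2, r, r))   # one core matrix per relation

    for _ in range(50):  # alternating least squares: X[k] ~ A @ R[k] @ A.T
        Z = np.kron(A, A)                # ridge regression for each vec(R[k])
        G = Z.T @ Z + lam * np.eye(r * r)
        for k in range(2):
            R[k] = np.linalg.solve(G, Z.T @ X[k].reshape(-1)).reshape(r, r)
        num = sum(X[k] @ A @ R[k].T + X[k].T @ A @ R[k] for k in range(2))
        AtA = A.T @ A
        den = sum(R[k] @ AtA @ R[k].T + R[k].T @ AtA @ R[k] for k in range(2))
        A = num @ np.linalg.inv(den + lam * np.eye(r))

    # The unseen triple (patient_2, receives_drug, metformin) can receive a
    # nonzero score because both patients share the same embedding space.
    i, j = entities.index("patient_2"), entities.index("metformin")
    print((A @ R[1] @ A.T)[i, j])

The factorization generalizes across entities because all relations share one embedding
matrix, which is exactly the collective-learning effect mentioned above.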
Text representations for text mining
Hinrich Schütze
If we define text mining as the finding of patterns in large text collections, then the
representation of the text is key: many important patterns can only be detected if the
text is represented in a form that is conducive to generalization and reduces
sparseness. I will review recent work in my group in two areas that can contribute to text
representations that support text mining: morphological analysis and deep learning
embeddings.
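As a minimal illustration of the sparseness point (an ad-hoc example, not the group's
actual models), representing words by character n-grams lets morphologically related
surface forms share features instead of remaining atomic, unrelated symbols:

    # Subword features: morphologically related forms overlap heavily,
    # so statistics gathered for one form generalize to the other.
    def char_ngrams(word, n=3):
        padded = f"<{word}>"
        return {padded[i:i + n] for i in range(len(padded) - n + 1)}

    a = char_ngrams("unbehandelt")  # German: "untreated"
    b = char_ngrams("behandelt")    # German: "treated"
    print(sorted(a & b))            # shared subword features
    print(len(a & b) / len(a | b))  # Jaccard similarity ~ 0.67

Dense embeddings achieve a similar effect by mapping related forms to nearby vectors.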
Semi-automatic support for deriving structured vocabularies
Stefan Geißler
Real-world use of text analytics software in industry settings relies to a large extent
on the availability of structured vocabularies (thesauri, ontologies) that describe the
domain of interest. These resources require constant maintenance, extension, and
updating, usually under time pressure, and most of the time this work is done by
domain experts who are typically not experts in ontology learning or text mining.
We outline an environment, the Luxid WebStudio, that supports domain experts with
some of the required functionality, and we also outline where going beyond this would
be a much-welcomed step forward and a fine area of research and development
cooperation for experts in active learning, ontology acquisition, GUI design, and
related fields.
Medical Cyber Physical Systems
Daniel Sonntag
We explain the background of medical cyber-physical systems (MCPS) and discuss
some recent promising directions. MCPS are context-aware, networked systems of
medical sensor and actuation devices. Human-in-the-loop MCPS are of particular
interest: active input modes include digital pens, smartphones, and automatic
handwriting recognition for a direct digitalisation of patient data.
On Secondary Use of Clinical Data
Martin Sedlmayr
Healthcare collects huge amounts of data during direct patient care. In parallel, major
efforts are undertaken to learn from these data in clinical studies in order to determine
effective treatments, discover new drugs, or improve patient safety. Recent trends
towards "secondary use", e.g. in the context of translational research and personalized
medicine, greatly increase the need to efficiently turn raw data into valuable medical
knowledge. In order to unlock the value of collected data, we require two things:
sufficient quantities of semantically rich data and knowledge about the data in order to
understand the findings and place actions upon them. Generating semantically rich
data by annotation with standardized terminologies is, as of now, a very tedious task.
Although there is no shortage of medical terminologies of all kinds of quality, quantity,
and overlap, their practical use has not been widely realized. Free availability,
translation into local languages, and usability-driven integration issues have to be
solved in order to embed such concepts into the routine. Collecting sufficient quantities
of high-quality data is also not an easy task: many clinical trials suffer from a lack
of study participants, not only in the case of rare diseases, and pharmaceutical
companies lose millions of euros per day due to late recruitment. Sharing or publishing
valuable data, however, is hindered by a diversified landscape of privacy regulations
and interpretations at the local, national, European, and international levels. The
presentation will concentrate on aspects of the secondary use of clinical data in the
light of linked and open data, discussing immediate requirements as well as looking at
international projects as examples of first steps.
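To give a feel for the annotation step described above, here is a deliberately naive
sketch (the mini-terminology, matching strategy, and note are invented for
illustration; real systems use terminologies such as SNOMED CT or ICD-10 with far more
robust matching):

    # Hypothetical mini-terminology: concept code -> synonyms found in free text.
    terminology = {
        "C0011849": ["diabetes mellitus", "diabetes"],
        "C0020538": ["hypertension", "high blood pressure"],
    }

    def annotate(text):
        """Naive longest-match annotation of a clinical note."""
        found, lowered = [], text.lower()
        for code, synonyms in terminology.items():
            for syn in sorted(synonyms, key=len, reverse=True):
                if syn in lowered:
                    found.append((code, syn))
                    break
        return found

    note = "Patient with long-standing diabetes mellitus and high blood pressure."
    print(annotate(note))
    # -> [('C0011849', 'diabetes mellitus'), ('C0020538', 'high blood pressure')]

Even this toy version hints at why the task is tedious: synonymy, local language, and
ambiguity all have to be resolved before the annotations become semantically rich data.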
Phenotype resources for medical use: translational and personalized medicine in
action
Dietrich Rebholz-Schuhmann
Capturing single phenotype traits or the full phenotype description is a complex task due
to the large number of traits that form the phenotype, and due to the different types of
qualities linked to individual phenotypes (e.g., lack of an organ, insufficient function,
increase/decrease of a physiological parameter). For human, mouse and other model
organisms, specific resources have been produced to capture the description of a
phenotype. This talk will focus on the use of public phenotype resources for biomedical
data analytics.
