Lecture, Summer Semester (SS) 2012
Multilingual Human-Machine Communication
(Multilinguale Mensch-Maschine Kommunikation)
Prof. Dr. Tanja Schultz
Dipl.-Inform. Tim Schlippe
Introduction
Tuesday, April 17, 2012
1
Overview
Lecture 1: Overview and Introduction
• General information about the lecture
• Introducing the chair
• interACT
• Lead-in to the topic
• Application examples
2
General Information: Lecture
Advanced lecture in the Hauptdiplom (main diploma phase)
– No prior knowledge required
Examination option:
– Yes, in Cognitive Systems and Anthropomatics
Schedule:
– Annually in the summer semester, 4+0
– Exams only during the lecture period (register early!)
Dates:
– Tue 14:00 – 15:30 (HS -101) and Thu 14:00 – 15:30 (SR 131)
– Start 19.04.2012, end 19.07.2012
Lecturers:
– Prof. Dr. Tanja Schultz
– Dipl.-Inform. Tim Schlippe
– Other staff members of the chair
3
General Information: Lecture
All lecture materials are available at
http://csl.anthropomatik.kit.edu > Studium und Lehre
> SS2012 > Multilinguale Mensch-Maschine Kommunikation
– All slides as PDF (no password protection)
– Current changes, announcements, syllabus
– Additional material (papers) where applicable
Basis for exams:
– Lecture content, slides, additional material
Questions, problems, and comments are welcome at any time during the
lecture or in person: CSL,
Laborgebäude Kinderklinik, Geb. 50.21, Adenauerring 4
– Tanja Schultz ([email protected]), Room 113
– Tim Schlippe ([email protected]), Room 117
Office hours with Tanja Schultz by appointment
4
General Information: CSL
Chair for Cognitive Systems (Lehrstuhl für Kognitive Systeme) since June 1, 2007
– Karlsruhe Institute of Technology (KIT),
Faculty of Informatics (Fakultät für Informatik)
– Institute for Anthropomatics (new since 2009)
– Homepage: http://csl.anthropomatik.kit.edu
– Address: Adenauerring 4, 76131 Karlsruhe
Contact:
– Prof. Dr.-Ing. Tanja Schultz
• [email protected]
• +49 721 608 46300
– Secretary's office: Ms. Helga Scherer
• [email protected]
• +49 721 608 46312
5
Research: Human-Centered Technologies
Application area: human-machine interaction
Challenges and tasks:
productivity and usability
Application area: human-human communication
Challenges and tasks:
language diversity, cultural barriers,
effort and costs
Human communication with the environment
in the broadest sense:
speech, movement, biosignals
Technologies and methods:
recognizing, understanding, identifying
statistical modeling, classification, ...
6
Teaching at CSL – Winter
Winter semester
• Biosignals and User Interfaces
– 4+0, examinable in Cognitive Systems and Anthropomatics
– Introduction to the acquisition and interpretation of biosignals
– Application examples
• Analysis and Modeling of Human Motion
– Introduction to the analysis, modeling, and recognition of human motion sequences (jointly with Dr. Annika Wörner)
– 2+0, examinable in Cognitive Systems and Anthropomatics
• Design and Evaluation of Innovative User Interfaces
– 2+0, examinable in Cognitive Systems and Anthropomatics
• Multilingual Speech Processing
– 2+0, lab course (Praktikum)
– Development of speech recognition systems using Rapid
Language Adaptation Tools
7
Teaching at CSL – Winter
Winter semester
• Lab course Biosignals 2: Emotion and Cognition
– 2+0
– Recording and analysis of biosignals (e.g. pulse,
skin conductance, respiration) to capture emotional and
cognitive processes in humans
8
Teaching at CSL – Summer
Summer semester
• Multilingual Human-Machine Communication
– 4+0, examinable in Cognitive Systems and Anthropomatics
– Introduction to automatic speech recognition and processing
– Signal processing, statistical modeling, practical approaches
and methods, multilinguality
– Applications in human-human communication and human-machine interaction
– Application examples
• Lab course: Biosignals
– Practical development
• Recording of motion data (in cooperation with the sports institute)
• Various biosensors (Vicon, accelerometers, EMG)
• Automatic motion recognition
9
Teaching at CSL – Summer
Summer semester
• Cognitive Modeling
– 2+0, examinable in Cognitive Systems and Anthropomatics
– Modeling of human cognition and human affect in the
context of human-machine interaction
– Models of human behavior, human learning
(relationship to and differences from machine learning methods),
knowledge representation, emotion models, and cognitive
architectures
• Methods of Biosignal Processing
– 2+0, examinable in Cognitive Systems and Anthropomatics
– Algorithmic methods of modern biosignal processing
10
Working at CSL
• Bachelor's theses
• Master's theses
• Studienarbeiten (student research projects)
• Diplomarbeiten (diploma theses)
• Hiwi jobs (student assistant positions)
11
Development of an adaptive dialog system
• CSL develops a successful EEG-based workload recognition
system
– Is the user fully attentive or distracted?
• Integrated into a speech dialog system to adapt its behavior
– Simple example: High workload → use shorter, simpler utterances
• Your task for BA/MA/SA/DA thesis: Implement a workload-adaptive
speech dialog system for more complex tasks (in Java)
– Explore possibilities for intelligent, "cognitive" system strategies to react to
high workload
– Creativity is encouraged and rewarded!
• Learn about…
– application of speech recognition
– design of intelligent speech dialog systems
– usability and user-centered design
• Contact: [email protected]
12
SA/BA/DA/MA: Web-derived Pronunciations
Tasks:
• Finding and extracting pronunciations on the WWW
• Ensuring their quality
• Evaluating the impact on speech recognition systems
Required skills:
• Basic knowledge of speech recognition
• Programming skills, e.g. in Perl or PHP
• Enjoyment of computer science and linguistics
Available immediately from:
Tim Schlippe ([email protected])
13
Attendance List
• Please fill in!
N   Last name, First name   Subject, Semester      Matriculation no.   Email
1   SCHULTZ, Tanja          Computer Science, 36                       [email protected]
2
14
Literatur
Xuedong Huang, Alex Acero and Hsiao-wuen Hon, Spoken
Language Processing, Prentice Hall PTR, NJ, 2001
($81.90 internet price)
Rabiner and Juang, Fundamentals of Speech Recognition,
Prentice Hall Signal Processing Series, Englewood Cliffs,
NJ, 1993
Jelinek, Statistical Methods for Speech Recognition, MIT
Press, Cambridge, MA, 1997 ($35)
Schultz and Kirchhoff, Multilingual Speech Processing,
Elsevier, Academic Press, 2006
(ask the authors for discounts!)
+ various articles (PDF) that we make available on the web
(do read them!)
15
Useful Links, Additional Material
• All slides are made available on the web as PDF
http://csl.anthropomatik.kit.edu > Studium und Lehre
> SS2012 > Multilinguale Mensch-Maschine Kommunikation
• Electronic archive of many proceedings and reports
of the most important conferences on
"Speech and Language"
– ICASSP (International Conference on Acoustics, Speech, and Signal
Processing)
– Interspeech (merger of Eurospeech and ICSLP)
– ASRU (Automatic Speech Recognition and Understanding)
– ACL (Association for Computational Linguistics), NAACL (North American Chapter of the ACL)
– HLT (Human Language Technologies) …
16
Useful Links, Additional Material
• Biosignals and User Interfaces (Schultz)
– Speech as a biosignal in a more general framework
• Machine Translation (Waibel)
– Context: speech translation, statistical methods,
language modeling
• Pattern Recognition (Beyerer)
– Fundamentals of pattern recognition
• Automatic Speech Recognition (Waibel/Stüker)
– Fundamentals of speech recognition (winter semester)
• Lab course: Multilingual Speech Processing (Schultz)
• Lab course: Automatic Speech Recognition (Waibel)
• Seminar: Speech-to-Speech Translation (Waibel)
17
General Information: Goal of the Course
Goals of the lecture
• Speech in human-machine communication
– Advantages and disadvantages of speech as an input signal
– Aspects of multilinguality in speech recognition
• Fundamentals of speech recognition
– Basic concepts
– Speech production and perception
– Digital signal processing, feature extraction
– Statistical modeling, classification
– Acoustic modeling, HMMs
– Language modeling
• Further topics in speech processing
– Dialog modeling, synthesis (translation: with Prof. Waibel)
• Application examples from research
18
Today: Application Examples
• Speech recognition: from speech input signal to text
• Speech synthesis: from text to speech output signal
• Speech translation (across language boundaries):
from speech signal in language L1 to speech signal in L2
= speech recognition + MT + speech synthesis
• Speech understanding, summarization
= from speech input signal to meaning
• But speech activity is more than just what is being said:
Who is speaking? → Speaker identification
Which language is being spoken? → Language ID
What is being talked about? → Topic ID
How is it being spoken? → Emotion ID
Who is being addressed? → Focus of attention
• Translation (across species boundaries): example dolphins
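The decomposition of speech translation into speech recognition + MT + speech synthesis can be sketched as a chain of three stages. This is only an illustration under loud assumptions: the functions `recognize`, `translate`, and `synthesize`, the `"audio:"` tag standing in for a waveform, and the two-word dictionary are all hypothetical stand-ins, not components from the lecture. Real systems would use trained models at every stage.

```python
from typing import Callable, Dict

# Hypothetical toy lexicon; a real MT stage would be a statistical model.
TOY_DICTIONARY: Dict[str, str] = {"guten": "good", "morgen": "morning"}


def recognize(audio_l1: str) -> str:
    """ASR stand-in: 'audio' is faked as a string tagged with 'audio:'."""
    prefix = "audio:"
    return audio_l1[len(prefix):] if audio_l1.startswith(prefix) else audio_l1


def translate(text_l1: str) -> str:
    """MT stand-in: word-by-word lookup, unknown words pass through."""
    return " ".join(TOY_DICTIONARY.get(w, w) for w in text_l1.lower().split())


def synthesize(text_l2: str) -> str:
    """TTS stand-in: tag the text as 'audio' again."""
    return "audio:" + text_l2


def speech_to_speech(audio_l1: str,
                     asr: Callable[[str], str] = recognize,
                     mt: Callable[[str], str] = translate,
                     tts: Callable[[str], str] = synthesize) -> str:
    """Speech in L1 -> text in L1 -> text in L2 -> speech in L2."""
    return tts(mt(asr(audio_l1)))


if __name__ == "__main__":
    print(speech_to_speech("audio:Guten Morgen"))
```

Because each stage is passed in as a function, any one of them can be swapped out without touching the other two, which mirrors the modular view on the slide.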
19
Introduction
• Each of the lessons covers one topic from
“speech recognition and understanding”
• It covers the most important areas of today’s research
and also discusses some historic issues
• The goal of the course is to introduce you to the science
of automatic speech recognition and understanding
• Today's topic:
– Why are we doing Speech Recognition?
• What are the advantages and disadvantages?
– Where is it useful?
• Examples of applications, demos
20
Why Automatic Speech Recognition?
ADVANTAGES:
• Natural way of communication for human beings
– No practicing necessary for users, i.e. speech
does not require any teaching as opposed to
reading/writing
– High bandwidth (speaking is faster than typing)
• Additional communication channel (Multimodality)
• Hands and eyes are free for other tasks
→ Works in the car / on the run / in the dark
• Mobility (microphones are smaller than keyboards)
• Some communication channels (e.g. phone) are designed
for speech
• ...
21
Why Automatic Speech Recognition?
DISADVANTAGES:
• Unusable where silence/confidentiality is required
(meetings, library, spoken access codes)
… we are working on solutions (see later)
• Still unsatisfactory recognition rate when:
– Environment is very noisy (party, restaurant, train)
– Unknown or unlimited domains
– Uncooperative speakers (whisper, mumble, …)
• Problems with accents, dialects, code-switching
• Cultural factors (e.g. collectivism, uncertainty avoidance)
• Speech input is still more expensive than keyboard
22
Input Speeds (Characters per Minute)
Mode          Standard   Best
Handwriting   200        500
Typewriter    200        1000
Stenography   500        2000
Speech        1000       4000
23
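To make the "high bandwidth" argument concrete, the input speeds listed above can be turned into ratios of speech against each other mode. The sketch below only does that arithmetic on the slide's numbers; the names `INPUT_SPEEDS` and `speedup_of_speech` are illustrative, not from the lecture.

```python
from typing import Dict, Tuple

# Characters per minute as listed on the slide: (standard, best).
INPUT_SPEEDS: Dict[str, Tuple[int, int]] = {
    "Handwriting": (200, 500),
    "Typewriter": (200, 1000),
    "Stenography": (500, 2000),
    "Speech": (1000, 4000),
}


def speedup_of_speech(mode: str) -> Tuple[float, float]:
    """Return how many times faster speech is than `mode` (standard, best)."""
    standard, best = INPUT_SPEEDS[mode]
    speech_standard, speech_best = INPUT_SPEEDS["Speech"]
    return speech_standard / standard, speech_best / best


if __name__ == "__main__":
    for mode in INPUT_SPEEDS:
        std_ratio, best_ratio = speedup_of_speech(mode)
        print(f"{mode:12s} standard x{std_ratio:.1f}  best x{best_ratio:.1f}")
```

By these figures, a standard speaker produces five times as many characters per minute as a standard typist, and twice as many as a standard stenographer.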
Where is Speech Recognition and Understanding useful
Human - Machine Interaction:
1. Remote control applications
• Operating Machines over the Phone
2. Hands/Eyes busy or not useful
• Speech Recognition in cars
• Help for Physically Challenged, Nurse bots
3. Authentication
• Speaker Identification/Verification/Segmentation
• Language/Accent Identification
4. Entertainment / Convenience
• Speech Recognition for Entertainment
• Gaming
5. Indexing and Transcribing Acoustic Documents
• Archive, Summarize, Search and Retrieve
24
Where is Speech Recognition and Understanding useful
Human - Human Interaction:
1. Mediate communication across language boundaries
• Speech Translation
• Language Learning
• Synchronization / Sign Language
2. Support human interaction
• Meeting and Lecture systems
• Non-verbal Cue Identification
• Multimodal applications
• Speech therapy support
25
Operating Machines over the Phone
• Remote Controlled Home
– Operate heating / air conditioning, turn lights on/off, check email
• Voice-Operated Answering Machine
– Call answering machine from anywhere and discuss recent calls
• Access Databases
– Pittsburgh Bus Information with CMU's Let's Go at 412-268-3526
– Check the weather with MIT's Jupiter at 1-888-573-8255
– Train information (Erlangen), directory assistance, airlines, cinema
• Call Center
– Route or dispatch calls, 911 emergency line
– AT&T: How may I help you?
The HMIHY system was deployed in 2001, and according to AT&T
was handling more than 2 million calls per month by the end of 2001.
• Use Interactive Services worldwide
– Plan your next trip with an artificial travel agent
26
Hands-Free / Eyes-Free Tasks
• Hands and/or Eyes are busy with tools
– Radio repair
– Construction site
• Hands and/or Eyes are needed to operate machines/cars
– Hold the steering wheel
– Pull levers, turn knobs, operate switches
– Watch the street while driving
– Monitor production line
• Hands are working on other people
– Hair stylist cutting hair
– Surgeon working on a patient
• Hands and/or Eyes are not helpful in the environment
– Dark rooms (photography)
– Outer Space (remote control)
27
Speech Recognition in Cars
• Use your cellular phone while keeping your hands on the
wheel and eyes on the street, e.g. voice dialing
• Operate your audio device while driving
• Dictate messages (e-mails, SMS)
TODAY several companies and services
are emerging which do exactly this
• Talk to your personal digital assistant
• Navigation: ask your way through a foreign city,
find the nearest restaurant
28
Support in everyday life, Help for Elderly and Physically Challenged
People who are immobile, e.g. lying in bed or in hospital, or who can't
use their hands due to illness or accidents
• operate parts of their environment/machines by voice
• ask a robot for help
Nursebot Pearl and Florence:
CMU's robotic assistant for the elderly
ISAC feeding a physically challenged individual
Center for Intelligent Systems, Vanderbilt Univ
Children with speaking disorders make significant improvements by trying
to make a speech recognizer understand them
Children with dyslexia and similar problems learn to read faster using
automatic speech recognition
29
Information in Speech
Example utterance: "Onune baksana be adam!"
Speech Recognition → Words: Onune baksana be adam!
Language Recognition → Language Name: Turkish
Speaker Recognition → Speaker Name: Umut
Emotion Recognition → Emotion: Angry
Accent Recognition → Accent: Istanbul
Topic ID: Chemicals
Entity Tracking: Istanbul
Acoustic Scene: Bus Station
Discourse Analysis: Negotiation
Tanja Schultz, Speaker Characteristics, In: C. Müller (Ed.) Speaker Classification, Lecture Notes in Computer
Science / Artificial Intelligence, Springer, Heidelberg - Berlin - New York, Volume 4343.
30
Speaker Recognition
• Identification: Whose voice is it?
• Verification/Detection: Is it Sally's voice?
• Segmentation and Clustering: Where are the speaker changes?
Which segments are from the same speaker?
31
Speaker Identification/Verification/Recognition
Verification
verify someone's claimed identity, i.e.
is the person who s/he claims to be?
Instead of a password:
say something instead of typing
Identification
"who is speaking?"
Identifies a speaker from an enrolled
population by searching the database
Personalized behavior:
customize machine reaction automatically to the current user
Recognition
Often used to refer to all problems of
verification, identification,
segmentation & clustering
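The difference between the two tasks can be sketched in a few lines: verification is a yes/no decision about one claimed identity, while identification searches the whole enrolled population. The sketch below is a minimal illustration, not the method used in the lecture. It assumes each speaker is represented by a fixed-length feature vector, that similarity is measured by cosine similarity, and that a threshold of 0.8 is acceptable. The vectors and the threshold are made up for the example.

```python
import math
from typing import Dict, List, Optional

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equally long vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def verify(enrolled: Dict[str, Vector], claimed: str, sample: Vector,
           threshold: float = 0.8) -> bool:
    """Verification: does the sample match the ONE claimed identity?"""
    return claimed in enrolled and cosine(enrolled[claimed], sample) >= threshold


def identify(enrolled: Dict[str, Vector], sample: Vector) -> Optional[str]:
    """Identification: search ALL enrolled speakers for the best match."""
    if not enrolled:
        return None
    return max(enrolled, key=lambda name: cosine(enrolled[name], sample))


if __name__ == "__main__":
    # Made-up voice models for two enrolled speakers.
    enrolled = {"Sally": [0.9, 0.1, 0.2], "Tim": [0.1, 0.8, 0.3]}
    sample = [0.85, 0.15, 0.25]
    print(verify(enrolled, "Sally", sample))  # claim "I am Sally"
    print(verify(enrolled, "Tim", sample))    # claim "I am Tim"
    print(identify(enrolled, sample))         # no claim, open search
```

Verification costs one comparison regardless of how many speakers are enrolled, whereas identification scales with the size of the database.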
32
Speaker Segmentation and Clustering
Segmentation: automatically segment incoming speech by speaker
Clustering: cluster segments of the same speaker
Adaptation: use recognition parameters that are optimized for the specific speaker
Example: Mandarin Broadcast News (speaker turn miss, overlapping speech, speech over noise)
33
Language Identification
o Selecting the recognizer (in multilingual speech recognition)
o Call routing (e.g. 911 emergency line)
o Data analysis, selection
o Special case: accent recognition
– Optimizing all system parameters for the speaker's accent
– E-language learning
Tanja Schultz, Identifizierung von Sprachen - Exemplarisch aufgezeigt am Beispiel der Sprachen Deutsch, Englisch und Spanisch,
diploma thesis (Diplomarbeit), Institut für Logik, Komplexität und Deduktionssysteme, Universität Karlsruhe, April 1995
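The first use case, choosing the recognizer in a multilingual system, amounts to a dispatch step: identify the language first, then hand the input to the matching recognizer. The sketch below shows only that control flow. Everything in it is an assumption for illustration: the "audio" is faked as text, `identify_language` is a toy that counts a handful of cue words, and the entries in `RECOGNIZERS` are placeholders for real language-specific recognizers.

```python
from typing import Callable, Dict, Set

# Toy cue words per language; a real language-ID system works on the
# acoustic signal and statistical models, not on a word list.
CUE_WORDS: Dict[str, Set[str]] = {
    "German": {"der", "die", "und", "ist"},
    "English": {"the", "and", "is", "of"},
    "Spanish": {"el", "la", "y", "es"},
}

# Placeholder recognizers, one per language (hypothetical).
RECOGNIZERS: Dict[str, Callable[[str], str]] = {
    language: (lambda utterance, language=language:
               f"{language} recognizer: {utterance}")
    for language in CUE_WORDS
}


def identify_language(utterance: str) -> str:
    """Pick the language whose cue words occur most often."""
    words = utterance.lower().split()
    scores = {language: sum(word in cues for word in words)
              for language, cues in CUE_WORDS.items()}
    return max(scores, key=scores.get)


def route(utterance: str) -> str:
    """Language ID first, then dispatch to the matching recognizer."""
    return RECOGNIZERS[identify_language(utterance)](utterance)


if __name__ == "__main__":
    for utterance in ("der Zug ist spät", "the train is late", "el tren es tarde"):
        print(route(utterance))
```

The same dispatch pattern would apply to call routing, with human operators or language-specific services in place of the recognizers.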
34
FarSID: Far-Field Speaker Recognition
• Original signal
• Effect of echo
• Effect of distance
• Effect of room size (1 m distance, 0.5 s echo), example: small room
Q. Jin, Y. Pan, T. Schultz, Far-Field Speaker Recognition, Proceedings of the IEEE International
Conference on Acoustics, Speech, and Signal Processing, ICASSP, Toulouse, France, 2006
35
Global Communication
The dream (?) of communicating
across language boundaries
- A babelfish for everybody -
• Fun, everyday life:
– Chat in your mother tongue worldwide
– Travel without communication problems
• Business:
– Negotiate and be sure that
your partner is getting it right
– Computer has no stakes, i.e.
neutral translation, not lopsided
• Face-to-face communication
• Over the phone or internet
• Text-to-text vs. speech-to-speech
"The Building of the Tower of Babel",
1563, by Pieter Brueghel,
Kunsthistorisches Museum, Vienna
The building of the Tower of Babel
and the Confusion of Tongues
(languages) in ancient Babylon
are mentioned in Genesis.
"Babel" is composed of two words:
"baa", meaning "gate", and "el", "god".
Hence, "the gate of god". A related
word in Hebrew, "balal", means
"confusion".
36
GALE
GALE = Global Autonomous Language Exploitation:
Process huge volumes of speech and text data in
multiple languages (Arabic, Chinese, English)
• Broadcast News, Shows, Telephone Conversations
Apply automatic technology to spoken and written language:
• Absorb, Analyze, and Interpret
Deliver pertinent information in easy-to-understand
forms to monolingual analysts
Three engines:
- Transcription,
- Translation,
- Distillation
37
Demonstration GALE – Chinese TV
• Mandarin Broadcast News (CCTV), recorded in the US over satellite
• ASR: transforming the Mandarin speech into Chinese text
using Automatic Speech Recognition
• SMT: translating from Chinese text into English text
using Statistical Machine Translation
H. Yu, Y.C. Tam, T. Schaaf, S. Stüker, Q. Jin, M. Noamany, T. Schultz, The ISL RT04 Mandarin Broadcast
News Evaluation System, EARS Rich Transcription Workshop, Palisades, NY, November 2004
38
PDA Speech Translation in Mobile Scenarios
• Tourism
– Needs in Foreign Country
– International Events
• Conferences
• Business
• Olympics
• Humanitarian Needs
– Humanitarian, Government
– Emergency line 911
– USA, multicultural
population
• Army, peace corps
A. Waibel, A. Badran, A. Black, R. Frederking, D. Gates, A. Lavie, L. Levin, K. Lenzo, L Mayfield Tomokiyo,
J. Reichert, T. Schultz, D. Wallace, M. Woszczyna, J. Zhang, Speechalator: Two-way Speech-to-Speech
Translation in your Hand. HLT-NAACL 2003, Edmonton, Alberta, Canada, 2003
39
Verbmobil
Talk to people (face-to-face) from/in other countries in your own
language.
A step towards Star Trek's "Universal Translator"
40
Mobility: Personal Digital Assistants
Use your PDA or cellular phone to get help
• Navigation
• Translation
• Information (travel, transportation, medical, ...)
Demo
41
RLAT: Rapid Language Adaptation Tools
Major Problem: Tremendous costs and time for development
– Very few languages (50 out of 6900) with many resources
– Lack of conventions (e.g. languages without writing system)
– Gap between technology and language expertise
→ SPICE: Intelligent system that learns language from user
– Speech Processing: Interactive Creation and Evaluation toolkit
– Develop web-based toolkits for Speech Processing: ASR, MT, TTS
– http://cmuspice.org
– http://csl.ira.uka.de/rlat-dev
Demo
• Interactive efficient learning
Interactive learning:
– Solicit knowledge from user in the loop
– Rapid adaptation of language independent models
Efficiency:
– Reduce time and costs by a factor of 10
T. Schultz, A. Black, S. Badaskar, M. Hornyak, J. Kominek, SPICE: Web-based Tools for Rapid Language
Adaptation in Speech Processing Systems, Proceedings
of Interspeech, Antwerp, Belgium, August 2007
42
Meeting Room
The Meeting Browser is a powerful tool that allows us to record a new
meeting, review or summarize an existing meeting or search a set of
existing meetings for a particular speaker, topic, or idea.
http://www.is.cs.cmu.edu/meeting_room/
43
Indexing Acoustic Documents
The world is flooded with
information.
More and more
information is coming
through audio-visual
channels.
Trying to find information
in acoustic documents
needs an intelligent
acoustic search engine.
44
View4You / Informedia
Automatically records Broadcast News and allows the
user to retrieve video segments of news items for
different topics using spoken language input
Kemp/Waibel 1999
45
Education, Learning Languages
• LISTEN: Automated reading tutor that listens
to a child read a displayed text aloud, and
helps where needed.
• CHENGO: web-based language learning in a
gaming environment for English, Chinese
• Program CALL at CMU on Computer
Assisted Language Learning
46
Robust and Confidential Speech Recognition
Traditional Speech Recognition:
• Capture the acoustic sound wave by microphone
• Transform signal into electrical energy
Requirements and Challenges:
• Audibility:
Speech needs to be perceivable by microphone
(no low voice or whispering, no silent speech)
• Interference: Speech disturbs others
(no speaking in libraries, theaters, meetings)
• Privacy: Speech signal can be captured by others
(no confidential phone calls in public places)
• Robustness:
Signal is corrupted by noisy environment
(difficult to recognize in restaurants, bars, cars)
47
Bone-conduction
• When we speak normally our body is a resonance box
Skin and bones vibrate when we speak (try this!)
• Capture this vibration by so-called bone-conducting
or skin-conducting microphones
Figure labels: Stethoscopic Microphone; Zheng et al.; Jou et al. / Intecs; Nakajima
• Whispered speech is defined as:
– the articulated production of respiratory sound
– with little or no vibration of the vocal folds
– produced by the motion of the articulatory apparatus
– transmitted through the soft tissue or bones of the head
48
Electromyography – Silent Speech
Approach:
– Surface Electromyography (EMG)
– Surface = No needles
– Electro = electrical activity
– Myo = muscle
– Graphy = recording
EMG signal: s1 – s2 (figure: difference of two recorded signals s1 and s2)
- Measure the electrical activity of facial
muscles by capturing the electrical
potential differences
- MOTION is recorded, not acoustic signal
→ silently moving the lips / articulators
is good enough
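The "s1 – s2" in the figure suggests that the EMG signal is taken as the difference of two recorded signals. A minimal sketch of why such a difference is useful follows, under the assumption (not stated on the slide) that s1 and s2 come from two electrodes that pick up the same interference while only one of them sits over the active muscle. The sample values are made up.

```python
from typing import List


def differential(s1: List[float], s2: List[float]) -> List[float]:
    """Sample-wise difference s1 - s2 of two equally long recordings."""
    if len(s1) != len(s2):
        raise ValueError("s1 and s2 must have the same number of samples")
    return [a - b for a, b in zip(s1, s2)]


if __name__ == "__main__":
    # Made-up example: muscle activity seen only at electrode 1,
    # plus interference that reaches both electrodes equally.
    muscle = [0.0, 0.5, -0.25, 0.75]
    interference = [1.0, 1.0, 1.0, 1.0]
    s1 = [m + n for m, n in zip(muscle, interference)]
    s2 = list(interference)
    print(differential(s1, s2))  # the common interference cancels out
```

Whatever is common to both signals cancels in the subtraction, so only the difference in electrical potential between the two recording sites remains.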
49
SILENT SPEECH
Demo: Silent Communication (Lautlose Kommunikation)
• http://csl.ira.uka.de/SilentSpeech
50
Dolphinese (Delphinisch)
Communication across language boundaries → across species boundaries
• Collaboration with the Wild Dolphin Project
• Free-ranging Atlantic Spotted Dolphins
• Identification, behavior, communication
• Communication with dolphins
• Dolphins try to make contact
• Information from a 20-million-year-old species
http://wilddolphinproject.com
• "Dolphone" and "Dolphinese" (Delphinisch)
• Sound production, perception, frequency,
medium
• Pattern recognition, extraction,
clustering, statistical modeling
• Audio and video indexing, archiving, retrieval
• Audio recording, analysis, synthesis, translation
51
Even Beyond Human Speech …
Towards Communication with Dolphins
Why do we want to talk to Dolphins?
• They might have a lot to say (20-million-year-old species)
• It is a challenging scientific problem
- Cross language boundaries →
cross species boundaries
- Different sound production, perception, …
- Different medium (water), transmission, omni-directional
• Nothing is known about dolphins' language
• It involves spending a lot of time in the Bahamas :-)
Why do Dolphins want to talk to us?
We don’t know …
… but there is evidence that they try hard
CMU: www.cs.cmu.edu/~tanja
Wild Dolphin Project
(http://wilddolphinproject.com)
52
Quaero
• Collaborative research and development program
• Developing multimedia and multilingual indexing and
management tools
– e.g. automatic analysis, classification, extraction and
exploration of information
• Facilitate extraction of information in unlimited quantities of
multimedia and multilingual documents, including written texts,
speech and music audio files, and images and videos.
• Available to everyone via personal computers, television and
handheld terminals.
53
Conclusions
Speech:
• Is the most natural way of communication for human beings
• Does not require any teaching or practicing
• Has high bandwidth (speaking is faster than typing)
• Supplements other communication channels (Multimodality)
Speech Recognition is useful:
• In hands-busy and eyes-busy environments
• For mobile / small devices
• Support in everyday life, Help for physically challenged folks
Speech Recognition and Understanding:
• Allows (remote) operation of machines
• Supports global communication between humans
• Breaks language (and maybe sometimes cultural) barriers
54