The SPEERAL Decoder

PN/LIA Nocera Pascal
11/15/06
NOCERA Pascal
Laboratoire d’Informatique d’Avignon
AGROPARC
BP 1228, 84911 AVIGNON Cedex 9
Tel : 04.90.84.35.07
E-mail : [email protected]
LABORATOIRE
D’INFORMATIQUE
CERI
339 Chemin des Meinajariès
BP 1228
84911 AVIGNON CEDEX 09
Tél. + 33 (0)4 90 84 35 09
Fax. + 33 (0)4 90 84 35 01
[email protected]
http://www.lia.univ-avignon.fr
The SPEERAL System
Stochastic approach
$\hat{w} = \arg\max_{w} P(w \mid X)$
$\hat{w} = \arg\max_{w} P(X \mid w) \cdot P(w)$
[Figure: acoustic parameters X → Decoder → word sequence w (example output: "CE SOIR")]
Find the best hypothesis among all the possible
hypotheses with the A* algorithm.
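As a minimal sketch of the decision rule above, the snippet below picks the hypothesis that maximizes log P(X | w) + log P(w). The second form of the rule follows from Bayes' rule because P(X) does not depend on w. The two candidate strings and all probability values are made up for illustration, and an exhaustive `max` stands in for the A* search.

```python
import math

def decode(hypotheses, acoustic_logprob, lm_logprob):
    """Return the hypothesis w maximizing log P(X|w) + log P(w)."""
    return max(hypotheses, key=lambda w: acoustic_logprob[w] + lm_logprob[w])

# Hypothetical scores for two candidate transcriptions of the same signal X.
acoustic_logprob = {"ce soir": math.log(0.020), "seize heures": math.log(0.025)}
lm_logprob = {"ce soir": math.log(0.010), "seize heures": math.log(0.002)}

best = decode(list(acoustic_logprob), acoustic_logprob, lm_logprob)
print(best)
```

In this toy case the acoustically weaker candidate wins because the language model strongly prefers it.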
FLAVOR workshop
[Figure: the decoder draws on acoustic models and linguistic models]
Complexité en Reconnaissance Automatique de la Parole
(Complexity in Automatic Speech Recognition)
Acoustic Models
Hidden Markov Models
Gaussian Mixture Models
Contextual Models (Phonemes)
[Figure: HMM with three states S1, S2, S3]
Acoustic Model Toolkit
Parameterization program
Text-to-phone program
Alignment program
HMM learning program
Supervised and unsupervised model adaptation
– MLLR
– MAP
– Structural Model Space Transformation
Linguistic Models
Stochastic Language Models
– N-grams
– Class-based language models
$P(W_1^n) = \prod_{i=1}^{n} P(w_i \mid h)$
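The product above can be sketched for the trigram case, where the history h of each word is taken to be its two predecessors. The probability table, the `<s>` padding symbol and the `floor` value used for unseen trigrams are illustrative assumptions, not part of the SPEERAL toolkit; a real model would use proper smoothing or back-off.

```python
import math

def sentence_logprob(words, trigram_prob, floor=1e-6):
    """Sum of log P(w_i | w_{i-2} w_{i-1}) over the sentence (n = 3)."""
    history = ("<s>", "<s>")            # assumed sentence-start padding
    total = 0.0
    for w in words:
        # Unseen trigrams get a crude floor probability (assumption).
        p = trigram_prob.get(history + (w,), floor)
        total += math.log(p)
        history = (history[1], w)       # slide the two-word history
    return total

# Hypothetical trigram probabilities.
trigram_prob = {
    ("<s>", "<s>", "ce"): 0.1,
    ("<s>", "ce", "soir"): 0.5,
}
lp = sentence_logprob(["ce", "soir"], trigram_prob)
```

Working in the log domain turns the product into a sum and avoids numerical underflow on long sentences.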
Linguistic Model Toolkit
Text Normalization Tools
Language Model Training
– CMU toolkit
– SRI toolkit
– AT&T toolkit
Language Model Compilation
Lexicon Compilation
Standard A* algorithm
«
best-first » search algorithm
– Extend the best path to generate new candidates
– Assign a score F(x) to all explored path
F(x) = g(x) + h(x)
g(x) combines Language Model and acoustic scores
h(x) estimates the probability of the best extension
– Keep the list of explored paths as a priority queue
– When the best path reaches ‘end’ then stop
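The steps above can be sketched with a priority queue as follows. This is a generic A* over a toy graph, not the SPEERAL implementation: it assumes scores are costs to minimize (for instance negative log-probabilities), and the graph, the `remaining` table and all function names are hypothetical.

```python
import heapq

def a_star(start, is_end, successors, h):
    """Best-first search on F(x) = g(x) + h(x), with costs to minimize."""
    # Queue entries: (F, tie-breaker, g, state, path so far).
    queue = [(h(start), 0, 0.0, start, [start])]
    counter = 1
    while queue:
        f, _, g, state, path = heapq.heappop(queue)   # best path first
        if is_end(state):
            return g, path                            # best path reached 'end'
        for nxt, cost in successors(state):           # extend the best path
            g2 = g + cost
            heapq.heappush(queue, (g2 + h(nxt), counter, g2, nxt, path + [nxt]))
            counter += 1
    return None

# Toy graph (hypothetical): each edge carries a cost.
graph = {"start": [("a", 1.0), ("b", 4.0)],
         "a": [("end", 5.0)],
         "b": [("end", 1.0)],
         "end": []}
# Exact remaining costs, hence an admissible heuristic.
remaining = {"start": 5.0, "a": 5.0, "b": 1.0, "end": 0.0}

cost, path = a_star("start", lambda s: s == "end",
                    lambda s: graph[s], lambda s: remaining[s])
```

With the exact heuristic the search goes through `b` without ever expanding `a`; with `h = 0` it still finds the same path but expands `a` first.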
Standard A* algorithm (2/2)

Requires an admissible heuristic function
– h(x) never overestimates the true cost of the remaining path (the
more accurate, the better).

Sample heuristics
– h(x) = 0
• Uniform-cost search (breadth-first when every step has the same cost)
– h(x) = true remaining cost (i.e. F(x) never changes along the best path)
• Deterministic search
The SPEERAL System

Language model
– Stochastic n-gram LM (n=3)

Lexical, phonetic and acoustic knowledge sources
– Acoustic model (HMM, …)
– Decoding vocabulary (lexicon)
– Input signal → phoneme lattice
• (p, beg, end, sc) with score sc = P(X_{beg..end} | p)
+ …/…
Sounding (heuristic) function h

Remaining path estimation
– Acoustic score only
– Computed with a backward Viterbi pass during phoneme lattice generation

Heuristic admissibility
– Underestimates the remaining cost: it includes no LM information
– Cannot equal the true cost (it lacks LM information)
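A rough sketch of such an acoustic-only estimate is given below: a backward pass over lattice entries (p, beg, end, sc) that records, for each frame, the best acoustic log-score of any phoneme path reaching the last frame. It assumes that an entry ending at frame `end` connects to entries beginning at `end`, and that `end > beg`; the lattice values are invented and the function name is hypothetical.

```python
import math
from collections import defaultdict

def backward_heuristic(lattice, last_frame):
    """h[t] = best acoustic log-score of a phoneme path from frame t to last_frame."""
    starting_at = defaultdict(list)
    for p, beg, end, sc in lattice:
        starting_at[beg].append((end, math.log(sc)))
    h = {last_frame: 0.0}
    # Sweep backward in time so h[end] is known before h[beg] is computed.
    for t in range(last_frame - 1, -1, -1):
        scores = [lp + h[end] for end, lp in starting_at[t] if end in h]
        if scores:
            h[t] = max(scores)          # frames with no path to the end get no entry
    return h

# Hypothetical phoneme lattice: (phoneme, beg, end, sc).
lattice = [("p1", 0, 2, 0.5), ("p2", 0, 3, 0.4),
           ("p3", 2, 5, 0.2), ("p1", 3, 5, 0.5)]
h = backward_heuristic(lattice, last_frame=5)
```

Because language model probabilities are at most 1, adding them can only lower the score of a complete path, so this acoustic-only value is an optimistic estimate of what remains.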
Lexicon

Prefix-tree organization
– Widely applied
– Compact representation
• the search effort is concentrated at word beginnings
Lexicon:
W1 : p1 p2 p3
W2 : p1 p3
W3 : p2 p1
[Figure: prefix tree of the lexicon. Root → p1 → p2 → p3 (W1); root → p1 → p3 (W2); root → p2 → p1 (W3)]
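The three-word lexicon above can be turned into a prefix tree with a few lines of code. This is only an illustrative sketch; the `Node` class and helper names are hypothetical.

```python
class Node:
    def __init__(self):
        self.children = {}   # phoneme -> Node
        self.words = []      # words whose pronunciation ends at this node

def build_lexicon_tree(lexicon):
    root = Node()
    for word, phonemes in lexicon.items():
        node = root
        for p in phonemes:
            # Shared prefixes reuse existing nodes.
            node = node.children.setdefault(p, Node())
        node.words.append(word)
    return root

def count_nodes(node):
    return 1 + sum(count_nodes(c) for c in node.children.values())

lexicon = {"W1": ["p1", "p2", "p3"], "W2": ["p1", "p3"], "W3": ["p2", "p1"]}
root = build_lexicon_tree(lexicon)
print(count_nodes(root) - 1)   # number of phoneme nodes, excluding the root
```

W1 and W2 share their initial p1, so the tree needs six phoneme nodes where a flat list of pronunciations would need seven.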
Search space
Phoneme lattice
Concatenation of lexical trees
[Figure: search space built by concatenating copies of the lexical tree from the sentence beginning; each copy is labeled with its word history (W1, W2, W3, W1W1, W2W1W1, W2W1W2, W3W2W1, W3W2W2, …)]
LM look-ahead
Word anticipation
[Figure: lexicon prefix tree (nodes p1, p2, p3; leaves W1, W2, W3)]
– n is a lexicon node
– w_n is any leaf (i.e. word) of the sub-tree starting at n
• $P(n \mid \ldots w_{i-2} w_{i-1}) = \mathrm{Part\_LM}(n, w_{i-2} w_{i-1})$
• $\mathrm{Part\_LM}(n, w_{i-2} w_{i-1}) = \max_{w_n} P(w_n \mid w_{i-2} w_{i-1})$
Paths leading to improbable words are penalized early.
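The Part_LM quantity of the LM look-ahead can be sketched as a recursive maximum over the words reachable from a lexicon node. The tree layout (nested dictionaries), the `lm_prob` callback, the history symbols and the probability values are all assumptions made for this example.

```python
def build_tree(lexicon):
    root = {"children": {}, "words": []}
    for word, phonemes in lexicon.items():
        node = root
        for p in phonemes:
            node = node["children"].setdefault(p, {"children": {}, "words": []})
        node["words"].append(word)
    return root

def part_lm(node, history, lm_prob):
    """Max over words w_n in the sub-tree at `node` of P(w_n | history)."""
    best = max((lm_prob(w, history) for w in node["words"]), default=0.0)
    for child in node["children"].values():
        best = max(best, part_lm(child, history, lm_prob))
    return best

lexicon = {"W1": ["p1", "p2", "p3"], "W2": ["p1", "p3"], "W3": ["p2", "p1"]}
root = build_tree(lexicon)

# Hypothetical P(w | w_{i-2} w_{i-1}) for one fixed two-word history.
probs = {"W1": 0.6, "W2": 0.3, "W3": 0.1}
def lm_prob(word, history):
    return probs[word]

history = ("Wa", "Wb")   # placeholder history
via_p1 = part_lm(root["children"]["p1"], history, lm_prob)
via_p2 = part_lm(root["children"]["p2"], history, lm_prob)
```

Here a path entering the tree through p2 can only lead to W3, so it is scored with 0.1 from its first phoneme instead of waiting until the word end.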
Start-synchronous tree

Asynchronous search
– The search may process the same part of the lexicon several times,
each time with a different word history.

With start-synchronous capabilities
– The most advanced path can be reused when it is encountered a
second time.
• For each frame x, the lexicon starting at x is stored.
• Only the deepest nodes (or leaves) are stored.
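One simplified way to picture this reuse is a cache keyed by the starting frame: the acoustic expansion of the lexicon from frame x is computed once and shared by every hypothesis that reaches x, whatever its word history. The class, the `toy_expand` stand-in and the returned tuples are hypothetical, and the actual decoder is likely more incremental than this sketch.

```python
class StartSynchronousCache:
    def __init__(self, expand):
        self.expand = expand      # function: start frame -> deepest nodes reached
        self.by_frame = {}        # start frame -> stored deepest nodes
        self.expansions = 0       # counts real (non-cached) expansions

    def deepest_nodes(self, frame):
        if frame not in self.by_frame:
            self.by_frame[frame] = self.expand(frame)
            self.expansions += 1
        return self.by_frame[frame]

def toy_expand(frame):
    # Hypothetical stand-in for the acoustic expansion of the lexicon tree:
    # returns (word, end frame) pairs for the leaves reached from `frame`.
    return [("W1", frame + 3), ("W2", frame + 2)]

cache = StartSynchronousCache(toy_expand)
first = cache.deepest_nodes(10)    # hypothesis arriving at frame 10
second = cache.deepest_nodes(10)   # another history arriving at frame 10
```

The second request returns the stored result, so only the language model part of the score has to be recomputed for the new history.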
Principle (1/5)
[Figure: lexicon trees started at frame 0 and at frame t (node labels p1, p2, p3; words W1, W2, W3), with the deepest lexicon nodes at frame 0 and at frame t marked]
Principle (2/5)
[Figure: lexicon trees started at frame 0 and at frame t (node labels p1, p2, p3; words W1, W3)]
Principle (3/5)
[Figure: lexicon trees started at frame 0 and at frame t (node labels p1, p2, p3; words W1, W2, W3)]
Principle (4/5)
[Figure: lexicon trees started at frame 0 and at frame t (node labels p1, p2, p3; words W1, W2, W3)]
Principle (5/5) ….
[Figure: lexicon trees started at frame 0, frame t and frame t+n (node labels p1, p2, p3; words W1, W2, W3)]
Search space pruning
Optimization
– If two candidates end with the same 3 words, only the best one is kept.
Cut
– Short candidates are dropped when their distance to the deepest
candidate grows too large.
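Both pruning rules can be sketched on a list of candidates. The representation (word list, end frame, log-score with higher meaning better), the `max_lag` threshold and the example values are assumptions. The slide does not say whether the end frames must also match before two candidates are merged, so this sketch ignores them.

```python
def recombine(candidates, n_words=3):
    """Keep only the best-scoring candidate for each final n_words word sequence."""
    best = {}
    for words, frame, score in candidates:
        key = tuple(words[-n_words:])
        if key not in best or score > best[key][2]:
            best[key] = (words, frame, score)
    return list(best.values())

def cut(candidates, max_lag):
    """Drop candidates ending more than max_lag frames before the deepest one."""
    deepest = max(frame for _, frame, _ in candidates)
    return [c for c in candidates if deepest - c[1] <= max_lag]

# Hypothetical candidates: (words, end frame, log-score).
cands = [
    (["W1", "W2", "W3", "W1"], 50, -10.0),
    (["W2", "W2", "W3", "W1"], 50, -12.0),   # same last 3 words, worse score
    (["W3", "W1"], 20, -3.0),                # far behind the deepest candidate
]
kept = recombine(cands)
survivors = cut(kept, max_lag=20)
```

After both steps only the first candidate remains: the second is merged into it and the third lags 30 frames behind.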
ASR Output

– 1-best hypothesis
– N-best hypotheses
– Word graph

Applications
– Transcription
– Question answering
– Named entity extraction
– Information retrieval
– Call-type classification
– …
French Broadcast News Campaign: ESTER
[Figure: processing pipeline. Broadcast news (1-hour show) → acoustic segmentation → speaker segmentation → speech transcription (using acoustic models and language models) → information extraction]
System Description
Acoustic Models:
• 10k contextual HMMs
• 3.6k states
• 230k Gaussians
Lexicon: 65k words
Language Model Combination:
• (Le Monde 87-02, 0.41)
• (Le Monde 02-03, 0.24)
• (ESTER, 0.35)
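The three weights sum to 1, which is consistent with a linear interpolation of the component language models, although the slide does not name the combination method. Under that assumption, a word probability would be mixed as below; the per-model probabilities are invented for the example.

```python
# Weights taken from the slide.
weights = {"Le Monde 87-02": 0.41, "Le Monde 02-03": 0.24, "ESTER": 0.35}

def interpolate(component_probs, weights):
    """Linear interpolation: P(w|h) = sum_k lambda_k * P_k(w|h)."""
    return sum(weights[name] * component_probs[name] for name in weights)

# Hypothetical P_k(w | h) given by each component model for one word.
component_probs = {"Le Monde 87-02": 0.010, "Le Monde 02-03": 0.020, "ESTER": 0.040}
p = interpolate(component_probs, weights)
```

Because the weights sum to 1, the mixture remains a valid probability distribution over the vocabulary.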
Results and Demonstration
WER ≈ 25% (10 × RT)
Demonstration on TV
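For reference, the word error rate (WER) is conventionally computed as the word-level edit distance between the reference and the hypothesis, divided by the number of reference words. The sketch below follows that standard definition. The example sentences are made up, and a non-empty reference is assumed.

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))            # distances for the empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1] / len(ref)

rate = wer("ce soir il pleut", "ce soir elle pleut")
```

A WER of 25% therefore corresponds to roughly one error for every four reference words.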