The SPEERAL Decoder
PN/LIA Nocera Pascal 11/15/06
FLAVOR workshop
Complexity in Automatic Speech Recognition

The SPEERAL Decoder
NOCERA Pascal
Laboratoire d'Informatique d'Avignon
AGROPARC BP 1228, 84911 AVIGNON Cedex 9
Tel: 04.90.84.35.07
E-mail: [email protected]

LABORATOIRE D'INFORMATIQUE, CERI, 339 Chemin des Meinajariès, BP 1228, 84911 AVIGNON CEDEX 09
Tél. +33 (0)4 90 84 35 09, Fax +33 (0)4 90 84 35 01
[email protected]
http://www.lia.univ-avignon.fr

The SPEERAL System
Stochastic approach
$\hat{w} = \arg\max_w P(w \mid X)$
$\hat{w} = \arg\max_w P(X \mid w) \cdot P(w)$
[Figure: the acoustic parameters X enter the decoder, which outputs the word sequence w, e.g. "CE SOIR"]
Find the best hypothesis among all the possible hypotheses with the A* algorithm.

The SPEERAL System
Stochastic approach
$\hat{w} = \arg\max_w P(w \mid X)$
$\hat{w} = \arg\max_w P(X \mid w) \cdot P(w)$
[Figure: the two factors of the product are labelled "Acoustic Models" and "Linguistic Models"]

Acoustic Models
Hidden Markov Models
Gaussian Mixture Models
Contextual Models (Phonemes)
[Figure: three-state HMM with states S1, S2, S3]

Acoustic Model Toolkit
Parameterization program
Text to phone program
Alignment program
HMM learning program
Supervised and unsupervised Model Adaptation
  – MLLR
  – MAP
  – Structural Model Space Transformation

Linguistic Models
Stochastic Language Models
  – N-grams
  – Class-based language models
$P(W_1^n) = \prod_{i=1}^{n} P(w_i \mid h)$

Linguistic Model Toolkit
Text Normalization Tools
Language Model Training
  – CMU toolkit
  – SRI toolkit
  – AT&T toolkit
Language Model Compilation
Lexicon Compilation

Standard A* algorithm
"Best-first" search algorithm
  – Extend the best path to generate new candidates
  – Assign a score F(x) to every explored path
      F(x) = g(x) + h(x)
      g(x) combines Language Model and acoustic scores
      h(x) estimates the probability of the best extension
  – Keep the list of explored paths as a priority queue
  – Stop when the best path reaches 'end'

Standard A* algorithm (2/2)
Requires an admissible heuristic function
  – h(x) underestimates the true remaining path cost (the more accurate, the better).
Sample heuristics
  – h(x) = 0
    • Breadth-First search
  – h(x) = true remaining cost (i.e.
F(x) never changes)
    • Deterministic search

The SPEERAL System
Language model
Lexical, phonetic and acoustic knowledge source
  – Stochastic n-gram LM (n=3)
  – Acoustic model (HMM, …)
  – Decoding vocabulary (lexicon)
  – Input signal
Phoneme lattice
    • (p, beg, end, sc) with score $sc = P(X_{beg..end} \mid p)$ + …/…

Sounding function h
Remaining path estimation
  – Acoustic score only
  – Computed with a backward Viterbi pass, during the phoneme lattice generation
Heuristic admissibility
  – Underestimates the remaining cost: no LM information
  – Cannot be the true cost (lack of LM information)

Lexicon
Prefix-tree organization
  – Widely applied
  – Compact representation
    • search effort occurs at word beginnings
Lexicon:
  W1: p1 p2 p3
  W2: p1 p3
  W3: p2 p1
[Figure: prefix tree of this lexicon; W1 and W2 share the initial arc p1]

Search space
Phoneme lattice
Concatenation of lexical trees
[Figure: lexicon trees chained from the sentence beginning, with word-sequence labels such as W1W1, W2W1W1, W2W1W2 and W3W2W1]
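The best-first procedure from the Standard A* algorithm slides (score F(x) = g(x) + h(x), priority queue, stop when the best path reaches 'end') can be sketched as follows. This is only a minimal illustration, not SPEERAL's implementation: the toy graph `EDGES`, the heuristic table `H` and the function name `a_star` are hypothetical, and scores are assumed to be costs (for instance negative log-probabilities), so the smallest F is expanded first.

```python
import heapq

def a_star(start, goal, successors, h):
    """Return (cost, path) for the cheapest start-to-goal path, or None."""
    # Queue entries are (F = g + h, g, node, path so far).
    queue = [(h(start), 0.0, start, [start])]
    best_g = {}  # cheapest g seen so far for each expanded node
    while queue:
        f, g, node, path = heapq.heappop(queue)
        if node == goal:
            return g, path          # the best path reached 'end': stop
        if node in best_g and best_g[node] <= g:
            continue                # a cheaper path to this node was already expanded
        best_g[node] = g
        for nxt, step_cost in successors(node):
            g2 = g + step_cost
            heapq.heappush(queue, (g2 + h(nxt), g2, nxt, path + [nxt]))
    return None

# Hypothetical toy search space: edge costs stand in for combined LM + acoustic costs.
EDGES = {
    "start": [("a", 1.0), ("b", 2.0)],
    "a": [("b", 0.5), ("end", 5.0)],
    "b": [("end", 1.0)],
}
# Hypothetical admissible heuristic: never larger than the true remaining cost.
H = {"start": 2.0, "a": 1.0, "b": 1.0, "end": 0.0}

cost, path = a_star("start", "end", lambda n: EDGES.get(n, []), lambda n: H[n])
print(cost, path)
```

Setting every entry of `H` to 0 keeps the result correct but expands candidates in order of g alone, which is the h(x) = 0 case listed among the sample heuristics.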
LM look-ahead
Word anticipation
  – n is a lexicon node
  – $w_n$ is any leaf (i.e. word) of the sub-tree starting at n
    • $P(n \mid \ldots w_{i-2} w_{i-1}) = \mathrm{Part\_LM}(n, w_{i-2} w_{i-1})$
    • $\mathrm{Part\_LM}(n, w_{i-2} w_{i-1}) = \max_{w_n} P(w_n \mid w_{i-2} w_{i-1})$
Paths leading to improbable words are penalized early.
[Figure: lexicon prefix tree over p1, p2, p3 with leaves W1, W2, W3]

Start-synchronous tree
Asynchronous search
  – The search processes the same part of the lexicon several times, each time with a different history.
With start-synchronous capabilities
  – The most advanced path can be reused when it is encountered a second time.
    • For each frame x, the lexicon starting at x is stored.
    • Only the deepest nodes (or leaves) are stored.

Principle (1/5) to (3/5)
[Figures: lexicon trees anchored at Frame 0 and Frame t, marking the deepest lexicon nodes at frame 0 and the deepest lexicon nodes at frame t]
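The LM look-ahead rule from the LM look-ahead slide, Part_LM(n, history) = max over the words w_n below node n of P(w_n | history), can be illustrated on the three-word lexicon from the Lexicon slide (W1: p1 p2 p3, W2: p1 p3, W3: p2 p1). The sketch below is an assumption-laden toy: the class `Node`, the functions `build_tree` and `part_lm`, and the probabilities in `lm_probs` (standing for P(w | w_{i-2} w_{i-1}) under one fixed history) are all hypothetical.

```python
class Node:
    """One node of the lexicon prefix tree."""
    def __init__(self):
        self.children = {}   # phoneme -> Node
        self.word = None     # word label if a pronunciation ends at this node

def build_tree(lexicon):
    """Build a prefix tree so that words sharing initial phonemes share nodes."""
    root = Node()
    for word, phones in lexicon.items():
        node = root
        for p in phones:
            node = node.children.setdefault(p, Node())
        node.word = word
    return root

def part_lm(node, lm_probs):
    """Largest LM probability of any word reachable from this node."""
    best = lm_probs.get(node.word, 0.0) if node.word else 0.0
    for child in node.children.values():
        best = max(best, part_lm(child, lm_probs))
    return best

# Lexicon taken from the slides; probabilities are made up for illustration.
lexicon = {"W1": ["p1", "p2", "p3"], "W2": ["p1", "p3"], "W3": ["p2", "p1"]}
lm_probs = {"W1": 0.1, "W2": 0.6, "W3": 0.3}

root = build_tree(lexicon)
after_p1 = part_lm(root.children["p1"], lm_probs)                    # W1 or W2 still possible
after_p1_p2 = part_lm(root.children["p1"].children["p2"], lm_probs)  # only W1 remains
after_p2 = part_lm(root.children["p2"], lm_probs)                    # only W3 remains
print(after_p1, after_p1_p2, after_p2)
```

With these made-up values the look-ahead score drops as soon as the path commits to the p1 p2 branch, which is how paths leading to improbable words get penalized before the word end is reached.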
Principle (4/5) and (5/5)
[Figures: the same lexicon trees at Frame 0 and Frame t, then extended with a tree at Frame t+n]

Search space pruning
Optimization
  – If two candidates end with the same 3 words, only the best is kept.
Cut
  – Short candidates are dropped when they fall too far behind the deepest candidate.

ASR Output
  – 1-best hypothesis
  – N-best hypotheses
  – Word graph
Applications
  – Transcription
  – Question answering
  – Named entity extraction
  – Information retrieval
  – Call-type classification
  – …

French Broadcast News Campaign ESTER
Broadcast News (1h long show)
Acoustic Segmentation
Speaker Segmentation
Acoustic models
Language models
Speech transcription
Information Extraction

System Description
Acoustic Models:
    • 10k contextual HMMs
    • 3.6k states
    • 230k Gaussians
Lexicon: 65k words
Language model combination:
    • (Le Monde 87-02, 0.41)
    • (Le Monde 02-03, 0.24)
    • (ESTER, 0.35)

Results and Demonstration
WER ≈ 25% (10 RT)
Demonstration on TV
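The language model combination listed in the System Description gives one weight per corpus (0.41, 0.24, 0.35). The slides do not say how the models are combined; the sketch below assumes plain linear interpolation of the component probabilities, which is a common choice but only an assumption here. The function name `interpolate` and the component probabilities in `example_probs` are hypothetical.

```python
# Weights as listed on the slide; they are assumed to be interpolation coefficients.
WEIGHTS = {"Le Monde 87-02": 0.41, "Le Monde 02-03": 0.24, "ESTER": 0.35}

def interpolate(component_probs, weights):
    """Weighted sum of the probabilities each component LM gives to the same word."""
    return sum(weights[name] * component_probs[name] for name in weights)

# Made-up probabilities of one word given one history under each component model.
example_probs = {"Le Monde 87-02": 0.010, "Le Monde 02-03": 0.020, "ESTER": 0.030}

combined = interpolate(example_probs, WEIGHTS)
print(round(combined, 4))
```

Because the three weights sum to 1, the combined value stays a valid probability whenever each component is one.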