Multilingual Applications
Multilingual Web Applications
Fred Popowich

Internationalization
- W3C: make WWW technology work with the many writing systems, languages, and cultural conventions of the global community (http://www.w3.org/International/)
- MultiLingual Computing & Technology (MLC&T)
- Issues include: markup, character sets, fonts, …
- Tags

Localization
- Making applications/sites “regional”
- Issues include: language, culture, local information
- Maintenance is a key issue
- Localization Industry Standards Association: www.lisa.com

Locale
- “collection of particular cultural conventions such as language, date formats, number and currency formats, postal address formats and others” [MLC&T, 12(4), p63]
- Usually defined in terms of a language-country combination
- What about conventions used in multiple countries?
- What about variations within a country?

Automatic Translation
- What role can automatic language translation play?
- The spectrum of “automation”: FAMT (fully automatic machine translation), HAMT (human-aided machine translation), MAHT (machine-aided human translation)
- Can be applied at various points
  - During production
  - During indexing
  - During access
- Needs language-specific resources
  - Lexicons / dictionaries
  - Multilingual corpora

Cross Language Information Retrieval
- Definition (IRTools): http://www.searchtools.com/info/ir-crosslanguage.html
- Option One: indexing in the source language of the document, translation of queries
- Option Two: translating the document into the language of the query

How good is it?
- http://babelfish.altavista.com/tr
- Example: I imagined that a beautiful unicorn was prancing through the woods. Alas, it was only a horse in disguise.
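The Locale slide defines a locale as a bundle of conventions, usually keyed by a language-country combination. A minimal sketch of that idea follows. The table `LOCALES`, its four tags, and the helper names are assumptions made for illustration; the conventions shown are simplified and are not a complete or authoritative locale database.

```python
from datetime import date

# Hypothetical convention table keyed by language-country tags.
# Values are simplified illustrations, not authoritative locale data.
LOCALES = {
    "en-US": {"date": "{m:02d}/{d:02d}/{y}", "decimal": ".", "group": ",", "currency": "${amount}"},
    "en-GB": {"date": "{d:02d}/{m:02d}/{y}", "decimal": ".", "group": ",", "currency": "£{amount}"},
    "fr-FR": {"date": "{d:02d}/{m:02d}/{y}", "decimal": ",", "group": " ", "currency": "{amount} €"},
    "fr-CA": {"date": "{y}-{m:02d}-{d:02d}", "decimal": ",", "group": " ", "currency": "{amount} $"},
}


def format_number(value, tag):
    """Format a number with the decimal and grouping separators of a locale."""
    conv = LOCALES[tag]
    text = f"{value:,.2f}"  # US-style baseline, e.g. "1,234.50"
    # Swap separators through a placeholder so the two replacements do not collide.
    text = text.replace(",", "\0").replace(".", conv["decimal"]).replace("\0", conv["group"])
    return text


def format_date(day, tag):
    """Format a date using the field order of a locale."""
    return LOCALES[tag]["date"].format(y=day.year, m=day.month, d=day.day)


def format_price(value, tag):
    """Format an amount with the currency symbol placement of a locale."""
    return LOCALES[tag]["currency"].format(amount=format_number(value, tag))


if __name__ == "__main__":
    for tag in LOCALES:
        print(tag, format_date(date(2001, 3, 4), tag), format_price(1234.5, tag))
```

The same language (fr) needs two entries because the country changes the date order, which is the point of the slide's two open questions: a language-country key cannot express conventions shared across countries or varying within one.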
Babelfish
- Translation (French): J'ai imaginé qu'un bel unicorn prancing par les bois. Alas, c'était seulement un cheval dans le déguisement.
- Back into English: I imagined that beautiful a unicorn prancing by wood. Alas, it was only one horse in the disguise.

Translation
- How does it work: phases
  - Tagging (POS)
  - Parsing
  - Transfer
  - Generation
- Resources
  - Lexicons / dictionaries
  - Grammars
  - Example-bases
    - Generalization from examples

Document Translation
- Translating documents (option 2) results in inaccuracies

Query Translation
- Example query: bank charges
  - bank
    - banque (financial institution)
    - bord, rive (of river)
    - talus, remblai (of earth)
    - virer sur l’aile (aviation)
  - charges
    - accusation, plainte (accusation)
    - charge, charger (a gun, battery)
    - inculper qn. (charge someone)
    - prix (cost)
    - foncer (charge about)
- Context is important
- The more complex the query, the better?
- Resources
  - EuroWordNet
  - ELRA

Machine Aided Human Translation [MLC&T]
- Translation databases

Developer-side Metrics
- Lower translation costs (over time)
- More matches from database
- Reduced turnaround times
- Fewer words to translate
- Improved consistency
- Increased capacity

More Developer-side Metrics
- Quality measurements
  - Acceptable vs. high quality databases
  - Quality of source text
- Tool burden
  - Cost of buying, installing and supporting translation tool
  - Alignment of existing translated material
  - Process integration
  - Training
  - Licensing and support

Localization

Preprocess
- Parsing
- Converting
- Stripping

Execute
- Translate
- Edit
- Proof
- Review
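The "bank charges" example on the Query Translation slide shows why word-by-word query translation is ambiguous. The sketch below enumerates every combination of the candidate translations listed on the slide, then narrows them with context. The lexicon entries come from the slide; the function names and the `SEEN_TOGETHER` co-occurrence set are hypothetical, standing in for evidence that a multilingual corpus might supply.

```python
from itertools import product

# Candidate French translations per English query term, taken from the slide.
LEXICON = {
    "bank": ["banque", "bord", "rive", "talus", "remblai", "virer sur l'aile"],
    "charges": ["accusation", "plainte", "charge", "charger", "inculper qn.", "prix", "foncer"],
}

# Hypothetical context evidence: target-language pairs observed together,
# for example in a French corpus. Invented here for illustration only.
SEEN_TOGETHER = {("banque", "prix")}


def candidate_translations(query, lexicon):
    """Return every word-by-word translation of the query as a tuple of terms."""
    options = [lexicon.get(word, [word]) for word in query.lower().split()]
    return list(product(*options))


def filter_by_context(candidates, seen_together):
    """Keep only the candidates supported by co-occurrence evidence."""
    return [candidate for candidate in candidates if candidate in seen_together]


if __name__ == "__main__":
    candidates = candidate_translations("bank charges", LEXICON)
    print(len(candidates), "candidate translations")
    print(filter_by_context(candidates, SEEN_TOGETHER))
```

With six candidates for "bank" and seven for "charges", a two-word query already yields 42 combinations, which is one way to read the slide's remark that context is important.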
Setup
- File management
- Logging
- Storage
- Procurement
  - On-line
  - Off-line

Postprocess
- Layout
- Merging
- Opposite of Pre-process

Deliver
- Uploading
- Storage
- Opposite of Setup

Localization Process
- Setup
- Pre-process
- Execute
- Post-process
- Deliver

Effect of a Translation Database on the Phases Above
Entries are cost (benefit).

Phase                  Base-Line Localization   With Translation Databases
Setup                  +1 (+1)                  +2 (+1)
Pre-process            +1 (+2)                  +4 (+5)
Execute                +1 (+2)                  +5 (+4)
Post-process           +1 (+2)                  +6 (+1)
Deliver                +1 (+1)                  +2 (+1)
Score (benefit/cost)   1.6                      0.6

So Where's the ROI?
- Acknowledged as an “unscientific” and “subjective” assessment
- Dependent on the type of source data
- “Traditional translation database technology … is efficient in very specific context, but the problems of terminology updates, database updates after proof and so on are expensive” [Y. Savourel]
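The Score row of the cost (benefit) table can be reproduced from the per-phase figures. The sketch below assumes the score is total benefit divided by total cost across the five phases; the slides do not state the formula, but this reading yields the 1.6 and 0.6 shown. The dictionary and function names are illustrative.

```python
# (cost, benefit) per phase, as given in the table.
BASELINE = {
    "Setup": (1, 1),
    "Pre-process": (1, 2),
    "Execute": (1, 2),
    "Post-process": (1, 2),
    "Deliver": (1, 1),
}
WITH_TRANSLATION_DB = {
    "Setup": (2, 1),
    "Pre-process": (4, 5),
    "Execute": (5, 4),
    "Post-process": (6, 1),
    "Deliver": (2, 1),
}


def score(phases):
    """Assumed formula: summed benefit over summed cost for all phases."""
    total_cost = sum(cost for cost, _ in phases.values())
    total_benefit = sum(benefit for _, benefit in phases.values())
    return total_benefit / total_cost


if __name__ == "__main__":
    print(round(score(BASELINE), 1))             # 8 / 5
    print(round(score(WITH_TRANSLATION_DB), 1))  # 12 / 19
```

Under this reading the translation database raises total benefit from 8 to 12 but raises total cost from 5 to 19, with most of the added cost in the Execute and Post-process phases.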