Multilingual Applications

Fred Popowich

Web Internationalization
W3C: make WWW technology work with the many writing systems, languages, and cultural conventions of the global community
http://www.w3.org/International/
MultiLingual Computing & Technology
Issues include: markup, character sets, fonts, …
Tags

Localization
Making applications/sites “regional”
Issues include: language, culture, local information
Maintenance is a key issue
Localization Industry Standards Association
www.lisa.com

Locale
“collection of particular cultural conventions such as language, date formats, number and currency formats, postal address formats and others” [MLC&T, 12(4), p63]
Usually defined in terms of a language-country combination
What about conventions used in multiple countries?
What about variations within a country?
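
To make the idea of a locale as a language-country combination concrete, here is a minimal sketch in Python. The convention table, the fallback rule and all function names are assumptions made for illustration; they are not taken from the slides or from any standard locale database. The fallback to a language default is one possible answer to the question of conventions shared across countries.

```python
from datetime import date

# Hypothetical convention table keyed by (language, country).
# The entries are illustrative, not an authoritative locale database.
CONVENTIONS = {
    ("en", "US"): {"date": "{m:02d}/{d:02d}/{y}", "decimal": ".", "thousands": ","},
    ("en", "GB"): {"date": "{d:02d}/{m:02d}/{y}", "decimal": ".", "thousands": ","},
    ("fr", "FR"): {"date": "{d:02d}/{m:02d}/{y}", "decimal": ",", "thousands": " "},
    ("fr", "CA"): {"date": "{y}-{m:02d}-{d:02d}", "decimal": ",", "thousands": " "},
}

# Assumed fallback: a country we have no entry for borrows the
# conventions of a default country for the same language.
LANGUAGE_DEFAULTS = {"en": ("en", "US"), "fr": ("fr", "FR")}


def parse_locale(tag):
    """Split a tag such as 'fr_CA' or 'fr-CA' into (language, country)."""
    language, _, country = tag.replace("-", "_").partition("_")
    return language.lower(), country.upper()


def conventions_for(tag):
    """Look up conventions, falling back to the language default.

    Raises KeyError for a language that is not in the table at all.
    """
    key = parse_locale(tag)
    if key not in CONVENTIONS:
        key = LANGUAGE_DEFAULTS[key[0]]
    return CONVENTIONS[key]


def format_date(when, tag):
    """Render a date using the locale's date pattern."""
    pattern = conventions_for(tag)["date"]
    return pattern.format(y=when.year, m=when.month, d=when.day)


def format_number(value, tag):
    """Render a number with the locale's decimal and thousands separators."""
    conv = conventions_for(tag)
    whole, frac = f"{value:,.2f}".split(".")
    return whole.replace(",", conv["thousands"]) + conv["decimal"] + frac
```

With this table, `format_date(date(2001, 3, 4), "en_US")` gives `03/04/2001` while `"en_GB"` gives `04/03/2001`, and an unlisted tag such as `"fr_BE"` falls back to the `fr_FR` conventions. Variations within a country are not covered by a two-part key, which is the limitation the last question on the slide points at.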

Automatic Translation
What role can automatic language translation play?
The spectrum of “automation”
FAMT (fully automatic machine translation), HAMT (human-aided machine translation), MAHT (machine-aided human translation)
Can be applied at various points
During production
During indexing
During access
Needs language-specific resources
Lexicons / dictionaries
Multilingual corpora

How good is it?
http://babelfish.altavista.com/tr
Example
I imagined that a beautiful unicorn was prancing through the woods. Alas, it was only a horse in disguise.

Cross Language Information Retrieval
Definition: http://www.searchtools.com/info/ir-crosslanguage.html (IRTools)
Option One: index documents in their source language and translate the queries (query translation)
Option Two: translate documents into the language of the query (document translation)
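
Option One (index in the source language, translate the query) can be sketched as follows. This is a toy illustration only: the documents, the English-to-French lexicon and the function names are all hypothetical, and scoring is a plain count of matching terms rather than any particular retrieval model.

```python
from collections import defaultdict

# Toy French documents, indexed in their own language (Option One).
DOCUMENTS = {
    1: "la banque facture des frais",
    2: "la rive du fleuve",
    3: "le prix du billet",
}

# Hypothetical English-to-French lexicon; ambiguous words keep all senses.
LEXICON = {
    "bank": ["banque", "rive"],
    "price": ["prix"],
    "river": ["fleuve"],
}


def build_index(documents):
    """Inverted index: term -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.split():
            index[term].add(doc_id)
    return index


def translate_query(query, lexicon):
    """Replace each query term by every candidate translation.

    Terms missing from the lexicon are passed through unchanged.
    """
    translated = []
    for term in query.lower().split():
        translated.extend(lexicon.get(term, [term]))
    return translated


def search(query, index, lexicon):
    """Rank documents by how many translated query terms they contain."""
    scores = defaultdict(int)
    for term in translate_query(query, lexicon):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

For instance, after `index = build_index(DOCUMENTS)`, the query `search("river bank", index, LEXICON)` ranks document 2 ahead of document 1, because both "rive" and "fleuve" match it. The single-word query "bank" cannot separate the two senses, so both documents come back with equal scores.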

Query Translation
Example query: bank charges

Document Translation
Translating documents (option 2) results in inaccuracies
Resources
Lexicons / Dictionaries
Grammars
Example-bases
Generalization from examples
How does it work – phases
Tagging (POS)
Parsing
Transfer
Generation
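
The four phases can be sketched as a small pipeline. This is a deliberately tiny illustration under stated assumptions: the four-word lexicon, the noun-phrase rule and every function name are hypothetical, and real systems use far richer taggers, grammars and transfer rules.

```python
# Hypothetical toy lexicon: English word -> (part of speech, French word).
LEXICON = {
    "the": ("DET", "le"),
    "white": ("ADJ", "blanc"),
    "horse": ("NOUN", "cheval"),
    "sleeps": ("VERB", "dort"),
}


def tag(sentence):
    """Tagging: attach a part-of-speech label to each word.

    Words outside the toy lexicon raise KeyError.
    """
    return [(word, LEXICON[word][0]) for word in sentence.lower().split()]


def parse(tagged):
    """Parsing: group DET ADJ* NOUN runs into noun phrases (NP)."""
    tree, i = [], 0
    while i < len(tagged):
        if tagged[i][1] == "DET":
            j = i + 1
            while j < len(tagged) and tagged[j][1] == "ADJ":
                j += 1
            if j < len(tagged) and tagged[j][1] == "NOUN":
                tree.append(("NP", tagged[i:j + 1]))
                i = j + 1
                continue
        # Anything else stands alone, labelled with its own tag.
        tree.append((tagged[i][1], [tagged[i]]))
        i += 1
    return tree


def transfer(tree):
    """Transfer: swap in target words and move adjectives after the noun."""
    out = []
    for label, words in tree:
        target = [(LEXICON[word][1], pos) for word, pos in words]
        if label == "NP":
            det = [t for t in target if t[1] == "DET"]
            adj = [t for t in target if t[1] == "ADJ"]
            noun = [t for t in target if t[1] == "NOUN"]
            target = det + noun + adj
        out.append((label, target))
    return out


def generate(tree):
    """Generation: flatten the target-language tree into a sentence."""
    return " ".join(word for _, words in tree for word, _ in words)


def translate(sentence):
    """Run the four phases in order."""
    return generate(transfer(parse(tag(sentence))))
```

Calling `translate("the white horse sleeps")` yields `le cheval blanc dort`: the transfer step is where the English adjective-noun order becomes the French noun-adjective order. Agreement, ambiguity and unknown words are all ignored here.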

Babelfish
Translation (English to French):
J'ai imaginé qu'un bel unicorn prancing par les bois. Alas, c'était seulement un cheval dans le déguisement.
Translation (French back to English):
I imagined that beautiful a unicorn prancing by wood. Alas, it was only one horse in the disguise.

French dictionary entries for “bank” and “charge” [MLC&T]:
bank
- banque (financial institution)
- bord, rive (of river)
- talus, remblai (of earth)
- virer sur l’aile (aviation)
charge
- prix (cost)
- accusation, plainte (accusation)
- charge, charger (a gun, battery)
- inculper qn. (charge someone)
- foncer (charge about)

Developer-side Metrics
Translation databases
Lower translation costs (over time)
More matches from database
Reduced turnaround times
Fewer words to translate
Improved consistency
Increased capacity
Quality measurements
Acceptable vs. high quality databases
Quality of source text
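
The link between “more matches from database” and “fewer words to translate” can be illustrated with a small fuzzy lookup. This is a sketch under assumptions: the stored segments, the 0.75 threshold and the function names are hypothetical, and similarity is measured with `difflib.SequenceMatcher` from the Python standard library rather than with whatever a commercial translation database uses.

```python
from difflib import SequenceMatcher

# Hypothetical translation database: previously translated segments.
MEMORY = {
    "Click the Save button.": "Cliquez sur le bouton Enregistrer.",
    "Click the Cancel button.": "Cliquez sur le bouton Annuler.",
    "The file could not be opened.": "Impossible d'ouvrir le fichier.",
}


def best_match(segment, memory, threshold=0.75):
    """Return (source, translation, score) for the closest stored segment.

    Returns None when no stored segment reaches the threshold.
    """
    best_source, best_score = None, 0.0
    for source in memory:
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score > best_score:
            best_source, best_score = source, score
    if best_score >= threshold:
        return best_source, memory[best_source], best_score
    return None


def words_left_to_translate(segments, memory, threshold=0.75):
    """Count the words in segments that have no usable match."""
    return sum(
        len(segment.split())
        for segment in segments
        if best_match(segment, memory, threshold) is None
    )
```

An exact repeat such as "Click the Save button." comes back with a score of 1.0, a near repeat such as "Click the Delete button." still clears the threshold and can be offered to the translator for editing, and an unrelated sentence returns `None` and counts fully toward the remaining word total. The threshold is the knob behind the “acceptable vs. high quality” distinction: lowering it yields more matches that need more editing.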

More Developer-side Metrics
Tool burden
Cost of buying, installing and supporting translation tool
Alignment of existing translated material
Process integration
Training
Licensing and support

Query Translation
Context is important
The more complex the query, the better?
Resources
EuroWordNet
ELRA
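
One way to see why context helps query translation is to let the other query words vote on each ambiguous term. The sketch below is an assumption-laden illustration, not a method from the slides: the candidate lists, the four-sentence French corpus and the co-occurrence scoring are all hypothetical.

```python
from itertools import product

# Hypothetical candidate translations for each English query term.
CANDIDATES = {
    "bank": ["banque", "rive", "talus"],
    "charges": ["prix", "accusation", "charge"],
}

# Hypothetical target-language corpus, one set of words per sentence.
CORPUS = [
    {"la", "banque", "augmente", "le", "prix"},
    {"le", "prix", "de", "la", "banque"},
    {"une", "accusation", "grave"},
    {"la", "rive", "du", "fleuve"},
]


def cooccurrence(words, corpus):
    """Number of corpus sentences that contain every word in `words`."""
    return sum(1 for sentence in corpus if all(w in sentence for w in words))


def disambiguate(query, candidates, corpus):
    """Pick the combination of translations that co-occurs most often.

    Returns the chosen translations and the number of combinations tried.
    Ties are broken in favour of the first combination generated.
    """
    options = [candidates.get(term, [term]) for term in query.lower().split()]
    combos = list(product(*options))
    best = max(combos, key=lambda combo: cooccurrence(combo, corpus))
    return list(best), len(combos)
```

For the query "bank charges" there are 3 x 3 = 9 combinations, and only "banque" with "prix" appears together in the toy corpus, so that pair is chosen. Each added query word multiplies the number of combinations but also narrows which ones are plausible, which is the trade-off behind the question of whether a more complex query is better.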

Localization Process
Phases: Setup, Pre-process, Execute, Post-process, Deliver

Setup
File management
Logging
Storage Procurement
On-line
Off-line

Preprocess
Parsing
Converting
Stripping

Execute
Translate
Edit
Proof
Review

Postprocess
Layout
Merging
Opposite of Pre-process

Deliver
Uploading
Storage
Opposite of Setup

Translation Database Effect on the Phases
Entries are cost (benefit)

Phase            Base-Line Localization    With Translation Databases
Setup            +1 (+1)                   +2 (+1)
Pre-process      +1 (+2)                   +4 (+5)
Execute          +1 (+2)                   +5 (+4)
Post-process     +1 (+2)                   +6 (+1)
Deliver          +1 (+1)                   +2 (+1)
Score            1.6                       0.6
(benefit/cost)

So where's the ROI?
Acknowledged as an “unscientific” and “subjective” assessment
Dependent on the type of source data
“Traditional translation database technology … is efficient in very specific context, but the problems of terminology updates, database updates after proof and so on are expensive” [Y. Savourel]
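
The Score row can be reproduced from the per-phase cost (benefit) entries. The sketch below assumes that the score is total benefit divided by total cost across the five phases; that reading is an assumption, chosen because it yields the 1.6 and 0.6 reported in the table. The variable and function names are illustrative.

```python
# (cost, benefit) per phase, read from the cost (benefit) table entries.
BASELINE = {
    "Setup": (1, 1),
    "Pre-process": (1, 2),
    "Execute": (1, 2),
    "Post-process": (1, 2),
    "Deliver": (1, 1),
}
WITH_TRANSLATION_DATABASES = {
    "Setup": (2, 1),
    "Pre-process": (4, 5),
    "Execute": (5, 4),
    "Post-process": (6, 1),
    "Deliver": (2, 1),
}


def score(phases):
    """Assumed scoring rule: total benefit divided by total cost."""
    total_cost = sum(cost for cost, _ in phases.values())
    total_benefit = sum(benefit for _, benefit in phases.values())
    return total_benefit / total_cost


if __name__ == "__main__":
    for name, phases in [
        ("Base-line localization", BASELINE),
        ("With translation databases", WITH_TRANSLATION_DATABASES),
    ]:
        print(f"{name}: {score(phases):.1f}")
```

The base-line column sums to a benefit of 8 against a cost of 5, and the translation database column to a benefit of 12 against a cost of 19. The drop in score comes almost entirely from the higher per-phase costs, with Post-process the largest contributor.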
