Mastering noise and silence in learner answers processing
Simple techniques for analysis and diagnosis
Olivier Kraif, Claude Ponton, Alexia Blanchard
LIDILEM Laboratory, Stendhal University, Grenoble, France
{olivier.kraif; claude.ponton; alexia.blanchard}@u-grenoble3.fr
■ Learner answers analysis
Error description
A real need for high-quality feedback, but:
Some facts:
♦ In most systems, analysis consists only of testing character-string identity
♦ NLP techniques are underused in the field of CALL due to:
∗ their lack of reliability (noise, erroneous analyses)
∗ the high cost of implementation
♦ Lack of systematic follow-up on experiments
♦ Overambitious and hardly attainable goals
Some hopes:
♦ Error detection alone may be a valuable step towards didactic use
♦ Some straightforward, basic NLP techniques are reliable enough
♦ To cope with the lack of reliability, it is possible to favour "computer-aided" approaches over "automated" processes (correction, evaluation, feedback generation, activity generation, etc.)
[Figure: analysis levels (Annotation, Detection/description, Diagnosis), with generic NLP processes, specific NLP processes (triangulation), enriched learner production, enriched expected answer and diagnosed production.]
Towards a low-cost strategy
An empirical approach based on the following principles:
• Identifying the applications that let users keep some leeway in interpreting results (partial analyses, unresolved ambiguities, etc.) ⇒ machine-aided correction, comprehension aids, activity generators, content-oriented tools
• Implementing first the most basic and reliable NLP techniques, such as tokenization, POS tagging, lemmatization and morphological analysis.
• Mastering the shortcomings of Natural Language Processing from the end-user (i.e. didactic) point of view. For instance, in the context of an activity, knowledge of the expected answer (EA) may provide additional data for analysing the given answer (GA).
• When ambiguities remain, multiple analyses may be integrated into the learning process to help users (teachers or learners) make the right decisions.
• Developing a modular, declarative approach designed for the reusability of resources and processes, which allows end users to define the relevant knowledge and parameters themselves.
■ The ExoGen system
General principle
[Figure: ExoGen general principle, relating the learner production (GA) and the expected answer (EA), lemmatization, POS tagging and morphological analysis, contextual knowledge, the activity, didactic knowledge, feedback generation and the learner.]
Simplification of triangulation: the analysis is reduced to a comparison between EA and GA (no contextual analysis).
Resource: online dictionary of inflected forms (http://abu.cnam.fr/)
Examples of entries (form, lemma, tags):
glace         glacer   Ver:IPre+SG+P1:IPre+SG+P3:SPre+SG+P1:SPre+SG+P3:ImPre+SG+P2
glacé         glacer   Ver:PPas+Mas+SG
glacent       glacer   Ver:IPre+PL+P3:SPre+PL+P3
glacera       glacer   Ver:IFut+SG+P3
glaceraient   glacer   Ver:CPre+PL+P3
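Before any comparison, each dictionary entry has to be split into its possible analyses: the tag string starts with the category, and every following ":"-separated field is one reading. The Python sketch below assumes the entries come as whitespace-separated lines in the form/lemma/tags order displayed above (the actual layout of the ABU files may differ); `SAMPLE`, `parse_entry` and `load_lexicon` are hypothetical names, not part of ExoGen.

```python
from collections import defaultdict

# Hypothetical excerpt, one entry per line: inflected form, lemma, tag string.
SAMPLE = """\
glace glacer Ver:IPre+SG+P1:IPre+SG+P3:SPre+SG+P1:SPre+SG+P3:ImPre+SG+P2
glacé glacer Ver:PPas+Mas+SG
glacent glacer Ver:IPre+PL+P3:SPre+PL+P3
glacera glacer Ver:IFut+SG+P3
glaceraient glacer Ver:CPre+PL+P3
"""


def parse_entry(line):
    """Split one entry into (form, lemma, category, readings).

    The tag string starts with the category; each following ':'-separated
    field is one possible reading, e.g. 'IPre+SG+P1' -> {'IPre', 'SG', 'P1'}.
    A tag string with no reading (e.g. a bare category) yields one empty reading.
    """
    form, lemma, tags = line.split()
    category, *readings = tags.split(":")
    features = [frozenset(r.split("+")) for r in readings] or [frozenset()]
    return form, lemma, category, features


def load_lexicon(text):
    """Map each inflected form to all its (lemma, category, features) analyses."""
    lexicon = defaultdict(list)
    for line in text.splitlines():
        if line.strip():
            form, lemma, category, readings = parse_entry(line)
            for features in readings:
                lexicon[form].append((lemma, category, features))
    return dict(lexicon)


lexicon = load_lexicon(SAMPLE)
```

Here `lexicon["glace"]` holds five analyses of glacer (present indicative P1/P3, present subjunctive P1/P3, imperative P2): this is exactly the kind of ambiguity that the comparison with the expected answer has to cope with.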
Analysis principle: lesser difference heuristic; the analysis is guided by the similarities between the potential tags of EA and GA.
EA: si j'avais su
GA: si j'aurais su
Common tags: Ver+SG
EA (avais): Category: Ver; Tags: IImp+SG+P1 or IImp+SG+P2
GA (aurais): Category: Ver; Tags: CPre+SG+P1 or CPre+SG+P2
Disambiguated difference: IImp ≠ CPre
Not disambiguated: P1 or P2
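The lesser difference heuristic can be sketched as follows: compare every EA reading with every GA reading, keep only the closest pairs, and read off what they share, what always differs (disambiguated) and what varies across pairs (residual ambiguity). This is an illustrative Python sketch, not the ExoGen code: readings are assumed to be sets of tags with the category included, closeness is assumed to be the size of the symmetric difference, and `lesser_difference` is a hypothetical name.

```python
from itertools import product


def lesser_difference(ea_readings, ga_readings):
    """Compare every EA reading with every GA reading and keep the closest pairs.

    Readings are sets of tags (category included), e.g. {"Ver", "IImp", "SG", "P1"}.
    Returns a dict with:
      common     - tags shared by EA and GA in all closest pairs,
      difference - set of (EA-only tags, GA-only tags) found in the closest pairs;
                   a single element means the difference is disambiguated,
      ambiguous  - tags shared in some closest pairs but not all (residual ambiguity).
    """
    pairs = list(product(ea_readings, ga_readings))
    # Size of the symmetric difference = how far apart two readings are.
    best = min(len(ea ^ ga) for ea, ga in pairs)
    closest = [(ea, ga) for ea, ga in pairs if len(ea ^ ga) == best]

    shared = [ea & ga for ea, ga in closest]
    common = frozenset.intersection(*shared)
    ambiguous = frozenset.union(*shared) - common
    difference = {(ea - ga, ga - ea) for ea, ga in closest}
    return {"common": common, "difference": difference, "ambiguous": ambiguous}


# Readings of "avais" (EA: si j'avais su) and "aurais" (GA: si j'aurais su),
# written out by hand from the example above.
EA = [frozenset({"Ver", "IImp", "SG", "P1"}), frozenset({"Ver", "IImp", "SG", "P2"})]
GA = [frozenset({"Ver", "CPre", "SG", "P1"}), frozenset({"Ver", "CPre", "SG", "P2"})]

result = lesser_difference(EA, GA)
```

On this example the closest pairs are P1/P1 and P2/P2, so the sketch yields the common tags Ver and SG, the single difference IImp versus CPre, and P1/P2 as residual ambiguity, matching the analysis above.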
Examples of error descriptions (automatically generated)
(9) avant de retourner [arriver] en Angleterre
    Grammatically correct form (infinitive verb), but another form was expected
et beaucoup d’échafaide [échafaudages]
    Misspelling or word unknown to the dictionary
Je dois me dépécher [dépêcher]
    Misspelling: accent problem
(9) sommes bien amusées et c’est vrai [juste] de dire que nous avons dansé assez bien
    Grammatically correct form (adjective, adverb or masculine singular noun), but another form was expected
C’était désespéré [désespérant] mais c’était la seule chance (9)
    If this is the verb désespérer:
    Case 1 [masculine singular]: a present participle is expected, not a past participle
Pour moi l’ [cette] image crée une ambiance délassante
    Grammatically correct form as regards the category (determiner), but another form with different features was expected
Le Premier ministre reste toujours un britannique [Britannique]
    Correct, but an initial capital letter is required
Legend: error found [correction]
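Descriptions like these come from a cascade of tests on GA and EA, as in the poster's decision tree: graphical normalisation, unknown form, diacritics, closeness of forms, then shared lemma, category and tags. The Python sketch below strings those tests together under stated assumptions: the order of the tests, the Levenshtein-based closeness threshold (`max_dist=2`), the normalisation rule (ignore case, spaces and punctuation), the returned labels and the lexicon format (form → list of (lemma, category, tags)) are all illustrative choices, and `describe` and `LEXICON` are hypothetical names.

```python
import unicodedata


def normalise(s):
    """Assumed graphical normalisation: ignore case, spaces and punctuation."""
    return "".join(c for c in s.casefold() if c.isalnum())


def strip_diacritics(s):
    """Drop combining accents, e.g. 'dépêcher' -> 'depecher'."""
    return "".join(c for c in unicodedata.normalize("NFD", s)
                   if not unicodedata.combining(c))


def edit_distance(a, b):
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def describe(ga, ea, lexicon, max_dist=2):
    """Coarse description of how the given answer (GA) differs from the expected one (EA).

    lexicon maps a form to a list of (lemma, category, tags) analyses, tags being
    a frozenset; max_dist is a hypothetical threshold for "close" forms.
    """
    if ga == ea:
        return "identical"
    if normalise(ga) == normalise(ea):
        return "case, spacing, ... differences"
    if ga not in lexicon:  # GA is an unknown form
        if strip_diacritics(ga) == strip_diacritics(ea):
            return "diacritic differences"
        if edit_distance(ga, ea) <= max_dist:
            return "orthographical difference"
        nearest = sorted(f for f in lexicon if edit_distance(ga, f) <= max_dist)
        if nearest:
            return "orthographical difference, nearest forms: " + ", ".join(nearest)
        return "unknown form"
    if ea not in lexicon:  # guard added for this sketch only
        return "expected answer not in lexicon"

    def gap(pair):
        # Prefer pairs sharing the lemma, then the category, then most tags.
        (l1, c1, t1), (l2, c2, t2) = pair
        return (l1 != l2, c1 != c2, len(t1 ^ t2))

    (g_lemma, g_cat, g_tags), (e_lemma, e_cat, e_tags) = min(
        ((g, e) for g in lexicon[ga] for e in lexicon[ea]), key=gap)
    diffs = [name for name, differs in (("lemma", g_lemma != e_lemma),
                                        ("tag", g_tags != e_tags),
                                        ("category", g_cat != e_cat)) if differs]
    if not diffs:
        return "no difference found in the analyses"
    if len(diffs) == 1:
        return diffs[0] + " differences"
    return ", ".join(diffs[:-1]) + " and " + diffs[-1] + " differences"


LEXICON = {  # hypothetical mini-lexicon
    "égaux": [("égal", "Adj", frozenset({"Mas", "PL"}))],
    "égales": [("égal", "Adj", frozenset({"Fem", "PL"}))],
    "dépêcher": [("dépêcher", "Ver", frozenset({"Inf"}))],
    "contempler": [("contempler", "Ver", frozenset({"Inf"}))],
}
```

With this mini-lexicon, `describe("égales", "égaux", LEXICON)` gives "tag differences" and `describe("dépécher", "dépêcher", LEXICON)` gives "diacritic differences". With a real dictionary, most forms carry several analyses, which is where choosing the closest pair (the lesser difference idea) matters.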
Evaluation of error descriptions, Frida corpora (Granger, 2001)
Cases                     Correct  Incorrect  Precision
All cases                 312      6          0.981
Non-ambiguous             187      1          0.995
Totally disambiguated     104      5          0.954
Partially disambiguated   14       0          1
Not disambiguated         7        0          1
All cases is the sum of the four other rows; precision = correct / (correct + incorrect), e.g. 312 / 318 ≈ 0.981.
[Figure: decision tree for error description. Tests: EA = GA after graphical normalisation; GA = unknown; GA = EA except diacritics; GA close to EA; a form close to GA exists in the lexicon; GA and EA share the same lemma; GA and EA share the same category; GA and EA share the same tags. Outcomes: case, spacing, ... differences; diacritic differences; orthographical difference; orthographical difference: listing of the nearest forms; unknown form; tag differences; tag and category differences. Examples (GA instead of EA): "échafaide" instead of "échafaudage", "égales" instead of "égaux", "considère" instead of "considérer", "dépécher" instead of "dépêcher", "comtempler" instead of "contempler", "CEE" instead of "C.E.E."]
■ Perspectives
[Figure, continued: when GA and EA do not share the same lemma, the test "GA and EA share the same category" leads to lemma differences (e.g. "prennent" instead of "saisissent"), lemma and tag differences (e.g. "souffrons" instead of "subirons") or lemma, tag and category differences (e.g. "mieux" instead of "préférables").]
Forthcoming: integration of a morphological analyzer (Blanchard, 2007)
Aim: morphological analysis of unknown forms (paradigm confusion)
General principle: segmentation of inflected forms into [base form + inflection(s)], which are then interpreted linguistically
1. Integration into generic NLP processes, in order to reduce the number of unknown forms and thus make an analysis possible
2. Modification of the tree analysis by checking the inflectional model
Example
GA: attitudent   Category: N   Tags: fem,plu   Model: inflection [-ent] (plu)
EA: attitudes    Category: N   Tags: fem,plu   Model: inflection [-s] (plu)
This analysis makes it possible to describe “attitudent” as an inflectional error on the plural.
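The [base form + inflection] segmentation can be illustrated on the attitudent/attitudes example: both forms end in a plural ending, but from different paradigms. A minimal Python sketch, assuming a hypothetical hand-written inventory of plural endings (`PLURAL_ENDINGS`) and a deliberately loose prefix test between bases; a real morphological analyzer would need complete inflectional models. `segment` and `diagnose_plural` are illustrative names only.

```python
# Hypothetical inventory of plural endings and the paradigm each belongs to.
PLURAL_ENDINGS = {"ent": "verbal", "s": "nominal", "x": "nominal"}


def segment(form):
    """All [base + ending] splits of form for the known endings, longest ending first."""
    endings = sorted(PLURAL_ENDINGS, key=len, reverse=True)
    return [(form[:-len(e)], e) for e in endings
            if form.endswith(e) and len(form) > len(e)]


def diagnose_plural(ga, ea):
    """Describe GA as EA's base with a wrong plural ending, or return None.

    The bases need not be identical: the GA base only has to be a prefix of
    the EA base (attitud- / attitude-), a deliberately loose assumption.
    """
    for ea_base, ea_ending in segment(ea):
        for ga_base, ga_ending in segment(ga):
            if ea_base.startswith(ga_base) and ga_ending != ea_ending:
                return {
                    "base": ea_base,
                    "expected_ending": ea_ending,
                    "given_ending": ga_ending,
                    "given_paradigm": PLURAL_ENDINGS[ga_ending],
                    "feature": "plu",
                }
    return None


result = diagnose_plural("attitudent", "attitudes")
```

The result states that attitudent carries the verbal plural ending -ent where the nominal -s was expected: an inflectional error on the plural, i.e. a paradigm confusion rather than an unknown word.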
Completion of the lesser difference analysis: integration of a wordnet or a thesaurus (semantic distance between lemmas)
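One simple way to obtain a semantic distance between lemmas is to count the links separating them in a thesaurus graph. A minimal Python sketch, assuming a tiny hand-made thesaurus (`THESAURUS`, hypothetical) instead of an actual wordnet, and breadth-first search as the distance measure; `semantic_distance` is an illustrative name.

```python
from collections import deque

# Tiny hypothetical thesaurus: each lemma points to related lemmas.
THESAURUS = {
    "prendre": {"saisir"},
    "saisir": {"prendre", "comprendre"},
    "comprendre": {"saisir"},
    "souffrir": {"subir"},
    "subir": {"souffrir"},
}


def semantic_distance(a, b, thesaurus):
    """Number of thesaurus links between lemmas a and b (breadth-first search).

    Returns None when the two lemmas are not connected.
    """
    if a == b:
        return 0
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        lemma, dist = queue.popleft()
        for neighbour in thesaurus.get(lemma, ()):
            if neighbour == b:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None
```

With such a measure, "prennent" instead of "saisissent" (lemmas prendre and saisir, distance 1) could be described as a near-synonym substitution rather than an unrelated lemma difference.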
Context analysis, to disambiguate more precisely (based on EA/GA/context triangulation)
Definition of declarative rules to design a diagnosis process based on the lesser difference analysis (detection/description level). These rules should remain applicable when ambiguity persists (e.g. suggestions, hypotheses, more general diagnoses, ...)
Experimentation (work in progress): analysis of past participle agreement errors in the perfect tense (“passé composé”). Evaluation with end users: teachers and learners of French as a Foreign Language
