Vorfeld

Starting a sentence in L2 German – Discourse annotation of a learner corpus
Heike Zinsmeister, University of Konstanz, Germany
Margit Breckle, Vilnius Pedagogical University, Lithuania
Introduction
• Local coherence
– Transition from one sentence to the next
– Entity-based coherence / discourse relation-based coherence
• Assumption
– German Vorfeld as an ideal position for linking a sentence to its preceding discourse and establishing local coherence
• Research question
– Do Chinese L2 learners of German use the Vorfeld in the same way as L1 speakers?
• Method
– Annotation of categories and relations related to local coherence
– Contrastive interlanguage analysis of (comparable) L2 texts and L1 texts
The ALeSKo Corpus
From data collection to empirical studies:
Hand-written text:
Transcription:
Frau Thimm will go on a journey in winter. During this journey she will relax and she will learn about the culture of China. Therefore it is not just an escape.
German
Verb-Second: Vorfeld (pre-field) | finite verb | middle field | verbal complex
Coherence-related Vorfeld
133 German Europarl turns
Preference hierarchy of pragmatic Vorfeld functions (cf. Speyer 2007)
1. Brand-new / frame-setting
2. Element of a partly ordered set (Poset)
3. Backward-looking center
Chinese
Brand-new
[Die Leute, die viele Reise machen,] haben immer mehr Geld als die, die selten reisen.
the people that many journeys do have always more money than those that seldom travel
‘The people who travel a lot always have more money than those who seldom travel.’
 The term “Leute” (‘people’) occurs earlier in the text but does not refer to ‘people who go on many journeys’.
Poset
MMAX2 (Müller & Strube 2006):
Frame-setting
Constituents that set a frame in which the sentence is interpreted (cf. Jacobs 2001), wdt08_10:
[In den Attraktionspunkten] werden (...) notwendige Einrichtungen konzentriert angeboten.
at the attraction_sites are necessary facilities focussed offered
‘Necessary facilities are especially offered at the attraction sites.’
 Locative frame
• Word order
– basic: SVO
– definite, non-topic object: SOdefV
ADVtimeSV
zuótian xuě xià de hěn jǐn
yesterday snow descend CSC very incessant
‘Yesterday it snowed incessantly.’
ADVlocativeSV
qiáng shang pá zhe hěn duō bìhǔ
wall on climb DUR very many salamander
‘The wall has a lot of salamanders crawling on it.’
Study
• Data: all 43 L2 texts (884 relevant Vorfelds); subcorpus of 24 L1 texts (764 Vorfelds)
• Assumption: Chinese topics roughly correspond to backward-looking centers and frame-setting elements
Constituents that are linked referentially to a salient element in the previous sentence (cf. Grosz
et al. 1995), wdt07_22:
 Antecedent is highest on a saliency hierarchy in comparison to other potential antecedents:
subject > object(s) > others
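The saliency ranking above can be sketched in a few lines of Python. This is only an illustration, not the annotation procedure used for the corpus: the `Mention` structure, the role labels and the entity IDs are hypothetical, and every mention of the previous sentence is treated as a potential antecedent, which simplifies the criterion.

```python
from dataclasses import dataclass
from typing import List, Optional

# Ranking from the text: subject > object(s) > others (lower value = more salient).
SALIENCY_RANK = {"subject": 0, "object": 1, "other": 2}


@dataclass(frozen=True)
class Mention:
    text: str
    role: str    # "subject", "object" or "other" (hypothetical labels)
    entity: str  # hypothetical coreference ID


def most_salient(previous_sentence: List[Mention]) -> Optional[Mention]:
    """Return the highest-ranked mention of the previous sentence, if any."""
    if not previous_sentence:
        return None
    return min(previous_sentence, key=lambda m: SALIENCY_RANK[m.role])


def is_backward_looking_center(vorfeld_entity: str,
                               previous_sentence: List[Mention]) -> bool:
    """True if the Vorfeld constituent corefers with the most salient mention."""
    top = most_salient(previous_sentence)
    return top is not None and top.entity == vorfeld_entity


# Previous sentence of example wdt07_22, with illustrative entity IDs.
previous = [
    Mention("Durch Reisen", "other", "travelling"),
    Mention("sie", "subject", "they_1"),
    Mention("andere Kultur und Lebenstile", "object", "culture_lifestyles"),
]

# The Vorfeld "[Sie]" of the following sentence corefers with the subject "sie".
sie_is_center = is_backward_looking_center("they_1", previous)
```

In this sketch the Vorfeld constituent "[Sie]" counts as a backward-looking center because its antecedent is the subject of the previous sentence.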
‘These photos won’t fit in this envelope.’
• time or locative phrase
Backward-looking center
 Referential expression corefers with expression in previous sentence
envelope in fit can’t enter this several photo
• familiar referent
 Implicit set: “Tageszeiten” (‘times of the day’); elements of the set: “jeden Morgen” (‘every
morning’), “jeden Abend” (‘every evening’) – annotation aid: co-hyponyms
 Poset: “Jeden Abend”
Backward-looking center (L2 32.5% vs. L1 27.1%): significant, χ² = 5.61, df = 1, p < 0.05; not significant if normalized by text length.
Two-step annotation process: (i) primary (partly parallel) annotation, (ii) expert decision.
Inter-annotator agreement on backward-looking center (167 Vorfelds): α(coder1, coder2) = 0.21; α(coder1, experts) = 0.53; α(coder2, experts) = 0.33.
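To show how an agreement coefficient of this kind can be computed, here is a minimal sketch of Krippendorff's α for two coders and nominal categories with no missing values. That the α reported above is Krippendorff's α is an assumption. The function name and the toy labels are illustrative and are not taken from the corpus.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(coder_a, coder_b):
    """Krippendorff's alpha for two coders, nominal data, no missing values."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two non-empty label sequences of equal length")

    # Coincidence matrix: each unit contributes both ordered pairs of its labels.
    coincidences = Counter()
    for a, b in zip(coder_a, coder_b):
        coincidences[(a, b)] += 1
        coincidences[(b, a)] += 1

    n = 2 * len(coder_a)  # total number of pairable values
    totals = Counter()    # marginal count per category
    for (c, _k), count in coincidences.items():
        totals[c] += count

    observed = sum(cnt for (c, k), cnt in coincidences.items() if c != k) / n
    expected = sum(totals[c] * totals[k]
                   for c, k in permutations(totals, 2)) / (n * (n - 1))
    if expected == 0:
        # Only one category was ever used; alpha is undefined, 1.0 by convention here.
        return 1.0
    return 1.0 - observed / expected


# Toy labels: 1 = backward-looking center, 0 = any other function.
coder1 = [0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
coder2 = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]
alpha = krippendorff_alpha_nominal(coder1, coder2)
```

α is 1 for perfect agreement and close to 0 when agreement is at chance level, so the toy data above, with four disagreements in ten units, yield a low value.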
The ALeSKo corpus consists of L2 essays (advanced Chinese L2 learners of German,
level: ~B2), L1 essays, metadata, annotation, annotation guidelines:
• wdt07: 25 L2 texts – topic: Are holidays an unsuccessful escape from everyday life?
(6,902 tokens)
• wdt08: 18 L2 texts – topic: Does tourism support understanding among nations? (6,685
tokens)
• Falko Essays L1 0.5: 39 essays – different topics (34,155 tokens) (cf. Falko, online)
Future Work
• Discourse-related coherence: first results indicate that L2 learners mark contingency (e.g. Damit ‘hence’, Aus diesem Grund ‘therefore’) and
expansion (e.g. Ferner ‘furthermore’, Auf solche Frage ‘on such a question’) more often in the Vorfeld than L1 speakers
• Error annotation: marking of errors and target hypotheses (e.g., [Was macht den Tourismus anders als die anderen Branchen] , ist ....
(wdt08_02); word order error (finite verb), target hypothesis: Was den Tourismus anders als die anderen Branchen macht, ist ... )
• Readability and Vorfeld use: rating and rewriting experiment (cf. Rosén 2006)
• Application in language teaching: creation of teaching material for the training of the effects of the Vorfeld on (local) coherence
[Bar chart: Vorfeld functions in L1 and L2 texts; y-axis: percent]
Durch Reisen können sie auch andere Kultur und Lebenstile kennenlernen.
by travelling can they also other culture and lifestyles get_to_know
[Sie] können auch ihre Kenntnisse durch Reisen erweitern.
they can also their knowledge by travelling broaden
‘By travelling, they₁ can become acquainted with other cultures and lifestyles. They₁ can also broaden their knowledge by travelling.’
OtopicSV
xìnfēng lǐ zhuāng bu jìn zhèi xiē zhàopiàn
Topic-prominent: the topic always
comes first.
• Topic
– what the sentence is about (Li and
Thompson 1989: 15)
– sets a spatial, temporal or
individual framework within which the
main predication holds (Li and
Thompson 1989: 85)
Constituents that are not linked referentially to the previous discourse (cf. Prince 1981),
wdt07_04:
[Jeden Morgen] stehen wir auf, um pünktlich zur Arbeit zu sein. (...)
every morning get we up for punctual to_the work to be (...)
[Jeden Abend] bleiben wir zu Hause, sehen sinnlose Serien im Fernsehn.
every evening stay we at home, watch senseless shows in_the television
‘Every morning, we get up to be at work on time. (...) Every evening, we stay at home and watch senseless shows on TV.’
(Dipper and Zinsmeister 2009)
The proportion of Vorfelds related to the pre-context varies with the text type:
– letters to the editor: 32%
– scientific radio programme: 71%
Coherence-related functions are important, but sentence-internal, presentational functions are predominant.
EXMARaLDA (Schmidt 2004):
Categories and Examples
Constituents that belong to a partly ordered set of which other elements are already introduced (cf. Prince 1999), wdt08_10:
Vorfeld function (percent)   L1     L2
brand-new                    16.9   21.3
poset                         2.4    1.9
frame-setting                 7.3    7.5
backward-looking center      27.1   32.5
Result
Almost no Verb-third errors. L2 speakers use the function backward-looking center
significantly more often in the Vorfeld than L1 (cf. Breckle & Zinsmeister, in preparation).
 transfer effect from topic-prominent L1 Chinese to L2 German
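The significance claim can be checked roughly with a 2×2 χ² test. The sketch below makes two assumptions: the counts are reconstructed from the rounded percentages (32.5% of 884 L2 Vorfelds, 27.1% of 764 L1 Vorfelds), and the test is run without continuity correction, since the poster does not say which variant was used. The statistic therefore comes out close to the reported χ² = 5.61 but not identical to it.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with one degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Totals from the study description; counts reconstructed from rounded percentages.
l2_total, l1_total = 884, 764
l2_blc = round(0.325 * l2_total)  # backward-looking centers in L2 Vorfelds
l1_blc = round(0.271 * l1_total)  # backward-looking centers in L1 Vorfelds

chi2 = chi_square_2x2(l2_blc, l2_total - l2_blc,
                      l1_blc, l1_total - l1_blc)
p = p_value_df1(chi2)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```

With these reconstructed counts the statistic is about 5.6 and p stays below 0.05, in line with the reported result. The normalization by text length mentioned on the poster is not modelled here.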
References
Margit Breckle and Heike Zinsmeister. 2009. Annotationsrichtlinien Funktion des Vorfelds. Manuscript. December 2009. Pedagogical
University Vilnius and University of Konstanz.
Margit Breckle and Heike Zinsmeister. In preparation. A corpus-based contrastive analysis of local coherence in L1 and L2 German. In
Proceedings of the HDLP conference. Frankfurt/Main [a.o.]: Peter Lang.
Stefanie Dipper and Heike Zinsmeister. 2009. The Role of the German Vorfeld for Local Coherence. In: Christian Chiarcos, Richard Eckart
de Castilho and Manfred Stede (eds.) Von der Form zur Bedeutung: Texte automatisch verarbeiten / From Form to Meaning:
Processing Texts Automatically. Tübingen: Narr. 69–79.
Falko, online. http://www.linguistik.hu-berlin.de/institut/professuren/korpuslinguistik/forschung-en/falko
Barbara Grosz, Arvind Joshi and Scott Weinstein. 1995. Centering: A Framework for Modeling the Local Coherence of Discourse.
Computational Linguistics, 21. 203–225.
Joachim Jacobs. 2001. The dimensions of topic-comment. Linguistics, 39 (4). 641–681.
Charles N. Li and Sandra A. Thompson. 1989. Mandarin Chinese: A Functional Reference Grammar. Berkeley and Los Angeles, CA:
University of California Press.
Ellen F. Prince. 1981. Toward a taxonomy of given-new information. In Peter Cole (ed.) Radical Pragmatics. New York: Academic Press.
223–255.
Ellen F. Prince. 1999. How not to mark topics: ‘Topicalization’ in English and Yiddish. 8 Texas Linguistics Forum.
Christina Rosén. 2006. Warum klingt das nicht deutsch? Probleme der Informationsstrukturierung in deutschen Texten schwedischer Schüler
und Studenten. Stockholm: Almqvist & Wiksell International.
Helmut Schmid. 1994. Probabilistic Part-of-Speech Tagging Using Decision Trees. In: D. H. Jones and H. Somers (eds.) New Methods in
Language Processing, UCL Press, 154–164.
Thomas Schmidt. 2004. EXMARaLDA – ein Modellierungs- und Visualisierungsverfahren für die computergestützte Transkription
gesprochener Sprache. In Proceedings of Konvens. Vienna, Austria.
Augustin Speyer. 2007. Die Bedeutung der Centering Theory für Fragen der Vorfeldbesetzung im Deutschen. Zeitschrift für
Sprachwissenschaft, 26. 83–115.
ALeSKo homepage: http://ling.uni-konstanz.de/pages/home/zinsmeister/alesko.html
