1. To what extent can the generic structure of scientific articles – and more particularly in the field of linguistics – be captured?
2. How instructive and useful are the structure and the regularities observed to organize and select the core features of the genre?
Poudat - CL2009 - 21/07/09
Corpus and linguistic features
› A genre-homogeneous corpus
224 articles
› A set of relevant features suited to the characteristics of scientific articles
TEI annotation
Morphosyntactic annotation
Biber (1988), Malrieu and Rastier (2001), Habert et al. (2000)
224 articles, 32 issues, 11 journals
Main publication date: 2000
[Figure: number of articles and number of issues per journal: LANGAGE, HEL, SEMIO, CIEL, SYNSEM, PRAX, LF, RSP, SCOLIA, VERBUM, LINX]
Document structure
› <front>, <body> and <back>
› <head>
› <div>
› <div type='introduction'>, <div type='conclusion'>, <div type='annonce de plan'>
› <note>
Citations
› <q> and <quote>
Examples
› <p ana="exemplum">
› <seg ana="exemplum">
Formatting and typography
› <foreign>
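A minimal sketch of how such TEI markup could be counted per article. The element and attribute names come from the slide; the sample document, the function name `count_markup` and the absence of an XML namespace are assumptions made for illustration (real TEI files usually declare the TEI namespace).

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical TEI-like fragment: tag names follow the slide, content is invented.
SAMPLE = """<text>
  <front><head>Title</head></front>
  <body>
    <div type="introduction"><p>Opening paragraph.</p></div>
    <div>
      <p ana="exemplum">(1) Le chat dort.</p>
      <p>Comment on <seg ana="exemplum">chat</seg> and <foreign>cat</foreign>.</p>
      <note>A footnote.</note>
    </div>
    <div type="conclusion"><p>Closing paragraph.</p></div>
  </body>
  <back/>
</text>"""


def count_markup(xml_string):
    """Count <div> types and elements flagged as linguistic examples."""
    root = ET.fromstring(xml_string)
    # Sections without a type attribute are grouped under "untyped".
    div_types = Counter(div.get("type", "untyped") for div in root.iter("div"))
    # Both <p ana="exemplum"> and <seg ana="exemplum"> count as examples.
    examples = sum(1 for el in root.iter() if el.get("ana") == "exemplum")
    return div_types, examples


div_types, examples = count_markup(SAMPLE)
print(dict(div_types), examples)
```

Counts of this kind can then be used as structural features next to the morphosyntactic ones.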
Personal pronouns:
› Disambiguation of anaphoric and impersonal IL
› Annotation of the French ON
Connectives:
› Opposition, causality, consequence...
Linguistic metalanguage:
› SN, SV / –ant or –wise morphemes, etc.
› Symbols: *, ?, ??, etc.
Numerals:
› cardinal and ordinal numbers, etc.
› structuring marks (e.g. 1.1.2.)
› cross-references
Verbs:
› Modal verbs (falloir, pouvoir, devoir, etc.)
Within their linguistic class
› Proportion of verbs conjugated in the future tense relative to the total conjugated verbs in the corpus / to the total verb forms in the corpus
In relation with other features
› Collocations: verbs conjugated in the future tense are correlated with the WE pronoun, e.g. "Nous verrons ultérieurement…" ("We will see later…")
› Grammatical rules: passé simple / imparfait
› Text dimensions: verbs conjugated in the future tense are correlated with imperatives: reader-friendly dimension
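The two views of a feature described above (its share within its linguistic class, and its correlation with another feature across texts) can be sketched as follows. The per-text counts are invented, and the use of Pearson's r is an assumption: the slides do not say which correlation measure was used.

```python
from math import sqrt


def proportion(count, class_total):
    """Share of a feature within its linguistic class (0 if the class is empty)."""
    return count / class_total if class_total else 0.0


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of values."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


# Invented counts for four texts (illustration only, not corpus data).
texts = [
    {"futur": 12, "conjugated": 300, "nous": 20, "pronouns": 100},
    {"futur": 3, "conjugated": 250, "nous": 5, "pronouns": 90},
    {"futur": 20, "conjugated": 400, "nous": 45, "pronouns": 150},
    {"futur": 0, "conjugated": 200, "nous": 2, "pronouns": 80},
]

# Future tense relative to conjugated verbs, NOUS relative to personal pronouns.
futur_rate = [proportion(t["futur"], t["conjugated"]) for t in texts]
nous_rate = [proportion(t["nous"], t["pronouns"]) for t in texts]

r = pearson(futur_rate, nous_rate)
print(round(r, 2))
```

Normalizing each count by its own class before correlating keeps long and short articles comparable.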
Intercorrelated features forming the generic structure = core oppositions?
Features that vary the least from one text to another
› Variation coefficient
› Elements that vary the least – profile ++
› Elements that vary the most – profile --
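As a rough illustration of the variation coefficient, the sketch below ranks features from most stable to most variable. The frequencies are invented, and dividing the population standard deviation by the mean is an assumption about how the coefficient was computed.

```python
from statistics import mean, pstdev


def variation_coefficient(values):
    """Standard deviation divided by the mean; infinite when the mean is 0."""
    m = mean(values)
    return pstdev(values) / m if m else float("inf")


# Invented relative frequencies of three features in four texts.
features = {
    "nouns": [0.30, 0.31, 0.29, 0.30],
    "ON": [0.22, 0.25, 0.21, 0.24],
    "TU": [0.00, 0.08, 0.00, 0.00],
}

# Low coefficient = stable across texts (profile ++), high = unstable (profile --).
ranked = sorted(features, key=lambda name: variation_coefficient(features[name]))

for name in ranked:
    print(name, round(variation_coefficient(features[name]), 3))
```

Because the coefficient is scale-free, frequent features such as nouns and rare ones such as TU can be ranked on the same axis.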
What varies the least:
Length of the article
General POS: nouns, adjectives, determiners…
Punctuation: dots, colons, commas and brackets
Persons: (3rd person) ON and impersonal IL
Tenses: présent, passé composé, infinitives, present-tense modals, present participles
Length ≈ 7333 tokens ≈ 19 pages
Impersonal pronouns: ON and IL
[Figure: bar chart over the personal pronouns JE, TU, ON, impersonal IL, IL/ELLE, NOUS, VOUS, ILS/ELLES]
Length ≈ 7333 tokens ≈ 19 pages
Impersonal pronouns: ON (23% PP) and IL (21% PP)
[Figure: bar chart over the personal pronouns JE, TU, ON, impersonal IL, IL/ELLE, NOUS, VOUS, ILS/ELLES]
Length ≈ 7333 tokens ≈ 19 pages
Impersonal pronouns: ON (23% PP) and IL (21% PP)
Punctuation: colons and brackets
[Figure: bar chart comparing Articles, Essays, Law texts and Novels]
Length ≈ 7333 tokens ≈ 19 pages
Impersonal pronouns: ON (23% PP) and IL (21% PP)
Punctuation: colons (7% p.) and brackets (19% p.)
[Figure: bar chart comparing Articles, Essays, Law texts and Novels]
Length ≈ 7333 tokens ≈ 19 pages
Impersonal pronouns: ON (23% PP) and IL (21% PP)
Punctuation: colons (7% p.) and brackets (19% p.)
Tenses: présent (80% conjugated verbs), passé composé (5% CV)
[Figure: bar chart comparing Articles, Essays, Law texts and Novels]
What varies the most:
Punctuation: backslashes, exclamation marks, braces
Persons: 2nd persons and specifically possessives and disjunctives
Tenses: passé simple, passé antérieur, subjonctif imparfait
Formalization: domain-specific symbols, interjections
2nd person pronouns: TU and VOUS
[Figure: bar chart over the personal pronouns JE, TU, ON, impersonal IL, IL/ELLE, NOUS, VOUS, ILS/ELLES]
2nd person pronouns: TU (2% PP) and VOUS (1% PP)
[Figure: bar chart over the personal pronouns JE, TU, ON, impersonal IL, IL/ELLE, NOUS, VOUS, ILS/ELLES]
2nd person pronouns: TU (2% PP) and VOUS (1% PP)
Tenses: high variation of the less frequently used tenses: passé simple, passé antérieur, subjonctif imparfait
Length ≈ 7333 tokens ≈ 19 pages
Impersonal pronouns ON (23% PP) and IL (21% PP) vs. 2nd person pronouns TU (2% PP) and VOUS (1% PP)
Punctuation: colons (7% p.) and brackets (19% p.) vs. backslashes, exclamation marks, braces (< 0.5% each)
Tenses: présent (80% conjugated verbs), passé composé and passif (5% CV) vs. passé simple (1.8%), passé antérieur (0.26%), subjonctif imparfait (0.25%)
Formalization: domain-specific symbols, interjections
Specific and stable features in the corpus = core features = genre representation?
Mise en évidence expérimentale d'une organisation tomatotopique chez la soprano (Cantatrix sopranica L.)
Georges PEREC, Laboratoire de physiologie, Faculté de médecine Saint-Antoine, Paris, France

Les effets frappants du jet de tomates sur les sopranos, observés aux heures ultimes du siècle dernier par Marks et Spencer (1899) qui, les premiers, employèrent le terme de réaction de hurlements (RH), ont été largement décrits dans la littérature. Si de nombreuses études expérimentales (Zeeg & Puss, 1931; Roux & Combaluzier, 1932; Sinon & coll., 1948), anatomopathologique (Hun & Deu, 1960), comparative (Karybb & Szyla, 1973) et prospective (Else & Vire, 1974) ont permis de décrire avec précision ces réponses caractéristiques, les données neuroanatomiques, aussi bien que neurophysiologiques sont, en dépit de leur grand nombre, étonnamment confuses. Dans leurs démonstrations désormais classiques, publiées dans la fin des années 20, Chou & Lai (1927 a, b, c, 1928 a, b, 1929 a, 1930) ont écarté l'hypothèse d'un simple réflexe nociceptif facio-facial qui avait été émise il y a de nombreuses années par certains auteurs (Mace & Doyne, 1912; Payre & Tairnelle, 1916; Sornette & Billevayze, 1925).

Experimental demonstration of the tomatotopic organization in the Soprano (Cantatrix sopranica L.)
Georges Perec, Laboratoire de physiologie, Faculté de médecine Saint-Antoine, Paris, France

As observed at the turn of the century by Marks & Spencer (1899), who first named the "yelling reaction" (YR), the striking effects of tomato throwing on Sopranoes have been extensively described. Although numerous behavioral (Zeeg & Puss, 1931; Roux & Combaluzier, 1932; Sinon et al., 1948), pathological (Hun & Deu, 1960), comparative (Karybb & Szyla, 1973) and follow-up (Else & Vire, 1974) studies have permitted a valuable description of these typical responses, neuroanatomical, as well as neurophysiological data, are, in spite of their number, surprisingly confusing. In their henceforth late twenties' classical demonstrations, Chou & Lai (1927 a, b, c, 1928 a, b, 1929 a, 1930) have ruled out the hypothesis of a pure facio-facial nociceptive reflex that has been advanced for many years by a number of authors (Mace & Doyne, 1912; Payre & Tairnelle, 1916; Sornette & Billevayzé, 1925).
CORPUS
Length ≈ 7333 tokens ≈ 19 pages
Impersonal pronouns ON (23% PP) and IL (21% PP) vs. 2nd person pronouns TU (2% PP) and VOUS (1% PP)
Punctuation: colons (7% p.) and brackets (19% p.) vs. backslashes, exclamation marks, braces (< 0.5% each)
Tenses: présent (80% conjugated verbs), passé composé and passif (5% CV) vs. passé simple (1.8%), passé antérieur (0.26%), subjonctif imparfait (0.25%)
Formalization: domain-specific symbols, interjections

TOMATOTOPIC
Length ≈ 2690 tokens ≈ 7 pages
Impersonal pronouns ON (14% PP) and IL (28% PP) vs. 2nd person pronouns TU (0% PP) and VOUS (0% PP)
Punctuation: colons (4% p.) and brackets (34% p.) vs. backslashes, exclamation marks, braces (0% each)
Tenses: présent (36% conjugated verbs), passé composé and passif (5% CV) vs. passé simple, passé antérieur, subjonctif imparfait (0%)
Formalization: domain-specific symbols, interjections (0%)
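One possible way to read this comparison is to rank the stable features by how far the Perec text departs from the corpus averages. The percentages below are the ones quoted on the slide; the dictionary names, the function `gaps` and the choice of a simple percentage-point difference are assumptions made for illustration. Each percentage is relative to its own class (personal pronouns, punctuation marks, conjugated verbs).

```python
# Percentages quoted on the slide for the corpus and for the Perec text.
CORPUS = {"ON": 23, "impersonal IL": 21, "colons": 7, "brackets": 19, "présent": 80}
TOMATOTOPIC = {"ON": 14, "impersonal IL": 28, "colons": 4, "brackets": 34, "présent": 36}


def gaps(reference, text):
    """Percentage-point difference of a text against a reference profile."""
    return {name: text[name] - reference[name] for name in reference}


diffs = gaps(CORPUS, TOMATOTOPIC)

# Largest departures from the corpus profile first.
ranked = sorted(diffs, key=lambda name: abs(diffs[name]), reverse=True)

for name in ranked:
    print(f"{name}: {diffs[name]:+d} points")
```

On these figures the présent tense shows the widest gap, followed by brackets.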
CORPUS
Tenses: présent (80% conjugated verbs), passé composé and passif (5% CV) vs. passé simple (1.8%), passé antérieur (0.26%), subjonctif imparfait (0.25%)
+ conditionnel (4.43%), futur (4.39%)

TOMATOTOPIC
Tenses: présent (36% conjugated verbs), passé composé and passif (5% CV) vs. passé simple, passé antérieur, subjonctif imparfait (0%)
+ conditionnel (0.41%), futur (0%)

No research framework, no research process involved
Principal Component Analysis (PCA)
› Input: a matrix individuals (224 texts) x observations (F: 136 / E: 80)
› Examination of the factors (eigenvalues and linguistic characteristics)
› Description of the first factorial map
DTM (Ludovic Lebart)
http://www.enst.fr/egsh/lebart/
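The analysis on the slides was run with DTM. As a rough stand-in, the sketch below performs a principal component analysis on a texts x features matrix with NumPy. The random data, the function name `pca` and the standardization of each feature are assumptions; the settings actually used in DTM are not given in the slides.

```python
import numpy as np


def pca(matrix, n_components=2):
    """PCA on standardized columns via singular value decomposition."""
    X = np.asarray(matrix, dtype=float)
    X = X - X.mean(axis=0)           # center each feature
    std = X.std(axis=0)
    std[std == 0] = 1.0              # leave constant features untouched
    X = X / std                      # scale each feature to unit variance
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    eigenvalues = S ** 2 / X.shape[0]             # variance carried by each factor
    coords = U[:, :n_components] * S[:n_components]  # texts on the factorial map
    loadings = Vt[:n_components].T                # features on the same axes
    return eigenvalues, coords, loadings


# Random stand-in for a 224-text matrix with 5 features (not corpus data).
rng = np.random.default_rng(0)
data = rng.random((224, 5))

eigenvalues, coords, loadings = pca(data)
print(eigenvalues.round(3))
print(coords.shape, loadings.shape)
```

The eigenvalues indicate how many factors are worth examining, and the first two columns of `coords` and `loadings` correspond to a first factorial map of texts and features.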
[Figure: first factorial map. Plotted features: imparfait, plus-que-parfait, passé simple, passé antérieur, proper nouns, dates, VOUS, conditional, interjections, JE, question marks, TU, présent, impératif, modals – futur, modals – présent, impers. IL, colons, brackets, numerals, braces, ON, symbols, ling. acronyms, NOUS, slashes. Annotation: Formalization marks]
[Figure: first factorial map, second view. Plotted features: imparfait, plus-que-parfait, passé simple, passé antérieur, proper nouns, dates, VOUS, conditional, interjections, JE, question marks, TU, présent, impératif, modals – futur, modals – présent, impers. IL, colons, brackets, numerals, braces, ON, symbols, ling. acronyms, NOUS, slashes. Annotation: Formalization marks]
Narrative articles = domain-specific, very rare
Articles containing an interlocutive dimension = domain-specific
Articles containing formalization marks = domain-specific
Articles containing the « positive core features » we found out
[Figure annotations: Narrative, Dialogue, Perec]
[Figure: factorial map. Plotted features: imparfait, plus-que-parfait, passé antérieur, passé simple, commas, ordinals, proper nouns, dates, VOUS, conditionnel, length, JE, TU, interjections, question marks, présent, modals – présent, modals – futur, symbols, ling. acronyms, braces, numerals, slashes, NOUS, colons, impératif, ON, impers. IL, brackets. Annotation: Formalization marks]
Heterogeneity of the genre of the research article in linguistics
› Sub-fields associated with different writing traditions
Yet core features matching intuition
Interest and limits of morphosyntactic features
Cross-language generic structures?
› Parallel, or comparable corpora and tagsets?