Algorithmic recognition of the Verb

Ministry of Education of the Republic of Belarus
Educational Institution "Francisk Skorina Gomel State University"
Faculty of Philology

Course paper
Algorithmic recognition of the Verb

Author: Marchenko T.E., student of group K-42

Gomel 2005

Contents

Introduction
Basic assumptions and some facts
1 Algorithm for automatic recognition of verbal and nominal word groups
2 Lists of markers used by Algorithm No 1
3 Text sample processed by the algorithm
Examples of hand checking of the performance of the algorithm
Conclusion
References

Introduction

The advent and subsequent wide use of formal grammars for text synthesis and for formal representation of the structure of the Sentence could not produce adequate results when applied to text analysis. A better and more suitable solution was therefore sought, and it was found in the algorithmic approach to text analysis. The algorithmic approach uses series of instructions, written in Natural Language and organized in flow charts, with the aim of analysing certain aspects of the grammatical structure of the Sentence. The procedures, in the form of a finite sequence of instructions organized in an algorithm, are based on the grammatical and syntactical information contained in the Sentence.

The method used in this chapter closely follows the approach adopted by the all-Russia group Statistika Rechi in the 1970s and described in a number of publications (Koverin, 1972; Mihailova, 1973; Georgiev, 1976). It is to be noted, however, that the results achieved by the algorithmic procedures described in this study by far exceed the results for the English language obtained by Primov and Sorokina (1970) using the same method. (To prevent unauthorized commercial use, the authors published only the block-scheme of the algorithm.)

Basic assumptions and some facts

It is a well-known fact that many difficulties are encountered in Text Processing.
A major difficulty, which if not removed first would hamper any further progress, is the ambiguity present in the wordforms that potentially belong to more than one Part of Speech when taken out of context. It is therefore essential to find the features that disambiguate the wordforms when used in a context and to define the disambiguation process algorithmically. As a first step in this direction we have chosen to disambiguate those wordforms which potentially (when out of context, in a dictionary) can be attributed to more than one Part of Speech and where one of the possibilities is a Verb. These possibilities include Verb or Noun (as in stay), Verb or Noun or Adjective (as in paint, crash), Verb or Adjective (as in calm), Verb or Participle (as in settled, asked, put), Verb or Noun or Participle (as in run, abode, bid), Verb or Adjective or Participle (as in closed), and Verb or Noun or Participle or Adjective (as in cut). We'll start with the assumption that for every wordform in the Sentence there are only two possibilities: to be or not to be a Verb. Therefore, only provisionally, exclusively for the purposes of the present type of description and subsequent algorithmic analysis of the Sentence, we shall assume that all wordforms in the Sentence which are not Verbs belong to the non-verbal or Nominal Word Group (NG).
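
The ambiguity classes listed above can be pictured as a small lookup table. The following is only a minimal sketch and not part of Algorithm No 1: the names `AMBIGUITY_CLASSES` and `needs_disambiguation` are hypothetical, and the table holds nothing beyond a few of the example wordforms quoted in the text.

```python
# Hypothetical illustration: out-of-context Part-of-Speech readings for
# some of the example wordforms quoted in the text (not a real dictionary).
AMBIGUITY_CLASSES = {
    "stay": {"Verb", "Noun"},
    "crash": {"Verb", "Noun", "Adjective"},
    "calm": {"Verb", "Adjective"},
    "asked": {"Verb", "Participle"},
    "run": {"Verb", "Noun", "Participle"},
    "closed": {"Verb", "Adjective", "Participle"},
    "cut": {"Verb", "Noun", "Participle", "Adjective"},
}


def needs_disambiguation(wordform):
    """Return True if the wordform may be a Verb but also has another reading."""
    readings = AMBIGUITY_CLASSES.get(wordform.lower(), set())
    return "Verb" in readings and len(readings) > 1


if __name__ == "__main__":
    for word in ("cut", "calm", "table"):
        print(word, needs_disambiguation(word))
```

A wordform that is missing from the table is simply reported as not needing disambiguation, which would be too optimistic for real text; a full dictionary would be required in practice.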

As a result of this definition, the NG will incorporate the Noun, the Adjective, the Adverb, the Numeral, the Pronoun, the Preposition and the Participle 1st used as an attribute (as in the best selected audience) or as a Complement (as in we'll regard this matter settled). All the wordforms in the Sentence which are Verbs form the Verbal Group (VG). The VG includes all main and Auxiliary Verbs, the Particle to (used with the Infinitive of the Verb), all verbal phrases consisting of a Verb and a Noun (such as take place, take part, etc.) or a Verb and an Adverb (such as go out, get up, set aside, etc.), and the Participle 2nd used in the compound Verbal Tenses (such as had arrived).

The formal features which help us recognize the nominal or verbal character of a wordform are called 'markers' (Sestier and Dupuis, 1962). Some markers, such as the, a, an, at, by, on, in, etc. (most of them are Prepositions), predict with 100 per cent accuracy the nominal nature of the wordform immediately following them (so long as the Prepositions are not part of a phrasal Verb). Other markers, including wordform endings such as -ing and -es, or a Preposition which is also a Particle, such as to, cannot accurately predict the verbal or nominal character of a wordform when used singly, without the help of other markers. Considering that not all markers give 100 per cent predictability (even when all markers in the immediate vicinity of a wordform are taken into consideration), it becomes evident that the entire process of formal text analysis using this method is based, to a certain degree, on probability. The question is how to reduce the possible errors.
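
The difference between a fully reliable marker and an ambiguous one can be illustrated with a short sketch. It is an assumption-laden toy, not the published algorithm: the sets below contain only the markers named in the text, the function name `marker_evidence` is hypothetical, and the phrasal-Verb exception mentioned above is deliberately ignored.

```python
# Assumption: a tiny subset of markers, taken from the examples in the text.
STRONG_NOMINAL_MARKERS = {"the", "a", "an", "at", "by", "on", "in"}
AMBIGUOUS_ENDINGS = ("ing", "es")


def marker_evidence(previous_word, wordform):
    """Classify the evidence available for a wordform.

    Returns "NG" when the preceding word is a strong nominal marker,
    "uncertain" when only an ambiguous ending is present, and None when
    neither kind of marker is found. Phrasal Verbs are not handled.
    """
    if previous_word is not None and previous_word.lower() in STRONG_NOMINAL_MARKERS:
        return "NG"
    if wordform.lower().endswith(AMBIGUOUS_ENDINGS):
        return "uncertain"
    return None


if __name__ == "__main__":
    print(marker_evidence("the", "crash"))   # strong marker on the left
    print(marker_evidence("he", "watches"))  # only an ambiguous ending
```

The "uncertain" outcome is the case in which further markers in the vicinity of the wordform have to be consulted.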
To this purpose, the following procedures were used:
a) the context of a wordform was explored for markers, moving back and forth up to three words to the left and to the right of the wordform;
b) some algorithmic instructions preceded others in sequence as a matter of rule, in order to act as an additional screening;
c) no decision was taken prematurely, without sufficient grammatical and syntactical evidence being contained in the markers;
d) no instruction was considered to be final without sufficient checking and tests proving the success rate of its performance.

The algorithm presented in Section 3 below, numbered as Algorithm No 1 (Georgiev, 1991), when tested on texts chosen at random, correctly recognized on average 98 words out of every 100. The algorithm uses Lists of markers.

Algorithm for automatic recognition of verbal and nominal word groups

The block-scheme of the algorithm is shown in Figure 1.1.

Recognition of Auxiliary Words, Abbreviations, Punctuation Marks and figures of up to 3-letter length (presented in Lists)
Words over 3-letter length: search first left, then right (up to 3 words in each direction) for markers (presented in Lists) until enough evidence is gathered for a correct attribution of the running word
Output result: attribution of the running word to one of the groups (verbal or nominal)

Figure 1.1 Block-scheme of Algorithm No 1

Note: The algorithm, 302 digital instructions in all, is available on the Internet (see Internet Downloads at the end of the book).

1 Lists of markers used by Algorithm No 1

(i) List No 1: for, ei, two, one, may, fig, any, day, she, his, him, her, you, men, its, six, sex, e , low, fa , old, few, new, now, sea, yet, ago, or, all, per, era, ra , lot, our, way, leg, hay, key, tea, lee, oak, big, who, ub, pe , law, hut, gu , wi , hat, pot, how, far, ca , dog, ray, hot, top, via, why, Mrs, ..., etc.
(ii) List No 2: was, are, o , get, got, bid, had, did, due, see, saw, lit, let, say, met, rot, off, fix, lie, die, dye, lay, sit, try, led, i , ..., etc.
(iii) List No 3: pay, dip, bet, age, can, ma , oil, end, fun, dry, log, use, set, air, ag, map, bar, mug, mud, tar, top, pad, raw, row, gas, red, rig, fi , own, let, aid, act, cut, ax, put, ..., etc.
(iv) List No 4: to, all, thus, both, many, may, might, when, Personal Pronouns, so, must, would, often, did, make, made, if, can, will, shall, ..., etc.
(v) List No 5: when, the, a, an, is, to, be, are, that, which, was, some, no, will, can, were, have, may, than, has, being, made, where, must, other, such, would, each, then, should, here, those, could, well, even, proportional, particular(ly), having, cannot, can't, shall, later, might, now, often, had, almost, cannot, of, in, for, with, by, his, from, at, on, if, between, into, through, per, over, above, because, under, below, while, before, concerning, as, one, ..., etc.
(vi) List No 6: with, this, that, from, which, these, those, than, then, where, when, also, more, into, other, only, same, some, here, such, about, least, them, early, either, while, most, thus, each, under, their, they, after, less, near, above, three, both, several, below, first, much, many, zero, even, hence, before, quite, rather, till, until, best, down, over, above, through, Reflexive Pronouns, self, whether, onto, once, since, toward(s), already, every, elsewhere, thing, nothing, always, perhaps, sometimes, anything, something, everything, otherwise, often, last, around, still, instead, foreword, later, just, behind, ..., etc.
(vii) List No 7: Includes all Irregular Verbs, with the following wordforms: Present, Present 3rd person singular, Past and Past Participle.
(viii) List No 8: -ted, -ded, -ied, -ned, -red, -sed, -ked, -wed, -bed, -hed, -ped, -led, -ved, -reed, -ced, -med, -zed, -yed, -ued, ..., etc.
(ix) List No 9: -ous, -ity, -less, -ph, -'s (except in it's, what's, that's, there's, etc.), -ness, -ence, -ic, -ee, -ly, -is, -al, -ty, -que, -(t)er, -(t)or, -th (except in worth), -ul8, -ment, -sion(s), ..., etc.
(x) List No 10: Comprises a full list of all Numerals (Cardinal and Ordinal).

2 Text sample processed by the algorithm

Text                                             Word Group
She                                              NG
nodded                                           VG
again and                                        NG
patted                                           VG
my arm, a small familiar gesture which always    NG
managed to convey                                VG
both understanding and dismissal.                NG

3 Examples of hand checking of the performance of the algorithm

Let us see how the following sentence will be processed by Algorithm No 1, word by word:

Her apartment was on a floor by itself at the top of what had once been a single dwelling, but which long ago was divided into separately rented living quarters.

First the algorithm picks up the first word of the sentence (of the text), in our case the word her, with instruction No 1. The same instruction always ascertains that the text has not ended yet. Then the algorithm proceeds to analyse the word her by asking questions about it and verifying the answers to those questions by comparing the word her with lists of other words and Punctuation Marks, thus establishing, gradually, that the word her is not a Punctuation Mark (operations 3-5), that it is not a figure (number) either (operations 5-7), and that its length exceeds two letters (operation 8).
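
The first checks in this walk-through, together with the search of up to three words to the left and then to the right shown in the block-scheme, can be sketched roughly as follows. This is a toy under stated assumptions and not a reproduction of the 302 instructions of Algorithm No 1: the two marker sets are small hypothetical stand-ins for the Lists, the function names are invented for illustration, the short/long threshold follows the 3-letter limit of the block-scheme, and the rule that the nearest marker decides is a simplification.

```python
import string

# Hypothetical stand-ins for the marker Lists; the real Lists are far longer.
NOMINAL_MARKERS = {"the", "a", "an", "her", "its", "by", "at", "of"}
VERBAL_MARKERS = {"had", "was", "will", "would", "to"}


def screen(token):
    """Initial screening, loosely modelled on the checks described above."""
    if token and all(ch in string.punctuation for ch in token):
        return "punctuation"
    if token.isdigit():
        return "figure"
    if len(token) <= 3:
        return "short"  # short words are looked up directly in Lists
    return "long"       # longer words need a context search for markers


def context_window(tokens, index, radius=3):
    """Return up to `radius` words to the left (nearest first) and to the right."""
    left = tokens[max(0, index - radius):index][::-1]
    right = tokens[index + 1:index + 1 + radius]
    return left, right


def guess_group(tokens, index):
    """Toy decision rule: the first marker met, left side before right, decides."""
    left, right = context_window(tokens, index)
    for word in left + right:
        lowered = word.lower()
        if lowered in NOMINAL_MARKERS:
            return "NG"
        if lowered in VERBAL_MARKERS:
            return "VG"
    return None  # not enough evidence within the window


if __name__ == "__main__":
    tokens = "Her apartment was on a floor by itself".split()
    print(screen(tokens[0]), guess_group(tokens, 1))
```

In this sketch the word apartment is attributed to the nominal group because the marker her stands immediately to its left. The real algorithm weighs several markers and applies its instructions in a fixed order, which this simplification does not attempt to model.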

