Linguistics – Prepositions – 2500 words

There are no differences in the frequency of use of general prepositions in the spoken and written subcorpora

I. Introduction

 

In this report I test the hypothesis stated in the title: that there are no differences in the frequency of use of general prepositions between the written and spoken subcorpora. I expect the hypothesis to hold, because I assume that prepositions are used equally often in both kinds of text. The report focuses on features rather than on categories.

 

The linguistic system I discuss is that of general prepositions. In the data I collected from ICECUPIII, general prepositions had more Text Units (40,163) and Hits (88,715) than “phrasal” or “interrogative” prepositions, which suggests that the feature “general prepositions” offers the most material for research. First I would like to explain their function in a text and give some contextualised examples.

 

A preposition is usually defined as a word that links a noun, pronoun or gerund to other words (www.usingenglish.com/glossary/preposition.html). Prepositions can express a variety of meanings:

 

  • Direction – He’s going to the shops
  • Location – It is in the box
  • Time – He left after the lesson had finished
  • Possession – The Government of Italy

 

This report analyses the relation between general prepositions and other tags. As methodology, I use the ICECUPIII tool to collect the data and carry out descriptive analyses, and Wordsmith Tools to generate a keyword list and to check whether the use of general prepositions differs significantly between the written and spoken tag lists of the two subcorpora.

 

 

 

The reason why I was interested in the investigation of general prepositions rather than other lexical or syntactical items was that prepositions are often a matter of confusion for foreigners. Sometimes even native speakers have difficulty distinguishing them. We often take prepositions for granted and only notice them when we experience difficulties in deciding which one to use. As high frequency words, prepositions are one of the most dynamic parts of speech. Used in unstressed syllables and pronounced fast and softly in oral speech, prepositions are often barely distinguishable to the listener.

 

II. Descriptive Analyses from Data-collection in ICECUPIII

 

I began my research by collecting data with the ICECUPIII tool to see whether the frequency of general prepositions differs between the written and spoken subcorpora. To that end I carried out some descriptive analyses. I compared the expected with the actual values by Hits in the written and spoken subcorpora, and then the expected with the actual values by Text Units (TU) in both subcorpora. My findings are shown in the table below and are analysed in the sections that follow.

 

Title of the Table: Data collection from ICECUPIII

 

Variable / Feature    Text Units    TU %      Hits         Hits %    Hits / TU
written               27,463        31.1 %    423,702      39.9 %    15.3
spoken                60,894        68.9 %    637,562      60.1 %    10.5

TOTALS                88,357                  1,061,264
PREPOSITIONS          43,605        32.2 %    102,394      9.6 %     2.3
general               40,163        92.1 %    88,715       86.6 %    2.2
Written               16,491        41.1 %    42,354       47.7 %    2.6
Spoken                23,672        58.9 %    46,361       52.3 %    1.9

 

 

 

II.1 Discussion

 

First of all, this table shows that the ratio of Hits per TU is considerably higher in the written than in the spoken texts: 15.3 compared to 10.5, i.e. more tags per TU in the written texts. Furthermore, the ratio of Hits per TU is almost the same for prepositions as a whole (2.3) as for general prepositions (2.2). Within the general prepositions, the ratio of Hits per TU differs between the written and the spoken texts: 2.6 compared to 1.9.

 

II.2 Comparison between expected and actual values by Hits in written and spoken subcorpora

 

Expected Values by Hits

Written: 88,715 x 39.9 % = 35,397
42,354 (Actual) – 35,397 = 6,957
6,957 / 35,397 = 20 % more than expected (only 0.2 times more frequent than expected)

Spoken: 88,715 x 60.1 % = 53,318
46,361 (Actual) – 53,318 = – 6,957
6,957 / 53,318 = 13.1 % less than expected (only 0.13 times less frequent than expected)

 

This result shows that the Hits for Prep. (general) deviate from the expected values in both subcorpora: the written Hits are 20 % higher and the spoken Hits 13.1 % lower than expected. This aspect is worth further investigation with p-values to test whether the difference is significant.
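The expected-versus-actual calculation in section II.2 can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not part of ICECUPIII; the function name `deviation_from_expected` is hypothetical, and the shares 39.9 % and 60.1 % are the Hits percentages from the table above.

```python
def deviation_from_expected(total_hits, share, actual):
    """Return (expected, difference, relative difference) for one subcorpus.

    total_hits: all hits of the feature in the whole corpus
    share:      proportion of the corpus that the subcorpus makes up (0..1)
    actual:     hits of the feature observed in the subcorpus
    """
    expected = total_hits * share
    difference = actual - expected
    return expected, difference, difference / expected


# Figures taken from the ICECUPIII table for general prepositions.
TOTAL_PREPGE_HITS = 88_715
subcorpora = {"written": (0.399, 42_354), "spoken": (0.601, 46_361)}

for name, (share, actual) in subcorpora.items():
    expected, difference, relative = deviation_from_expected(
        TOTAL_PREPGE_HITS, share, actual
    )
    print(f"{name}: expected {expected:,.0f}, actual {actual:,}, "
          f"deviation {relative:+.1%}")
```

Because the script does not round the intermediate values, its percentages can differ slightly in the last digit from the hand calculation above.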

 

II.3 Comparison between expected and actual values by TU in written and spoken subcorpora

 

Expected Values by TU

Written: 40,163 x 39.9 % = 16,025
16,491 (Actual) – 16,025 = 466
466 / 16,025 = 2.9 % more than expected (only 0.029 times more frequent than expected)

Spoken: 40,163 x 60.1 % = 24,138
23,672 (Actual) – 24,138 = – 466
466 / 24,138 = 1.93 % less than expected (only 0.0193 times less frequent than expected)

 

 

 

This result shows that there is slight variation from Expected results for Prep. (general) by Written and Spoken TUs.

 

 

III. Wordsmith Tools (PREPGE in Keyness-List)

 

In Wordsmith Tools I generated a keyness tag list in order to compare the tags in the written and spoken subcorpora. To do that, I first had to create a wordlist, following the steps described in more detail in the methodology section below. I then selected the written and spoken tag files under “choose text” and used the “compare 2 lists” tool to compare them. In this way all significant differences in tag frequency between the two files are calculated, including, where applicable, the one for PREPGE. The result was a tag list of key tags, headed by the most frequent tags in both texts: PUNCCOMMA, PUNCPER and NCOM.[1] I found the tag PREPGE in position 11. The table below shows my results from Wordsmith Tools.

 

Result:

 

Table 1: Tags in Keyness-List

 

Tag       Written tag %    Spoken tag %    Keyness    P-Value
PREPGE    3.88             3.01            1,263.8    0.000000

 

 

Discussion:

 

This table shows that my hypothesis, that there are no differences in the frequency of use of prepositions between the spoken and written subcorpora, is not correct.
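To make the idea of a keyness score more concrete, the sketch below computes a log-likelihood value for one tag in two corpora. It rests on assumptions: it assumes that the keyness statistic is the log-likelihood test (Wordsmith Tools also offers chi-square), the function name `log_likelihood` is hypothetical, and the input figures are the ICECUPIII Hits from section II rather than the tag-file counts used by Wordsmith Tools, so the output is not expected to reproduce the keyness of 1,263.8 in Table 1.

```python
import math


def log_likelihood(freq_a, total_a, freq_b, total_b):
    """Two-term log-likelihood score for one item in two corpora.

    freq_a, freq_b:   occurrences of the item in corpus A and corpus B
    total_a, total_b: total number of tokens (here: tags) in each corpus
    """
    combined_rate = (freq_a + freq_b) / (total_a + total_b)
    expected_a = total_a * combined_rate
    expected_b = total_b * combined_rate
    score = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score


# Illustrative input only: general-preposition Hits and total Hits
# per subcorpus from the ICECUPIII table (assumption, see lead-in).
score = log_likelihood(42_354, 423_702, 46_361, 637_562)

# Chi-square critical values for one degree of freedom.
print(f"log-likelihood = {score:.1f}")
print("significant at p < 0.05: ", score > 3.84)
print("significant at p < 0.001:", score > 10.83)
```

A score of 0 means that the item has the same relative frequency in both corpora; the larger the score, the less likely it is that the difference arose by chance.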

 

Prepositions are more frequent in the written than in the spoken subcorpora, and the difference is statistically significant. This agrees with both my estimated results from ICECUPIII (descriptive analysis) and my calculated results from Wordsmith Tools (keyness). A P-Value displayed as 0.000000 means that the probability of such a difference arising by chance is smaller than 0.000001; it indicates a highly significant result, not a 100 % proof. I had assumed the results to be equal in the written and spoken subcorpora. The following table may help to explain this unexpected result; in it I investigate the difference in numbers of the most frequent prepositions: of, in, on, at and to.

 

IV. Difference in the frequency of use of prepositions in both subcorpora

 

Table 2: Most frequent general prepositions

 

Frequent general prepositions    Total Corpus    Written subcorpora    % in written    Spoken subcorpora    % in spoken
of                               26,908          13,316                50 %            13,592               51 %
in                               16,191          7,438                 46 %            8,753                54 %
on                               4,995           2,117                 42 %            2,878                58 %
at                               3,731           1,633                 44 %            2,098                56 %
to                               6,356           2,925                 46 %            3,431                54 %

 

 

Calculation for “of” in the written and spoken subcorpora

“of” in the written subcorpora
26,908 x 40 % = 10,760 (rounded)
13,316 – 10,760 = 2,556
2,556 / 10,760 = 23.8 % more than expected

“of” in the spoken subcorpora
26,908 x 60 % = 16,200 (rounded)
13,592 – 16,200 = – 2,608
2,608 / 16,200 = 16.1 % less than expected

 

 

Applying the calculation shown above to each preposition, I compared the expected and actual frequency of every single preposition in the written and spoken texts with the corresponding percentages for general prepositions as a whole. I found the following.

 

 

 

 

Result:

 

PREPGE    Written subcorpora    Spoken subcorpora
of        23.8 %                16.1 %
in        19.9 %                9.94 %
on        5.9 %                 0.8 %
at        8.9 %                 4.6 %
to        17 %                  9.7 %

 

General prepositions in the written subcorpora: 20 % more than expected (according to my findings in section II.2)

General prepositions in the spoken subcorpora: 13.1 % less than expected (according to my findings in section II.2)

 

The deviation for “of” is larger than that for general prepositions as a whole in both subcorpora: 23.8 % against 20 % in the written subcorpora and 16.1 % against 13.1 % in the spoken subcorpora. The same comparison can be made for each of the other frequent prepositions, using the percentages outlined above. The result shows that the most frequent general prepositions each deviate from their expected frequency to a different degree than general prepositions as a whole. This is another argument against my hypothesis, because there are clear differences in the frequency of use among the most frequent prepositions in the two kinds of text.

 

V. Wordsmith-Tools (Concordance of Collocates and Patterns)

 

Methodology:

 

In Wordsmith-Tools I created a concordance list to analyse whether there are any patterns, collocates, clusters or plots in the two subcorpora. To do so I followed these steps:

 

  • I made datasets of the written and spoken subcorpora and saved them first as plain text and then as a tag file (name tag). I deactivated the Wordsmith setting “numbers included”.

 

  • Then I converted the plain text (name.txt) with the convert file, so that a wordlist of the clean text can be made and saved. This list can then be used to create a stop list (as a .txt file in Notepad); the first lines have to be deleted so that the list contains only words. The clean file has to be saved again (x/wsmith/text/folder).

 

  • The saved text file (name.txt) can be converted using “convtags”, following the same conversion procedure as above. The stop list mentioned above has to be activated and the “tags to ignore” option in the wordlist settings has to be deactivated. Now a wordlist of the clean tag file can be made and saved both as a .IST and as a .txt file.
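The purpose of the stop list in the steps above, as I understand it, is to remove the words from the tagged file so that only the tags are counted. The following Python sketch illustrates that idea outside Wordsmith Tools. It is not how Wordsmith Tools works internally; the function names and the file layout (each word followed by its tag) are assumptions made for the example.

```python
from collections import Counter


def make_stop_list(plain_tokens):
    """Collect every word of the clean plain text as a stop word."""
    return {token.lower() for token in plain_tokens}


def tag_wordlist(tagged_tokens, stop_list):
    """Count the tokens of the tagged text that are not stop words."""
    return Counter(
        token for token in tagged_tokens if token.lower() not in stop_list
    )


# Hypothetical miniature corpus: each word is followed by its tag.
plain = "the house of the father".split()
tagged = "the ARTDEF house NCOM of PREPGE the ARTDEF father NCOM".split()

stop_list = make_stop_list(plain)
for tag, frequency in tag_wordlist(tagged, stop_list).most_common():
    print(tag, frequency)
```

With the words filtered out, the resulting list contains only tag frequencies, which is the kind of list that can then be compared between the written and spoken files.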

 

V.1 Comparison between collocates in the spoken and written subcorpora:

 

Table 3: Collocates in the written subcorpora

 

NWordTotalLeftRightL4L3L2L1#R1R2R3R4
1PREPGE3608718053203412202380821600008011715
2NCOM2001910574944515806178377000314614549
3ARTDEF67122198451421921500041277380
4ADJGE50912440265117795660500011956788
5The4761624699133028009404117

 

Discussion:

My result shows that the two most frequent tags that collocate with general prepositions (PREPGE) are common nouns (NCOM) and the definite article (ARTDEF). This is the same finding as in the spoken subcorpora, although the frequencies and the positions in the sentence differ. The common noun is more frequent in the written subcorpora (20,019) than in the spoken (15,041). As in the spoken subcorpora, it appears most often in the first position to the left of the general preposition tag, but it is more frequent there in the written subcorpora (8,377). The tag ARTDEF is also more frequent in the written subcorpora (6,712) than in the spoken subcorpora (5,527). It appears most frequently in the second position on the right side. The word that collocates most often with general prepositions is “the” in both texts; it ranks higher in the spoken list (no. 4) than in the written list (no. 5).

 

 

Table 4: Collocates in spoken subcorpora:

 

NWordTotalLeftRightL4L3L2L1#R1R2R3R4
1PREPGE361791809620831105239671160001959201103
2NCOM15041706479771329257330052486295457
3ARTDEF552714944033143655300036362395
4THE41769440825710180367324016
5PRONPERS3631167019611102356500012054752

 

 

Discussion:

 

My result shows that the most common tag before a general preposition (PREPGE) is a common noun (NCOM): about one third of the total of 36,179 are common nouns. An example illustrating this result is: This is the house (NCOM) of (PREPGE) my father. This finding confirms the definition of general prepositions which I gave in the introduction. The personal pronoun (PRONPERS), which ranks fifth among the collocates of general prepositions, typically follows the general preposition: it appears 1,205 times in the second position on the right side. The definite article (ARTDEF) is the third most common tag linked with general prepositions. It appears most frequently (3,636 times) in the second position on the right side, after the general preposition tag. An example would be: With (PREPGE) Steve the (ARTDEF) things went right.
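How a collocate table with the columns L4 to R4 comes about can be shown with a small Python sketch. It is a simplified illustration, not the procedure of Wordsmith Tools; the function name `collocates_by_position` and the sample sentence are hypothetical, and the sketch assumes a tagged text in which each word is followed by its tag.

```python
from collections import Counter, defaultdict


def collocates_by_position(tokens, node, span=4):
    """Count, for every collocate of `node`, how often it occurs in each
    position from L<span> (left) to R<span> (right) around the node."""
    table = defaultdict(Counter)
    for index, token in enumerate(tokens):
        if token != node:
            continue
        for offset in range(-span, span + 1):
            neighbour = index + offset
            if offset == 0 or neighbour < 0 or neighbour >= len(tokens):
                continue  # skip the node itself and positions outside the text
            label = f"L{-offset}" if offset < 0 else f"R{offset}"
            table[tokens[neighbour]][label] += 1
    return table


# Hypothetical tagged sentence: each word is followed by its tag.
tokens = "the ARTDEF house NCOM of PREPGE the ARTDEF father NCOM".split()

table = collocates_by_position(tokens, node="PREPGE")
for collocate, positions in table.items():
    print(collocate, sum(positions.values()), dict(positions))
```

In such a word-plus-tag layout the position directly beside the node tag is always a word, which would explain why tags such as NCOM and ARTDEF reach their highest counts in the second position to the left or right.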

 

 

 

 

 

 

 

 

 

 

 

V.2 Comparison of patterns between spoken and the written subcorpora:

 

Table 5: Patterns in spoken subcorpora

 

NL5L4L3L2L1R1R2R3R4
1artdefitncomofprepgetheartdefand
2ncominvintrineightyncomthe
3pregeonepregetoeighteenpropof
4pronpesortadvgeforjewsitpronpei
5vmontrsvmontronthisin

 

 

Table 6: Patterns in written subcorpora

 

NL5L4L3L2L1R1R2R3R4
1artdefncomncomofprepgetheartdefpuncco
2adjgeandvmontrinsuchancompuncper
3ncomnproppregeforinitialnpropand
4prepinadjegetofourtoadjegethe
5vmontryouvintrbyintertidof

 

 

Discussion:

 

The tables above show the patterns that occur around the tag PREPGE. They show that the words “of” and “in” are the most frequent words on the left side, before the tag PREPGE, in both the written and the spoken subcorpora. The preposition “to” is used more frequently in this position in the spoken than in the written subcorpora. Another clear pattern is that the two most frequent tags on the left side, in both the spoken and the written subcorpora, are the definite article (ARTDEF, no. 1 in column L5) and the common noun (NCOM, no. 1 in column L3). The most frequent word after the tag PREPGE is “the”, in the second position on the right side in both texts.

 

 

 

VI. Summary and Conclusion:

 

This report analysed the linguistic system of general prepositions and the patterns and collocates in which they occur. My findings show that, despite similarities in the collocates and patterns of words and tags that accompany general prepositions, there are also differences between the spoken and written subcorpora. My results are consistent with the definition quoted in my introduction, that a preposition is a word that links a noun, pronoun or gerund to other words (www.usingenglish.com/glossary/preposition.html). Furthermore, this report shows that the hypothesis stated in the title, that there are no differences in the frequency of use of general prepositions (PREPGE), is false: both my estimated results from ICECUPIII and the calculated results in the keyness tag list show differences. This contradicted my expectations, as I had expected an equal or nearly equal distribution. Instead, my results show that general prepositions are more frequent in the written than in the spoken subcorpora. As time and word count are limited, I could not explore the interpretation of this result in detail, so there is scope for further research. The differences in frequency among the most frequent prepositions also illustrate that they are not equally distributed across the two subcorpora. In sum, the results of this report clearly indicate that the original hypothesis was not correct.

References:

 

McEnery, Tony, Richard Xiao and Yukio Tono. Corpus-Based Language Studies. Oxon: Routledge, 2006.

 

www.usingenglish.com/glossary/preposition.html

 

 

 

[1] PUNCCOMMA= Punctuation Comma, PUNCPER, NCOM= Common Noun