There are no differences in the frequency of use of general prepositions in the spoken and written subcorpora
In this scientific report I attempt to verify whether the hypothesis stated in the title is correct, namely that there are no differences in the frequency of use of general prepositions in the written and spoken subcorpora. I expect this thesis to be right, because I assume that prepositions are used with equal frequency in both types of text. This report focuses on features rather than on categories.
The linguistic system I discuss is that of general prepositions. When collecting my data with ICECUP III, I found a higher number of text units (40,163) and hits (88,715) for general prepositions than for “phrasal” or “interrogative” prepositions. This suggests that the feature “general prepositions” is the most productive one to research. First I would like to explain their function in a text and give some contextualised examples.
A preposition is usually defined as a word that links a noun, pronoun or gerund to other words (www.usingenglish.com/glossary/preposition.html). Prepositions can express a variety of meanings:
- Direction – He’s going to the shops
- Location – It is in the box
- Time – He left after the lesson had finished
- Possession – The Government of Italy
This scientific report analyses the relation between general prepositions and other tags. I use the ICECUP III tool to collect the data and carry out descriptive analyses. I then use WordSmith Tools to generate a keyword list and test whether the use of general prepositions differs significantly between the written and spoken tag lists of the two subcorpora.
I chose to investigate general prepositions rather than other lexical or syntactic items because prepositions are often a source of confusion for non-native speakers, and sometimes even native speakers have difficulty choosing between them. We often take prepositions for granted and only notice them when we are unsure which one to use. As high-frequency words, prepositions are among the most dynamic parts of speech. They usually occur in unstressed syllables and are pronounced quickly and softly in speech, so they are often barely distinguishable to the listener.
II. Descriptive Analyses of Data Collected in ICECUP III
I began my research by collecting data with the ICECUP III tool to see whether the frequency of use of general prepositions differs between the written and spoken subcorpora, and I carried out some descriptive analyses on these data. First I compared the expected and actual values by hits in the written and in the spoken subcorpus. I then did the same by text units (TU). My findings are shown in the table below and are analysed in the sections that follow.
Table: Data collected with ICECUP III

| Variable / feature | Text units (TU) | TU % | Hits | Hits % | Hits / TU |
|---|---|---|---|---|---|
| written | 27,463 | 31.1 % | 423,702 | 39.9 % | 15.4 |
| spoken | 60,894 | 68.9 % | 637,562 | 60.1 % | 10.5 |
| PREPOSITIONS | 43,605 | 32.2 % | 102,394 | 9.6 % | 2.3 |
| general | 40,163 | 92.1 % | 88,715 | 86.6 % | 2.2 |
| general: written | 16,491 | 41.1 % | 42,354 | 47.7 % | 2.6 |
| general: spoken | 23,672 | 58.9 % | 46,361 | 52.3 % | 2.0 |

First of all, the table shows that the ratio of hits per TU is considerably higher in the written than in the spoken texts (15.4 compared with 10.5), i.e. written texts contain more tags per TU. The ratio is also slightly higher for prepositions as a whole than for general prepositions (2.3 compared with 2.2). Finally, for general prepositions the ratio of hits per TU differs between written and spoken texts (2.6 compared with 2.0).
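The ratios and shares in the table can be recomputed from the raw counts. The following minimal Python sketch does this. The row labels are shorthand, and the counts are copied from the ICECUP III table, so it only checks the arithmetic, not the corpus queries themselves.

```python
# Counts copied from the ICECUP III table: (text units, hits).
COUNTS = {
    "written": (27_463, 423_702),
    "spoken": (60_894, 637_562),
    "PREPOSITIONS": (43_605, 102_394),
    "general": (40_163, 88_715),
    "general: written": (16_491, 42_354),
    "general: spoken": (23_672, 46_361),
}


def hits_per_tu(text_units: int, hits: int) -> float:
    """Average number of hits per text unit, rounded to one decimal place."""
    return round(hits / text_units, 1)


def share(part: int, whole: int) -> float:
    """Percentage that `part` makes up of `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


if __name__ == "__main__":
    for label, (tu, hits) in COUNTS.items():
        print(f"{label:18} {hits_per_tu(tu, hits):5.1f} hits per TU")
    # Share of general-preposition TUs and hits that fall in the written subcorpus.
    print("written TU share:", share(16_491, 40_163), "%")
    print("written hits share:", share(42_354, 88_715), "%")
```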
II.2 Comparison between expected and actual values by Hits in written and spoken subcorpora
Expected values by hits

Written: 88,715 × 39.9 % = 35,397 (expected)
42,354 (actual) − 35,397 (expected) = 6,957
6,957 / 35,397 ≈ 0.20, i.e. about 20 % more frequent than expected

Spoken: 88,715 × 60.1 % = 53,318 (expected)
46,361 (actual) − 53,318 (expected) = −6,957
6,957 / 53,318 ≈ 0.13, i.e. about 13 % less frequent than expected
This result shows that, by hits, general prepositions deviate from the expected values in both the written and the spoken texts. Written hits are about 20 % higher than expected, and spoken hits are correspondingly lower. This deviation is worth investigating further with p-values to establish whether it is significant.
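The expected-value calculation used in this section can be written as a small function. This Python sketch assumes, as the calculation above does, that general prepositions should be split between the subcorpora in the same proportion as all hits (39.9 % written, 60.1 % spoken). The function name `compare_to_expected` is hypothetical.

```python
def compare_to_expected(total: int, share_pct: float, actual: int):
    """Return (expected, actual - expected, relative difference in %).

    `expected` is the count we would see if the feature were distributed
    like the baseline share; the relative difference is measured against it.
    """
    expected = round(total * share_pct / 100)
    difference = actual - expected
    relative_pct = 100 * difference / expected
    return expected, difference, relative_pct


if __name__ == "__main__":
    TOTAL_HITS = 88_715  # hits for general prepositions in both subcorpora
    for label, share_pct, actual in [("written", 39.9, 42_354),
                                     ("spoken", 60.1, 46_361)]:
        exp, diff, rel = compare_to_expected(TOTAL_HITS, share_pct, actual)
        print(f"{label}: expected {exp:,}, actual {actual:,}, "
              f"difference {diff:+,} ({rel:+.1f} %)")
```

The same function applies unchanged to the text-unit figures in the next section (total 40,163).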
II.3 Comparison between expected and actual values by TU in written and spoken subcorpora
Expected values by TU

Written: 40,163 × 39.9 % = 16,025 (expected)
16,491 (actual) − 16,025 (expected) = 466
466 / 16,025 ≈ 0.029, i.e. about 2.9 % more frequent than expected

Spoken: 40,163 × 60.1 % = 24,138 (expected)
23,672 (actual) − 24,138 (expected) = −466
466 / 24,138 ≈ 0.019, i.e. about 1.9 % less frequent than expected
By text units, therefore, general prepositions deviate only slightly from the expected values in both the written and the spoken texts.
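Section II.2 suggests checking with p-values whether such deviations are significant. One way to do this is a chi-square goodness-of-fit test against the same expected split. The sketch below is a minimal Python version for exactly two categories (1 degree of freedom), applied to the TU figures. It assumes the 39.9 %/60.1 % baseline used above and is an illustration, not a procedure carried out in this report. For one degree of freedom the p-value equals erfc(√(χ²/2)).

```python
import math


def chi_square_gof_two(observed, expected_shares):
    """Pearson chi-square goodness-of-fit test for exactly two categories.

    Returns (statistic, p-value); two categories give 1 degree of freedom.
    """
    if len(observed) != 2 or len(expected_shares) != 2:
        raise ValueError("this sketch handles exactly two categories")
    total = sum(observed)
    expected = [total * share for share in expected_shares]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # Text units containing general prepositions: (written, spoken).
    stat, p = chi_square_gof_two((16_491, 23_672), (0.399, 0.601))
    print(f"chi-square = {stat:.2f}, p = {p:.2g}")
```

Because the test is driven by the counts and not only by the percentages, even a deviation of a few per cent can become significant when the numbers of text units are this large.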
III. WORDSMITH TOOLS (PREPGE in the Keyness List)
In WordSmith Tools I generated a keyness tag list to compare the tags in the written and spoken subcorpora. To do this, I first had to create a word list, following the steps described in more detail in the section on concordances below. I then selected the written and spoken tag files under “choose text” and used the “compare 2 lists” tool to compare them. This identifies the tags whose frequencies differ significantly between the two lists (possibly including PREPGE). The result was a list of key tags ranked by keyness, i.e. by how strongly their frequency differs between the two subcorpora. The top three were PUNCCOMMA, PUNCPER and NCOM, and the tag PREPGE was in position 11. Table 1 shows my results from WordSmith Tools.
Table 1: Tags in the keyness list

| Tag | Written tag % | Spoken tag % | Keyness | p-value |
|---|---|---|---|---|
This table shows that my hypothesis, that there are no differences in the frequency of use of prepositions in the spoken and written subcorpora, is not correct.
Prepositions are used more frequently in the written subcorpus than in the spoken one, so there is a significant key difference. This agrees with my descriptive results from ICECUP III and with the keyness calculated in WordSmith Tools. The p-value of 0.000000 (i.e. p < 0.000001) shows that this difference is highly significant. I had expected the frequencies to be equal in the written and spoken subcorpora. Table 2 may help to explain this unexpected result. In it, I investigated the frequencies of the most common prepositions: of, in, on, at and to.
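The keyness behind Table 1 can be illustrated with a minimal sketch of the log-likelihood (G²) statistic, a keyness measure widely used in corpus linguistics. It is not necessarily the exact setting WordSmith Tools used here. The sketch assumes that the ICECUP III hit totals for the written (423,702) and spoken (637,562) subcorpora can stand in for subcorpus sizes, and that the general-preposition hits (42,354 and 46,361) are the frequencies being compared.

```python
import math


def log_likelihood(freq_a: int, size_a: int, freq_b: int, size_b: int):
    """Log-likelihood (G2) keyness of one item in corpus A versus corpus B.

    Returns (G2, p-value), with the p-value taken from a chi-square
    distribution with 1 degree of freedom.
    """
    total_freq = freq_a + freq_b
    total_size = size_a + size_b
    # Expected frequencies if the item were spread in proportion to corpus size.
    expected_a = size_a * total_freq / total_size
    expected_b = size_b * total_freq / total_size
    g2 = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to G2
            g2 += observed * math.log(observed / expected)
    g2 *= 2
    p_value = math.erfc(math.sqrt(g2 / 2))
    return g2, p_value


if __name__ == "__main__":
    g2, p = log_likelihood(42_354, 423_702, 46_361, 637_562)
    print(f"G2 = {g2:.1f}, p = {p:.6f}")
```

A very large G² gives a p-value that prints as 0.000000 at six decimal places. This is the sense in which such a p-value means "smaller than 0.000001", not "certain".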
IV. Differences in the frequency of use of individual prepositions in both subcorpora
Table 2: Most frequent general prepositions
| Total corpus | Written subcorpus | % in written | Spoken subcorpus | % in spoken |
|---|---|---|---|---|
Calculation for “of” in the written and spoken subcorpora (using 40 % written and 60 % spoken as the expected shares)

“of” in the written subcorpus
26,908 × 40 % = 10,763 (expected)
13,316 (actual) − 10,763 = 2,553
2,553 / 10,763 ≈ 23.7 % more than expected

“of” in the spoken subcorpus
26,908 × 60 % = 16,145 (expected)
13,592 (actual) − 16,145 = −2,553
2,553 / 16,145 ≈ 15.8 % less than expected

I repeated this calculation for each of the most frequent prepositions and compared the results with the corresponding figures for general prepositions as a whole. The results are as follows:

| PREPGE | Written subcorpus (% above expected) | Spoken subcorpus (% below expected) |
|---|---|---|
| of | 23.7 % | 15.8 % |
| in | 19.9 % | 9.94 % |
| on | 5.9 % | 0.8 % |
| at | 8.9 % | 4.6 % |
| to | 17 % | 9.7 % |
| all general prepositions (Section II.2) | 20 % | 13 % |
In both subcorpora, “of” deviates more strongly from its expected frequency than general prepositions as a whole. It lies further above expected in the written subcorpus and further below expected in the spoken subcorpus. The same comparison for the other frequent prepositions shows that their deviations also differ from that of general prepositions as a whole in both types of text. This is a further argument against my hypothesis, because the most frequent prepositions clearly differ in their frequency of use between the two types of text.
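A 2 × 2 chi-square test of independence (subcorpus × “of”/other general prepositions) could test directly whether “of” is distributed differently from the other general prepositions. The Python sketch below is a hypothetical illustration of that test, not a calculation carried out in this report. It takes the counts for “of” from the calculation above. It also assumes that subtracting these counts from the general-preposition hits (42,354 written, 46,361 spoken) gives the counts for all other general prepositions.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square test of independence for the table

        [[a, b],    # row 1: written  (of, other general prepositions)
         [c, d]]    # row 2: spoken   (of, other general prepositions)

    without continuity correction. Returns (statistic, p-value), 1 df.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    of_written, of_spoken = 13_316, 13_592
    general_written, general_spoken = 42_354, 46_361
    stat, p = chi_square_2x2(of_written, general_written - of_written,
                             of_spoken, general_spoken - of_spoken)
    print(f"chi-square = {stat:.1f}, p = {p:.2g}")
```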
V. WordSmith Tools (Concordance of Collocates and Patterns)
In WordSmith Tools I created a concordance to analyse whether there are any patterns, collocates, clusters or plots in the two subcorpora. To do this, I followed these steps:
- I made datasets of the written and spoken subcorpora and saved each first as plain text and then as a tag file (name tag). In the WordSmith settings I deactivated “numbers included”.
- I then converted the plain text (name.txt) with the convert file function, so that a word list of the clean text could be made and saved. This list can then be used to create a stop list (saved as a .txt file in Notepad). The first lines of this list must be deleted so that it contains only words, and the clean file then has to be saved again (x/wsmith/text/folder).
- The saved text file (name.txt) can be converted with “convtags” by following the same procedure as above. The stop list mentioned above has to be activated, and “tags to ignore” in the word-list settings has to be deactivated. A word list of the clean tag file can now be made and saved both as a .lst file and as a .txt file.
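The stop-list step can also be done outside Notepad. In that step, the first lines of the exported word list are deleted so that only the words remain. The following is a minimal Python sketch. The export layout it assumes is hypothetical: a fixed number of header lines, tab-separated columns, and the word in a known column. It has to be adjusted to the actual file.

```python
from typing import List


def make_stop_list(exported_text: str, header_lines: int,
                   word_column: int = 0) -> List[str]:
    """Drop the first `header_lines` lines of a word-list export and keep
    only the word from each remaining tab-separated line.

    `header_lines` and `word_column` are assumptions about the export
    format, not WordSmith settings.
    """
    words = []
    for line in exported_text.splitlines()[header_lines:]:
        fields = line.split("\t")
        # Skip blank lines and lines that have no value in the word column.
        if len(fields) > word_column and fields[word_column].strip():
            words.append(fields[word_column].strip())
    return words


if __name__ == "__main__":
    # Hypothetical export: one header line, then "rank, word, frequency".
    sample = "N\tWord\tFreq.\n1\tTHE\t500\n2\tOF\t300\n"
    print("\n".join(make_stop_list(sample, header_lines=1, word_column=1)))
```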
V.1 Comparison between collocates in the spoken and written subcorpora:
Table 3: Collocates in the written subcorpus
My results show that the two tags that collocate most frequently with general prepositions (PREPGE) are common nouns (NCOM) and the definite article (ARTDEF). This matches the finding for the spoken subcorpus, although the positions and frequencies differ in the written subcorpus. The common noun is more frequent in the written subcorpus (20,019) than in the spoken subcorpus (15,041). It occurs most often in the first position to the left of the general preposition (L1), where it is again more frequent in the written subcorpus (8,377). The tag ARTDEF is more frequent in the written subcorpus (6,712) than in the spoken subcorpus (6,712). It occurs most often in the second position to the right (R2), and there too it is more frequent in the written than in the spoken subcorpus. In both subcorpora, the word that collocates most often with general prepositions is “the”. Its rank is higher in the spoken than in the written texts (No. 4 compared with No. 5).
Table 4: Collocates in the spoken subcorpus
My results show that the most common tag before a general preposition (PREPGE) is the common noun (NCOM): about one third of the 36,179 collocates are common nouns. An example is: This is the house (NCOM) of (PREPGE) my father. This finding confirms the definition of prepositions given in the introduction. PRONPERS (personal pronoun), which ranks fifth among the tags collocating with general prepositions, appears after the general preposition.
It occurs 1,205 times in the second position to the right (R2). The definite article (ARTDEF) is the third most common tag linked with general prepositions. It occurs most frequently (3,636 times) in the second position to the right of the general preposition. An example would be: with (PREPGE) Steve the (ARTDEF) things went right.
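A small Python sketch can illustrate the positional counts reported above (L1, R2 and so on) by counting the tags at each position around every PREPGE token. It works on a hypothetical list of (word, tag) pairs rather than on WordSmith's own output. The tag PRONPOSS in the example is an assumed label, not taken from the corpus.

```python
from collections import Counter, defaultdict


def positional_collocates(tagged, node_tag="PREPGE", span=2):
    """Count the tags at positions L<span>..L1 and R1..R<span> around every
    token whose tag is `node_tag`. `tagged` is a list of (word, tag) pairs.
    Returns a mapping such as {"L1": Counter({"NCOM": 1}), ...}."""
    counts = defaultdict(Counter)
    for i, (_, tag) in enumerate(tagged):
        if tag != node_tag:
            continue
        for offset in range(-span, span + 1):
            j = i + offset
            # Skip the node itself and positions outside the sentence.
            if offset == 0 or not 0 <= j < len(tagged):
                continue
            label = f"L{-offset}" if offset < 0 else f"R{offset}"
            counts[label][tagged[j][1]] += 1
    return counts


if __name__ == "__main__":
    # Example from the text; PRONPOSS is a hypothetical tag.
    sentence = [("the", "ARTDEF"), ("house", "NCOM"), ("of", "PREPGE"),
                ("my", "PRONPOSS"), ("father", "NCOM")]
    for position, tags in sorted(positional_collocates(sentence).items()):
        print(position, dict(tags))
```

Summing these counters over a whole subcorpus gives the kind of position-by-tag frequencies shown in Tables 3 and 4.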
V.2 Comparison of patterns between the spoken and written subcorpora:
Table 5: Patterns in the spoken subcorpus
Table 6: Patterns in the written subcorpus
Tables 5 and 6 show the patterns in which the tag PREPGE occurs. In both the written and the spoken subcorpus, “of” and “in” are the most frequent words to the left of PREPGE. The preposition “to” occurs to the left of PREPGE more frequently in the spoken than in the written subcorpus. Another clear pattern is that the two most frequent tags to the left in both subcorpora are the definite article (ARTDEF, no. 1 in column L5) and the singular common noun (NCOM, no. 1 in column L3). In both texts the most frequent word after PREPGE is “the”, in the second position to the right (R2).
VI. Summary and Conclusion:
This scientific report analysed the linguistic system of general prepositions and the patterns and collocates in which they occur. The words and tags that form collocates and patterns with general prepositions are similar in both subcorpora, but my findings also show differences between the spoken and written subcorpora. My results confirm the grammar rule, mentioned in the introduction, that a preposition is a word that links a noun, pronoun or gerund to other words (www.usingenglish.com/glossary/preposition.html). Furthermore, this report shows that the hypothesis stated in the title is false: there are differences in the frequency of use of general prepositions (PREPGE). Both the descriptive results from ICECUP III and the keyness values calculated in WordSmith Tools show these differences. This contradicted my expectations, as I had expected an equal or nearly equal distribution. Instead, my results show that general prepositions are more frequent in the written subcorpus than in the spoken subcorpus. As time and space were limited, I could not explore the interpretation of this result in detail, so there is scope for further research. The differences in frequency among the most frequent prepositions also show that they are not equally distributed across the two subcorpora. In sum, the results of this report clearly indicate that the original thesis was not correct.
PUNCCOMMA = punctuation (comma), PUNCPER = punctuation (period), NCOM = common noun