Linguistics - Prepositions - 2500 words

There are no differences in the frequency of use of general prepositions in the spoken and written subcorpora

Introduction

This scientific report tests the hypothesis stated above: that there are no differences in the frequency of use of general prepositions between the written and spoken subcorpora. I expect the hypothesis to be right, because I think that prepositions are used equally often in both kinds of text. This report will focus on features and not on categories.

The linguistic feature I discuss is the general preposition. When collecting my data from ICECUPIII, I found a higher number of Text Units (40,163) and Hits (88,715) for general prepositions than for “phrasal” or “interrogative” prepositions, which suggests that the feature “general prepositions” offers the most data to investigate. First I would like to explain the function of prepositions in a text and give some contextualised examples.

A preposition is usually defined as a word that links a noun, pronoun or gerund to other words (www.usingenglish.com/glossary/preposition.html). Prepositions can express a variety of meanings:

Direction – He’s going to the shops
Location – It is in the box
Time – He left after the lesson had finished
Possession – The Government of Italy

This scientific report will analyse the relation between general prepositions and other tags. As my methodology, I will use the ICECUPIII tool to collect the data and carry out descriptive analyses, and Wordsmith Tools to generate a keyword list and to see whether there is a significant difference in the use of general prepositions between the written and spoken tag lists of the two subcorpora.

I was interested in investigating general prepositions rather than other lexical or syntactic items because prepositions are often a source of confusion for non-native speakers. Sometimes even native speakers have difficulty distinguishing them. We often take prepositions for granted and only notice them when we have difficulty deciding which one to use. As high-frequency words, prepositions are one of the most dynamic parts of speech. Because they occur in unstressed syllables and are pronounced quickly and softly in speech, prepositions are often barely distinguishable to the listener.

Descriptive Analyses from Data-collection in ICECUPIII

I began my research by collecting data with the ICECUPIII tool to see whether there is a difference in the frequency of use of general prepositions in the written and spoken subcorpora. To this end I carried out some descriptive analyses. I compared the expected values by Hits with the actual values by Hits in the written and spoken subcorpora. Then I compared the expected values by Text Units (TU) with the actual values by TU in the written and spoken subcorpora. My findings are shown in the table below and are analysed in the sections that follow.

Title of the Table: Data collection from ICECUPIII

Variable	Feature	Text Units	TU %	Hits	Hits %	Hits / TU

written		27,463	31.1 %	423,702	39.9 %	15.3
spoken		60,894	68.9 %	637,562	60.1 %	10.5

TOTALS		88,357		1,061,264

PREPOSITIONS		43,605	32.2 %	102,394	9.6 %	2.3
	general	40,163	92.1 %	88,715	86.6 %	2.2
Written		16,491	41.1 %	42,354	47.7 %	2.6
Spoken		23,672	58.9 %	46,361	52.3 %	1.9

II.1 Discussion

First of all, this table shows that the ratio of Hits per TU is considerably higher in the written than in the spoken texts: 15.3 compared to 10.5, i.e. more tags per TU in the written texts. Furthermore, the ratio of Hits per TU is slightly higher for prepositions as a whole than for general prepositions: 2.3 compared to 2.2. The ratio of Hits per TU for general prepositions also differs between the written and spoken texts: 2.6 compared to 1.9.

II.2 Comparison between expected and actual values by Hits in written and spoken subcorpora

Expected Values by Hits

Written: 88,715 x 39.9 % = 35,397

42,354 (actual) - 35,397 = 6,957

6,957 / 35,397 = 20 % more than expected (only 0.2 times more frequent than expected)

Spoken: 88,715 x 60.1 % = 53,318

46,361 (actual) - 53,318 = -6,957

6,957 / 53,318 = 13.0 % less than expected (only 0.130 times less frequent than expected)

This result shows that the actual Hits for general prepositions deviate from the expected values in both subcorpora: the written Hits are about 20 % higher than expected and the spoken Hits about 13 % lower. This is worth further investigation using p-values to test for a significant difference.
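The comparison of expected and actual Hits above can be sketched in Python. This is only a minimal illustration under the assumption that the expected value is the total number of general-preposition Hits multiplied by each subcorpus's share of all Hits; the function and variable names are my own and are not part of ICECUPIII.

```python
def deviation_from_expected(total_hits, share, actual):
    """Return (expected, difference, relative deviation) for one subcorpus."""
    expected = total_hits * share
    difference = actual - expected
    return expected, difference, difference / expected


# Figures from the ICECUPIII table: 88,715 general-preposition Hits in total;
# the written texts hold 39.9 % of all Hits, the spoken texts 60.1 %.
TOTAL_GENERAL_HITS = 88_715
written = deviation_from_expected(TOTAL_GENERAL_HITS, 0.399, 42_354)
spoken = deviation_from_expected(TOTAL_GENERAL_HITS, 0.601, 46_361)

for name, (expected, diff, rel) in (("written", written), ("spoken", spoken)):
    print(f"{name}: expected {expected:,.0f}, difference {diff:+,.0f}, {rel:+.1%}")
```

The same function could be applied to the Text Unit counts by passing the TU totals instead of the Hits.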

II.3 Comparison between expected and actual values by TU in written and spoken subcorpora

Expected Values by TU

Written: 40,163 x 39.9 % = 16,025

16,491 (actual) - 16,025 = 466

466 / 16,025 = 2.9 % more than expected (only 0.029 times more frequent than expected)

Spoken: 40,163 x 60.1 % = 24,138

23,672 (actual) - 24,138 = -466

466 / 24,138 = 1.93 % less than expected (only 0.0193 times less frequent than expected)

This result shows only a slight deviation from the expected values for general prepositions by TU in the written and spoken subcorpora.

III. WORDSMITH TOOLS (PREPGE in Keyness-List)

In Wordsmith Tools I generated a keyness tag list in order to compare the tags in the written and spoken subcorpora. To do that, I first had to create a wordlist, following the steps which I describe in more detail in the methodology section. I then selected the written and spoken tag files under “choose text” and used the “compare 2 lists” tool to compare them. In this way all significant differences in tag frequency between the two lists are calculated (possibly including PREPGE). The result was a tag list with key tags showing the most frequent tags in both texts. These were PUNCCOMMA, PUNCPER and NCOM.[1] I found the tag PREPGE at position 11. The table below shows my results from Wordsmith Tools.

Result:

Table 1: Tags in Keyness-List

Tag	Writtentag %	Spokentag %	Keyness	P-Value
PREPGE	3.88	3.01	1,263.8	0.000000
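How a keyness score of this kind can be computed is sketched below. The sketch assumes that keyness is a log-likelihood statistic (one of the statistics commonly used for keyword analysis) and uses the two-term form that is widespread in corpus linguistics. The function name and the choice of input counts are my own assumptions: the counts are the ICECUPIII Hits from the descriptive analysis, not the Wordsmith tag files.

```python
import math


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Two-term log-likelihood score for one item in two corpora."""
    total_freq = freq_a + freq_b
    total_size = size_a + size_b
    # Expected frequencies if the item were spread evenly over both corpora.
    expected_a = size_a * total_freq / total_size
    expected_b = size_b * total_freq / total_size
    score = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score


# Illustrative input: general-preposition Hits and all Hits per subcorpus.
ll_prepge = log_likelihood(42_354, 423_702, 46_361, 637_562)
print(f"log-likelihood: {ll_prepge:.1f}")
```

With one degree of freedom, a score above 3.84 corresponds to p < 0.05 and a score above 10.83 to p < 0.001. Because the input differs from the tag files used in Wordsmith Tools, the printed score is not expected to reproduce the keyness value in Table 1.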

Discussion:

This table indicates that my hypothesis, that there are no differences in the frequency of use of prepositions in the spoken and written subcorpora, is not correct.

The frequency of use of prepositions is higher in the written subcorpora than in the spoken, so there is a significant key difference. This agrees with both my descriptive results from ICECUPIII and my calculated results in Wordsmith Tools (keyness). The p-value, displayed as 0.000000 (i.e. smaller than 0.000001), shows that the difference is statistically highly significant. I had assumed the results to be equal in the written and spoken subcorpora. A possible explanation for this unexpected result may be found in the following table, in which I investigated the differences in the numbers of the most frequent prepositions: of, in, on, at and to.

Difference in the frequency of use of prepositions in both subcorpora

Table 2: Most frequent general prepositions

Frequent general prepositions	Total Corpus	Written subcorpora	% in written	Spoken subcorpora	% in spoken
of	26,908	13,316	49%	13,592	51%
in	16,191	7,438	46%	8,753	54%
on	4,995	2,117	42%	2,878	58%
at	3,731	1,633	44%	2,098	56%
to	6,356	2,925	46%	3,431	54%

Calculation for “of” in the written and spoken subcorpora

“of” in the written subcorpora

26,908 x 40 % = 10,763

13,316 - 10,763 = 2,553

2,553 / 10,763 = 23.7 %

“of” in the spoken subcorpora

26,908 x 60 % = 16,145

13,592 - 16,145 = -2,553

2,553 / 16,145 = 15.8 %

Applying the same calculation to each preposition gives the deviation of its actual frequency from its expected frequency in the written and spoken texts. This can then be compared with the corresponding deviation for general prepositions as a whole.

Result:

PREPGE	Written subcorpora (more than expected)	Spoken subcorpora (less than expected)
of	23.7 %	15.8 %
in	14.8 %	9.9 %
on	6.0 %	4.0 %
at	9.4 %	6.3 %
to	15.0 %	10.0 %

General prepositions in the written subcorpora: 20 % more than expected (according to my findings in section II.2)

General prepositions in the spoken subcorpora: about 13 % less than expected (according to my findings in section II.2)

The deviation from the expected frequency is larger for “of” than for general prepositions as a whole, both in the written and in the spoken subcorpora. The same comparison can be made for each of the frequent prepositions, using the percentages outlined above. The result shows that the most frequent general prepositions deviate from their expected frequencies to different degrees, and differently from general prepositions as a whole, in both kinds of text. This is a further argument against my hypothesis, because there are clear differences in frequency of use among the most frequent prepositions across the two kinds of text.
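Repeating the calculation for every preposition in Table 2 can be sketched as a short loop. The sketch assumes the rounded 40 % / 60 % split used in the worked example for “of”. The names are illustrative, and exact arithmetic may give slightly different percentages from hand-rounded figures.

```python
# (written, spoken) Hits per preposition, taken from Table 2.
COUNTS = {
    "of": (13_316, 13_592),
    "in": (7_438, 8_753),
    "on": (2_117, 2_878),
    "at": (1_633, 2_098),
    "to": (2_925, 3_431),
}
WRITTEN_SHARE, SPOKEN_SHARE = 0.40, 0.60  # rounded split, as in the worked example


def relative_deviations(written, spoken):
    """Return the relative deviation from expectation for each subcorpus."""
    total = written + spoken
    expected_written = total * WRITTEN_SHARE
    expected_spoken = total * SPOKEN_SHARE
    return (
        (written - expected_written) / expected_written,
        (spoken - expected_spoken) / expected_spoken,
    )


for prep, (written_hits, spoken_hits) in COUNTS.items():
    dev_written, dev_spoken = relative_deviations(written_hits, spoken_hits)
    print(f"{prep}: written {dev_written:+.1%}, spoken {dev_spoken:+.1%}")
```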

Wordsmith-Tools (Concordance of Collocates and Patterns)

Methodology:

In Wordsmith-Tools I created a concordance list to analyse whether there are any patterns, collocates, clusters or plots in both subcorpora. To do so, I followed these steps:

I made datasets of the written and spoken subcorpora and saved each first as plain text and then as a tag file (name tag). I deactivated the Wordsmith setting “numbers included”.

Then I converted the plain text file (name.txt) with the convert file, so that a wordlist of the clean text could be made and saved. This list can then be used to create a stop list (as a .txt file in Notepad); the first lines have to be deleted so that the list contains only words. The clean file has to be saved again (x/wsmith/text/folder).

The saved text file (name.txt) can be converted using “convtags”, following the same conversion procedure as above. The stop list mentioned above has to be activated and the “tags to ignore” option in the wordlist settings has to be deactivated. A wordlist of the clean tag file can now be made and saved both as a .lst file and as a .txt file.

V.1 Comparison between collocates in the spoken and written subcorpora:

Table 3: Collocates in the written subcorpora

N	Word	Total	Left	Right	L4	L3	L2	L1	#	R1	R2	R3	R4
1	PREPGE	36087	18053	2034	1220	23	808	2	16000	0	801	17	15
2	NCOM	20019	10574	9445	1580	617	8377	0	0	0	3146	145	49
3	ARTDEF	6712	2198	4514	2192	1	5	0	0	0	4127	7	380
4	ADJGE	5091	2440	2651	1779	56	605	0	0	0	1195	67	88
5	The	4761	62	4699	1	33	0	28	0	0	9	404	117

Discussion:

My result shows that the two tags which collocate most frequently with general prepositions (PREPGE) are common nouns (NCOM) and the definite article (ARTDEF). This is the same finding as in the spoken subcorpora, although the frequencies differ. The common noun appears more frequently in the written subcorpora (20019) than in the spoken (15041). In both subcorpora it occurs most often in position L2 to the left of the general preposition tag, but more frequently in the written texts (8377 compared to 5733). The tag ARTDEF also appears more frequently in the written subcorpora (6712) than in the spoken subcorpora (5527). It appears most frequently in the second position to the right (R2), again more often in the written (4127) than in the spoken subcorpora (3636). The word which collocates most with general prepositions is “the” in both texts. It ranks higher in the spoken than in the written texts (no. 4 compared to no. 5), although its total frequency is higher in the written texts (4761 compared to 4176).

Table 4: Collocates in spoken subcorpora:

N	Word	Total	Left	Right	L4	L3	L2	L1	#	R1	R2	R3	R4
1	PREPGE	36179	18096	2083	1105	23	967	1	16000	1	959	20	1103
2	NCOM	15041	7064	7977	1329	2	5733	0	0	5	2486	29	5457
3	ARTDEF	5527	1494	4033	1436	5	53	0	0	0	3636	2	395
4	THE	4176	94	4082	5	71	0	18	0	3673	2	401	6
5	PRONPERS	3631	1670	1961	1102	3	565	0	0	0	1205	4	752

Discussion:

My result shows that the most common tag before a general preposition (PREPGE) is a common noun (NCOM). About one third of the total of 36179 are common nouns. An example illustrating this result is: This is the house (NCOM) of (PREPGE) my father. This finding confirms the definition of prepositions which I gave in the introduction. PRONPERS (personal pronoun), which ranks fifth among the collocates of general prepositions, most often appears after the general preposition.

It appears 1205 times in the second position to the right (R2). The definite article (ARTDEF) is the third most common tag linked with general prepositions. It appears most frequently (3636) in the second position to the right of the general preposition tag. An example would be: With (PREPGE) Steve the (ARTDEF) things went right.
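The idea behind the collocate tables, counting which tags occur at each position from L4 to R4 around a node tag, can be illustrated with a small sketch. This is not how Wordsmith Tools is implemented. The function name and the short tag sequence (a hypothetical tagging of "the book in the big box") are assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def collocate_positions(tokens, node, span=4):
    """Count the items found at each offset (-span..+span) around `node`."""
    table = defaultdict(Counter)  # item -> Counter({offset: count})
    for i, token in enumerate(tokens):
        if token != node:
            continue
        for offset in range(-span, span + 1):
            j = i + offset
            if offset != 0 and 0 <= j < len(tokens):
                table[tokens[j]][offset] += 1
    return table


# Hypothetical tag sequence for "the book in the big box".
tokens = ["ARTDEF", "NCOM", "PREPGE", "ARTDEF", "ADJGE", "NCOM"]
table = collocate_positions(tokens, "PREPGE")

for tag, offsets in table.items():
    left = sum(n for off, n in offsets.items() if off < 0)
    right = sum(n for off, n in offsets.items() if off > 0)
    print(tag, "total", left + right, "left", left, "right", right, dict(offsets))
```

Negative offsets correspond to the columns L1 to L4 and positive offsets to R1 to R4.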

V.2 Comparison of patterns between the spoken and written subcorpora:

Table 5: Patterns in spoken subcorpora

N	L5	L4	L3	L2	L1	R1	R2	R3	R4
1	artdef	it	ncom	of	prepge		the	artdef	and
2	ncom	in	vintr	in		eighty		ncom	the
3	prege	one	prege	to		eightee		nprop	of
4	pronpe	sort	advge	for		jews	it	pronpe	i
5	vmontr	s	vmontr	on			this		in

Table 6: Patterns in written subcorpora

N	L5	L4	L3	L2	L1	R1	R2	R3	R4
1	artdef	ncom	ncom	of	prepge		the	artdef	puncco
2	adjge	and	vmontr	in		such	a	ncom	puncper
3	ncom	nprop	prege	for		initial		nprop	and
4	prep	in	adjege	to		four	to	adjege	the
5	vmontr	you	vintr	by		intertid			of

Discussion:

The tables above show the patterns in which the tag PREPGE occurs. The words “of” and “in” are the most frequent words appearing to the left of the tag PREPGE in both the written and the spoken subcorpora. The preposition “to” ranks higher in this position in the spoken subcorpora than in the written (no. 3 compared to no. 4). Another clear pattern is that the two most frequent tags which appear to the left in both the spoken and the written subcorpora are the definite article (ARTDEF, no. 1 in column L5) and the common noun (NCOM, no. 1 in column L3). The most frequent word which comes after the tag PREPGE is “the”, in the second position to the right, in both texts.

Summary and Conclusion:

This scientific report analysed the linguistic feature of general prepositions and the patterns and collocates in which they occur. My findings show that, despite similarities in the collocates and patterns of words and tags occurring with general prepositions, there are also differences between the spoken and written subcorpora. My results confirm the grammar rule, mentioned in my introduction, that a preposition is a word that links a noun, pronoun or gerund to other words (www.usingenglish.com/glossary/preposition.html). Furthermore, this report shows that the hypothesis outlined in the title, that there are no differences in the frequency of use of general prepositions (PREPGE), is false, because both my descriptive results from ICECUPIII and the calculated results in the keyness tag list reveal differences. This contradicted my expectations, as I had expected an equal or nearly equal distribution. Instead, my results show that the frequency of general prepositions is higher in the written subcorpora than in the spoken subcorpora. As time and words are limited, I could not explore the interpretation of that specific result in detail, so there is scope for further research. The differences in frequency among the most frequent prepositions also illustrate that they are not equally distributed across the two subcorpora. In sum, the results of this report clearly indicate that the original hypothesis was not correct.

References:

McEnery, Tony, Richard Xiao and Yukio Tono. Corpus-Based Language Studies. Oxon: Routledge, 2006.

www.usingenglish.com/glossary/preposition.html

[1] PUNCCOMMA = punctuation (comma), PUNCPER = punctuation (period), NCOM = common noun