Corpus in Focus (CIF)

30 Aug 2023

Here are eleven of the most widely used statistical methods in #CorpusLinguistics. #CorpusStatistics
Frequency Analysis: This is the cornerstone of corpus linguistics, used to identify the frequency of words, phrases, or syntactic structures. Biber’s “Variation Across Speech and Writing” (1988) is a classic study that employed frequency analysis to distinguish between spoken and written registers. #FrequencyAnalysis
Collocation Analysis: This method identifies words that tend to appear together more often than would be expected by chance. Stefanowitsch and Gries’ “Collostructions” (2003) is a notable study in this area. #CollocationAnalysis
Concordance Analysis: This involves examining all the occurrences of a particular word or phrase within its context. John Sinclair’s work, particularly in “Corpus, Concordance, Collocation” (1991), has been foundational. #ConcordanceAnalysis
Keyness Analysis: This method identifies statistically significant words in a corpus compared to a reference corpus. Paul Rayson’s “Matrix: A Statistical Method and Software Tool for Linguistic Analysis” (2003) is a key reference. #KeynessAnalysis
Cluster Analysis: This is used to group similar items in a corpus, often revealing patterns or themes. Douglas Biber’s “University Language” (2006) employed cluster analysis to study academic registers. #ClusterAnalysis
Mutual Information: This measures the strength of association between two words. Church and Hanks’ “Word Association Norms, Mutual Information, and Lexicography” (1990) is a seminal paper that introduced this concept. #MutualInformation
Log-Likelihood Ratio: This is used to test the significance of the difference between two proportions, often in comparing corpora. Dunning’s “Accurate Methods for the Statistics of Surprise and Coincidence” (1993) is a key study here. #LogLikelihoodRatio
Principal Component Analysis (#PCA): This reduces the dimensionality of the data while retaining most of the original variance. Baayen et al.’s “Mixed-effects modeling with crossed random effects for subjects and items” (2008) utilized PCA. #PrincipalComponentAnalysis
Chi-Square Test: This tests the independence of two categorical variables. It was notably used in McEnery, Xiao, and Tono’s “Corpus-Based Language Studies” (2006). #ChiSquareTest
T-Score: This measures the “bond” between two words in a collocation. “Collocations in Use” (Hill and Lewis, 1997) is a study that employed T-Scores. #TScore
Log-Dice Statistics: This method is an advancement over Mutual Information and is particularly useful for large corpora. It provides a normalized score that allows for better comparison of word associations across different datasets. Rychlý’s “A Lexicographer-Friendly Association Score” (2008) is a seminal paper that introduced log-dice statistics as a lexicographer-friendly measure. #LogDiceStatistics
Each of these methods has its own advantages and specific applications, making them invaluable in the toolkit of any corpus linguist. Whether you’re exploring lexical trends or syntactic structures, these methods offer robust, reliable ways to make sense of the data.
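
As a rough illustration of three of the association measures named in this post (Mutual Information, T-Score, and Log-Dice), here is a minimal Python sketch. It uses the formulas as they are commonly stated in the corpus-linguistics literature; the function name, its parameters, and the counts in the example call are invented for illustration and do not come from any particular tool or corpus.

```python
import math

def association_scores(o, f1, f2, n):
    """Return MI, t-score and logDice for one word pair.

    o  : observed co-occurrence count of the pair (assumed > 0)
    f1 : corpus frequency of the first word
    f2 : corpus frequency of the second word
    n  : corpus size in tokens
    """
    # Co-occurrences expected if the two words were independent.
    expected = f1 * f2 / n
    # Mutual Information: log2 of observed over expected.
    mi = math.log2(o / expected)
    # T-score: difference from expectation, scaled by sqrt(observed).
    t_score = (o - expected) / math.sqrt(o)
    # logDice: 14 plus log2 of the Dice coefficient; it does not use n.
    log_dice = 14 + math.log2(2 * o / (f1 + f2))
    return {"mi": mi, "t_score": t_score, "log_dice": log_dice}

# Hypothetical counts, chosen only to exercise the function.
scores = association_scores(o=30, f1=200, f2=150, n=1_000_000)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Because logDice depends only on the pair and word frequencies, not on the corpus size, its values are easier to compare across corpora of different sizes, which is the property the post attributes to it.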

Corpus in Focus (CIF) @FocusCorpus

6 Jul 2023

Corpus linguistics offers us an empirical way to dissect the language of propaganda. Let's break it down:
Keyword Analysis: This can help identify the most frequent or salient words used in propaganda texts. For instance, words like 'freedom,' 'patriot,' or 'enemy' may appear more frequently to elicit specific responses.
Collocation Analysis: By studying how words tend to group together, we can understand the implicit meanings or connotations being conveyed. In war-time propaganda, 'sacrifice,' 'honor,' and 'duty' may often collocate.
Concordance Analysis: This provides insight into how certain words are used in context, enabling us to understand underlying messages. For example, studying the concordances of 'us' and 'them' might reveal an 'us vs. them' narrative.
Metaphor Analysis: Propaganda often uses metaphors to persuade. A corpus can help uncover these, such as the metaphor of a 'battle' in public health messaging about COVID-19.
Diachronic Analysis: This examines changes over time. Tracking shifts in language use can reveal how propaganda evolves to suit changing political climates.
Corpus linguistics, thus, provides robust tools for exposing the linguistic mechanics of persuasion and manipulation in propaganda, fostering our critical literacy in an age of information overload.
#CorpusLinguistics #PropagandaLanguage #KeywordAnalysis #CollocationAnalysis #ConcordanceAnalysis #MetaphorAnalysis #DiachronicAnalysis
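
To make the keyword and concordance steps in this post more concrete, here is a small Python sketch. It scores keywords with the two-corpus log-likelihood statistic often used for keyness and prints keyword-in-context (KWIC) lines. The two "corpora" are invented toy sentences, and the helper names (`tokenize`, `log_likelihood`, `kwic`) are hypothetical, not taken from any existing tool.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Very rough tokenizer: lowercase alphabetic strings only.
    return re.findall(r"[a-z']+", text.lower())

def log_likelihood(a, b, c, d):
    """Keyness (G2) for a word seen a times in a target corpus of c tokens
    and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the target corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    g2 = 0.0
    for observed, expected in ((a, e1), (b, e2)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2

def kwic(tokens, node, width=3):
    """Return one keyword-in-context line per occurrence of node."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

# Invented toy texts standing in for a propaganda corpus and a reference corpus.
target = tokenize("The enemy threatens our freedom. "
                  "Every patriot must defend freedom from the enemy.")
reference = tokenize("The committee met on Tuesday. "
                     "The report discusses freedom of movement and trade.")

t_counts, r_counts = Counter(target), Counter(reference)
# Score only words that occur in the target corpus.
keyness = {w: log_likelihood(t_counts[w], r_counts[w], len(target), len(reference))
           for w in t_counts}
top_keywords = sorted(keyness, key=keyness.get, reverse=True)[:3]
concordance = kwic(target, "enemy")

print(top_keywords)
print("\n".join(concordance))
```

On texts this small the scores are far too low to be statistically meaningful; the point is only to show the shape of the calculation. A real study would also check whether a word is overused or underused relative to the reference corpus, which this sketch does not distinguish.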

Corpus in Focus (CIF) @FocusCorpus

6 Jul 2023

Question of the day: How does corpus linguistics aid in understanding the language of propaganda? #CorpusLinguistics #PropagandaLanguage #KeywordAnalysis #CollocationAnalysis #ConcordanceAnalysis #MetaphorAnalysis #DiachronicAnalysis

Corpus in Focus (CIF) @FocusCorpus

2 Jul 2023

Question of the day: How can corpus linguistics help us understand political rhetoric? #CorpusLinguistics #PoliticalRhetoric #LanguageInPolitics #KeywordAnalysis #CollocationAnalysis #ConcordanceAnalysis #DiachronicCorpusAnalysis

Corpus in Focus (CIF) @FocusCorpus

20 Apr 2023

"Collocation analysis is a commonly used statistical technique in corpus linguistics that examines the co-occurrence of words in a corpus. This technique is used to identify the most frequent and significant word combinations in a language." #CollocationAnalysis
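
The co-occurrence counting described in this quote can be sketched in a few lines of Python. This is only the first step of a collocation analysis: it counts which words fall within a fixed window around a node word, and leaves the significance testing to an association measure applied afterwards. The function name, the window size, and the example sentence are all assumptions made for illustration.

```python
import re
from collections import Counter

def collocates(tokens, node, window=2):
    """Count the words that occur within `window` tokens of each `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start = max(0, i - window)
        # Left and right context, excluding the node itself.
        context = tokens[start:i] + tokens[i + 1:i + 1 + window]
        counts.update(context)
    return counts

# Invented example sentence, not drawn from a real corpus.
text = "strong tea and strong coffee but never powerful tea"
tokens = re.findall(r"[a-z]+", text.lower())
tea_collocates = collocates(tokens, "tea", window=2)

print(tea_collocates.most_common())
```

Raw window counts favor frequent words such as "the" or "and", which is why they are usually passed on to a measure such as Mutual Information, T-Score, or Log-Dice before any pair is called a collocation.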

Christopher Nygren @chris_nygren

7 Apr 2017

Hoping to prove once and for all that 'paragone' doesn't mean what you think! #collocationanalysis @adlangmead