Stefan Gries

Stefan Gries of the University of California, Santa Barbara will speak at the UCSD Linguistics Department Colloquium on April 12, 2010, at 2:00 pm in AP&M 4301.

Data-driven approaches in corpus linguistics: the role of granularity for register variation, temporal stages, and temporal change

Corpus linguistics is inherently a distributional discipline. Corpora contain nothing but things to count: frequencies of occurrence (of morphemes, words, lemmas, n-grams, utterances, texts, etc.), frequencies of co-occurrence (of words, words and patterns, patterns and patterns, etc.), and distributions of elements (within and across files, texts, and registers). Thus, any subject studied corpus-linguistically must be operationalized in terms of counts and dispersions.
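
As a minimal illustration of these three kinds of counts (a sketch that is not taken from the talk), the Python snippet below computes frequencies of occurrence, frequencies of co-occurrence, and a simple dispersion measure. The toy corpus, its file names, and the choice of adjacent word pairs and file range as operationalizations are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical toy corpus: file name -> tokenized text (invented for illustration).
corpus = {
    "spoken_01": "the cat sat on the mat".split(),
    "written_01": "the dog chased the cat".split(),
    "written_02": "a dog barked".split(),
}

# Frequency of occurrence: how often each word occurs in the whole corpus.
word_freq = Counter(tok for tokens in corpus.values() for tok in tokens)

# Frequency of co-occurrence: counts of adjacent word pairs within each file.
bigram_freq = Counter(
    pair for tokens in corpus.values() for pair in zip(tokens, tokens[1:])
)

# Dispersion, measured here simply as range: the number of files containing each word.
word_range = Counter(tok for tokens in corpus.values() for tok in set(tokens))

print(word_freq["the"], bigram_freq[("the", "cat")], word_range["the"])
```

Even in this sketch, counting over the whole corpus rather than per file or per register is already a decision about granularity.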

However, a decision in favor of a particular operationalization requires potentially treacherous decisions regarding the desired or required level of granularity. In many contemporary corpus-linguistic studies, such decisions are made arbitrarily and top-down, that is, a priori. In this talk, I will argue in favor of (i) a more widespread use of different kinds of bottom-up approaches and (ii) more frequent and more thorough exploration, as well as combination, of different levels of granularity in corpus-linguistic studies. To exemplify these arguments, I will use studies of register variation as well as diachronic change.

