Exploring Online Language Resources

A course in the IT Learning Programme at the University of Oxford in Hilary Term 2016 will look at how we can use online language datasets to explore language, history and culture. This course is the latest stage in the evolution of the ‘Corpus Linguistics’ course, which has run for the past few years.

We now have at our fingertips huge amounts of language data in digital form, representing unprecedented opportunities for exploring and analysing language and discourse. How can we use the evidence of language usage in digital resources to draw conclusions about language, culture and society? Drawing on techniques and methods from corpus linguistics, this course will offer guidance on finding and evaluating digital sources, hands-on exercises to explore and analyse data, and some suggestions on how to assess, use and interpret evidence from digital sources.

Each session will focus on the exploration and analysis of a different corpus or dataset, with practical hands-on exercises showing how to use the resource to find evidence bearing on linguistic, socio-cultural and historical research questions. Participants are free to attend the whole course or individual sessions, although beginners in particular will benefit from attending the whole course and building week by week on the techniques and insights offered by each session. Prior registration with the online services described below will save time on the day.

1. BNCWeb – exploring a corpus of late twentieth-century English

12:30-13:30 Thursday 28th January (HT week 2) at IT Services, Banbury Road. Sign up for free to attend the course here.

The British National Corpus is a very widely used and cited dataset, which was designed and built in the 1990s to provide a representative and balanced sample of modern British English, in speech and writing, across a number of varieties in a wide range of contexts. This session will introduce and explore basic concepts of corpus design and construction, and introduce techniques, functions and methods for corpus analysis. BNCWeb is a customized application of CQPweb, with the facilities to exploit and analyse the linguistic annotation of the texts in the BNC, and to make use of the detailed descriptions of the sources. Participants can access BNCWeb with Oxford single sign-on via a link at https://ota.oerc.ox.ac.uk/ (and can also register to use the BNCWeb service hosted at Lancaster University, which will be the back-up in case of problems with BNCWeb at Oxford).

2. CQPweb – exploring a range of corpora

12:30-13:30 Thursday 4th February (HT week 3) at IT Services, Banbury Road. Sign up for free to attend the course here.

The online application CQPweb offers an interface to a powerful corpus search and analysis engine which can be applied to any textual dataset. CQPweb is an open source software application, deployed at many institutions around the world to offer access to a wide range of corpora. This session will focus on mining a large corpus from Early English Books Online for historical information. Participants should register to use the service at http://cqpweb.lancs.ac.uk/.

3. corpus.byu.edu – historical and cultural investigations

12:30-13:30 Thursday 11th February (HT week 4) at IT Services, Banbury Road. Sign up for free to attend the course here.

The set of large corpora hosted at Brigham Young University includes contemporary and historical corpora of British and American English, Spanish and Portuguese, as well as the Hansard Corpus of UK parliamentary proceedings. This session will extend the exploration beyond linguistic research questions to historical and political texts. Participants should register to use the service at http://corpus.byu.edu/.

4. The Oxford English Corpus – lexicography and beyond

12:30-13:30 Thursday 18th February (HT week 5) at IT Services, Banbury Road. Sign up for free to attend the course here.

The Oxford English Corpus, and related datasets, offer the opportunity to explore current and recent trends in the English language, via a very large and growing corpus which is regularly updated with new texts. This corpus is used by the lexicographers at Oxford University Press to create and update entries in the Oxford English Dictionary and other dictionaries, reference works and teaching materials, and can also be used to monitor and discover social trends via the discourses revealed in the data. The Oxford English Corpus uses the SketchEngine software to manage, filter and reveal patterns in these multi-billion-word corpora. Log-in credentials for the Oxford English Corpus are kindly supplied by OUP and will be issued during the tutorial session.

5. Exploring modern European languages with CLARIN

12:30-13:30 Thursday 25th February (HT week 6) at IT Services, Banbury Road. Sign up for free to attend the course here.

A wealth of corpora and other language resources are becoming more easily available to researchers thanks to the CLARIN European Research Infrastructure Consortium. The UK has recently joined CLARIN as an Observer, allowing access to all UK researchers with institutional single sign-on via the UK Federation. We’ll take a whistle-stop tour of some of the available languages and corpora, with a focus on the facility for Federated Content Search, which finds hits for a search term across a large number of resources held in different repositories. Access is available to all University of Oxford users, and via institutional single sign-on to users in higher education institutions from participating countries (see more about access at http://clarin.eu/content/easy-access-protected-resources).

The course will take place on Thursday lunchtimes in weeks 2-6 of Hilary Term 2016, at IT Services, Banbury Road, and will be taught by Ylva Berglund Prytz and Martin Wynne of IT Services, University of Oxford. It is open to all members of the University of Oxford and there is no charge.
