Do we need language corpora?

The speakers in the debate with Martin Wynne (OTA)

Speakers and Chair

The ICAME32 conference in Oslo started with a number of pre-conference workshops. The Oxford Text Archive was involved in one – a debate on the motion:

“Language corpora are no longer necessary for linguistic research.”

The debate was recorded, and a podcast will be produced and made available later. In the meantime, here follow a few illustrations of some of the arguments put forward. They should be seen as neither comprehensive nor necessarily representative of the debate as a whole, but are offered as a taster of what the participants had to say.

The debate was opened by Silvia Bernardini (University of Bologna), who spoke in favour of the motion. She argued that the availability of large quantities of digital text has changed the world of corpus building and use. Earlier, when textual material in digital form was rare, corpus building had to be done by experts, and corpora were small. Today, we can find material online that can inform us about language, and we should use it.

Janne Bondi Johannessen (University of Oslo) spoke against the motion. She argued that we need carefully crafted spoken corpora to answer certain questions about language. On the web, even data that may appear speech-like (such as chat room exchanges) still show greater similarity to written than to spoken language.

In her opening statement for the motion, Elena Tognini-Bonelli (University of Siena) discussed how we need to change our methodology as we get other types of data to work with. We cannot use the same methods as before, when we were working with small, well-defined sets of data. As corpus linguists, we need to develop new query languages and new ways of filtering the new types of data we now have.

The last of the four speakers, Gregory Garretson (Uppsala University), spoke against the motion. He maintained that one problem with studying the language we find on the web is that we do not know what this language represents. Using a corpus allows us to make comparisons and makes our studies replicable – doing the same investigation again will return the same result, an important feature of science.

After the four opening statements, the floor was opened to general debate and discussion. It was encouraging to see that this is obviously a question people can relate to, as a large proportion of the audience took part and shared their thoughts. Many good points were put forward, as can be heard in the podcast once it is published.

At the end of the discussion, each of the four speakers offered a closing remark before the participants voted. The motion was defeated, possibly a fortunate result considering that the debate took place just before the formal opening of an annual corpus linguistics conference. After all, if corpus linguists do not believe in corpora for linguistic research, who does?
