Corpora – Emma Tamlyn writes:

As a student member attending my first CPD event, I was a little unsure of what to expect from the Corpora workshop on 28th February, but I think it’s fair to say that everyone got a lot out of the session with Dr Ana Frankenberg-Garcia. It was similar to the very popular workshop that she gave to the ITI last September.

The event took place in the ERIC computer cluster at the University of Leeds (a room that I, along with the other students who attended, am almost too familiar with!) and started with a quick check of the attendees’ language pairs.

After a general introduction to the world of corpora, Ana explained the four main types of corpus that can be of use to a translator or interpreter:

1) Parallel Corpora: these are corpora in which the source text and its translation are displayed in parallel (surprise, surprise!). They are similar to Linguee, but use more principled and carefully selected texts. Some examples of parallel corpora are Europarl, which uses translated documents from the European Parliament, and EUR-Lex, which uses translated EU legal documents, including those of the European Court of Justice.

2) General Language Corpora: these are corpora which contain just one language, and can be used in much the same way as checking specific phrases in Google, but again, using more principled and carefully selected texts. Some examples that Ana provided were CREA for Spanish, DeReKo for German, and the British National Corpus (BNC) for English. Ana did warn us that the BNC was created in the 1990s, so some of the information may be out of date – for example, ‘Internet’ barely features!

3) Specialised Corpora: these can be assembled using subject-specific texts, and can help translators to get to grips with the specific terminology of a particular field.

4) Ad Hoc Corpora made for a specific translation or interpreting job: these can help a translator or interpreter to become quickly acquainted with the vocabulary that may be needed.

After some refreshments, we returned to the computers for a demonstration of how to use Sketch Engine (there is a free trial available for anyone who is curious!), before having a go ourselves.

Ana showed us how to perform a concordance search to check which preposition should follow ‘married’, ‘with’ or ‘to’, and we were quickly able to see that in the context Ana had described, the answer was ‘married to’. We then moved on to collocations, that is, words that are frequently associated with other words. The example that Ana gave was ‘opinion’, and which verbs are often used with the noun.
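For anyone curious about what a concordance search does under the bonnet, here is a minimal sketch in Python. It is not how Sketch Engine itself works: the function names and the four sample sentences are made up purely for illustration, and a real corpus would be far larger. It lists each occurrence of a keyword with its surrounding context, then counts which word comes straight after it.

```python
import re
from collections import Counter


def concordance(text, keyword, width=30):
    """Return one keyword-in-context line per match of `keyword` (hypothetical helper)."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    lines = []
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the keyword lines up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines


def next_word_counts(text, keyword):
    """Count the words that immediately follow `keyword` (hypothetical helper)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(nxt for cur, nxt in zip(tokens, tokens[1:]) if cur == keyword)


# Invented sample text, standing in for a real corpus.
sample = (
    "She is married to a doctor. He was married to his work. "
    "They got married with great ceremony. Married to the job, she rarely rested."
)

for line in concordance(sample, "married"):
    print(line)

counts = next_word_counts(sample, "married")
print(counts.most_common())
```

With a genuine corpus behind it, the same kind of count is what lets you see at a glance which preposition is the usual choice.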

The collocations feature is also useful for finding synonyms once you’ve exhausted all that Word or the internet has to offer, and Sketch Engine can even provide you with a helpful word cloud!

We then explored some ready-made corpora, before creating our own using the EEA’s State of Europe’s Seas report. From this we were able to learn what MSFD stands for, and which verbs are often used with ‘seafloor’.
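The idea of finding words that keep company with ‘seafloor’ can be sketched very roughly in Python. This is only an illustration under some loud assumptions: the three sample sentences are invented rather than taken from the EEA report, the stopword list is a tiny made-up one, and the code simply counts neighbouring words within a small window. It does no part-of-speech tagging, so it cannot tell verbs from nouns, and it uses raw counts rather than any statistical association score.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"the", "a", "of", "and", "is", "by", "to", "in", "can"}


def collocates(text, node, window=3):
    """Count words appearing within `window` tokens of `node` (hypothetical helper)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            start = max(0, i - window)
            neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
            counts.update(w for w in neighbours if w not in STOPWORDS)
    return counts


# Invented sentences standing in for an ad hoc corpus.
sample = (
    "Bottom trawling can damage the seafloor. "
    "Surveys map the seafloor in detail. "
    "Dredging may damage the seafloor habitat."
)

seafloor_counts = collocates(sample, "seafloor")
print(seafloor_counts.most_common(5))
```

Note that the window here runs across sentence boundaries, which a proper corpus tool would normally handle more carefully.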

Ana suggested some other tools that may be worth checking out if you’re interested in corpora: AntConc and WordSmith Tools. She also explained that Sketch Engine allows you to create as many corpora as you want, and even to download them to your own machine. She assured us that she doesn’t work for Sketch Engine, and that she’s just a big fan of the site! After her fantastic workshop, I and many other attendees are now converts! Thanks for a great session, Ana.