Success of the 2nd CLARIAH-CM Training Course on Historical Newspapers
The 2nd CLARIAH-CM course focused on the use of various tools for processing historical newspapers: Label Studio, Sketch Engine and AI programming notebooks
12 May 2026 - 10:18 CET
On 27 April, the second CLARIAH-CM training cycle on digital tools and methodologies came to a successful conclusion. This cycle focused on historical press, and in particular on tools for analysing and processing a corpus of historical Spanish-language press. This year’s theme aligns with CLARIAH-ES’s participation in the CLARIN PressMint project, which aims to develop an interoperable corpus of European historical press; tools of this kind are therefore of great help in streamlining researchers’ work.
On 27 February, Yanco Torterolo (UAM-UNED) led an introductory workshop on Label Studio, an open-source data-labelling tool used here for the automatic transcription of documents. He demonstrated how it serves as a user-friendly, collaborative interface for applying language models to the automatic transcription of historical press documents.
A month later, on 23 March, researcher Olga Batiukova led a workshop on managing a press corpus using Sketch Engine, focusing in particular on how to compile and analyse a corpus once it has been properly transcribed. In this way, the workshops complement one another and enable researchers to follow the entire workflow, from corpus preparation to analysis using digital tools.
Finally, on 27 April, Rocío Ortuño and Juan Cigarrán demonstrated how to use generative AI, specifically Claude, to automate many tasks involved in corpus curation and subsequent computational analysis, using a Google Colab notebook.
Videos of the three sessions are available on the CLARIAH-CM Node’s YouTube channel and in the ‘Training’ section of our website.



