PoeTree: Poetry Treebanks in Czech, English, French, German, Hungarian, Italian, Portuguese, Russian, Slovenian and Spanish

DOI:

https://doi.org/10.1163/24523666-bja10044

Keywords:

poetry, computational poetics, corpus linguistics, digital humanities

Abstract

This article presents a set of standardised poetry corpora comprising over 330,000 poems in ten languages (Czech, English, French, German, Hungarian, Italian, Portuguese, Russian, Slovenian, and Spanish). Each corpus has been deduplicated, annotated with Universal Dependencies, supplied with additional metadata, and converted into a unified JSON structure.
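
As a rough illustration of two of the steps named in the abstract (deduplication and conversion to JSON), the following Python sketch removes exact duplicates by hashing normalised poem text and then serialises the result. It is not the authors' pipeline: the field names (`id`, `lang`, `lines`), the normalisation rules, and the exact-match strategy are all assumptions made for the example, and the real PoeTree schema may differ.

```python
import hashlib
import json
import unicodedata


def normalise(text: str) -> str:
    """Apply NFKC, case-fold, and collapse whitespace so trivial variants compare equal."""
    folded = unicodedata.normalize("NFKC", text).casefold()
    return " ".join(folded.split())


def deduplicate(poems: list[dict]) -> list[dict]:
    """Keep the first poem for each distinct normalised text (exact-match dedup only)."""
    seen: set[str] = set()
    unique = []
    for poem in poems:
        body = "\n".join(poem["lines"])
        digest = hashlib.sha256(normalise(body).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(poem)
    return unique


# Hypothetical records: the field names are illustrative, not the PoeTree schema.
poems = [
    {"id": "a1", "lang": "en", "lines": ["Tyger Tyger, burning bright,"]},
    {"id": "a2", "lang": "en", "lines": ["tyger  tyger, burning bright,"]},
    {"id": "b1", "lang": "cs", "lines": ["Byl pozdní večer, první máj,"]},
]

unique_poems = deduplicate(poems)
# ensure_ascii=False keeps diacritics readable in the serialised output.
print(json.dumps(unique_poems, ensure_ascii=False, indent=2))
```

Here the second record differs from the first only in case and spacing, so it is dropped. Near-duplicate detection (for example, variant editions of the same poem) would need a fuzzier comparison than the exact hash used in this sketch.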

Author Biographies

  • Petr Plecháč, Czech Academy of Sciences

    Corresponding author
    Institute of Czech Literature, Czech Academy of Sciences, Prague, Czechia

  • Silvie Cinková, Czech Academy of Sciences

    Institute of Czech Literature, Czech Academy of Sciences, Prague, Czechia
    Charles University, Prague, Czechia

  • Robert Kolár, Czech Academy of Sciences

    Institute of Czech Literature, Czech Academy of Sciences, Prague, Czechia

  • Artjoms Šeļa, Polish Academy of Sciences

    Institute of Polish Language, Polish Academy of Sciences, Warsaw, Poland

  • Mirella De Sisto, Tilburg University

    Tilburg University, Tilburg, the Netherlands

  • Lara Nugues, University of Basel

    University of Basel, Basel, Switzerland

  • Neža Kočnik, University of Ljubljana



Published

2025-10-10

Section

Data Papers

How to Cite

PoeTree: Poetry Treebanks in Czech, English, French, German, Hungarian, Italian, Portuguese, Russian, Slovenian and Spanish. (2025). Research Data Journal for the Humanities and Social Sciences, 9, 1-17. https://doi.org/10.1163/24523666-bja10044