PoeTree: Poetry Treebanks in Czech, English, French, German, Hungarian, Italian, Portuguese, Russian, Slovenian and Spanish
DOI:
https://doi.org/10.1163/24523666-bja10044
Keywords:
poetry, computational poetics, corpus linguistics, digital humanities
Abstract
This article presents a set of standardised poetry corpora comprising over 330,000 poems in ten languages (Czech, English, French, German, Hungarian, Italian, Portuguese, Russian, Slovenian, and Spanish). Each corpus has been deduplicated, annotated with Universal Dependencies, supplemented with additional metadata, and converted into a unified JSON structure.
Published
2025-10-10
Section
Data Papers
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
How to Cite
PoeTree: Poetry Treebanks in Czech, English, French, German, Hungarian, Italian, Portuguese, Russian, Slovenian and Spanish. (2025). Research Data Journal for the Humanities and Social Sciences, 9, 1-17. https://doi.org/10.1163/24523666-bja10044
