The Diorisis Ancient Greek Corpus

Authors

  • A. Vatri, The Alan Turing Institute & University of Oxford
  • B. McGillivray, The Alan Turing Institute & University of Cambridge

DOI:

https://doi.org/10.1163/24523666-01000013

Keywords:

digital corpus, Greek, lemmatization, annotated corpus, classics, historical linguistics

Abstract

The Diorisis Ancient Greek Corpus is a digital collection of Ancient Greek texts (from Homer to the early fifth century AD) compiled for linguistic analysis, and specifically for developing a computational model of semantic change in Ancient Greek. The corpus consists of 820 texts sourced from open access digital libraries. The texts have been automatically enriched with morphological information for each word. The automatic assignment of words to the correct dictionary entry (lemmatization) has been disambiguated by means of a part-of-speech tagger (a computer programme that selects the part of speech to which an ambiguous word belongs).
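
The role of the part-of-speech tagger in lemmatization can be illustrated with a minimal sketch. It is not the authors' implementation: the lookup table, the names `CANDIDATES` and `disambiguate`, and the tag labels are all hypothetical, and the single entry is hand-written for illustration rather than taken from the corpus. The form τιμῶν can be read as a form of the noun τιμή or of the verb τιμάω, so the tag supplied by a tagger decides which dictionary entry is assigned.

```python
import unicodedata
from typing import NamedTuple, Optional


class Analysis(NamedTuple):
    lemma: str  # dictionary entry
    pos: str    # part of speech (hypothetical tag labels)


def nfc(text: str) -> str:
    """Normalize Greek text so equivalent accent encodings compare equal."""
    return unicodedata.normalize("NFC", text)


# Hypothetical, hand-written table of candidate analyses per word form.
# A real lemmatizer would derive these from a morphological lexicon.
CANDIDATES = {
    nfc("τιμῶν"): [Analysis("τιμή", "noun"), Analysis("τιμάω", "verb")],
}


def disambiguate(form: str, tagged_pos: str) -> Optional[str]:
    """Return the lemma whose part of speech matches the tagger's choice.

    Returns None when the form is unknown or when the tag does not
    single out exactly one candidate.
    """
    candidates = CANDIDATES.get(nfc(form), [])
    matching = [a for a in candidates if a.pos == tagged_pos]
    if len(matching) == 1:
        return matching[0].lemma
    return None


if __name__ == "__main__":
    # The tag would normally come from a part-of-speech tagger.
    print(disambiguate("τιμῶν", "noun"))
    print(disambiguate("τιμῶν", "verb"))
```

In this sketch a form is left unresolved when the tag matches no candidate or more than one; how the corpus handles such cases is not stated in the abstract.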

Published

2018-11-02

Section

Data Papers

How to Cite

Vatri, A., & McGillivray, B. (2018). The Diorisis Ancient Greek Corpus. Research Data Journal for the Humanities and Social Sciences, 3, 1-11. https://doi.org/10.1163/24523666-01000013