Text-Fabric Dataset of the Samaritan Pentateuch

Authors

  • Martijn Naaijer, University of Copenhagen, https://orcid.org/0009-0006-3325-0614
  • Christian Canu Højgaard, Fjellhaug International University College
  • Stefan Schorch, The Hebrew University of Jerusalem
  • Martin Ehrensvärd, University of Copenhagen

DOI:

https://doi.org/10.1163/24523666-bja10051

Keywords:

Hebrew, Samaritans, Pentateuch, Bible, linguistics, digital corpus

Abstract

In this article, the authors present a dataset of the text of the Samaritan Pentateuch with word-level linguistic annotations. The Samaritan Pentateuch is an important early witness of the Pentateuch or Torah. The dataset is based on a transcription generally taken from manuscript Dublin, Chester Beatty Library 751 (Genesis 1:1–Deuteronomy 32:36) and supplemented from manuscript Nablus (Kiryat Luza), Samaritan Synagogue, Garizim 1, where the former manuscript has not preserved the text (Deuteronomy 32:36b–34:10). It is published as a Text-Fabric dataset. Text-Fabric is a Python package for processing annotated text corpora, so the dataset comes with an app in which the text can be inspected and queried using the annotations. The dataset can also be used for textual and linguistic research with Python scripts and compared with other relevant textual datasets that follow the same annotation conventions.

Author Biographies

  • Martijn Naaijer, University of Copenhagen

    Corresponding author
    Faculty of Theology, University of Copenhagen, Copenhagen, Denmark

  • Christian Canu Højgaard, Fjellhaug International University College

    Fjellhaug International University College, Copenhagen, Denmark

  • Stefan Schorch, The Hebrew University of Jerusalem

    Faculty of Humanities, The Hebrew University of Jerusalem, Israel

  • Martin Ehrensvärd, University of Copenhagen

    Faculty of Theology, University of Copenhagen, Copenhagen, Denmark

Published

2025-10-10

Section

Data Papers

How to Cite

Naaijer, M., Højgaard, C. C., Schorch, S., & Ehrensvärd, M. (2025). Text-Fabric Dataset of the Samaritan Pentateuch. Research Data Journal for the Humanities and Social Sciences, 9, 1–13. https://doi.org/10.1163/24523666-bja10051