The Longitudinal IntermediaPlus (2014–2016)

A Case Study in Structuring Unstructured Big Data

Authors

  • Inga Brentel Department for Communication and Media Studies, Institute of Social Science, Heinrich-Heine-University
  • Kristi Winters GESIS

DOI:

https://doi.org/10.1163/24523666-06010001

Keywords:

big data, big data documentation, data harmonization, German media data, longitudinal data, unstructured data

Abstract

This article details the novel structure developed to handle, harmonize, and document big data for reuse and long-term preservation. ‘The Longitudinal IntermediaPlus (2014–2016)’ dataset is uniquely rich: it covers an array of German online media, extendable to cross-media channels and user information. The metadata file for this dataset, together with its documentation, was recently deposited as a separate MySQL database, charmstana_sample_14-16.sql (cs16; https://data.gesis.org/sharing/#!Detail/10.7802/2030), which is suitable for generating descriptive statistics. Analogous to the ‘Data View’ in SPSS, the charmstana_analysis (ca) file contains the dataset’s numerical values. Both the cs16 and ca MySQL files are needed to conduct analysis on the full database. The research challenge was to process large-scale datasets into a single longitudinal big-data source that is suitable for academic research and complies with the FAIR principles. The authors review four methodological recommendations that can serve as a framework for solving big-data structuring challenges, using the harmonization software CharmStats.

Published

2021-07-07

Section

Data Papers

How to Cite

Brentel, I., & Winters, K. (2021). The Longitudinal IntermediaPlus (2014–2016): A Case Study in Structuring Unstructured Big Data. Research Data Journal for the Humanities and Social Sciences, 6, 1–16. https://doi.org/10.1163/24523666-06010001