The Longitudinal IntermediaPlus (2014–2016)
A Case Study in Structuring Unstructured Big Data
DOI:
https://doi.org/10.1163/24523666-06010001
Keywords:
big data, big data documentation, data harmonization, German media data, longitudinal data, unstructured data
Abstract
This article details the novel structure developed to handle, harmonize, and document big data for reuse and long-term preservation. The big data dataset ‘The Longitudinal IntermediaPlus (2014–2016)’ is uniquely rich: it covers an array of German online media, extendable to cross-media channels and user information. The metadata file for this dataset, together with its documentation, was recently deposited as its own MySQL database, charmstana_sample_14-16.sql (cs16) (https://data.gesis.org/sharing/#!Detail/10.7802/2030), and is suitable for generating descriptive statistics. Analogous to the ‘Data View’ in SPSS, the charmstana_analysis (ca) database contains the dataset’s numerical values. Both the cs16 and ca MySQL files are needed to conduct analysis on the full database. The research challenge was to process large-scale datasets into one longitudinal big data source that is suitable for academic research and complies with the FAIR principles. The authors review four methodological recommendations that can serve as a framework for solving big-data structuring challenges, using the harmonization software CharmStats.
License
Copyright (c) 2021 Inga Brentel, Kristi Winters

This work is licensed under a Creative Commons Attribution 4.0 International License.
