The Longitudinal IntermediaPlus (2014–2016): A Case Study in Structuring Unstructured Big Data

Inga Brentel; Kristi Winters

doi:10.1163/24523666-06010001

Authors

Inga Brentel Department for Communication and Media Studies, Institute of Social Science, Heinrich-Heine-University
Kristi Winters GESIS

DOI:

https://doi.org/10.1163/24523666-06010001

Keywords:

big data, big data documentation, data harmonization, German media data, longitudinal data, unstructured data

Abstract

This article details the novel structure developed to handle, harmonize and document big data for reuse and long-term preservation. ‘The Longitudinal IntermediaPlus (2014–2016)’ big data dataset is uniquely rich: it covers an array of German online media, extendable to cross-media channels and user information. The metadata file for this dataset, together with its documentation, was recently deposited as its own MySQL database, charmstana_sample_14-16.sql (cs16) (https://data.gesis.org/sharing/#!Detail/10.7802/2030), and is suitable for generating descriptive statistics. Analogous to the ‘Data View’ in SPSS, the charmstana_analysis (ca) contains the dataset’s numerical values. Both the cs16 and ca MySQL files are needed to conduct analysis on the full database. The research challenge was to process large-scale datasets into one longitudinal big data source suitable for academic research and compliant with the FAIR principles. The authors review four methodological recommendations that can serve as a framework for solving big-data structuring challenges, using the harmonization software CharmStats.

