GNTD

German News Topics Dataset - Topic Frequency & Sentiment Development

Release Notes:

Overview

This dataset provides the relative frequency and sentiment of the topics covered by 450,000 Handelsblatt articles published between 2004 and 2021 in the subsections Finance, Corporation, Politics, and Opinions.
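To make "relative frequency" concrete, the sketch below computes, for each year, the share of articles assigned to each topic. It is a minimal illustration, not the dataset's own code. It assumes a hypothetical input of `(year, topic_id)` pairs with one pair per article and yearly bins. The time resolution and normalization actually used for GNTD are not stated here.

```python
from collections import Counter, defaultdict


def relative_topic_frequency(assignments):
    """Return {year: {topic_id: share}} from (year, topic_id) pairs.

    Hypothetical helper: each pair stands for one article, and a topic's share
    is its article count divided by the total number of articles in that year.
    """
    counts = defaultdict(Counter)
    for year, topic in assignments:
        counts[year][topic] += 1

    result = {}
    for year, topic_counts in counts.items():
        total = sum(topic_counts.values())
        result[year] = {topic: n / total for topic, n in topic_counts.items()}
    return result


if __name__ == "__main__":
    # Toy data for illustration only; these are not GNTD values.
    toy = [(2004, 3), (2004, 3), (2004, 7), (2005, 7)]
    print(relative_topic_frequency(toy))
```

Within each year, the shares sum to 1. A topic's value therefore reflects its weight relative to everything else published that year, not its absolute article count.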

Methodology

All articles were first translated into English. The translated articles were then encoded with BERT (Bidirectional Encoder Representations from Transformers). UMAP was used to reduce the dimensionality of the embeddings from 512 to 10, and HDBSCAN was used to cluster the reduced embeddings. Sentiments were estimated with FinBERT. In total, 76 unique topics were identified.
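The description above does not say how per-article FinBERT outputs are combined into a topic-level sentiment. The sketch below shows one common approach, adopted here purely as an assumption. It treats FinBERT as a three-class model with `positive`, `negative`, and `neutral` probabilities. Each article is scored as P(positive) minus P(negative), and those scores are averaged per year and topic. The sketch also skips articles with label `-1`, which HDBSCAN assigns to points that belong to no cluster. Whether GNTD excludes these noise articles is likewise an assumption. The input format and function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

NOISE_LABEL = -1  # HDBSCAN marks points that fall in no cluster with -1


def article_score(probs):
    """Collapse three-class sentiment probabilities into one score in [-1, 1].

    Assumption: `probs` has "positive", "negative" and "neutral" keys, and the
    score is P(positive) - P(negative); neutral mass pulls the score toward 0.
    """
    return probs["positive"] - probs["negative"]


def topic_sentiment(articles):
    """Average article scores per (year, topic_id), skipping noise articles.

    `articles` is a hypothetical iterable of (year, topic_id, probs) tuples.
    """
    scores = defaultdict(list)
    for year, topic, probs in articles:
        if topic == NOISE_LABEL:
            continue  # unclustered article: not attributed to any topic
        scores[(year, topic)].append(article_score(probs))
    return {key: mean(vals) for key, vals in scores.items()}


if __name__ == "__main__":
    # Toy probabilities for illustration only; these are not GNTD values.
    toy = [
        (2020, 5, {"positive": 0.7, "negative": 0.1, "neutral": 0.2}),
        (2020, 5, {"positive": 0.2, "negative": 0.6, "neutral": 0.2}),
        (2020, -1, {"positive": 0.9, "negative": 0.0, "neutral": 0.1}),
    ]
    print(topic_sentiment(toy))
```

With a plain mean, every article counts equally. A variant could instead weight articles by model confidence, but nothing above indicates which choice GNTD makes.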

Visualization

More detailed information is available in the visualization app.

The figures below give an overview of how topic frequencies and sentiments have developed over time.

Frequencies over time

Sentiments over time

Frequency-Scaled Sentiments over time
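The term "frequency-scaled sentiment" suggests that sentiment is weighted by how often a topic is covered, but the exact formula is not given. The following sketch assumes that the scaled value is the relative frequency multiplied by the sentiment score for each (year, topic) pair. Under that assumption, rarely covered topics contribute little even when their tone is extreme. The function name and dict layout are hypothetical.

```python
def frequency_scaled_sentiment(frequency, sentiment):
    """Multiply relative frequency by sentiment for each shared (year, topic) key.

    Assumption: both inputs are dicts keyed by (year, topic_id). Keys present in
    only one of them are skipped rather than filled with a guessed value.
    """
    return {
        key: frequency[key] * sentiment[key]
        for key in frequency.keys() & sentiment.keys()
    }


if __name__ == "__main__":
    # Toy values for illustration only; these are not GNTD values.
    freq = {(2020, 5): 0.04, (2020, 9): 0.10}
    sent = {(2020, 5): 0.5, (2020, 9): -0.2}
    print(frequency_scaled_sentiment(freq, sent))
```

In this toy case, the two topics end up with values of equal magnitude. Topic 5's milder frequency is offset by its stronger tone, which is exactly the kind of trade-off a frequency-scaled view makes visible.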

Download

Cite

---
@Manual{,
title = {GNTD: German News Topics Dataset},
author = {Garvin Kruthof},
year = {2022},
note = {GNTD Version 1.0},
url = {https://kruthof.github.io/projects/gntd/},
 }
---