German News Topics Dataset - Topic Frequency & Sentiment Development
Release Notes:
This dataset provides the relative frequency and sentiment of topics covered by 450,000 Handelsblatt articles published between 2004 and 2021 in the subsections Finance, Corporation, Politics and Opinions.
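As an illustration of what "relative frequency" can mean here, the following minimal sketch computes each topic's share of articles per year. It assumes one `(year, topic)` pair per article and yearly aggregation; the function name, the toy records and the topic labels are hypothetical, and the dataset itself may use a different time granularity or normalisation.

```python
from collections import Counter, defaultdict


def relative_topic_frequency(records):
    """Return {year: {topic: share}} from (year, topic) pairs, one per article.

    Assumption: the share is the number of articles on a topic in a year
    divided by the number of all articles in that year.
    """
    counts = defaultdict(Counter)
    for year, topic in records:
        counts[year][topic] += 1
    return {
        year: {topic: n / sum(c.values()) for topic, n in c.items()}
        for year, c in counts.items()
    }


# Hypothetical toy data, not taken from the dataset.
articles = [
    (2004, "banking"),
    (2004, "banking"),
    (2004, "energy"),
    (2005, "energy"),
]
freq = relative_topic_frequency(articles)
```

With this definition the shares within a year sum to 1, so topics can be compared across years even when the number of articles per year differs.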
All articles were first translated into English and then encoded using Bidirectional Encoder Representations from Transformers (BERT). Next, UMAP was used to reduce the dimensionality of the embeddings from 512 to 10, and HDBSCAN was used to cluster them. Sentiment was estimated using FinBERT. Overall, 76 unique topics were identified.
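The dimensionality reduction and clustering steps can be sketched roughly as follows. This is not the code used to build the dataset: it assumes the `umap-learn` and `hdbscan` Python packages, an already computed embedding matrix, and hypothetical values for `min_cluster_size` and `seed`. Only the target dimensionality of 10 comes from the description above.

```python
def cluster_embeddings(embeddings, n_components=10, min_cluster_size=50, seed=0):
    """Reduce document embeddings with UMAP, then cluster them with HDBSCAN.

    embeddings: array-like of shape (n_articles, embedding_dim).
    Returns (reduced, labels); HDBSCAN marks noise points with the label -1.
    min_cluster_size and seed are illustrative assumptions.
    """
    # Imported lazily so the heavy third-party packages are only needed
    # when the function is actually called.
    import hdbscan
    import umap

    reducer = umap.UMAP(n_components=n_components, random_state=seed)
    reduced = reducer.fit_transform(embeddings)

    clusterer = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size)
    labels = clusterer.fit_predict(reduced)
    return reduced, labels
```

Each non-negative label would then correspond to one candidate topic, while articles labelled -1 are left unassigned.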
Detailed information can be found using the visualisation app!
The figures below provide an overview of how topic frequencies and sentiments developed over time.
---
@Manual{,
title = {GNTD: German News Topics Dataset},
author = {Garvin Kruthof},
year = {2022},
note = {GNTD Version 1.0},
url = {https://kruthof.github.io/projects/gntd/},
}
---