A Contextualized Approach to Fake News Detection: A Comparative Analysis of BERT and DistilBERT

Authors:
DPID: 817

Abstract

The proliferation of misinformation and disinformation, commonly referred to as fake news, poses a significant threat to democratic processes and public trust. Automated and accurate detection of such content is a critical challenge in Natural Language Processing (NLP). Traditional machine learning (ML) models often struggle to capture the complex semantic and contextual nuances inherent in deceptive language. This paper addresses this limitation by proposing and evaluating a robust fake news detection framework built on advanced transformer-based models: Bidirectional Encoder Representations from Transformers (BERT) and its resource-efficient variant, DistilBERT. We fine-tune these models on established benchmark datasets, such as LIAR, and compare their performance against conventional ML baselines, including Support Vector Machines (SVM) and Naïve Bayes. Our methodology leverages WordPiece tokenization and the multi-head self-attention mechanism to generate highly contextualized text representations. The experimental results demonstrate that both BERT and DistilBERT significantly outperform the baselines across key metrics (Accuracy, Precision, Recall, and F1-score), with DistilBERT achieving performance comparable to the full BERT model while offering substantially greater computational efficiency. This research contributes a detailed comparative analysis and a high-performing, deployable solution for mitigating the spread of online misinformation.
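To make the tokenization step named in the abstract concrete, the following is a minimal sketch of WordPiece-style subword segmentation as used by BERT and DistilBERT: greedy longest-match-first splitting over a subword vocabulary. The toy vocabulary here is an illustrative assumption, not the real ~30,000-entry BERT vocabulary.

```python
# Toy subword vocabulary (an assumption for illustration only).
# "##" marks a continuation piece that attaches to the preceding subword.
VOCAB = {"mis", "##information", "detect", "##ion", "fake", "news",
         "the", "##s", "[UNK]"}

def wordpiece_tokenize(word, vocab=VOCAB):
    """Split a single word into subword units, longest match first."""
    tokens, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate substring until it appears in the vocabulary.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces carry the ## prefix
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            # No subword of the remainder is known: map the word to [UNK].
            return ["[UNK]"]
        tokens.append(match)
        start = end
    return tokens

print(wordpiece_tokenize("misinformation"))  # ['mis', '##information']
print(wordpiece_tokenize("detection"))       # ['detect', '##ion']
```

Because rare or novel words decompose into known subwords rather than collapsing to a single unknown token, the downstream self-attention layers can still build contextual representations for them, which is one reason transformer models handle deceptive or unusual phrasing better than bag-of-words baselines.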