Fighting Misinformation: Using AraBERT to Detect COVID-19 Fake News
#NLP #AraBERT #Research
2021 8 min read

A research paper presented at NLP4IF 2021 on using AraBERT to detect misinformation in Arabic COVID-19 content.


Abstract

During the COVID-19 pandemic, the spread of misinformation posed a significant public health threat. This research addresses the "infodemic" in Arabic-speaking communities by developing an automated detection system.

Background

The NLP4IF 2021 shared task focused on fighting the COVID-19 infodemic across multiple languages. Our team tackled the Arabic track, leveraging the AraBERT pretrained model.

Methodology

Data Preparation

We worked with the official dataset containing:

  • 2,000+ labeled Arabic tweets

  • Binary labels: reliable vs. unreliable

  • Mixed content: claims, questions, and statements
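One way to prepare such a dataset for fine-tuning is to map the two labels to integer ids and hold out a validation set with the same label balance. The sketch below is only an illustration: the label ids, the 20% validation fraction, the seed, and the placeholder rows are all assumptions, not details of the official dataset or of our pipeline.

```python
import random
from collections import defaultdict

# Hypothetical label ids; the dataset description only says the labels are binary.
LABEL_TO_ID = {"reliable": 0, "unreliable": 1}


def stratified_split(examples, val_fraction=0.2, seed=13):
    """Split (text, label) pairs so each label keeps roughly the same share."""
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, LABEL_TO_ID[label]))

    rng = random.Random(seed)
    train, val = [], []
    for label in sorted(by_label):
        rows = by_label[label]
        rng.shuffle(rows)
        # Keep at least one validation example per label.
        n_val = max(1, round(len(rows) * val_fraction))
        val.extend(rows[:n_val])
        train.extend(rows[n_val:])
    rng.shuffle(train)
    return train, val


# Placeholder rows standing in for tweets; not real data from the shared task.
examples = [(f"tweet {i}", "reliable" if i % 4 else "unreliable") for i in range(20)]
train_set, val_set = stratified_split(examples)
print(len(train_set), len(val_set))
```

Stratifying matters most when one label is rare, since a purely random split of a small dataset can leave the validation set with too few examples of that label to give a stable F1.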

Model Architecture

Our approach used a fine-tuned AraBERT-large model with:

  • Custom classification head

  • Gradient accumulation for memory efficiency

  • Label smoothing for robustness
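To make the label smoothing item concrete, here is a minimal pure-Python sketch of cross-entropy against a smoothed target. The smoothing value `epsilon=0.1` is an assumed, commonly used default, not the value from our experiments, and the function name is hypothetical.

```python
import math


def smoothed_cross_entropy(logits, target, epsilon=0.1):
    """Cross-entropy against a label-smoothed target distribution.

    The true class gets probability 1 - epsilon + epsilon / K and every
    other class gets epsilon / K, where K is the number of classes.
    """
    num_classes = len(logits)
    # Numerically stable log-softmax.
    max_logit = max(logits)
    log_sum = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    log_probs = [z - log_sum for z in logits]

    loss = 0.0
    for k, log_p in enumerate(log_probs):
        q = epsilon / num_classes + (1.0 - epsilon if k == target else 0.0)
        loss -= q * log_p
    return loss


# A confident, correct prediction is penalised slightly more once smoothing is on.
confident = [5.0, -5.0]
print(smoothed_cross_entropy(confident, target=0, epsilon=0.0))
print(smoothed_cross_entropy(confident, target=0, epsilon=0.1))
```

With smoothing, the loss can no longer be driven to zero by pushing the logits apart, which discourages overconfident predictions on a small, noisy dataset.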

Training Strategy

  • Learning rate: 2e-5 with linear warmup

  • Batch size: 16 (effective: 64 with accumulation)

  • Early stopping based on validation F1
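The three settings above can be sketched in plain Python as follows. Only the peak learning rate (2e-5) and the batch sizes (16 and 64) come from the list. The linear decay after warmup, the warmup length, the patience value, and all function and class names are assumptions made for illustration.

```python
def linear_warmup_lr(step, total_steps, warmup_steps, peak_lr=2e-5):
    """Ramp linearly from 0 to peak_lr, then decay linearly to 0 (decay is assumed)."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


MICRO_BATCH = 16       # examples per forward/backward pass
EFFECTIVE_BATCH = 64   # examples per optimizer update
ACCUMULATION_STEPS = EFFECTIVE_BATCH // MICRO_BATCH


def is_update_step(micro_batch_index):
    """True when accumulated gradients should be applied (0-based index)."""
    return (micro_batch_index + 1) % ACCUMULATION_STEPS == 0


class EarlyStopping:
    """Stop once validation F1 has not improved for `patience` epochs."""

    def __init__(self, patience=3):
        self.patience = patience
        self.best_f1 = float("-inf")
        self.bad_epochs = 0

    def should_stop(self, val_f1):
        if val_f1 > self.best_f1:
            self.best_f1 = val_f1
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


print(ACCUMULATION_STEPS)
print([is_update_step(i) for i in range(8)])
```

Accumulating gradients over four micro-batches of 16 gives the optimizer the same number of examples per update as a batch of 64 while holding only 16 in memory at a time.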

Results

| Model | F1-Score | Accuracy |
|---|---|---|
| Baseline (TF-IDF + SVM) | 0.72 | 0.74 |
| mBERT | 0.81 | 0.83 |
| AraBERT (Ours) | 0.87 | 0.88 |
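For readers less familiar with the two metrics, the sketch below computes binary F1 and accuracy from predicted labels. It is an illustration only: the toy labels are made up, the choice of class 1 as the positive class is an assumption, and the shared task's official scorer may average F1 differently.

```python
def binary_f1_and_accuracy(y_true, y_pred, positive=1):
    """Return (F1 for the positive class, overall accuracy)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = correct / len(pairs)
    return f1, accuracy


# Toy labels for illustration; not predictions from any model in the table.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
f1, accuracy = binary_f1_and_accuracy(y_true, y_pred)
print(f1, accuracy)
```

F1 balances precision and recall for the positive class, so it is less flattering than accuracy when the labels are imbalanced.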

Impact

This work contributed to the broader effort to combat COVID-19 misinformation. AraBERT's gain over the multilingual mBERT model (0.87 vs. 0.81 F1) supports the case for language-specific pretrained models in Arabic NLP tasks.

Written by Ahmad Hussein

AI Systems Architect specializing in LLM orchestration, RAG systems, and scalable cloud platforms. Building secure AI solutions for Fintech, Robotics, and Legal-tech.