Fighting Misinformation: Using AraBERT to Detect COVID-19 Fake News
A research paper presented at NLP4IF 2021 that uses AraBERT to detect misinformation in Arabic COVID-19 content.
Abstract
During the COVID-19 pandemic, the spread of misinformation posed a significant public health threat. This research addresses the "infodemic" in Arabic-speaking communities with an automated system that classifies Arabic tweets as reliable or unreliable.
Background
The NLP4IF 2021 shared task focused on fighting the COVID-19 infodemic across multiple languages. Our team tackled the Arabic track, leveraging the AraBERT pretrained model.
Methodology
Data Preparation
We worked with the official dataset containing:
- 2,000+ labeled Arabic tweets
- Binary labels: reliable vs. unreliable
- Mixed content: claims, questions, and statements
Model Architecture
Our approach used a fine-tuned AraBERT-large model with:
- Custom classification head
- Gradient accumulation for memory efficiency
- Label smoothing for robustness
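
To make the label smoothing item concrete, here is a minimal sketch of a label-smoothed cross-entropy loss in plain Python. It is not the paper's training code: the function name `smoothed_cross_entropy` and the smoothing value `epsilon=0.1` are assumptions chosen for illustration, and the smoothing convention (mass `epsilon` spread uniformly over all classes) is one common choice.

```python
import math


def smoothed_cross_entropy(logits, target, epsilon=0.1):
    """Cross-entropy against a label-smoothed target distribution.

    logits  : list of raw class scores, one per class
    target  : index of the true class
    epsilon : smoothing mass spread uniformly over all classes
              (0.1 is an assumed value, not taken from the paper)
    """
    num_classes = len(logits)

    # Numerically stable log-softmax: subtract the max before exponentiating.
    max_logit = max(logits)
    log_sum = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    log_probs = [z - log_sum for z in logits]

    # Smoothed target: epsilon / K on every class, plus (1 - epsilon) on the true class.
    loss = 0.0
    for k, log_p in enumerate(log_probs):
        q = epsilon / num_classes
        if k == target:
            q += 1.0 - epsilon
        loss -= q * log_p
    return loss


# With epsilon = 0 this reduces to ordinary cross-entropy.
hard_loss = smoothed_cross_entropy([4.0, -4.0], target=0, epsilon=0.0)
soft_loss = smoothed_cross_entropy([4.0, -4.0], target=0, epsilon=0.1)
```

Because the smoothed target never puts all its mass on one class, a very confident prediction still incurs some loss, which discourages the classifier from becoming overconfident on noisy labels.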
Training Strategy
- Learning rate: 2e-5 with linear warmup
- Batch size: 16 per step (effective: 64, using 4 gradient accumulation steps)
- Early stopping based on validation F1
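
The training settings above can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the original training loop: the helper names, `warmup_steps`, `total_steps`, and `patience` are hypothetical, and the linear decay after warmup is an assumption (the source only mentions linear warmup).

```python
def accumulation_steps(per_step_batch, effective_batch):
    """Number of forward/backward passes to accumulate before one optimizer step."""
    if effective_batch % per_step_batch != 0:
        raise ValueError("effective batch must be a multiple of the per-step batch")
    return effective_batch // per_step_batch


def linear_warmup_lr(step, peak_lr=2e-5, warmup_steps=100, total_steps=1000):
    """Learning rate at a given optimizer step.

    Rises linearly from 0 to peak_lr over warmup_steps, then decays linearly
    to 0 at total_steps (the decay phase is an assumption).
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


class EarlyStopping:
    """Signal a stop once validation F1 has not improved for `patience` evaluations."""

    def __init__(self, patience=3):
        self.patience = patience
        self.best_f1 = float("-inf")
        self.bad_evals = 0

    def update(self, val_f1):
        """Record a new validation F1; return True if training should stop."""
        if val_f1 > self.best_f1:
            self.best_f1 = val_f1
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience


# A per-step batch of 16 with an effective batch of 64 means 4 accumulation steps.
steps = accumulation_steps(16, 64)
```

In a real loop, the loss from each of the accumulated mini-batches would typically be divided by the number of accumulation steps before the backward pass, so that the summed gradient matches a single batch of 64.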
Results
| Model | F1-Score | Accuracy |
| ------- | ---------- | ---------- |
| Baseline (TF-IDF + SVM) | 0.72 | 0.74 |
| mBERT | 0.81 | 0.83 |
| AraBERT (Ours) | 0.87 | 0.88 |
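
For readers who want to see how the two reported metrics are computed, here is a minimal sketch in plain Python. The table does not state which F1 averaging was used, so this sketch assumes binary F1 on a single positive class; the function name `binary_f1_and_accuracy` is hypothetical and the labels below are toy values, not data from the shared task.

```python
def binary_f1_and_accuracy(y_true, y_pred, positive=1):
    """Return (f1, accuracy) for binary labels, treating `positive` as the target class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Precision and recall default to 0 when their denominators are empty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    accuracy = correct / len(pairs)
    return f1, accuracy


# Toy example only: 1 = unreliable, 0 = reliable (an assumed encoding).
toy_true = [1, 1, 1, 0, 0]
toy_pred = [1, 1, 0, 0, 1]
toy_f1, toy_acc = binary_f1_and_accuracy(toy_true, toy_pred)
```

F1 is the harmonic mean of precision and recall, so it penalizes a model that scores well on accuracy simply by favoring the majority class, which is why both columns are reported.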
Impact
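This work contributed to the broader effort of combating COVID-19 misinformation. It also demonstrated the value of language-specific pretrained models for Arabic NLP tasks: the fine-tuned AraBERT model reached an F1-score of 0.87, compared with 0.81 for multilingual BERT (mBERT) and 0.72 for the TF-IDF + SVM baseline.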
Written by
Ahmad Hussein
AI Systems Architect specializing in LLM orchestration, RAG systems, and scalable cloud platforms. Building secure AI solutions for Fintech, Robotics, and Legal-tech.