Vietnamese Fake News Detection: Deep Learning vs Transfer Learning vs LLMs
First Prize @ UEH BIT Faculty Research | Presented at NCTD 2025 National Conference
A faculty-level research project that provides a comparative evaluation of machine learning approaches for Vietnamese fake news detection. It systematically analyzes three major model families: traditional deep learning (BiLSTM with Word2Vec/FastText), transfer learning (PhoBERT, frozen and fine-tuned), and large language models (Qwen2.5-7B, Llama-2-7B, DeepSeek) under zero-shot and few-shot paradigms. All models were evaluated on the ReINTEL dataset (9,713 Vietnamese social media posts; 83.2% real vs 16.8% fake).

Timeline: 2025
Type: Research
Status: Completed
Outcome / Impact
- First Prize in the Faculty-level Research Competition at the Business Information Technology (BIT) Department, UEH
- Paper presented at the National Conference on Technology and Design 2025 (NCTD 2025): Shaping Vietnam's Digital Future
- Fine-tuned PhoBERT achieved 96.30% accuracy with balanced class performance (96.17% precision on the real class, 80.49% recall on the fake class), about 23 percentage points above the best LLM
- Demonstrated the superiority of Vietnamese-specific pre-training over multilingual models for specialized classification tasks
- Established computational efficiency benchmarks: PhoBERT (2 GB, 20 ms) vs LLMs (8 GB, 3 s+ inference)
Case Study
1) Context / Problem
Vietnamese social media faces significant misinformation challenges, with fake news spreading rapidly on platforms like Facebook and Zalo. The ReINTEL dataset contains 9,713 posts with severe class imbalance (83.2% real vs 16.8% fake), which mirrors the conditions a detector faces in practice. Existing studies lacked a comprehensive comparison across the deep learning, transfer learning, and LLM paradigms.
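To make the effect of this imbalance concrete, the following is a minimal sketch of a trivial classifier that labels every post as real. The class counts are approximated from the percentages quoted above rather than read from the dataset files, and the helper name `f1` is illustrative only.

```python
# Illustrative only: counts are approximated from the quoted 83.2% / 16.8% split.
TOTAL_POSTS = 9713
n_fake = round(TOTAL_POSTS * 0.168)   # approximate number of fake posts
n_real = TOTAL_POSTS - n_fake         # approximate number of real posts


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 score from raw counts; defined as 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# A classifier that predicts "real" for every post.
accuracy = n_real / TOTAL_POSTS
f1_real = f1(tp=n_real, fp=n_fake, fn=0)   # every fake post becomes a false "real"
f1_fake = f1(tp=0, fp=0, fn=n_fake)        # no fake post is ever detected
macro_f1 = (f1_real + f1_fake) / 2         # unweighted mean over the two classes

print(f"accuracy={accuracy:.3f}  macro_f1={macro_f1:.3f}")
```

Under these assumptions the trivial classifier reaches roughly 83% accuracy while its macro F1 stays near 0.45, which is why accuracy alone says little about how well the fake class is detected.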
2) Your Role
As Lead Researcher and Developer, I designed and implemented the complete evaluation framework, including BiLSTM models with multiple embedding strategies, a PhoBERT fine-tuning pipeline, and LLM inference with prompt engineering. I also conducted exploratory data analysis, implemented focal loss to address class imbalance, and produced comprehensive benchmarks with an efficiency-performance trade-off analysis.
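For readers unfamiliar with focal loss, here is a minimal single-example sketch in plain Python. It is not the project's training code: the `gamma` and `alpha` values are placeholder assumptions (the hyperparameters actually used are not stated here), and a real pipeline would compute this over batches in a deep learning framework.

```python
import math


def binary_focal_loss(p_fake: float, is_fake: bool,
                      gamma: float = 2.0, alpha: float = 0.75) -> float:
    """Focal loss for one example: -alpha_t * (1 - p_t)**gamma * log(p_t).

    p_fake  : predicted probability that the post is fake
    is_fake : ground-truth label
    gamma   : focusing parameter (assumed value, not taken from the project)
    alpha   : weight on the fake class (assumed value, not taken from the project)
    """
    eps = 1e-7
    p_fake = min(max(p_fake, eps), 1.0 - eps)     # clamp to avoid log(0)
    p_t = p_fake if is_fake else 1.0 - p_fake     # probability assigned to the true class
    alpha_t = alpha if is_fake else 1.0 - alpha   # class-dependent weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy, correctly classified real post contributes almost nothing...
easy_real = binary_focal_loss(p_fake=0.05, is_fake=False)
# ...while a fake post the model is missing keeps a large loss.
hard_fake = binary_focal_loss(p_fake=0.20, is_fake=True)
print(f"easy real: {easy_real:.6f}   hard fake: {hard_fake:.6f}")
```

The factor `(1 - p_t) ** gamma` shrinks the loss of confidently correct examples, so the many easy real posts do not dominate the gradient; setting `gamma=0` reduces the expression to class-weighted cross-entropy.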
3) Approach
Evaluated 10 model configurations: (1) BiLSTM with random/Word2Vec/FastText embeddings, trained with focal loss and class weighting; (2) PhoBERT in frozen and fine-tuned configurations with 256-token sequences; (3) LLMs (Qwen2.5-7B, Llama-2-7B-Vietnamese, DeepSeek) with 4-bit quantization under zero-shot and few-shot paradigms. Used macro-averaged metrics (Precision, Recall, F1, AUC) for fair evaluation under class imbalance.
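The difference between the zero-shot and few-shot paradigms can be sketched as a small prompt builder. Everything below is hypothetical: the instruction wording, the `Post:` / `Label:` layout, and the function names `build_prompt` and `parse_label` are illustrative assumptions, since the prompts actually used in the project are not reproduced here.

```python
from typing import Sequence, Tuple

# Hypothetical instruction text; the wording used in the project is not given here.
INSTRUCTION = (
    "Classify the following Vietnamese social media post as REAL or FAKE. "
    "Answer with a single word."
)


def build_prompt(post: str, examples: Sequence[Tuple[str, str]] = ()) -> str:
    """Return a zero-shot prompt if `examples` is empty, otherwise a few-shot prompt."""
    parts = [INSTRUCTION, ""]
    for text, label in examples:              # labeled demonstrations (few-shot only)
        parts += [f"Post: {text}", f"Label: {label}", ""]
    parts += [f"Post: {post}", "Label:"]      # the post to classify, label left open
    return "\n".join(parts)


def parse_label(completion: str) -> str:
    """Map a raw completion to REAL or FAKE based on its first word, else UNKNOWN."""
    words = completion.strip().upper().split()
    first = words[0].strip(".,:!\"'") if words else ""
    return first if first in {"REAL", "FAKE"} else "UNKNOWN"


zero_shot = build_prompt("<post text>")
few_shot = build_prompt(
    "<post text>",
    examples=[("<example real post>", "REAL"), ("<example fake post>", "FAKE")],
)
print(few_shot)
```

Mapping anything other than a clean `REAL` or `FAKE` to `UNKNOWN` keeps malformed completions visible in the evaluation instead of silently counting them toward one class.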
4) Result / Impact
Won First Prize at the UEH BIT Faculty Research Competition, and the paper was accepted for presentation at the NCTD 2025 National Conference. Fine-tuned PhoBERT achieved state-of-the-art 96.30% accuracy with balanced performance across classes. The BiLSTM variants scored an identical 81.89% regardless of embedding strategy, with catastrophic failure on the minority (fake) class. LLMs underperformed despite their scale (best: Qwen at 73.25%). The study established clear deployment guidance: PhoBERT is the optimal choice for production.
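The deployment trade-off can be made tangible with back-of-the-envelope arithmetic on the figures quoted in this write-up (20 ms per inference for PhoBERT, 3 s or more for the LLMs). This is a rough sketch that assumes strictly sequential inference on one device with no batching; the helper `posts_per_hour` is illustrative.

```python
# Figures quoted in this write-up. The LLM latency is a lower bound ("3s+"),
# and the LLM accuracy is the best reported LLM result (Qwen).
LATENCY_S = {"PhoBERT (fine-tuned)": 0.020, "Best LLM (4-bit)": 3.0}
ACCURACY = {"PhoBERT (fine-tuned)": 0.9630, "Best LLM (4-bit)": 0.7325}


def posts_per_hour(latency_s: float) -> float:
    """Sequential throughput on one device, ignoring batching and I/O overhead."""
    return 3600.0 / latency_s


for name, latency in LATENCY_S.items():
    print(f"{name}: {posts_per_hour(latency):,.0f} posts/hour, "
          f"accuracy {ACCURACY[name]:.2%}")

speedup = LATENCY_S["Best LLM (4-bit)"] / LATENCY_S["PhoBERT (fine-tuned)"]
print(f"PhoBERT is at least {speedup:.0f}x faster per post under these assumptions")
```

Under these assumptions PhoBERT handles on the order of 180,000 posts per hour against at most about 1,200 for an LLM, while also being the more accurate model.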
5) Learnings
In this study, language-specific pre-training dramatically outperformed multilingual scaling for Vietnamese NLP. Handling class imbalance required a strong pre-trained model; the traditional approaches failed regardless of feature engineering. Prompt-based learning showed fundamental limitations for specialized classification tasks. Future work: cross-temporal validation, multimodal extension, and adversarial robustness testing.