
Thái Hoài An
AI / Machine Learning Candidate
I'm an AI/ML-focused Data Science student building end-to-end solutions across NLP, computer vision, time-series modeling, and information extraction.
My work emphasizes model benchmarking, fine-tuning, retrieval-augmented generation, and interactive deployment prototypes using Python, PyTorch, Transformers, and scikit-learn. I'm looking for internship opportunities where I can turn strong technical experiments into measurable product value.
About Me
Education
BSc in Data Science (Expected 2027), University of Economics Ho Chi Minh City (UEH), 2023-Present
GPA: 3.78/4.0
Career Goals
My short-term goal is to join a data/AI team where I can work with real-world datasets, build end-to-end ML solutions, and learn from strong mentorship. In the long term, I plan to pursue a Master’s degree abroad and continue developing impactful AI applications.
Strengths
- Skilled in coding, algorithm exploration, and data handling across team and personal projects
- Proactive in learning new technologies and sharing knowledge through technical blogging
- Strong logical thinking with a focus on efficiency and practical solutions
- Confident in team leadership, task coordination, and delivering clear technical presentations
- Continuously improving personal learning methods to boost performance and adaptability
Skills
A focused view of the technical stack and evaluation workflow behind the projects, research, and prototypes shown in my resume and portfolio.
Programming & Foundations
Core tools I use to build, version, and analyze ML workflows.
ML Frameworks
Libraries used for training, evaluation, and experimentation.
GenAI & Agent Systems
Capabilities used to build assistant-style workflows and grounded AI products.
Backend & Full-stack
Application-layer technologies I use to ship usable AI and analytics products.
Data Engineering & BI
Infrastructure and analytics tools used in streaming and decision-support projects.
Applied AI Domains
Problem areas I have worked on through research, coursework, and competitions.
Deployment & Prototyping
Tools used to turn experiments into working demos and interactive prototypes.
Research & Evaluation
Methods and metrics I rely on to compare models and communicate results clearly.
Education
Bachelor of Science in Data Science
University of Economics Ho Chi Minh City (UEH)
Oct 2023 - Mar 2027 (Expected)
Academic Achievements
- GPA: 3.78/4.0
- Merit-based Scholarship for Academic Excellence - Semester 2
Relevant Coursework
Academic Achievements
- Encouragement Prize - National Excellent Student Competition in Chemistry, Vietnam (2023)
High School Diploma
Lê Quý Đôn High School for the Gifted, Ninh Thuận Province
2020 - 2023
Work & Research

KKBox Real-time Customer Churn BI Dashboard
UEH Business Intelligence course project with streaming analytics and decision support
Built a near real-time Business Intelligence system for KKBox churn monitoring and retention decision support. The project combines Kafka log replay, Spark Structured Streaming, ClickHouse OLAP storage, FastAPI APIs, and a React dashboard to deliver descriptive, predictive-proxy, and prescriptive analysis in one workflow.
Outcome
- Delivered a 3-tab decision-support dashboard spanning descriptive analysis, predictive-proxy scoring, and prescriptive scenario simulation
- Built end-to-end near real-time data flow: replay logs -> Kafka -> Spark Structured Streaming -> ClickHouse -> FastAPI -> React dashboard

TomatoHub – AI-Powered Relief Campaign Platform
LotusHacks x HackHarvard x GenAI Fund Vietnam Hackathon submission
TomatoHub is a full-stack platform for charity operations that helps organizations launch campaigns faster, lets supporters donate or volunteer with clearer trust signals, and keeps campaign activity transparent. The product combines role-based workflows, QR-based check-in/check-out, public transparency logs, and AI-assisted campaign drafting plus supporter recommendations.
Outcome
- Shipped a monorepo product with public pages, role-based dashboards, campaign lifecycle management, donation flow, and volunteer registration flow
- Implemented QR-based volunteer and goods checkpoint logic alongside public transparency logs for auditability

Vietnamese Medical Information Extraction (NER + Relation Extraction)
UEH NLP course final project: a semi-supervised IE pipeline for medical text
Built an end-to-end Information Extraction system for Vietnamese medical text. Implemented a pipeline architecture (NER → Entity Pairing → Relation Extraction) inspired by PURE, covering 5 entity types and 4 relation types, and used hybrid semi-supervised learning with silver data to compensate for limited labeled data.
Outcome
- Hybrid semi-supervised RE achieved 81.25% accuracy and 0.631 Macro-F1 (MLP + BERT)
- Silver-data augmentation improved Macro-F1 from 0.599 (Standard) to 0.631 (Hybrid)
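
The Entity Pairing step between NER and Relation Extraction can be sketched roughly as follows. This is a minimal illustration, not the project's code: the `Entity` class, the entity labels, the `ALLOWED_PAIRS` schema, and the marker format are all hypothetical, and the typed markers only loosely follow the PURE idea of wrapping the two candidate spans before classification.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Entity:
    text: str
    label: str  # entity type predicted by the NER stage (hypothetical labels)
    start: int  # token index where the span starts
    end: int    # token index where the span ends (exclusive)


# Hypothetical schema: which (head type, tail type) combinations may be related.
ALLOWED_PAIRS = {("DRUG", "SYMPTOM"), ("DRUG", "DISEASE")}


def candidate_pairs(entities, allowed=ALLOWED_PAIRS):
    """Return ordered (head, tail) pairs whose types the schema allows."""
    return [
        (head, tail)
        for head, tail in permutations(entities, 2)
        if (head.label, tail.label) in allowed
    ]


def mark_pair(tokens, head, tail):
    """Wrap the head and tail spans in typed marker tokens for the RE model."""
    marked = []
    for i, token in enumerate(tokens):
        if i == head.start:
            marked.append(f"<S:{head.label}>")
        if i == tail.start:
            marked.append(f"<O:{tail.label}>")
        marked.append(token)
        if i == head.end - 1:
            marked.append(f"</S:{head.label}>")
        if i == tail.end - 1:
            marked.append(f"</O:{tail.label}>")
    return marked


# Usage: one sentence, two entities, one schema-valid pair.
tokens = ["aspirin", "treats", "headache"]
drug = Entity("aspirin", "DRUG", 0, 1)
symptom = Entity("headache", "SYMPTOM", 2, 3)
pairs = candidate_pairs([drug, symptom])
marked = mark_pair(tokens, *pairs[0])
```

Filtering pairs by type before classification keeps the relation model from scoring combinations the schema rules out.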

VN Stock Analytics – Investment Decision Support System
Multi-model data mining with LLM reasoning for Vietnamese banking stocks
Data Mining course final project building a comprehensive analytics system for 14 Vietnamese banking stocks in the VN30 index. Integrated multi-source data (market OHLCV, financial reports, macro indicators, news sentiment) and developed 4 XGBoost models: Return Regression, Direction Classification, Risk Forecasting, and Regime Detection. An LLM layer provides reasoning and investment recommendations in natural language.
Outcome
- Return Regression achieved MAE 0.094 and RMSE 0.119 on 21-day log-return prediction
- Risk Model achieved 0.98 correlation between predicted and actual volatility
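
For reference, the regression target and the two error metrics can be written out as below. This is a sketch under one assumption: that the target is the forward log-return over 21 trading days, log(P[t+21] / P[t]). The function names are illustrative and do not come from the project.

```python
import math


def forward_log_returns(prices, horizon=21):
    """log(P[t + horizon] / P[t]) for every t that has a price `horizon` steps ahead."""
    return [
        math.log(prices[t + horizon] / prices[t])
        for t in range(len(prices) - horizon)
    ]


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error; penalizes large misses more heavily than MAE."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


# Usage: a flat price series has zero forward log-return.
flat_targets = forward_log_returns([100.0] * 22)
```

RMSE is never smaller than MAE on the same predictions, so the gap between the two gives a rough sense of how much a few large errors dominate.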

Vietnam Weather Prediction with Softmax Regression
Multiclass weather classification using time-series feature engineering
Data Visualization course final project building a weather prediction model for 34 Vietnamese provinces using Softmax Regression. Collected 265K+ records from the Open-Meteo API (2005-2025) and engineered lag features, cyclic encodings for seasonality, and accumulation features. Classified weather into 3 groups (Clear/Cloudy, Drizzle, Rain) using a strict chronological train/test split.
Outcome
- Multiclass classification achieved 65.6% accuracy with macro F1-score 0.644
- Feature engineering improved accuracy from 63.9% to 65.6% (+1.7 pp)
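
The three feature families mentioned above can be illustrated with a small standard-library sketch. The function names, lag choices, and window sizes here are assumptions made for illustration, not the project's actual configuration.

```python
import math


def cyclic_encode(value, period):
    """Map a periodic value (e.g. month or day of year) onto the unit circle."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def lag_features(series, lags=(1, 2, 3)):
    """For each index t, collect series[t - k] for every lag k.

    Rows without enough history are skipped.
    """
    max_lag = max(lags)
    return [[series[t - k] for k in lags] for t in range(max_lag, len(series))]


def rolling_sum(series, window):
    """Accumulation feature: sum of the previous `window` values, excluding the current one."""
    return [sum(series[t - window:t]) for t in range(window, len(series))]


# Usage on a toy rainfall series.
rain = [1, 2, 3, 4, 5]
lags = lag_features(rain, lags=(1, 2))
accumulated = rolling_sum(rain, window=2)
```

Encoding the month as a sine/cosine pair keeps December and January close together, which a plain integer 1-12 does not. Both the lag and accumulation features use only past values, which matches a time-ordered split.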

Vietnamese Fake News Detection: Deep Learning vs Transfer Learning vs LLMs
First Prize @ UEH BIT Faculty Research | Presented at NCTD 2025 National Conference
Faculty-level research project providing a comprehensive comparative evaluation of machine learning approaches for Vietnamese fake news detection. Systematically analyzed three major model families: traditional deep learning (BiLSTM with Word2Vec/FastText), transfer learning (PhoBERT, frozen and fine-tuned), and large language models (Qwen2.5-7B, Llama-2-7B, DeepSeek) in zero-shot and few-shot settings. Evaluated on the ReINTEL dataset (9,713 Vietnamese social media posts, imbalanced at 83.2% real vs 16.8% fake).
Outcome
- First Prize in the Faculty-level Research Competition at the Business Information Technology (BIT) Department, UEH
- Paper presented at National Conference on Technology and Design 2025 (NCTD 2025) – Shaping Vietnam's Digital Future
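
The 83.2% / 16.8% class split is why accuracy alone is a weak yardstick on this dataset. The sketch below shows a trivial "always real" predictor on a synthetic 1,000-post sample with the same ratio. The sample and the choice of macro-F1 as the contrasting metric are illustrative assumptions, not results from the paper.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1, so the minority class counts equally."""
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Synthetic sample mirroring the 83.2% real / 16.8% fake ratio.
y_true = ["real"] * 832 + ["fake"] * 168
y_pred = ["real"] * 1000  # a model that never flags anything

baseline_accuracy = accuracy(y_true, y_pred)                  # 0.832
baseline_macro_f1 = macro_f1(y_true, y_pred, ["real", "fake"])  # about 0.454
```

A model has to beat 83.2% accuracy just to outperform this do-nothing baseline, while macro-F1 exposes that the baseline never detects a single fake post.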

Breast Cancer Ultrasound CAD: Sequential vs Multi-task Deep Learning
BIT Genesis Research Award 2025 @ UEH | Presented at NCTD 2025 National Conference
Faculty-level research comparing Sequential and Multi-task Learning architectures for breast cancer diagnosis from ultrasound images. Built on a U-Net with an EfficientNet-B4 backbone and systematically evaluated Deformable Convolution and Capsule Network modules through an ablation study. Evaluated on the BUSI dataset (780 images: Normal/Benign/Malignant) with rigorous statistical testing (Shapiro-Wilk, Mann-Whitney U, Kruskal-Wallis, Tukey HSD).
Outcome
- BIT Genesis Research Award 2025 at Business Information Technology Department, UEH
- Paper presented at National Conference on Technology and Design 2025 (NCTD 2025) – Shaping Vietnam's Digital Future

GA Maximum Flow Solver – Network Optimization with Genetic Algorithm
Interactive visualization of an evolutionary approach to the Maximum Network Flow Problem
Artificial Intelligence course final project applying a Genetic Algorithm to the Maximum Network Flow Problem. Implemented custom GA operators: path-based crossover to maintain flow conservation, adaptive mutation for escaping local optima, and a flow-balancing mechanism. Built an interactive Python GUI for real-time graph editing, parameter tuning, and algorithm comparison with Ford-Fulkerson.
Outcome
- Achieved up to 100% optimality ratio on graphs with ≤30 nodes, competitive with the exact Ford-Fulkerson solution
- Path-based crossover maintains the flow conservation constraint, avoiding invalid offspring after genetic operations
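
The optimality ratio compares the GA's best flow against an exact solver. Below is a minimal sketch of such a baseline using Edmonds-Karp, the BFS variant of Ford-Fulkerson. Which variant the project used is not stated, so this choice, the `optimality_ratio` helper, and the toy graph are assumptions for illustration.

```python
from collections import defaultdict, deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp: Ford-Fulkerson with shortest (BFS) augmenting paths.

    `capacity` maps (u, v) edge tuples to non-negative capacities.
    """
    if source == sink:
        raise ValueError("source and sink must differ")
    residual = defaultdict(int)
    neighbors = defaultdict(set)
    for (u, v), cap in capacity.items():
        residual[(u, v)] += cap
        neighbors[u].add(v)
        neighbors[v].add(u)  # reverse edge so flow can be rerouted later

    total = 0
    while True:
        # BFS for an augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in neighbors[u]:
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total  # no augmenting path left: flow is maximal

        # Bottleneck capacity along the path found.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[(parent[v], v)])
            v = parent[v]

        # Push flow forward and open the same amount on reverse edges.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
            v = u
        total += bottleneck


def optimality_ratio(ga_flow, exact_flow):
    """Share of the exact maximum flow reached by the GA's best solution."""
    return ga_flow / exact_flow if exact_flow else 1.0


# Usage on a toy 4-node graph.
toy_graph = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1, ("a", "t"): 2, ("b", "t"): 3}
exact = max_flow(toy_graph, "s", "t")
```

A ratio of 1.0 means the GA matched the exact maximum flow on that graph.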

Top 3 – Humanitarian Logistics Hackathon
Smart surplus-food allocation for underserved communities
Collaborated with a cross-university team to build a logistics solution combining data management, ML allocation, and IoT warehouse tracking to reduce food waste.
Outcome
- Top 3 finalist across HCMC universities
- Proposed an ML-driven allocation approach to reduce surplus mismatch

Minute – Agentic S/CRAG AI Meeting Co-Host for BFSI
VNPT AI Hackathon 2025 | Track 1 – Desktop/Web + GoMeet/Google Meet Add-in
Minute standardizes the meeting lifecycle for BFSI/LPBank enterprises: pre-meeting context gathering, real-time in-meeting assistance, and post-meeting generation of minutes and action items, all with citations, audit trails, and access control. Built on the SAAR (Self-aware Adaptive Agentic RAG) architecture, featuring stage-aware routing (Pre/In/Post), a real-time WebSocket pipeline (audio → SmartVoice STT → session bus → live transcript/recap/ADR), permission-aware RAG with pgvector, and tool-calling with human-in-the-loop confirmation.
Outcome
- Built an end-to-end AI meeting workflow: Pre-meeting (agenda + pre-read) → In-meeting (live transcript, recap, ADR extraction) → Post-meeting (executive summary, MoM, task sync)
- Implemented the SAAR architecture with stage-aware LangGraph routing, graded RAG retrieval, and self-corrective loops
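
The idea behind permission-aware retrieval is to filter by access rights before ranking, so restricted content never reaches the model's context. The sketch below is in-memory and purely illustrative: the project uses pgvector, and the `Chunk` class, role names, and `retrieve` function here are hypothetical stand-ins rather than its actual schema or API.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    embedding: tuple          # toy 2-D vector standing in for a real embedding
    allowed_roles: frozenset  # roles permitted to read this chunk


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_embedding, chunks, user_roles, top_k=3):
    """Drop chunks the user may not see, then rank the rest by similarity."""
    visible = [c for c in chunks if c.allowed_roles & user_roles]
    ranked = sorted(
        visible,
        key=lambda c: cosine(query_embedding, c.embedding),
        reverse=True,
    )
    return ranked[:top_k]


# Usage: the same query returns different context for different roles.
corpus = [
    Chunk("board minutes", (1.0, 0.0), frozenset({"executive"})),
    Chunk("team sync notes", (0.9, 0.1), frozenset({"staff", "executive"})),
    Chunk("cafeteria menu", (0.0, 1.0), frozenset({"staff"})),
]
staff_hits = [c.text for c in retrieve((1.0, 0.0), corpus, {"staff"}, top_k=2)]
exec_hits = [c.text for c in retrieve((1.0, 0.0), corpus, {"executive"}, top_k=2)]
```

Filtering before ranking, rather than after, means a restricted chunk can never displace a visible one from the top-k results.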
Certificates & Courses

AI Viet Nam – AIO25 Program
Structured training track in AI/ML with hands-on assignments
Issuer: AI VIET NAM
Participated in the AIO25 program to strengthen fundamentals and applied AI/ML skills through structured modules and practical assignments. Earned verified credentials for Modules 1–10 and the Data Analysis track, covering Python, OOP, data structures, SQL, Git, machine learning, deep learning, computer vision, NLP, and analytics.

National Excellent Student Competition in Chemistry
Encouragement Prize | Academic Achievement
Received an Encouragement Prize in the National Excellent Student Competition in Chemistry, demonstrating a strong academic foundation and exam performance before specializing in Data Science and AI.