Thái Hoài An

AI / Machine Learning Candidate

I'm an AI/ML-focused Data Science student building end-to-end solutions across NLP, computer vision, time-series modeling, and information extraction.

My work emphasizes model benchmarking, fine-tuning, retrieval-augmented generation, and interactive deployment prototypes using Python, PyTorch, Transformers, and scikit-learn. I'm looking for internship opportunities where I can turn strong technical experiments into measurable product value.

About Me

Education

BSc in Data Science (Expected 2027), University of Economics Ho Chi Minh City (UEH), 2023-Present
GPA: 3.78/4.0

Career Goals

My short-term goal is to join a data/AI team where I can work with real-world datasets, build end-to-end ML solutions, and learn from strong mentorship. In the long term, I plan to pursue a Master’s degree abroad and continue developing impactful AI applications.

Strengths

  • Skilled in coding, algorithm exploration, and data handling across team and personal projects
  • Proactive in learning new technologies and sharing knowledge through technical blogging
  • Strong logical thinking with a focus on efficiency and practical solutions
  • Confident in team leadership, task coordination, and delivering clear technical presentations
  • Continuously improving personal learning methods to boost performance and adaptability

Skills

A focused view of the technical stack and evaluation workflow behind the projects, research, and prototypes shown in my resume and portfolio.

Programming & Foundations

Core tools I use to build, version, and analyze ML workflows.

Python
SQL
Git
Feature Engineering

ML Frameworks

Libraries used for training, evaluation, and experimentation.

PyTorch
scikit-learn
Transformers
XGBoost

GenAI & Agent Systems

Capabilities used to build assistant-style workflows and grounded AI products.

RAG
LangGraph
LangChain
Tool Calling
Prompt Engineering

Backend & Full-stack

Application-layer technologies I use to ship usable AI and analytics products.

FastAPI
Next.js
React
PostgreSQL
WebSocket
JWT Auth

Data Engineering & BI

Infrastructure and analytics tools used in streaming and decision-support projects.

Kafka
Spark
ClickHouse
Dashboarding
Streaming Analytics
Docker

Applied AI Domains

Problem areas I have worked on through research, coursework, and competitions.

NLP
NER
Information Extraction
Computer Vision
Time-series Modeling
Decision Support

Deployment & Prototyping

Tools used to turn experiments into working demos and interactive prototypes.

Streamlit
Gradio
Hugging Face Spaces
Vercel
OpenCV
Image Preprocessing

Research & Evaluation

Methods and metrics I rely on to compare models and communicate results clearly.

Experiment Design
Benchmarking
Ablation Studies
Error Analysis
F1 / AUC / IoU / Dice
Statistical Testing

Education

Bachelor of Science in Data Science

University of Economics Ho Chi Minh City (UEH)

Oct 2023 - Mar 2027 (Expected)

Academic Achievements

  • GPA: 3.78/4.0
  • Merit-based Scholarship for Academic Excellence - Semester 2

Relevant Coursework

Data Structures and Algorithms
Databases
Econometrics
Artificial Intelligence
Data Science
Mathematical Statistics
Data Mining
Machine Learning
Data Analytics Programming
Data Visualization
Big Data and Applications
NLP

Academic Achievements

  • Consolation Prize - National Excellent Student Contest in Chemistry, Vietnam (2023)

High School Diploma

Lê Quý Đôn High School for the Gifted, Ninh Thuận Province

2020 - 2023

Work & Research

KKBox Real-time Customer Churn BI Dashboard
Project
Business Intelligence
2026

UEH Business Intelligence course project with streaming analytics and decision support

Built a near real-time Business Intelligence system for KKBox churn monitoring and retention decision support. The project combines Kafka log replay, Spark Structured Streaming, ClickHouse OLAP storage, FastAPI APIs, and a React dashboard to deliver descriptive, predictive-proxy, and prescriptive analysis in one workflow.

BI Engineer
Full-stack Developer

Outcome

  • Delivered a 3-tab decision-support dashboard spanning descriptive analysis, predictive-proxy scoring, and prescriptive scenario simulation
  • Built end-to-end near real-time data flow: replay logs -> Kafka -> Spark Structured Streaming -> ClickHouse -> FastAPI -> React dashboard
Kafka
Spark Streaming
ClickHouse
FastAPI
React
Vite
Docker
Churn Analytics
Status:
completed
TomatoHub – AI-Powered Relief Campaign Platform
Competition
Social Impact
2026

LotusHacks x HackHarvard x GenAI Fund Vietnam Hackathon submission

TomatoHub is a full-stack platform for charity operations that helps organizations launch campaigns faster, lets supporters donate or volunteer with clearer trust signals, and keeps campaign activity transparent. The product combines role-based workflows, QR-based check-in/check-out, public transparency logs, and AI-assisted campaign drafting plus supporter recommendations.

Full-stack Developer
AI Product Builder

Outcome

  • Shipped a monorepo product with public pages, role-based dashboards, campaign lifecycle management, donation flow, and volunteer registration flow
  • Implemented QR-based volunteer and goods checkpoint logic alongside public transparency logs for auditability
Next.js
FastAPI
PostgreSQL
SQLAlchemy
JWT Auth
QR Check-in
OpenAI
Hackathon
Status:
completed
Vietnamese Medical Information Extraction (NER + Relation Extraction)
Project
NLP
2025

UEH NLP course final project with a semi-supervised IE pipeline for medical text

Built an end-to-end Information Extraction system for Vietnamese medical text. The system follows a pipeline architecture (NER → Entity Pairing → Relation Extraction) inspired by PURE, recognizes 5 entity types and 4 relation types, and uses semi-supervised hybrid learning with silver data to compensate for limited labeled data.
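The Entity Pairing step between NER and Relation Extraction can be sketched as follows. This is a minimal illustration, not the project's code: the entity labels (`DRUG`, `SYMPTOM`, `DISEASE`), the `ALLOWED_PAIRS` set, and both function names are hypothetical, and the typed markers follow the style described for PURE. Spans are assumed to be non-overlapping character offsets.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Entity:
    text: str
    label: str   # hypothetical entity type, e.g. "DRUG"
    start: int   # character offset where the span begins
    end: int     # character offset just past the span


# Hypothetical (head type, tail type) combinations worth classifying.
ALLOWED_PAIRS = {("DRUG", "SYMPTOM"), ("DISEASE", "SYMPTOM")}


def candidate_pairs(entities, allowed=ALLOWED_PAIRS):
    """Return ordered (head, tail) pairs whose types may hold a relation."""
    return [
        (head, tail)
        for head, tail in permutations(entities, 2)
        if (head.label, tail.label) in allowed
    ]


def mark_pair(sentence, head, tail):
    """Wrap the head and tail spans in typed markers for the RE model."""
    inserts = [
        (head.start, f"<S:{head.label}> "),
        (head.end, f" </S:{head.label}>"),
        (tail.start, f"<O:{tail.label}> "),
        (tail.end, f" </O:{tail.label}>"),
    ]
    # Insert from right to left so earlier offsets stay valid.
    for pos, marker in sorted(inserts, key=lambda item: item[0], reverse=True):
        sentence = sentence[:pos] + marker + sentence[pos:]
    return sentence
```

Filtering pairs by type before classification keeps the relation model from scoring combinations that can never hold a relation, which matters when labeled data is scarce.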

Team Member
NLP Developer

Outcome

  • Hybrid semi-supervised RE achieved 81.25% accuracy and 0.631 Macro-F1 (MLP + BERT)
  • Silver data augmentation improved Macro-F1 from 0.599 (Standard) to 0.631 (Hybrid)
Python
PhoBERT
Label Studio
spaCy
Gradio
NER
Relation Extraction
Status:
completed
VN Stock Analytics – Investment Decision Support System
Project
AI/ML
2025

Multi-model data mining with LLM reasoning for Vietnamese banking stocks

Data Mining course final project building a comprehensive analytics system for 14 Vietnamese banking stocks in the VN30 index. Integrated multi-source data (market OHLCV, financial reports, macro indicators, news sentiment) and developed 4 XGBoost models: Return Regression, Direction Classification, Risk Forecasting, and Regime Detection. An LLM layer provides reasoning and investment recommendations in natural language.

Team Lead
Developer

Outcome

  • Return Regression achieved MAE 0.094 and RMSE 0.119 on 21-day log-return prediction
  • Risk Model achieved 0.98 correlation between predicted and actual volatility
Python
XGBoost
LLMs
Sentiment Analysis
Feature Engineering
Streamlit
FastAPI
Status:
completed
Vietnam Weather Prediction with Softmax Regression
Project
Data Analytics
2025

Multiclass weather classification using time-series feature engineering

Data Visualization course final project building a weather prediction model for 34 Vietnamese provinces using Softmax Regression. Collected 265K+ records from the Open-Meteo API (2005-2025) and engineered lag features, cyclic encodings for seasonality, and accumulation features. Classified weather into 3 groups (Clear/Cloudy, Drizzle, Rain) using a chronological train/test split.
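The feature types named above can be sketched in plain Python. This is an illustrative sketch under assumptions, not the project's pipeline: the function names, lag choices, window size, and 80/20 split fraction are all hypothetical.

```python
import math


def cyclic_encode(value, period):
    """Map a periodic value (e.g. month of year) onto the unit circle."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def lag_features(series, lags):
    """Build one row per time step holding the requested lagged values.

    Steps that would need data from before the series starts are skipped.
    """
    max_lag = max(lags)
    return [[series[t - k] for k in lags] for t in range(max_lag, len(series))]


def rolling_sum(series, window):
    """Accumulated total over the previous `window` steps (current step excluded)."""
    return [sum(series[t - window:t]) for t in range(window, len(series))]


def time_split(rows, train_fraction=0.8):
    """Split chronologically ordered rows without shuffling."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]
```

The sine/cosine pair places December next to January, which a raw month number does not. Both the lag and accumulation features look only at past steps, and the split is never shuffled, so the model is never trained on information from after the test period.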

Team Lead
Developer

Outcome

  • Multiclass classification achieved 65.6% accuracy with macro F1-score 0.644
  • Feature engineering improved accuracy from 63.9% to 65.6% (+1.7 percentage points)
Python
Softmax Regression
scikit-learn
Streamlit
Feature Engineering
Time-series
Status:
completed
Vietnamese Fake News Detection: Deep Learning vs Transfer Learning vs LLMs
Research
AI/ML
2025

First Prize @ UEH BIT Faculty Research | Presented at NCTD 2025 National Conference

Faculty-level research project providing a comprehensive comparative evaluation of machine learning approaches for Vietnamese fake news detection. Systematically analyzed three major model families: traditional deep learning (BiLSTM with Word2Vec/FastText), transfer learning (PhoBERT frozen/fine-tuned), and large language models (Qwen2.5-7B, Llama-2-7B, DeepSeek) across zero-shot and few-shot paradigms. Evaluated on the ReINTEL dataset (9,713 Vietnamese social media posts, imbalanced at 83.2% real vs 16.8% fake).
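With a class split this skewed, accuracy alone can be misleading, so a per-class metric such as macro F1 is a natural companion. The sketch below is a plain-Python illustration with hypothetical function names; it is not the evaluation code used in the study.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_label(y_true, y_pred, label):
    """F1 score treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0  # no correct positives: precision and recall are 0 or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so the minority class counts equally."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_label(y_true, y_pred, label) for label in labels) / len(labels)
```

As a synthetic illustration mirroring the reported ratio, take 832 "real" and 168 "fake" labels and a classifier that always predicts "real": accuracy is 0.832, while macro F1 is about 0.454 because the fake class scores zero.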

Lead Researcher
Developer

Outcome

  • First Prize in Faculty-level Research Competition at Business Information Technology (BIT) Department, UEH
  • Paper presented at National Conference on Technology and Design 2025 (NCTD 2025) – Shaping Vietnam's Digital Future
PyTorch
PhoBERT
BiLSTM
LLMs
Qwen
Llama
Transfer Learning
Vietnamese NLP
Status:
completed
Breast Cancer Ultrasound CAD: Sequential vs Multi-task Deep Learning
Research
Computer Vision
2025

BIT Genesis Research Award 2025 @ UEH | Presented at NCTD 2025 National Conference

Faculty-level research comparing Sequential and Multi-task Learning architectures for breast cancer diagnosis from ultrasound images. Built on a U-Net with an EfficientNet-B4 backbone, the study evaluated Deformable Convolution and Capsule Network modules through an ablation study. Evaluated on the BUSI dataset (780 images: Normal/Benign/Malignant) with rigorous statistical testing (Shapiro-Wilk, Mann-Whitney U, Kruskal-Wallis, Tukey HSD).
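Segmentation quality for a U-Net output is commonly summarized with Dice and IoU, both of which appear in the metrics listed under Research & Evaluation. The sketch below assumes binary masks given as nested lists; the function name and the convention of scoring two empty masks as 1.0 are assumptions, not details taken from the study.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two binary masks of the same shape."""
    p = [int(bool(v)) for row in pred for v in row]
    t = [int(bool(v)) for row in target for v in row]
    if len(p) != len(t):
        raise ValueError("masks must have the same number of pixels")

    intersection = sum(a & b for a, b in zip(p, t))
    total = sum(p) + sum(t)          # |A| + |B|
    union = total - intersection     # |A ∪ B|

    if total == 0:
        # Both masks are empty (e.g. a Normal image with no lesion).
        return 1.0, 1.0
    return 2 * intersection / total, intersection / union
```

The two metrics are monotonically related (Dice = 2·IoU / (1 + IoU)), so they rank models the same way on a single image but can differ once averaged across a dataset.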

Researcher
Data Processing
Model Development

Outcome

  • BIT Genesis Research Award 2025 at Business Information Technology Department, UEH
  • Paper presented at National Conference on Technology and Design 2025 (NCTD 2025) – Shaping Vietnam's Digital Future
PyTorch
U-Net
EfficientNet
Multi-task Learning
Deformable Conv
Capsule Network
Medical Imaging
BUSI Dataset
Status:
completed
GA Maximum Flow Solver – Network Optimization with Genetic Algorithm
Project
AI/ML
2025

Interactive visualization of an evolutionary approach to the Maximum Network Flow Problem

Artificial Intelligence course final project applying a Genetic Algorithm to the Maximum Network Flow Problem. Implemented custom GA operators: path-based crossover to maintain flow conservation, adaptive mutation for escaping local optima, and a flow-balancing mechanism. Built an interactive Python GUI for real-time graph editing, parameter tuning, and algorithm comparison with Ford-Fulkerson.
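The exact baseline and the validity check that a GA candidate must pass can be sketched as follows. This is an assumption-laden illustration rather than the project's code: it uses Edmonds-Karp (the BFS variant of Ford-Fulkerson), a dict-of-dicts graph format, and hypothetical function names.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Exact maximum flow via Edmonds-Karp on a {u: {v: cap}} graph."""
    residual = {u: dict(edges) for u, edges in capacity.items()}
    for u, edges in capacity.items():
        for v in edges:
            residual.setdefault(v, {}).setdefault(u, 0)  # reverse edge

    total = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total

        # Walk back from the sink to collect the path's edges.
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][w] for u, w in path)
        for u, w in path:
            residual[u][w] -= bottleneck
            residual[w][u] += bottleneck
        total += bottleneck


def is_valid_flow(capacity, flow, source, sink):
    """Check capacity limits and flow conservation at every inner node."""
    balance = {}
    for u, edges in flow.items():
        for v, amount in edges.items():
            if amount < 0 or amount > capacity.get(u, {}).get(v, 0):
                return False
            balance[u] = balance.get(u, 0) - amount
            balance[v] = balance.get(v, 0) + amount
    return all(b == 0 for node, b in balance.items() if node not in (source, sink))


def optimality_ratio(ga_value, exact_value):
    """Share of the exact optimum reached by the GA solution."""
    return ga_value / exact_value
```

A crossover that recombines whole source-to-sink paths keeps `is_valid_flow` true by construction, whereas swapping individual edge values would usually break conservation and require repair.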

Team Lead
Developer

Outcome

  • Achieved up to 100% optimality ratio on graphs with ≤30 nodes, competitive with the exact Ford-Fulkerson solution
  • Path-based crossover preserves the flow conservation constraint, avoiding invalid offspring after genetic operations
Python
PyQt5
Genetic Algorithm
Network Flow
Ford-Fulkerson
Visualization
Status:
completed
Top 3 – Humanitarian Logistics Hackathon
Competition
Logistics
2025

Smart surplus-food allocation for underserved communities

Collaborated with a cross-university team to build a logistics solution combining data management, ML allocation, and IoT warehouse tracking to reduce food waste.

Product
Research

Outcome

  • Top 3 finalist across HCMC universities
  • Proposed an ML-driven allocation approach to reduce surplus mismatch
Hackathon
Machine Learning
IoT
Teamwork
Status:
completed
Minute – Agentic S/CRAG AI Meeting Co-Host for BFSI
Competition
AI/ML
2025

VNPT AI Hackathon 2025 | Track 1 – Desktop/Web + GoMeet/Google Meet Add-in

Minute standardizes the meeting lifecycle for BFSI/LPBank enterprises: pre-meeting context gathering, real-time in-meeting assistance, and post-meeting generation of minutes and action items, all with citations, audit trails, and access control. It is built on a SAAR (Self-aware Adaptive Agentic RAG) architecture featuring stage-aware routing (Pre/In/Post), a real-time WebSocket pipeline (audio → SmartVoice STT → session bus → live transcript/recap/ADR), permission-aware RAG with pgvector, and tool calling with human-in-the-loop confirmation.

Lead Developer
Data Engineer
AI Engineer

Outcome

  • Built end-to-end AI meeting workflow: Pre-meeting (agenda + pre-read) → In-meeting (live transcript, recap, ADR extraction) → Post-meeting (executive summary, MoM, task sync)
  • Implemented SAAR architecture with stage-aware LangGraph routing, graded RAG retrieval, and self-corrective loops
Hackathon
LangGraph
RAG
Tool-Calling
FastAPI
WebSocket
Electron
VNPT AI
Status:
completed