Project
completed
2025

Vietnam Weather Prediction with Softmax Regression

Multiclass weather classification using time-series feature engineering

Final project for a Data Visualization course: a Softmax Regression model that predicts weather for 34 Vietnamese provinces. Collected 265K+ daily records from the Open-Meteo API (2005-2025) and engineered lag features, cyclic encoding for seasonality, and rain accumulation features. Weather is classified into 3 groups (Clear/Cloudy, Drizzle, Rain), with a strict chronological train/test split (2024-2025 held out as the test set).

Data Analytics
AI/ML
Time-series
Team Lead
Developer

Timeline

2025

Type

Project

Status

completed

Outcome / Impact

  • Multiclass classification achieved 65.6% accuracy and a macro F1-score of 0.644
  • Feature engineering improved accuracy from 63.9% to 65.6% (+1.7pp)
  • Deployed an interactive Streamlit app on Hugging Face Spaces as a weather prediction demo
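Macro F1 is the unweighted mean of the per-class F1 scores, so the 0.644 figure can be checked against the per-class values listed under Result / Impact:

```latex
\mathrm{F1}_{\text{macro}}
  = \frac{1}{3}\left(\mathrm{F1}_{\text{Clear/Cloudy}} + \mathrm{F1}_{\text{Drizzle}} + \mathrm{F1}_{\text{Rain}}\right)
  = \frac{0.701 + 0.484 + 0.748}{3}
  = \frac{1.933}{3}
  \approx 0.644
```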

Tech / Skills

Python
Softmax Regression
scikit-learn
Streamlit
Feature Engineering
Time-series

Case Study

1) Context / Problem

Weather forecasting is critical for agriculture, energy, and tourism in Vietnam, but traditional forecasting methods are complex and resource-intensive. This project aimed to build a simple yet effective Softmax Regression model for multiclass weather prediction that uses only basic meteorological inputs and is suitable for local deployment.

2) Your Role

As Team Lead, I coordinated the team and was responsible for collecting data from the Open-Meteo API for 34 provinces, feature engineering (lag features, cyclic encoding, diff and accumulation features), model training with the L-BFGS optimizer, hyperparameter tuning, and deploying the Streamlit demo on Hugging Face Spaces.

3) Approach

Collected 265,265 daily records across 34 provinces (2005-2025) and grouped the weather codes into 3 classes: Clear/Cloudy, Drizzle, and Rain. Engineered features: (1) sin-cos cyclic encoding of the month to capture seasonality, (2) 7-day lag features for precipitation, temperature, humidity, and wind, (3) diff features for day-to-day temperature and humidity change, and (4) 3-day rain accumulation. Preprocessed with StandardScaler (numeric features) and OneHotEncoder (categorical features), then trained with the L-BFGS solver and balanced class weights.
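The feature steps above could look roughly like this in pandas. This is a sketch under stated assumptions, not the project's code. The following are all hypothetical choices: the column names, the weather-code grouping in CLASS_MAP (WMO codes as reported by Open-Meteo), reading "7-day lag" as lags 1 through 7, and predicting the same-day class from past days only, so diffs and the 3-day accumulation are built from earlier days to avoid leaking same-day information.

```python
import numpy as np
import pandas as pd

# Hypothetical grouping of WMO weather codes into the three target classes.
# Codes not listed here (e.g. fog, thunderstorm) are simply dropped in this sketch.
CLASS_MAP = {
    **{c: "Clear/Cloudy" for c in (0, 1, 2, 3)},
    **{c: "Drizzle" for c in (51, 53, 55)},
    **{c: "Rain" for c in (61, 63, 65, 80, 81, 82)},
}
# Hypothetical daily column names; adjust to whatever the collected data uses.
LAG_COLS = ["precipitation_sum", "temperature_2m_mean",
            "relative_humidity_2m_mean", "wind_speed_10m_max"]


def engineer_features(df: pd.DataFrame, n_lags: int = 7) -> pd.DataFrame:
    """Add the target class plus cyclic, lag, diff and accumulation features (n_lags >= 2)."""
    df = df.sort_values(["province", "date"]).reset_index(drop=True)
    df["weather_class"] = df["weather_code"].map(CLASS_MAP)

    # Cyclic month encoding: December and January land next to each other on the circle.
    month = df["date"].dt.month
    df["month_sin"] = np.sin(2 * np.pi * month / 12)
    df["month_cos"] = np.cos(2 * np.pi * month / 12)

    by_province = df.groupby("province", sort=False)
    # Lags 1..n_lags, shifted within each province so series never bleed into each other.
    for col in LAG_COLS:
        for k in range(1, n_lags + 1):
            df[f"{col}_lag{k}"] = by_province[col].shift(k)

    # Day-over-day change from past values only (yesterday minus the day before).
    for col in ("temperature_2m_mean", "relative_humidity_2m_mean"):
        df[f"{col}_diff1"] = df[f"{col}_lag1"] - df[f"{col}_lag2"]

    # Rain accumulated over the three previous days.
    df["rain_3d"] = by_province["precipitation_sum"].transform(
        lambda s: s.shift(1).rolling(3).sum())

    # Drop rows without a full lag history, with an unmapped code, or with missing values.
    return df.dropna().reset_index(drop=True)
```

Grouping by province before shifting keeps one province's history from leaking into another province's lag columns.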

4) Result / Impact

Softmax Regression achieved 65.6% accuracy on the 2024-2025 test set, with feature engineering lifting accuracy by 1.7pp over the 63.9% baseline. Rain was the best-predicted class (F1 0.748), followed by Clear/Cloudy (F1 0.701). Drizzle remained challenging (F1 0.484) due to its transitional nature.
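A minimal scikit-learn sketch of the training and evaluation setup described in Approach: fit on years before 2024, test on 2024 onward, and report accuracy and macro F1. The train_and_evaluate helper, the column names (humidity_lag1, rain_3d), treating province as the categorical feature, and the synthetic demo data are all hypothetical. The demo does not reproduce the project's numbers.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def train_and_evaluate(df, numeric_cols, categorical_cols,
                       target="weather_class", test_start="2024-01-01"):
    """Chronological split: fit on rows before test_start, evaluate on the rest."""
    cutoff = pd.Timestamp(test_start)
    train, test = df[df["date"] < cutoff], df[df["date"] >= cutoff]
    features = numeric_cols + categorical_cols

    preprocess = ColumnTransformer([
        ("num", StandardScaler(), numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ])
    # With solver="lbfgs" and more than two classes, scikit-learn's LogisticRegression
    # fits a multinomial (softmax) model; class_weight="balanced" upweights rare classes.
    model = Pipeline([
        ("prep", preprocess),
        ("clf", LogisticRegression(solver="lbfgs", class_weight="balanced", max_iter=1000)),
    ])
    model.fit(train[features], train[target])  # scaler/encoder are fit on training years only
    pred = model.predict(test[features])
    return model, {
        "accuracy": accuracy_score(test[target], pred),
        "macro_f1": f1_score(test[target], pred, average="macro"),
    }


# Synthetic demo data (hypothetical columns; NOT the project's dataset).
rng = np.random.default_rng(0)
dates = pd.date_range("2021-01-01", periods=1500, freq="D")
demo = pd.DataFrame({
    "date": np.tile(dates, 2),
    "province": np.repeat(["Ha Noi", "Da Nang"], len(dates)),
    "humidity_lag1": rng.normal(size=2 * len(dates)),
    "rain_3d": rng.normal(size=2 * len(dates)),
})
# Labels generated as the argmax of linear scores, i.e. data a softmax model can fit.
true_scores = np.column_stack([
    np.zeros(len(demo)), 2 * demo["humidity_lag1"] - 1, 2 * demo["rain_3d"] - 1,
])
demo["weather_class"] = np.array(["Clear/Cloudy", "Drizzle", "Rain"])[true_scores.argmax(axis=1)]

model, scores = train_and_evaluate(demo, ["humidity_lag1", "rain_3d"], ["province"])
print(scores)
```

Wrapping the scaler and encoder in the same Pipeline as the classifier means their statistics come from the training years only, which keeps the chronological split honest.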

5) Learnings

Engineered time-series features such as lag features and cyclic encoding improved classification (+1.7pp accuracy overall). Class imbalance called for balanced class weighting. Drizzle, as transitional weather, is inherently harder to predict than distinct weather states. Future work: deep learning models (e.g., LSTM), finer-grained temporal features, and ensemble methods.
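"Balanced" class weighting gives each class c the weight n_samples / (n_classes × n_c), so rarer classes count more in the training loss. Below is a small sketch with made-up label counts (not the project's real class distribution), cross-checked against scikit-learn's compute_class_weight. The balanced_class_weights helper is hypothetical.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight


def balanced_class_weights(y):
    """'Balanced' heuristic: w_c = n_samples / (n_classes * n_c)."""
    classes, counts = np.unique(np.asarray(y), return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))


# Hypothetical, made-up label counts (NOT the project's real class distribution).
y = ["Clear/Cloudy"] * 600 + ["Drizzle"] * 100 + ["Rain"] * 300
weights = balanced_class_weights(y)
print(weights)

# scikit-learn computes the same values when class_weight="balanced" is used.
sk_weights = compute_class_weight("balanced", classes=np.unique(y), y=y)
assert np.allclose(list(weights.values()), sk_weights)
```

With these hypothetical counts, the 100-sample class gets a weight of about 3.33, while the 600-sample class gets about 0.56. Each class therefore contributes the same total weight to the loss.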

6) Links

See links above.