Dynamic State Space Modeling with HDP-HMM: A LatentDynamicsBayes Tutorial
LatentDynamicsBayes: HDP-HMM for Time Series Analysis
A tutorial on using LatentDynamicsBayes, a Bayesian non-parametric modeling framework built on the HDP-HMM, for time series analysis
Table of Contents
- Introduction
- Installation
- Features Overview
- Quick Start Guide
- Live Mode Tutorial
- Offline CSV Processing Tutorial
- Visualization Guide
- Advanced Configuration
- Understanding the Output
- How to Cite
- References
Introduction
LatentDynamicsBayes is an implementation of the Hierarchical Dirichlet Process Hidden Markov Model (HDP-HMM), using the stick-breaking construction, for unsupervised learning of state sequences in multidimensional time series data. Because this Bayesian non-parametric approach infers the number of hidden states from the data (up to the limit set by max_states), it is well suited to discovering latent patterns and structure in complex time series.
The implementation works with both live streaming data (such as system metrics) and offline historical data (via CSV files). It supports incremental training, real-time inference, and visualization, and is written in PyTorch with GPU acceleration when available.
What is HDP-HMM?
The HDP-HMM extends the classical Hidden Markov Model by placing a Hierarchical Dirichlet Process prior on the state transition distributions. All states share a common, potentially unbounded set of candidate next states, so the number of states the data actually uses is inferred rather than fixed in advance. This Bayesian non-parametric approach is particularly valuable when:
- The true number of states is unknown
- The complexity of the data may change over time
- You need to discover natural groupings in temporal data
- You want to avoid fixing the number of states by hand
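To make the prior concrete, the sketch below draws from a truncated version of the HDP-HMM prior with NumPy: global state weights β come from stick-breaking with concentration γ (gamma), and each state's transition row is drawn from Dirichlet(αβ), so every row favors the same globally popular states. The truncation level stands in for max_states, and the function name is hypothetical; this illustrates the prior only and is not the library's inference code.

```python
import numpy as np


def sample_hdp_hmm_prior(max_states, alpha, gamma, seed=None):
    """Draw global weights and a transition matrix from a truncated HDP-HMM prior.

    Hypothetical illustration: `max_states` is the truncation level,
    `gamma` the top-level concentration, `alpha` the transition-level one.
    """
    rng = np.random.default_rng(seed)

    # Truncated stick-breaking, GEM(gamma): v_k ~ Beta(1, gamma) and
    # beta_k = v_k * prod_{l<k} (1 - v_l); the last stick takes the remainder.
    v = rng.beta(1.0, gamma, size=max_states)
    v[-1] = 1.0
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    beta = v * remaining

    # Each state's outgoing row is centered on the shared weights:
    # pi_j ~ Dirichlet(alpha * beta), so all rows reuse the same states.
    pi = rng.dirichlet(alpha * beta, size=max_states)
    return beta, pi


beta, pi = sample_hdp_hmm_prior(max_states=20, alpha=1.0, gamma=1.0, seed=0)
```

Smaller gamma concentrates β on a few states; larger alpha makes each transition row stick more closely to β.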
Installation
Prerequisites
- Python 3.7+
- PyTorch 1.8+
- NumPy
- Matplotlib
- Seaborn
- pandas
- psutil (optional, for real system metrics)
Setup
- Clone the repository:
git clone https://github.com/yourusername/LatentDynamicsBayes.git
cd LatentDynamicsBayes
- Install the required dependencies:
pip install -r requirements.txt
- Verify the installation:
python demo.py
This will run a quick demonstration using simulated data.
Features Overview
LatentDynamicsBayes provides a comprehensive toolkit for time series analysis:
Core Features
- Bayesian non-parametric modeling with HDP-HMM to infer the number of states from the data
- Dynamic state management with birth, merge, and delete operations
- Dual-mode operation: live streaming and offline batch processing
- PyTorch implementation with GPU acceleration
- Incremental model updates for continuous learning
- Model persistence with checkpointing
Visualization Suite
- Time series visualization with state assignments
- State pattern analysis showing what each state represents
- State evolution tracking with birth/merge/delete events
- Transition probability heatmaps
- Learning curves and model performance metrics
- State-specific time series analysis
- Composite visualizations combining multiple views
Practical Features
- Robust error handling with headless operation support
- Performance monitoring for tracking training and inference times
- Organized plot management with automatic directory structure
- Comprehensive logging with detailed state updates
- Sample data generation for testing and demonstration
Quick Start Guide
Basic Live Mode (with simulated data)
python main.py
Offline Mode with CSV Files
python main.py --data-dir data --window-size 50 --stride 25
Headless Operation (for servers without display)
python main.py --no-gui
Generate Sample Data for Testing
python generate_sample_data.py
Live Mode Tutorial
Live mode processes data in real-time, either from actual system metrics or simulated data. This is ideal for monitoring systems, continuous learning, and real-time pattern detection.
Step 1: Start with a simple run
python main.py
This will:
- Initialize the model with default parameters
- Generate simulated data
- Incrementally train the model on sliding windows of data
- Visualize the results in real-time
Step 2: Experiment with different parameters
python main.py --window-size 200 --max-windows 500
- --window-size: number of time steps included in each sliding window
- --max-windows: total number of windows to process
Step 3: Save and analyze the results
The model automatically saves:
- The trained model to models/hdp_hmm.pth
- Visualization plots to the plots/ directory
- Transition matrices to plots/transition_matrix/
Step 4: Using real system metrics
To use real system metrics instead of simulated data:
python main.py --use-real
This will collect metrics such as CPU usage, memory utilization, and temperature readings (when available).
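The exact metrics gathered by --use-real are determined by the library. As an illustration only, here is a hypothetical read_metrics helper that reads CPU usage, memory utilization, and a temperature value with psutil, and falls back to simulated values when psutil is not installed:

```python
import math
import random
import time

try:
    import psutil  # optional dependency (see Prerequisites)
except ImportError:
    psutil = None


def read_metrics():
    """Return one time step as [cpu_percent, memory_percent, temperature_c].

    Hypothetical helper: uses psutil when installed, otherwise simulated
    values. Temperature is NaN when no sensor reading is exposed.
    """
    if psutil is None:
        # Simulated fallback: slow sinusoids plus Gaussian noise.
        t = time.time()
        return [
            50 + 30 * math.sin(t / 10) + random.gauss(0, 2),
            60 + 10 * math.cos(t / 15) + random.gauss(0, 1),
            45 + 5 * math.sin(t / 20) + random.gauss(0, 0.5),
        ]

    cpu = psutil.cpu_percent(interval=0.1)  # sampled over 0.1 s
    mem = psutil.virtual_memory().percent
    temp = float("nan")
    # sensors_temperatures() is only provided on some platforms (e.g. Linux).
    if hasattr(psutil, "sensors_temperatures"):
        try:
            sensors = psutil.sensors_temperatures()
        except OSError:
            sensors = {}
        for entries in sensors.values():
            if entries:
                temp = entries[0].current  # first reading of the first sensor
                break
    return [float(cpu), float(mem), float(temp)]


sample = read_metrics()  # one 3-feature observation
```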
Offline CSV Processing Tutorial
Offline mode processes historical data from CSV files, which is ideal for retrospective analysis, batch processing, and model development on benchmark datasets.
Step 1: Prepare your CSV files
Each CSV file should:
- Have columns representing features
- Have rows representing time steps
- Contain no header row (the first row is treated as data)
Example CSV format:
0.5,1.2,0.8
0.6,1.3,0.7
0.7,1.4,0.6
...
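Before running the pipeline, it can help to confirm that a file parses as a purely numeric matrix. A minimal NumPy sketch (the load_series name is hypothetical; with pandas, pd.read_csv(path, header=None) likewise treats the first row as data):

```python
import io

import numpy as np


def load_series(source):
    """Load a header-less CSV into a (time_steps, n_features) float array.

    `source` may be a path or a file-like object; ndmin=2 keeps single-row
    or single-column files two-dimensional.
    """
    return np.loadtxt(source, delimiter=",", dtype=float, ndmin=2)


example = io.StringIO("0.5,1.2,0.8\n0.6,1.3,0.7\n0.7,1.4,0.6\n")
data = load_series(example)
print(data.shape)  # (3, 3): three time steps, three features
```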
Step 2: Generate sample data (optional)
python generate_sample_data.py
This will create sample CSV files with known state patterns in the data/ directory.
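The patterns produced by generate_sample_data.py are not described in detail here. As a hypothetical stand-in, this NumPy sketch builds a three-feature series that switches between three known regimes with random dwell times, then writes it as a header-less CSV in data/ so it can be processed in offline mode:

```python
import os

import numpy as np


def make_regime_series(n_steps=600, seed=0):
    """Three-feature series that switches between three known regimes.

    Hypothetical stand-in for generate_sample_data.py. Returns the
    observations and the ground-truth state label of every time step.
    """
    rng = np.random.default_rng(seed)
    means = np.array([[0.0, 0.0, 0.0],
                      [3.0, -1.0, 1.0],
                      [-2.0, 2.0, 4.0]])
    labels = np.empty(n_steps, dtype=int)
    t, state = 0, 0
    while t < n_steps:
        dwell = int(rng.integers(30, 80))  # steps spent in this regime
        labels[t:t + dwell] = state
        t += dwell
        state = (state + int(rng.integers(1, 3))) % len(means)  # always switch
    data = means[labels] + rng.normal(scale=0.5, size=(n_steps, 3))
    return data, labels


data, labels = make_regime_series()
os.makedirs("data", exist_ok=True)
# No header row, matching the CSV layout expected in offline mode.
np.savetxt("data/regimes.csv", data, delimiter=",", fmt="%.4f")
```

Keeping labels lets you compare the model's state assignments against the true regimes.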
Step 3: Process the CSV files
python main.py --data-dir data --window-size 50 --stride 25
Parameters:
- --data-dir: directory containing the CSV files
- --window-size: number of time steps in each window
- --stride: number of time steps to advance between windows (use a value smaller than --window-size for overlapping windows)
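The relationship between --window-size and --stride can be sketched in a few lines of NumPy. sliding_windows and overlap_fraction are hypothetical helpers, and dropping a trailing partial window is an assumption, not necessarily what main.py does at the end of a file:

```python
import numpy as np


def sliding_windows(series, window_size, stride):
    """Split a (time_steps, n_features) array into windows of window_size rows.

    Consecutive windows start `stride` steps apart; a trailing partial
    window is dropped.
    """
    starts = range(0, len(series) - window_size + 1, stride)
    return [series[s:s + window_size] for s in starts]


def overlap_fraction(window_size, stride):
    """Share of each window repeated in the next one (0 when stride >= window)."""
    return max(0.0, 1.0 - stride / window_size)


series = np.arange(1000 * 3, dtype=float).reshape(1000, 3)
windows = sliding_windows(series, window_size=50, stride=25)
print(len(windows), overlap_fraction(50, 25))  # 39 0.5
```

With a window size of 100, a stride of 100 gives zero overlap and a stride of 25 gives the 75% overlap used in the next step.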
Step 4: Experiment with different window configurations
For non-overlapping windows:
python main.py --data-dir data --window-size 100 --stride 100
For heavily overlapping windows (75% overlap):
python main.py --data-dir data --window-size 100 --stride 25
Step 5: Analyze the results
After processing completes:
- Check the plots/ directory for visualizations
- Examine final_state_patterns.png to understand what each state represents
- View final_transition_matrix.png to see state transition patterns
- Explore state-specific time series in plots/state_time_series/
Visualization Guide
The visualization system provides comprehensive insights into the model’s behavior and the discovered patterns in your data.
Key Visualizations
- Time Series with State Assignments
  This plot shows your raw data with color-coded state assignments, helping you see how the model segments your time series.
- State Pattern Analysis
  This visualization shows what pattern each state represents, including:
  - Mean value for each feature
  - Standard deviation (shaded area)
  - Min/max range
  - State frequency and typical duration
- State Evolution Plot
  This plot shows how the number of states changes over time, with markers for:
  - Birth of new states (green triangles)
  - Merge of similar states (orange circles)
  - Deletion of inactive states (red triangles)
- Transition Probability Heatmap
  This heatmap shows the probability of transitioning from one state to another:
  - Rows represent “from” states
  - Columns represent “to” states
  - Darker colors indicate higher probabilities
  - Strong diagonal elements indicate persistent states
- Learning Curve
  This plot shows the model’s loss over time, helping you identify:
  - Overall learning trend
  - Convergence patterns
  - Correlation between state changes and model performance
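The "typical duration" in the state pattern plot can be read as the average length of uninterrupted runs of a state. A standard-library sketch, assuming the model's state assignments are available as a list of integer state ids (the helper name is hypothetical):

```python
from collections import defaultdict
from itertools import groupby


def state_durations(states):
    """Map each state id to the lengths of its consecutive runs in `states`."""
    runs = defaultdict(list)
    for state, run in groupby(states):
        runs[state].append(sum(1 for _ in run))
    return dict(runs)


states = [0, 0, 0, 1, 1, 0, 0, 2, 2, 2, 2]
durations = state_durations(states)
typical = {s: sum(d) / len(d) for s, d in durations.items()}
print(durations)  # {0: [3, 2], 1: [2], 2: [4]}
print(typical)    # {0: 2.5, 1: 2.0, 2: 4.0}
```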
Interpreting Visualizations
When analyzing the visualizations, look for:
- In State Patterns:
  - Distinct patterns for each state
  - Clear separation between states
  - Consistent patterns with low variance
- In Transition Matrix:
  - Strong self-transitions (diagonal)
  - Clear transition pathways between states
  - Rows that are far from uniform
- In State Evolution:
  - Stabilization of state count over time
  - Fewer birth/merge/delete events as training progresses
  - Correlation between state changes and learning curve improvements
Advanced Configuration
The behavior of the HDP-HMM model can be fine-tuned through several key parameters.
Model Parameters
These parameters can be adjusted in config.json:
{
"model": {
"n_features": 3,
"max_states": 20,
"alpha": 1.0,
"gamma": 1.0,
"learning_rate": 0.01
}
}
- n_features: number of input features
- max_states: maximum number of states to consider
- alpha: concentration parameter of the HDP's state-level transition distributions
- gamma: top-level concentration parameter, governing the global state weights
- learning_rate: learning rate for optimization
Dynamic State Management
Fine-tune the birth, merge, and delete mechanisms:
{
"state_management": {
"delete_threshold": 1e-3,
"merge_distance": 0.5,
"birth_threshold": 10.0
}
}
- delete_threshold: minimum beta (global state weight) for a state to remain active
- merge_distance: maximum distance between two states' means for them to be merged
- birth_threshold: negative log-likelihood above which a new state is created
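The three thresholds can be read as simple decision rules. The NumPy sketch below is one hypothetical interpretation: Euclidean distance between state means and a single negative log-likelihood score per window are assumptions, and the library's actual birth, merge, and delete moves may use different criteria.

```python
import numpy as np


def plan_state_updates(beta, means, window_nll, delete_threshold=1e-3,
                       merge_distance=0.5, birth_threshold=10.0):
    """Return the delete/merge/birth moves the thresholds would trigger.

    beta: (K,) global state weights; means: (K, D) state means;
    window_nll: negative log-likelihood of the latest window (assumed to be
    on the same scale as birth_threshold).
    """
    # Delete: states whose global weight fell below the threshold.
    delete = [k for k in range(len(beta)) if beta[k] < delete_threshold]
    keep = [k for k in range(len(beta)) if beta[k] >= delete_threshold]

    # Merge: surviving pairs whose means are within merge_distance
    # (Euclidean distance is an assumption).
    merge = [
        (i, j)
        for pos, i in enumerate(keep)
        for j in keep[pos + 1:]
        if np.linalg.norm(means[i] - means[j]) <= merge_distance
    ]

    # Birth: the existing states explain the latest window poorly.
    birth = window_nll > birth_threshold
    return {"delete": delete, "merge": merge, "birth": birth}


beta = np.array([0.6, 0.3, 0.0995, 0.0005])
means = np.array([[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [5.0, 5.0]])
print(plan_state_updates(beta, means, window_nll=12.3))
# {'delete': [3], 'merge': [(0, 1)], 'birth': True}
```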
Tuning Recommendations
- High-noise data: increase delete_threshold (e.g., 5e-3) and merge_distance (e.g., 1.0)
- Complex systems: decrease birth_threshold (e.g., 5.0) to allow more states
- Computational efficiency: increase delete_threshold and birth_threshold
- High precision: decrease merge_distance (e.g., 0.3) to prevent merging distinct states
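The recommendations above can be applied to config.json programmatically. A standard-library sketch; the preset names and the apply_preset helper are hypothetical, and only recommendations that come with example values are included:

```python
import json
from pathlib import Path

# Values transcribe the example settings from the tuning recommendations.
PRESETS = {
    "high_noise": {"delete_threshold": 5e-3, "merge_distance": 1.0},
    "complex_system": {"birth_threshold": 5.0},
    "high_precision": {"merge_distance": 0.3},
}


def apply_preset(config_path, preset):
    """Update the state_management section of a config file in place."""
    path = Path(config_path)
    config = json.loads(path.read_text()) if path.exists() else {}
    # Only the keys in the preset change; other settings are preserved.
    config.setdefault("state_management", {}).update(PRESETS[preset])
    path.write_text(json.dumps(config, indent=2))
    return config
```

For example, apply_preset("config.json", "high_noise") raises delete_threshold and merge_distance while leaving birth_threshold and the model section untouched.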
Understanding the Output
State Interpretation
Each discovered state represents a distinctive pattern in your time series data. To interpret a state:
- Look at its mean pattern across features
- Check its variance (standard deviation) to assess consistency
- Examine when and how often it occurs in the sequence
- Analyze its incoming and outgoing transitions
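The first three checks can be computed directly from the data and a decoded state sequence. A NumPy sketch, assuming one integer state id per time step (the state_summary helper is hypothetical):

```python
import numpy as np


def state_summary(data, states):
    """Per-state mean, std, min, max and frequency (share of time steps)."""
    data = np.asarray(data, dtype=float)
    states = np.asarray(states)
    summary = {}
    for k in np.unique(states):
        rows = data[states == k]  # observations assigned to state k
        summary[int(k)] = {
            "mean": rows.mean(axis=0),
            "std": rows.std(axis=0),  # low std suggests a consistent pattern
            "min": rows.min(axis=0),
            "max": rows.max(axis=0),
            "frequency": len(rows) / len(data),
        }
    return summary


data = np.array([[0.5, 1.2], [0.6, 1.3], [3.0, 0.1], [3.2, 0.0]])
summary = state_summary(data, states=[0, 0, 1, 1])
```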
Transition Dynamics
The transition matrix reveals the temporal dynamics of your system:
- High self-transition probabilities indicate persistent states
- Strong transitions between specific states suggest common patterns
- Absence of transitions between states indicates separate regimes or modes
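Transition structure can also be checked against a decoded state sequence by counting consecutive state pairs. This gives an empirical frequency estimate, not the model's posterior transition matrix; the helper name is hypothetical:

```python
import numpy as np


def empirical_transitions(states, n_states):
    """Row-normalized counts of consecutive state pairs (rows: from, cols: to)."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(states[:-1], states[1:]):
        counts[a, b] += 1
    totals = counts.sum(axis=1, keepdims=True)
    # States with no outgoing moves in the sequence keep an all-zero row.
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)


states = [0, 0, 0, 1, 1, 0, 0, 2, 2, 2, 0]
P = empirical_transitions(states, n_states=3)

# With self-transition probability p, the time spent in a state per visit is
# geometric with mean 1 / (1 - p); p = 1 would give an infinite dwell time.
with np.errstate(divide="ignore"):
    expected_dwell = 1.0 / (1.0 - np.diag(P))
```

Here state 2 has self-transition probability 2/3, so each visit lasts three steps on average.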
State Count Evolution
The evolution of the state count provides insights into model complexity:
- Initial rapid increase in states as the model discovers patterns
- Merging of similar states as the model refines its understanding
- Eventual stabilization around a consistent number of states
How to Cite
If you use LatentDynamicsBayes in your research, please cite it as:
@software{LatentDynamicsBayes2025,
author = {Your Name},
title = {LatentDynamicsBayes: HDP-HMM for Time Series Analysis},
url = {https://github.com/yourusername/LatentDynamicsBayes},
version = {1.0.0},
year = {2025},
}
For the underlying methodologies, please also cite:
@article{Teh2006,
title={Hierarchical {D}irichlet Processes},
author={Teh, Yee Whye and Jordan, Michael I and Beal, Matthew J and Blei, David M},
journal={Journal of the American Statistical Association},
volume={101},
number={476},
pages={1566--1581},
year={2006}
}
@inproceedings{Fox2008,
title={An {HDP-HMM} for Systems with State Persistence},
author={Fox, Emily B and Sudderth, Erik B and Jordan, Michael I and Willsky, Alan S},
booktitle={Proceedings of the 25th International Conference on Machine Learning},
year={2008}
}
@inproceedings{Hughes2013,
title={Memoized Online Variational Inference for {D}irichlet Process Mixture Models},
author={Hughes, Michael C and Sudderth, Erik B},
booktitle={Advances in Neural Information Processing Systems},
year={2013}
}
References
- Teh, Y. W., Jordan, M. I., Beal, M. J., & Blei, D. M. (2006). Hierarchical Dirichlet Processes. Journal of the American Statistical Association, 101(476), 1566-1581.
- Fox, E. B., Sudderth, E. B., Jordan, M. I., & Willsky, A. S. (2008). An HDP-HMM for Systems with State Persistence. In Proceedings of the 25th International Conference on Machine Learning (ICML).
- Hughes, M. C., & Sudderth, E. B. (2013). Memoized Online Variational Inference for Dirichlet Process Mixture Models. In Advances in Neural Information Processing Systems (NIPS).
- Hughes, M. C., Stephenson, W. T., & Sudderth, E. B. (2015). Scalable Adaptation of State Complexity for Nonparametric Hidden Markov Models. In Advances in Neural Information Processing Systems (NIPS).
This tutorial was created on June 17, 2025
LatentDynamicsBayes - A PyTorch implementation of HDP-HMM for time series analysis