Data Storage Design โ Structuring Time-Series and Analytical Data
Data Storage Design โ Structuring Time-Series and Analytical Data
๐ฏ Overview
In a monitoring system, how data is stored directly impacts:
- Query performance
- Storage efficiency
- Insight generation capability
This post defines a dual-storage strategy using:
- InfluxDB โ raw time-series data
- MySQL โ aggregated analytical data
๐๏ธ Storage Architecture
1
2
3
4
5
6
7
8
9
10
11
12
13
[Data Collection Layer]
โโ Telegraf
โโ Python (HWiNFO)
โ
[InfluxDB] (Raw Data)
โ
[Aggregation Layer]
โ
[MySQL] (Analytics Data)
โ
[Metabase / Dashboard]
๐ง Design Strategy
Why Two Databases?
| Database | Role |
|---|---|
| InfluxDB | High-frequency raw data |
| MySQL | Aggregated / reporting |
1. InfluxDB Schema Design
- Measurement
- system_metrics
- hwinfo_metrics
- Tag vs Field
| Type | Usage |
|---|---|
| Tag | filter / grouping |
| Field | actual value |
- Recommended Schema โ Do NOT store numeric values as tags
system_metrics
1
2
3
4
5
6
7
measurement: system_metrics
tags:
- host
fields:
- cpu_usage
- mem_usage
- disk_usage
hwinfo_metrics
1
2
3
4
5
6
7
measurement: hwinfo_metrics
tags:
- host
fields:
- cpu_temp
- fan_speed
- power
๐งช Example Data (Line Protocol)
1
2
system_metrics,host=pc-01 cpu_usage=25.3,mem_usage=60.2
hwinfo_metrics,host=pc-01 cpu_temp=55.2,fan_speed=1200
๐ Retention Policy
- High-frequency data grows fast
- Storage cost increases
| Data Type | Retention |
|---|---|
| Raw sensor data | 7~30 days |
| Aggregated data | long-term |
๐ Downsampling Strategy
Example:
1
2
Raw: 3 sec interval
Downsample: 1 min avg
๐๏ธ 2. MySQL Schema Design
Purpose
- Store aggregated metrics
- Enable BI queries (Metabase)
1. ๐ Aggregation Flow
1
2
3
4
5
InfluxDB (raw)
โ
Python Aggregation
โ
MySQL (summary)
2. ๐ง Design Decisions
1. Separation of Raw vs Aggregated Data
- InfluxDB โ fast writes
- MySQL โ fast queries
2. Avoid Heavy Queries in InfluxDB
- Complex analytics โ MySQL
3. Optimize for Read vs Write
| System | Optimization |
|---|---|
| InfluxDB | write-heavy |
| MySQL | read-heavy |
4. Common Mistakes
โ Using only MySQL โ Poor performance for time-series
โ Using only InfluxDB โ Poor BI support
โ Wrong tag design โ Query performance drop
3. ๐ฏ Key Takeaways
- Use InfluxDB for ingestion
- Use MySQL for analytics
- Separate concerns clearly
๐ Next Step
In the next post, we will implement the data pipeline and aggregation logic, including:
- Python aggregation jobs
- Scheduling strategy
- Data transformation
This post is licensed under CC BY 4.0 by the author.