Post

Data Storage Design โ€” Structuring Time-Series and Analytical Data

Data Storage Design โ€” Structuring Time-Series and Analytical Data

๐ŸŽฏ Overview

In a monitoring system, how data is stored directly impacts:

  • Query performance
  • Storage efficiency
  • Insight generation capability

This post defines a dual-storage strategy using:

  • InfluxDB โ†’ raw time-series data
  • MySQL โ†’ aggregated analytical data

๐Ÿ—๏ธ Storage Architecture

1
2
3
4
5
6
7
8
9
10
11
12
13
[Data Collection Layer]
   โ”œโ”€ Telegraf
   โ””โ”€ Python (HWiNFO)

        โ†“

[InfluxDB] (Raw Data)
        โ†“
[Aggregation Layer]
        โ†“
[MySQL] (Analytics Data)
        โ†“
[Metabase / Dashboard]

๐Ÿง  Design Strategy

Why Two Databases?

DatabaseRole
InfluxDBHigh-frequency raw data
MySQLAggregated / reporting

1. InfluxDB Schema Design

  1. Measurement
  • system_metrics
  • hwinfo_metrics
  1. Tag vs Field
TypeUsage
Tagfilter / grouping
Fieldactual value
  1. Recommended Schema โ— Do NOT store numeric values as tags

system_metrics

1
2
3
4
5
6
7
measurement: system_metrics
tags:
  - host
fields:
  - cpu_usage
  - mem_usage
  - disk_usage

hwinfo_metrics

1
2
3
4
5
6
7
measurement: hwinfo_metrics
tags:
  - host
fields:
  - cpu_temp
  - fan_speed
  - power

๐Ÿงช Example Data (Line Protocol)

1
2
system_metrics,host=pc-01 cpu_usage=25.3,mem_usage=60.2
hwinfo_metrics,host=pc-01 cpu_temp=55.2,fan_speed=1200

๐Ÿ•’ Retention Policy

  • High-frequency data grows fast
  • Storage cost increases
Data TypeRetention
Raw sensor data7~30 days
Aggregated datalong-term

๐Ÿ“Š Downsampling Strategy

Example:

1
2
Raw: 3 sec interval
Downsample: 1 min avg

๐Ÿ—„๏ธ 2. MySQL Schema Design

Purpose

  • Store aggregated metrics
  • Enable BI queries (Metabase)

1. ๐Ÿ”„ Aggregation Flow

1
2
3
4
5
InfluxDB (raw)
        โ†“
Python Aggregation
        โ†“
MySQL (summary)

2. ๐Ÿง  Design Decisions

1. Separation of Raw vs Aggregated Data

  • InfluxDB โ†’ fast writes
  • MySQL โ†’ fast queries

2. Avoid Heavy Queries in InfluxDB

  • Complex analytics โ†’ MySQL

3. Optimize for Read vs Write

SystemOptimization
InfluxDBwrite-heavy
MySQLread-heavy

4. Common Mistakes

โŒ Using only MySQL โ†’ Poor performance for time-series

โŒ Using only InfluxDB โ†’ Poor BI support

โŒ Wrong tag design โ†’ Query performance drop


3. ๐ŸŽฏ Key Takeaways

  • Use InfluxDB for ingestion
  • Use MySQL for analytics
  • Separate concerns clearly

๐Ÿš€ Next Step

In the next post, we will implement the data pipeline and aggregation logic, including:

  • Python aggregation jobs
  • Scheduling strategy
  • Data transformation

This post is licensed under CC BY 4.0 by the author.