...
sensor_id | year | field | count | sum | average | start_time | end_time | updated |
---|---|---|---|---|---|---|---|---|
12345 | 2003 | temperature | 120 | 8400.0 | 60.0 | |||
12345 | 2003 | pH | 120 | 240.0 | 2.0 |
- are start/end times for bins actually useful for anything? there could be holes in between endpoints
- store completeness by sensor /stream?
bins_month
sensor_id | month | year | field | count | sum | average | start_time | end_time | updated |
---|---|---|---|---|---|---|---|---|---|
12345 | 6 | 2003 | temperature | 10 | 600.0 | 60.0 | |||
12345 | 6 | 2003 | pH | 10 | 20.0 | 2.0 |
...
bins_special
sensor_id | label | hour | day | month | yearfield | count | sum | average | start_time | end_time | updated | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
12345 | water_yearspring | 2003 | temperature | 13 | 60180.0 | 60.0 | 01/01/2003 | 03/31/2003 | ||||
12345 | spring | pH | 3 | 6.0 | 2.0 | 01/01/2003 | 03/31/2003 | |||||
12345 | spring | water_year | 2003 | pH | 1 | 2.0 | 2.0 | 01/01/2003 | 01/31/2003 |
for bins_special, do we actually need count/sum/average here, or does it simply need a start/end time and an aggregation level (year/month/etc) that defines the custom aggregation unit and use the bins_year, bins_month to populate?
- for spring, we get monthly averages only within start/end time
- for spring, we get yearly average only including months within start/end
bins_special (alt option)
label | start_time | end_time | updated |
---|---|---|---|
spring | Jan 1 | Mar 31 | |
spring | Jan 1 | Mar 31 |
- if i want special bin by year, only consider points between start and end time.
- if start/end time includes entire year, use bins_year
- if < 1 year time span, aggregate month + day bins until you cover entire time span
- if i want by months, include each month between start/end time
- for complete months, use bins_month
- for partial months, aggregate day bins until you cover entire time span
Other possible tables:
bins_season - do we need to cache this, or calculate from monthly bins? latter option suggested above.
bins_total - do we need to cache this, or is it fast enough to calculate from yearly bins?
...
- cron job (hourly? 5 minutes?)
- whenever new datapoint is added (at most 1 bin per table would need to be created or updated) - upsert