...
Method 1 | Method 2 | Method 3 | |
---|---|---|---|
Planning time | 0.269 ms | 0.259 ms | 0.145 ms |
Execution time | 3069.059 ms | 6029.105 ms | 12008.801 ms |
Caching design
...
If we store the count and sum of the values in addition to average, it becomes easy to update the bin with a new datapoint.
bins_year
sensor_id | year | field | count | sum | average |
---|---|---|---|---|---|
12345 | 2003 | temperature | 120 | 8400 | 60.0 |
12345 | 2003 | pH | 120 | 240 | 2.0 |
bins_month
sensor_id | month | year | field | count | sum | average |
---|---|---|---|---|---|---|
12345 | 6 | 2003 | temperature | 10 | 600 | 60.0 |
12345 | 6 | 2003 | pH | 10 | 20 | 2.0 |
bins_day
sensor_id | day | month | year | field | count | sum | average |
---|---|---|---|---|---|---|---|
12345 | 13 | 6 | 2003 | temperature | 1 | 60 | 60.0 |
12345 | 13 | 6 | 2003 | pH | 1 | 2 | 2.0 |
bins_hour
sensor_id | hour | day | month | year | field | count | sum | average |
---|---|---|---|---|---|---|---|---|
12345 | 19 | 13 | 6 | 2003 | temperature | 1 | 60 | 60.0 |
12345 | 19 | 13 | 6 | 2003 | pH | 1 | 2 | 2.0 |
Other possible tables:
bins_season - do we need to cache this, or calculate from monthly bins?
bins_total - do we need to cache this, or is it fast enough to calculate from yearly bins?
I don't think we want cache table for water_year, for example, because that is specific to GLM/GLTG and not generic for clowder. We could use the month caches to quickly calculate that on the fly.
When do we update cache tables?
- cron job (hourly? 5 minutes?)
- whenever new datapoint is added (at most 1 bin per table would need to be created or updated) - upsert