Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

"cast( data ->> 'temperature' as DOUBLE PRECISION)" is to find the 'temperature' in data and convert it to double format. Using the following SQL can make it more responsive ( just running once):



Code Block
languagesql
create or replace function cast_to_double(text) returns DOUBLE PRECISION as

...

Code Block
languagesql
 $$


begin
    -- Note the double casting to avoid infinite recursion.
    return cast($1::varchar as DOUBLE PRECISION);
exception
    when invalid_text_representation then
        return 0.0;
end;
$$ language plpgsql immutable;
create cast (text as DOUBLE PRECISION) with function cast_to_double(text);

...


Method 1Method 2Method 3
Planning time0.269 ms0.259 ms0.145 ms
Execution time3069.059 ms6029.105 ms12008.801 ms


Caching design

If we store the count and sum of the values in addition to average, it becomes easy to update the bin with a new datapoint.

bins_year

sensor_idyearfieldcountsumaveragestart_timeend_timeupdated
123452003temperature1208400.060.0


123452003pH120240.02.0



  • are start/end times for bins actually useful for anything? there could be holes in between endpoints
  • store completeness by sensor /stream?

bins_month

sensor_idmonthyearfieldcountsumaveragestart_timeend_timeupdated
1234562003temperature10600.060.0


1234562003pH1020.02.0


bins_day


sensor_iddaymonthyearfieldcountsumaveragestart_timeend_timeupdated
123451362003temperature160.060.0


123451362003pH12.02.0


bins_hour


sensor_idhourdaymonthyearfieldcountsumaveragestart_timeend_timeupdated
12345191362003temperature160.060.0


12345191362003pH12.02.0



bins_special

sensor_idlabelfieldcountsumaveragestart_timeend_timeupdated
12345springtemperature3180.060.001/01/200303/31/2003
12345springpH36.02.001/01/200303/31/2003
12345springpH12.02.001/01/200301/31/2003

for bins_special, do we actually need count/sum/average here, or does it simply need a start/end time and an aggregation level (year/month/etc) that defines the custom aggregation unit and use the bins_year, bins_month to populate?

  • for spring, we get monthly averages only within start/end time
  • for spring, we get yearly average only including months within start/end

bins_special (alt option)

labelstart_timeend_timeupdated
springJan 1Mar 31
springJan 1Mar 31
  • if i want special bin by year, only consider points between start and end time. 
    • if start/end time includes entire year, use bins_year
    • if < 1 year time span, aggregate month + day bins until you cover entire time span
  • if i want by months, include each month between start/end time
    • for complete months, use bins_month
    • for partial months, aggregate day bins until you cover entire time span


Other possible tables:

bins_season - do we need to cache this, or calculate from monthly bins? latter option suggested above.

bins_total - do we need to cache this, or is it fast enough to calculate from yearly bins?

I don't think we want cache table for water_year, for example, because that is specific to GLM/GLTG and not generic for clowder. We could use the month caches to quickly calculate that on the fly.

When do we update cache tables?

  • cron job (hourly? 5 minutes?)
  • whenever new datapoint is added (at most 1 bin per table would need to be created or updated) - upsert


Trends region

Image Added