Builds groups of rows belonging to the same time interval and aggregates the values within each group using a given function (e.g. sum, mean, min, max).
hsGroupByInterval(data, interval, FUN, tsField = names(data)[1], offset1 = 0, offset2 = interval/2, limits = FALSE, ..., dbg = FALSE)
Argument | Description
---|---
data | data frame containing a timestamp field and the data fields to be aggregated over time
interval | length of the time interval in seconds
FUN | function used to aggregate the values within each interval, e.g. sum, mean, min, max
tsField | name of the timestamp column; default: name of the first column
offset1 | number of seconds by which all timestamps are shifted before they are grouped into intervals. The grouping is done by dividing the timestamps (converted to seconds since 1970-01-01) by the interval length and taking the integer part of the division as the interval number. Thus, with offset1 = 0 and an interval length of 60 seconds, the first interval runs from 00:00:00 to 00:00:59, the second from 00:01:00 to 00:01:59, etc., whereas offset1 = 30 leads to the intervals 00:00:30 to 00:01:29, 00:01:30 to 00:02:29, etc.
offset2 | number of seconds, counted from the start of an interval, that determines the timestamp representing the interval in the output. With offset2 = 0, each interval is represented by its start time. By default, offset2 is half the interval length, so that each interval is represented by the timestamp in its middle.
limits | if TRUE, two additional columns are added that give the start and end time of each interval (t.beg and t.end in the examples)
... | further arguments passed to aggregate, which is called internally (e.g. na.rm = TRUE, which aggregate passes on to FUN)
dbg | if TRUE, debug messages are shown
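The interval assignment described for offset1 and offset2 can be sketched in base R as follows. This is not the kwb.utils implementation: the helper name `intervalSketch` is hypothetical, and the code assumes that offset1 is subtracted from the timestamps before the integer division (the argument table does not state the direction of the shift) and that all times are in UTC.

```r
# Hypothetical helper, NOT part of kwb.utils: assigns timestamps to intervals
# following the rule described for offset1 and offset2.
# Assumption: offset1 is subtracted before the integer division.
intervalSketch <- function(ts, interval, offset1 = 0, offset2 = interval / 2) {
  secs <- as.numeric(ts)                  # seconds since 1970-01-01
  n <- (secs - offset1) %/% interval      # integer part = interval number
  beg <- n * interval + offset1           # first second of the interval
  toTime <- function(x) as.POSIXct(x, origin = "1970-01-01", tz = "UTC")
  data.frame(
    n = n,
    t = toTime(beg + offset2),            # representative timestamp
    t.beg = toTime(beg),                  # interval start
    t.end = toTime(beg + interval - 1)    # last second of the interval
  )
}

ts <- as.POSIXct(c("2012-01-01 00:00:10", "2012-01-01 00:00:45"), tz = "UTC")

# offset1 = 0: both timestamps fall into 00:00:00 to 00:00:59,
# represented by 00:00:30 (offset2 = interval / 2)
r0 <- intervalSketch(ts, interval = 60, offset1 = 0)
print(r0)

# offset1 = 30: 00:00:10 falls into 23:59:30 (previous day) to 00:00:29,
# 00:00:45 into 00:00:30 to 00:01:29
r30 <- intervalSketch(ts, interval = 60, offset1 = 30)
print(r30)
```

Keeping the interval number `n` as an explicit column makes it easy to see that timestamps only share a group when the integer division yields the same value.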
## Get an example time series with values every minute
step <- 60
df <- hsExampleTSeries(step)

## Calculate 5-min means with
## offset1 = 0 (default), offset2 = interval/2 (default)
df.agg1 <- hsGroupByInterval(df, interval = 5*step, mean, limits = TRUE)
df.agg1
#>                     t               t.beg               t.end             y
#> 1 2012-01-01 12:02:30 2012-01-01 12:00:00 2012-01-01 12:04:59  5.313752e-01
#> 2 2012-01-01 12:07:30 2012-01-01 12:05:00 2012-01-01 12:09:59  7.313752e-01
#> 3 2012-01-01 12:12:30 2012-01-01 12:10:00 2012-01-01 12:14:59 -5.313752e-01
#> 4 2012-01-01 12:17:30 2012-01-01 12:15:00 2012-01-01 12:19:59 -7.313752e-01
#> 5 2012-01-01 12:22:30 2012-01-01 12:20:00 2012-01-01 12:24:59 -2.449294e-16

## Shift the interval limits with
## offset1 = 2.5*60, offset2 = interval/2 (default)
df.agg2 <- hsGroupByInterval(df, interval = 5*step, mean, limits = TRUE, offset1 = 2.5*step)
df.agg2
#>                     t               t.beg               t.end             y
#> 1 2012-01-01 12:00:00 2012-01-01 11:57:30 2012-01-01 12:02:29  2.989341e-01
#> 2 2012-01-01 12:05:00 2012-01-01 12:02:30 2012-01-01 12:07:29  9.040294e-01
#> 3 2012-01-01 12:10:00 2012-01-01 12:07:30 2012-01-01 12:12:29  1.910147e-16
#> 4 2012-01-01 12:15:00 2012-01-01 12:12:30 2012-01-01 12:17:29 -9.040294e-01
#> 5 2012-01-01 12:20:00 2012-01-01 12:17:30 2012-01-01 12:22:29 -2.989341e-01

## Shift the timestamps representing the intervals with
## offset1 = 0, offset2 = 0
df.agg3 <- hsGroupByInterval(df, interval = 5*step, mean, limits = TRUE, offset1 = 0, offset2 = 0)
df.agg3
#>                     t               t.beg               t.end             y
#> 1 2012-01-01 12:00:00 2012-01-01 12:00:00 2012-01-01 12:04:59  5.313752e-01
#> 2 2012-01-01 12:05:00 2012-01-01 12:05:00 2012-01-01 12:09:59  7.313752e-01
#> 3 2012-01-01 12:10:00 2012-01-01 12:10:00 2012-01-01 12:14:59 -5.313752e-01
#> 4 2012-01-01 12:15:00 2012-01-01 12:15:00 2012-01-01 12:19:59 -7.313752e-01
#> 5 2012-01-01 12:20:00 2012-01-01 12:20:00 2012-01-01 12:24:59 -2.449294e-16

## Show a plot demonstrating the effect of offset1 and offset2
if (FALSE) {
  demoGroupByInterval(df)
}

## Handling NA values...

## Set y to NA at 2 random positions (the output below shows one possible result)
df[sample(nrow(df), 2), 2] <- NA

## Let's have a look at df
df
#>                      t             y
#>  1 2012-01-01 12:00:00  0.000000e+00
#>  2 2012-01-01 12:01:00  3.090170e-01
#>  3 2012-01-01 12:02:00  5.877853e-01
#>  4 2012-01-01 12:03:00  8.090170e-01
#>  5 2012-01-01 12:04:00  9.510565e-01
#>  6 2012-01-01 12:05:00  1.000000e+00
#>  7 2012-01-01 12:06:00  9.510565e-01
#>  8 2012-01-01 12:07:00  8.090170e-01
#>  9 2012-01-01 12:08:00  5.877853e-01
#> 10 2012-01-01 12:09:00  3.090170e-01
#> 11 2012-01-01 12:10:00  1.224647e-16
#> 12 2012-01-01 12:11:00            NA
#> 13 2012-01-01 12:12:00            NA
#> 14 2012-01-01 12:13:00 -8.090170e-01
#> 15 2012-01-01 12:14:00 -9.510565e-01
#> 16 2012-01-01 12:15:00 -1.000000e+00
#> 17 2012-01-01 12:16:00 -9.510565e-01
#> 18 2012-01-01 12:17:00 -8.090170e-01
#> 19 2012-01-01 12:18:00 -5.877853e-01
#> 20 2012-01-01 12:19:00 -3.090170e-01
#> 21 2012-01-01 12:20:00 -2.449294e-16

## Count the NA values within each interval
hsGroupByInterval(df, interval = 300, function(x) sum(is.na(x)))
#>                     t y
#> 1 2012-01-01 12:02:30 0
#> 2 2012-01-01 12:07:30 0
#> 3 2012-01-01 12:12:30 2
#> 4 2012-01-01 12:17:30 0
#> 5 2012-01-01 12:22:30 0

## Default behaviour: mean(values containing at least one NA) = NA
hsGroupByInterval(df, interval = 300, mean)
#>                     t             y
#> 1 2012-01-01 12:02:30  5.313752e-01
#> 2 2012-01-01 12:07:30  7.313752e-01
#> 3 2012-01-01 12:12:30            NA
#> 4 2012-01-01 12:17:30 -7.313752e-01
#> 5 2012-01-01 12:22:30 -2.449294e-16

## Ignore NA values by passing na.rm = TRUE, which aggregate forwards to mean
hsGroupByInterval(df, interval = 300, mean, na.rm = TRUE)
#>                     t             y
#> 1 2012-01-01 12:02:30  5.313752e-01
#> 2 2012-01-01 12:07:30  7.313752e-01
#> 3 2012-01-01 12:12:30 -5.866912e-01
#> 4 2012-01-01 12:17:30 -7.313752e-01
#> 5 2012-01-01 12:22:30 -2.449294e-16
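The NA handling in the examples can be reproduced without the package by calling stats::aggregate directly, which shows why na.rm = TRUE has an effect: aggregate passes extra arguments on to FUN. This is a sketch under assumptions, not the kwb.utils implementation: the series y = sin(pi * i / 10) at one-minute steps from 2012-01-01 12:00:00 UTC is inferred from the printed df rather than taken from hsExampleTSeries, the NA values are placed at rows 12 and 13 instead of random positions, and only the defaults offset1 = 0 and offset2 = interval/2 are covered.

```r
# Sketch only: the series is inferred from the printed example output and
# NA is set at fixed rows (12, 13) instead of random positions.
i <- 0:20
df <- data.frame(
  t = as.POSIXct("2012-01-01 12:00:00", tz = "UTC") + 60 * i,
  y = sin(pi * i / 10)
)
df$y[c(12, 13)] <- NA

interval <- 300
# start of the interval each row belongs to (offset1 = 0)
beg <- (as.numeric(df$t) %/% interval) * interval
# representative timestamp: interval start + offset2 (= interval / 2)
grp <- as.POSIXct(beg + interval / 2, origin = "1970-01-01", tz = "UTC")

# mean() returns NA for every interval that contains an NA ...
m1 <- aggregate(df["y"], by = list(t = grp), FUN = mean)
# ... unless na.rm = TRUE is passed through aggregate to mean()
m2 <- aggregate(df["y"], by = list(t = grp), FUN = mean, na.rm = TRUE)
# number of NA values per interval
n.na <- aggregate(df["y"], by = list(t = grp), FUN = function(x) sum(is.na(x)))

print(m1)
print(m2)
print(n.na)
```

With these inputs, the y columns of m1, m2 and n.na should match the corresponding outputs in the NA-handling examples; the printed timestamps may differ if the package works in another time zone.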