Groups the rows of a data frame by time interval and aggregates the values within each group using a given function (e.g. sum, mean, min, max).
Usage
hsGroupByInterval(
data,
interval,
FUN,
tsField = names(data)[1],
offset1 = 0,
offset2 = interval/2,
limits = FALSE,
...,
dbg = FALSE
)
Arguments
- data
data frame containing a timestamp field and data fields to be aggregated over time.
- interval
length of time interval in seconds
- FUN
function used to aggregate the values within one and the same interval, e.g. sum, mean, min, max
- tsField
name of timestamp column, default: name of first column
- offset1
number of seconds by which all timestamps are shifted before they are grouped into intervals. The timestamps are converted to the number of seconds since 1970-01-01 and divided by the interval length; the integer part of the result is the interval number. Thus, with offset1 = 0 and an interval length of e.g. 60 seconds, the first interval runs from 00:00:00 to 00:00:59, the second from 00:01:00 to 00:01:59, etc., whereas offset1 = 30 would lead to the intervals 00:00:30 to 00:01:29, 00:01:30 to 00:02:29, etc.
- offset2
value given in seconds determining which of the timestamps in an interval represents the interval in the output. If 0, each time interval is represented by the smallest timestamp belonging to the interval. By default, offset2 is half the interval length, meaning that each time interval is represented by the timestamp in the middle of the interval.
- limits
if TRUE, two additional columns are added showing the first and the last second of each interval (columns t.beg and t.end in the examples below)
- ...
further arguments passed to aggregate, the internally called function
- dbg
if TRUE, debug messages are shown
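The grouping described for offset1 and offset2 can be illustrated with a minimal base R sketch. It is not the package's implementation: the helper name intervalInfo, the UTC time zone and the sign convention (offset1 is subtracted before the division) are assumptions, chosen so that the resulting interval limits agree with the descriptions above.

```r
# Hypothetical helper (not part of the package): returns, for each timestamp,
# the representative time and the limits of the interval it falls into.
intervalInfo <- function(times, interval, offset1 = 0, offset2 = interval / 2) {
  secs <- as.numeric(times)                     # seconds since 1970-01-01
  number <- floor((secs - offset1) / interval)  # interval number
  begin <- number * interval + offset1          # first second of the interval
  toTime <- function(x) as.POSIXct(x, origin = "1970-01-01", tz = "UTC")
  data.frame(
    t = toTime(begin + offset2),                # timestamp representing the interval
    t.beg = toTime(begin),                      # lower interval limit
    t.end = toTime(begin + interval - 1)        # upper interval limit
  )
}

# 21 timestamps, one per minute, starting at 2012-01-01 12:00:00 (UTC assumed)
times <- as.POSIXct("2012-01-01 12:00:00", tz = "UTC") + 60 * 0:20

# 5-minute intervals with limits shifted by 2.5 minutes
limits150 <- unique(intervalInfo(times, interval = 300, offset1 = 150))
limits150
```

With offset2 = 0 the column t of this sketch equals t.beg, which corresponds to representing each interval by its first timestamp.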
Examples
## Get an example time series with one value per minute
step <- 60
df <- hsExampleTSeries(step)
## Calculate 5-min-means with
## offset1 = 0 (default), offset2 = interval/2 (default)
df.agg1 <- hsGroupByInterval(df, interval = 5*step, mean, limits = TRUE)
df.agg1
#>                     t               t.beg               t.end             y
#> 1 2012-01-01 12:02:30 2012-01-01 12:00:00 2012-01-01 12:04:59  5.313752e-01
#> 2 2012-01-01 12:07:30 2012-01-01 12:05:00 2012-01-01 12:09:59  7.313752e-01
#> 3 2012-01-01 12:12:30 2012-01-01 12:10:00 2012-01-01 12:14:59 -5.313752e-01
#> 4 2012-01-01 12:17:30 2012-01-01 12:15:00 2012-01-01 12:19:59 -7.313752e-01
#> 5 2012-01-01 12:22:30 2012-01-01 12:20:00 2012-01-01 12:24:59 -2.449213e-16
## Shift the interval limits with
## offset1 = 2.5*60, offset2 = interval/2 (default)
df.agg2 <- hsGroupByInterval(df, interval = 5*step, mean, limits = TRUE,
                             offset1 = 2.5*step)
df.agg2
#>                     t               t.beg               t.end             y
#> 1 2012-01-01 12:00:00 2012-01-01 11:57:30 2012-01-01 12:02:29  2.989341e-01
#> 2 2012-01-01 12:05:00 2012-01-01 12:02:30 2012-01-01 12:07:29  9.040294e-01
#> 3 2012-01-01 12:10:00 2012-01-01 12:07:30 2012-01-01 12:12:29  1.910147e-16
#> 4 2012-01-01 12:15:00 2012-01-01 12:12:30 2012-01-01 12:17:29 -9.040294e-01
#> 5 2012-01-01 12:20:00 2012-01-01 12:17:30 2012-01-01 12:22:29 -2.989341e-01
## Shift the timestamps representing the intervals with
## offset1 = 0, offset2 = 0
df.agg3 <- hsGroupByInterval(df, interval = 5*step, mean, limits = TRUE,
                             offset1 = 0, offset2 = 0)
df.agg3
#>                     t               t.beg               t.end             y
#> 1 2012-01-01 12:00:00 2012-01-01 12:00:00 2012-01-01 12:04:59  5.313752e-01
#> 2 2012-01-01 12:05:00 2012-01-01 12:05:00 2012-01-01 12:09:59  7.313752e-01
#> 3 2012-01-01 12:10:00 2012-01-01 12:10:00 2012-01-01 12:14:59 -5.313752e-01
#> 4 2012-01-01 12:15:00 2012-01-01 12:15:00 2012-01-01 12:19:59 -7.313752e-01
#> 5 2012-01-01 12:20:00 2012-01-01 12:20:00 2012-01-01 12:24:59 -2.449213e-16
## Show a plot demonstrating the effect of offset1 and offset2
if (FALSE) {
demoGroupByInterval(df)
}
## Handling NA values...
## Set y to NA at 2 random positions
df[sample(nrow(df), 2), 2] <- NA
df ## Let's have a look at df
#> t y
#> 1 2012-01-01 12:00:00 0.000000e+00
#> 2 2012-01-01 12:01:00 3.090170e-01
#> 3 2012-01-01 12:02:00 5.877853e-01
#> 4 2012-01-01 12:03:00 8.090170e-01
#> 5 2012-01-01 12:04:00 9.510565e-01
#> 6 2012-01-01 12:05:00 1.000000e+00
#> 7 2012-01-01 12:06:00 9.510565e-01
#> 8 2012-01-01 12:07:00 8.090170e-01
#> 9 2012-01-01 12:08:00 5.877853e-01
#> 10 2012-01-01 12:09:00 3.090170e-01
#> 11 2012-01-01 12:10:00 1.224606e-16
#> 12 2012-01-01 12:11:00 -3.090170e-01
#> 13 2012-01-01 12:12:00 -5.877853e-01
#> 14 2012-01-01 12:13:00 -8.090170e-01
#> 15 2012-01-01 12:14:00 NA
#> 16 2012-01-01 12:15:00 -1.000000e+00
#> 17 2012-01-01 12:16:00 -9.510565e-01
#> 18 2012-01-01 12:17:00 -8.090170e-01
#> 19 2012-01-01 12:18:00 -5.877853e-01
#> 20 2012-01-01 12:19:00 -3.090170e-01
#> 21 2012-01-01 12:20:00 NA
## Count NA values per group
hsGroupByInterval(df, interval = 300, function(x){sum(is.na(x))})
#> t y
#> 1 2012-01-01 12:02:30 0
#> 2 2012-01-01 12:07:30 0
#> 3 2012-01-01 12:12:30 1
#> 4 2012-01-01 12:17:30 0
#> 5 2012-01-01 12:22:30 1
## default behaviour: mean(values containing at least one NA) = NA
hsGroupByInterval(df, interval = 300, mean)
#> t y
#> 1 2012-01-01 12:02:30 0.5313752
#> 2 2012-01-01 12:07:30 0.7313752
#> 3 2012-01-01 12:12:30 NA
#> 4 2012-01-01 12:17:30 -0.7313752
#> 5 2012-01-01 12:22:30 NA
## ignore NA values by passing na.rm = TRUE to the aggregate function
hsGroupByInterval(df, interval = 300, mean, na.rm = TRUE)
#> t y
#> 1 2012-01-01 12:02:30 0.5313752
#> 2 2012-01-01 12:07:30 0.7313752
#> 3 2012-01-01 12:12:30 -0.4264548
#> 4 2012-01-01 12:17:30 -0.7313752
#> 5 2012-01-01 12:22:30 NaN
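The NA handling shown above can be reproduced with base R alone, which may help to see why the last group yields NaN. The following is only a sketch under stated assumptions: the series is rebuilt as one sine period (an assumption that matches the printed values), the NA positions are copied from the printed df, the time zone is set to UTC, and aggregate is called directly instead of through hsGroupByInterval.

```r
# Rebuild a series resembling the printed example (assumed: one sine period)
times <- as.POSIXct("2012-01-01 12:00:00", tz = "UTC") + 60 * 0:20
dat <- data.frame(t = times, y = sin(2 * pi * (0:20) / 20))

# Same NA positions as in the printed data frame (rows 15 and 21)
dat$y[c(15, 21)] <- NA

# Interval number of each row for 5-minute intervals (offset1 = 0)
group <- floor(as.numeric(dat$t) / 300)

# na.rm = TRUE is handed on to mean() through the "..." of aggregate()
agg <- aggregate(dat["y"], by = list(group = group), FUN = mean, na.rm = TRUE)
agg$y
```

The last interval contains a single row whose value is NA. Once that value is removed, mean is applied to an empty vector, and mean(numeric(0)) is NaN.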