Extract Row Ranges by Pattern
Usage
extractRowRanges(
  x,
  pattern,
  column = NULL,
  starts = NULL,
  startOffset = 1L,
  stopOffset = 1L,
  nameByMatch = FALSE,
  nameColumnsByMatch = TRUE,
  renumber = TRUE
)
Arguments
- x
data frame or matrix or vector of character (representing e.g. the rows read from a text file)
- pattern
pattern to be searched for, either in x[, column] (if x is a data frame or a matrix) or in x (if x is a vector of character)
- column
name of the column in which to search for pattern if x is a data frame or a matrix
- starts
optional. Vector of indices representing the starts of the row ranges to be extracted. This argument overrides pattern: instead of using the pattern to find the start indices, the indices given here are used.
- startOffset
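row offset to be added to the row number in which the pattern matches, giving the first row of a range
- stopOffset
row offset to be subtracted from the row number of the next match, giving the last row of a range (the last range extends to the end of x)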
- nameByMatch
logical. If TRUE, the elements in the result list are named by the matching values. Defaults to FALSE.
- nameColumnsByMatch
if TRUE (default), the columns of the result data frames or matrices are named by the values in the matching row (if x is a data frame or a matrix)
- renumber
if TRUE (default) and x is a data frame or a matrix, the row names of the result data frames or matrices are reset to NULL, i.e. their rows are renumbered
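The effect of startOffset and stopOffset can be illustrated with a small base-R sketch. It assumes, based on the examples below, that a range starts startOffset rows after its own match and ends stopOffset rows before the next match, with the last range running to the end of the input. rowRangeBounds() is a hypothetical helper for illustration only, not part of the package.

```r
# Hypothetical helper (not the package's implementation): compute the first
# and last row of each range from the rows in which the pattern matched.
rowRangeBounds <- function(matches, n, startOffset = 1L, stopOffset = 1L) {
  # Each range starts startOffset rows after its own match ...
  start <- matches + startOffset
  # ... and ends stopOffset rows before the next match; the last range
  # runs to the end of the input (row n).
  stop <- c(matches[-1L] - stopOffset, n)
  data.frame(start = start, stop = stop)
}

# In the first example, "Date" matches rows 1, 5 and 9 of 11 lines
rowRangeBounds(c(1L, 5L, 9L), n = 11L, stopOffset = 2L)
#>   start stop
#> 1     2    3
#> 2     6    7
#> 3    10   11
```

With the default stopOffset = 1L, each range would also include the separator line "," that precedes the next header, which is why the first example uses 2L.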
Value
list of data frames or list of matrices or list of vectors of character. Each list element represents one section of the input, found between two consecutive matches of pattern (the last section runs from the last match to the end of the input).
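For a character vector, the result described above can be mirrored by a short base-R sketch. splitSections() is hypothetical: it ignores column, starts and the data frame options, and the rule it uses to build names when nameByMatch = TRUE (replacing runs of non-alphanumeric characters by "_") is an assumption chosen to reproduce the names shown in the last example.

```r
# Hypothetical sketch (not the package's implementation): split a character
# vector into the sections that follow each line matching `pattern`.
splitSections <- function(x, pattern, startOffset = 1L, stopOffset = 1L,
                          nameByMatch = FALSE) {
  matches <- grep(pattern, x)                       # rows in which pattern matches
  starts <- matches + startOffset                   # first row of each section
  stops <- c(matches[-1L] - stopOffset, length(x))  # last row of each section
  sections <- lapply(seq_along(starts), function(i) {
    # Return an empty section if stop lies before start (`:` would count down)
    if (stops[i] < starts[i]) character(0) else x[starts[i]:stops[i]]
  })
  if (nameByMatch) {
    # Assumption: runs of non-alphanumeric characters become "_",
    # e.g. "topic: A" -> "topic_A"
    names(sections) <- gsub("[^A-Za-z0-9]+", "_", x[matches])
  }
  sections
}

result <- splitSections(
  c("topic: A", "", " a.1", "topic: B", "", " b.1"),
  pattern = "topic", startOffset = 2L, nameByMatch = TRUE
)
str(result)
```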
Examples
textLines <- c(
  "Date,Value",
  "1.1.,1",
  "2.1.,2",
  ",",
  "Date,Value",
  "3.1.,3",
  "4.1.,4",
  ",",
  "Date,Value",
  "5.1.,5",
  "6.1.,6"
)
# Convert textLines to a data frame. extractRowRanges() handles both a character vector and a data frame.
(dataFrame <- read.table(text = textLines, sep = ",", stringsAsFactors = FALSE))
#>      V1    V2
#> 1  Date Value
#> 2  1.1.     1
#> 3  2.1.     2
#> 4
#> 5  Date Value
#> 6  3.1.     3
#> 7  4.1.     4
#> 8
#> 9  Date Value
#> 10 5.1.     5
#> 11 6.1.     6
# stopOffset = 2L: skip empty line at the bottom of each sub-table
extractRowRanges(
  textLines,
  pattern = "Date",
  stopOffset = 2L
)
#> [[1]]
#> [1] "1.1.,1" "2.1.,2"
#>
#> [[2]]
#> [1] "3.1.,3" "4.1.,4"
#>
#> [[3]]
#> [1] "5.1.,5" "6.1.,6"
#>
extractRowRanges(
  dataFrame,
  pattern = "Date",
  column = "V1",
  stopOffset = 2L
)
#> [[1]]
#>   Date Value
#> 1 1.1.     1
#> 2 2.1.     2
#>
#> [[2]]
#>   Date Value
#> 1 3.1.     3
#> 2 4.1.     4
#>
#> [[3]]
#>   Date Value
#> 1 5.1.     5
#> 2 6.1.     6
#>
# Extract sections after a header line
# startOffset = 2L: skip empty line after header "topic: ..."
textLines <- c(
  "topic: A",
  "",
  " a.1",
  " a.2",
  " a.3",
  "topic: B",
  "",
  " b.1",
  "topic: C",
  "",
  " c.1",
  " c.2"
)
extractRowRanges(
  textLines,
  pattern = "topic",
  startOffset = 2L,
  nameByMatch = TRUE
)
#> $topic_A
#> [1] " a.1" " a.2" " a.3"
#>
#> $topic_B
#> [1] " b.1"
#>
#> $topic_C
#> [1] " c.1" " c.2"
#>
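For data frames, the effect of nameColumnsByMatch = TRUE and renumber = TRUE seen in the data frame example can be approximated in the same way. splitTableSections() is a hypothetical sketch, not the package's implementation; it assumes that column names are taken from the values of the matching row, and it handles neither matrices nor the starts argument.

```r
# Hypothetical sketch (not the package's implementation) of the data frame
# case: search `column`, name the columns of each section after the matching
# row and reset the row names so that each section is numbered from 1.
splitTableSections <- function(df, pattern, column, startOffset = 1L,
                               stopOffset = 1L) {
  matches <- grep(pattern, df[[column]])
  starts <- matches + startOffset
  stops <- c(matches[-1L] - stopOffset, nrow(df))
  lapply(seq_along(matches), function(i) {
    rows <- if (stops[i] < starts[i]) integer(0) else starts[i]:stops[i]
    section <- df[rows, , drop = FALSE]
    # Assumption: column names are the values found in the matching row
    names(section) <- as.character(unlist(df[matches[i], ]))
    # Equivalent of renumber = TRUE: drop the original row names
    rownames(section) <- NULL
    section
  })
}

tbl <- data.frame(
  V1 = c("Date", "1.1.", "2.1.", "", "Date", "3.1.", "4.1."),
  V2 = c("Value", "1", "2", "", "Value", "3", "4"),
  stringsAsFactors = FALSE
)
splitTableSections(tbl, "Date", column = "V1", stopOffset = 2L)
```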