Extract Row Ranges by Pattern

Usage

extractRowRanges(
  x,
  pattern,
  column = NULL,
  starts = NULL,
  startOffset = 1L,
  stopOffset = 1L,
  nameByMatch = FALSE,
  nameColumnsByMatch = TRUE,
  renumber = TRUE
)

Arguments

x: a data frame, a matrix, or a character vector (e.g. the lines read from a text file)
pattern: pattern to search for, either in x[, column] (if x is a data frame or a matrix) or in x itself (if x is a character vector)
column: name of the column in which to search for pattern if x is a data frame or a matrix
starts: optional. Vector of indices marking the starts of the row ranges to be extracted. If given, this argument overrides pattern: these indices are used instead of the indices at which pattern matches.
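startOffset: offset added to the index of the row in which pattern matches, giving the first row of a range (default 1L, i.e. the matching row itself is excluded)
stopOffset: offset subtracted from the index of the next row in which pattern matches, giving the last row of a range (default 1L, i.e. the range ends directly before the next match). The last range always extends to the final row of x.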
nameByMatch: logical. If TRUE, the elements of the result list are named after the matching values (e.g. "topic: A" yields the name topic_A in the last example below). Defaults to FALSE.
nameColumnsByMatch: logical. If TRUE (default) and x is a data frame or a matrix, the columns of the result data frames or matrices are named after the values in the matching row.
renumber: logical. If TRUE (default) and x is a data frame or a matrix, the row names of the result data frames or matrices are reset to NULL, i.e. their rows are renumbered.
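The offset arguments combine as follows: each range starts startOffset rows after a match and stops stopOffset rows before the next match, while the last range runs to the end of x. The base-R sketch below reproduces that index arithmetic as inferred from the examples on this page. rowRangeIndices() is a hypothetical helper, not part of the package and not its actual implementation.

```r
# rowRangeIndices() is a hypothetical helper for illustration only; it is
# not exported by the package and is inferred from the documented examples.
rowRangeIndices <- function(values, pattern, starts = NULL,
                            startOffset = 1L, stopOffset = 1L) {
  # 'starts', if given, replaces the positions found by the pattern search
  if (is.null(starts)) {
    starts <- grep(pattern, values)
  }
  # No match: nothing to extract
  if (length(starts) == 0L) {
    return(list())
  }
  # Each range stops 'stopOffset' rows before the next start;
  # the last range runs to the final element
  stops <- c(starts[-1L] - stopOffset, length(values))
  # One integer sequence per range (empty if the offsets leave no rows)
  mapply(
    function(from, to) if (from <= to) seq.int(from, to) else integer(0),
    starts + startOffset, stops,
    SIMPLIFY = FALSE
  )
}

textLines <- c("Date,Value", "1.1.,1", "2.1.,2", ",",
               "Date,Value", "3.1.,3", "4.1.,4", ",",
               "Date,Value", "5.1.,5", "6.1.,6")

# Same offsets as in the first example below
ranges <- rowRangeIndices(textLines, "Date", stopOffset = 2L)
# ranges is list(2:3, 6:7, 10:11)
lapply(ranges, function(i) textLines[i])
```

Passing starts = c(1L, 5L, 9L) instead of a pattern yields the same ranges here, mirroring how starts overrides pattern.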

Value

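list of data frames, list of matrices, or list of character vectors, matching the type of x. Each list element represents one section of the input: the rows between one match of pattern and the next (shifted by startOffset and stopOffset), where the last section extends to the end of x.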
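With the default offsets (startOffset = 1L, stopOffset = 1L), each section of a character vector consists of the elements strictly between two consecutive matches, plus everything after the last match. The same grouping can be expressed in base R with cumsum() and split(). This is an alternative technique shown for illustration only, not how the package computes its result.

```r
textLines <- c("topic: A", "", " a.1", "topic: B", " b.1", " b.2")

# TRUE for every element in which the pattern matches
isMatch <- grepl("topic", textLines)

# Section number of each element: 0 before the first match,
# then 1, 2, ... starting at each matching element
sectionId <- cumsum(isMatch)

# Keep only elements that lie after a match and are not matches themselves
keep <- sectionId > 0 & !isMatch

# One character vector per section, without the "1", "2", ... names
sections <- unname(split(textLines[keep], sectionId[keep]))
sections
```

Note that split() omits a section with no elements (two consecutive matches), so this sketch can return fewer sections than there are matches.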

Examples

textLines <- c(
  "Date,Value",
  "1.1.,1",
  "2.1.,2",
  ",",
  "Date,Value",
  "3.1.,3",
  "4.1.,4",
  ",",
  "Date,Value",
  "5.1.,5",
  "6.1.,6"
)

# Convert textLines to a data frame; extractRowRanges() handles both input types.
(dataFrame <- read.table(text = textLines, sep = ",", stringsAsFactors = FALSE))
#>      V1    V2
#> 1  Date Value
#> 2  1.1.     1
#> 3  2.1.     2
#> 4            
#> 5  Date Value
#> 6  3.1.     3
#> 7  4.1.     4
#> 8            
#> 9  Date Value
#> 10 5.1.     5
#> 11 6.1.     6

# stopOffset = 2L: also skip the separator line "," at the end of each sub-table
extractRowRanges(
  textLines,
  pattern = "Date",
  stopOffset = 2L
)
#> [[1]]
#> [1] "1.1.,1" "2.1.,2"
#> 
#> [[2]]
#> [1] "3.1.,3" "4.1.,4"
#> 
#> [[3]]
#> [1] "5.1.,5" "6.1.,6"
#> 

extractRowRanges(
  dataFrame,
  pattern = "Date",
  column = "V1",
  stopOffset = 2L
)
#> [[1]]
#>   Date Value
#> 1 1.1.     1
#> 2 2.1.     2
#> 
#> [[2]]
#>   Date Value
#> 1 3.1.     3
#> 2 4.1.     4
#> 
#> [[3]]
#>   Date Value
#> 1 5.1.     5
#> 2 6.1.     6
#> 
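For data frame input, the column names Date and Value in the output above come from the matching row (nameColumnsByMatch = TRUE), and the row names restart at 1 (renumber = TRUE). A minimal base-R sketch of these two steps for the first sub-table follows; sliceWithHeader() is a hypothetical helper, not a package function, and the row indices are hard-coded for illustration.

```r
# Rebuild the example data frame from the text lines above
textLines <- c("Date,Value", "1.1.,1", "2.1.,2", ",",
               "Date,Value", "3.1.,3", "4.1.,4", ",",
               "Date,Value", "5.1.,5", "6.1.,6")
dataFrame <- read.table(text = textLines, sep = ",", stringsAsFactors = FALSE)

# Hypothetical helper: take rows from:to of df, name the columns after the
# values in the matching row 'header', then reset the row names
sliceWithHeader <- function(df, header, from, to) {
  part <- df[from:to, , drop = FALSE]
  # Like nameColumnsByMatch = TRUE: use the matched row as column names
  names(part) <- as.character(unlist(df[header, ]))
  # Like renumber = TRUE: drop the original row names ("2", "3" -> "1", "2")
  rownames(part) <- NULL
  part
}

first <- sliceWithHeader(dataFrame, header = 1L, from = 2L, to = 3L)
print(first)
```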

# Extract the sections that follow each header line
# startOffset = 2L: skip the header "topic: ..." and the empty line after it
textLines <- c(
  "topic: A",
  "",
  " a.1",
  " a.2",
  " a.3",
  "topic: B",
  "",
  " b.1",
  "topic: C",
  "",
  " c.1",
  " c.2"
)

extractRowRanges(
  textLines,
  pattern = "topic",
  startOffset = 2L,
  nameByMatch = TRUE
)
#> $topic_A
#> [1] " a.1" " a.2" " a.3"
#> 
#> $topic_B
#> [1] " b.1"
#> 
#> $topic_C
#> [1] " c.1" " c.2"
#>
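The names topic_A, topic_B, and topic_C above are derived from the matching lines. The exact conversion rule is not stated on this page; the base-R sketch below assumes that every run of characters other than letters and digits becomes a single underscore, which happens to reproduce the result above. It also assumes that the last section runs to the end of the input.

```r
textLines <- c("topic: A", "", " a.1", " a.2", " a.3",
               "topic: B", "", " b.1",
               "topic: C", "", " c.1", " c.2")

starts <- grep("topic", textLines)
# Each section stops just before the next match (stopOffset = 1L);
# the last one runs to the end of the input
stops <- c(starts[-1L] - 1L, length(textLines))

# startOffset = 2L skips the header and the empty line after it
sections <- Map(function(from, to) textLines[seq.int(from, to)],
                starts + 2L, stops)

# Assumed naming rule, for illustration only: replace every run of
# non-alphanumeric characters by a single underscore
names(sections) <- gsub("[^[:alnum:]]+", "_", textLines[starts])
sections
```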