Extract Row Ranges by Pattern

Usage

extractRowRanges(
  x,
  pattern,
  column = NULL,
  starts = NULL,
  startOffset = 1L,
  stopOffset = 1L,
  nameByMatch = FALSE,
  nameColumnsByMatch = TRUE,
  renumber = TRUE
)

Arguments

x

data frame, matrix or character vector (e.g. the lines read from a text file)

pattern

pattern to be searched for, either in x[, column] (if x is a data frame or a matrix) or in x (if x is a character vector)

column

name of column in which to search for pattern if x is a data frame or a matrix

starts

optional. Vector of indices representing the starts of the row ranges to be extracted. If given, this argument overrides pattern: the indices given here are used instead of the rows in which the pattern matches.

startOffset

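row offset to be added to the number of the row in which the pattern matches, giving the first row of a range. With the default of 1L, a range starts in the row directly after the match.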

stopOffset

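row offset to be subtracted from the number of the row in which the pattern matches next, giving the last row of a range. With the default of 1L, a range ends in the row directly before the next match.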

nameByMatch

logical. If TRUE, the elements in the result list are named by the matching values. Defaults to FALSE.

nameColumnsByMatch

if TRUE (default) and x is a data frame or a matrix, the columns of the result data frames or matrices are named by the values in the matching row

renumber

if TRUE (default) and x is a data frame or a matrix, the row names of the result data frames or matrices are reset to NULL, i.e. their rows are renumbered
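
The interplay of pattern, startOffset and stopOffset can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper name rowRangeLimits is hypothetical, and it assumes that the last range extends to the last element of x, which is consistent with the example output on this page.

```r
# Hypothetical helper (not part of the package): compute the first and last
# row of each range from the rows in which the pattern matches.
rowRangeLimits <- function(x, pattern, startOffset = 1L, stopOffset = 1L) {
  matches <- grep(pattern, x)  # rows in which the pattern matches
  if (length(matches) == 0L) {
    return(data.frame(from = integer(0), to = integer(0)))
  }
  from <- matches + startOffset  # first row of each range
  # Assumption: the last range ends at the last element of x
  to <- c(matches[-1L] - stopOffset, length(x))
  data.frame(from = from, to = to)
}

textLines <- c(
  "Date,Value", "1.1.,1", "2.1.,2", ",",
  "Date,Value", "3.1.,3", "4.1.,4", ",",
  "Date,Value", "5.1.,5", "6.1.,6"
)

limits <- rowRangeLimits(textLines, pattern = "Date", stopOffset = 2L)
limits
```

With stopOffset = 2L the limits are rows 2 to 3, 6 to 7 and 10 to 11, i.e. the three sub-tables without their header lines and without the separator lines.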

Value

list of data frames, list of matrices or list of character vectors, depending on the type of x. Each list element represents one section of the input, found between two consecutive matches of pattern.
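
As a usage sketch, the character vectors returned for text input can be parsed further with base R. The list below is typed in by hand to mirror the result shown in the first example on this page, and the column names Date and Value are taken from the header lines of that example. Nothing in this sketch calls the package itself.

```r
# Hand-written stand-in for the list returned for character input
sections <- list(
  c("1.1.,1", "2.1.,2"),
  c("3.1.,3", "4.1.,4"),
  c("5.1.,5", "6.1.,6")
)

# Parse each section into a data frame with explicit column names
tables <- lapply(sections, function(lines) {
  read.table(
    text = lines,
    sep = ",",
    col.names = c("Date", "Value"),
    stringsAsFactors = FALSE
  )
})

# Stack the per-section data frames into one
combined <- do.call(rbind, tables)
combined
```

Keeping the sections as a list until the last step makes it easy to attach a section identifier to each table before combining them.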

Examples

textLines <- c(
  "Date,Value",
  "1.1.,1",
  "2.1.,2",
  ",",
  "Date,Value",
  "3.1.,3",
  "4.1.,4",
  ",",
  "Date,Value",
  "5.1.,5",
  "6.1.,6"
)

# Convert textLines to a data frame. The function handles both character
# vectors and data frames.
(dataFrame <- read.table(text = textLines, sep = ",", stringsAsFactors = FALSE))
#>      V1    V2
#> 1  Date Value
#> 2  1.1.     1
#> 3  2.1.     2
#> 4            
#> 5  Date Value
#> 6  3.1.     3
#> 7  4.1.     4
#> 8            
#> 9  Date Value
#> 10 5.1.     5
#> 11 6.1.     6

# stopOffset = 2L: skip empty line at the bottom of each sub-table
extractRowRanges(
  textLines,
  pattern = "Date",
  stopOffset = 2L
)
#> [[1]]
#> [1] "1.1.,1" "2.1.,2"
#> 
#> [[2]]
#> [1] "3.1.,3" "4.1.,4"
#> 
#> [[3]]
#> [1] "5.1.,5" "6.1.,6"
#> 

extractRowRanges(
  dataFrame,
  pattern = "Date",
  column = "V1",
  stopOffset = 2L
)
#> [[1]]
#>   Date Value
#> 1 1.1.     1
#> 2 2.1.     2
#> 
#> [[2]]
#>   Date Value
#> 1 3.1.     3
#> 2 4.1.     4
#> 
#> [[3]]
#>   Date Value
#> 1 5.1.     5
#> 2 6.1.     6
#> 

# Extract sections after a header line
# startOffset = 2L: skip empty line after header "topic: ..."
textLines <- c(
  "topic: A",
  "",
  " a.1",
  " a.2",
  " a.3",
  "topic: B",
  "",
  " b.1",
  "topic: C",
  "",
  " c.1",
  " c.2"
)

extractRowRanges(
  textLines,
  pattern = "topic",
  startOffset = 2L,
  nameByMatch = TRUE
)
#> $topic_A
#> [1] " a.1" " a.2" " a.3"
#> 
#> $topic_B
#> [1] " b.1"
#> 
#> $topic_C
#> [1] " c.1" " c.2"
#>