closest {MsCoreUtils}    R Documentation

Relaxed Value Matching

Description

These functions offer relaxed matching of one vector against another. In contrast to the similar match() and %in% functions, they accept only numeric arguments but have an additional tolerance argument that allows relaxed matching.

Usage

closest(
  x,
  table,
  tolerance = Inf,
  ppm = 0,
  duplicates = c("keep", "closest", "remove"),
  nomatch = NA_integer_
)

common(
  x,
  table,
  tolerance = Inf,
  ppm = 0,
  duplicates = c("keep", "closest", "remove")
)

join(x, y, tolerance = 0, ppm = 0, type = c("outer", "left", "right", "inner"))

Arguments

x

numeric, the values to be matched.

table

numeric, the values to be matched against. In contrast to match(), table has to be sorted in increasing order.

tolerance

numeric, accepted tolerance. Either of length one or of the same length as table.

ppm

numeric(1) representing a relative, value-specific parts-per-million (PPM) tolerance that is added to tolerance.

duplicates

character(1), how to handle duplicated matches.

nomatch

numeric(1), the value returned if the difference between a value in x and the closest value in table is larger than tolerance.

y

numeric, the values to be joined. Should be sorted.

type

character(1), defines how x and y should be joined. See details for join.
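
The way tolerance and ppm combine can be illustrated with a little arithmetic. This is a minimal sketch in base R, not the package's code: ppm_tolerance is a hypothetical helper, and the sketch assumes that the ppm part is computed relative to the values in table, as the tolerance = ppm(y, 20) call in the Examples suggests.

```r
# Hypothetical helper (not part of the documented API): the absolute
# tolerance that corresponds to a relative parts-per-million value.
ppm_tolerance <- function(values, ppm) values * ppm * 1e-6

table <- c(100, 500, 1000)
ppm_tolerance(table, 20)
# 0.002 0.010 0.020

# Assumed combination rule: the absolute tolerance and the relative
# ppm part are added up per element of table.
accepted <- 0.005 + ppm_tolerance(table, 20)
accepted
# 0.007 0.015 0.025
```

Under this assumption, larger values in table accept larger absolute differences, while the fixed tolerance sets a floor for small values.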

Details

For closest/common the tolerance argument can be set to 0 to get the same results as match()/%in%. If it is set to Inf (the default), the index of the closest value is returned without any restriction.
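
The two extremes of tolerance can be sketched in base R. The function closest_sketch below is a hypothetical, simplified stand-in written for illustration only; it is not the package's implementation and assumes that a match is accepted when the absolute difference is at most tolerance.

```r
# Hypothetical base R sketch of tolerance-limited nearest matching;
# closest_sketch is an illustrative name, not the package's code.
closest_sketch <- function(x, table, tolerance = Inf, nomatch = NA_integer_) {
    vapply(x, function(xi) {
        d <- abs(table - xi)
        i <- which.min(d)  # first minimum, i.e. the lower index on ties
        if (length(i) && d[i] <= tolerance) i else nomatch
    }, integer(1))
}

x <- c(1.1, 3, 5.4)
table <- 1:10

closest_sketch(x, table, tolerance = 0)  # NA 3 NA
match(x, table)                          # NA 3 NA
closest_sketch(x, table)                 # 1 3 5 (nearest index, no restriction)
```

With tolerance = 0 only exact hits survive, which is why the result agrees with match(); with the default Inf every element of x receives the index of its nearest neighbour.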

A one-to-one matching is guaranteed neither for the x to table nor for the table to x direction.

If multiple elements in x match a single element in table all their corresponding indices are returned if duplicates="keep" is set (default). This behaviour is identical to match(). For duplicates="closest" just the closest element in x gets the corresponding index in table and for duplicates="remove" all elements in x that match to the same element in table are set to nomatch.
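
The three duplicates strategies can be sketched as post-processing steps on a vector of nearest indices. This is an illustration in base R under the rules described above, not the package's internals; idx, d, keep, closest_only and removed are hypothetical names.

```r
# Hypothetical post-processing of a nearest-index vector idx (one entry
# per element of x); names are illustrative, not the package's internals.
x <- c(1.6, 1.75, 1.8)
table <- 1:2
idx <- c(2L, 2L, 2L)        # every element of x is nearest to table[2]
d <- abs(x - table[idx])    # distances: 0.40 0.25 0.20

# duplicates = "keep": leave all indices untouched
keep <- idx

# duplicates = "closest": per table element, only the nearest x keeps its index
closest_only <- idx
for (j in unique(idx[!is.na(idx)])) {
    hits <- which(idx == j)
    closest_only[hits[-which.min(d[hits])]] <- NA_integer_
}

# duplicates = "remove": every index that occurs more than once becomes nomatch
removed <- idx
removed[idx %in% idx[duplicated(idx)]] <- NA_integer_

keep          # 2 2 2
closest_only  # NA NA 2
removed       # NA NA NA
```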

If a single element in x matches multiple elements in table the closest is returned for duplicates="keep" or duplicates="closest" (keeping multiple matches isn't possible in this case because the implementation relies on findInterval). If the differences between x and the corresponding matches in table are identical the lower index (the smaller element in table) is returned. For duplicates="remove" all multiple matches are returned as nomatch as above.
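
The tie rule can be seen with a value that lies exactly halfway between two elements of table. The sketch below uses base R's which.min(), which also returns the first of several tied minima; it only mimics the documented rule and is not the package's findInterval-based code.

```r
# Illustrative tie: 1.5 is equally far from table[1] = 1 and table[2] = 2.
table <- c(1, 2, 4)
x <- 1.5
d <- abs(table - x)  # 0.5 0.5 2.5
which.min(d)         # 1, the first (lower) index among the tied minima
```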

join: joins two numeric vectors by mapping values in x to values in y and vice versa if they are similar enough (given the specified tolerance and ppm). The function returns a matrix with the indices of mapped values in x and y. The type parameter defines how the vectors are joined:

type = "left": values in x are mapped to values in y; elements in y not matching any value in x are discarded.

type = "right": same as type = "left" but for y.

type = "outer": return matches for all values in x and in y.

type = "inner": report only the indices of values that could be mapped.
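
The four join types can be sketched for the exact-matching special case (tolerance = 0, ppm = 0), where the mapping reduces to match(). This is a base R illustration under that assumption, not the package's code; the index pairs are stored in data frames purely for display, which is not meant to reflect the object that join returns.

```r
x <- c(1, 2, 3, 6)
y <- c(3, 4, 5, 6, 7)

# Exact matching in both directions
x_in_y <- match(x, y)  # NA NA 1 4
y_in_x <- match(y, x)  # 3 NA NA 4 NA

# left: one row per element of x
left <- data.frame(x = seq_along(x), y = x_in_y)

# right: one row per element of y
right <- data.frame(x = y_in_x, y = seq_along(y))

# inner: only the pairs that could be mapped
inner <- left[!is.na(left$y), ]

# outer: one row per distinct value found in x or y
values <- sort(union(x, y))
outer <- data.frame(x = match(values, x), y = match(values, y))

outer$x  # 1 2 3 NA NA 4 NA
outer$y  # NA NA 1 2 3 4 5
inner$x  # 3 4
inner$y  # 1 4
```

An NA in one column marks a value that is present in only one of the two vectors.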

Value

closest returns an integer vector of the same length as x giving the closest position in table of the first match or nomatch if there is no match.

common returns a logical vector of the same length as x that is TRUE if the element in x was found in table. It is similar to %in%.

join returns a matrix with two columns, namely x and y, representing the index of the values in x matching the corresponding value in y (or NA if the value does not match).

Note

closest will replace all NA values in x by nomatch (that is identical to the behaviour of match).
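
The match() behaviour referred to here can be checked in base R. The snippet shows only match(), which closest is documented to mirror for NA values in x; the vectors are made up for illustration.

```r
x <- c(1, NA, 3)
table <- 1:3

match(x, table)                # 1 NA 3
match(x, table, nomatch = 0L)  # 1 0 3
```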

join is based on closest(x, y, tolerance, duplicates = "closest"). That means for multiple matches just the closest one is reported.

Author(s)

Sebastian Gibb

See Also

match()

%in%

Other grouping/matching functions: bin()

Examples

## Define two vectors to match
x <- c(1, 3, 5)
y <- 1:10

## Compare match and closest
match(x, y)
closest(x, y)

## If there is no exact match
x <- x + 0.1
match(x, y) # no match
closest(x, y)

## Some new values
x <- c(1.11, 45.02, 556.45)
y <- c(3.01, 34.12, 45.021, 46.1, 556.449)

## Using a single tolerance value
closest(x, y, tolerance = 0.01)

## Using a value-specific tolerance accepting differences of 20 ppm
closest(x, y, tolerance = ppm(y, 20))

## Same using 50 ppm
closest(x, y, tolerance = ppm(y, 50))

## Sometimes multiple elements in `x` match to `table`
x <- c(1.6, 1.75, 1.8)
y <- 1:2
closest(x, y, tolerance = 0.5)
closest(x, y, tolerance = 0.5, duplicates = "closest")
closest(x, y, tolerance = 0.5, duplicates = "remove")

## Are there any common values?
x <- c(1.6, 1.75, 1.8)
y <- 1:2
common(x, y, tolerance = 0.5)
common(x, y, tolerance = 0.5, duplicates = "closest")
common(x, y, tolerance = 0.5, duplicates = "remove")

## Join two vectors
x <- c(1, 2, 3, 6)
y <- c(3, 4, 5, 6, 7)

jo <- join(x, y, type = "outer")
jo
x[jo$x]
y[jo$y]

jl <- join(x, y, type = "left")
jl
x[jl$x]
y[jl$y]

jr <- join(x, y, type = "right")
jr
x[jr$x]
y[jr$y]

ji <- join(x, y, type = "inner")
ji
x[ji$x]
y[ji$y]

[Package MsCoreUtils version 1.0.0 Index]