findInterval {base}R Documentation

Find Interval Numbers or Indices

Description

Find the indices of x in vec, where vec must be sorted (non-decreasingly); i.e., if i <- findInterval(x,v), we have v[i[j]] <= x[j] < v[i[j] + 1] where v[0] := - Inf, v[N+1] := + Inf, and N <- length(vec). At the two boundaries, the returned index may differ by 1, depending on the optional arguments rightmost.closed and all.inside.

Usage

findInterval(x, vec, rightmost.closed = FALSE, all.inside = FALSE)

Arguments

x numeric.
vec numeric, sorted (weakly) increasingly, of length N, say.
rightmost.closed logical; if true, the rightmost interval, vec[N-1] .. vec[N] is treated as closed, see below.
all.inside logical; if true, the returned indices are coerced into {1,...,N-1}, i.e., 0 is mapped to 1 and N to N-1.

Details

The function findInterval finds the index of one vector x in another, vec, where the latter must be non-decreasing. Where this is trivial, equivalent to apply( outer(x, vec, ">="), 1, sum), as a matter of fact, the internal algorithm uses interval search ensuring O(n * log(N)) complexity where n <- length(x) (and N <- length(vec)). For (almost) sorted x, it will be even faster, basically O(n).

This is the same computation as for the empirical distribution function, and indeed, findInterval(t, sort(X)) is identical to n * Fn(t; X[1],..,X[n]) where Fn is the empirical distribution function of X[1],..,X[n].

When rightmost.closed = TRUE, the result for x[j] = vec[N] ( = max(vec)), is N - 1 as for all other values in the last interval.

Value

vector of length length(x) with values in 0:N (and NA) where N <- length(vec), or values coerced to 1:(N-1) iff all.inside = TRUE (equivalently coercing all x values inside the intervals). Note that NAs are propagated from x, and Inf values are allowed in both x and vec.

Author(s)

Martin Maechler

See Also

approx(*, method = "constant") which is a generalization of findInterval(), ecdf for computing the empirical distribution function which is (up to a factor of n) also basically the same as findInterval(.).

Examples

N <- 100
X <- sort(round(rt(N, df=2), 2))
tt <- c(-100, seq(-2,2, len=201), +100)
it <- findInterval(tt, X)
tt[it < 1 | it >= N] # only first and last are outside range(X)

[Package base version 2.4.1 Index]