Friday, November 04, 2011

match vs. %in%

match and %in% are two very commonly used functions in R. So what is the difference between them?

First, how to use them (copied from the R manual):

match returns a vector of the positions of (first) matches of its first argument in its second.
%in% is a more intuitive interface as a binary operator, which returns a logical vector indicating if there is a match or not for its left operand. 
match(x, table, nomatch = NA_integer_, incomparables = NULL)
x %in% table
Examples: 
> a
[1] 1 1 0 1 5 1 2 4
> b
 [1] 10  9  8  7  6  5  4  3  2  1
> match(a,b)
[1] 10 10 NA 10  6 10  9  7
> a %in% b
[1]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
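
The two results above are closely related: the R manual defines x %in% table as match(x, table, nomatch = 0) > 0. A minimal sketch that checks this on the same data follows. The post does not show how a and b were created, so the c(...) and 10:1 constructions here are assumptions that simply reproduce the printed values.

```r
# Assumed construction of the vectors printed above
a <- c(1, 1, 0, 1, 5, 1, 2, 4)
b <- 10:1

pos  <- match(a, b)   # position of each element of a in b, NA if absent
in_b <- a %in% b      # TRUE where the element of a occurs in b

# %in% carries the same information as match(), reduced to yes/no
same_as_nomatch <- identical(in_b, match(a, b, nomatch = 0L) > 0L)
same_as_not_na  <- identical(in_b, !is.na(pos))

print(same_as_nomatch)
print(same_as_not_na)
```

So %in% is the right tool when only membership matters, and match is the one to use when the positions in the second vector are needed.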

So, suppose two vectors overlap like this:
a ---------------
b         -----------------------------
To get the overlapping part in the order of a, use a[a %in% b]; this keeps every element of a that occurs in b, including duplicates. Indexing a with match(b, a) does not do this, since match() returns only the position of the first match in a for each element of b. For example,
> match(b,a)
 [1] NA NA NA NA NA  5  8 NA  7  1
> match(b,a, nomatch=0)
 [1] 0 0 0 0 0 5 8 0 7 1
> a[match(b,a, nomatch=0)]
[1] 5 4 2 1
Even with 'nomatch=0', the final command returns only 4 elements (one per distinct matched value, in the order of b), not the 7 overlapping elements that a[a %in% b] returns.
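
For comparison, here is a small sketch of the two approaches side by side, again assuming a and b are built as c(...) and 10:1 to match the values printed earlier. It also shows that match can recover the full overlap if it is called as match(a, b) and the NA entries are filtered out. The variable names are only illustrative.

```r
# Assumed construction of the vectors used in this post
a <- c(1, 1, 0, 1, 5, 1, 2, 4)
b <- 10:1

# All elements of a that occur in b, in the order of a, duplicates kept
overlap_in <- a[a %in% b]

# Same result with match(): look up a in b and drop the non-matches
overlap_match <- a[!is.na(match(a, b))]

# Looking up b in a instead yields one element per distinct matched value,
# in the order of b; zero indices are silently dropped when subsetting
first_only <- a[match(b, a, nomatch = 0)]

print(overlap_in)
print(overlap_match)
print(first_only)
```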
