The cut() function in R bins a numeric vector into intervals, and table() then counts how many values fall into each bin. For example,
tx0 <- c(9, 4, 6, 5, 3, 10, 5, 3, 5)
x <- rep(0:8, tx0)

> x
 [1] 0 0 0 0 0 0 0 0 0 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 4 4 4 5 5 5 5 5 5 5 5 5 5 6
[39] 6 6 6 6 7 7 7 8 8 8 8 8

> table(cut(x, b = 8))

(-0.008,0.994]      (0.994,2]          (2,3]          (3,4]          (4,5]
             9              4              6              5             13
         (5,6]       (6,7.01]    (7.01,8.01]
             5              3              5
The cut() documentation contains this note:

"Instead of table(cut(x, br)), hist(x, br, plot = FALSE) is more efficient and less memory hungry. Instead of cut(*, labels = FALSE), findInterval() is more efficient."

But if you follow that advice, the counts returned look different:

> hist(x, 8, plot=F)
$breaks
[1] 0 1 2 3 4 5 6 7 8

$counts
[1] 13  6  5  3 10  5  3  5
What's wrong?
Nothing is wrong. The two calls just interpret a single-number breaks argument differently. The cut() documentation says:

"When breaks is specified as a single number, the range of the data is divided into breaks pieces of equal length, and then the outer limits are moved away by 0.1% of the range to ensure that the extreme values both fall within the break intervals. (If x is a constant vector, equal-length intervals are created, one of which includes the single value.)"

hist(), by contrast, treats a single number only as a suggestion and chooses "pretty" break points, here 0, 1, ..., 8, as the $breaks output above shows.

The conclusion is: when breaks is a vector, table(cut(x, b = 0:8, include.lowest = T)) gives the same counts as hist(x, breaks = 0:8, plot = F)$counts; when breaks is a single number, the two need not agree.
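
The vector-breaks case can be checked directly. The following is a minimal sketch that reuses the data from this post; the names br, cut_counts and hist_counts are introduced here only for illustration. It also shows why include.lowest matters for cut(): without it the first interval is (0,1], so the nine zeros fall outside every bin and become NA.

```r
# Same data as in the post
tx0 <- c(9, 4, 6, 5, 3, 10, 5, 3, 5)
x <- rep(0:8, tx0)

# Explicit break points shared by both functions (illustrative name)
br <- 0:8

# Counts via cut() + table(); include.lowest = TRUE turns the first
# interval into [0,1] so that the zeros are counted
cut_counts <- as.vector(table(cut(x, breaks = br, include.lowest = TRUE)))

# Counts via hist(); include.lowest = TRUE is already its default
hist_counts <- hist(x, breaks = br, plot = FALSE)$counts

cut_counts    # 13  6  5  3 10  5  3  5
hist_counts   # 13  6  5  3 10  5  3  5
identical(as.integer(cut_counts), as.integer(hist_counts))  # TRUE

# Without include.lowest, the zeros are not assigned to any interval
n_dropped <- sum(is.na(cut(x, breaks = br)))
n_dropped     # 9
```

Passing the same explicit vector to both functions removes any dependence on how each one turns a single number into break points.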