One Tip Per Day: histogram

Showing posts with label histogram. Show all posts

Tuesday, February 05, 2013

counts numbers in a interval

Say I have a list of values, and I cut them by some break points, how do I know the number of values in each interval?

We know cut() function in R works for the purpose. For example,

tx0 <- c(9, 4, 6, 5, 3, 10, 5, 3, 5)
x <- rep(0:8, tx0)

> x
 [1] 0 0 0 0 0 0 0 0 0 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 4 4 4 5 5 5 5 5 5 5 5 5 5 6
[39] 6 6 6 6 7 7 7 8 8 8 8 8

> table( cut(x, b = 8))

(-0.008,0.994]      (0.994,2]          (2,3]          (3,4]          (4,5] 
             9              4              6              5             13 
         (5,6]       (6,7.01]    (7.01,8.01] 
             5              3              5

In the cut() document, there is a note, saying


Instead of table(cut(x, br)), hist(x, br, plot = FALSE) is more efficient and less memory hungry. Instead of cut(*, labels = FALSE), findInterval() is more efficient.

But if you try as it said, you will the counts returned look different:


> hist(x, 8, plot=F)
$breaks
[1] 0 1 2 3 4 5 6 7 8
$counts
[1] 13  6  5  3 10  5  3  5




What's wrong?





Nothing is wrong. Just missed argument. "When breaks is specified as a single number, the range of the data is divided into breaks pieces of equal length, and then the outer limits are moved away by 0.1% of the range to ensure that the extreme values both fall within the break intervals. (If x is a constant vector, equal-length intervals are created, one of which includes the single value.)"



The conclusion is:
when breaks is a vector, table( cut(x, b = 0:8,include.lowest = T)) is equal to hist(x, breaks=0:8, plot=F)$counts; when breaks is a single number, it's not.

Wednesday, March 28, 2012

scatter plot with density curve beside

def.par <- par(no.readonly = TRUE) # save default, for resetting...
xhist <- hist(d1, plot=FALSE)
yhist <- hist(d2, breaks=800, freq=F, plot=FALSE)
top <- max(c(xhist$counts, yhist$counts))
nf <- layout(matrix(c(2,0,1,3),2,2,byrow=TRUE), c(5,1), c(1,5), TRUE)
#layout.show(nf)

par(mar=c(4,4,1,1))
plot(d1, d2, xlab="d1 = log2(prediction_m1) - log2(observation)", ylab="d2 = log2(prediction_m2) - log2(observation)", col="#0000ff22", pch=16, cex=1.5, type='n')
abline(h=0,v=0, lty=2)
points(d1, d2, col="#0000ff22", pch=16, cex=1.5)
abline(a=DD,b=1, lty=2, col='red')
abline(a=-DD,b=1, lty=2, col='red')
library(calibrate)
textxy(d1[abs(d1-d2)>DD],d2[abs(d1-d2)>DD], as.character(m$gene_name[abs(d1-d2)>DD]), dcol='red')
a=round(lm(d2[abs(d1-d2)<DD]~d1[abs(d1-d2)<DD])$coefficients[1],3)
b=round(lm(d2[abs(d1-d2)<DD]~d1[abs(d1-d2)<DD])$coefficients[2],3)
abline(a=a,b=b, lty=1, col='red')
legend("topleft", c(paste("d2 = ",b,"*d1 ",ifelse(a>0,'+',''),a,sep=""),"m1: full model","m2: model without repressive marks"), bty='n')

par(mar=c(0,4,1,1))
hist(d1, breaks=800, freq=F, col='#0000ff22', border='#0000ff22', axes=FALSE, ylab="", xlab="",main="H1 cell line")
lines(density(d1, bw=1), col="black")
#barplot(xhist$counts, axes=FALSE, ylim=c(0, top), space=0, main="H1 cell line")
par(mar=c(4,0,1,1))
barplot(yhist$density, axes=FALSE, space=0, horiz=TRUE, col='#0000ff22', border='#0000ff22',)
#hist(d2, breaks=800, freq=F, col='#0000ff22', border='#0000ff22', axes=FALSE, ylab="", xlab="",main="")
lines(density(d2, bw=1)$y, y=seq(1,length(yhist$density), length.out=length(density(d2, bw=1)$y)),col="black")

par(def.par)

Thursday, September 01, 2011

Density curve of histogram plot in R

Ref: http://casoilresource.lawr.ucdavis.edu/drupal/book/export/html/23

To add density curve on a histogram, like the green curve above, use code below:

#plot the distribution
hist(slope, breaks=1000, freq=F, main=main, xlab="Slope Value (percent)", xlim=c(0,150), ylim=c(0,.05) )
lines(density(slope, bw=1), col="green")

Pages