Tuesday, July 24, 2012

get UCSC images for a list of regions in batch


Here is my working R code for the task. It can be simplified as 3 lines.


# example of controling individual track
# example of controling via session

# read regions
toPlot=read.table("../data/piRNA.clusters.coordinates.cpg.bed", header=T)

## paralle version
library(multicore)
# mclapply(1:nrow(toPlot), function(i) screenshotUCSC(theURL, "", as.character(toPlot$chr[i]), toPlot$start[i]-2000, toPlot$end[i]+1999, paste("region_", i, "_", toPlot$name[i],".pdf", sep="")), mc.cores=10)

# anti-robot version 
# UCSC Policy: Program-driven use of this software is limited to a maximum of one hit every 15 seconds and no more than 5,000 hits per day.
for(i in 1:nrow(toPlot)){
    screenshotUCSC(theURL, "", as.character(toPlot$chr[i]), toPlot$start[i]-2000, toPlot$end[i]+1999, paste("region_", i, "_", toPlot$name[i],".pdf", sep=""))
    Sys.sleep(5) 
}

# merge script
mergePDF("piRNAs_ucsc_screeshot.pdf", list.files(pattern="region_.*.pdf"))
try(system("rm region_.*.pdf"))

####### lib ############

# Here is an R script wrote by Aaron Statham which saves UCSC to pdfs -
# you can choose which genome and tracks to display by altering the 'url' parameter. 'trackfile' is the url of a file describing the custom tracks (beds/bigwigs) to display
mergePDF <- function(output="merged.pdf", sourcefiles=c("source1.pdf","source2.pdf","source3.pdf"))
{
    # create the command string and call the command using system()
    command=paste("gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite",paste("-sOutputFile",output, sep="="), paste(sourcefiles, collapse=" "),sep=" ")
    try(system(command))
}


screenshotUCSC <- function(url, trackfile, chr, start, end, filename) {
        oldpen <- options("scipen")
        options(scipen=100)
        temp <- readLines(paste(url, "&hgt.customText=", trackfile, "&position=",chr,":",start,"-",end, sep=""))
        #cat(temp,"\n")
        pdfurl <- paste("http://genome-preview.ucsc.edu/trash",gsub(".*trash","",gsub(".pdf.*","",temp[grep(".pdf", temp, fixed=TRUE)][1])), ".pdf", sep="")
        cat(pdfurl,"\n");
        options(scipen=oldpen)
        download.file(pdfurl, filename, mode="wb", quiet=TRUE)
}

No comments:

Post a Comment