Monday, October 05, 2015

A parallel and fast way to download multiple files

We can write a short script to download multiple files easily on the command line, e.g.

for i in X Y Z; do wget http://www.site.com/folder/$i.url; done

If we want them to run in the background (in a pseudo-parallel way), we can use the -b option of wget.
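
For example, with the same hypothetical URLs as above:

for i in X Y Z; do wget -b http://www.site.com/folder/$i.url; done

Each wget detaches to the background immediately and writes its progress to wget-log (then wget-log.1, wget-log.2, and so on).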

But this is still not fast enough, and the pseudo-parallel wget -b approach won't give me any notice once the downloads are done.

Here is my solution: axel + parallel

parallel -a urls.file axel
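
Here urls.file is just a plain text file with one download link per line; parallel -a reads the file and hands each line to axel as its argument. A hypothetical example of its contents:

http://www.site.com/folder/X.url
http://www.site.com/folder/Y.url
http://www.site.com/folder/Z.url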

Let's say I want to download all brain sample bigwig files for the H3K4me1 mark from the Roadmap Epigenomics project. Here is the code:

mark=H3K4me1
> url.$mark # to generate an empty file
# Roadmap epigenome IDs for the brain samples
for i in E071 E074 E068 E069 E072 E067 E073 E070 E082 E081;
do 
  echo http://egg2.wustl.edu/roadmap/data/byFileType/signal/consolidated/macs2signal/pval/$i-$mark.pval.signal.bigwig >> url.$mark
done
parallel -a url.$mark axel -n 5 
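
Unlike wget -b, parallel blocks until every job has finished, so chaining a completion notice is easy. You can also cap the number of simultaneous downloads with parallel's -j option (4 here is an arbitrary choice):

parallel -j 4 -a url.$mark axel -n 5 && echo "All $mark downloads finished"

axel -n 5 opens 5 connections per file, while -j 4 limits parallel to 4 files at a time. Since parallel's exit status reflects the number of failed jobs, the echo fires only if every download succeeded.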

Regarding what axel is and how fast it is compared to wget, please refer to this link: http://www.cyberciti.biz/tips/download-accelerator-for-linux-command-line-tools.html
