Monday, March 26, 2018

Brace expansion

Let's say I want to download the chromHMM results for all brain tissues from the Roadmap project. The brain tissues are numbered E067-E074, E081, and E082 according to Anshul's table: https://docs.google.com/spreadsheets/d/1yikGx4MsO9Ei36b64yOy9Vb6oPC5IBGlFbYEt-N6gOM/edit#gid=15

So, what single pattern matches all ten brain tissues (E067-E074, E081, and E082)?

Here is the trick:

wget http://egg2.wustl.edu/roadmap/data/byFileType/chromhmmSegmentations/ChmmModels/coreMarks/jointModel/final/E0{{67..74},81,82}_15_coreMarks_segments.bed
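
Before downloading anything, it can help to preview what the shell will actually hand to wget. The sketch below only prints the expanded URLs and never touches the network; the variable names `base` and `urls` are my own, chosen for illustration.

```shell
#!/usr/bin/env bash
# Base URL copied from the wget command above.
base=http://egg2.wustl.edu/roadmap/data/byFileType/chromhmmSegmentations/ChmmModels/coreMarks/jointModel/final

# Brace expansion runs before the array assignment, so `urls`
# (a hypothetical name) gets one element per brain tissue.
urls=( "$base"/E0{{67..74},81,82}_15_coreMarks_segments.bed )

printf '%s\n' "${urls[@]}"    # one URL per line
echo "${#urls[@]} URLs"       # 10 URLs
```

If the list looks right, the same array can be passed on with wget "${urls[@]}".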

There are two types of brace expansion in Bash:
1. String list: {string1,string2,...,stringN}
2. Range list: {<START>..<END>}
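
A quick way to see the two forms side by side is to let echo print the result. This is only a minimal sketch; the tissue words and the variable names `string_list` and `range_list` are made up for the example.

```shell
#!/usr/bin/env bash
# String list: each comma-separated string becomes one word.
string_list=$(echo {brain,liver,heart})

# Range list: every value from START to END, both ends included.
range_list=$(echo {67..74})

echo "$string_list"   # brain liver heart
echo "$range_list"    # 67 68 69 70 71 72 73 74
```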

You can combine them or nest them, for example:
{{67..74},81,82} ==> 67 68 69 70 71 72 73 74 81 82
{A..C}{0..2}  ==> A0 A1 A2 B0 B1 B2 C0 C1 C2
{{A..C},{0..2}}  ==> A B C 0 1 2
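
The difference between putting braces next to each other and nesting them is easier to see by counting the resulting words. In this sketch the array names `product` and `nested` are hypothetical.

```shell
#!/usr/bin/env bash
# Adjacent braces multiply: every item on the left is paired
# with every item on the right, left-most brace varying slowest.
product=( {A..C}{0..2} )

# Nested braces concatenate: the inner lists are flattened
# into the outer list.
nested=( {{A..C},{0..2}} )

echo "${product[*]}"                     # A0 A1 A2 B0 B1 B2 C0 C1 C2
echo "${nested[*]}"                      # A B C 0 1 2
echo "${#product[@]} vs ${#nested[@]}"   # 9 vs 6
```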

Since Bash 4.0, ranges can do even more, for example:
- Zero-padding, e.g. {0001..5} ==> 0001 0002 0003 0004 0005
- Specifying an increment, e.g. {1..10..2} ==> 1 3 5 7 9; {a..z..3} ==> a d g j m p s v y

In any Bash version, you can also attach fixed text to the braces:
- Using a prefix: 0{1..5} ==> 01 02 03 04 05
- or a prefix and a postfix together: ---{A..E}---  ==> ---A--- ---B--- ---C--- ---D--- ---E---
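
With zero-padded ranges, the brain tissue IDs from the start of this post can be written with the full three-digit numbers. This sketch assumes Bash 4.0 or later; the array names `ids` and `every_other` are just for illustration.

```shell
#!/usr/bin/env bash
# Requires Bash 4.0+: older versions do not zero-pad ranges
# and do not understand the {START..END..STEP} form.

# Padded range nested in a string list, with "E" as prefix.
ids=( E{{067..074},081,082} )

# Same padded range, taking every second value.
every_other=( E{067..074..2} )

echo "${ids[*]}"           # E067 E068 E069 E070 E071 E072 E073 E074 E081 E082
echo "${every_other[*]}"   # E067 E069 E071 E073
```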


Can anyone work out the fun in the code below? (It relies on the factor program from GNU coreutils.)

function braceify {
    [[ $1 == +([[:digit:]]) ]] || return
    typeset -a a
    read -ra a < <(factor "$1")
    eval "echo $(printf '{$(printf ,%%.s {1..%s})}' "${a[@]:1}")"
}

printf 'eval printf "$arg"%s' "$(braceify 1000000)"

See the explanation here: http://wiki.bash-hackers.org/syntax/expansion/brace
