Wednesday, May 29, 2013

Something I learned from Bo

Here are some shining tips from his smallRNA pipeline:

1. Run single-threaded commands in parallel using ParaFly.

ParaFly is a useful program in the Trinity package (a smart RNA-seq assembly program). Here is an example of how to use it:

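# write one complete shell command per line into the command file
echo "mycommand1 > output1" >  "$paraFile"
echo "mycommand2 > output2" >> "$paraFile"
echo "mycommand3 > output3" >> "$paraFile"
...
# run the listed commands, up to $CPU of them at a time
ParaFly -c "$paraFile" -CPU "$CPU"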
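
When there are many inputs, the command file can be generated in a loop instead of line by line. Below is a minimal sketch of that, not Bo's actual code: the sample names, the temporary working directory, and the `echo` stand-in for a real tool are all made up for illustration. It only calls ParaFly if it is found on the PATH.

```shell
#!/usr/bin/env bash
# Sketch: build a ParaFly command file in a loop, then run it.
# Assumptions (not from the original pipeline): sample names, CPU count,
# and a temporary working directory.
workdir=$(mktemp -d)
paraFile="$workdir/commands.txt"
CPU=4

: > "$paraFile"                      # start with an empty command file
for sample in sampleA sampleB sampleC; do
    # each line must be one complete shell command;
    # "echo processing" stands in for a real single-threaded tool
    echo "echo processing $sample > $workdir/$sample.out" >> "$paraFile"
done

if command -v ParaFly >/dev/null 2>&1; then
    ParaFly -c "$paraFile" -CPU "$CPU"
else
    echo "ParaFly not found; command file left at $paraFile" >&2
fi
```

Because every line is an independent command, the order in which they finish should not matter; anything that depends on an earlier output belongs in a later step, not in the same command file.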

2. Resume a pipeline by tracking the status of each step.

[ ! -f .status.${STEP}.stepname ] && \
    mycommand1 && \
    mycommand2 && \
    mycommand3 && \
    touch .status.${STEP}.stepname
STEP=$((STEP+1))

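A step number and a status file record the success of each step. The status file is created only if all commands in the step run successfully, so when the pipeline is rerun, finished steps are skipped and it resumes at the first step that did not complete. This idea is also used in Trinity.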
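
If a pipeline has many steps, the same pattern can be wrapped in a small function. This is only a sketch under my own assumptions: `run_step` is a hypothetical helper name, and the two example steps and the temporary directory are invented for the demo. It keeps the `.status.${STEP}.stepname` naming from above.

```shell
#!/usr/bin/env bash
# Sketch: wrap the status-file pattern in a helper (run_step is hypothetical).
STEP=1
workdir=$(mktemp -d)                 # assumption: demo runs in a temp directory
cd "$workdir" || exit 1

# run_step NAME COMMAND [ARGS...]
# Runs COMMAND only if this step has no status file yet, and creates the
# status file only when COMMAND succeeds. STEP is incremented either way.
run_step() {
    local name=$1; shift
    local status=".status.${STEP}.${name}"
    STEP=$((STEP+1))
    if [ -f "$status" ]; then
        echo "skipping ${name}: already done"
        return 0
    fi
    "$@" && touch "$status"
}

# two made-up steps; sh -c lets a step contain redirections
run_step make_input  sh -c 'echo data > input.txt'
run_step count_lines sh -c 'wc -l < input.txt > count.txt'
```

Running the same script again from the same directory would skip both steps, because their status files already exist. A failed step leaves no status file, so it is retried on the next run.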
