- Using perl:
- Using awk:
- if zipped file, using:
- get length statistics:
- or using fancy code from Jim Kent's utility:
So, one line code for all input fastq files would be:
find *.fq.gz -not -name \*raw\* -printf "zcat %p | awk '{if(NR%%4==2) print length(\$1)}' | textHistogram -maxBinCount=59 stdin \n" | sh
Note that you have to use double % to escape the % character for printf formatting control, just like in C. (Thanks for zencuke's answer here)
You will get something like this:
RNAseq.20E_library.result_ primary.clean.fa
large values truncated: need 35 bins or larger binSize than 1
Maximum value 53.000000
18 ********* 27730
19 ******************* 58997
20 ****************************** ****************************** 186919
21 ************************ 74536
22 ************** 45171
23 ************* 39107
24 *************** 45560
25 *********************** 70452
26 ****************************** ******************* 154030
27 ****************************** ****************************** 187704
28 ************************** 81198
29 ***** 17016
30 ** 5341
31 * 2439
32 0
33 1
34 0
35 0
36 0
37 0
38 0
39 1
<minVal or >= 40 173
RNAseq.Day_13_library.result_ primary.clean.fa
large values truncated: need 35 bins or larger binSize than 1
Maximum value 53.000000
18 ***** 22570
19 ********* 40335
20 ***************************** 127999
21 ********************* 95179
22 ****************** 79808
23 ********* 39596
24 ******* 29423
25 ************* 55438
26 ****************************** ******* 164868
27 ****************************** ****************************** 265722
28 *************************** 120353
29 ********* 38625
30 *** 14684
31 * 5214
32 0
33 0
34 1
<minVal or >= 35 140
No comments:
Post a Comment