Here it is:
awk '/^>/,/############/' scaffold.gff
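The syntax is awk '/from/,/to/' filename, which prints every line from one matching "from" through the next line matching "to", both inclusive. From my tests, the behaviour with multiple "from" and/or "to" lines was not obvious: each "from" line opens a new range, the next "to" line closes it, and a range with no closing "to" runs to the end of the file. So, be careful!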
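To make the multiple-match case concrete, here is a small sketch on a made-up input (the temporary file and its contents are only for illustration, not taken from the real scaffold files). It contains two "from" lines and two "to" lines:

```shell
# Build a throwaway input with two "from" lines and two "to" lines.
tmp=$(mktemp)
printf '%s\n' a from b to c from d to e > "$tmp"

# Each "from" opens a range and the next "to" closes it (both inclusive),
# so the lines a, c and e fall outside every range and are not printed.
range_out=$(awk '/from/,/to/' "$tmp")
printf '%s\n' "$range_out"

rm -f "$tmp"
```

This prints from, b, to, from, d, to, one per line.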
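Another option is to set a flag at the first line starting with ">" and print every line from there to the end of the file: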
awk '/^>/ {p=1}; p==1 {print}' scaffold.gff
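A quick way to check what the flag version prints is to run it on a tiny GFF-like file. The file below is hypothetical (one made-up annotation line followed by two made-up sequences); only the awk command is the one from this post:

```shell
# Hypothetical mini GFF: a header, one annotation line, then FASTA records.
gff=$(mktemp)
{
  printf '##gff-version 3\n'
  printf 'scaffold1\texample\tgene\t1\t8\t.\t+\t.\tID=gene1\n'
  printf '>scaffold1\nACGTACGT\n'
  printf '>scaffold2\nTTGGCCAA\n'
} > "$gff"

# p is unset until the first ">" line; from then on every line is printed.
flag_out=$(awk '/^>/ {p=1}; p==1 {print}' "$gff")
printf '%s\n' "$flag_out"

rm -f "$gff"
```

Only the four FASTA lines come out; the header and the annotation line are dropped. Note that the flag is never reset, so anything after the sequences in the same file would be printed too.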
So, the final code to concatenate all the extracted FASTA sequences is either of the following two commands (the second one processes the files in sorted order):
find -name scaffold\*.gff -exec sh -c "awk '/^>/ {p=1}; p==1 {print}' {} >> Manduca_gff_files_version_1.scaffold.fa" \; &
for i in `find -name scaffold\*.gff | sort`; do awk '/^>/ {p=1}; p==1 {print}' $i >> Manduca_gff_files_version_1.scaffold.fa; done
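If the file names might contain spaces, a variant along the following lines may be safer. This is only a sketch: it assumes find, sort and xargs support the NUL-separator options -print0, -z and -0 (the GNU versions do), and the directory, file names and contents are invented for the demonstration. Because one awk process may receive several files, the flag is reset at the first line of each file with FNR == 1.

```shell
# Hypothetical test directory with two tiny GFF-like files.
workdir=$(mktemp -d)
printf '%s\n' '##gff-version 3' '>scaffold1' 'ACGT' > "$workdir/scaffold1.gff"
printf '%s\n' '##gff-version 3' '>scaffold2' 'TTGG' > "$workdir/scaffold2.gff"

# NUL-separated names survive spaces; sort -z keeps the files in order.
# FNR == 1 resets the flag so each file is skipped up to its own first ">".
out="$workdir/combined.fa"
find "$workdir" -name 'scaffold*.gff' -print0 | sort -z |
  xargs -0 awk 'FNR == 1 {p = 0} /^>/ {p = 1} p' > "$out"

combined=$(cat "$out")
printf '%s\n' "$combined"

rm -rf "$workdir"
```

Writing with a single > at the end of the pipeline also avoids appending to a leftover output file from an earlier run, which the >> versions above will do.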
- http://www.unix.com/shell-programming-scripting/6959-split-file-specified-string.html
- http://www.bioperl.org/wiki/Getting_Fasta_sequences_from_a_GFF