samtools fastq converts a BAM file to FASTQ format, e.g.
samtools fastq input.bam -o output.fastq
The output is compressed automatically if the file name has a .gz or .bgzf extension, e.g.
samtools fastq input.bam -o output1.fastq.gz
Alternatively, you can pipe stdout to a compressor explicitly, e.g.
samtools fastq input.bam | gzip > output2.fastq.gz
Interestingly, I noticed that output2.fastq.gz is significantly smaller than output1.fastq.gz, even though the uncompressed content is identical.
This is because samtools and gzip use different default compression levels, not because the data differ.
samtools fastq defaults to compression level 1 (range 0..9), while gzip defaults to level 6 (range 1..9), so gzip spends more time to produce a smaller file. The samtools level can be changed with the -c option (see the usage below).
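The effect of the level alone can be checked without samtools. The following is a minimal sketch that compresses the same file with gzip at level 1 and at level 6 and compares the sizes. The FASTQ records are synthetic (random bases and a made-up quality string, generated with awk), and the file names under the temporary directory are hypothetical; real reads will give different absolute sizes.

```shell
# Generate 20,000 synthetic FASTQ records (made-up data, for illustration only).
tmp=$(mktemp -d)
awk 'BEGIN {
    srand(42)
    split("A C G T", base, " ")
    for (i = 1; i <= 20000; i++) {
        seq = ""; qual = ""
        for (j = 1; j <= 100; j++) {
            seq = seq base[int(rand() * 4) + 1]
            qual = qual ((rand() < 0.9) ? "I" : "#")
        }
        printf "@read%d\n%s\n+\n%s\n", i, seq, qual
    }
}' > "$tmp/demo.fastq"

# Compress the same input at level 1 (the samtools fastq default)
# and at level 6 (the gzip default).
gzip -1 -c "$tmp/demo.fastq" > "$tmp/level1.fastq.gz"
gzip -6 -c "$tmp/demo.fastq" > "$tmp/level6.fastq.gz"

# Report the compressed sizes in bytes.
size1=$(( $(wc -c < "$tmp/level1.fastq.gz") ))
size6=$(( $(wc -c < "$tmp/level6.fastq.gz") ))
echo "level 1: $size1 bytes; level 6: $size6 bytes"
```

With samtools itself, the same comparison should be possible by raising the level through the -c option listed in the usage below, e.g. samtools fastq -c 6 input.bam -o output1.fastq.gz, or by lowering gzip's level with gzip -1 in the piped version.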
Usage:
$ samtools fastq
Usage: samtools fastq [options...] <in.bam>
Description:
Converts a SAM, BAM or CRAM into either FASTQ or FASTA format depending on the command invoked.
Options:
-0 FILE write reads designated READ_OTHER to FILE
-1 FILE write reads designated READ1 to FILE
-2 FILE write reads designated READ2 to FILE
-o FILE write reads designated READ1 or READ2 to FILE
note: if a singleton file is specified with -s, only
paired reads will be written to the -1 and -2 files.
-f INT only include reads with all of the FLAGs in INT present [0]
-F INT only include reads with none of the FLAGS in INT present [0x900]
-G INT only EXCLUDE reads with all of the FLAGs in INT present [0]
-n don't append /1 and /2 to the read name
-N always append /1 and /2 to the read name
-O output quality in the OQ tag if present
-s FILE write singleton reads designated READ1 or READ2 to FILE
-t copy RG, BC and QT tags to the FASTQ header line
-T TAGLIST copy arbitrary tags to the FASTQ header line
-v INT default quality score if not given in file [1]
-i add Illumina Casava 1.8 format entry to header (eg 1:N:0:ATCACG)
-c compression level [0..9] to use when creating gz or bgzf fastq files [1]
--i1 FILE write first index reads to FILE
--i2 FILE write second index reads to FILE
--barcode-tag TAG Barcode tag [default: BC]
--quality-tag TAG Quality tag [default: QT]
--index-format STR How to parse barcode and quality tags
--input-fmt-option OPT[=VAL]
Specify a single input file format option in the form
of OPTION or OPTION=VALUE
--reference FILE
Reference sequence FASTA FILE [null]
-@, --threads INT
Number of additional threads to use [0]
--verbosity INT
Set level of verbosity
The files will be automatically compressed if the file names have a .gz or .bgzf extension.
GZIP(1) GZIP(1)
NAME
gzip, gunzip, zcat - compress or expand files
SYNOPSIS
gzip [ -acdfhlLnNrtvV19 ] [-S suffix] [--rsyncable] [ name ... ]
gunzip [ -acfhlLnNrtvV ] [-S suffix] [ name ... ]
zcat [ -fhLV ] [ name ... ]
OPTIONS
-# --fast --best
Regulate the speed of compression using the specified digit #, where -1 or --fast indicates the fastest compression method (less compression) and -9 or --best indicates the slowest compression method (best compression). The default compression level is -6 (that is, biased towards high compression at expense of speed).