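Computing coverage depth and target capture efficiency from BAM files is a routine task in genome sequencing analysis. I had previously implemented this in Python and Perl, but those scripts were slow. Today I came across a tool called bamdst (https://github.com/shiquan/bamdst). It is written in C, runs fast, and covers a wide range of analyses, including mapping statistics, target capture statistics, flanking-region statistics, and depth and coverage statistics.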
It is easy to use; see the GitHub repository for detailed usage.
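As a rough illustration of a typical invocation, the sketch below only assembles a bamdst command line in Python and prints it. The `-q` option and its default of 20, and the 200 bp flank default, come from the item table in this post. The `-p` (bed file), `-o` (output directory) and `-f` (flank size) options are written from my recollection of the bamdst README and should be treated as assumptions; check `bamdst --help` before relying on them. The file names and the helper `build_bamdst_command` are hypothetical.

```python
import shlex


def build_bamdst_command(bed_path, out_dir, bam_paths, mapq_cutoff=20, flank=200):
    """Return a bamdst argument list; nothing is executed here.

    Assumed options (verify with `bamdst --help`):
      -p  bed file with the target regions
      -o  output directory
      -q  mapping quality cutoff (default 20 according to the table)
      -f  flank size in bp (default 200 according to the table)
    """
    if not bam_paths:
        raise ValueError("at least one BAM file is required")
    cmd = ["bamdst", "-p", bed_path, "-o", out_dir,
           "-q", str(mapq_cutoff), "-f", str(flank)]
    cmd.extend(bam_paths)
    return cmd


# Hypothetical file names, for illustration only.
cmd = build_bamdst_command("target.bed", "bamdst_out", ["sample.bam"])
print(shlex.join(cmd))
```

To actually run the tool, the list could be passed to `subprocess.run(cmd, check=True)` on a machine where bamdst is installed.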
The items this tool reports, with explanations, are listed below.
Item | Annotation |
--- | --- |
[Total] Raw Reads (All reads) | All reads in the bam file(s). |
[Total] QC Fail reads | Number of reads that failed QC. This flag is set by other software, such as bwa. See the FLAG field in the BAM format. |
[Total] Raw Data(Mb) | Total amount of read data in the bam file(s). |
[Total] Paired Reads | Number of paired reads. |
[Total] Mapped Reads | Number of mapped reads. |
[Total] Fraction of Mapped Reads | Ratio of mapped reads against raw reads. |
[Total] Mapped Data(Mb) | Mapped data in the bam file(s). |
[Total] Fraction of Mapped Data(Mb) | Ratio of mapped data against raw data. |
[Total] Properly paired | Paired reads with a proper insert size. See the BAM format specification for details. |
[Total] Fraction of Properly paired | Ratio of properly paired reads against mapped reads. |
[Total] Read and mate paired | Read (read1) and mate read (read2) paired. |
[Total] Fraction of Read and mate paired | Ratio of read and mate paired against mapped reads. |
[Total] Singletons | Read mapped but mate read unmapped, and vice versa. |
[Total] Read and mate map to diff chr | Read and mate read mapped to different chromosomes, usually because of mapping errors or structural variants. |
[Total] Read1 | First reads in paired-end sequencing. |
[Total] Read2 | Mate reads |
[Total] Read1(rmdup) | First reads after removing duplicates. |
[Total] Read2(rmdup) | Mate reads after removing duplicates. |
[Total] forward strand reads | Number of forward strand reads. |
[Total] backward strand reads | Number of backward strand reads. |
[Total] PCR duplicate reads | Number of PCR duplicate reads. |
[Total] Fraction of PCR duplicate reads | Ratio of PCR duplicate reads. |
[Total] Map quality cutoff value | Mapping quality cutoff. This value can be set with -q; the default is 20, because some variant callers, such as GATK, only consider high-quality reads. |
[Total] MapQuality above cutoff reads | Number of reads with mapping quality greater than or equal to the cutoff value. |
[Total] Fraction of MapQ reads in all reads | Ratio of reads with mapping quality greater than or equal to the cutoff against raw reads. |
[Total] Fraction of MapQ reads in mapped reads | Ratio of reads with mapping quality greater than or equal to the cutoff against mapped reads. |
[Target] Target Reads | Number of reads covering the target regions (specified by the bed file). |
[Target] Fraction of Target Reads in all reads | Ratio of target reads against raw reads. |
[Target] Fraction of Target Reads in mapped reads | Ratio of target reads against mapped reads. |
[Target] Target Data(Mb) | Total bases covering the target regions. If a read only partly covers a target region, only the covering bases are counted. |
[Target] Target Data Rmdup(Mb) | Total bases covering the target regions after removing PCR duplicates. |
[Target] Fraction of Target Data in all data | Ratio of target bases against raw bases. |
[Target] Fraction of Target Data in mapped data | Ratio of target bases against mapped bases. |
[Target] Len of region | The length of target regions. |
[Target] Average depth | Average depth of target regions, calculated as "target bases / length of regions". |
[Target] Average depth(rmdup) | Average depth of target regions after removing PCR duplicates. |
[Target] Coverage (>0x) | Ratio of bases with depth greater than 0x in target regions, which also means the ratio of covered regions in target regions. |
[Target] Coverage (>=4x) | Ratio of bases with depth greater than or equal to 4x in target regions. |
[Target] Coverage (>=10x) | Ratio of bases with depth greater than or equal to 10x in target regions. |
[Target] Coverage (>=30x) | Ratio of bases with depth greater than or equal to 30x in target regions. |
[Target] Coverage (>=100x) | Ratio of bases with depth greater than or equal to 100x in target regions. |
[Target] Target Region Count | Number of target regions. In normal practice, it is the total number of exons. |
[Target] Region covered > 0x | Number of target regions with average depth greater than 0x. |
[Target] Fraction Region covered > 0x | Ratio of target regions with average depth greater than 0x. |
[Target] Fraction Region covered >= 4x | Ratio of target regions with average depth greater than or equal to 4x. |
[Target] Fraction Region covered >= 10x | Ratio of target regions with average depth greater than or equal to 10x. |
[Target] Fraction Region covered >= 30x | Ratio of target regions with average depth greater than or equal to 30x. |
[Target] Fraction Region covered >= 100x | Ratio of target regions with average depth greater than or equal to 100x. |
[flank] flank size | Size of the flanking region to be counted; 200 bp by default. Oligos can also capture regions adjacent to the target regions. |
[flank] Len of region (not include target region) | The length of the flank regions (target regions are not counted). |
[flank] Average depth | Average depth of flank regions. |
[flank] flank Reads | Total number of reads covering the flank regions. Note: reads covering the edge of a target region are also counted in the flank regions. |
[flank] Fraction of flank Reads in all reads | Ratio of reads covering the flank regions against raw reads. |
[flank] Fraction of flank Reads in mapped reads | Ratio of reads covering the flank regions against mapped reads. |
[flank] flank Data(Mb) | Total bases in the flank regions. |
[flank] Fraction of flank Data in all data | Ratio of total bases in the flank regions against raw data. |
[flank] Fraction of flank Data in mapped data | Ratio of total bases in the flank regions against mapped data. |
[flank] Coverage (>0x) | Ratio of flank bases with depth greater than 0x. |
[flank] Coverage (>=4x) | Ratio of flank bases with depth greater than or equal to 4x. |
[flank] Coverage (>=10x) | Ratio of flank bases with depth greater than or equal to 10x. |
[flank] Coverage (>=30x) | Ratio of flank bases with depth greater than or equal to 30x. |
[flank] Coverage (>=100x) | Ratio of flank bases with depth greater than or equal to 100x. |
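
To make the depth and coverage definitions in the table concrete, here is a minimal sketch in Python that derives them from per-base depths. It is not how bamdst is implemented; it only follows the definitions as written above (average depth as bases divided by region length, coverage as the ratio of positions at or above a threshold, and region-level fractions based on each region's average depth). The function names and the toy depth values are made up for illustration.

```python
def depth_summary(depths, thresholds=(4, 10, 30, 100)):
    """Summarize per-base depths for one set of regions (target or flank)."""
    n = len(depths)
    if n == 0:
        raise ValueError("no positions given")
    summary = {
        "length": n,                                        # Len of region
        "average_depth": sum(depths) / n,                   # bases / length
        "coverage_gt_0x": sum(d > 0 for d in depths) / n,   # Coverage (>0x)
    }
    for t in thresholds:
        # Coverage (>=Nx): ratio of positions with depth at or above N
        summary[f"coverage_ge_{t}x"] = sum(d >= t for d in depths) / n
    return summary


def fraction_regions_covered(region_depths, threshold):
    """Fraction of regions whose average depth is >= threshold."""
    means = [sum(r) / len(r) for r in region_depths if r]
    if not means:
        raise ValueError("no non-empty regions given")
    return sum(m >= threshold for m in means) / len(means)


# Toy data: three hypothetical target regions with made-up per-base depths.
regions = [[0, 2, 4, 6], [10, 10, 12, 8], [30, 40, 50, 0]]
all_depths = [d for r in regions for d in r]

stats = depth_summary(all_depths)
for name, value in stats.items():
    print(name, value)
print("fraction_regions_ge_10x", fraction_regions_covered(regions, 10))
```

Note that the "> 0x" region rows use a strict comparison, so they would need `m > 0` rather than the `>=` used in `fraction_regions_covered`.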