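Calculate statistics for Oxford Nanopore sequencing data and alignments
Project description
NanoStat
Calculate various statistics from a long read sequencing dataset in fastq, bam or albacore sequencing summary format.
Installation
NanoStat is written for Python 3 and does not work with Python 2.7 or earlier.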
pip install nanostat
or
conda install -c bioconda nanostat
Usage
NanoStat [-h] [-v] [-o OUTDIR] [-p PREFIX] [-n NAME] [-t N]
[--barcoded] [--readtype {1D,2D,1D2}]
(--fastq file [file ...] | --fasta file [file ...] | --summary file [file ...] | --bam file [file ...])
Calculate statistics of long read sequencing dataset.
General options:
-h, --help show the help and exit
-v, --version Print version and exit.
-o, --outdir OUTDIR Specify directory in which output has to be created.
-p, --prefix PREFIX Specify an optional prefix to be used for the output file.
-n, --name NAME Specify a filename/path for the output, stdout is the default.
-t, --threads N Set the allowed number of threads to be used by the script.
--tsv Print the output in a tab-separated-values format
Input options.:
--barcoded Use if you want to split the summary file by barcode
--readtype {1D,2D,1D2}
Which read type to extract information about from summary. Options are 1D, 2D,
1D2
Input data sources, one of these is required.:
--fastq file [file ...]
Data is in one or more (compressed) fastq file(s).
--fasta file [file ...]
Data is in one or more (compressed) fasta file(s).
--summary file [file ...]
Data is in one or more (compressed) summary file(s) generated by albacore or guppy.
--bam file [file ...]
Data is in one or more sorted bam file(s).
EXAMPLES:
NanoStat --fastq reads.fastq.gz --outdir statreports
NanoStat --summary sequencing_summary1.txt sequencing_summary2.txt sequencing_summary3.txt --readtype 1D2
NanoStat --bam alignment.bam alignment2.bam
Examples
NanoStat --fastq reads.fastq.gz --outdir statreports
NanoStat --summary sequencing_summary1.txt sequencing_summary2.txt sequencing_summary3.txt --readtype 1D2
NanoStat --bam alignment.bam alignment2.bam
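The commands above can also be launched from a Python script. The following is a minimal sketch, assuming NanoStat is installed and available on the PATH. The helper names build_nanostat_command and run_nanostat are hypothetical and not part of NanoStat; they only use flags listed in the usage text above.

```python
import subprocess


def build_nanostat_command(files, source="fastq", outdir=None, name=None,
                           threads=None, tsv=False):
    """Assemble a NanoStat argument list (hypothetical helper, not part of NanoStat)."""
    # The usage text requires exactly one of these input sources.
    if source not in ("fastq", "fasta", "summary", "bam"):
        raise ValueError(f"unsupported input source: {source}")
    if not files:
        raise ValueError("at least one input file is required")
    cmd = ["NanoStat", f"--{source}", *files]
    if outdir is not None:
        cmd += ["--outdir", outdir]
    if name is not None:
        cmd += ["--name", name]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    if tsv:
        cmd.append("--tsv")
    return cmd


def run_nanostat(cmd):
    """Run the command and return whatever it wrote to standard output."""
    # check=True raises CalledProcessError if NanoStat exits with an error.
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # Only prints the argument list; nothing is executed here.
    print(build_nanostat_command(["reads.fastq.gz"], outdir="statreports"))
```

Passing the arguments as a list avoids shell quoting problems with file names. According to the option descriptions, stdout is the default destination when --name is not given, so run_nanostat would return the report text in that case.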
Example output
General summary:
Active channels: 502
Mean read length: 8593.5
Mean read quality: 10.8
Median read length: 5168.0
Median read quality: 11.2
Number of reads: 408254
Read length N50: 15141
Total bases: 3508315665
Number, percentage and megabases of reads above quality cutoffs
>Q5: 406428 (99.6%) 3502.0Mb
>Q7: 395016 (96.8%) 3234.5Mb
>Q10: 305509 (74.8%) 2475.9Mb
>Q12: 87903 (21.5%) 422.9Mb
>Q15: 124 (0.0%) 0.1Mb
Top 5 highest mean basecall quality scores and their read lengths
1: 16.2 (407; a803bcfc-9d7a-4a87-84e4-1a0296113700)
2: 16.2 (880; f5fee32a-9471-4a68-8697-a71887599757)
3: 16.1 (729; 3ea23a79-641e-41ab-bb5b-c22609977136)
4: 16.1 (1057; b0cef5fd-c5e1-4539-9591-b7376b2953e8)
5: 15.8 (841; 3d4f8075-6151-4147-bdc3-e5d53ff66084)
Top 5 longest reads and their mean basecall quality score
1: 255821 (6.8; 7d069f04-d4db-4f12-a1b9-c19d70993492)
2: 254573 (7.1; a245999b-de28-4720-a8c3-0d5cbb26e473)
3: 253711 (7.0; a84b106b-13d3-4bfa-b548-71a47c9032c3)
4: 245784 (7.0; 2a60ee11-8793-46c1-a3d9-667bc4e70405)
5: 245776 (7.1; 72a8cf33-75fd-4c07-8a4c-7516b690938b)
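To make the fields in the report above more concrete, here is a minimal sketch of how the read length summary and the quality cutoff counts could be derived from per-read values. It is not NanoStat's own code. It assumes the common N50 definition (the length L such that reads of length L or longer contain at least half of all bases) and treats the cutoffs as strict "greater than" comparisons; NanoStat's implementation may differ in detail. The function names and the toy data are invented for illustration.

```python
from statistics import mean, median


def read_length_n50(lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    # Walk from the longest read down until half of the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no read lengths given")


def reads_above_cutoff(reads, cutoff):
    """Count, percentage and megabases of reads with mean quality above cutoff."""
    passing = [length for length, quality in reads if quality > cutoff]
    percentage = 100 * len(passing) / len(reads)
    return len(passing), round(percentage, 1), round(sum(passing) / 1e6, 1)


# Toy data: (read length, mean read quality) pairs, invented for illustration.
reads = [(2000, 6.5), (4000, 9.0), (6000, 11.5), (8000, 12.5)]
lengths = [length for length, _ in reads]

summary = {
    "Mean read length": mean(lengths),
    "Median read length": median(lengths),
    "Number of reads": len(lengths),
    "Read length N50": read_length_n50(lengths),
    "Total bases": sum(lengths),
}

for key, value in summary.items():
    print(f"{key}: {value}")
for cutoff in (5, 7, 10, 12, 15):
    count, pct, mb = reads_above_cutoff(reads, cutoff)
    print(f">Q{cutoff}: {count} ({pct}%) {mb}Mb")
```

In the toy data the two longest reads (8000 and 6000 bases) together cover 14000 of 20000 bases, so the N50 is 6000.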
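I welcome all suggestions, bug reports, feature requests and contributions. Please leave an issue or open a pull request. I will usually respond within a day, or occasionally within a few days.
Citation
If you use this tool, please consider citing our publication.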