Compare runs of Oxford Nanopore sequencing data and alignments
NanoComp
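Compare long-read sequencing data and alignments across multiple runs. NanoComp creates violin plots or box plots of read length, quality and percent identity, and creates dynamic overlaid read length histograms and a cumulative yield plot.
As of version 1.1.0, NanoComp also creates static png images of the dynamic html plots, because the html files can become very large and slow to load for big datasets. This requires orca to be installed. Without orca the script still works, but no static copies of the dynamic plots are created.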
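Before a long run it can help to know whether the static png copies will be produced. The snippet below is a minimal sketch, assuming that orca is installed as an executable named `orca` on your `PATH`; the helper name `orca_available` is hypothetical and is not part of NanoComp. Finding a program with that name is only a heuristic, since other software may ship an executable called `orca`.

```python
import shutil


def orca_available(executable="orca"):
    """Return True if an executable with the given name is found on PATH.

    Assumption: orca is installed under the name 'orca'. This only checks
    that such a program exists, not that it is the plotting tool.
    """
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if orca_available():
        print("An 'orca' executable was found on PATH.")
    else:
        print("No 'orca' executable was found on PATH.")
```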
Installation
pip install NanoComp
The script is written for Python 3.
Usage
NanoComp [-h] [-v] [-t THREADS] [-o OUTDIR] [-p PREFIX] [--verbose]
[--raw] [--readtype {1D,2D,1D2}] [--barcoded]
[--split_runs TSV_FILE]
[-f {png,jpg,jpeg,webp,svg,pdf,eps,json}]
[-n names [names ...]] [-c colors [colors ...]]
[--plot {violin,box,ridge,false}] [--title TITLE]
(--fastq files [files ...] | --fasta files [files ...] | --summary files [files ...] | --bam files [files ...])
General options:
-h, --help Show the help and exit.
-v, --version Print version and exit.
-t, --threads THREADS
Set the allowed number of threads to be used by the script
-o, --outdir OUTDIR Specify directory in which output has to be created.
-p, --prefix PREFIX Specify an optional prefix to be used for the output files.
--verbose Write log messages also to terminal.
--raw Store the extracted data in tab separated file.
Options for filtering or transforming input prior to plotting:
--readtype {1D,2D,1D2}
Which read type to extract information about from summary. Options are 1D, 2D,
1D2
--barcoded Barcoded experiment in summary format, splitting per barcode.
--split_runs TSV_FILE
File: Split the summary on run IDs and use names in tsv file. Mandatory header
fields are 'NAME' and 'RUN_ID'.
Options for customizing the plots created:
-f, --format {'png'(default),'jpg','jpeg','webp','svg','pdf','eps','json'}
Specify the output format of the plots. JSON output allows for customisation by the end-user after plotting the figures (https://plotly.com/python-api-reference/generated/plotly.io.read_json.html).
-n, --names names Specify the names to be used for the datasets.
-c, --colors colors Specify the colors to be used for the datasets.
--plot {violin,box,ridge,false}
Which plot type to use: 'box', 'violin' (default), 'ridge' (joyplot) or 'false' (no plots)
--title TITLE Add a title to all plots, requires quoting if using spaces
Input data sources, one of these is required:
--fastq files [files ...]
Data is in (compressed) fastq format.
--fasta files [files ...]
Data is in (compressed) fasta format.
--summary files [files ...]
Data is in (compressed) summary files generated by albacore or guppy.
--bam files [files ...]
Data is in sorted bam files.
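The `--split_runs` option expects a tab-separated file whose header contains the fields 'NAME' and 'RUN_ID'. As an illustration, the following sketch writes such a file from a mapping of run IDs to display names. The helper `write_split_runs_tsv`, the file name and the run IDs are hypothetical, and the column order (NAME first) is an assumption; only the two header field names come from the help text above.

```python
import csv


def write_split_runs_tsv(path, name_by_run_id):
    """Write a tab-separated file with the header fields NAME and RUN_ID.

    name_by_run_id maps each run ID (as it appears in the summary file)
    to the name that should be used for that run.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["NAME", "RUN_ID"])
        for run_id, name in name_by_run_id.items():
            writer.writerow([name, run_id])


if __name__ == "__main__":
    # Placeholder run IDs; replace them with the run IDs from your summary file.
    write_split_runs_tsv(
        "split_runs.tsv",
        {"run_id_placeholder_1": "sample_A", "run_id_placeholder_2": "sample_B"},
    )
```

The resulting file could then be passed as `--split_runs split_runs.tsv` together with a `--summary` input.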
Examples
NanoComp --bam alignment1.bam alignment2.bam alignment3.bam --outdir compare-runs
NanoComp --fastq reads1.fastq.gz reads2.fastq.gz reads3.fastq.gz reads4.fastq.gz --names run1 run2 run3 run4
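When NanoComp is called from a pipeline, the command line can be assembled programmatically. The sketch below builds the argument list for the invocations shown above, using only the `--fastq`, `--fasta`, `--summary`, `--bam`, `--names` and `--outdir` flags from the usage text. The function `build_nanocomp_command` is a hypothetical helper, and the check that there is exactly one name per input file is an assumption about how `--names` is meant to be used, not documented behaviour.

```python
import shlex

INPUT_FLAGS = {"--fastq", "--fasta", "--summary", "--bam"}


def build_nanocomp_command(input_flag, files, names=None, outdir=None):
    """Return the NanoComp command as a list of arguments.

    Assumption: when names are given, there is one name per input file.
    """
    if input_flag not in INPUT_FLAGS:
        raise ValueError(f"unknown input flag: {input_flag}")
    if not files:
        raise ValueError("at least one input file is required")
    if names is not None and len(names) != len(files):
        raise ValueError("expected one name per input file")

    command = ["NanoComp", input_flag, *files]
    if names:
        command += ["--names", *names]
    if outdir:
        command += ["--outdir", outdir]
    return command


if __name__ == "__main__":
    command = build_nanocomp_command(
        "--fastq",
        ["reads1.fastq.gz", "reads2.fastq.gz"],
        names=["run1", "run2"],
    )
    # Print the command; to execute it, pass the list to
    # subprocess.run(command, check=True) on a system with NanoComp installed.
    print(shlex.join(command))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.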
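Example output
I welcome all suggestions, bug reports, feature requests and contributions. Please leave an issue or open a pull request. I will usually respond within a day or, in rare cases, within a few days.
Citation
If you use this tool, please consider citing our publication.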