baseqDrops - Process Drop-seq, 10X (3prime) and inDrop RNA-seq datasets

Development Status
- 3 - Alpha
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Programming Language
- Python :: 3.6
Topic
- Software Development :: Build Tools

Project description

# baseqDrops
A versatile pipeline for processing 10X, indrop and Drop-seq datasets.

## Install baseqDrops
You need python3 and the package named baseqDrops, which can be installed with:

pip install baseqDrops

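After installation, a runnable command `baseqDrops` is available.

For efficient processing, a computer or server with >= 30 GB of memory and >= 8 CPU cores is recommended.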
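
To confirm from Python that the installation put the `baseqDrops` command on the `PATH`, something like the following can be used. This is a minimal sketch using only the standard library; the helper name `find_command` is illustrative and not part of baseqDrops.

```python
import shutil


def find_command(name="baseqDrops"):
    """Return the full path of an executable found on PATH, or None if absent."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_command()
    if path is None:
        print("baseqDrops not found on PATH; check the pip installation")
    else:
        print(f"baseqDrops found at {path}")
```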

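## Config file

The following software and resources are required:

+ `star`: the STAR aligner, for fast alignment of RNA-seq reads to the genome;
+ `samtools`: for sorting the aligned bam files (version >= 1.6);
+ `whitelistDir`: the barcode whitelist files for indrop and 10X should be placed under `whitelistDir`. The files can be downloaded from https://github.com/beiseq/baseqDrops/tree/master/whitelist;
+ `cellranger_ref_<genome>`: the key steps of read alignment and gene tagging are inspired by and borrowed from the open-source cellranger pipeline (https://github.com/10XGenomics/cellranger). The genome index and transcriptome references can be downloaded from https://support.10xgenomics.com/single-cell-gene-expression/software/downloads/latest.
In the config file, the directory of the cellranger reference is named `cellranger_ref_<genome>`, for example `cellranger_ref_hg38`.

The configuration used when running the command is recorded in a file named `config_drops.ini`: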

[Drops]
samtools = /path/to/samtools
star = /path/to/STAR
whitelistDir = /path/to/whitelist_file_directory
cellranger_ref_hg38 = /path/to/reference/refdata-cellranger-GRCh38-1.2.0/
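
Before launching a long run, it can help to check that the config file defines every key listed above. The sketch below parses the `[Drops]` section with Python's standard `configparser`; the function name `load_drops_config` and the list of required keys are assumptions made for illustration, not part of baseqDrops, and the way baseqDrops itself reads the file may differ.

```python
import configparser

# Keys taken from the list of required software and resources.
REQUIRED_KEYS = ("samtools", "star", "whitelistDir")


def load_drops_config(text, genome="hg38"):
    """Parse config text and return the [Drops] section as a dict.

    Raises KeyError if a required key is missing.
    """
    parser = configparser.ConfigParser()
    # configparser lowercases keys by default; keep `whitelistDir` as written.
    parser.optionxform = str
    parser.read_string(text)
    section = dict(parser["Drops"])
    required = REQUIRED_KEYS + (f"cellranger_ref_{genome}",)
    missing = [key for key in required if key not in section]
    if missing:
        raise KeyError(f"missing config keys: {', '.join(missing)}")
    return section


EXAMPLE = """
[Drops]
samtools = /path/to/samtools
star = /path/to/STAR
whitelistDir = /path/to/whitelist_file_directory
cellranger_ref_hg38 = /path/to/reference/refdata-cellranger-GRCh38-1.2.0/
"""

config = load_drops_config(EXAMPLE)
print(config["cellranger_ref_hg38"])
```

To check a file on disk, read its contents and pass them as `text`. Requesting a genome whose reference is not configured (for example `genome="hgmm"` with the example above) raises `KeyError`.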

## Getting help

baseqDrops run-pipe --help

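## Pipeline steps

1. `Cell Barcode Counting`: count the barcodes present in the dataset. This generates a file named barcode_count_<sample>.csv;
2. `Cell Barcode Correction, Aggregating and Filtering`: correct cell barcodes with a mismatch of up to 1 bp, aggregate them, and filter barcodes by a minimum number of reads (default 5000). This generates the list of valid barcodes, named barcode_stats_<sample>.csv;
3. `Split the Reads of Valid Cell Barcodes`: split the raw paired-end reads by the 2 bp prefix of the barcode into 16 single-end files for multiprocessing. The barcode_splits folder contains the files split.<sample>.<AA|AT|AC|AG...|GG>.fq;
4. `Alignment to Genome using STAR`: several STAR processes (the number is set by `--parallel`) run simultaneously, and the results are written to a folder named star_align. The bam files are then sorted by read header;
5. `Reads Tagging`: tag the alignment position of each read with the corresponding gene name;
6. `Generating Expression Table`: generate the expression table quantified by UMIs (Result.UMIs.<sample>.txt) and the raw read counts (Result.Reads.<sample>.txt);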
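
Steps 2 and 3 can be illustrated with a short sketch. It assumes that "correction" means merging a barcode into a more abundant barcode that differs at exactly one position, which is one common approach and not necessarily what baseqDrops implements (for 10X and indrop the whitelist is also involved). The function names and the toy counts are hypothetical.

```python
from collections import Counter


def hamming_distance(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def correct_and_filter(counts, min_reads=5000):
    """Merge barcodes within 1 mismatch of a more abundant barcode,
    then keep only barcodes with at least `min_reads` reads."""
    merged = Counter()
    # Visit barcodes from most to least abundant so abundant ones become targets.
    ordered = sorted(counts, key=lambda bc: (-counts[bc], bc))
    for barcode in ordered:
        target = next(
            (kept for kept in merged
             if len(kept) == len(barcode)
             and hamming_distance(kept, barcode) == 1),
            barcode,  # no close neighbour yet: the barcode stands for itself
        )
        merged[target] += counts[barcode]
    return {bc: n for bc, n in merged.items() if n >= min_reads}


def split_by_prefix(barcodes):
    """Group barcodes by their first two bases (at most 16 groups, AA to GG)."""
    groups = {}
    for barcode in barcodes:
        groups.setdefault(barcode[:2], []).append(barcode)
    return groups


# Toy read counts per raw barcode (made-up values for illustration).
raw_counts = {"AACGT": 6000, "AACGA": 300, "GGTTA": 5200, "GGTTC": 40, "CATGA": 10}
valid = correct_and_filter(raw_counts, min_reads=5000)
print(valid)
print(split_by_prefix(valid))
```

The pairwise comparison is quadratic in the number of barcodes, which is fine for a demonstration but too slow for real datasets; it is only meant to make the logic of the two steps concrete.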

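## Run the pipeline

The following parameters should be provided (or run `baseqDrops run-pipe --help` for details):

+ `--outdir/-d`: output path (default ./; the results are stored in ./<name>);
+ `--config`: path to the config file;
+ `--genome/-g`: genome version [hg38/mm38/hgmm];
+ `--protocol/-p`: [10X|indrop|dropseq];
+ `--minreads`: minimum number of reads required for a barcode;
+ `--name/-n`: sample name; a folder <outdir>/<name> is created and used as the main directory;
+ `--parallel`: number of STAR and tagging processes run simultaneously (default 4; a larger value requires more memory);
+ `--fq1/-1`: path to the paired-end read 1 sequencing file;
+ `--fq2/-2`: path to the paired-end read 2 sequencing file;
+ `--top_million_reads`: for large datasets, use only part of the data for a quick look; reads beyond the first N million are skipped;

If your data is of human origin and `cellranger_ref_hg38` is defined in the config file, you can run: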

baseqDrops run-pipe --config ./config_drops.ini -g hg38 -p 10X --minreads 1000 -n 10X_test -1 10x_1.1.fq.gz -2 10x.2.fq.gz -d ./
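
When the pipeline is launched from a script, for instance once per sample, the same command can be assembled in Python. The sketch below uses only the flags documented above; the helper `build_run_pipe_args` is a hypothetical name, and the `subprocess.run` call is left commented out because it requires baseqDrops and the input files to be present.

```python
import subprocess


def build_run_pipe_args(config, genome, protocol, name, fq1, fq2,
                        minreads=None, outdir="./"):
    """Assemble the argument list for `baseqDrops run-pipe`."""
    args = ["baseqDrops", "run-pipe",
            "--config", config,
            "-g", genome,
            "-p", protocol,
            "-n", name,
            "-1", fq1,
            "-2", fq2,
            "-d", outdir]
    if minreads is not None:
        args += ["--minreads", str(minreads)]
    return args


args = build_run_pipe_args("./config_drops.ini", "hg38", "10X", "10X_test",
                           "10x_1.1.fq.gz", "10x.2.fq.gz", minreads=1000)
print(" ".join(args))
# To launch the pipeline, uncomment the next line:
# subprocess.run(args, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.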

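## Run a single step

The pipeline can also be run step by step. All parameters should be provided as described above, together with an additional `--step` option, for example: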

baseqDrops run-pipe --config ./config.ini -g hg38 -p dropseq --minreads 1000 -n dropseq2 --top_million_reads 20 -1 dropseq_1.1.fq.gz -2 dropeq.2.fq.gz --step count -d ./

The available steps are:

+ `Cell Barcode Counting`: --step count
+ `Cell Barcode Correction, Aggregating and Filtering`: --step stats
+ `Split the Reads of Valid Cell Barcodes`: --step split
+ `Alignment to Genome using STAR`: --step star
+ `Reads Tagging`: --step tagging
+ `Generating Expression Table`: --step table
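
To run all steps one at a time, for example to inspect the intermediate files between them, the commands can be generated in a loop. This sketch assumes the steps should be executed in the order listed, since each one appears to consume the output of the previous one; the helper `step_commands` is hypothetical and the execution line is commented out.

```python
import subprocess

# Step names as listed above, in pipeline order.
STEPS = ["count", "stats", "split", "star", "tagging", "table"]


def step_commands(base_args, steps=STEPS):
    """Return one argument list per step, each ending with `--step <name>`."""
    return [list(base_args) + ["--step", step] for step in steps]


base = ["baseqDrops", "run-pipe", "--config", "./config.ini",
        "-g", "hg38", "-p", "dropseq", "--minreads", "1000",
        "-n", "dropseq2", "-1", "dropseq_1.1.fq.gz",
        "-2", "dropeq.2.fq.gz", "-d", "./"]

for command in step_commands(base):
    print(" ".join(command))
    # Uncomment to execute; check=True stops the loop at the first failing step.
    # subprocess.run(command, check=True)
```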

## Contact

For any questions, please send an email to: friedpine@gmail.com


Release history

- 2.0 (this version): February 2, 2019
- 1.5: November 21, 2018

Download files

Source distribution

- baseqDrops-2.0.tar.gz (20.4 kB), uploaded February 2, 2019, source

Built distribution

- baseqDrops-2.0-py2.py3-none-any.whl (33.4 kB), uploaded February 2, 2019, py2 py3

Hashes for baseqDrops-2.0.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `775f40d1e4f394e3b48d44ed47cdbbff7aeed1117996f62900099f147cf6b82c` |
| MD5 | `777836e05a54391ac0ffb5bd1b5b167b` |
| BLAKE2-256 | `c4c25b1323bb5da55797053b7b533b5e864a38c8363d39df365718c8beddccf3` |

Hashes for baseqDrops-2.0-py2.py3-none-any.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `ccfbdb8f99f41fd898c09a40432c4605fc11f7fec95e32a16f5be1f11c7357ee` |
| MD5 | `b701d3e4f3e02574f679e307d3ee2411` |
| BLAKE2-256 | `4f58b668bad105d17ab900757568eeaaeb0f4a824a538cfd8a21994121a673c9` |
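
A downloaded file can be checked against the SHA256 digests listed here with Python's standard `hashlib`. This is a minimal sketch; the helper names `sha256_of_file` and `matches_digest` are illustrative, and the file path passed in is whatever location the archive was saved to.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Return True if the file's SHA256 digest equals the expected value."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_digest("baseqDrops-2.0.tar.gz", "<SHA256 value listed for that file>")` should return `True` for an intact download of the source distribution.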
