Functional enrichment analysis and more via the g:Profiler toolkit
Project description
gprofiler
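The official Python 3 interface to the g:Profiler toolkit for enrichment analysis of functional (GO and other) terms, conversion between identifier namespaces, and mapping of orthologous genes in related organisms.
It has an optional dependency on pandas.
Installing gprofiler
The recommended way to install gprofiler is with pip: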
pip install gprofiler-official
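Legacy version
The 0.3.x series of gprofiler-official is not compatible with the 1.0.x series. We changed the major version number to signal the breaking changes in the API. To install the previous version of gprofiler-official, use the command: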
pip install gprofiler-official==0.3.5
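Because the two series are incompatible, a script may want to check which one is installed before calling the API. The following is a minimal sketch using only the standard library; the helper names `is_legacy_series` and `installed_series` are hypothetical and not part of gprofiler.

```python
from importlib import metadata


def is_legacy_series(version: str) -> bool:
    """Return True for 0.x releases, i.e. the API before the 1.0 break."""
    major = int(version.split(".")[0])
    return major < 1


def installed_series(dist_name: str = "gprofiler-official"):
    """Return 'legacy', 'current', or None if the distribution is not installed."""
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
    return "legacy" if is_legacy_series(version) else "current"


print(installed_series())
```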
Tools:
To use any of the tools in the g:Profiler toolkit, first initialize the GProfiler object.
from gprofiler import GProfiler
gp = GProfiler(
    user_agent='ExampleTool',  # optional user agent
    return_dataframe=True,  # return a pandas DataFrame or plain Python structures
)
g:GOSt (profile)
from gprofiler import GProfiler
gp = GProfiler(return_dataframe=True)
gp.profile(organism='hsapiens',
           query=['NR1H4','TRIP12','UBC','FCRL3','PLXNA3','GDNF','VPS11'])
Output:
| source | native | name | p_value | significant | description | term_size | query_size | intersection_size | effective_domain_size | precision | recall | query | parents |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GO:BP | GO:0048585 | negative regulation of response to stimulus | 0.004229 | True | "Any process that stops, prevents, or reduces ... | 1610 | 7 | 6 | 17622 | 0.857143 | 0.003727 | query_1 | [GO:0048583, GO:0048519, GO:0050896] |
| GO:BP | GO:0002224 | toll-like receptor signaling pathway | 0.016351 | True | "Any series of molecular signals generated as ... | 133 | 7 | 3 | 17622 | 0.428571 | 0.022556 | query_1 | [GO:0002221] |
| GO:BP | GO:0048486 | parasympathetic nervous system development | 0.026199 | True | "The process whose specific outcome is the pro... | 19 | 7 | 2 | 17622 | 0.285714 | 0.105263 | query_1 | [GO:0048483, GO:0048731] |
| GO:BP | GO:0034162 | toll-like receptor 9 signaling pathway | 0.038733 | True | "Any series of molecular signals generated as ... | 23 | 7 | 2 | 17622 | 0.285714 | 0.086957 | query_1 | [GO:0002224] |
| GO:BP | GO:0002221 | pattern recognition receptor signaling pathway | 0.039782 | True | "Any series of molecular signals generated as ... | 179 | 7 | 3 | 17622 | 0.428571 | 0.016760 | query_1 | [GO:0002758] |
| CORUM | CORUM:5669 | PlexinA3-Nrp1 complex | 0.049767 | True | PlexinA3-Nrp1 complex | 2 | 2 | 1 | 3620 | 0.500000 | 0.500000 | query_1 | [CORUM:0000000] |
| CORUM | CORUM:5759 | PLXNA3-RANBPM complex | 0.049767 | True | PLXNA3-RANBPM complex | 2 | 2 | 1 | 3620 | 0.500000 | 0.500000 | query_1 | [CORUM:0000000] |
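source is the code of the data source, and native is the ID of the enriched term/functional category in its native namespace. name is the human-readable name of the enriched term, and description is a longer description, if available. p_value is the corrected p-value. term_size, query_size, intersection_size and effective_domain_size are the parameters of the hypergeometric test. query is the name of the query, which matters when several queries are made in one call (e.g. gp.profile(query={'query1':['NR1H4'], 'query2':['NR1H4','TRIP12']})).
Setting the parameter no_evidences=False adds the column intersections (the list of genes that are annotated to the term and present in the query) and the column evidences (a list of lists of GO evidence codes for the intersecting genes).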
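To make the size columns concrete, the sketch below recomputes precision and recall for the first row of the g:GOSt output (GO:0048585) and evaluates a plain one-sided hypergeometric tail from the same four parameters. The helper `hypergeom_tail` is hypothetical and only illustrates the test's parameters; its result is an uncorrected probability, so it is not expected to equal the corrected p_value column.

```python
from math import comb

# Values copied from the GO:0048585 row of the output above.
term_size = 1610               # genes annotated to the term
query_size = 7                 # genes in the query
intersection_size = 6          # query genes annotated to the term
effective_domain_size = 17622  # size of the background gene set

# These ratios reproduce the precision and recall columns of that row.
precision = intersection_size / query_size
recall = intersection_size / term_size


def hypergeom_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when drawing n genes from N, of which K are annotated."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / total


# Uncorrected tail probability (assumption: a one-sided over-representation test).
raw_p = hypergeom_tail(intersection_size, query_size, term_size, effective_domain_size)

print(round(precision, 6), round(recall, 6))  # 0.857143 0.003727
print(raw_p)
```

The uncorrected value is far smaller than the reported p_value of 0.004229, which is what one would expect after multiple-testing correction.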
NB! The parameter combined significantly changes the output structure by packing the results of the different queries together. For example:
gp.profile(query={'query1':['NR1H4'], 'query2':['NR1H4','TRIP12']}, combined=True)
Output (truncated):
| source | native | name | p_values | description | term_size | query_sizes | intersection_sizes | effective_domain_size | parents |
|---|---|---|---|---|---|---|---|---|---|
| GO:MF | GO:1902122 | chenodeoxycholic acid binding | [0.024822026073022193, 0.04964405214614093] | "Interacting selectively and non-covalently wi... | 1 | [1, 2] | [1, 1] | 17516 | [GO:0032052, GO:0005496] |
| GO:MF | GO:0035257 | nuclear hormone receptor binding | [1.0, 0.033391754400990514] | "Interacting selectively and non-covalently wi... | 154 | [1, 2] | [1, 2] | 17516 | [GO:0051427, GO:0061629] |
| GO:MF | GO:0051427 | hormone receptor binding | [1.0, 0.04929258983003374] | "Interacting selectively and non-covalently wi... | 187 | [1, 2] | [1, 2] | 17516 | [GO:0005102] |
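With combined=True each term carries one p-value per query, so downstream code usually has to unpack the lists again. The sketch below does this on rows hand-copied from the truncated output above. It assumes the list positions follow the order of the queries (query1, then query2), which is consistent with query_sizes being [1, 2]. The helper `unpack_combined` and the 0.05 threshold are illustrative choices, not part of the library.

```python
# Rows hand-copied from the combined output above (only the columns used here).
combined_rows = [
    {"native": "GO:1902122", "name": "chenodeoxycholic acid binding",
     "p_values": [0.024822026073022193, 0.04964405214614093]},
    {"native": "GO:0035257", "name": "nuclear hormone receptor binding",
     "p_values": [1.0, 0.033391754400990514]},
    {"native": "GO:0051427", "name": "hormone receptor binding",
     "p_values": [1.0, 0.04929258983003374]},
]
query_names = ["query1", "query2"]  # assumed to match the order inside p_values


def unpack_combined(rows, names, threshold=0.05):
    """Flatten per-term p-value lists into one record per (query, term) pair."""
    flat = []
    for row in rows:
        for query, p_value in zip(names, row["p_values"]):
            if p_value < threshold:
                flat.append({"query": query, "native": row["native"],
                             "name": row["name"], "p_value": p_value})
    return flat


significant = unpack_combined(combined_rows, query_names)
for record in significant:
    print(record["query"], record["native"], record["p_value"])
```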
g:Convert (convert)
from gprofiler import GProfiler
gp = GProfiler(return_dataframe=True)
gp.convert(organism='hsapiens',
           query=['NR1H4','TRIP12','UBC','FCRL3','PLXNA3','GDNF','VPS11'],
           target_namespace='ENTREZGENE_ACC')
Output:
| incoming | converted | n_incoming | n_converted | name | description | namespaces | query |
|---|---|---|---|---|---|---|---|
| NR1H4 | 9971 | 1 | 1 | NR1H4 | nuclear receptor subfamily 1 group H member 4 ... | ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE | query_1 |
| TRIP12 | 9320 | 2 | 1 | TRIP12 | thyroid hormone receptor interactor 12 [Source... | ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE | query_1 |
| UBC | 7316 | 3 | 1 | UBC | ubiquitin C [Source:HGNC Symbol;Acc:HGNC:12468] | ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE | query_1 |
| FCRL3 | 115352 | 4 | 1 | FCRL3 | Fc receptor like 3 [Source:HGNC Symbol;Acc:HGN... | ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE | query_1 |
| PLXNA3 | 55558 | 5 | 1 | PLXNA3 | plexin A3 [Source:HGNC Symbol;Acc:HGNC:9101] | ENTREZGENE,HGNC,WIKIGENE | query_1 |
| GDNF | 2668 | 6 | 1 | GDNF | glial cell derived neurotrophic factor [Source... | ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE | query_1 |
| VPS11 | 55823 | 7 | 1 | VPS11 | VPS11, CORVET/HOPS core subunit [Source:HGNC S... | ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE | query_1 |
The incoming column lists the input genes, and converted lists the corresponding genes in the target namespace (in this case, Entrez gene accession numbers).
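A common next step is to turn the incoming and converted columns into a lookup table. The sketch below works on records hand-copied from the output above, stored as plain dictionaries with string values. That record layout is an assumption for illustration, not a statement about what the library returns. Values are collected in lists so that an input mapping to several target IDs would not be lost.

```python
# (incoming, converted) pairs hand-copied from the g:Convert output above.
rows = [
    {"incoming": "NR1H4", "converted": "9971"},
    {"incoming": "TRIP12", "converted": "9320"},
    {"incoming": "UBC", "converted": "7316"},
    {"incoming": "FCRL3", "converted": "115352"},
    {"incoming": "PLXNA3", "converted": "55558"},
    {"incoming": "GDNF", "converted": "2668"},
    {"incoming": "VPS11", "converted": "55823"},
]

# Map each input symbol to the list of IDs it was converted to.
symbol_to_entrez = {}
for row in rows:
    symbol_to_entrez.setdefault(row["incoming"], []).append(row["converted"])

print(symbol_to_entrez["GDNF"])  # ['2668']
```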
g:Orth (orth)
from gprofiler import GProfiler
gp = GProfiler(return_dataframe=True)
gp.orth(organism='hsapiens',
        query=['NR1H4','TRIP12','UBC','FCRL3','PLXNA3','GDNF','VPS11'],
        target='mmusculus')
Output:
| incoming | converted | ortholog_ensg | n_incoming | n_converted | n_result | name | description | namespaces |
|---|---|---|---|---|---|---|---|---|
| NR1H4 | ENSG00000012504 | ENSMUSG00000047638 | 1 | 1 | 1 | Nr1h4 | nuclear receptor subfamily 1, group H, member ... | ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE |
| TRIP12 | ENSG00000153827 | ENSMUSG00000026219 | 2 | 1 | 1 | Trip12 | thyroid hormone receptor interactor 12 [Source... | ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE |
| UBC | ENSG00000150991 | ENSMUSG00000008348 | 3 | 1 | 1 | Ubc | ubiquitin C [Source:MGI Symbol;Acc:MGI:98889] | ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE |
| FCRL3 | ENSG00000160856 | N/A | 4 | 1 | 1 | N/A | N/A | ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE |
| PLXNA3 | ENSG00000130827 | ENSMUSG00000031398 | 5 | 1 | 1 | Plxna3 | plexin A3 [Source:MGI Symbol;Acc:MGI:107683] | ENTREZGENE,HGNC,WIKIGENE |
| GDNF | ENSG00000168621 | ENSMUSG00000022144 | 6 | 1 | 1 | Gdnf | glial cell line derived neurotrophic factor [S... | ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE |
| VPS11 | ENSG00000160695 | ENSMUSG00000032127 | 7 | 1 | 1 | Vps11 | VPS11, CORVET/HOPS core subunit [Source:MGI Sy... | ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE |
incoming is the input gene, converted is the canonical Ensembl ID of the input gene, and ortholog_ensg is the canonical Ensembl ID of the orthologous gene in the target organism.
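In the output above, FCRL3 has no mouse ortholog and shows N/A in ortholog_ensg. The sketch below separates such genes from the mapped ones. It uses records hand-copied from the table and assumes that a missing ortholog is represented by the literal string 'N/A', as it is printed there.

```python
# (incoming, ortholog_ensg) pairs hand-copied from the g:Orth output above.
rows = [
    {"incoming": "NR1H4", "ortholog_ensg": "ENSMUSG00000047638"},
    {"incoming": "TRIP12", "ortholog_ensg": "ENSMUSG00000026219"},
    {"incoming": "UBC", "ortholog_ensg": "ENSMUSG00000008348"},
    {"incoming": "FCRL3", "ortholog_ensg": "N/A"},
    {"incoming": "PLXNA3", "ortholog_ensg": "ENSMUSG00000031398"},
    {"incoming": "GDNF", "ortholog_ensg": "ENSMUSG00000022144"},
    {"incoming": "VPS11", "ortholog_ensg": "ENSMUSG00000032127"},
]

MISSING = "N/A"  # assumption: marker for "no ortholog found", as printed above

# Genes with a mouse ortholog, keyed by the human input symbol.
with_ortholog = {r["incoming"]: r["ortholog_ensg"]
                 for r in rows if r["ortholog_ensg"] != MISSING}
# Genes for which no ortholog was reported.
without_ortholog = [r["incoming"] for r in rows if r["ortholog_ensg"] == MISSING]

print(without_ortholog)  # ['FCRL3']
```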
g:SNPense (snpense)
from gprofiler import GProfiler
gp = GProfiler(return_dataframe=True)
gp.snpense(query=['rs11734132', 'rs7961894', 'rs4305276', 'rs17396340'])
Output:
| rs_id | chromosome | strand | start | end | ensgs | gene_names | variants |
|---|---|---|---|---|---|---|---|
| rs11734132 | | | -1 | -1 | [] | [] | {'intron_variant': 0, 'non_coding_transcript_v... |
| rs7961894 | 12 | + | 121927677 | 121927677 | [ENSG00000158023] | [WDR66] | {'intron_variant': 3, 'non_coding_transcript_v... |
| rs4305276 | 2 | + | 240555596 | 240555596 | [ENSG00000144504] | [ANKMY1] | {'intron_variant': 57, 'non_coding_transcript_... |
| rs17396340 | 1 | + | 10226118 | 10226118 | [ENSG00000054523] | [KIF1B] | {'intron_variant': 8, 'non_coding_transcript_v... |
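rs_id is the input rs-number. chromosome, strand, start and end encode the position of the variation. ensgs and gene_names are the lists of protein-coding genes associated with the rs-number. variants holds the predicted variant effects.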
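The ensgs and gene_names columns are lists, so grouping SNPs by gene takes one extra loop. The sketch below does this on records hand-copied from the output above, leaving out the truncated variants column. It assumes that an empty gene list, as for rs11734132, means no gene was associated with that rs-number.

```python
from collections import defaultdict

# Records hand-copied from the g:SNPense output above (variants column omitted).
rows = [
    {"rs_id": "rs11734132", "start": -1, "end": -1,
     "ensgs": [], "gene_names": []},
    {"rs_id": "rs7961894", "start": 121927677, "end": 121927677,
     "ensgs": ["ENSG00000158023"], "gene_names": ["WDR66"]},
    {"rs_id": "rs4305276", "start": 240555596, "end": 240555596,
     "ensgs": ["ENSG00000144504"], "gene_names": ["ANKMY1"]},
    {"rs_id": "rs17396340", "start": 10226118, "end": 10226118,
     "ensgs": ["ENSG00000054523"], "gene_names": ["KIF1B"]},
]

gene_to_snps = defaultdict(list)  # gene name -> rs-numbers associated with it
unmapped = []                     # rs-numbers with no associated gene

for row in rows:
    if not row["gene_names"]:
        unmapped.append(row["rs_id"])
        continue
    for gene in row["gene_names"]:
        gene_to_snps[gene].append(row["rs_id"])

print(dict(gene_to_snps))
print(unmapped)  # ['rs11734132']
```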
Project details
Hashes for gprofiler_official-1.0.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | c582baf728e5a6cddac964e4085ca385e082c4ef0279e3af1a16a9af07ab5395 |
| MD5 | a31adb48d09059958b1f48cf0d356879 |
| BLAKE2-256 | df1b5a87c1a1da8f601c00a0ce4dedb5aab8a5cad6a0f4a5062c4da22a045072 |