Functional enrichment analysis and more via the g:Profiler toolkit

Project description

gprofiler

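The official Python 3 interface to the g:Profiler toolkit for enrichment analysis of functional (GO and other) terms, conversion between identifier namespaces, and mapping orthologous genes in related organisms.

It has an optional dependency on pandas.

Installing gprofiler

The recommended way to install gprofiler is with pip: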

pip install gprofiler-official

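Legacy versions

The 0.3.x series of gprofiler-official is not compatible with the 1.0.x series. We changed the major version number to signal a breaking change in the API. To install the previous version of gprofiler-official, use the command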

pip install gprofiler-official==0.3.5

Tools:

To use any of the tools in the g:Profiler toolkit, first initialize the GProfiler object.

from gprofiler import GProfiler
gp = GProfiler(
    user_agent='ExampleTool', #optional user agent
    return_dataframe=True, #return pandas dataframe or plain python structures    
)
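
Because pandas is an optional dependency, one way to choose the value of return_dataframe is to check whether pandas can be imported. The following is a minimal sketch under that assumption; the helper names pandas_available and make_client are hypothetical and not part of gprofiler.

```python
import importlib.util


def pandas_available():
    """Return True if the optional pandas dependency can be imported."""
    return importlib.util.find_spec("pandas") is not None


def make_client(user_agent="ExampleTool"):
    """Hypothetical helper: build a GProfiler object that returns
    DataFrames only when pandas is installed."""
    # Imported lazily so this sketch loads even where
    # gprofiler-official is not installed.
    from gprofiler import GProfiler

    return GProfiler(
        user_agent=user_agent,
        return_dataframe=pandas_available(),
    )
```

With this helper, gp = make_client() would fall back to plain Python structures when pandas is missing.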

g:GOSt (profile)

from gprofiler import GProfiler

gp = GProfiler(return_dataframe=True)
gp.profile(organism='hsapiens',
            query=['NR1H4','TRIP12','UBC','FCRL3','PLXNA3','GDNF','VPS11'])

Output:

source      native                                            name   p_value  significant                                        description  term_size  query_size  intersection_size  effective_domain_size  precision    recall    query                               parents
GO:BP  GO:0048585     negative regulation of response to stimulus  0.004229         True  "Any process that stops, prevents, or reduces ...       1610           7                  6                  17622   0.857143  0.003727  query_1  [GO:0048583, GO:0048519, GO:0050896]
GO:BP  GO:0002224            toll-like receptor signaling pathway  0.016351         True  "Any series of molecular signals generated as ...        133           7                  3                  17622   0.428571  0.022556  query_1                          [GO:0002221]
GO:BP  GO:0048486      parasympathetic nervous system development  0.026199         True  "The process whose specific outcome is the pro...         19           7                  2                  17622   0.285714  0.105263  query_1              [GO:0048483, GO:0048731]
GO:BP  GO:0034162          toll-like receptor 9 signaling pathway  0.038733         True  "Any series of molecular signals generated as ...         23           7                  2                  17622   0.285714  0.086957  query_1                          [GO:0002224]
GO:BP  GO:0002221  pattern recognition receptor signaling pathway  0.039782         True  "Any series of molecular signals generated as ...        179           7                  3                  17622   0.428571  0.016760  query_1                          [GO:0002758]
CORUM  CORUM:5669                           PlexinA3-Nrp1 complex  0.049767         True                              PlexinA3-Nrp1 complex          2           2                  1                   3620   0.500000  0.500000  query_1                       [CORUM:0000000]
CORUM  CORUM:5759                           PLXNA3-RANBPM complex  0.049767         True                              PLXNA3-RANBPM complex          2           2                  1                   3620   0.500000  0.500000  query_1                       [CORUM:0000000]
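  • source is the code of the data source.
  • native is the ID of the enriched term/functional category in its native namespace.
  • name is the human-readable name of the enriched term; description is a longer description, if available.
  • p_value is the corrected p-value.
  • term_size, query_size, intersection_size and effective_domain_size are the parameters of the hypergeometric test.
  • query is the name of the query. It matters when several queries are made in a single call (e.g. gp.profile(query={'query1':['NR1H4'], 'query2':['NR1H4','TRIP12']})).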
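
To make the hypergeometric parameters above concrete, here is a minimal sketch that recomputes precision and recall from the first row of the table and evaluates an uncorrected hypergeometric tail probability. The ratios used for precision and recall are an assumption, though they agree with the values shown in the table. The tail probability is the raw test statistic, not the corrected p_value that g:Profiler reports, so the two are not expected to match. The function names are hypothetical.

```python
from math import comb


def precision_recall(term_size, query_size, intersection_size):
    """Assumed definitions: share of the query annotated to the term,
    and share of the term covered by the query."""
    precision = intersection_size / query_size
    recall = intersection_size / term_size
    return precision, recall


def hypergeom_tail(term_size, query_size, intersection_size,
                   effective_domain_size):
    """Uncorrected P(X >= intersection_size) when drawing query_size genes
    from a domain containing term_size annotated genes."""
    N, K, n, k = (effective_domain_size, term_size,
                  query_size, intersection_size)
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i)
                     for i in range(k, upper + 1))
    return favourable / comb(N, n)


# First row of the table above (GO:0048585).
precision, recall = precision_recall(1610, 7, 6)
raw_p = hypergeom_tail(1610, 7, 6, 17622)
print(round(precision, 6), round(recall, 6))
```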

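Setting the parameter no_evidences=False adds the column intersections (the list of genes that are annotated to the term and present in the query) and the column evidences (a list of lists of GO evidence codes for the intersecting genes).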

NB! The parameter combined changes the output structure significantly by packing the results of the different queries together. For example:

gp.profile(query={'query1':['NR1H4'], 'query2':['NR1H4','TRIP12']}, combined=True)

Output (truncated):

source      native                                               name                                     p_values                                        description  term_size query_sizes intersection_sizes  effective_domain_size                                           parents
GO:MF  GO:1902122                      chenodeoxycholic acid binding  [0.024822026073022193, 0.04964405214614093]  "Interacting selectively and non-covalently wi...          1      [1, 2]             [1, 1]                  17516                          [GO:0032052, GO:0005496]
GO:MF  GO:0035257                   nuclear hormone receptor binding                  [1.0, 0.033391754400990514]  "Interacting selectively and non-covalently wi...        154      [1, 2]             [1, 2]                  17516                          [GO:0051427, GO:0061629]
GO:MF  GO:0051427                           hormone receptor binding                   [1.0, 0.04929258983003374]  "Interacting selectively and non-covalently wi...        187      [1, 2]             [1, 2]                  17516                                      [GO:0005102]
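
With combined=True each term carries one p-value per query, so downstream code has to unpack the lists. The sketch below does this for the three rows shown above. It assumes that the entries of p_values follow the order in which the queries were given, and it models the rows as plain dictionaries; both are assumptions for illustration, and significant_per_query is a hypothetical helper.

```python
# Rows copied from the truncated output above, reduced to two columns.
rows = [
    {"native": "GO:1902122",
     "p_values": [0.024822026073022193, 0.04964405214614093]},
    {"native": "GO:0035257",
     "p_values": [1.0, 0.033391754400990514]},
    {"native": "GO:0051427",
     "p_values": [1.0, 0.04929258983003374]},
]
query_names = ["query1", "query2"]  # assumed to match the p_values order


def significant_per_query(rows, query_names, alpha=0.05):
    """Group term IDs by the queries in which they fall below alpha."""
    hits = {name: [] for name in query_names}
    for row in rows:
        for name, p_value in zip(query_names, row["p_values"]):
            if p_value < alpha:
                hits[name].append(row["native"])
    return hits


hits = significant_per_query(rows, query_names)
for name, terms in hits.items():
    print(name, terms)
```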

g:Convert (convert)

from gprofiler import GProfiler

gp = GProfiler(return_dataframe=True)
gp.convert(organism='hsapiens',
            query=['NR1H4','TRIP12','UBC','FCRL3','PLXNA3','GDNF','VPS11'],
            target_namespace='ENTREZGENE_ACC')

Output:

incoming converted  n_incoming  n_converted    name                                        description                           namespaces    query
  NR1H4      9971           1            1   NR1H4  nuclear receptor subfamily 1 group H member 4 ...  ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE  query_1
 TRIP12      9320           2            1  TRIP12  thyroid hormone receptor interactor 12 [Source...  ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE  query_1
    UBC      7316           3            1     UBC    ubiquitin C [Source:HGNC Symbol;Acc:HGNC:12468]  ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE  query_1
  FCRL3    115352           4            1   FCRL3  Fc receptor like 3 [Source:HGNC Symbol;Acc:HGN...  ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE  query_1
 PLXNA3     55558           5            1  PLXNA3       plexin A3 [Source:HGNC Symbol;Acc:HGNC:9101]             ENTREZGENE,HGNC,WIKIGENE  query_1
   GDNF      2668           6            1    GDNF  glial cell derived neurotrophic factor [Source...  ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE  query_1
  VPS11     55823           7            1   VPS11  VPS11, CORVET/HOPS core subunit [Source:HGNC S...  ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE  query_1

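The incoming column lists the input genes, and the converted column lists the corresponding genes in the target namespace (in this case Entrez Gene accession numbers).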
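
A common next step is to turn the incoming and converted columns into a lookup table. The following is a minimal sketch using the values from the table above, modelled as plain dictionaries with string IDs; the actual column types may differ, and build_id_map is a hypothetical helper. Lists are used as values because one input may map to several target IDs.

```python
from collections import defaultdict

# (incoming, converted) pairs copied from the output above.
rows = [
    {"incoming": "NR1H4", "converted": "9971"},
    {"incoming": "TRIP12", "converted": "9320"},
    {"incoming": "UBC", "converted": "7316"},
    {"incoming": "FCRL3", "converted": "115352"},
    {"incoming": "PLXNA3", "converted": "55558"},
    {"incoming": "GDNF", "converted": "2668"},
    {"incoming": "VPS11", "converted": "55823"},
]


def build_id_map(rows):
    """Map each input identifier to its converted IDs, skipping repeats."""
    id_map = defaultdict(list)
    for row in rows:
        targets = id_map[row["incoming"]]
        if row["converted"] not in targets:
            targets.append(row["converted"])
    return dict(id_map)


id_map = build_id_map(rows)
print(id_map["NR1H4"])
```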

g:Orth (orth)

from gprofiler import GProfiler

gp = GProfiler(return_dataframe=True)
gp.orth(organism='hsapiens',
            query=['NR1H4','TRIP12','UBC','FCRL3','PLXNA3','GDNF','VPS11'],
            target='mmusculus')

Output:

incoming        converted       ortholog_ensg  n_incoming  n_converted  n_result    name                                        description                           namespaces
  NR1H4  ENSG00000012504  ENSMUSG00000047638           1            1         1   Nr1h4  nuclear receptor subfamily 1, group H, member ...  ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE
 TRIP12  ENSG00000153827  ENSMUSG00000026219           2            1         1  Trip12  thyroid hormone receptor interactor 12 [Source...  ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE
    UBC  ENSG00000150991  ENSMUSG00000008348           3            1         1     Ubc      ubiquitin C [Source:MGI Symbol;Acc:MGI:98889]  ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE
  FCRL3  ENSG00000160856                 N/A           4            1         1     N/A                                                N/A  ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE
 PLXNA3  ENSG00000130827  ENSMUSG00000031398           5            1         1  Plxna3       plexin A3 [Source:MGI Symbol;Acc:MGI:107683]             ENTREZGENE,HGNC,WIKIGENE
   GDNF  ENSG00000168621  ENSMUSG00000022144           6            1         1    Gdnf  glial cell line derived neurotrophic factor [S...  ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE
  VPS11  ENSG00000160695  ENSMUSG00000032127           7            1         1   Vps11  VPS11, CORVET/HOPS core subunit [Source:MGI Sy...  ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE

incoming is the input gene, converted is the canonical Ensembl ID of the input gene, and ortholog_ensg is the canonical Ensembl ID of the orthologous gene in the target organism.
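
Not every gene has an ortholog in the target organism (FCRL3 in the output above), so it can help to separate mapped from unmapped inputs. This is a minimal sketch over the values shown in the table. It assumes a missing ortholog is marked with the string "N/A" as displayed; the real return value may use a different marker. split_by_ortholog is a hypothetical helper.

```python
# (incoming, ortholog_ensg) pairs copied from the output above.
rows = [
    {"incoming": "NR1H4", "ortholog_ensg": "ENSMUSG00000047638"},
    {"incoming": "TRIP12", "ortholog_ensg": "ENSMUSG00000026219"},
    {"incoming": "UBC", "ortholog_ensg": "ENSMUSG00000008348"},
    {"incoming": "FCRL3", "ortholog_ensg": "N/A"},
    {"incoming": "PLXNA3", "ortholog_ensg": "ENSMUSG00000031398"},
    {"incoming": "GDNF", "ortholog_ensg": "ENSMUSG00000022144"},
    {"incoming": "VPS11", "ortholog_ensg": "ENSMUSG00000032127"},
]


def split_by_ortholog(rows, missing="N/A"):
    """Return (mapped, unmapped): a dict of input gene to ortholog ID,
    and a list of input genes with no ortholog."""
    mapped = {}
    unmapped = []
    for row in rows:
        if row["ortholog_ensg"] == missing:
            unmapped.append(row["incoming"])
        else:
            mapped[row["incoming"]] = row["ortholog_ensg"]
    return mapped, unmapped


mapped, unmapped = split_by_ortholog(rows)
print(unmapped)
```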

g:SNPense (snpense)

from gprofiler import GProfiler

gp = GProfiler(return_dataframe=True)
gp.snpense(query=['rs11734132', 'rs7961894', 'rs4305276', 'rs17396340'])

Output:

rs_id chromosome strand      start        end              ensgs gene_names                                           variants
rs11734132                           -1         -1                 []         []  {'intron_variant': 0, 'non_coding_transcript_v...
 rs7961894         12      +  121927677  121927677  [ENSG00000158023]    [WDR66]  {'intron_variant': 3, 'non_coding_transcript_v...
 rs4305276          2      +  240555596  240555596  [ENSG00000144504]   [ANKMY1]  {'intron_variant': 57, 'non_coding_transcript_...
rs17396340          1      +   10226118   10226118  [ENSG00000054523]    [KIF1B]  {'intron_variant': 8, 'non_coding_transcript_v...

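  • rs_id is the input rs-number.
  • chromosome, strand, start and end encode the position of the variation.
  • ensgs and gene_names are the lists of protein-coding genes associated with the rs-number.
  • variants are the predicted variant effects.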
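
Some rs-numbers come back without any associated gene (rs11734132 in the output above). The sketch below pairs each SNP with its genes and collects the ones that have none. It assumes ensgs and gene_names are parallel lists, which matches the rows shown but is not confirmed here; genes_per_snp is a hypothetical helper.

```python
# rs_id, ensgs and gene_names copied from the output above.
rows = [
    {"rs_id": "rs11734132", "ensgs": [], "gene_names": []},
    {"rs_id": "rs7961894", "ensgs": ["ENSG00000158023"],
     "gene_names": ["WDR66"]},
    {"rs_id": "rs4305276", "ensgs": ["ENSG00000144504"],
     "gene_names": ["ANKMY1"]},
    {"rs_id": "rs17396340", "ensgs": ["ENSG00000054523"],
     "gene_names": ["KIF1B"]},
]


def genes_per_snp(rows):
    """Return (mapped, unmapped): rs_id to (Ensembl ID, gene name) pairs,
    and the rs_ids that have no associated gene."""
    mapped = {}
    unmapped = []
    for row in rows:
        if row["gene_names"]:
            mapped[row["rs_id"]] = list(zip(row["ensgs"], row["gene_names"]))
        else:
            unmapped.append(row["rs_id"])
    return mapped, unmapped


snp_genes, snps_without_gene = genes_per_snp(rows)
print(snps_without_gene)
```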
