Quickly compose single-machine analysis pipelines.

Project description
CoRelAy – Composing Relevance Analyses

CoRelAy is a tool to compose small-scale (single-machine) analysis pipelines. A pipeline is designed with a number of steps (tasks) that have default operations (processors). Any step of the pipeline can then be changed individually by assigning a new operation (processor). Processors have parameters that define their operation.

CoRelAy was created to quickly implement pipelines that produce analysis data, which can then be visualized using ViRelAy.
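The task/processor pattern described above can be sketched in plain Python. The following is a conceptual stand-in only, not the actual CoRelAy API; all names (`MiniPipeline`, `tasks`) are hypothetical:

```python
from typing import Callable, Dict, List

# Conceptual sketch: a pipeline is an ordered set of named tasks,
# each backed by a default operation that can be swapped out individually.
# (Hypothetical stand-in; the real CoRelAy Pipeline/Processor API differs.)
class MiniPipeline:
    def __init__(self) -> None:
        # default operations for each task, applied in insertion order
        self.tasks: Dict[str, Callable[[List[float]], List[float]]] = {
            'preprocess': lambda x: [v / max(x) for v in x],  # normalize to [0, 1]
            'transform': lambda x: [v * v for v in x],        # square each value
        }

    def __call__(self, data: List[float]) -> List[float]:
        for operation in self.tasks.values():
            data = operation(data)
        return data

pipeline = MiniPipeline()
print(pipeline([1, 2, 4]))  # default behaviour: [0.0625, 0.25, 1.0]

# any single step can be replaced without touching the others
pipeline.tasks['transform'] = lambda x: [v + 1 for v in x]
print(pipeline([1, 2, 4]))  # [1.25, 1.5, 2.0]
```

Replacing one entry in `tasks` changes a single step while the rest of the pipeline stays intact, which mirrors how CoRelAy lets a processor be reassigned per task.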
If you find CoRelAy useful for your research, why not cite our related paper:
@article{anders2021software,
    author  = {Anders, Christopher J. and
               Neumann, David and
               Samek, Wojciech and
               Müller, Klaus-Robert and
               Lapuschkin, Sebastian},
    title   = {Software for Dataset-wide XAI: From Local Explanations to Global Insights with {Zennit}, {CoRelAy}, and {ViRelAy}},
    journal = {CoRR},
    volume  = {abs/2106.13200},
    year    = {2021},
}
Documentation

The latest documentation is hosted at corelay.readthedocs.io.
Installation

CoRelAy may be installed using pip with
$ pip install corelay
To install with optional HDBSCAN and UMAP support, use
$ pip install corelay[umap,hdbscan]
Usage
Examples highlighting some features of CoRelAy can be found in example/.

We mainly use HDF5 files to store our results. The structure expected by ViRelAy is documented in the ViRelAy repository at docs/database_specification.md. An example which creates HDF5 files that can be used with ViRelAy is shown in example/hdf5_structure.py.

To perform a full SpRAy analysis whose results can be visualized with ViRelAy, an advanced script can be found in example/virelay_analysis.py.

The following shows the contents of example/memoize_spectral_pipeline.py:
'''Example using memoization to store (intermediate) results.'''
import time

import h5py
import numpy as np

from corelay.base import Param
from corelay.processor.base import Processor
from corelay.processor.flow import Sequential, Parallel
from corelay.pipeline.spectral import SpectralClustering
from corelay.processor.clustering import KMeans
from corelay.processor.embedding import TSNEEmbedding, EigenDecomposition
from corelay.io.storage import HashedHDF5


# custom processors can be implemented by defining a function attribute
class Flatten(Processor):
    def function(self, data):
        return data.reshape(data.shape[0], np.prod(data.shape[1:]))


class SumChannel(Processor):
    # parameters can be assigned by defining a class-owned Param instance
    axis = Param(int, 1)

    def function(self, data):
        return data.sum(self.axis)


class Normalize(Processor):
    def function(self, data):
        data = data / data.sum((1, 2), keepdims=True)
        return data


def main():
    np.random.seed(0xDEADBEEF)
    fpath = 'test.analysis.h5'
    with h5py.File(fpath, 'a') as fd:
        # HashedHDF5 is an io-object that stores outputs of Processors based on hashes in hdf5
        iobj = HashedHDF5(fd.require_group('proc_data'))

        # generate some exemplary data
        data = np.random.normal(size=(64, 3, 32, 32))
        n_clusters = range(2, 20)

        # SpectralClustering is an example for a pre-defined Pipeline
        pipeline = SpectralClustering(
            # processors, such as EigenDecomposition, can be assigned to pre-defined tasks
            embedding=EigenDecomposition(n_eigval=8, io=iobj),
            # flow-based Processors, such as Parallel, can combine multiple Processors
            # broadcast=True copies the input as many times as there are Processors
            # broadcast=False instead attempts to match each input to a Processor
            clustering=Parallel([
                Parallel([
                    KMeans(n_clusters=k, io=iobj) for k in n_clusters
                ], broadcast=True),
                # io-objects will be used during computation when supplied to Processors
                # if a corresponding output value (here identified by hashes) already exists,
                # the value is not computed again but instead loaded from the io-object
                TSNEEmbedding(io=iobj)
            ], broadcast=True, is_output=True)
        )

        # Processors (and Params) can be updated by simply assigning corresponding attributes
        pipeline.preprocessing = Sequential([
            SumChannel(),
            Normalize(),
            Flatten()
        ])

        start_time = time.perf_counter()

        # Processors flagged with "is_output=True" will be accumulated in the output
        # the output will be a tree of tuples, with the same hierarchy as the pipeline
        # (i.e. clusterings here contains a tuple of the k-means outputs)
        clusterings, tsne = pipeline(data)

        # since we memoize our results in an hdf5 file, subsequent calls will not compute
        # the values (for the same inputs), but rather load them from the hdf5 file
        # try running the script multiple times
        duration = time.perf_counter() - start_time
        print(f'Pipeline execution time: {duration:.4f} seconds')


if __name__ == '__main__':
    main()
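The hash-based memoization that makes repeated runs of this script fast can be illustrated with a minimal stand-in. This sketch uses hypothetical names (`HashedStore`, `memoize`) and an in-memory dict instead of an HDF5 group; the real HashedHDF5 implementation differs:

```python
import hashlib
import pickle

# Conceptual sketch of hash-keyed memoization: results are stored under a key
# derived from the operation's name and a hash of its input, so a second call
# with the same input loads the stored value instead of recomputing it.
# (Hypothetical stand-in; HashedHDF5 persists to an HDF5 group instead.)
class HashedStore:
    def __init__(self):
        self.store = {}
        self.computations = 0  # counts actual (non-cached) computations

    def memoize(self, func):
        def wrapper(data):
            # key on the function name and a hash of the pickled input
            digest = hashlib.sha256(pickle.dumps(data)).hexdigest()
            key = (func.__name__, digest)
            if key not in self.store:
                self.computations += 1
                self.store[key] = func(data)
            return self.store[key]
        return wrapper

store = HashedStore()

@store.memoize
def expensive_sum(values):
    return sum(v * v for v in values)

print(expensive_sum((1, 2, 3)))  # computed: 14
print(expensive_sum((1, 2, 3)))  # same input: loaded from the store, not recomputed
print(store.computations)        # 1
```

Because the key depends only on the input, re-running the same pipeline on the same data hits the store for every memoized step, which is why a second run of memoize_spectral_pipeline.py is much faster than the first.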
Project details

Download files

Download the file for your platform. If you're unsure which to choose, learn more about installing packages.

Source distribution

corelay-0.2.1.tar.gz (154.1 kB)

Built distribution

corelay-0.2.1-py3-none-any.whl (45.0 kB)