
Metastore

Metastore Python SDK. Feature store and data catalog for machine learning.



Prerequisites

Installation

Production

Install the package:

pip install metastore

Development

Install the package:

pip install -e .[development]

Note: The -e (--editable) flag installs the package in development mode.

Note: Set up a virtual environment for development.

Format the source code:

autopep8 --recursive --in-place setup.py metastore/ tests/

Lint the source code:

pylint setup.py metastore/ tests/

Test the package:

pytest

Report test coverage:

pytest --cov --cov-fail-under 80

Note: The --cov-fail-under flag is set to 80, so the run fails if total test coverage is below 80%.

Build the documentation:

cd docs/
sphinx-build -b html metastore/ build/

Note: This step generates the API reference before building.

Usage

Create a project definition

# metastore.yaml

project:
    name: 'customer_transactions'
    display_name: 'Customer transactions'
    description: 'Customer transactions feature store.'
    author: 'Metastore Developers'
    tags:
      - 'customer'
      - 'transaction'
    version: '1.0.0'
credential_store:
    type: 'local'
    path: '/path/to/.env'
metadata_store:
    type: 'file'
    path: 's3://path/to/metadata.db'
    s3_endpoint:
        type: 'secret'
        name: 'S3_ENDPOINT'
    s3_access_key:
        type: 'secret'
        name: 'S3_ACCESS_KEY'
    s3_secret_key:
        type: 'secret'
        name: 'S3_SECRET_KEY'
feature_store:
    offline_store:
        type: 'file'
        path: 's3://path/to/features/'
        s3_endpoint:
            type: 'secret'
            name: 'S3_ENDPOINT'
        s3_access_key:
            type: 'secret'
            name: 'S3_ACCESS_KEY'
        s3_secret_key:
            type: 'secret'
            name: 'S3_SECRET_KEY'
    online_store:
        type: 'redis'
        hostname:
            type: 'secret'
            name: 'REDIS_HOSTNAME'
        port:
            type: 'secret'
            name: 'REDIS_PORT'
        database:
            type: 'secret'
            name: 'REDIS_DATABASE'
        password:
            type: 'secret'
            name: 'REDIS_PASSWORD'
data_sources:
  - name: 'postgresql_data_source'
    type: 'postgresql'
    hostname:
        type: 'secret'
        name: 'POSTGRESQL_HOSTNAME'
    port:
        type: 'secret'
        name: 'POSTGRESQL_PORT'
    database:
        type: 'secret'
        name: 'POSTGRESQL_DATABASE'
    username:
        type: 'secret'
        name: 'POSTGRESQL_USERNAME'
    password:
        type: 'secret'
        name: 'POSTGRESQL_PASSWORD'
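Every value marked type: 'secret' above appears to be resolved by its name from the credential store (here the local .env file). One way to check that the .env file defines everything the project needs is to walk the parsed configuration and collect the secret names. The sketch below assumes the YAML has already been parsed into plain dicts and lists (for example with PyYAML's yaml.safe_load); collect_secret_names and missing_secrets are hypothetical helpers, not part of the Metastore API.

```python
from __future__ import annotations

from typing import Any


def collect_secret_names(node: Any) -> set[str]:
    """Recursively collect the `name` of every {'type': 'secret'} reference.

    Hypothetical helper: it only inspects the parsed configuration and does
    not talk to Metastore or to the credential store.
    """
    names: set[str] = set()
    if isinstance(node, dict):
        if node.get('type') == 'secret' and 'name' in node:
            names.add(node['name'])
        for value in node.values():
            names |= collect_secret_names(value)
    elif isinstance(node, list):
        for item in node:
            names |= collect_secret_names(item)
    return names


def missing_secrets(config: dict, env: dict[str, str]) -> set[str]:
    """Return secret names referenced in `config` but absent from `env`."""
    return collect_secret_names(config) - set(env)


# Trimmed-down excerpt of the project definition, already parsed into dicts.
config = {
    'metadata_store': {
        'type': 'file',
        'path': 's3://path/to/metadata.db',
        's3_endpoint': {'type': 'secret', 'name': 'S3_ENDPOINT'},
    },
    'data_sources': [
        {
            'name': 'postgresql_data_source',
            'type': 'postgresql',
            'password': {'type': 'secret', 'name': 'POSTGRESQL_PASSWORD'},
        },
    ],
}
print(sorted(missing_secrets(config, {'S3_ENDPOINT': 'http://localhost:9000'})))
# → ['POSTGRESQL_PASSWORD']
```

In practice the env mapping would hold the key/value pairs read from the .env file named by credential_store.path.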

Create feature definitions

# feature_definitions.py

from metastore import (
    FeatureStore,
    FeatureGroup,
    Feature,
    ValueType
)


feature_store = FeatureStore(repository='/path/to/repository/')

feature_group = FeatureGroup(
    name='customer_transactions',
    record_identifiers=['customer_id'],
    event_time_feature='timestamp',
    features=[
        Feature(name='customer_id', value_type=ValueType.INTEGER),
        Feature(name='timestamp', value_type=ValueType.STRING),
        Feature(name='daily_transactions', value_type=ValueType.FLOAT),
        Feature(name='total_transactions', value_type=ValueType.FLOAT)
    ]
)

feature_store.apply(feature_group)
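In the definition above, the record identifier (customer_id) and the event-time feature (timestamp) are themselves declared in features. A consistency check of that kind before calling apply might look like the following sketch. It uses hypothetical stand-in dataclasses (FeatureSpec, FeatureGroupSpec) instead of Metastore's own classes, represents value types as plain strings, and the rules it enforces are assumptions inferred from the example rather than documented Metastore validation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FeatureSpec:
    # Hypothetical stand-in for metastore.Feature
    name: str
    value_type: str


@dataclass
class FeatureGroupSpec:
    # Hypothetical stand-in for metastore.FeatureGroup
    name: str
    record_identifiers: list[str]
    event_time_feature: str
    features: list[FeatureSpec] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Return human-readable consistency problems (empty if none)."""
        declared = [feature.name for feature in self.features]
        issues = []
        duplicates = sorted({n for n in declared if declared.count(n) > 1})
        if duplicates:
            issues.append(f'duplicate features: {duplicates}')
        for identifier in self.record_identifiers:
            if identifier not in declared:
                issues.append(f'record identifier not declared: {identifier}')
        if self.event_time_feature not in declared:
            issues.append(f'event time feature not declared: {self.event_time_feature}')
        return issues


group = FeatureGroupSpec(
    name='customer_transactions',
    record_identifiers=['customer_id'],
    event_time_feature='timestamp',
    features=[
        FeatureSpec('customer_id', 'INTEGER'),
        FeatureSpec('timestamp', 'STRING'),
        FeatureSpec('daily_transactions', 'FLOAT'),
        FeatureSpec('total_transactions', 'FLOAT'),
    ],
)
print(group.problems())  # → []
```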

Ingest features

# ingest_features.py

from metastore import FeatureStore


feature_store = FeatureStore(repository='/path/to/repository/')

dataframe = feature_store.read_from_source(
    'postgresql_data_source',
    table='customer_transaction',
    index_column='customer_id',
    partitions=10
)

feature_store.ingest('customer_transactions', dataframe)
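read_from_source is called with index_column='customer_id' and partitions=10. Other SQL readers with similar options (Dask's read_sql_table with index_col and npartitions, Spark's JDBC source with partitionColumn and numPartitions) split the range of the index column into non-overlapping intervals and read each interval with a separate query. Whether Metastore partitions exactly this way is an assumption; the sketch below only illustrates the idea, and partition_bounds and partition_query are hypothetical helpers.

```python
from __future__ import annotations


def partition_bounds(lower: int, upper: int, partitions: int) -> list[tuple[int, int]]:
    """Split the inclusive range [lower, upper] into contiguous,
    non-overlapping inclusive (start, end) pairs.

    Hypothetical helper: the first `remainder` partitions get one extra
    value, so partition sizes differ by at most one.
    """
    if partitions < 1:
        raise ValueError('partitions must be >= 1')
    total = upper - lower + 1
    if total < 1:
        return []
    partitions = min(partitions, total)  # never emit empty partitions
    size, remainder = divmod(total, partitions)
    bounds = []
    start = lower
    for index in range(partitions):
        end = start + size - 1 + (1 if index < remainder else 0)
        bounds.append((start, end))
        start = end + 1
    return bounds


def partition_query(table: str, column: str, start: int, end: int) -> str:
    """Build per-partition query text (illustration only; real code should
    use the database driver's parameter binding, not string formatting)."""
    return f'SELECT * FROM {table} WHERE {column} BETWEEN {start} AND {end}'


# Assume customer_id values span 1..25 in the source table.
for start, end in partition_bounds(1, 25, 10):
    print(partition_query('customer_transaction', 'customer_id', start, end))
```

Splitting on an integer key keeps each query independent, so partitions can be read in parallel; the trade-off is that skewed key distributions produce unevenly sized partitions.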

Materialize features

# materialize_features.py

from datetime import datetime, timedelta

from metastore import FeatureStore


feature_store = FeatureStore(repository='/path/to/repository/')

feature_store.materialize(
    'customer_transactions',
    end_date=datetime.utcnow(),
    expires_in=timedelta(days=1)
)
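Materialization copies feature values from the offline store into the online store so they can be served with low latency. A common approach, sketched below under that assumption, keeps only the most recent row per record identifier whose event time is at or before end_date and stamps each entry with an expiry time of end_date + expires_in. The row layout (a list of dicts with datetime event times) and the materialize_latest function are hypothetical; they are not Metastore's internals.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def materialize_latest(
    rows: list[dict],
    key: str,
    event_time: str,
    end_date: datetime,
    expires_in: timedelta,
) -> dict:
    """Return {key value: (latest row, expires_at)} for rows up to end_date.

    Hypothetical sketch: rows later than end_date are ignored, and ties on
    event time keep the row that appears last in `rows`.
    """
    latest: dict = {}
    for row in rows:
        if row[event_time] > end_date:
            continue
        current = latest.get(row[key])
        if current is None or row[event_time] >= current[event_time]:
            latest[row[key]] = row
    expires_at = end_date + expires_in
    return {k: (row, expires_at) for k, row in latest.items()}


rows = [
    {'customer_id': 1, 'timestamp': datetime(2022, 1, 1), 'total_transactions': 10.0},
    {'customer_id': 1, 'timestamp': datetime(2022, 1, 2), 'total_transactions': 12.0},
    {'customer_id': 2, 'timestamp': datetime(2022, 1, 5), 'total_transactions': 3.0},
]
online = materialize_latest(rows, 'customer_id', 'timestamp',
                            end_date=datetime(2022, 1, 3),
                            expires_in=timedelta(days=1))
print(online)
```

Here customer 2 is skipped because its only row is later than end_date, and customer 1 keeps the 2022-01-02 row.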

Retrieve historical features

# retrieve_historical_features.py

from datetime import datetime

import pandas as pd
from metastore import FeatureStore


feature_store = FeatureStore(repository='/path/to/repository/')

record_identifiers = pd.DataFrame({
    'customer_id': [1],
    'timestamp': [datetime.utcnow()]
})

dataframe = feature_store.get_historical_features(
    record_identifiers=record_identifiers,
    features=[
        'customer_transactions:daily_transactions',
        'customer_transactions:total_transactions'
    ]
)

metadata = dataframe.attrs['metastore']
print(metadata)
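Historical retrieval takes a timestamp for each record identifier. Feature stores usually treat this as a point-in-time lookup: for each requested row, return the latest feature values whose event time is not after the requested timestamp, so training data never includes values from the future. Assuming Metastore follows this convention, the idea can be sketched with the standard library's bisect module; point_in_time_lookup and the in-memory history layout are hypothetical.

```python
from __future__ import annotations

from bisect import bisect_right
from datetime import datetime


def point_in_time_lookup(
    history: dict[int, list[tuple[datetime, dict]]],
    requests: list[tuple[int, datetime]],
) -> list[dict | None]:
    """For each (record id, as-of time), return the newest feature values
    with event time <= as-of time, or None when no such values exist.

    `history[record_id]` must be sorted by event time in ascending order.
    """
    results = []
    for record_id, as_of in requests:
        events = history.get(record_id, [])
        times = [event_time for event_time, _ in events]
        position = bisect_right(times, as_of)  # first index with time > as_of
        results.append(events[position - 1][1] if position else None)
    return results


history = {
    1: [
        (datetime(2022, 1, 1), {'daily_transactions': 2.0, 'total_transactions': 10.0}),
        (datetime(2022, 1, 2), {'daily_transactions': 1.0, 'total_transactions': 11.0}),
    ],
}
requests = [
    (1, datetime(2022, 1, 1, 12)),  # between the two events
    (1, datetime(2021, 12, 31)),    # before any event
    (2, datetime(2022, 1, 3)),      # unknown record id
]
print(point_in_time_lookup(history, requests))
# → [{'daily_transactions': 2.0, 'total_transactions': 10.0}, None, None]
```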

Retrieve online features

# retrieve_online_features.py

import pandas as pd
from metastore import FeatureStore


feature_store = FeatureStore(repository='/path/to/repository/')

record_identifiers = pd.DataFrame({
    'customer_id': [1]
})

dataframe = feature_store.get_online_features(
    record_identifiers=record_identifiers,
    features=[
        'customer_transactions:daily_transactions',
        'customer_transactions:total_transactions'
    ]
)

metadata = dataframe.attrs['metastore']
print(metadata)
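Online retrieval presumably reads the values written by materialization from the online store (Redis in the project definition) and only sees entries that have not yet expired. The in-memory class below mimics that behaviour with a plain dictionary and explicit expiry timestamps, resolving the same 'group:feature' references used above. The key layout '<feature group>:<record id>' and the OnlineStoreSketch class are hypothetical, not how Metastore actually organizes keys in Redis.

```python
from __future__ import annotations

from datetime import datetime, timedelta


class OnlineStoreSketch:
    """Hypothetical in-memory stand-in for a key-value online store with TTLs."""

    def __init__(self) -> None:
        self._data: dict[str, tuple[dict, datetime]] = {}

    @staticmethod
    def _key(group: str, record_id: object) -> str:
        return f'{group}:{record_id}'

    def put(self, group: str, record_id: object, values: dict,
            now: datetime, expires_in: timedelta) -> None:
        # Store the feature values together with their expiry time.
        self._data[self._key(group, record_id)] = (values, now + expires_in)

    def get(self, references: list[str], record_ids: list[object],
            now: datetime) -> list[dict]:
        """Resolve 'group:feature' references for each record id.

        Missing or expired entries yield None for the requested features.
        """
        rows = []
        for record_id in record_ids:
            row = {}
            for reference in references:
                group, feature = reference.split(':', 1)
                entry = self._data.get(self._key(group, record_id))
                alive = entry is not None and now < entry[1]
                row[reference] = entry[0].get(feature) if alive else None
            rows.append(row)
        return rows


store = OnlineStoreSketch()
t0 = datetime(2022, 1, 3)
store.put('customer_transactions', 1,
          {'daily_transactions': 1.0, 'total_transactions': 11.0},
          now=t0, expires_in=timedelta(days=1))
features = [
    'customer_transactions:daily_transactions',
    'customer_transactions:total_transactions',
]
print(store.get(features, [1, 2], now=t0 + timedelta(hours=1)))
print(store.get(features, [1], now=t0 + timedelta(days=2)))  # expired
```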

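Documentation

Refer to the official Metastore documentation.

Changelog

The changelog lists the new features, improvements, known issues, and bug fixes in each release.

Copyright and license

Copyright (c) 2022, Metastore Developers. All rights reserved.

Licensed under the BSD-3-Clause License.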


Built distribution: metastore-1.0.0.dev21-py3-none-any.whl (8.7 kB, py3)