Metastore Python SDK. Feature store and data catalog for machine learning.
Metastore
Prerequisites
Installation
Production
Install package:
pip install metastore
Development
Install package:
pip install -e .[development]
Note: Use the -e, --editable flag to install the package in development mode.
Note: Set up a virtual environment for development.
Format source code:
autopep8 --recursive --in-place setup.py metastore/ tests/
Lint source code:
pylint setup.py metastore/ tests/
Test package:
pytest
Report test coverage:
pytest --cov --cov-fail-under 80
Note: Set the --cov-fail-under flag to 80 so that the test run fails if total code coverage is below 80%.
Build documentation:
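The coverage gate comes down to simple arithmetic: total coverage is the share of measured code that actually ran, and the run fails when that percentage is below the threshold. The following is a minimal sketch that assumes plain statement coverage. coverage.py also counts branches when branch measurement is enabled, and this sketch ignores that. `coverage_percent` and `passes_threshold` are illustrative helpers, not part of pytest-cov.

```python
def coverage_percent(covered_statements: int, total_statements: int) -> float:
    """Statement coverage as a percentage; treat 'nothing to measure' as fully covered."""
    if total_statements == 0:
        return 100.0
    return 100.0 * covered_statements / total_statements


def passes_threshold(covered_statements: int, total_statements: int,
                     fail_under: float = 80.0) -> bool:
    """Mirror the idea behind --cov-fail-under: pass only if coverage >= fail_under."""
    return coverage_percent(covered_statements, total_statements) >= fail_under


# 850 of 1000 statements executed: 85.0%, which clears an 80% threshold.
print(coverage_percent(850, 1000), passes_threshold(850, 1000))
# 790 of 1000 statements executed: 79.0%, which fails it.
print(coverage_percent(790, 1000), passes_threshold(790, 1000))
```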
cd docs/
sphinx-build -b html metastore/ build/
Note: This step generates the API reference before building.
Usage
Create project definition
# metastore.yaml
project:
  name: 'customer_transactions'
  display_name: 'Customer transactions'
  description: 'Customer transactions feature store.'
  author: 'Metastore Developers'
  tags:
    - 'customer'
    - 'transaction'
  version: '1.0.0'
  credential_store:
    type: 'local'
    path: '/path/to/.env'
  metadata_store:
    type: 'file'
    path: 's3://path/to/metadata.db'
    s3_endpoint:
      type: 'secret'
      name: 'S3_ENDPOINT'
    s3_access_key:
      type: 'secret'
      name: 'S3_ACCESS_KEY'
    s3_secret_key:
      type: 'secret'
      name: 'S3_SECRET_KEY'
  feature_store:
    offline_store:
      type: 'file'
      path: 's3://path/to/features/'
      s3_endpoint:
        type: 'secret'
        name: 'S3_ENDPOINT'
      s3_access_key:
        type: 'secret'
        name: 'S3_ACCESS_KEY'
      s3_secret_key:
        type: 'secret'
        name: 'S3_SECRET_KEY'
    online_store:
      type: 'redis'
      hostname:
        type: 'secret'
        name: 'REDIS_HOSTNAME'
      port:
        type: 'secret'
        name: 'REDIS_PORT'
      database:
        type: 'secret'
        name: 'REDIS_DATABASE'
      password:
        type: 'secret'
        name: 'REDIS_PASSWORD'
  data_sources:
    - name: 'postgresql_data_source'
      type: 'postgresql'
      hostname:
        type: 'secret'
        name: 'POSTGRESQL_HOSTNAME'
      port:
        type: 'secret'
        name: 'POSTGRESQL_PORT'
      database:
        type: 'secret'
        name: 'POSTGRESQL_DATABASE'
      username:
        type: 'secret'
        name: 'POSTGRESQL_USERNAME'
      password:
        type: 'secret'
        name: 'POSTGRESQL_PASSWORD'
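In this definition, each value declared with `type: 'secret'` refers to a key in the credential store (here a local `.env` file) rather than holding the value itself. The sketch below illustrates that indirection under stated assumptions. It parses a simplified `KEY=VALUE` `.env` format and replaces every secret reference in a nested configuration with the matching value. `parse_env` and `resolve_secrets` are hypothetical helpers, not part of the Metastore API. The configuration fragment is written as a Python dict so that no YAML parser is needed.

```python
from typing import Any, Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and '#' comments (simplified .env format)."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith('#') or '=' not in line:
            continue
        key, _, value = line.partition('=')
        values[key.strip()] = value.strip().strip('\'"')  # drop optional surrounding quotes
    return values


def resolve_secrets(node: Any, secrets: Dict[str, str]) -> Any:
    """Recursively replace {'type': 'secret', 'name': KEY} mappings with secrets[KEY]."""
    if isinstance(node, dict):
        if node.get('type') == 'secret' and 'name' in node:
            return secrets[node['name']]  # a KeyError signals a missing credential
        return {key: resolve_secrets(value, secrets) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_secrets(item, secrets) for item in node]
    return node


# Hypothetical fragment mirroring the online_store section above.
online_store = {
    'type': 'redis',
    'hostname': {'type': 'secret', 'name': 'REDIS_HOSTNAME'},
    'port': {'type': 'secret', 'name': 'REDIS_PORT'},
}
env_file = 'REDIS_HOSTNAME=localhost\nREDIS_PORT=6379\n'
print(resolve_secrets(online_store, parse_env(env_file)))
```

Keeping secrets out of `metastore.yaml` lets the same project definition be committed and shared while each environment supplies its own credentials.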
Create feature definitions
# feature_definitions.py
from datetime import timedelta

from metastore import (
    FeatureStore,
    FeatureGroup,
    Feature,
    ValueType
)

feature_store = FeatureStore(repository='/path/to/repository/')

feature_group = FeatureGroup(
    name='customer_transactions',
    record_identifiers=['customer_id'],
    event_time_feature='timestamp',
    features=[
        Feature(name='customer_id', value_type=ValueType.INTEGER),
        Feature(name='timestamp', value_type=ValueType.STRING),
        Feature(name='daily_transactions', value_type=ValueType.FLOAT),
        Feature(name='total_transactions', value_type=ValueType.FLOAT)
    ]
)

feature_store.apply(feature_group)
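A feature group ties a set of typed features to one or more record identifiers and an event-time feature. The stand-alone sketch below makes those roles concrete by checking a single record against such a definition. `ValueKind`, `PYTHON_TYPES`, and `GroupSpec` are hypothetical stand-ins built from the standard library. They are not Metastore's `ValueType` or `FeatureGroup` classes, and the mapping from value kinds to Python types is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, List


class ValueKind(Enum):
    """Hypothetical stand-in for metastore.ValueType."""
    INTEGER = 'integer'
    FLOAT = 'float'
    STRING = 'string'


# Assumed mapping from declared value kinds to Python types.
PYTHON_TYPES = {ValueKind.INTEGER: int, ValueKind.FLOAT: float, ValueKind.STRING: str}


@dataclass
class GroupSpec:
    """Hypothetical stand-in for a feature group definition."""
    name: str
    record_identifiers: List[str]
    event_time_feature: str
    features: Dict[str, ValueKind]

    def validate(self, record: Dict[str, Any]) -> List[str]:
        """Return a list of problems; an empty list means the record fits the definition."""
        problems: List[str] = []
        # Identifier and event-time columns must themselves be declared features.
        for column in self.record_identifiers + [self.event_time_feature]:
            if column not in self.features:
                problems.append(f'{column!r} is not a declared feature')
        # Every declared feature must be present with the expected Python type.
        for feature, kind in self.features.items():
            if feature not in record:
                problems.append(f'missing feature {feature!r}')
            elif not isinstance(record[feature], PYTHON_TYPES[kind]):
                problems.append(f'{feature!r} should be {kind.name}')
        return problems


spec = GroupSpec(
    name='customer_transactions',
    record_identifiers=['customer_id'],
    event_time_feature='timestamp',
    features={
        'customer_id': ValueKind.INTEGER,
        'timestamp': ValueKind.STRING,
        'daily_transactions': ValueKind.FLOAT,
        'total_transactions': ValueKind.FLOAT,
    },
)
print(spec.validate({'customer_id': 1, 'timestamp': '2022-01-01T00:00:00',
                     'daily_transactions': 3.0, 'total_transactions': 42.0}))
print(spec.validate({'customer_id': '1', 'timestamp': '2022-01-01T00:00:00'}))
```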
Ingest features
# ingest_features.py
from metastore import FeatureStore

feature_store = FeatureStore(repository='/path/to/repository/')

dataframe = feature_store.read_from_source(
    'postgresql_data_source',
    table='customer_transaction',
    index_column='customer_id',
    partitions=10
)

feature_store.ingest('customer_transactions', dataframe)
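`read_from_source` takes an `index_column` and a number of `partitions`. One common way to parallelize a SQL table read is to split the range of the index column into contiguous intervals and issue one query per interval. Dask's `read_sql_table` (`index_col`, `npartitions`) works this way, for example. Whether Metastore partitions reads exactly like this is an assumption. The sketch below only shows how contiguous bounds for an integer index could be computed, and `partition_bounds` and the printed `WHERE` clauses are illustrative.

```python
from typing import List, Tuple


def partition_bounds(min_value: int, max_value: int,
                     partitions: int) -> List[Tuple[int, int]]:
    """Split the inclusive range [min_value, max_value] into at most `partitions` integer ranges."""
    if partitions < 1:
        raise ValueError('partitions must be >= 1')
    if min_value > max_value:
        return []
    total = max_value - min_value + 1
    partitions = min(partitions, total)  # never create empty partitions
    size, remainder = divmod(total, partitions)
    bounds: List[Tuple[int, int]] = []
    start = min_value
    for index in range(partitions):
        # The first `remainder` partitions take one extra key so every key is covered exactly once.
        end = start + size + (1 if index < remainder else 0) - 1
        bounds.append((start, end))
        start = end + 1
    return bounds


# One query per partition, e.g. for customer_id values 1..25 split 10 ways.
for low, high in partition_bounds(1, 25, 10):
    print(f'SELECT * FROM customer_transaction WHERE customer_id BETWEEN {low} AND {high}')
```

Even ranges only balance the work when identifiers are spread fairly uniformly. Skewed keys would call for bounds based on quantiles instead.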
Materialize features
# materialize_features.py
from datetime import datetime, timedelta

from metastore import FeatureStore

feature_store = FeatureStore(repository='/path/to/repository/')

feature_store.materialize(
    'customer_transactions',
    end_date=datetime.utcnow(),
    expires_in=timedelta(days=1)
)
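Materialization copies feature values from the offline store to the online store so they can be served with low latency. The sketch below illustrates the idea under assumptions inferred from the parameter names rather than confirmed. It assumes that `end_date` bounds which offline rows are considered, that only the latest row per record identifier is kept, and that `expires_in` sets a time-to-live on each online entry. The sketch measures that time-to-live from `end_date`. `materialize_latest` is hypothetical, and plain lists and dicts stand in for the two stores.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def materialize_latest(
    offline_rows: List[Row],
    key: str,
    event_time: str,
    end_date: datetime,
    expires_in: timedelta,
) -> Dict[Any, Tuple[Row, datetime]]:
    """Keep the newest row per key with event time <= end_date, tagged with an expiry time."""
    latest: Dict[Any, Row] = {}
    for row in offline_rows:
        if row[event_time] > end_date:
            continue  # rows after the cut-off are not materialized
        current = latest.get(row[key])
        if current is None or row[event_time] > current[event_time]:
            latest[row[key]] = row
    expires_at = end_date + expires_in  # assumed: TTL counted from end_date
    return {identifier: (row, expires_at) for identifier, row in latest.items()}


rows = [
    {'customer_id': 1, 'timestamp': datetime(2022, 1, 1), 'daily_transactions': 2.0},
    {'customer_id': 1, 'timestamp': datetime(2022, 1, 2), 'daily_transactions': 5.0},
    {'customer_id': 2, 'timestamp': datetime(2022, 1, 3), 'daily_transactions': 1.0},
]
online = materialize_latest(rows, 'customer_id', 'timestamp',
                            end_date=datetime(2022, 1, 2, 12),
                            expires_in=timedelta(days=1))
print(online[1][0]['daily_transactions'], 2 in online)
```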
Retrieve historical features
# retrieve_historical_features.py
from datetime import datetime

import pandas as pd
from metastore import FeatureStore

feature_store = FeatureStore(repository='/path/to/repository/')

# One row per lookup: the record identifier plus the point in time of interest.
# customer_id is an integer (ValueType.INTEGER); a literal such as 00001 is a SyntaxError in Python 3.
record_identifiers = pd.DataFrame({
    'customer_id': [1],
    'timestamp': [datetime.utcnow()]
})

dataframe = feature_store.get_historical_features(
    record_identifiers=record_identifiers,
    features=[
        'customer_transactions:daily_transactions',
        'customer_transactions:total_transactions'
    ]
)

# Retrieval metadata is stored in the DataFrame's pandas attrs dictionary.
metadata = dataframe.attrs['metastore']
print(metadata)
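Historical retrieval for training data is usually a point-in-time join. For each record identifier and timestamp, it takes the most recent feature values recorded at or before that timestamp, so later values cannot leak into the training set. If `get_historical_features` follows this pattern (an assumption, not confirmed here), the join can be sketched with pandas `merge_asof`. The sample values below are made up for illustration.

```python
import pandas as pd

# Feature rows as they might sit in the offline store (illustrative values).
features = pd.DataFrame({
    'customer_id': [1, 1, 2],
    'timestamp': pd.to_datetime(['2022-01-01', '2022-01-05', '2022-01-03']),
    'daily_transactions': [2.0, 7.0, 1.0],
})

# Record identifiers: which customer, and "as of" which moment.
record_identifiers = pd.DataFrame({
    'customer_id': [1, 1, 2],
    'timestamp': pd.to_datetime(['2022-01-04', '2022-01-06', '2022-01-02']),
})

# merge_asof requires both frames to be sorted by the 'on' column.
training = pd.merge_asof(
    record_identifiers.sort_values('timestamp'),
    features.sort_values('timestamp'),
    on='timestamp',
    by='customer_id',        # match rows for the same customer only
    direction='backward',    # use values at or before each requested timestamp
)
print(training)
```

Customer 2 gets no value because its only feature row is dated after the requested timestamp. That is exactly the leakage a point-in-time join is meant to prevent.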
Retrieve online features
# retrieve_online_features.py
import pandas as pd
from metastore import FeatureStore

feature_store = FeatureStore(repository='/path/to/repository/')

# Online lookups need only the record identifier (an integer, not 00001).
record_identifiers = pd.DataFrame({
    'customer_id': [1]
})

dataframe = feature_store.get_online_features(
    record_identifiers=record_identifiers,
    features=[
        'customer_transactions:daily_transactions',
        'customer_transactions:total_transactions'
    ]
)

# Retrieval metadata is stored in the DataFrame's pandas attrs dictionary.
metadata = dataframe.attrs['metastore']
print(metadata)
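Online retrieval, by contrast, is a key-value lookup of the latest materialized values for each record identifier. This suits a key-value store such as the Redis online store configured above. The sketch below shows such a lookup and assumes entries stop being served once the `expires_in` period set during materialization has passed. The in-memory dict, the `customer_transactions:<id>` key layout, and `get_online` are hypothetical and do not describe Metastore's actual Redis schema.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple

# Hypothetical online store: key -> (feature values, expiry time).
OnlineStore = Dict[str, Tuple[Dict[str, float], datetime]]


def get_online(
    store: OnlineStore,
    feature_refs: List[str],
    customer_id: int,
    now: datetime,
) -> Dict[str, Optional[float]]:
    """Look up 'group:feature' references for one record; expired or missing values become None."""
    result: Dict[str, Optional[float]] = {}
    for ref in feature_refs:
        group, _, feature = ref.partition(':')
        entry = store.get(f'{group}:{customer_id}')
        if entry is None or entry[1] <= now:
            result[ref] = None  # never materialized, or past its time-to-live
        else:
            result[ref] = entry[0].get(feature)
    return result


now = datetime(2022, 1, 2)
store: OnlineStore = {
    'customer_transactions:1': ({'daily_transactions': 5.0, 'total_transactions': 42.0},
                                now + timedelta(hours=12)),
    'customer_transactions:2': ({'daily_transactions': 1.0, 'total_transactions': 9.0},
                                now - timedelta(hours=1)),
}
refs = ['customer_transactions:daily_transactions',
        'customer_transactions:total_transactions']
print(get_online(store, refs, 1, now))
print(get_online(store, refs, 2, now))
```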
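Documentation
Please refer to the official Metastore documentation.
Changelog
The changelog contains information about new features, improvements, known issues, and bug fixes in each release.
Copyright and license
Copyright (c) 2022, Metastore Developers. All rights reserved.
Project developed under the BSD-3-Clause License.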