A package for performing quality control (QC) on Electronic Health Record (EHR) data

Project description

EHRQC

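Introduction

The performance of a machine learning (ML) model depends primarily on the data it is trained on, so the training data should be of the highest possible quality. Handling missing values and outliers before passing data to an ML algorithm is standard practice, with well-established procedures and dedicated libraries. These tools are generic, however, and do not cover domain-specific nuances. For example, electronic health records (EHR) need additional, non-standard data integrity checks to remove errors that are specific to the healthcare domain.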
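To make the idea of a domain-specific check concrete, here is a minimal sketch in pandas. It is not part of EHRQC's API: the function name `flag_integrity_issues`, the plausibility bounds, and the sample values are illustrative assumptions, and the column names are borrowed from the examples later in this guide.

```python
import pandas as pd

# Hypothetical plausibility bounds, chosen for illustration only.
PLAUSIBLE_RANGES = {
    "heartrate": (20, 300),  # beats per minute
    "sysbp": (40, 300),      # mmHg
    "diabp": (20, 200),      # mmHg
}


def flag_integrity_issues(df: pd.DataFrame) -> pd.DataFrame:
    """Return a boolean frame with one column per check (True = suspicious)."""
    flags = pd.DataFrame(index=df.index)
    for col, (low, high) in PLAUSIBLE_RANGES.items():
        if col in df.columns:
            # Missing values are not flagged here; they are a separate concern.
            flags[f"{col}_out_of_range"] = df[col].notna() & ~df[col].between(low, high)
    if {"sysbp", "diabp"}.issubset(df.columns):
        # Diastolic pressure should not exceed systolic pressure.
        flags["diabp_above_sysbp"] = df["diabp"] > df["sysbp"]
    return flags


vitals = pd.DataFrame({
    "heartrate": [72, 0, 110],
    "sysbp": [120, 118, 80],
    "diabp": [80, 76, 95],
})
issues = flag_integrity_issues(vitals)
print(issues)
```

A generic outlier detector might miss the third row, because each value is plausible on its own; only the relationship between the two pressures is wrong.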

System architecture

[Image: system architecture diagram]

Example output

See demographics.html, vitals.html, lab_measurements.html, vitals_anomalies.html, and lab_measurements_anomalies.html.

Installation guide

Install the following libraries:

pip install numpy
pip install matplotlib
pip install yattag
pip install scipy
pip install scikit-learn
pip install pandas

Then install EHRQC:

pip install EHRQC

User guide

Extract demographics data from the OMOP schema

from qc.extract import extractOmopDemographics

# Load demographics from the OMOP schema and preview the first rows
omopDemographicsDf = extractOmopDemographics()
omopDemographicsDf.head()

Extract vitals data from the OMOP schema

from qc.extract import extractMimicOmopVitals

# Load vitals from the OMOP schema and preview the first rows
mimicOmopVitalsDf = extractMimicOmopVitals()
mimicOmopVitalsDf.head()

Extract lab measurements data from the OMOP schema

from qc.extract import extractOmopLabMeasurements

# Load lab measurements from the OMOP schema and preview the first rows
omopLabMeasurementsDf = extractOmopLabMeasurements()
omopLabMeasurementsDf.head()

Extract demographics data from the MIMIC schema

from qc.extract import extractMimicDemographics

# Load demographics from the MIMIC schema and preview the first rows
mimicDemographicsDf = extractMimicDemographics()
mimicDemographicsDf.head()

Extract vitals data from the MIMIC schema

from qc.extract import extractMimicVitals

# Load vitals from the MIMIC schema and preview the first rows
mimicVitalsDf = extractMimicVitals()
mimicVitalsDf.head()

Extract lab measurements data from the MIMIC schema

from qc.extract import extractMimicLabMeasurements

# Load lab measurements from the MIMIC schema and preview the first rows
mimicLabMeasurementsDf = extractMimicLabMeasurements()
mimicLabMeasurementsDf.head()

Demographics graph example 1

from datetime import date

import numpy as np
import pandas as pd

import qc.demographicsGraphs as demographicsGraphs

data = [
    [0, 1, 2, 'male', 'white', date.fromisoformat('2020-09-13'), date.fromisoformat('2021-09-13')], 
    [2, 3, 4, np.nan, 'white', date.fromisoformat('2020-09-14'), date.fromisoformat('2021-09-13')], 
    [4, 5, 6, 'female', 'black', date.fromisoformat('2020-09-15'), date.fromisoformat('2021-09-13')], 
    [6, 7, 8, np.nan, 'asian', date.fromisoformat('2020-09-14'), date.fromisoformat('2021-09-13')]]
demographicsGraphs.plot(pd.DataFrame(data, columns=['age', 'weight', 'height', 'gender', 'ethnicity', 'dob', 'dod']))

Demographics graph example 2

import qc.demographicsGraphs as demographicsGraphs

# Assumes dbUtils has already been imported (its import path is not shown here)
df = dbUtils._getDemographics()
demographicsGraphs.plot(df)

Vitals graph example 1

import numpy as np
import pandas as pd

import qc.vitalsGraphs as vitalsGraphs

data = [
    [0, 1, 2], 
    [2, np.nan, 4], 
    [4, 5, np.nan], 
    [0, 1, 2], 
    [2, 3, 4], 
    [4, 5, np.nan], 
    [0, 1, 2], 
    [2, 3, 4], 
    [4, 5, 6], 
    [6, 7, np.nan]]
vitalsGraphs.plot(pd.DataFrame(data, columns=['heartrate', 'sysbp', 'diabp']))
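
The plot functions present their summaries as graphs. When a quick numeric look at missingness is enough, plain pandas can tabulate it for the same toy data. This is an independent sketch and makes no claim about what `vitalsGraphs.plot` computes internally; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

data = [
    [0, 1, 2], [2, np.nan, 4], [4, 5, np.nan], [0, 1, 2], [2, 3, 4],
    [4, 5, np.nan], [0, 1, 2], [2, 3, 4], [4, 5, 6], [6, 7, np.nan],
]
df = pd.DataFrame(data, columns=["heartrate", "sysbp", "diabp"])

missing_counts = df.isna().sum()     # number of missing values per column
missing_fraction = df.isna().mean()  # share of rows missing per column
print(pd.DataFrame({"missing": missing_counts, "fraction": missing_fraction}))
```

For this data, `heartrate` is complete, `sysbp` has one missing value, and `diabp` has three.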

Vitals graph example 2

import qc.vitalsGraphs as vitalsGraphs

df = dbUtils._getVitals()
vitalsGraphs.plot(df)

Lab measurements graph example 1

import numpy as np
import pandas as pd

import qc.labMeasurementsGraphs as labMeasurementsGraphs

data = [
    [0, 1, 2], 
    [2, np.nan, 4], 
    [4, 5, np.nan], 
    [0, 1, 2], 
    [2, 3, 4], 
    [4, 5, np.nan], 
    [0, 1, 2], 
    [2, 3, 4], 
    [4, 5, 6], 
    [6, 7, np.nan]]
labMeasurementsGraphs.plot(pd.DataFrame(data, columns=['glucose', 'hemoglobin', 'anion_gap']))

Lab measurements graph example 2

import qc.labMeasurementsGraphs as labMeasurementsGraphs

df = dbUtils._getLabMeasurements()
labMeasurementsGraphs.plot(df)

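Missing data imputation method comparison example 1

import qc.missingDataImputation as missingDataImputation

df = dbUtils._getVitals()
df = df.dropna()  # keep complete rows only
# Note: df is prepared above but not passed to compare(); check the signature in your installed version
meanR2, medianR2, knnR2, mfR2, emR2, miR2 = missingDataImputation.compare()
print(meanR2, medianR2, knnR2, mfR2, emR2, miR2)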

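Missing data imputation method comparison example 2

import qc.missingDataImputation as missingDataImputation

df = dbUtils._getLabMeasurements()
df = df.dropna()  # keep complete rows only
# Note: df is prepared above but not passed to compare(); check the signature in your installed version
meanR2, medianR2, knnR2, mfR2, emR2, miR2 = missingDataImputation.compare()
print(meanR2, medianR2, knnR2, mfR2, emR2, miR2)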
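
The comparison examples return one R2 score per imputation method. One plausible way to obtain such scores is to hide known values in a complete data set, impute them, and score the imputed values against the truth. The sketch below does that with scikit-learn for three of the six methods (mean, median, KNN). It is an assumption about how a comparison like this could work, not a reproduction of EHRQC's `compare`; the function name `compare_imputers`, its parameters, and the synthetic data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.metrics import r2_score


def compare_imputers(full_df: pd.DataFrame, missing_rate: float = 0.2, seed: int = 0) -> dict:
    """Hide known values at random, impute them, and score each method with R2."""
    rng = np.random.default_rng(seed)
    values = full_df.to_numpy(dtype=float)
    mask = rng.random(values.shape) < missing_rate  # True where a value is hidden
    masked = values.copy()
    masked[mask] = np.nan

    imputers = {
        "mean": SimpleImputer(strategy="mean"),
        "median": SimpleImputer(strategy="median"),
        "knn": KNNImputer(n_neighbors=3),
    }
    scores = {}
    for name, imputer in imputers.items():
        imputed = imputer.fit_transform(masked)
        # Score each column on its hidden entries only, then average the columns.
        per_column = [
            r2_score(values[mask[:, j], j], imputed[mask[:, j], j])
            for j in range(values.shape[1])
        ]
        scores[name] = float(np.mean(per_column))
    return scores


# Synthetic, correlated columns standing in for a complete vitals table.
rng = np.random.default_rng(42)
base = rng.normal(size=200)
complete = pd.DataFrame({
    "heartrate": 80 + 10 * base + rng.normal(scale=2, size=200),
    "sysbp": 120 + 15 * base + rng.normal(scale=3, size=200),
    "diabp": 75 + 8 * base + rng.normal(scale=2, size=200),
})
scores = compare_imputers(complete)
best_method = max(scores, key=scores.get)
print(scores, best_method)
```

Scoring per column and then averaging keeps differences in scale between columns from inflating the result. Picking the highest score, as `best_method` does, is one simple way to choose an imputation method for a given data set.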

Missing data imputation example 1

import qc.missingDataImputation as missingDataImputation

df = dbUtils._getVitals()
imputedDf = missingDataImputation.impute(df, 'miss_forest')
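
The method name 'miss_forest' suggests a MissForest-style approach, in which each column with gaps is predicted from the other columns by a random forest, repeated over several rounds. Whether EHRQC implements it this way is an assumption. The sketch below approximates the idea with scikit-learn's `IterativeImputer` and a `RandomForestRegressor`; the function name `impute_with_random_forest` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (required before the next import)
from sklearn.impute import IterativeImputer


def impute_with_random_forest(df: pd.DataFrame, seed: int = 0) -> pd.DataFrame:
    """Fill missing values by iteratively regressing each column on the others."""
    imputer = IterativeImputer(
        estimator=RandomForestRegressor(n_estimators=20, random_state=seed),
        max_iter=5,
        random_state=seed,
    )
    imputed = imputer.fit_transform(df)
    return pd.DataFrame(imputed, columns=df.columns, index=df.index)


# Toy vitals table with a few gaps (synthetic values, for illustration only).
rng = np.random.default_rng(1)
base = rng.normal(size=60)
toy = pd.DataFrame({
    "heartrate": 80 + 10 * base,
    "sysbp": 120 + 15 * base,
    "diabp": 75 + 8 * base,
})
toy.loc[[3, 10, 25], "sysbp"] = np.nan
toy.loc[[7, 40], "diabp"] = np.nan

imputed_df = impute_with_random_forest(toy)
print(imputed_df.isna().sum())
```

Observed values pass through unchanged; only the gaps are filled.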

Vitals anomaly graph example

import qc.vitalsAnomalies as vitalsAnomalies

df = dbUtils._getVitals()
vitalsAnomalies.plot(df)

Lab measurements anomaly graph example

import qc.labMeasurementsAnomalies as labMeasurementsAnomalies

# Use the lab measurements data here, not the vitals data
df = dbUtils._getLabMeasurements()
labMeasurementsAnomalies.plot(df)
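
The source does not say how the anomaly plots identify anomalies. As a point of reference, a common and simple rule flags values that lie more than 1.5 interquartile ranges outside the middle half of a column. The sketch below applies that rule with pandas. It is a stand-in technique, not EHRQC's method; the function name `iqr_outliers` and the sample values are hypothetical.

```python
import pandas as pd


def iqr_outliers(df: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Flag values more than k * IQR below the first or above the third quartile."""
    q1 = df.quantile(0.25)
    q3 = df.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Comparisons align the per-column bounds with the frame's columns.
    return (df < lower) | (df > upper)


labs = pd.DataFrame({
    "glucose": [90, 95, 100, 105, 110, 400],
    "hemoglobin": [13.5, 14.0, 14.2, 13.8, 14.1, 13.9],
})
outliers = iqr_outliers(labs)
print(outliers.sum())
```

Here only the glucose value of 400 is flagged. A flagged value is a candidate for review, not necessarily an error.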

Pipeline run example

from qc.pipeline import run

data = run(source='mimic', type='demographics', graph=True, impute_missing=True)
print(data.head())

## source -> Can be one of 'mimic' or 'omop'  
## type -> Can be one of 'demographics', 'vitals', 'lab_measurements'  
## graph -> If true, the EDA graph will be generated  
## impute_missing -> If true, missing values will be imputed based on the best imputation method for the given data

Acknowledgements

[Logos: Alfred Hospital, Superbug AI]
