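A package for performing quality control (QC) on electronic health record (EHR) data
Project description
EHRQC
Introduction
The performance of a machine learning (ML) model depends largely on the data it is trained on, so it is important to ensure that the training data is of the highest possible quality. Handling missing values and outliers before passing data to an ML algorithm is standard practice, and well-established procedures and dedicated libraries exist for it. However, these tools are generic by nature and do not cover domain-specific nuances. For instance, electronic health records (EHR) in the medical domain call for additional, non-standard data integrity checks to eliminate further errors.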
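To make the idea of a domain-specific integrity check concrete, here is a minimal sketch using pandas. It is not EHRQC's implementation: the function name `integrity_flags`, the column names, and the plausibility limits are all assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical plausibility limits for illustration; not taken from EHRQC.
PLAUSIBLE_RANGES = {
    "heartrate": (20, 300),  # beats per minute
    "sysbp": (40, 300),      # mmHg
}


def integrity_flags(df: pd.DataFrame) -> pd.DataFrame:
    """Return a boolean frame marking the rows that fail each check."""
    flags = pd.DataFrame(index=df.index)
    # A date of death earlier than the date of birth is impossible.
    # Comparisons involving a missing date evaluate to False.
    flags["death_before_birth"] = df["dod"] < df["dob"]
    # Recorded values outside a plausible range are suspect.
    # Missing values are left unflagged here; they are a separate problem.
    for column, (low, high) in PLAUSIBLE_RANGES.items():
        recorded = df[column].notna()
        flags[f"{column}_out_of_range"] = recorded & ~df[column].between(low, high)
    return flags


records = pd.DataFrame({
    "dob": pd.to_datetime(["1950-01-01", "1980-06-15", "1990-03-02"]),
    "dod": pd.to_datetime(["2020-01-01", "1970-06-15", None]),
    "heartrate": [72, 80, 900],
    "sysbp": [120, 10, 115],
})
flags = integrity_flags(records)
print(flags)
```

A generic outlier routine would not know that a death date must follow a birth date, which is the kind of nuance the paragraph above refers to.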
System architecture
Sample output
Refer to demographics.html, vitals.html, lab_measurements.html, vitals_anomalies.html and lab_measurements_anomalies.html
Installation guide
Install the following libraries
pip install numpy
pip install matplotlib
pip install yattag
pip install scipy
pip install scikit-learn
pip install pandas
Then install EHRQC
pip install EHRQC
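After installation, a quick check that the dependencies are importable can save debugging time later. The snippet below is a small sketch using only the standard library; the helper name `missing_modules` is an assumption, not part of EHRQC.

```python
import importlib.util

# Import names of the dependencies listed above.
# Note that the scikit-learn package is imported as "sklearn".
REQUIRED = ["numpy", "matplotlib", "yattag", "scipy", "sklearn", "pandas"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED)
print("Missing modules:", missing or "none")
```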
User guide
Extract demographics data from the OMOP schema
from qc.extract import extractOmopDemographics as extractOmopDemographics
omopDemographicsDf = extractOmopDemographics()
omopDemographicsDf.head()
Extract vitals data from the OMOP schema
from qc.extract import extractMimicOmopVitals as extractMimicOmopVitals
mimicOmopVitalsDf = extractMimicOmopVitals()
mimicOmopVitalsDf.head()
Extract lab measurements data from the OMOP schema
from qc.extract import extractOmopLabMeasurements as extractOmopLabMeasurements
omopLabMeasurementsDf = extractOmopLabMeasurements()
omopLabMeasurementsDf.head()
Extract demographics data from the MIMIC schema
from qc.extract import extractMimicDemographics as extractMimicDemographics
mimicDemographicsDf = extractMimicDemographics()
mimicDemographicsDf.head()
Extract vitals data from the MIMIC schema
from qc.extract import extractMimicVitals as extractMimicVitals
mimicVitalsDf = extractMimicVitals()
mimicVitalsDf.head()
Extract lab measurements data from the MIMIC schema
from qc.extract import extractMimicLabMeasurements as extractMimicLabMeasurements
mimicLabMeasurementsDf = extractMimicLabMeasurements()
mimicLabMeasurementsDf.head()
Demographics graph example 1
from datetime import date
import numpy as np
import pandas as pd
import qc.demographicsGraphs as demographicsGraphs
data = [
[0, 1, 2, 'male', 'white', date.fromisoformat('2020-09-13'), date.fromisoformat('2021-09-13')],
[2, 3, 4, np.nan, 'white', date.fromisoformat('2020-09-14'), date.fromisoformat('2021-09-13')],
[4, 5, 6, 'female', 'black', date.fromisoformat('2020-09-15'), date.fromisoformat('2021-09-13')],
[6, 7, 8, np.nan, 'asian', date.fromisoformat('2020-09-14'), date.fromisoformat('2021-09-13')]]
demographicsGraphs.plot(pd.DataFrame(data, columns=['age', 'weight', 'height', 'gender', 'ethnicity', 'dob', 'dod']))
Demographics graph example 2
import qc.demographicsGraphs as demographicsGraphs
df = dbUtils._getDemographics()
demographicsGraphs.plot(df)
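Before plotting, it can help to look at how much of each column is missing. The following sketch reuses the sample rows from demographics example 1 and summarises missingness with plain pandas. It is an illustration only and does not reproduce what `demographicsGraphs.plot` renders.

```python
from datetime import date

import numpy as np
import pandas as pd

# Same sample rows as in demographics example 1.
data = [
    [0, 1, 2, 'male', 'white', date.fromisoformat('2020-09-13'), date.fromisoformat('2021-09-13')],
    [2, 3, 4, np.nan, 'white', date.fromisoformat('2020-09-14'), date.fromisoformat('2021-09-13')],
    [4, 5, 6, 'female', 'black', date.fromisoformat('2020-09-15'), date.fromisoformat('2021-09-13')],
    [6, 7, 8, np.nan, 'asian', date.fromisoformat('2020-09-14'), date.fromisoformat('2021-09-13')],
]
df = pd.DataFrame(data, columns=['age', 'weight', 'height', 'gender', 'ethnicity', 'dob', 'dod'])

# Count and percentage of missing values per column.
missing_summary = pd.DataFrame({
    "missing_count": df.isna().sum(),
    "missing_percent": df.isna().mean() * 100,
})
print(missing_summary)
```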
Vitals graph example 1
import numpy as np
import pandas as pd
import qc.vitalsGraphs as vitalsGraphs
data = [
[0, 1, 2],
[2, np.nan, 4],
[4, 5, np.nan],
[0, 1, 2],
[2, 3, 4],
[4, 5, np.nan],
[0, 1, 2],
[2, 3, 4],
[4, 5, 6],
[6, 7, np.nan]]
vitalsGraphs.plot(pd.DataFrame(data, columns=['heartrate', 'sysbp', 'diabp']))
Vitals graph example 2
import qc.vitalsGraphs as vitalsGraphs
df = dbUtils._getVitals()
vitalsGraphs.plot(df)
Lab measurements graph example 1
import numpy as np
import pandas as pd
import qc.labMeasurementsGraphs as labMeasurementsGraphs
data = [
[0, 1, 2],
[2, np.nan, 4],
[4, 5, np.nan],
[0, 1, 2],
[2, 3, 4],
[4, 5, np.nan],
[0, 1, 2],
[2, 3, 4],
[4, 5, 6],
[6, 7, np.nan]]
labMeasurementsGraphs.plot(pd.DataFrame(data, columns=['glucose', 'hemoglobin', 'anion_gap']))
Lab measurements graph example 2
import qc.labMeasurementsGraphs as labMeasurementsGraphs
df = dbUtils._getLabMeasurements()
labMeasurementsGraphs.plot(df)
Missing data imputation method comparison example 1
import qc.missingDataImputation as missingDataImputation
df = dbUtils._getVitals()
df = df.dropna()
meanR2, medianR2, knnR2, mfR2, emR2, miR2 = missingDataImputation.compare(df)  # assumption: compare() takes the complete-case DataFrame prepared above
print(meanR2, medianR2, knnR2, mfR2, emR2, miR2)
Missing data imputation method comparison example 2
import qc.missingDataImputation as missingDataImputation
df = dbUtils._getLabMeasurements()
df = df.dropna()
meanR2, medianR2, knnR2, mfR2, emR2, miR2 = missingDataImputation.compare(df)  # assumption: compare() takes the complete-case DataFrame prepared above
print(meanR2, medianR2, knnR2, mfR2, emR2, miR2)
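The comparison examples report one R2 score per imputation method. One common way to obtain such scores is to start from complete data, hide a random share of the values, impute them, and score the imputed values against the hidden truth. The sketch below shows that idea for mean and median imputation only, using numpy and pandas. It is an assumption about the general technique, not EHRQC's actual code, and the names `r2` and `compare_simple_imputers` are hypothetical.

```python
import numpy as np
import pandas as pd


def r2(true_values, predicted_values):
    """Coefficient of determination between true and predicted values."""
    ss_res = np.sum((true_values - predicted_values) ** 2)
    ss_tot = np.sum((true_values - true_values.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def compare_simple_imputers(full_df, missing_fraction=0.2, seed=0):
    """Hide a random share of values, impute them, and score each method."""
    rng = np.random.default_rng(seed)
    # True marks the cells that are hidden from the imputers.
    hidden = rng.random(full_df.shape) < missing_fraction
    masked_df = full_df.mask(hidden)
    truth = full_df.to_numpy()[hidden]
    fills = {"mean": masked_df.mean(), "median": masked_df.median()}
    scores = {}
    for name, fill in fills.items():
        imputed = masked_df.fillna(fill)
        # Score only the cells that were hidden, pooled over all columns.
        scores[name] = r2(truth, imputed.to_numpy()[hidden])
    return scores


# Synthetic complete data standing in for a table with no missing values.
rng = np.random.default_rng(42)
complete = pd.DataFrame({
    "heartrate": rng.normal(80, 10, 200),
    "sysbp": rng.normal(120, 15, 200),
    "diabp": rng.normal(75, 8, 200),
})
scores = compare_simple_imputers(complete)
print(scores)
```

This explains why the examples call `dropna()` first: the scoring needs known true values for every cell it hides.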
Missing data imputation example 1
import qc.missingDataImputation as missingDataImputation
df = dbUtils._getVitals()
imputedDf = missingDataImputation.impute(df, 'miss_forest')
Vitals anomaly graph example
import qc.vitalsAnomalies as vitalsAnomalies
df = dbUtils._getVitals()
vitalsAnomalies.plot(df)
Lab measurements anomaly graph example
import qc.labMeasurementsAnomalies as labMeasurementsAnomalies
df = dbUtils._getLabMeasurements()
labMeasurementsAnomalies.plot(df)
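This page does not describe how the anomaly modules decide what counts as an anomaly. As a rough illustration of the general idea, the sketch below flags values with the interquartile range (IQR) rule, a widely used baseline. The function name `iqr_outliers` and the sample values are assumptions, and EHRQC may use a different method.

```python
import numpy as np
import pandas as pd


def iqr_outliers(df, k=1.5):
    """Flag values further than k * IQR outside the quartiles of their column."""
    q1 = df.quantile(0.25)
    q3 = df.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    # Comparisons with missing values are False, so NaN is never flagged.
    return df.lt(lower, axis="columns") | df.gt(upper, axis="columns")


# Made-up sample values for illustration.
vitals = pd.DataFrame({
    "heartrate": [70, 72, 75, 71, 74, 73, 250, np.nan],
    "sysbp": [120, 118, 122, 121, 119, 117, 123, 120],
})
outliers = iqr_outliers(vitals)
print(outliers)
```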
Run pipeline example
from qc.pipeline import run
data = run(source='mimic', type='demographics', graph=True, impute_missing=True)
print(data.head())
## source -> Can be one of 'mimic' or 'omop'
## type -> Can be one of 'demographics', 'vitals', 'lab_measurements'
## graph -> If true, the EDA graph will be generated
## impute_missing -> If true, missing values will be imputed based on the best imputation method for the given data
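The `impute_missing` option is described as choosing the best imputation method for the given data. Assuming "best" means the highest R2 score from a comparison like the one shown earlier, the selection step could look like the sketch below. The function name and the scores are placeholders for illustration, not measured results or EHRQC internals.

```python
def best_imputation_method(scores):
    """Return the name of the method with the highest R2 score."""
    return max(scores, key=scores.get)


# Placeholder scores for illustration only; not measured results.
example_scores = {"mean": 0.41, "median": 0.39, "knn": 0.62, "miss_forest": 0.70}
best = best_imputation_method(example_scores)
print(best)
```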
Acknowledgements