Skip to main content

用于处理光谱数据的 python 包

项目描述

光谱

欢迎来到 pyspectra。
该软件包旨在将功能组合在一起,以分析和转换来自多个光谱仪器的光谱数据。

当前支持的输入文件有:

  • .spc
  • .dx

PySpectra 旨在通过使用与 pandas 数据框对象的友好集成来促进在 python 中使用光谱文件。
. pyspectra 还提供了一组例程来执行光谱预处理,例如:

  • MSC
  • SNV
  • 去趋势
  • 萨维茨基 - 戈莱
  • 衍生品
  • ..

数据光谱可用于传统的化学计量学分析,但也可用于一般高级分析建模,以便通过提供光谱信息向制造模型提供附加信息。

#Import basic libraries
import spc
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

读取 .spc 文件

读取单个文件

from pyspectra.readers.read_spc import read_spc
spc=read_spc('pyspectra/sample_spectra/VIAVI/JDSU_Phar_Rotate_S06_1_20171009_1540.spc')
spc.plot()
plt.xlabel("nm")
plt.ylabel("Abs")
plt.grid(True)
print(spc.head())
gx-y(1)
908.100000    0.123968
914.294355    0.118613
920.488710    0.113342
926.683065    0.108641
932.877419    0.098678
dtype: float64

单 spc 光谱

从目录中读取多个 .spc 文件

from pyspectra.readers.read_spc import read_spc_dir

df_spc, dict_spc=read_spc_dir('pyspectra/sample_spectra/VIAVI')
display(df_spc.transpose())
f, ax =plt.subplots(1, figsize=(18,8))
ax.plot(df_spc.transpose())
plt.xlabel("nm")
plt.ylabel("Abs")
ax.legend(labels= list(df_spc.transpose().columns))
plt.show()
gx-y(1)
gx-y(1)
gx-y(1)
gx-y(1)
gx-y(1)
gx-y(1)
gx-y(1)
gx-y(1)
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
JDSU_Phar_Rotate_S06_1_20171009_1540.spc JDSU_Phar_Rotate_S11_2_20171009_1614.spc JDSU_Phar_Rotate_S17_1_20171009_1652.spc JDSU_Phar_Rotate_S23_1_20171009_1734.spc JDSU_Phar_Rotate_S30_2_20171009_1815.spc JDSU_Phar_Rotate_S37_2_20171009_1853.spc JDSU_Phar_Rotate_S43_2_20171009_1928.spc JDSU_Phar_Rotate_S49_1_20171009_2000.spc
908.100000 0.123968 0.164750 0.156647 0.147828 0.182833 0.171957 0.164471 0.149373
914.294355 0.118613 0.159980 0.150746 0.142974 0.178452 0.166827 0.159545 0.142818
920.488710 0.113342 0.155193 0.144959 0.138178 0.173734 0.161695 0.154330 0.136648
926.683065 0.108641 0.151398 0.140178 0.134014 0.170061 0.157110 0.149876 0.130452
932.877419 0.098678 0.141859 0.129715 0.124426 0.160590 0.147076 0.140119 0.119561
... ... ... ... ... ... ... ... ...
1651.422581 0.220935 0.262070 0.259643 0.242916 0.279041 0.271492 0.260664 0.252704
1657.616935 0.221848 0.262732 0.260664 0.243092 0.278962 0.272893 0.261647 0.254481
1663.811290 0.219904 0.260335 0.258975 0.240656 0.276382 0.271624 0.260278 0.253761
1670.005645 0.214080 0.253475 0.253110 0.234047 0.269528 0.265615 0.254568 0.248288
1676.200000 0.204217 0.242375 0.243082 0.223539 0.258771 0.255306 0.244826 0.238663

125 行 × 8 列

多光谱 spc

读取 .dx 光谱文件

Pyspectra 还使用一组正则表达式构建,允许读取来自不同供应商的最常见的 .dx 文件格式,例如:

  • 开源软件
  • Si-Ware系统
  • 光谱引擎
  • 德州仪器
  • VIAVI

读取单个 .dx 文件

.dx 阅读器可以阅读:

  • 包含单个光谱的单个文件:读取
  • 包含多个光谱的单个文件:读取
  • 目录中的多个文件:read_from_dir

单文件,单光谱

# Single file with single spectra
from pyspectra.readers.read_dx import read_dx
#Instantiate an object
Foss_single= read_dx()
# Run  read method
df=Foss_single.read(file='pyspectra/sample_spectra/DX multiple files/Example1.dx')
df.transpose().plot()
<matplotlib.axes._subplots.AxesSubplot at 0x1f44faa7940>

单 DX 光谱

单文件,多光谱:

.dx 阅读器将所有信息存储为样本上对象的属性。每个键代表一个样本。

Foss_single= read_dx()
# Run  read method
df=Foss_single.read(file='pyspectra/sample_spectra/FOSS/FOSS.dx')
df.transpose().plot(legend=False)
<matplotlib.axes._subplots.AxesSubplot at 0x1f44f7f2e50>

多 DX 光谱

for c in Foss_single.Samples['29179'].keys():
    print(c)
y
Conc
TITLE
JCAMP_DX
DATA TYPE
CLASS
DATE
DATA PROCESSING
XUNITS
YUNITS
XFACTOR
YFACTOR
FIRSTX
LASTX
MINY
MAXY
NPOINTS
FIRSTY
CONCENTRATIONS
XYDATA
X
Y

光谱预处理

Pyspectra 有一组内置类来执行光谱预处理,例如:

  • MSC:乘法散射校正
  • SNV:标准正态变量
  • 去趋势
  • n阶导数
  • Savitzky golay smmothing
from pyspectra.transformers.spectral_correction import msc, detrend ,sav_gol,snv
MSC= msc()
MSC.fit(df)
df_msc=MSC.transform(df)


f, ax= plt.subplots(2,1,figsize=(14,8))
ax[0].plot(df.transpose())
ax[0].set_title("Raw spectra")

ax[1].plot(df_msc.transpose())
ax[1].set_title("MSC spectra")
plt.show()

MSC转化

SNV= snv()
df_snv=SNV.fit_transform(df)

Detr= detrend()
df_detrend=Detr.fit_transform(spc=df_snv,wave=np.array(df_snv.columns))

f, ax= plt.subplots(3,1,figsize=(18,8))
ax[0].plot(df.transpose())
ax[0].set_title("Raw spectra")

ax[1].plot(df_snv.transpose())
ax[1].set_title("SNV spectra")

ax[2].plot(df_detrend.transpose())
ax[2].set_title("SNV+ Detrend spectra")

plt.tight_layout()
plt.show()

SNV 和 Detrend 转换

光谱建模

使用 PCA 分解

pca=PCA()
pca.fit(df_msc)
plt.figure(figsize=(18,8))
plt.plot(range(1,len(pca.explained_variance_)+1),100*pca.explained_variance_.cumsum()/pca.explained_variance_.sum())
plt.grid(True)
plt.xlabel("Number of components")
plt.ylabel(" cumulative % of explained variance")

PCA累积方差

df_pca=pd.DataFrame(pca.transform(df_msc))
plt.figure(figsize=(18,8))
plt.plot(df_pca.loc[:,0:25].transpose())


plt.title("Transformed spectra PCA")
plt.ylabel("Response feature")
plt.xlabel("Principal component")
plt.grid(True)
plt.show()

转换后的 PCA 值

使用 automl 库部署更快的模型

import tpot
from tpot import TPOTRegressor
from sklearn.model_selection import RepeatedKFold
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
model = TPOTRegressor(generations=10, population_size=50, scoring='neg_mean_absolute_error',
                      cv=cv, verbosity=2, random_state=1, n_jobs=-1)
y=Foss_single.Conc[:,0]
x=df_pca.loc[:,0:25]
model.fit(x,y)
HBox(children=(FloatProgress(value=0.0, description='Optimization Progress', max=550.0, style=ProgressStyle(de…



Generation 1 - Current best internal CV score: -0.30965836730187607

Generation 2 - Current best internal CV score: -0.30965836730187607

Generation 3 - Current best internal CV score: -0.30965836730187607

Generation 4 - Current best internal CV score: -0.308295313408046

Generation 5 - Current best internal CV score: -0.308295313408046

Generation 6 - Current best internal CV score: -0.308295313408046

Generation 7 - Current best internal CV score: -0.308295313408046

Generation 8 - Current best internal CV score: -0.3082953134080456

Generation 9 - Current best internal CV score: -0.3082953134080456

Generation 10 - Current best internal CV score: -0.3078569602146527

Best pipeline: LassoLarsCV(PCA(LinearSVR(input_matrix, C=0.1, dual=True, epsilon=0.1, loss=epsilon_insensitive, tol=0.01), iterated_power=3, svd_solver=randomized), normalize=False)





TPOTRegressor(cv=RepeatedKFold(n_repeats=3, n_splits=10, random_state=1),
              generations=10, n_jobs=-1, population_size=50, random_state=1,
              scoring='neg_mean_absolute_error', verbosity=2)
from sklearn.metrics import r2_score
r2=round(r2_score(y,model.predict(x)),2)
plt.scatter(y,model.predict(x),alpha=0.5, color='r')
plt.plot([y.min(),y.max()],[y.min(),y.max()],LineStyle='--',color='black')
plt.xlabel("y actual")
plt.ylabel("y predicted")
plt.title("Spectra model prediction R^2:"+ str(r2))

plt.show()

TPOT模型拟合

项目详情


下载文件

下载适用于您平台的文件。如果您不确定要选择哪个,请了解有关安装包的更多信息。

源分布

pyspectra-0.0.1.2.tar.gz (14.2 kB 查看哈希

已上传 source

内置分布

pyspectra-0.0.1.2-py3-none-any.whl (22.6 kB 查看哈希

已上传 py3