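A Python package for handling spectral data
Project description
Pyspectra
Welcome to pyspectra.
This package brings together functions to analyze and transform spectral data from multiple spectroscopy instruments.
Currently supported input files are: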
- .spc
- .dx
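PySpectra is intended to make it easier to work with spectroscopy files in Python through a friendly integration with pandas DataFrame objects.
Pyspectra also provides a set of routines for spectral pre-processing, such as: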
- MSC
- SNV
- Detrend
- Savitzky-Golay
- Derivatives
- ...
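Spectral data can be used for traditional chemometric analysis, and also for general advanced analytics modelling, where the spectral information supplies additional inputs to manufacturing models.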
#Import basic libraries
import spc
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
Read .spc files
Read a single file
from pyspectra.readers.read_spc import read_spc
spc=read_spc('pyspectra/sample_spectra/VIAVI/JDSU_Phar_Rotate_S06_1_20171009_1540.spc')
spc.plot()
plt.xlabel("nm")
plt.ylabel("Abs")
plt.grid(True)
print(spc.head())
gx-y(1)
908.100000 0.123968
914.294355 0.118613
920.488710 0.113342
926.683065 0.108641
932.877419 0.098678
dtype: float64
Read multiple .spc files from a directory
from pyspectra.readers.read_spc import read_spc_dir
df_spc, dict_spc=read_spc_dir('pyspectra/sample_spectra/VIAVI')
display(df_spc.transpose())
f, ax =plt.subplots(1, figsize=(18,8))
ax.plot(df_spc.transpose())
plt.xlabel("nm")
plt.ylabel("Abs")
ax.legend(labels= list(df_spc.transpose().columns))
plt.show()
gx-y(1)
gx-y(1)
gx-y(1)
gx-y(1)
gx-y(1)
gx-y(1)
gx-y(1)
gx-y(1)
| | JDSU_Phar_Rotate_S06_1_20171009_1540.spc | JDSU_Phar_Rotate_S11_2_20171009_1614.spc | JDSU_Phar_Rotate_S17_1_20171009_1652.spc | JDSU_Phar_Rotate_S23_1_20171009_1734.spc | JDSU_Phar_Rotate_S30_2_20171009_1815.spc | JDSU_Phar_Rotate_S37_2_20171009_1853.spc | JDSU_Phar_Rotate_S43_2_20171009_1928.spc | JDSU_Phar_Rotate_S49_1_20171009_2000.spc |
|---|---|---|---|---|---|---|---|---|
| 908.100000 | 0.123968 | 0.164750 | 0.156647 | 0.147828 | 0.182833 | 0.171957 | 0.164471 | 0.149373 |
| 914.294355 | 0.118613 | 0.159980 | 0.150746 | 0.142974 | 0.178452 | 0.166827 | 0.159545 | 0.142818 |
| 920.488710 | 0.113342 | 0.155193 | 0.144959 | 0.138178 | 0.173734 | 0.161695 | 0.154330 | 0.136648 |
| 926.683065 | 0.108641 | 0.151398 | 0.140178 | 0.134014 | 0.170061 | 0.157110 | 0.149876 | 0.130452 |
| 932.877419 | 0.098678 | 0.141859 | 0.129715 | 0.124426 | 0.160590 | 0.147076 | 0.140119 | 0.119561 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1651.422581 | 0.220935 | 0.262070 | 0.259643 | 0.242916 | 0.279041 | 0.271492 | 0.260664 | 0.252704 |
| 1657.616935 | 0.221848 | 0.262732 | 0.260664 | 0.243092 | 0.278962 | 0.272893 | 0.261647 | 0.254481 |
| 1663.811290 | 0.219904 | 0.260335 | 0.258975 | 0.240656 | 0.276382 | 0.271624 | 0.260278 | 0.253761 |
| 1670.005645 | 0.214080 | 0.253475 | 0.253110 | 0.234047 | 0.269528 | 0.265615 | 0.254568 | 0.248288 |
| 1676.200000 | 0.204217 | 0.242375 | 0.243082 | 0.223539 | 0.258771 | 0.255306 | 0.244826 | 0.238663 |
125 rows × 8 columns
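The table above is the transposed frame, so the frame returned by the reader holds one row per file and one column per wavelength. The sketch below rebuilds that layout from made-up numbers to show the two orientations. The file names and absorbance values are hypothetical, and nothing here calls pyspectra itself.

```python
import numpy as np
import pandas as pd

# Hypothetical wavelength axis (nm) and absorbances for two made-up files.
wavelengths = np.linspace(908.1, 1676.2, 5)
spectra = {
    "sample_a.spc": [0.12, 0.11, 0.10, 0.15, 0.20],
    "sample_b.spc": [0.16, 0.15, 0.14, 0.19, 0.24],
}

# One row per file, one column per wavelength: the usual shape for modelling.
by_sample = pd.DataFrame.from_dict(spectra, orient="index", columns=wavelengths)

# One row per wavelength, one column per file: convenient for display and plotting.
by_wavelength = by_sample.transpose()

print(by_sample.shape)
print(by_wavelength.shape)
```

Keeping samples as rows means the frame can be passed directly to estimators that expect a samples-by-features matrix, while the transposed view plots one line per spectrum.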
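Read .dx spectral files
Pyspectra is also built with a set of regular expressions that allow reading the most common .dx file formats from different vendors, such as:
- FOSS
- Si-Ware Systems
- Spectral Engines
- Texas Instruments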
- VIAVI
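To illustrate the regular-expression approach mentioned above, here is a minimal sketch that collects the `##LABEL=value` records of a JCAMP-DX block into a dictionary. The pattern, the function name `parse_ldr_header`, and the sample text are assumptions made for illustration. This is not pyspectra's actual parser, and real vendor files carry more variation than this handles.

```python
import re

# Labelled data records in JCAMP-DX have the form "##LABEL=value".
LDR_PATTERN = re.compile(r"^##\s*([^=]+?)\s*=\s*(.*)$")


def parse_ldr_header(text):
    """Collect ##LABEL=value records from a JCAMP-DX block into a dict."""
    header = {}
    for line in text.splitlines():
        match = LDR_PATTERN.match(line.strip())
        if match:
            label, value = match.groups()
            header[label.upper()] = value.strip()
    return header


# Hypothetical hand-written block, not taken from a real instrument file.
sample = """##TITLE=Example sample
##JCAMP-DX=4.24
##XUNITS=NANOMETERS
##YUNITS=ABSORBANCE
##FIRSTX=908.1
##LASTX=1676.2
##NPOINTS=125
##END="""

header = parse_ldr_header(sample)
print(header["TITLE"], header["NPOINTS"])
```

Values are kept as strings here. A fuller reader would convert numeric fields and go on to decode the tabulated data that follows the header.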
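Read a single .dx file
The .dx reader can read:
- A single file containing a single spectrum: read
- A single file containing multiple spectra: read
- Multiple files from a directory: read_from_dir
Single file, single spectrum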
# Single file with single spectra
from pyspectra.readers.read_dx import read_dx
#Instantiate an object
Foss_single= read_dx()
# Run read method
df=Foss_single.read(file='pyspectra/sample_spectra/DX multiple files/Example1.dx')
df.transpose().plot()
<matplotlib.axes._subplots.AxesSubplot at 0x1f44faa7940>
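Single file, multiple spectra:
The .dx reader stores all the information as attributes of the object under Samples. Each key represents a sample.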
Foss_single= read_dx()
# Run read method
df=Foss_single.read(file='pyspectra/sample_spectra/FOSS/FOSS.dx')
df.transpose().plot(legend=False)
<matplotlib.axes._subplots.AxesSubplot at 0x1f44f7f2e50>
# List the fields stored for one sample
for c in Foss_single.Samples['29179'].keys():
    print(c)
y
Conc
TITLE
JCAMP_DX
DATA TYPE
CLASS
DATE
DATA PROCESSING
XUNITS
YUNITS
XFACTOR
YFACTOR
FIRSTX
LASTX
MINY
MAXY
NPOINTS
FIRSTY
CONCENTRATIONS
XYDATA
X
Y
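Spectral pre-processing
Pyspectra has a set of built-in classes to perform spectral pre-processing, such as:
- MSC: multiplicative scatter correction
- SNV: standard normal variate
- Detrend
- n-th order derivative
- Savitzky-Golay smoothing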
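The Savitzky-Golay and derivative steps in the list above can be understood as a local polynomial fit evaluated at the centre of a sliding window. The following is a minimal NumPy sketch of that idea, assuming evenly spaced points. The function names `savgol_weights` and `savgol_apply` are hypothetical and this is not the implementation behind pyspectra's `sav_gol` class.

```python
import math

import numpy as np


def savgol_weights(window, polyorder, deriv=0):
    """Least-squares weights for the centre point of a Savitzky-Golay window."""
    if window % 2 == 0 or window <= polyorder:
        raise ValueError("window must be odd and larger than polyorder")
    half = window // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    # Columns are offsets**0, offsets**1, ..., offsets**polyorder.
    design = np.vander(offsets, polyorder + 1, increasing=True)
    # Row k of the pseudo-inverse maps window values to polynomial coefficient k.
    coeffs = np.linalg.pinv(design)[deriv]
    # The k-th derivative at the centre is k! times coefficient k.
    return coeffs * math.factorial(deriv)


def savgol_apply(y, window, polyorder, deriv=0):
    """Filter a 1-D signal; the half-window at each edge is dropped."""
    weights = savgol_weights(window, polyorder, deriv)
    # np.convolve flips its kernel, so reverse the weights to get a correlation.
    return np.convolve(y, weights[::-1], mode="valid")


x = np.arange(20.0)
y = 0.5 * x**2 - 3.0 * x + 2.0
smoothed = savgol_apply(y, window=7, polyorder=2)
first_deriv = savgol_apply(y, window=7, polyorder=2, deriv=1)
```

Derivatives come out per sample step. For a physical axis with spacing `dx`, divide the result by `dx**deriv`.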
from pyspectra.transformers.spectral_correction import msc, detrend ,sav_gol,snv
MSC= msc()
MSC.fit(df)
df_msc=MSC.transform(df)
f, ax= plt.subplots(2,1,figsize=(14,8))
ax[0].plot(df.transpose())
ax[0].set_title("Raw spectra")
ax[1].plot(df_msc.transpose())
ax[1].set_title("MSC spectra")
plt.show()
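For readers unfamiliar with MSC, the sketch below shows the textbook form of the correction: each spectrum is regressed against a reference (here the mean spectrum), and the fitted offset and slope are removed. It is a NumPy illustration under that assumption. The name `msc_sketch` is hypothetical, and pyspectra's `msc` class may choose its reference or handle edge cases differently.

```python
import numpy as np


def msc_sketch(spectra, reference=None):
    """Multiplicative scatter correction against a reference spectrum."""
    spectra = np.asarray(spectra, dtype=float)
    if reference is None:
        # Assumption: use the mean spectrum of the set as the reference.
        reference = spectra.mean(axis=0)
    corrected = np.empty_like(spectra)
    for i, spectrum in enumerate(spectra):
        # Fit spectrum = intercept + slope * reference by least squares.
        slope, intercept = np.polyfit(reference, spectrum, deg=1)
        corrected[i] = (spectrum - intercept) / slope
    return corrected, reference


# Synthetic spectra that differ only by an offset and a scale factor.
base = np.sin(np.linspace(0.0, 3.0, 50)) + 2.0
raw = np.vstack([1.0 * base, 1.5 * base + 0.2, 0.5 * base - 0.1])
corrected, reference = msc_sketch(raw)
```

Because the synthetic spectra are exact affine transforms of one another, every corrected row collapses onto the reference. Real spectra keep their chemical differences and lose only the scatter-like offset and scaling.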
SNV= snv()
df_snv=SNV.fit_transform(df)
Detr= detrend()
df_detrend=Detr.fit_transform(spc=df_snv,wave=np.array(df_snv.columns))
f, ax= plt.subplots(3,1,figsize=(18,8))
ax[0].plot(df.transpose())
ax[0].set_title("Raw spectra")
ax[1].plot(df_snv.transpose())
ax[1].set_title("SNV spectra")
ax[2].plot(df_detrend.transpose())
ax[2].set_title("SNV+ Detrend spectra")
plt.tight_layout()
plt.show()
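The SNV and detrend steps used above can be sketched in a few lines of NumPy. SNV centres and scales each spectrum by its own mean and standard deviation. Detrending is shown here as subtracting a second-order polynomial fitted against the wavelength axis, which is the classical pairing with SNV. The polynomial order, the use of the sample standard deviation, and the function names are assumptions, not confirmed details of pyspectra's `snv` and `detrend` classes.

```python
import numpy as np


def snv_sketch(spectra):
    """Standard normal variate: row-wise centring and scaling."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def detrend_sketch(spectra, wave, order=2):
    """Subtract a polynomial baseline fitted against the wavelength axis."""
    spectra = np.asarray(spectra, dtype=float)
    wave = np.asarray(wave, dtype=float)
    # Centre and scale the axis so the polynomial fit stays well conditioned.
    x = (wave - wave.mean()) / wave.std()
    # polyfit accepts a 2-D y with one data set per column.
    coeffs = np.polyfit(x, spectra.T, deg=order)
    trend = np.vander(x, order + 1) @ coeffs  # shape (n_points, n_samples)
    return spectra - trend.T


# Synthetic spectra: a sloped baseline plus one Gaussian band.
wave = np.linspace(908.1, 1676.2, 125)
peak = np.exp(-((wave - 1200.0) / 40.0) ** 2)
raw = np.vstack([
    0.2 + 1e-4 * (wave - 908.1) + 0.5 * peak,
    0.3 + 2e-4 * (wave - 908.1) + 0.8 * peak,
])
snv = snv_sketch(raw)
detrended = detrend_sketch(snv, wave)
```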
Modelling of spectra
Decompose using PCA
pca=PCA()
pca.fit(df_msc)
plt.figure(figsize=(18,8))
plt.plot(range(1,len(pca.explained_variance_)+1),100*pca.explained_variance_.cumsum()/pca.explained_variance_.sum())
plt.grid(True)
plt.xlabel("Number of components")
plt.ylabel(" cumulative % of explained variance")
df_pca=pd.DataFrame(pca.transform(df_msc))
plt.figure(figsize=(18,8))
plt.plot(df_pca.loc[:,0:25].transpose())
plt.title("Transformed spectra PCA")
plt.ylabel("Response feature")
plt.xlabel("Principal component")
plt.grid(True)
plt.show()
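The cumulative explained-variance curve plotted above can be reproduced from a singular value decomposition of the mean-centred data. The sketch below does this in NumPy on synthetic data (three hypothetical latent factors plus noise) rather than on the spectra from this page. Component signs may differ from scikit-learn's output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 30 samples x 125 wavelengths driven by 3 latent factors.
scores_true = rng.normal(size=(30, 3))
loadings_true = rng.normal(size=(3, 125))
X = scores_true @ loadings_true + 0.01 * rng.normal(size=(30, 125))

# PCA works on mean-centred columns.
X_centered = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Variance captured by each component, and its running share of the total.
explained_variance = s**2 / (X.shape[0] - 1)
cumulative_pct = 100 * explained_variance.cumsum() / explained_variance.sum()

# Scores: the projection of each sample onto the principal axes.
scores = U * s

print(cumulative_pct[:5].round(2))
```

With only three underlying factors, the curve saturates almost immediately. On real spectra the point where it flattens is a common guide to how many components to keep.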
Using AutoML libraries to deploy models faster
import tpot
from tpot import TPOTRegressor
from sklearn.model_selection import RepeatedKFold
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
model = TPOTRegressor(generations=10, population_size=50, scoring='neg_mean_absolute_error',
cv=cv, verbosity=2, random_state=1, n_jobs=-1)
y=Foss_single.Conc[:,0]
x=df_pca.loc[:,0:25]
model.fit(x,y)
HBox(children=(FloatProgress(value=0.0, description='Optimization Progress', max=550.0, style=ProgressStyle(de…
Generation 1 - Current best internal CV score: -0.30965836730187607
Generation 2 - Current best internal CV score: -0.30965836730187607
Generation 3 - Current best internal CV score: -0.30965836730187607
Generation 4 - Current best internal CV score: -0.308295313408046
Generation 5 - Current best internal CV score: -0.308295313408046
Generation 6 - Current best internal CV score: -0.308295313408046
Generation 7 - Current best internal CV score: -0.308295313408046
Generation 8 - Current best internal CV score: -0.3082953134080456
Generation 9 - Current best internal CV score: -0.3082953134080456
Generation 10 - Current best internal CV score: -0.3078569602146527
Best pipeline: LassoLarsCV(PCA(LinearSVR(input_matrix, C=0.1, dual=True, epsilon=0.1, loss=epsilon_insensitive, tol=0.01), iterated_power=3, svd_solver=randomized), normalize=False)
TPOTRegressor(cv=RepeatedKFold(n_repeats=3, n_splits=10, random_state=1),
generations=10, n_jobs=-1, population_size=50, random_state=1,
scoring='neg_mean_absolute_error', verbosity=2)
from sklearn.metrics import r2_score
r2=round(r2_score(y,model.predict(x)),2)
plt.scatter(y,model.predict(x),alpha=0.5, color='r')
plt.plot([y.min(),y.max()],[y.min(),y.max()],linestyle='--',color='black')
plt.xlabel("y actual")
plt.ylabel("y predicted")
plt.title("Spectra model prediction R^2:"+ str(r2))
plt.show()
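The R^2 in the plot above is computed on the same samples the model was fitted on, which tends to give an optimistic figure. As a rough sketch of the alternative, the example below computes R^2 by hand on a held-out portion of synthetic data, using an ordinary least-squares fit as a stand-in for the TPOT pipeline. The data, the split, and the function name `r2_sketch` are all hypothetical.

```python
import numpy as np


def r2_sketch(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


rng = np.random.default_rng(1)

# Hypothetical features (e.g. PCA scores) and a noisy linear response.
X = rng.normal(size=(60, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=60)

# Fit on the first 40 rows and keep the last 20 for evaluation only.
X_train, X_test = X[:40], X[40:]
y_train, y_test = y[:40], y[40:]
design_train = np.column_stack([np.ones(len(X_train)), X_train])
coef, *_ = np.linalg.lstsq(design_train, y_train, rcond=None)
y_hat = np.column_stack([np.ones(len(X_test)), X_test]) @ coef

r2_holdout = r2_sketch(y_test, y_hat)
print(round(r2_holdout, 2))
```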
Project details
Download files
Source distribution
pyspectra-0.0.1.2.tar.gz (14.2 kB)
Built distribution
pyspectra-0.0.1.2-py3-none-any.whl (22.6 kB)