
Noise reduction using Spectral Gating in python




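Noisereduce is a noise reduction algorithm in python that reduces noise in time-domain signals like speech, bioacoustics, and physiological signals. It relies on a method called "spectral gating", which is a form of noise gate. It works by computing a spectrogram of a signal (and optionally a noise signal) and estimating a noise threshold (or gate) for each frequency band of that signal/noise. That threshold is used to compute a mask, which gates noise below the frequency-varying threshold.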
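As a toy illustration of the gating idea (made-up numbers; this is not noisereduce's internal code), each frequency band gets its own threshold, and every spectrogram bin is kept or gated against the threshold of its own band:

```python
import numpy as np

# Toy magnitude "spectrogram": 3 frequency bands x 5 time frames (made-up values)
spec = np.array([
    [1.0, 1.2, 5.0, 0.9, 1.1],   # band 0: a loud event in frame 2
    [0.2, 0.3, 0.2, 2.0, 0.25],  # band 1: a loud event in frame 3
    [3.0, 3.1, 2.9, 3.0, 3.2],   # band 2: steady, loud background
])

# One threshold per frequency band (e.g. estimated from a noise clip);
# shape (n_bands, 1) so it broadcasts across all time frames
thresh = np.array([[2.0], [1.0], [4.0]])

# Binary mask: keep bins above their band's threshold, gate the rest
mask = spec > thresh
gated = spec * mask
```

Band 2 is removed entirely even though its values are larger than the event kept in band 1, because band 2's own threshold is higher. That is what a frequency-varying threshold offers over a single global one.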

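The most recent version of noisereduce comprises two algorithms:

  1. Stationary Noise Reduction: keeps the estimated noise threshold at the same level across the whole signal
  2. Non-stationary Noise Reduction: continuously updates the estimated noise threshold over time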

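Version 2 updates:

  • Added two forms of spectral gating noise reduction: stationary noise reduction and non-stationary noise reduction.
  • Added multiprocessing, so you can perform noise reduction on bigger data.
  • The new version breaks the API of the old version.
  • The previous version is still available via from noisereduce.noisereducev1 import reduce_noise
  • You can now create a noisereduce object, which allows you to reduce noise on subsets of longer recordings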
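The notes above do not say how the multiprocessing splits the work. One common scheme, consistent with the chunk_size and padding arguments of reduce_noise, is to cut the signal into chunks, extend each chunk with extra samples of context on both sides so long filters are not starved at the chunk edges, process the padded chunks independently, and trim the padding off before stitching. The helper below is a hypothetical sketch of that scheme, not the package's implementation:

```python
import numpy as np


def process_in_chunks(y, fn, chunk_size=60000, padding=30000):
    """Apply fn to padded chunks of a 1-D signal and stitch the results.

    Hypothetical helper: fn must return an array the same length as its input.
    Each chunk is extended by `padding` samples on both sides (clipped at the
    signal edges), processed, and then trimmed back to the chunk itself.
    """
    out = []
    for start in range(0, len(y), chunk_size):
        stop = min(start + chunk_size, len(y))
        lo = max(start - padding, 0)          # padded chunk start
        hi = min(stop + padding, len(y))      # padded chunk end
        processed = fn(y[lo:hi])
        # keep only the samples that belong to the unpadded chunk
        offset = start - lo
        out.append(processed[offset : offset + (stop - start)])
    return np.concatenate(out)
```

Because each padded chunk is processed independently, the calls to fn could be handed to a process pool (for example concurrent.futures.ProcessPoolExecutor) to use several CPU cores.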

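Stationary Noise Reduction

  • The basic intuition is that statistics are calculated on each frequency channel to determine a noise gate. Then the gate is applied to the signal.
  • This algorithm is based on (but does not completely reproduce) the one outlined by Audacity for its noise reduction effect (link to C++ code).
  • The algorithm takes two inputs:
    1. A noise clip containing prototypical noise of the clip (optional)
    2. A signal clip containing the signal and the noise intended to be removed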

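Steps of the stationary noise reduction algorithm

  1. A spectrogram is calculated over the noise audio clip
  2. Statistics are calculated over the spectrogram of the noise (per frequency)
  3. A threshold is calculated based upon the statistics of the noise (and the desired sensitivity of the algorithm)
  4. A spectrogram is calculated over the signal
  5. A mask is determined by comparing the signal spectrogram to the threshold
  6. The mask is smoothed with a filter over frequency and time
  7. The mask is applied to the spectrogram of the signal, and the result is inverted back to a waveform

If a noise signal is not provided, the algorithm treats the signal itself as the noise clip, which tends to work pretty well.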
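The seven steps can be sketched with NumPy and SciPy. This is a simplified illustration under stated assumptions, not noisereduce's implementation: the per-frequency statistics are taken to be the mean and standard deviation of each channel in dB, the threshold is the mean plus n_std_thresh standard deviations (loosely mirroring the n_std_thresh_stationary argument), the smoothing filter is assumed to be a small box filter, and the function name spectral_gate_stationary and its smoothing arguments are hypothetical. As the text notes, when no noise clip is given the signal itself is used.

```python
import numpy as np
from scipy.signal import stft, istft, convolve2d


def spectral_gate_stationary(y, sr, y_noise=None, n_std_thresh=1.5,
                             prop_decrease=1.0, n_fft=1024,
                             freq_smooth_bins=3, time_smooth_frames=3):
    """Simplified stationary spectral gating (illustrative sketch only)."""
    if y_noise is None:
        y_noise = y  # no noise clip given: treat the signal as the noise clip
    eps = 1e-10
    # 1. Spectrogram of the noise clip (in dB)
    _, _, N = stft(y_noise, fs=sr, nperseg=n_fft)
    noise_db = 20 * np.log10(np.abs(N) + eps)
    # 2-3. Per-frequency statistics -> one threshold per frequency channel
    thresh = noise_db.mean(axis=1) + n_std_thresh * noise_db.std(axis=1)
    # 4. Spectrogram of the signal (in dB)
    _, _, S = stft(y, fs=sr, nperseg=n_fft)
    sig_db = 20 * np.log10(np.abs(S) + eps)
    # 5. Binary mask: 1 where a bin exceeds its channel's threshold
    mask = (sig_db > thresh[:, None]).astype(float)
    # 6. Smooth the mask over frequency (rows) and time (columns) with a box filter
    kernel = np.ones((freq_smooth_bins, time_smooth_frames))
    mask = convolve2d(mask, kernel / kernel.sum(), mode="same")
    # prop_decrease=1.0 removes gated bins fully; smaller values keep some noise
    mask = mask * prop_decrease + (1.0 - prop_decrease)
    # 7. Apply the mask to the signal spectrogram and invert to a waveform
    _, y_out = istft(S * mask, fs=sr, nperseg=n_fft)
    return y_out[: len(y)]
```

Raising n_std_thresh makes the gate stricter (fewer bins pass), while a prop_decrease below 1.0 blends part of the gated noise back in instead of removing it completely.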

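Non-stationary Noise Reduction

  • The non-stationary noise reduction algorithm is an extension of the stationary noise reduction algorithm, but allows the noise gate to change over time.
  • When you know the timescale that your signal occurs on (e.g. a bird call can be a few hundred milliseconds), you can set your noise threshold based on the assumption that events occurring on longer timescales are noise.
  • This algorithm was motivated by a recent method in bioacoustics called Per-Channel Energy Normalization.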

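Steps of the non-stationary noise reduction algorithm

  1. A spectrogram is calculated over the signal
  2. A time-smoothed version of the spectrogram is computed using an IIR filter applied forward and backward on each frequency channel
  3. A mask is computed based on that time-smoothed spectrogram
  4. The mask is smoothed with a filter over frequency and time
  5. The mask is applied to the spectrogram of the signal, and the result is inverted back to a waveform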
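A matching sketch of the non-stationary steps, with the same caveats: it is illustrative only, and the function name and its arguments are hypothetical (thresh_n_mult and sigmoid_slope loosely mirror thresh_n_mult_nonstationary and sigmoid_slope_nonstationary). The IIR filter is assumed to be a one-pole low-pass whose coefficient is derived from time_constant_s, and the mask is assumed to be a soft logistic function of how far each bin rises above its time-smoothed level. Energy that lasts much longer than the time constant is absorbed into the smoothed level and gated; short events such as a bird call stand out above it.

```python
import numpy as np
from scipy.signal import stft, istft, filtfilt, convolve2d


def spectral_gate_nonstationary(y, sr, time_constant_s=2.0, thresh_n_mult=2.0,
                                sigmoid_slope=10.0, n_fft=1024,
                                freq_smooth_bins=3, time_smooth_frames=3):
    """Simplified non-stationary spectral gating (illustrative sketch only)."""
    eps = 1e-10
    hop = n_fft // 2  # scipy.signal.stft's default hop when nperseg=n_fft
    # 1. Spectrogram of the signal
    _, _, S = stft(y, fs=sr, nperseg=n_fft)
    mag = np.abs(S)
    # 2. One-pole IIR low-pass along time (axis=1), run forward and backward
    #    with filtfilt; this coefficient choice is an assumption
    tau_frames = time_constant_s * sr / hop
    alpha = 1.0 - np.exp(-1.0 / tau_frames)
    smoothed = filtfilt([alpha], [1.0, alpha - 1.0], mag, axis=1, padtype="even")
    # 3. Soft mask: relative rise of each bin above its slowly varying level
    rel = (mag - smoothed) / (smoothed + eps)
    mask = 1.0 / (1.0 + np.exp(-sigmoid_slope * (rel - thresh_n_mult)))
    # 4. Smooth the mask over frequency and time with a box filter
    kernel = np.ones((freq_smooth_bins, time_smooth_frames))
    mask = convolve2d(mask, kernel / kernel.sum(), mode="same")
    # 5. Apply the mask and invert back to a waveform
    _, y_out = istft(S * mask, fs=sr, nperseg=n_fft)
    return y_out[: len(y)]
```

No noise clip is needed here: the slowly varying level estimated from the signal itself plays the role that the noise statistics play in the stationary version.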

Installation

pip install noisereduce

Usage

See the example notebook: Open In Colab

Simplest usage

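from scipy.io import wavfile
import noisereduce as nr
# load data (for multichannel files, wavfile.read returns shape (# frames, # channels);
# reduce_noise expects (# channels, # frames), so pass data.T and transpose the result back)
rate, data = wavfile.read("mywav.wav")
# perform noise reduction
reduced_noise = nr.reduce_noise(y=data, sr=rate)
wavfile.write("mywav_reduced_noise.wav", rate, reduced_noise)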

Arguments to reduce_noise

  y : np.ndarray [shape=(# frames,) or (# channels, # frames)], real-valued
      input signal
  sr : int
      sample rate of input signal / noise signal
  y_noise : np.ndarray [shape=(# frames,) or (# channels, # frames)], real-valued
      noise signal to compute statistics over (only for stationary noise reduction).
  stationary : bool, optional
      Whether to perform stationary, or non-stationary noise reduction, by default False
  prop_decrease : float, optional
      The proportion to reduce the noise by (1.0 = 100%), by default 1.0
  time_constant_s : float, optional
      The time constant, in seconds, to compute the noise floor in the non-stationary
      algorithm, by default 2.0
  freq_mask_smooth_hz : int, optional
      The frequency range to smooth the mask over in Hz, by default 500
  time_mask_smooth_ms : int, optional
      The time range to smooth the mask over in milliseconds, by default 50
  thresh_n_mult_nonstationary : int, optional
      Only used in non-stationary noise reduction, by default 1
  sigmoid_slope_nonstationary : int, optional
      Only used in non-stationary noise reduction, by default 10
  n_std_thresh_stationary : float, optional
      Number of standard deviations above the mean at which to place the
      threshold between signal and noise, by default 1.5
  tmp_folder : str, optional
      Temp folder to write the waveform to during parallel processing. Defaults
      to the default temp folder for python, by default None
  chunk_size : int, optional
      Size of signal chunks to reduce noise over. Larger sizes
      will take more space in memory, smaller sizes can take longer to compute,
      by default 60000
  padding : int, optional
      How much to pad each chunk of signal by. Larger pads are
      needed for larger time constants, by default 30000
  n_fft : int, optional
      length of the windowed signal after padding with zeros.
      The number of rows in the STFT matrix ``D`` is ``(1 + n_fft/2)``.
      librosa's default value, ``n_fft=2048`` samples, corresponds to a physical
      duration of 93 milliseconds at a sample rate of 22050 Hz, i.e. the
      default sample rate in librosa. This value is well adapted for music
      signals. However, in speech processing, the recommended value is 512,
      corresponding to 23 milliseconds at a sample rate of 22050 Hz.
      In any case, we recommend setting ``n_fft`` to a power of two for
      optimizing the speed of the fast Fourier transform (FFT) algorithm,
      by default 1024
  win_length : int, optional
      Each frame of audio is windowed by ``window`` of length ``win_length``
      and then padded with zeros to match ``n_fft``.
      Smaller values improve the temporal resolution of the STFT (i.e. the
      ability to discriminate impulses that are closely spaced in time)
      at the expense of frequency resolution (i.e. the ability to discriminate
      pure tones that are closely spaced in frequency). This effect is known
      as the time-frequency localization trade-off and needs to be adjusted
      according to the properties of the input signal ``y``.
      If unspecified, defaults to ``win_length = n_fft``, by default None
  hop_length : int, optional
      number of audio samples between adjacent STFT columns.
      Smaller values increase the number of columns in ``D`` without
      affecting the frequency resolution of the STFT.
      If unspecified, defaults to ``win_length // 4``, by default None
  n_jobs : int, optional
      Number of parallel jobs to run. Set to -1 to use all CPU cores, by default 1
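The durations quoted for n_fft follow from dividing the window length in samples by the sample rate. The small helper below (hypothetical, written only to check the arithmetic and the documented defaults for win_length and hop_length) makes this concrete: 2048 samples at 22050 Hz span about 92.9 ms, 512 samples about 23.2 ms, and the reduce_noise default of 1024 samples about 46.4 ms.

```python
def stft_geometry(n_fft, sr, win_length=None, hop_length=None):
    """Return window duration (ms), STFT row count and hop for given settings.

    Hypothetical helper; the defaults mirror the documented behaviour:
    win_length = n_fft and hop_length = win_length // 4 when unspecified.
    """
    if win_length is None:
        win_length = n_fft
    if hop_length is None:
        hop_length = win_length // 4
    return {
        "window_ms": 1000.0 * n_fft / sr,  # duration covered by n_fft samples
        "n_rows": 1 + n_fft // 2,          # rows of the STFT matrix D
        "hop_length": hop_length,          # samples between adjacent columns
    }


for n_fft in (2048, 1024, 512):
    g = stft_geometry(n_fft, sr=22050)
    print(n_fft, round(g["window_ms"], 1), g["n_rows"], g["hop_length"])
```

Halving n_fft halves the window duration (better time resolution) and roughly halves the number of frequency rows (coarser frequency resolution), which is the trade-off described under win_length.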

Citation

If you use this code in your research, please cite it:

@software{tim_sainburg_2019_3243139,
  author       = {Tim Sainburg},
  title        = {timsainb/noisereduce: v1.0},
  month        = jun,
  year         = 2019,
  publisher    = {Zenodo},
  version      = {db94fe2},
  doi          = {10.5281/zenodo.3243139},
  url          = {https://doi.org/10.5281/zenodo.3243139}
}


@article{sainburg2020finding,
  title={Finding, visualizing, and quantifying latent structure across diverse animal vocal repertoires},
  author={Sainburg, Tim and Thielk, Marvin and Gentner, Timothy Q},
  journal={PLoS computational biology},
  volume={16},
  number={10},
  pages={e1008228},
  year={2020},
  publisher={Public Library of Science}
}

Project based on the cookiecutter data science project template. #cookiecutterdatascience


Download files

Source distribution

noisereduce-2.0.1.tar.gz (15.3 kB)

Built distribution

noisereduce-2.0.1-py3-none-any.whl (15.6 kB)