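Python interface to CMU Sphinxbase and Pocketsphinx libraries

Project description

Pocketsphinx Python

Pocketsphinx is a part of the CMU Sphinx Open Source Toolkit For Speech Recognition.

This package provides a Python interface to the CMU Sphinxbase and Pocketsphinx libraries, created with SWIG and Setuptools.

Supported platforms

Windows
Linux
Mac OS X

Installation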

# Make sure we have up-to-date versions of pip, setuptools and wheel
python -m pip install --upgrade pip setuptools wheel
pip install --upgrade pocketsphinx

More binary distributions for manual installation are available here.
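After installing, it can help to confirm which version pip actually resolved. The following is a minimal sketch using only the standard library (Python 3.8 or later); the helper name installed_version is illustrative and is not part of pocketsphinx.

```python
from importlib import metadata


def installed_version(dist_name="pocketsphinx"):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None


print(installed_version() or "pocketsphinx is not installed")
```

If this prints the fallback message, the install step above most likely ran against a different interpreter than the one executing the script.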

Usage

LiveSpeech

It's an iterator class for continuous recognition or keyword search from a microphone.

from pocketsphinx import LiveSpeech
for phrase in LiveSpeech(): print(phrase)

An example of a keyword search:

from pocketsphinx import LiveSpeech

speech = LiveSpeech(lm=False, keyphrase='forward', kws_threshold=1e-20)
for phrase in speech:
    print(phrase.segments(detailed=True))

With your own model and dictionary:

import os
from pocketsphinx import LiveSpeech, get_model_path

model_path = get_model_path()

speech = LiveSpeech(
    verbose=False,
    sampling_rate=16000,
    buffer_size=2048,
    no_search=False,
    full_utt=False,
    hmm=os.path.join(model_path, 'en-us'),
    lm=os.path.join(model_path, 'en-us.lm.bin'),
    dic=os.path.join(model_path, 'cmudict-en-us.dict')
)

for phrase in speech:
    print(phrase)

AudioFile

It's an iterator class for continuous recognition or keyword search from a file.

from pocketsphinx import AudioFile
for phrase in AudioFile(): print(phrase) # => "go forward ten meters"

An example of a keyword search:

from pocketsphinx import AudioFile

audio = AudioFile(lm=False, keyphrase='forward', kws_threshold=1e-20)
for phrase in audio:
    print(phrase.segments(detailed=True)) # => "[('forward', -617, 63, 121)]"

With your own model and dictionary:

import os
from pocketsphinx import AudioFile, get_model_path, get_data_path

model_path = get_model_path()
data_path = get_data_path()

config = {
    'verbose': False,
    'audio_file': os.path.join(data_path, 'goforward.raw'),
    'buffer_size': 2048,
    'no_search': False,
    'full_utt': False,
    'hmm': os.path.join(model_path, 'en-us'),
    'lm': os.path.join(model_path, 'en-us.lm.bin'),
    'dict': os.path.join(model_path, 'cmudict-en-us.dict')
}

audio = AudioFile(**config)
for phrase in audio:
    print(phrase)
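The examples above read a .raw file. Assuming that format is headerless 16-bit mono PCM at the 16000 Hz sampling rate used throughout this README (an assumption, not something stated here), a WAV recording could be converted with the standard wave module. The helper name wav_to_raw is hypothetical.

```python
import io
import wave


def wav_to_raw(wav_file, expected_rate=16000):
    """Return the raw PCM bytes of a WAV file, checking its format first."""
    with wave.open(wav_file, "rb") as wav:
        if wav.getnchannels() != 1:
            raise ValueError("expected mono audio")
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        if wav.getframerate() != expected_rate:
            raise ValueError("expected %d Hz audio" % expected_rate)
        return wav.readframes(wav.getnframes())


# Build a tiny in-memory WAV (0.1 s of silence) to exercise the helper.
demo = io.BytesIO()
with wave.open(demo, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(16000)
    out.writeframes(b"\x00\x00" * 1600)
demo.seek(0)

raw = wav_to_raw(demo)
print(len(raw))  # 3200 bytes: 1600 samples * 2 bytes each
```

The returned bytes could then be written to a file and passed as audio_file. Rejecting mismatched formats early is a design choice in this sketch; resampling is out of scope.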

Convert frames into time coordinates:

from pocketsphinx import AudioFile

# Frames per Second
fps = 100

for phrase in AudioFile(frate=fps):  # frate (default=100)
    print('-' * 28)
    print('| %5s |  %3s  |   %4s   |' % ('start', 'end', 'word'))
    print('-' * 28)
    for s in phrase.seg():
        print('| %4ss | %4ss | %8s |' % (s.start_frame / fps, s.end_frame / fps, s.word))
    print('-' * 28)

# ----------------------------
# | start |  end  |   word   |
# ----------------------------
# |  0.0s | 0.24s | <s>      |
# | 0.25s | 0.45s | <sil>    |
# | 0.46s | 0.63s | go       |
# | 0.64s | 1.16s | forward  |
# | 1.17s | 1.52s | ten      |
# | 1.53s | 2.11s | meters   |
# | 2.12s |  2.6s | </s>     |
# ----------------------------
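The same conversion can be applied to plain (word, prob, start_frame, end_frame) tuples, the shape shown for segments(detailed=True) in this README. This is a small sketch for Python 3, where / is true division; frames_to_seconds is an illustrative helper, not a library function.

```python
def frames_to_seconds(frame, frate=100):
    """Convert a frame index to seconds at `frate` frames per second."""
    return frame / frate


# Sample tuples copied from the detailed segment output in this README.
segments = [("go", -27, 46, 63), ("forward", -38, 64, 116)]

for word, _prob, start, end in segments:
    print("%-8s %.2fs - %.2fs" % (word, frames_to_seconds(start), frames_to_seconds(end)))
```

If a different frate is passed to AudioFile, the same value should be passed to the helper so that the two stay in step.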

Pocketsphinx

It's a simple and flexible proxy class to pocketsphinx.Decoder.

from pocketsphinx import Pocketsphinx
print(Pocketsphinx().decode()) # => "go forward ten meters"

A more comprehensive example:

from __future__ import print_function
import os
from pocketsphinx import Pocketsphinx, get_model_path, get_data_path

model_path = get_model_path()
data_path = get_data_path()

config = {
    'hmm': os.path.join(model_path, 'en-us'),
    'lm': os.path.join(model_path, 'en-us.lm.bin'),
    'dict': os.path.join(model_path, 'cmudict-en-us.dict')
}

ps = Pocketsphinx(**config)
ps.decode(
    audio_file=os.path.join(data_path, 'goforward.raw'),
    buffer_size=2048,
    no_search=False,
    full_utt=False
)

print(ps.segments()) # => ['<s>', '<sil>', 'go', 'forward', 'ten', 'meters', '</s>']
print('Detailed segments:', *ps.segments(detailed=True), sep='\n') # => [
#     word, prob, start_frame, end_frame
#     ('<s>', 0, 0, 24)
#     ('<sil>', -3778, 25, 45)
#     ('go', -27, 46, 63)
#     ('forward', -38, 64, 116)
#     ('ten', -14105, 117, 152)
#     ('meters', -2152, 153, 211)
#     ('</s>', 0, 212, 260)
# ]

print(ps.hypothesis())  # => go forward ten meters
print(ps.probability()) # => -32079
print(ps.score())       # => -7066
print(ps.confidence())  # => 0.04042641466841839

print(*ps.best(count=10), sep='\n') # => [
#     ('go forward ten meters', -28034)
#     ('go for word ten meters', -28570)
#     ('go forward and majors', -28670)
#     ('go forward and meters', -28681)
#     ('go forward and readers', -28685)
#     ('go forward ten readers', -28688)
#     ('go forward ten leaders', -28695)
#     ('go forward can meters', -28695)
#     ('go forward and leaders', -28706)
#     ('go for work ten meters', -28722)
# ]
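The integer values from probability() and score() look like log-domain scores. Assuming the log base is 1.0001 (the usual Sphinx default for the logbase option; this README does not state it), a score can be mapped back to the linear domain as sketched below. For the probability shown above, the result lands close to the confidence() value, though not exactly on it.

```python
import math

LOG_BASE = 1.0001  # assumption: the decoder's default log base


def log_to_linear(log_value, base=LOG_BASE):
    """Convert an integer log-domain score to a linear-domain value."""
    return math.exp(log_value * math.log(base))


print(log_to_linear(-32079))  # roughly 0.04
```

Scores nearer zero map to values nearer 1.0, which is why the n-best hypotheses above are listed from the least negative score downward.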

Default config

If you don't pass any arguments when creating an instance of the Pocketsphinx, AudioFile or LiveSpeech class, it uses the following default values:

verbose = False
logfn = /dev/null or nul
audio_file = site-packages/pocketsphinx/data/goforward.raw
audio_device = None
sampling_rate = 16000
buffer_size = 2048
no_search = False
full_utt = False
hmm = site-packages/pocketsphinx/model/en-us
lm = site-packages/pocketsphinx/model/en-us.lm.bin
dict = site-packages/pocketsphinx/model/cmudict-en-us.dict

Any other option must be passed to the config as is, without the leading - symbol.
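For options copied from command-line documentation, the renaming can be done mechanically. A minimal sketch; to_keyword_options is a hypothetical helper and the option values are placeholders.

```python
def to_keyword_options(cli_options):
    """Map command-line style names ('-hmm') to keyword names ('hmm')."""
    return {name.lstrip("-"): value for name, value in cli_options.items()}


cli_style = {"-hmm": "model/en-us", "-kws_threshold": 1e-20}
print(to_keyword_options(cli_style))
# {'hmm': 'model/en-us', 'kws_threshold': 1e-20}
```

The resulting dictionary can be unpacked with ** when constructing one of the classes, as the config examples in this README do.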

To disable the default language model or dictionary, set the corresponding option to False:

lm = False
dict = False
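One way to picture how overrides and disabled options combine with the defaults is the sketch below. It is not the library's implementation: resolve_options and DISABLEABLE are hypothetical, the paths are shortened placeholders, and treating False as "leave the option out" is an assumption.

```python
import os

# Defaults transcribed from the list above; the paths are placeholders
# relative to the installed package, not resolved locations.
DEFAULTS = {
    "verbose": False,
    "logfn": os.devnull,  # '/dev/null' on POSIX, 'nul' on Windows
    "sampling_rate": 16000,
    "buffer_size": 2048,
    "no_search": False,
    "full_utt": False,
    "hmm": "model/en-us",
    "lm": "model/en-us.lm.bin",
    "dict": "model/cmudict-en-us.dict",
}

# Options where False is assumed to mean "do not load this resource".
DISABLEABLE = ("lm", "dict")


def resolve_options(**overrides):
    """Layer caller overrides on the defaults, dropping disabled resources."""
    options = dict(DEFAULTS, **overrides)
    for name in DISABLEABLE:
        if options.get(name) is False:
            del options[name]
    return options


opts = resolve_options(lm=False, keyphrase="forward", kws_threshold=1e-20)
print("lm" in opts, opts["keyphrase"], opts["buffer_size"])
```

This mirrors the keyword search examples, where lm=False is combined with keyphrase so that only the keyword search is active.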

Verbose

Send output to stdout:

from pocketsphinx import Pocketsphinx

ps = Pocketsphinx(verbose=True)
ps.decode()

print(ps.hypothesis())

Send output to a file:

from pocketsphinx import Pocketsphinx

ps = Pocketsphinx(verbose=True, logfn='pocketsphinx.log')
ps.decode()

print(ps.hypothesis())

Compatibility

Parent classes are still available:

import os
from pocketsphinx import DefaultConfig, Decoder, get_model_path, get_data_path

model_path = get_model_path()
data_path = get_data_path()

# Create a decoder with a certain model
config = DefaultConfig()
config.set_string('-hmm', os.path.join(model_path, 'en-us'))
config.set_string('-lm', os.path.join(model_path, 'en-us.lm.bin'))
config.set_string('-dict', os.path.join(model_path, 'cmudict-en-us.dict'))
decoder = Decoder(config)

# Decode streaming data
buf = bytearray(1024)
with open(os.path.join(data_path, 'goforward.raw'), 'rb') as f:
    decoder.start_utt()
    while f.readinto(buf):
        decoder.process_raw(buf, False, False)
    decoder.end_utt()
print('Best hypothesis segments:', [seg.word for seg in decoder.seg()])
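When streaming with readinto, the final read may fill only part of the buffer, leaving stale bytes at the end. The sketch below isolates the chunking step and yields only the bytes actually read; it uses an in-memory stream as a stand-in for an audio file, and iter_chunks is an illustrative helper, not part of pocketsphinx.

```python
import io


def iter_chunks(stream, chunk_size=1024):
    """Yield successive chunks of at most chunk_size bytes from a binary stream."""
    buf = bytearray(chunk_size)
    while True:
        n = stream.readinto(buf)
        if not n:
            break
        # Copy only the bytes that were actually read on this pass.
        yield bytes(buf[:n])


# Stand-in for an audio file: 2500 bytes gives chunks of 1024, 1024 and 452.
fake_audio = io.BytesIO(b"\x00" * 2500)
print([len(chunk) for chunk in iter_chunks(fake_audio)])  # [1024, 1024, 452]
```

In a decoding loop, each chunk would presumably be handed to decoder.process_raw between start_utt() and end_utt(), in place of the fixed-size buffer.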

Install development version

Install requirements

Windows requirements:

Ubuntu requirements:

sudo apt-get install -qq python python-dev python-pip build-essential swig git libpulse-dev libasound2-dev

Mac OS X requirements:

brew reinstall swig python

Install with pip

pip install https://github.com/bambocher/pocketsphinx-python/archive/master.zip

Install with distutils

git clone --recursive https://github.com/bambocher/pocketsphinx-python
cd pocketsphinx-python
python setup.py install

Projects using pocketsphinx-python

SpeechRecognition - Library for performing speech recognition, with support for several engines and APIs, online and offline.

License

The BSD License

Release history

5.0.0rc4 (pre-release): September 17, 2022
5.0.0rc3 (pre-release): September 7, 2022
0.1.15 (this version): June 3, 2018
0.1.12: June 3, 2018
0.1.11: June 3, 2018
0.1.10: June 3, 2018
0.1.9: June 2, 2018
0.1.7: June 1, 2018
0.1.5: June 1, 2018
0.1.3: September 12, 2016
0.1.0: June 4, 2016
0.0.9: November 24, 2015
0.0.8: November 2, 2015
