penelope - Penelope is a multi-purpose tool for creating, editing, and converting dictionaries, especially for eReader devices


License: MIT License

Tags: Dictionary, Dictionaries, Index, Merge, Flatten, eReader, eReaders, Bookeen, CSV, EPUB, MOBI, Kindle, Kobo, StarDict, XML, MARISA, kindlegen, dictzip

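Project description

Penelope is a multi-purpose tool for creating, editing, and converting dictionaries, especially for eReader devices.

Version: 3.1.3
Date: 2016-09-23
Developer: Alberto Pettarin
License: the MIT License (MIT)
Contact: click here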

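With the current version you can:

- convert a dictionary from/to the following formats (R = read, W = write):
  - Bookeen Cybook Odyssey (R/W)
  - CSV (R/W)
  - EPUB (W only)
  - MOBI (Kindle, W only)
  - Kobo (R index only, W unencrypted/unobfuscated only)
  - StarDict (R/W)
  - XML (R/W)
- merge multiple dictionaries of the same type into a single dictionary
- merge multiple definitions of the same headword
- sort by headword and/or by definition
- define your own input parser to merge/sort/edit definitions
- define your own collation function (bookeen output format only)
- output an EPUB file containing the dictionary (e.g., to cope with the lack of a search function on your eReader)
- output a MOBI (Kindle) dictionary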
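
To make "merging definitions" and "sorting by headword" concrete, here is a minimal Python sketch of what these operations mean on a list of (headword, definition) pairs. It is an illustration only, not Penelope's own code: the function names `merge_definitions` and `sort_entries` are hypothetical, and the only detail taken from the tool is the ' | ' default documented for --merge-separator.

```python
from collections import OrderedDict


def merge_definitions(entries, separator=" | "):
    """Join all definitions that share a headword, keeping first-seen order.

    Hypothetical helper: it mirrors the idea behind --merge-definitions and
    --merge-separator, not Penelope's actual implementation.
    """
    merged = OrderedDict()
    for headword, definition in entries:
        merged.setdefault(headword, []).append(definition)
    return [(headword, separator.join(defs)) for headword, defs in merged.items()]


def sort_entries(entries, ignore_case=False, reverse=False):
    """Sort (headword, definition) pairs by headword.

    Hypothetical helper illustrating --sort-by-headword, --sort-ignore-case
    and --sort-reverse.
    """
    if ignore_case:
        key = lambda entry: entry[0].lower()
    else:
        key = lambda entry: entry[0]
    return sorted(entries, key=key, reverse=reverse)


entries = [("cat", "gatto"), ("dog", "cane"), ("cat", "micio")]
merged = merge_definitions(entries)
```

With the sample data, `merged` contains one entry for "cat" whose definition is the two original definitions joined by the separator, followed by the untouched "dog" entry.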

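Important update

2016-04-17 Sadly, I can no longer spend time working on Penelope, since my other FLOSS projects take 100% of my FLOSS time, and I also need to pay rent and bills, spend time with family and friends, etc., like everybody else. Therefore, I will not work on issues or pull requests, so please do not expect them to be dealt with. I am actively looking for other developers to take over this project. (This notice should be removed when the transition happens.) If you need to convert dictionaries and the current version of Penelope does not work for you, you might want to have a look at **PyGlossary**. My sincere apologies for the inconvenience.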

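Installation

Using pip

Open a console and type:
```
$ [sudo] pip install penelope
```
That's it! Just run without arguments (or with -h or --help) to get the manual:
```
$ penelope
```

This procedure will install lxml and marisa-trie. You might need to install dictzip (StarDict output) and kindlegen (MOBI output) separately, see below.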

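From source code

Get the source code:
- clone this repo with git:
```
$ git clone https://github.com/pettarin/penelope.git
```
- or download the latest release and uncompress it somewhere,
- or download the current master ZIP and uncompress it somewhere.
Open a console and enter the penelope (cloned) directory:
```
$ cd /path/to/penelope
```
That's it! Just run without arguments (or with -h or --help) to get the manual:
```
$ python -m penelope
```

This procedure will not install any dependencies: you will need to install them manually, see below.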

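Dependencies

- Python, version 2.7.x or 3.4.x (or above)
- to write StarDict dictionaries: the dictzip executable, available in your $PATH or specified with --dictzip-path:
```
$ [sudo] apt-get install dictzip
```
- to read/write Kobo dictionaries: the Python module marisa-trie:
```
$ [sudo] pip install marisa-trie
```
  or the MARISA executables, available in your $PATH or specified with --marisa-bin-path
- to write MOBI Kindle dictionaries: the kindlegen executable, available in your $PATH or specified with --kindlegen-path
- to read/write XML dictionaries: the Python module lxml:
```
$ [sudo] pip install lxml
```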
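
Before converting, it can be handy to check which of the optional dependencies listed above are actually present. The following is a minimal sketch, assuming Python 3 (it relies on `shutil.which` and `importlib.util.find_spec`, which are not available on Python 2.7). The function name `check_dependencies` is hypothetical and not part of Penelope; `marisa_trie` is the import name of the marisa-trie package.

```python
import importlib.util
import shutil


def check_dependencies(executables=("dictzip", "kindlegen"),
                       modules=("lxml", "marisa_trie")):
    """Return a dict mapping each dependency name to True if it was found.

    Executables are looked up in $PATH; Python modules are looked up
    without importing them.
    """
    report = {}
    for name in executables:
        report[name] = shutil.which(name) is not None
    for name in modules:
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for name, found in sorted(check_dependencies().items()):
        print("%-12s %s" % (name, "found" if found else "MISSING"))
```

An executable reported as missing can still be used if you pass its location explicitly with --dictzip-path, --kindlegen-path, or --marisa-bin-path.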

Usage

usage:
  $ penelope -h
  $ penelope -i INPUT_FILE -j INPUT_FORMAT -f LANGUAGE_FROM -t LANGUAGE_TO -p OUTPUT_FORMAT -o OUTPUT_FILE [OPTIONS]
  $ penelope -i IN1,IN2[,IN3...] -j INPUT_FORMAT -f LANGUAGE_FROM -t LANGUAGE_TO -p OUTPUT_FORMAT -o OUTPUT_FILE [OPTIONS]

description:
  Convert dictionary file(s) with file name prefix INPUT_FILE from format INPUT_FORMAT to format OUTPUT_FORMAT, saving it as OUTPUT_FILE.
  The dictionary is from LANGUAGE_FROM to LANGUAGE_TO, possibly the same.
  You can merge several dictionaries (with the same format), by providing a list of comma-separated prefixes, as shown by the third synopsis above.

optional arguments:
  -h, --help            show this help message and exit
  -d, --debug           enable debug mode (default: False)
  -f LANGUAGE_FROM, --language-from LANGUAGE_FROM
                        from language (ISO 639-1 code)
  -i INPUT_FILE, --input-file INPUT_FILE
                        input file name prefix(es). Multiple prefixes must be
                        comma-separated.
  -j INPUT_FORMAT, --input-format INPUT_FORMAT
                        from format (values: bookeen|csv|kobo|stardict|xml)
  -k, --keep            keep temporary files (default: False)
  -o OUTPUT_FILE, --output-file OUTPUT_FILE
                        output file name
  -p OUTPUT_FORMAT, --output-format OUTPUT_FORMAT
                        to format (values:
                        bookeen|csv|epub|kobo|mobi|stardict|xml)
  -t LANGUAGE_TO, --language-to LANGUAGE_TO
                        to language (ISO 639-1 code)
  -v, --version         print version and exit
  --author AUTHOR       author string
  --copyright COPYRIGHT
                        copyright string
  --cover-path COVER_PATH
                        path of the cover image file
  --description DESCRIPTION
                        description string
  --email EMAIL         email string
  --identifier IDENTIFIER
                        identifier string
  --license LICENSE     license string
  --title TITLE         title string
  --website WEBSITE     website string
  --year YEAR           year string
  --apply-css APPLY_CSS
                        apply the given CSS file (epub and mobi output only)
  --bookeen-collation-function BOOKEEN_COLLATION_FUNCTION
                        use the specified collation function
  --bookeen-install-file
                        create *.install file (default: False)
  --csv-fs CSV_FS       CSV field separator (default: ',')
  --csv-ignore-first-line
                        ignore the first line of the input CSV file(s)
                        (default: False)
  --csv-ls CSV_LS       CSV line separator (default: '\n')
  --dictzip-path DICTZIP_PATH
                        path to dictzip executable
  --epub-no-compress    do not create the compressed container (epub output
                        only, default: False)
  --escape-strings      escape HTML strings (default: False)
  --flatten-synonyms    flatten synonyms, creating a new entry with
                        headword=synonym and using the definition of the
                        original headword (default: False)
  --group-by-prefix-function GROUP_BY_PREFIX_FUNCTION
                        compute the prefix of headwords using the given prefix
                        function file
  --group-by-prefix-length GROUP_BY_PREFIX_LENGTH
                        group headwords by prefix of given length (default: 2)
  --group-by-prefix-merge-across-first
                        merge headword groups even when the first character
                        changes (default: False)
  --group-by-prefix-merge-min-size GROUP_BY_PREFIX_MERGE_MIN_SIZE
                        merge headword groups until the given minimum number
                        of headwords is reached (default: 0, meaning no merge
                        will take place)
  --ignore-case         ignore headword case, all headwords will be lowercased
                        (default: False)
  --ignore-synonyms     ignore synonyms, not reading/writing them if present
                        (default: False)
  --include-index-page  include an index page (epub and mobi output only,
                        default: False)
  --input-file-encoding INPUT_FILE_ENCODING
                        use the specified encoding for reading the raw
                        contents of input file(s) (default: 'utf-8')
  --input-parser INPUT_PARSER
                        use the specified parser function after reading the
                        raw contents of input file(s)
  --kindlegen-path KINDLEGEN_PATH
                        path to kindlegen executable
  --marisa-bin-path MARISA_BIN_PATH
                        path to MARISA bin directory
  --marisa-index-size MARISA_INDEX_SIZE
                        maximum size of the MARISA index (default: 1000000)
  --merge-definitions   merge definitions for the same headword (default:
                        False)
  --merge-separator MERGE_SEPARATOR
                        add this string between merged definitions (default: '
                        | ')
  --mobi-no-kindlegen   do not run kindlegen, keep .opf and .html files
                        (default: False)
  --no-definitions      do not output definitions for EPUB and MOBI formats
                        (default: False)
  --sd-ignore-sametypesequence
                        ignore the value of sametypesequence in StarDict .ifo
                        files (default: False)
  --sd-no-dictzip       do not compress the .dict file in StarDict files
                        (default: False)
  --sort-after          sort after merging/flattening (default: False)
  --sort-before         sort before merging/flattening (default: False)
  --sort-by-definition  sort by definition (default: False)
  --sort-by-headword    sort by headword (default: False)
  --sort-ignore-case    ignore case when sorting (default: False)
  --sort-reverse        reverse the sort order (default: False)

examples:

  $ penelope -i dict.csv -j csv -f en -t it -p stardict -o output.zip
    Convert en->it dictionary dict.csv (in CSV format) into output.zip (in StarDict format)

  $ penelope -i dict.csv -j csv -f en -t it -p stardict -o output.zip --merge-definitions
    As above, but also merge definitions

  $ penelope -i d1,d2,d3 -j csv -f en -t it -p csv -o output.csv --sort-after --sort-by-headword
    Merge CSV dictionaries d1, d2, and d3 into output.csv, sorting by headword

  $ penelope -i d1,d2,d3 -j csv -f en -t it -p csv -o output.csv --sort-after --sort-by-headword --sort-ignore-case
    As above, but ignore case for sorting

  $ penelope -i d1,d2,d3 -j csv -f en -t it -p csv -o output.csv --sort-after --sort-by-headword --sort-reverse
    As above, but reverse the order

  $ penelope -i dict.zip -j stardict -f en -t it -p csv -o output.csv
    Convert en->it dictionary dict.zip (in StarDict format) into output.csv (in CSV format)

  $ penelope -i dict.zip -j stardict -f en -t it -p csv -o output.csv --ignore-synonyms
    As above, but do not read the .syn synonym file if present

  $ penelope -i dict.zip -j stardict -f en -t it -p csv -o output.csv --flatten-synonyms
    As above, but flatten synonyms

  $ penelope -i dict.zip -j stardict -f en -t it -p bookeen -o output
    Convert dict.zip into output.dict.idx and output.dict for Bookeen devices

  $ penelope -i dict.zip -j stardict -f en -t it -p kobo -o dicthtml-en-it
    Convert dict.zip into dicthtml-en-it.zip for Kobo devices

  $ penelope -i dict.csv -j csv -f en -t it -p mobi -o output.mobi --cover-path mycover.png --title "My English->Italian Dictionary"
    Convert dict.csv into a MOBI (Kindle) dictionary, using the specified cover image and title

  $ penelope -i dict.xml -j xml -f en -t it -p epub -o output.epub
    Convert dict.xml into an EPUB dictionary

  $ penelope -i dict.xml -j xml -f en -t it -p epub -o output.epub --no-definitions
    As above, but do not output definitions

You can find the ISO 639-1 language codes here.
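
The synopsis above can also be driven from a script. The sketch below writes a tiny CSV dictionary and assembles the matching command line, using only flags from the option list above. It assumes that a CSV dictionary holds one "headword,definition" pair per line with the default separators (',' and '\n'); that layout is an assumption here, so check it against your own files. The helper names `write_csv_dictionary` and `build_command` are hypothetical and not part of Penelope.

```python
def write_csv_dictionary(path, entries, field_separator=",", line_separator="\n"):
    """Write (headword, definition) pairs, one entry per line.

    Assumed layout: headword, field separator, definition. Entries whose
    headword contains a separator are rejected rather than quoted.
    """
    with open(path, "w", encoding="utf-8", newline="") as handle:
        for headword, definition in entries:
            if field_separator in headword or line_separator in headword + definition:
                raise ValueError("separator found inside entry: %r" % headword)
            handle.write(headword + field_separator + definition + line_separator)


def build_command(input_file, input_format, language_from, language_to,
                  output_format, output_file, extra_options=()):
    """Assemble the argument list for the synopsis shown above."""
    command = [
        "penelope",
        "-i", input_file,
        "-j", input_format,
        "-f", language_from,
        "-t", language_to,
        "-p", output_format,
        "-o", output_file,
    ]
    command.extend(extra_options)
    return command


command = build_command("dict.csv", "csv", "en", "it", "stardict", "output.zip",
                        extra_options=["--merge-definitions"])
```

With Penelope installed, the resulting list can be passed to `subprocess.run(command, check=True)`; passing a list rather than a single string avoids shell quoting issues with titles or paths that contain spaces.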

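Installing the dictionaries

Bookeen Odyssey devices

For example, suppose you want to use an IT -> EN dictionary.

1. On your PC, produce/download the IT -> EN dictionary files it-en.dict and it-en.dict.idx.
2. Connect your Odyssey device to your PC via the USB cable.
3. Using your file manager, copy the two files it-en.dict and it-en.dict.idx from your PC to the Dictionaries/ directory on your Odyssey device.
4. Reboot your Odyssey, open a book in Italian and select a word: the definition in English should appear. (For this test, select a common word, to be sure it is in the dictionary!)

Note that the Bookeen dictionary software selects the dictionary to use by reading the dc:language metadata of your eBook. Make sure your eBooks have the correct dc:language metadata, otherwise the correct dictionary might not be loaded.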

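Kobo devices

At the time of this writing (2016-02-16), Kobo devices load a dictionary only if the file has the file name of one of the official Kobo dictionaries, namely: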

dicthtml.zip (EN)
dicthtml-de.zip (DE), dicthtml-de-en.zip (DE -> EN), dicthtml-en-de.zip (EN -> DE),
dicthtml-es.zip (ES), dicthtml-es-en.zip (ES -> EN), dicthtml-en-es.zip (EN -> ES),
dicthtml-fr.zip (FR), dicthtml-fr-en.zip (FR -> EN), dicthtml-en-fr.zip (EN -> FR),
dicthtml-it.zip (IT), dicthtml-it-en.zip (IT -> EN), dicthtml-en-it.zip (EN -> IT),
dicthtml-nl.zip (NL)
dicthtml-ja.zip (JA), dicthtml-en-ja.zip (EN -> JA),
dicthtml-pt.zip (PT), dicthtml-pt-en.zip (PT -> EN), dicthtml-en-pt.zip (EN -> PT)

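(See this MobileRead thread.)

Therefore, if you want to install a custom dictionary produced with Penelope, you must overwrite one of the official Kobo dictionaries, which means you can no longer use the official one.

For example, suppose you want to use a Polish dictionary (dicthtml-pl.zip), and you are not interested in using the official Portuguese dictionary (dicthtml-pt.zip).

1. On your PC, produce/download the Polish dictionary dicthtml-pl.zip.
2. On your Kobo device, go to the settings and activate the Portuguese dictionary.
3. Connect your Kobo device to your PC via the USB cable.
4. Using your file manager, copy dicthtml-pl.zip from your PC to the .kobo/dict/ directory on your Kobo device. (Note that .kobo is a hidden directory: you might need to enable the "show hidden files/directories" setting of your file manager.)
5. Rename dicthtml-pl.zip to dicthtml-pt.zip.
6. Reboot your Kobo, open a book in Polish and select a word: the definition should appear. (For this test, select a common word, to be sure it is in the dictionary!)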
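
The copy-and-rename steps above can be scripted once the device is mounted. This is a minimal sketch, assuming Python 3 and that you pass the mount point of the Kobo yourself (it differs per system). The function name `install_kobo_dictionary` is hypothetical, and the set of accepted names is simply the list of official file names given earlier in this section.

```python
import shutil
from pathlib import Path

# Official Kobo dictionary file names, copied from the list in this section.
KOBO_OFFICIAL_NAMES = {
    "dicthtml.zip",
    "dicthtml-de.zip", "dicthtml-de-en.zip", "dicthtml-en-de.zip",
    "dicthtml-es.zip", "dicthtml-es-en.zip", "dicthtml-en-es.zip",
    "dicthtml-fr.zip", "dicthtml-fr-en.zip", "dicthtml-en-fr.zip",
    "dicthtml-it.zip", "dicthtml-it-en.zip", "dicthtml-en-it.zip",
    "dicthtml-nl.zip",
    "dicthtml-ja.zip", "dicthtml-en-ja.zip",
    "dicthtml-pt.zip", "dicthtml-pt-en.zip", "dicthtml-en-pt.zip",
}


def install_kobo_dictionary(custom_zip, kobo_root, official_name="dicthtml-pt.zip"):
    """Copy a custom dictionary into .kobo/dict/ under an official file name.

    Any existing file with that name is overwritten, as in the manual steps.
    Returns the path of the installed file.
    """
    if official_name not in KOBO_OFFICIAL_NAMES:
        raise ValueError("not an official Kobo dictionary name: %s" % official_name)
    dict_dir = Path(kobo_root) / ".kobo" / "dict"
    if not dict_dir.is_dir():
        raise FileNotFoundError("directory not found: %s" % dict_dir)
    target = dict_dir / official_name
    shutil.copyfile(str(custom_zip), str(target))
    return target
```

For the example above, the call would look like `install_kobo_dictionary("dicthtml-pl.zip", "/path/to/kobo")`, where the second argument is a placeholder for your actual mount point. Activating the Portuguese dictionary in the settings and rebooting the device still have to be done by hand.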

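Note that if you update the firmware of your Kobo, the custom dictionaries might be overwritten with the official ones. Hence, keep a backup copy of your custom dictionaries in a safe place, e.g. your PC or an SD card.

You can find a list of custom dictionaries, mostly made with Penelope, in this MobileRead thread.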

License

Penelope has been released under the MIT License since version 2.0.0 (2014-06-30).

Previous versions, hosted on Google Code, were released under the GNU GPL 3 License.

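Limitations and missing features

- Bookeen has no official documentation for its dictionary format (it has been reverse-engineered), YMMV
- Kobo has no official documentation for its dictionary format (it has been reverse-engineered), YMMV
- Reading Kobo dictionaries is only partially supported (the index is read, the definitions are not, since they are encrypted/obfuscated)
- Reading EPUB (3) dictionaries is not supported; the writing part needs polishing/refactoring
- Reading PRC/MOBI (Kindle) dictionaries is not supported
- There are some limitations on the StarDict files that can be read (see the comments in format_stardict.py)
- Documentation is not complete
- Unit tests are missing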

Sponsors

December 2015: IngleseXpress.it, "Grazie per averci aiutato a pubblicare per Kindle il Dizionario Inglese-Italiano della Pronuncia Scritta Semplificata!"

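Acknowledgments

Many thanks to:

- uwelovesdonna for contributing ideas to improve the code and for setting up many pages of the project wiki;
- Jens Sadowski for pointing out the bug with Unicode file names and suggesting the use of multiset dict() instead of set dict();
- oldnat for pointing out bugs under Windows and Python 3;
- Wolfgang Miller-Reichling for providing the code for reading CSV dictionaries;
- branok for providing the idea and the initial code for the German collation function;
- pal for suggesting the -l switch to MARISA_BUILD;
- Lukas Brückner for suggesting escaping & < > when outputting in XML format;
- Stephan Lichtenhagen for suggesting forcing UTF-8 encoding on Python 3;
- niconavarrete for pointing out the dependency on $CWD (issue #1), solved in v2.0.1;
- elchamaco for providing StarDict dictionaries with .syn files for testing.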


Release history

- 3.1.3.0 (this version): 2016-09-23
- 3.1.2.0: 2016-02-16
- 3.1.1.1: 2015-12-02
- 3.1.0.1: 2015-11-29
- 3.0.1.11: 2015-11-24
- 3.0.1.10: 2015-11-24
- 3.0.1.9: 2015-11-24
- 3.0.0.1: 2015-11-24

Download files

Source distribution

penelope-3.1.3.0.tar.gz (51.9 kB), uploaded 2016-09-23

Hashes for penelope-3.1.3.0.tar.gz
Algorithm	Hash digest
SHA256	`74cd0d464ff1359dd1413ecf2841552d1ece771780b168dde3cca208fb942c6e`
MD5	`ea2b6f24a2d2ad91cbbde4c1be9170c9`
BLAKE2-256	`2d75c3bf00036ce60558aaa59961e924786d55e6c86d7e9a89265bb45194bb90`
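
If you download the source archive manually, you can compare its digest with the SHA256 value listed above. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `sha256_of_file` and `matches_expected` are hypothetical, and the file is assumed to be in the current directory under its published name.

```python
import hashlib

# SHA256 digest published above for penelope-3.1.3.0.tar.gz.
EXPECTED_SHA256 = "74cd0d464ff1359dd1413ecf2841552d1ece771780b168dde3cca208fb942c6e"


def sha256_of_file(path, chunk_size=65536):
    """Return the hexadecimal SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected value."""
    return sha256_of_file(path) == expected.lower()
```

For example, `matches_expected("penelope-3.1.3.0.tar.gz")` should return True for an intact download; a False result means the file differs from the published one and should not be installed.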
