ubelt - A Python utility belt containing simple tools, a stdlib-like feel, and extra batteries.

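Project description

Ubelt is a small library of robust, tested, documented, and simple functions that extend the Python standard library. It has a flat API that behaves similarly on Windows, Mac, and Linux (up to some small unavoidable differences). Almost every function in ubelt was written with a doctest. This provides helpful documentation and example usage, and it helps achieve 100% test coverage (with minor exceptions on Windows).

Goal: provide simple functions that accomplish common tasks not yet addressed by the Python standard library.
Constraints: must be low-impact pure Python; it should be easy to install and use.
Method: all functions are written with docstrings and doctests to ensure that a baseline level of documentation and testing always exists (even if functions are copy/pasted into other libraries).
Motto: good utilities lift all codes.

Read the docs here: http://ubelt.readthedocs.io/en/latest/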

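These are some of the tasks that ubelt's API enables:

Extended pathlib with expand, ensuredir, endswith, augment, delete (ub.Path)

Get paths to cross-platform data/cache/config directories (ub.Path.appdir, ...)

Perform set operations on dictionaries (SetDict)

A dictionary with extended helper methods such as subdict, take, peek_value, invert, sorted_keys, sorted_vals (UDict)

Hash common data structures like list, dict, int, str, etc. (hash_data)

Hash files (hash_file)

Cache a block of code (Cacher, CacheStamp)

Time a block of code (Timer)

Show loop progress with less overhead than tqdm (ProgIter)

Download a file with optional caching and hash verification (download, grabdata)

Run shell commands (cmd)

Find a file or directory in candidate locations (find_path, find_exe)

String-format nested data structures (repr2)

Color text with ANSI tags (color_text)

Horizontally concatenate multiline strings (hzcat)

Create cross-platform symlinks (symlink)

Import a module using the path to that module (import_module_from_path)

Check if a particular flag or value is on the command line (argflag, argval)

Memoize functions (memoize, memoize_method, memoize_property)

Build ordered sets (oset)

Argmax/min/sort on lists and dictionaries (argmax, argmin, argsort)

Get a histogram of items or find duplicates in a list (dict_hist, find_duplicates)

Group a sequence of items by some criterion (group_items)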
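
To make the last two tasks in the list concrete, here is a minimal stdlib-only sketch of what "histogram", "find duplicates", and "group by a criterion" mean. The helper names (hist_sketch, duplicates_sketch, group_sketch) are hypothetical and are not ubelt functions; the real dict_hist, find_duplicates, and group_items may differ in signature and options.

```python
from collections import Counter, defaultdict


def hist_sketch(items):
    """Count how many times each item occurs."""
    return dict(Counter(items))


def duplicates_sketch(items):
    """Map each repeated item to the list of indices where it occurs."""
    positions = defaultdict(list)
    for index, item in enumerate(items):
        positions[item].append(index)
    return {item: idxs for item, idxs in positions.items() if len(idxs) > 1}


def group_sketch(items, key):
    """Group items by the value that ``key(item)`` returns."""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return dict(groups)


words = ['ham', 'jam', 'eggs', 'spam', 'ham']
hist = hist_sketch(words)
dups = duplicates_sketch(words)
by_len = group_sketch(words, key=len)
print(hist)
print(dups)
print(by_len)
```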

Ubelt is small. Its top-level API is defined using roughly 40 lines:

from ubelt.util_arg import (argflag, argval,)
from ubelt.util_cache import (CacheStamp, Cacher,)
from ubelt.util_colors import (NO_COLOR, color_text, highlight_code,)
from ubelt.util_const import (NoParam,)
from ubelt.util_cmd import (cmd,)
from ubelt.util_dict import (AutoDict, AutoOrderedDict, SetDict, UDict, ddict,
                             dict_diff, dict_hist, dict_isect, dict_subset,
                             dict_union, dzip, find_duplicates, group_items,
                             invert_dict, map_keys, map_vals, map_values,
                             named_product, odict, sdict, sorted_keys,
                             sorted_vals, sorted_values, udict, varied_values,)
from ubelt.util_deprecate import (schedule_deprecation,)
from ubelt.util_download import (download, grabdata,)
from ubelt.util_download_manager import (DownloadManager,)
from ubelt.util_func import (compatible, identity, inject_method,)
from ubelt.util_format import (FormatterExtensions, repr2,)
from ubelt.util_futures import (Executor, JobPool,)
from ubelt.util_io import (delete, touch,)
from ubelt.util_links import (symlink,)
from ubelt.util_list import (allsame, argmax, argmin, argsort, argunique,
                             boolmask, chunks, compress, flatten, iter_window,
                             iterable, peek, take, unique, unique_flags,)
from ubelt.util_hash import (hash_data, hash_file,)
from ubelt.util_import import (import_module_from_name,
                               import_module_from_path, modname_to_modpath,
                               modpath_to_modname, split_modpath,)
from ubelt.util_indexable import (IndexableWalker, indexable_allclose,)
from ubelt.util_memoize import (memoize, memoize_method, memoize_property,)
from ubelt.util_mixins import (NiceRepr,)
from ubelt.util_path import (Path, TempDir, augpath, ensuredir, expandpath,
                             shrinkuser, userhome,)
from ubelt.util_platform import (DARWIN, LINUX, POSIX, WIN32, find_exe,
                                 find_path, platform_cache_dir,
                                 platform_config_dir, platform_data_dir,)
from ubelt.util_str import (codeblock, hzcat, indent, paragraph,)
from ubelt.util_stream import (CaptureStdout, CaptureStream, TeeStringIO,)
from ubelt.util_time import (Timer, timeparse, timestamp,)
from ubelt.util_zip import (split_archive, zopen,)
from ubelt.orderedset import (OrderedSet, oset,)
from ubelt.progiter import (ProgIter,)

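Installation:

Ubelt is distributed on pypi as a universal wheel and can be pip installed on Python 3.6+. Installations are tested on CPython and PyPy implementations. For Python 2.7 and 3.5, the last supported version was 0.11.1.

pip install ubelt

Note that our distributions on pypi are signed with GPG. The public key ID is D297D757; this should agree with the value in dev/public_gpg_key.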

Function Usefulness

When I had to hand-pick a set of functions that I thought were the most useful, I chose these and provided some comments on why:

import ubelt as ub

ub.Path  # inherits from pathlib.Path with quality of life improvements
ub.UDict  # inherits from dict with keywise set operations and quality of life improvements
ub.Cacher  # configuration based on-disk caching
ub.CacheStamp  # indirect caching with corruption detection
ub.hash_data  # hash mutable python containers, useful with Cacher to config strings
ub.cmd  # combines the best of subprocess.Popen and os.system
ub.download  # download a file with a single command. Also see grabdata for the same thing, but caching from CacheStamp.
ub.JobPool   # easy multi-threading / multi-processing / or single-threaded processing
ub.ProgIter  # a minimal progress iterator. It's single threaded, informative, and faster than tqdm.
ub.memoize  # like ``functools.cache``, but uses ub.hash_data if the args are not hashable.
ub.repr2  # readable representations of nested data structures

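But a better way might be to objectively measure the frequency of usage and build a histogram of usefulness. I generated this histogram using python dev/gen_api_for_docs.py, which roughly counts the number of times I've used a ubelt function in another project. Note: this measure is biased towards older functions.

Function name	Usefulness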
ubelt.repr2	2384
ubelt.Path	624
ubelt.ProgIter	539
ubelt.expandpath	419
ubelt.paragraph	358
ubelt.take	342
ubelt.cmd	283
ubelt.codeblock	273
ubelt.ensuredir	252
ubelt.map_vals	248
ubelt.odict	234
ubelt.ddict	225
ubelt.flatten	218
ubelt.peek	202
ubelt.NiceRepr	195
ubelt.group_items	192
ubelt.oset	182
ubelt.dzip	169
ubelt.iterable	159
ubelt.dict_isect	157
ubelt.NoParam	154
ubelt.hash_data	141
ubelt.argflag	136
ubelt.dict_diff	129
ubelt.Timer	125
ubelt.augpath	120
ubelt.dict_hist	115
ubelt.grabdata	114
ubelt.color_text	104
ubelt.identity	102
ubelt.delete	99
ubelt.argval	93
ubelt.dict_union	90
ubelt.memoize	89
ubelt.compress	87
ubelt.allsame	81
ubelt.unique	64
ubelt.named_product	61
ubelt.hzcat	61
ubelt.invert_dict	61
ubelt.JobPool	60
ubelt.timestamp	48
ubelt.dict_subset	46
ubelt.Cacher	44
ubelt.indent	44
ubelt.argsort	43
ubelt.IndexableWalker	41
ubelt.writeto	41
ubelt.iter_window	40
ubelt.chunks	39
ubelt.hash_file	38
ubelt.find_duplicates	38
ubelt.map_keys	36
ubelt.symlink	34
ubelt.sorted_vals	33
ubelt.find_exe	32
ubelt.memoize_property	31
ubelt.modname_to_modpath	29
ubelt.WIN32	28
ubelt.CacheStamp	27
ubelt.import_module_from_name	25
ubelt.argmax	23
ubelt.highlight_code	23
ubelt.varied_values	22
ubelt.readfrom	22
ubelt.import_module_from_path	21
ubelt.compatible	20
ubelt.memoize_method	20
ubelt.sorted_keys	20
ubelt.Executor	19
ubelt.touch	17
ubelt.AutoDict	13
ubelt.inject_method	13
ubelt.zopen	11
ubelt.shrinkuser	11
ubelt.userhome	8
ubelt.schedule_deprecation	8
ubelt.LINUX	8
ubelt.split_modpath	7
ubelt.modpath_to_modname	7
ubelt.CaptureStdout	5
ubelt.DARWIN	5
ubelt.argmin	4
ubelt.download	3
ubelt.find_path	2
ubelt.AutoOrderedDict	2
ubelt.argunique	1
ubelt.unique_flags	1
ubelt.udict	0
ubelt.timeparse	0
ubelt.split_archive	0
ubelt.sorted_values	0
ubelt.sdict	0
ubelt.platform_data_dir	0
ubelt.platform_config_dir	0
ubelt.platform_cache_dir	0
ubelt.map_values	0
ubelt.indexable_allclose	0
ubelt.boolmask	0
ubelt.UDict	0
ubelt.TempDir	0
ubelt.TeeStringIO	0
ubelt.SetDict	0
ubelt.POSIX	0
ubelt.OrderedSet	0
ubelt.NO_COLOR	0
ubelt.FormatterExtensions	0
ubelt.DownloadManager	0
ubelt.CaptureStream	0
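
The usage counts above come from a script whose contents are not shown here. As a rough illustration of the idea only, the following sketch counts `ub.<name>` attribute accesses in a piece of source text with a regular expression. The function name count_ub_usage and the sample source are assumptions made for this example; the actual dev/gen_api_for_docs.py may work quite differently.

```python
import re
from collections import Counter


def count_ub_usage(source_text):
    """Count occurrences of ``ub.<name>`` in a string of Python source."""
    pattern = re.compile(r'\bub\.([A-Za-z_][A-Za-z0-9_]*)')
    return Counter(pattern.findall(source_text))


# Hypothetical source text standing in for "another project"
example_source = '''
import ubelt as ub
print(ub.repr2({'a': 1}))
dpath = ub.Path('~/data').expand()
print(ub.repr2([1, 2, 3]))
'''

usage = count_ub_usage(example_source)
for name, count in usage.most_common():
    print(f'ubelt.{name}\t{count}')
```

A text-based count like this also matches names inside comments and strings, which is one reason such a measure is only approximate.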

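Examples

The most up-to-date examples are the doctests. We also have a Jupyter notebook: https://github.com/Erotemic/ubelt/blob/main/docs/notebooks/Ubelt%20Demo.ipynb

Here are some examples of features inside ubelt.

Paths

Ubelt extends pathlib.Path by adding several new (often chainable) methods. Namely: augment, delete, expand, ensuredir, and shrinkuser. It also modifies the behavior of touch to be chainable. (New in 1.0.0)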

>>> # Ubelt extends pathlib functionality
>>> import ubelt as ub
>>> dpath = ub.Path('~/.cache/ubelt/demo_path').expand().ensuredir()
>>> fpath = dpath / 'text_file.txt'
>>> aug_fpath = fpath.augment(suffix='.aux', ext='.jpg').touch()
>>> aug_dpath = dpath.augment('demo_path2')
>>> assert aug_fpath.read_text() == ''
>>> fpath.write_text('text data')
>>> assert aug_fpath.exists()
>>> assert not aug_fpath.delete().exists()
>>> assert dpath.exists()
>>> assert not dpath.delete().exists()
>>> print(f'{fpath.shrinkuser()}')
>>> print(f'{dpath.shrinkuser()}')
>>> print(f'{aug_fpath.shrinkuser()}')
>>> print(f'{aug_dpath.shrinkuser()}')
~/.cache/ubelt/demo_path/text_file.txt
~/.cache/ubelt/demo_path
~/.cache/ubelt/demo_path/text_file.aux.jpg
~/.cache/ubelt/demo_pathdemo_path2

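Hashing

The ub.hash_data function constructs a hash for common Python nested data structures. Extensions that allow it to hash custom types can be registered. By default it handles lists, dicts, sets, slices, uuids, and numpy arrays.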

>>> import ubelt as ub
>>> data = [('arg1', 5), ('lr', .01), ('augmenters', ['flip', 'translate'])]
>>> ub.hash_data(data, hasher='sha256')
0d95771ff684756d7be7895b5594b8f8484adecef03b46002f97ebeb1155fb15

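Support for torch tensors and pandas data frames is also included, but needs to be explicitly enabled. There also exists a non-public plugin architecture to extend this function to arbitrary types. While not officially supported, it is usable and will become better integrated in the future. See ubelt/util_hash.py for details.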
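
To illustrate the general idea of hashing nested containers, here is a minimal stdlib-only sketch. It is not ubelt's algorithm: the function name hash_nested_sketch and its encoding scheme are assumptions for this example, and its digests will not match those produced by ub.hash_data.

```python
import hashlib


def _update(hasher, data):
    """Feed a type-tagged, order-stable encoding of ``data`` into ``hasher``."""
    if isinstance(data, (list, tuple)):
        # Sequences: order matters (lists and tuples are treated alike here)
        hasher.update(b'SEQ[')
        for item in data:
            _update(hasher, item)
        hasher.update(b']')
    elif isinstance(data, (set, frozenset)):
        # Sets: sort by repr so iteration order does not affect the digest
        hasher.update(b'SET[')
        for item in sorted(data, key=repr):
            _update(hasher, item)
        hasher.update(b']')
    elif isinstance(data, dict):
        # Dicts: sort keys by repr so insertion order does not matter
        hasher.update(b'DICT[')
        for key in sorted(data, key=repr):
            _update(hasher, key)
            _update(hasher, data[key])
        hasher.update(b']')
    else:
        # Leaves: tag with the type name so 1 and '1' hash differently
        token = f'{type(data).__name__}:{data!r};'
        hasher.update(token.encode('utf8'))


def hash_nested_sketch(data, hasher='sha256'):
    """Return a hex digest for nested builtin containers."""
    h = hashlib.new(hasher)
    _update(h, data)
    return h.hexdigest()


digest_a = hash_nested_sketch({'lr': 0.01, 'arg1': 5})
digest_b = hash_nested_sketch({'arg1': 5, 'lr': 0.01})
digest_c = hash_nested_sketch(['flip', 'translate'])
digest_d = hash_nested_sketch(['translate', 'flip'])
print(digest_a == digest_b)  # same dict contents, different insertion order
print(digest_c == digest_d)  # same list items, different order
```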

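Caching

Cache intermediate results from blocks of code inside a script with minimal boilerplate or modification to the original code.

For direct caching of data, use the Cacher class. By default, results are written to ubelt's appdir cache, but the exact location can be specified via the dpath or appname arguments. Additionally, process dependencies can be specified via the depends argument, which allows for implicit cache invalidation. As far as I can tell, this is the most concise way (4 lines of boilerplate) to cache a block of code with existing Python syntax (as of 2022-06-03).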

>>> import ubelt as ub
>>> depends = ['config', {'of': 'params'}, 'that-uniquely-determine-the-process']
>>> cacher = ub.Cacher('test_process', depends=depends, appname='myapp')
>>> # start fresh
>>> cacher.clear()
>>> for _ in range(2):
>>>     data = cacher.tryload()
>>>     if data is None:
>>>         myvar1 = 'result of expensive process'
>>>         myvar2 = 'another result'
>>>         data = myvar1, myvar2
>>>         cacher.save(data)
>>> myvar1, myvar2 = data
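
The role of the depends argument can be sketched with the standard library alone. The class below, TinyCacher, is a hypothetical stand-in written for this example and is not ubelt's Cacher; it only shows how hashing the dependencies into the cache file name gives implicit invalidation when a dependency changes.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


class TinyCacher:
    """Hypothetical on-disk cache keyed by a name and its dependencies."""

    def __init__(self, name, depends, dpath):
        # repr() is only a stable key for simple builtin values (assumption)
        key = hashlib.sha256(repr(depends).encode('utf8')).hexdigest()[:16]
        self.fpath = Path(dpath) / f'{name}_{key}.pkl'

    def tryload(self):
        """Return the cached data, or None on a cache miss."""
        if self.fpath.exists():
            return pickle.loads(self.fpath.read_bytes())
        return None

    def save(self, data):
        self.fpath.parent.mkdir(parents=True, exist_ok=True)
        self.fpath.write_bytes(pickle.dumps(data))


calls = []


def expensive():
    calls.append(1)
    return 'result of expensive process'


with tempfile.TemporaryDirectory() as dpath:
    for _ in range(2):
        cacher = TinyCacher('test_process', depends=['config', 1], dpath=dpath)
        data = cacher.tryload()
        if data is None:
            data = expensive()
            cacher.save(data)
    # Changing a dependency changes the file name, so this lookup misses
    other = TinyCacher('test_process', depends=['config', 2], dpath=dpath)
    missed = other.tryload() is None

print(len(calls), missed)
```

Like the example above it, this sketch treats None as the cache-miss signal, so a legitimately cached None would be recomputed every time.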

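For indirect caching, use the CacheStamp class. This simply writes a "stamp" file that marks that a process has completed. Additionally, you can specify criteria for when the stamp should expire. If you let CacheStamp know about the expected "product", it will expire the stamp if that file has changed, which can be useful in situations where caches might become corrupt or need invalidation.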

>>> import ubelt as ub
>>> dpath = ub.Path.appdir('ubelt/demo/cache').delete().ensuredir()
>>> params = {'params1': 1, 'param2': 2}
>>> expected_fpath = dpath / 'file.txt'
>>> stamp = ub.CacheStamp('name', dpath=dpath, depends=params,
>>>                      hasher='sha256', product=expected_fpath,
>>>                      expires='2101-01-01T000000Z', verbose=3)
>>> # Start fresh
>>> stamp.clear()
>>>
>>> for _ in range(2):
>>>     if stamp.expired():
>>>         expected_fpath.write_text('expensive process')
>>>         stamp.renew()

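See https://ubelt.readthedocs.io/en/latest/ubelt.util_cache.html for more details about Cacher and CacheStamp.

Loop Progress

ProgIter is a no-threads-attached progress meter that writes to stdout. It is a mostly drop-in alternative to tqdm. The advantage of ProgIter is that it does not use any Python threads, and therefore can be safer with code that makes heavy use of multiprocessing.

Note: ProgIter is also defined in a standalone module: pip install progiter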

>>> import ubelt as ub
>>> def is_prime(n):
...     return n >= 2 and not any(n % i == 0 for i in range(2, n))
>>> for n in ub.ProgIter(range(1000), verbose=2):
>>>     # do some work
>>>     is_prime(n)
    0/1000... rate=0.00 Hz, eta=?, total=0:00:00, wall=14:05 EST
    1/1000... rate=82241.25 Hz, eta=0:00:00, total=0:00:00, wall=14:05 EST
  257/1000... rate=177204.69 Hz, eta=0:00:00, total=0:00:00, wall=14:05 EST
  642/1000... rate=94099.22 Hz, eta=0:00:00, total=0:00:00, wall=14:05 EST
 1000/1000... rate=71886.74 Hz, eta=0:00:00, total=0:00:00, wall=14:05 EST
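
What "no threads attached" means can be shown with a small generator: all bookkeeping happens in the calling thread, between iterations. The function tiny_progress below is a hypothetical sketch written for this example; it is not how ProgIter is implemented and omits features such as ETA estimation and adaptive report frequency.

```python
import io
import sys
import time


def tiny_progress(iterable, total=None, stream=sys.stdout, freq=1):
    """Yield items and write a progress line every ``freq`` iterations."""
    if total is None:
        try:
            total = len(iterable)
        except TypeError:
            total = None  # length unknown, e.g. for a generator
    start = time.perf_counter()
    for index, item in enumerate(iterable, start=1):
        yield item
        # Runs in the caller's thread after the loop body for this item
        if index % freq == 0 or index == total:
            elapsed = time.perf_counter() - start
            rate = index / elapsed if elapsed > 0 else float('inf')
            total_str = '?' if total is None else total
            stream.write(f'{index}/{total_str}... rate={rate:.2f} Hz\n')


seen = []
log = io.StringIO()  # capture the progress lines instead of printing them
for n in tiny_progress(range(10), freq=5, stream=log):
    seen.append(n)

print(log.getvalue(), end='')
```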

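Command Line Interaction

The builtin Python subprocess.Popen class is great, but it can be a bit clunky at times. The os.system command is easy to use, but it doesn't have much flexibility. The ub.cmd function aims to fix this. It is as simple to run as os.system, but it returns a dictionary containing the return code, standard out, standard error, and the Popen object used under the hood.

This utility is designed to provide behavior that is as consistent as possible across different platforms. We aim to support Windows, Linux, and OSX.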

>>> import ubelt as ub
>>> info = ub.cmd('gcc --version')
>>> print(ub.repr2(info))
{
    'command': 'gcc --version',
    'err': '',
    'out': 'gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609\nCopyright (C) 2015 Free Software Foundation, Inc.\nThis is free software; see the source for copying conditions.  There is NO\nwarranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.\n\n',
    'proc': <subprocess.Popen object at 0x7ff98b310390>,
    'ret': 0,
}

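Also note the use of ub.repr2 to nicely format the output dictionary.

Additionally, if you specify verbose=True, ub.cmd will simultaneously capture the standard output and display it in real time (i.e. it will "tee" the output).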

>>> import ubelt as ub
>>> info = ub.cmd(
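
The "tee" behavior described above can be approximated with subprocess alone. The function run_and_tee below is a hypothetical sketch and not ub.cmd itself; it mirrors the dictionary keys shown in the earlier output, but it takes an argument list rather than a command string and reads stderr only after stdout is exhausted.

```python
import subprocess
import sys


def run_and_tee(args, verbose=False):
    """Run a command, capture stdout, and optionally echo it line by line."""
    with subprocess.Popen(args, stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE,
                          universal_newlines=True) as proc:
        out_lines = []
        for line in proc.stdout:
            if verbose:
                sys.stdout.write(line)  # show the line as soon as it arrives
            out_lines.append(line)      # and also keep it for the caller
        # Simplification: a child that writes a lot to stderr could block
        # here; a fuller version would read both pipes concurrently.
        err = proc.stderr.read()
        ret = proc.wait()
    return {
        'command': args,
        'out': ''.join(out_lines),
        'err': err,
        'proc': proc,
        'ret': ret,
    }


# Use the current interpreter as a command that exists on every platform
info = run_and_tee(
    [sys.executable, '-c', 'print("hello"); print("world")'], verbose=True)
print(info['ret'])
```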

ubelt 1.2.2
