nodetrie - NodeTrie 的 Python 绑定，一个 trie 数据结构库

NodeTrie 的 Python 绑定，一个 trie 数据结构库

项目描述

节点特里

Trie 数据结构库。

安装

pip install nodetrie

动机、设计目标

NodeTrie 是为此目的编写的原生 C 库的 Python 扩展。

它源于缺乏 Python 的可行替代方案。虽然存在其他 trie 库实现，但它们受到严重限制，例如

只读结构，无插入
大树的高内存使用
缺乏搜索，尤其是文件掩码或通配符样式搜索
慢速插入

PyPi 上的现有实现属于这些广泛的类别，包括Marissa-Trie（只读）和datrie（慢速插入，大型树的内存使用非常高）。

NodeTrie 的 C 库旨在尽可能减少内存使用，并且仍然允许可以搜索任意长度的树。

每个节点都有一个与其关联的名称作为其数据，以及子节点列表和子节点数量。

功能和设计说明

NodeTrie 是一棵 n 叉树，这意味着任何一个节点都可以有任意数量的子节点
节点子数组在插入时根据每个节点的需要动态调整大小。没有固定的最小或最大尺寸
节点名称可以是任意长度，可用内存允许
来自Node.name的节点名称在 Python 2/3 中始终是 unicode
插入时可以使用任何 python 字符串类型
节点名称在插入时从 unicode 隐式解码，如果需要，使用nodetrie.ENCODING ( utf-8 ) 默认编码，可以覆盖
每次调用Node.children时，都会从底层 C 指针创建新的 Python Node对象。Python 解释器创建这些对象有开销。保留和重复使用子引用是安全且性能更好的，请参见下面的示例

限制

未实施删除
C 库实现使用子指针数组来降低搜索空间的复杂性，并使用字符指针来命名名称以允许任意名称长度。这可能会导致内存碎片
python中的节点对象是只读的。无法覆盖现有Node对象的名称，也无法修改其属性
不应使用允许空字符的字符编码，例如 UCS-2

示例用法

from nodetrie import Node

# This is the root of the tree, keep a reference to it.
# Deleting or letting the root node go out of scope will de-allocate
# the entire tree
node = Node()

# Insert a linked tree so that a->b->c->d where -> means 'has child node'
node.insert_split_path(['a', 'b', 'c', 'd'])
node.children[0].name == 'a'

# Sub-trees can be referred to by child nodes
a_node = node.children[0]
a_node.name == 'a'
a_node.children[0].name == 'b'
a_node.is_leaf() == False

# Insertions create only new nodes
# Insert linked tree so that a->b->c->dd
node.insert_split_path(['a', 'b', 'c', 'dd'])

# Only one 'a' node
node.children_size == 1

# Existing references to nodes will have correct children
# after insertion without recreating the node object.
# Here, a_node is an existing object prior to more nodes
# being added to its sub-tree. After insertion, a's sub-tree contains newly
# inserted nodes as expected

# 'c' node is first child of 'b' which is first child of 'a'
# 'c' node has two children, 'd' and 'dd'
c_node = a_node.children[0].children[0]
c_node.children_size == 2
c_node.is_leaf() == False

# 'd' and 'dd' are both leaf nodes
leaf_nodes = [c for c in c_node.children if c.is_leaf()]
len(leaf_nodes) == 2

搜索

NodeTrie 支持精确名称以及文件掩码匹配树搜索。

from __future__ import print_function
from nodetrie import Node

node = Node()
for paths in [['a', 'b', 'c1', 'd1'], ['a', 'b', 'c1', 'd2'],
              ['a', 'b', 'c2', 'd1'], ['a', 'b', 'c2', 'd2']]:
    node.insert_split_path(paths)
for path, _node in node.search(node, ['a', 'b', '*', '*'], []):
    print(path, _node)

输出

[u'a', u'b', u'c1', u'd1'] Node: 'd1'
[u'a', u'b', u'c1', u'd2'] Node: 'd2'
[u'a', u'b', u'c2', u'd1'] Node: 'd1'
[u'a', u'b', u'c2', u'd2'] Node: 'd2'

查询函数返回匹配子树的分隔符连接节点名称。

for match in node.query('a.b.*.*'):
    print(match)

for match in node.query('a|b|*|*', separator='|'):
   print(match)

输出

(u'a.b.c1.d1', Node: 'd1')
(u'a.b.c1.d2', Node: 'd2')
(u'a.b.c2.d1', Node: 'd1')
(u'a.b.c2.d2', Node: 'd2')

(u'a|b|c1|d1', Node: 'd1')
(u'a|b|c1|d2', Node: 'd2')
(u'a|b|c2|d1', Node: 'd1')
(u'a|b|c2|d2', Node: 'd2')

贡献是最受欢迎的。

项目详情

发布历史发布通知| RSS订阅

这个版本

1.0.6

2017 年 10 月 17 日

1.0.5

2017 年 3 月 24 日

1.0.4

2017 年 3 月 24 日

1.0.3

2017 年 3 月 24 日

1.0.2

2017 年 3 月 21 日

1.0.1

2017 年 3 月 16 日

下载文件

下载适用于您平台的文件。如果您不确定要选择哪个，请了解有关安装包的更多信息。

源分布

nodetrie-1.0.6.tar.gz （78.0 kB 查看哈希）

已上传 2017 年 10 月 17 日 source

nodetrie -1.0.6.tar.gz 的哈希值

nodetrie-1.0.6.tar.gz 的哈希值
算法	哈希摘要
SHA256	`bdbf51b1ef96ccd405a4b95944f812323b28c7ab34f474c683d270ef3efc450a`
MD5	`b6b56add17727dd278ee408e0e016647`
布莱克2-256	`0b6925e5a761a9744a0b7857155a92cc9541fac167b9bb5a3e9dc9f3052a3e68`

nodetrie 1.0.6

导航

项目链接

统计数据

Meta

Maintainers

分类

项目描述

节点特里

安装

动机、设计目标

功能和设计说明

限制

示例用法

搜索

项目详情

项目链接

统计数据

元

维护者

分类器

发布历史发布通知| RSS订阅

下载文件

源分布

nodetrie 1.0.6

导航

项目链接

统计数据

Meta

Maintainers

分类

项目描述

节点特里

安装

动机、设计目标

功能和设计说明

限制

示例用法

搜索

项目详情

项目链接

统计数据

元

维护者

分类器

发布历史 发布通知| RSS订阅

下载文件

源分布

发布历史发布通知| RSS订阅