Python bindings to the Tree-sitter parsing library
Project description
py-tree-sitter
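This module provides Python bindings to the tree-sitter parsing library.
Installation
This package currently only works with Python 3. There are no library dependencies, but you do need to have a C compiler installed.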
pip3 install tree_sitter
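Usage
Setup
First you'll need a Tree-sitter language implementation for each language that you want to parse. You can clone some of the existing language repos or create your own: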
git clone https://github.com/tree-sitter/tree-sitter-go
git clone https://github.com/tree-sitter/tree-sitter-javascript
git clone https://github.com/tree-sitter/tree-sitter-python
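Use the Language.build_library method to compile these into a library that's usable from Python. This function returns immediately if the library has already been compiled since the last time its source code was modified. The paths in this example assume the repositories above were cloned into a vendor directory: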
from tree_sitter import Language, Parser
Language.build_library(
    # Store the library in the `build` directory
    'build/my-languages.so',
    # Include one or more languages
    [
        'vendor/tree-sitter-go',
        'vendor/tree-sitter-javascript',
        'vendor/tree-sitter-python'
    ]
)
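The "returns immediately if already compiled" behaviour described above amounts to a modification-time comparison. The following is a minimal sketch of that idea using only the standard library; `library_is_stale` is a hypothetical helper written for illustration, not the code that Language.build_library actually runs.

```python
import os


def library_is_stale(library_path, source_paths):
    """Return True if the compiled library is missing or older than any source file.

    Hypothetical helper: it only illustrates the timestamp check described
    in the text and is not part of the tree_sitter package.
    """
    if not os.path.exists(library_path):
        return True
    built_at = os.path.getmtime(library_path)
    # A rebuild is needed as soon as one source file is newer than the library.
    return any(os.path.getmtime(path) > built_at for path in source_paths)
```

A build step following this sketch would compile only when the helper returns True and skip the work otherwise.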
Load the languages into your app as Language objects:
GO_LANGUAGE = Language('build/my-languages.so', 'go')
JS_LANGUAGE = Language('build/my-languages.so', 'javascript')
PY_LANGUAGE = Language('build/my-languages.so', 'python')
Basic parsing
Create a Parser and configure it to use one of the languages:
parser = Parser()
parser.set_language(PY_LANGUAGE)
Parse some source code:
tree = parser.parse(bytes("""
def foo():
    if bar:
        baz()
""", "utf8"))
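If your source code is in some data structure other than a bytes object, you can pass a "read" callable to the parse function.
The read callable can use either the byte offset or the point tuple to read from the buffer and return source code as a bytes object. An empty bytes object or None terminates parsing for that line. The bytes must encode the source as UTF-8.
For example, to use the byte offset: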
src = bytes("""
def foo():
    if bar:
        baz()
""", "utf8")
def read_callable(byte_offset, point):
    return src[byte_offset:byte_offset+1]
tree = parser.parse(read_callable)
And to use the point:
src_lines = ["def foo():\n", "    if bar:\n", "        baz()"]
def read_callable(byte_offset, point):
    row, column = point
    if row >= len(src_lines) or column >= len(src_lines[row]):
        return None
    return src_lines[row][column:].encode('utf8')
tree = parser.parse(read_callable)
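A read callable is most useful when the source is stored in pieces. The sketch below assumes the text is held as a list of UTF-8 byte chunks and returns, for any byte offset, the rest of the chunk that contains it. `make_chunk_reader` is a hypothetical name chosen for this example; only the `(byte_offset, point)` signature and the bytes-or-None return value come from the description above.

```python
import bisect
from itertools import accumulate


def make_chunk_reader(chunks):
    """Build a read callable over a list of UTF-8 encoded byte chunks.

    Hypothetical helper for illustration. starts[i] is the byte offset at
    which chunks[i] begins; the final entry is the total length.
    """
    starts = [0] + list(accumulate(len(chunk) for chunk in chunks))

    def read_callable(byte_offset, point):
        # Past the end of the data: signal that there is nothing left to read.
        if byte_offset >= starts[-1]:
            return None
        # Find the chunk containing byte_offset and return its remainder.
        index = bisect.bisect_right(starts, byte_offset) - 1
        return chunks[index][byte_offset - starts[index]:]

    return read_callable
```

The result can be passed to the parser in the same way as the callables above, for example `parser.parse(make_chunk_reader(chunks))`.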
Inspect the resulting Tree:
root_node = tree.root_node
assert root_node.type == 'module'
assert root_node.start_point == (1, 0)
assert root_node.end_point == (3, 13)
function_node = root_node.children[0]
assert function_node.type == 'function_definition'
assert function_node.child_by_field_name('name').type == 'identifier'
function_name_node = function_node.children[1]
assert function_name_node.type == 'identifier'
assert function_name_node.start_point == (1, 4)
assert function_name_node.end_point == (1, 7)
assert root_node.sexp() == (
    "(module "
    "(function_definition "
    "name: (identifier) "
    "parameters: (parameters) "
    "body: (block "
    "(if_statement "
    "condition: (identifier) "
    "consequence: (block "
    "(expression_statement (call "
    "function: (identifier) "
    "arguments: (argument_list))))))))"
)
Walking syntax trees
If you need to traverse a large number of nodes efficiently, you can use a TreeCursor:
cursor = tree.walk()
assert cursor.node.type == 'module'
assert cursor.goto_first_child()
assert cursor.node.type == 'function_definition'
assert cursor.goto_first_child()
assert cursor.node.type == 'def'
# Returns `False` because the `def` node has no children
assert not cursor.goto_first_child()
assert cursor.goto_next_sibling()
assert cursor.node.type == 'identifier'
assert cursor.goto_next_sibling()
assert cursor.node.type == 'parameters'
assert cursor.goto_parent()
assert cursor.node.type == 'function_definition'
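The cursor calls shown above can be combined into a full pre-order traversal. The following is a minimal sketch that relies only on `goto_first_child`, `goto_next_sibling`, `goto_parent`, and the `node` attribute; `walk_preorder` is a hypothetical helper name, not part of the tree_sitter package.

```python
def walk_preorder(cursor):
    """Yield every node under the cursor's starting node, parents before children.

    Hypothetical helper: it works with any object exposing goto_first_child(),
    goto_next_sibling(), goto_parent() and a `node` attribute.
    """
    depth = 0  # how far below the starting node the cursor currently is
    while True:
        yield cursor.node
        if cursor.goto_first_child():
            depth += 1
            continue
        # No children: advance to the next sibling, climbing up as needed.
        while True:
            if depth == 0:
                return  # back at the starting node, traversal is complete
            if cursor.goto_next_sibling():
                break
            cursor.goto_parent()
            depth -= 1

# Possible usage with the tree parsed earlier:
#     for node in walk_preorder(tree.walk()):
#         print(node.type)
```

Tracking the depth keeps the traversal inside the subtree where the cursor started, so no node is visited twice.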
Editing
When a source file is edited, you can edit the syntax tree to keep it in sync with the source:
tree.edit(
    start_byte=5,
    old_end_byte=5,
    new_end_byte=5 + 2,
    start_point=(0, 5),
    old_end_point=(0, 5),
    new_end_point=(0, 5 + 2),
)
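The byte offsets and points in the call above have to agree with each other. As a rough sketch, they can be derived from the source bytes for the simple case of inserting text at one position. `point_at` and `insertion_edit` are hypothetical helpers written for this example; columns are counted in bytes, as in the points shown earlier.

```python
def point_at(source, byte_offset):
    """Return the (row, column) point of a byte offset in UTF-8 source bytes."""
    row = source.count(b"\n", 0, byte_offset)
    # rfind returns -1 when there is no earlier newline, giving a line start of 0.
    line_start = source.rfind(b"\n", 0, byte_offset) + 1
    return (row, byte_offset - line_start)


def insertion_edit(old_source, byte_offset, inserted):
    """Return (new_source, edit) for inserting `inserted` at `byte_offset`.

    Hypothetical helper: `edit` is a dict using the keyword names shown above.
    Nothing is removed, so the old end equals the start.
    """
    new_source = old_source[:byte_offset] + inserted + old_source[byte_offset:]
    start_point = point_at(old_source, byte_offset)
    new_end_byte = byte_offset + len(inserted)
    edit = {
        "start_byte": byte_offset,
        "old_end_byte": byte_offset,
        "new_end_byte": new_end_byte,
        "start_point": start_point,
        "old_end_point": start_point,
        "new_end_point": point_at(new_source, new_end_byte),
    }
    return new_source, edit
```

Inserting two bytes at offset 5 of a single-line source reproduces the values used above, and the resulting dict could be applied with `tree.edit(**edit)`.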
Then, when you're ready to incorporate the changes into a new syntax tree, you can call Parser.parse again, but pass in the old tree:
new_tree = parser.parse(new_source, tree)
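This will run much faster than if you were parsing from scratch.
The Tree.get_changed_ranges method can be called on the old tree to return a list of ranges whose syntactic structure has changed: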
for changed_range in tree.get_changed_ranges(new_tree):
    print('Changed range:')
    print(f' Start point {changed_range.start_point}')
    print(f' Start byte {changed_range.start_byte}')
    print(f' End point {changed_range.end_point}')
    print(f' End byte {changed_range.end_byte}')
Pattern-matching
You can search for patterns in a syntax tree using a tree query:
query = PY_LANGUAGE.query("""
(function_definition
  name: (identifier) @function.def)
(call
  function: (identifier) @function.call)
""")
captures = query.captures(tree.root_node)
assert len(captures) == 2
assert captures[0][0] == function_name_node
assert captures[0][1] == "function.def"
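The Query.captures() method takes optional start_point, end_point, start_byte, and end_byte keyword arguments, which can be used to restrict the query's range. Only one of the ..._byte or ..._point pairs needs to be given to restrict the range. If all are omitted, the entire range of the passed node is used.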
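The captures list shown above holds (node, capture name) pairs, so it is often convenient to group the nodes by name. The following is a small sketch of that; `group_captures` is a hypothetical helper, and it assumes only the pair layout implied by `captures[0][0]` and `captures[0][1]`.

```python
from collections import defaultdict


def group_captures(captures):
    """Group (node, capture_name) pairs into a dict of name -> list of nodes.

    Hypothetical helper: nodes keep the order in which they were captured.
    """
    grouped = defaultdict(list)
    for node, name in captures:
        grouped[name].append(node)
    return dict(grouped)
```

With the query above, `group_captures(captures)["function.def"]` would hold the nodes captured as function definitions.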
Project details
Hashes for tree_sitter-0.20.1.tar.gz
Algorithm | Hash digest
---|---
SHA256 | e93f082c545d6649bcfb5d681ed255eb004a6ce22988971a128f40692feec60d
MD5 | 20dfe6be4902930e6abfc640a6c3fb70
BLAKE2-256 | ea118d3f8ed4761c375dca0918a5b170aa2d777f5325c5442c36c0851305b77a
Hashes for tree_sitter-0.20.1-cp39-cp39-macosx_12_0_arm64.whl
Algorithm | Hash digest
---|---
SHA256 | 6f11a1fd909dcf569e7b1d98861a837436799e757bbbc5cd5280989050929e12
MD5 | 1f69df5aaff99434397dcff29d3de99e
BLAKE2-256 | 54d47cdecaad1c3564470938b631908589b1e2a0f91bf5d817b8b00d5f438d90