ListComparator - an application for comparing ordered lists, XML and CSV files

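Detailed Documentation

XML and CSV comparison

Two scripts are provided, xml_cmp and csv_cmp. Both compare two files and write the delta to file_suppr, file_addon and file_changes.

The extension of the output files is forced to xml or csv respectively.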
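
How csv_cmp loads its input is not documented here. As a rough illustration only, the sketch below (Python 3, standard library) turns CSV text into a list of rows, the same lists-of-lists shape used in the list comparison section. The helper name read_rows and the sample data are assumptions made for this example, not part of listcomparator.

```python
import csv
import io


def read_rows(csv_text):
    """Parse CSV text into a list of rows, each row being a list of strings.

    Hypothetical helper for illustration; not part of listcomparator.
    """
    return list(csv.reader(io.StringIO(csv_text)))


# Sample data mirroring the lists-of-lists example in this document
old_rows = read_rows("62145,azerty\n1234,qwerty\n9876,ipsum\n")
new_rows = read_rows("62145,azerty\n1234,qwertw\n4865,lorem\n")
print(old_rows)
print(new_rows)
```

Rows in this form can be passed to a Comparator in the same way as the lists shown in the list comparison section.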

List comparison

listcomparator provides a Comparator object that finds the differences between two lists, provided that the elements of both lists appear in the same order.

>>> old = [1, 2, 3, 4, 5, 6]
>>> new = [1, 3, 4, 7, 6]

>>> from listcomparator.comparator import Comparator

Let's create a comparator object

>>> comp = Comparator(old,new)

The check method populates the additions and deletions attributes

>>> comp.check()
>>> comp.additions
[7]
>>> comp.deletions
[2, 5]
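
To make the idea of an ordered diff concrete, here is a minimal sketch in Python 3 built on the standard library's difflib.SequenceMatcher. It is not how Comparator is implemented (the source does not describe its internals); the function name ordered_diff is an assumption for this example, and difflib requires the elements to be hashable.

```python
from difflib import SequenceMatcher


def ordered_diff(old, new):
    """Return (additions, deletions) between two ordered sequences.

    Illustrative stand-in based on difflib, not the Comparator algorithm.
    Elements must be hashable (use tuples rather than lists for rows).
    """
    additions, deletions = [], []
    # autojunk=False stops difflib from discarding frequent elements in long lists
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            deletions.extend(old[i1:i2])
        if tag in ("insert", "replace"):
            additions.extend(new[j1:j2])
    return additions, deletions


additions, deletions = ordered_diff([1, 2, 3, 4, 5, 6], [1, 3, 4, 7, 6])
print(additions)  # [7]
print(deletions)  # [2, 5]
```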

We can also compare lists of lists

>>> old_list = [['62145', 'azerty'], ['1234', 'qwerty'], ['9876', 'ipsum']]
>>> new_list = [['62145', 'azerty'], ['1234', 'qwertw'], ['4865', 'lorem']]
>>> comp = Comparator(old_list, new_list)
>>> comp.check()
>>> comp.additions
[['1234', 'qwertw'], ['4865', 'lorem']]
>>> comp.deletions
[['1234', 'qwerty'], ['9876', 'ipsum']]

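A problem arises when a modification shows up in both comp.additions and comp.deletions: in our example, 'qwerty' became 'qwertw'. You may prefer to treat this as a change. If you provide a function that tells Comparator how to recognize such cases, it can handle them and filter them out. In our example, we consider two elements to be the same if their first item (a kind of id) is identical.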

>>> def my_key(x):
...     return x[0]
...

The getChanges method then provides a new attribute: changes

>>> comp.getChanges(my_key)
>>> comp.changes
[['1234', 'qwertw']]

Of course, additions and deletions remain unchanged

>>> comp.additions
[['1234', 'qwertw'], ['4865', 'lorem']]
>>> comp.deletions
[['1234', 'qwerty'], ['9876', 'ipsum']]

You may want to consider only "pure" additions and deletions. getChanges supports this through the keyword argument "purge".

>>> comp.getChanges(my_key, purge=True)
>>> comp.changes
[['1234', 'qwertw']]
>>> comp.additions
[['4865', 'lorem']]
>>> comp.deletions
[['9876', 'ipsum']]
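
The relationship between additions, deletions, changes and the purge option can be sketched as follows. This is an illustrative Python 3 reimplementation under the assumption that a "change" is an added element whose key also appears among the deleted elements; the function split_changes is hypothetical and is not the library's code.

```python
def split_changes(additions, deletions, key, purge=False):
    """Derive changes from additions and deletions using a key function.

    Hypothetical helper: an addition counts as a change when an element
    with the same key was deleted. With purge=True, changed elements are
    also filtered out of the returned additions and deletions.
    """
    deleted_keys = {key(item) for item in deletions}
    changes = [item for item in additions if key(item) in deleted_keys]
    if purge:
        changed_keys = {key(item) for item in changes}
        additions = [item for item in additions if key(item) not in changed_keys]
        deletions = [item for item in deletions if key(item) not in changed_keys]
    return changes, additions, deletions


added = [['1234', 'qwertw'], ['4865', 'lorem']]
deleted = [['1234', 'qwerty'], ['9876', 'ipsum']]
changes, pure_added, pure_deleted = split_changes(
    added, deleted, key=lambda row: row[0], purge=True
)
print(changes)       # [['1234', 'qwertw']]
print(pure_added)    # [['4865', 'lorem']]
print(pure_deleted)  # [['9876', 'ipsum']]
```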

The old and new attributes store the lists being compared. You may want to reset them; Comparator provides a purgeOldNew method to free the memory.

>>> comp.old
[['62145', 'azerty'], ['1234', 'qwerty'], ['9876', 'ipsum']]
>>> comp.new
[['62145', 'azerty'], ['1234', 'qwertw'], ['4865', 'lorem']]
>>> comp.purgeOldNew()
>>> comp.old
>>> comp.new

Comparing XML files

Comparator can be used to compare XML files. Let's make two XML files describing books.

>>> old='''<?xml version="1.0" ?>
... <infos>
... <book><title>White pages 1995</title>
... <author>
... <surname>La Poste</surname>
... </author>
... <chapter><title>Paris</title>
... <para>ABEL Antoine 82 23 44 12</para>
... <para>ABEL Pierre 82 67 23 12</para>
... </chapter>
... </book>
... <book><title>Yellow pages 2007</title>
... <author>
... <surname>La Poste</surname>
... </author>
... <chapter><title>Bretagne</title>
... <para>Zindep 82 23 44 12</para>
... <para>ZYM 82 67 23 12</para>
... </chapter>
... </book>
... <book><title>Dark pages 2007</title>
... <author>
... <surname>La Poste</surname>
... </author>
... <chapter><title>Greves</title>
... <para>SNCF 82 23 44 12</para>
... </chapter>
... </book>
... </infos>
... '''

>>> new='''<?xml version="1.0"?>
... <infos>
... <book><title>White pages 1995</title>
... <author>
... <surname>La Poste</surname>
... </author>
... <chapter><title>Paris</title>
... <para>ABIL Antoine 82 23 44 12</para>
... <para>ABEL Pierre 82 67 23 12</para>
... </chapter>
... </book>
... <book><title>Yellow pages 2007</title>
... <author>
... <surname>La Poste</surname>
... </author>
... <chapter><title>Bretagne</title>
... <para>Zindep 82 23 44 12</para>
... <para>ZYM 82 67 23 12</para>
... </chapter>
... </book>
... <book><title>Blue pages 2007</title>
... <author>
... <surname>La Poste</surname>
... </author>
... <chapter><title>Bretagne</title>
... <para>Mer 82 23 44 12</para>
... <para>Ciel 82 67 23 12</para>
... </chapter>
... </book>
... </infos>
... '''

elementtree is required to parse the XML

>>> from elementtree import ElementTree as ET

For this test, we will use cStringIO instead of files

>>> import cStringIO
>>> ex_old = cStringIO.StringIO(old)
>>> ex_new = cStringIO.StringIO(new)

We parse the contents

>>> root_old = ET.parse(ex_old).getroot()
>>> root_new = ET.parse(ex_new).getroot()

The "book" tag identifies the objects we want

>>> objects_old = root_old.findall('book')
>>> objects_new = root_new.findall('book')

Because we cannot compare two element objects directly, we stringify them

>>> objects_old = [ET.tostring(o) for o in objects_old]
>>> objects_new = [ET.tostring(o) for o in objects_new]
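
The doctests in this document use the Python 2 modules elementtree and cStringIO. On Python 3, the standard library's xml.etree.ElementTree and io.StringIO cover the same steps. The sketch below is only an illustration with a shortened sample document (SAMPLE is an assumption made for this example); it does not claim that listcomparator itself runs on Python 3.

```python
import io
import xml.etree.ElementTree as ET

# Shortened sample document, invented for this illustration
SAMPLE = """<infos>
<book><title>White pages 1995</title></book>
<book><title>Dark pages 2007</title></book>
</infos>"""

# Parse from an in-memory file-like object instead of a file on disk
root = ET.parse(io.StringIO(SAMPLE)).getroot()

# Serialize each <book> element to text so that elements compare by content
books = [ET.tostring(book, encoding="unicode") for book in root.findall("book")]
for book in books:
    print(book)
```

Passing encoding="unicode" makes ET.tostring return str rather than bytes, which keeps the serialized elements directly comparable as text.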

From there, the Comparator can do its work

>>> my_comp = Comparator(objects_old, objects_new)
>>> my_comp.check()

>>> for e in my_comp.additions:
...     print e
...
<book><title>White pages 1995</title>
<author>
<surname>La Poste</surname>
</author>
<chapter><title>Paris</title>
<para>ABIL Antoine 82 23 44 12</para>
<para>ABEL Pierre 82 67 23 12</para>
</chapter>
</book>
<BLANKLINE>
<book><title>Blue pages 2007</title>
<author>
<surname>La Poste</surname>
</author>
<chapter><title>Bretagne</title>
<para>Mer 82 23 44 12</para>
<para>Ciel 82 67 23 12</para>
</chapter>
</book>
<BLANKLINE>

>>> for e in my_comp.deletions:
...     print e
...
<book><title>White pages 1995</title>
<author>
<surname>La Poste</surname>
</author>
<chapter><title>Paris</title>
<para>ABEL Antoine 82 23 44 12</para>
<para>ABEL Pierre 82 67 23 12</para>
</chapter>
</book>
<BLANKLINE>
<book><title>Dark pages 2007</title>
<author>
<surname>La Poste</surname>
</author>
<chapter><title>Greves</title>
<para>SNCF 82 23 44 12</para>
</chapter>
</book>
<BLANKLINE>

We need to know which tag uniquely identifies an object. Here we choose the "title" tag.

>>> def item_signature(xml_element):
...     title = xml_element.find('title')
...     return title.text
...

We build the custom function that the Comparator will use

>>> def my_key(xml_string):
...     file_like = cStringIO.StringIO(xml_string)
...     root = ET.parse(file_like)
...     return item_signature(root)
...
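
For readers on Python 3, a comparable key function can be sketched with the standard library alone. The name title_key is an assumption for this example; it parses one serialized book and returns the text of its title tag, so two versions of the same book yield the same key.

```python
import xml.etree.ElementTree as ET


def title_key(xml_string):
    """Return the text of the <title> child of a serialized element.

    Hypothetical Python 3 counterpart of the my_key doctest above.
    Returns None when the element has no <title> child.
    """
    element = ET.fromstring(xml_string)
    title = element.find("title")
    return title.text if title is not None else None


old_book = "<book><title>White pages 1995</title><para>ABEL Antoine</para></book>"
new_book = "<book><title>White pages 1995</title><para>ABIL Antoine</para></book>"
print(title_key(old_book) == title_key(new_book))  # True
```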

The Comparator's getChanges method can then be used

>>> my_comp.getChanges(my_key, purge=True)

Which books were purely added?

>>> for e in my_comp.additions:
...     print e
...
<book><title>Blue pages 2007</title>
<author>
<surname>La Poste</surname>
</author>
<chapter><title>Bretagne</title>
<para>Mer 82 23 44 12</para>
<para>Ciel 82 67 23 12</para>
</chapter>
</book>
<BLANKLINE>

Which books were purely deleted?

>>> for e in my_comp.deletions:
...     print e
...
<book><title>Dark pages 2007</title>
<author>
<surname>La Poste</surname>
</author>
<chapter><title>Greves</title>
<para>SNCF 82 23 44 12</para>
</chapter>
</book>
<BLANKLINE>

Which books changed? That is, books with the same title but different other values

>>> for e in my_comp.changes:
...     print e
...
<book><title>White pages 1995</title>
<author>
<surname>La Poste</surname>
</author>
<chapter><title>Paris</title>
<para>ABIL Antoine 82 23 44 12</para>
<para>ABEL Pierre 82 67 23 12</para>
</chapter>
</book>
<BLANKLINE>

We can then put these results back into XML files
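
The source does not show how the results are written back. One possible approach on Python 3, shown here only as a sketch, is to re-parse each serialized book and attach it to a fresh root element. The function build_document is hypothetical; the root tag "infos" is taken from the sample files above.

```python
import xml.etree.ElementTree as ET


def build_document(serialized_books, root_tag="infos"):
    """Wrap serialized <book> elements in a new root and return the XML text.

    Hypothetical helper for illustration; not part of listcomparator.
    """
    root = ET.Element(root_tag)
    for xml_string in serialized_books:
        # Re-parse each string into an element and attach it to the root
        root.append(ET.fromstring(xml_string))
    return ET.tostring(root, encoding="unicode")


document = build_document(["<book><title>Blue pages 2007</title></book>"])
print(document)  # <infos><book><title>Blue pages 2007</title></book></infos>
```

To write to disk instead of returning a string, the root element can be wrapped in ET.ElementTree and saved with its write method.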

This code is PEP8 compliant
It is fully tested, with 100% coverage
A Buildbot runs the tests at each commit

Contributors

Main developer

Nicolas Laurance <nlaurance at zindep dot com>

Contributions

Yves Mahe <ymahe at zindep dot com>

Change history

New in 0.1

First release

Version 0.1, released December 13, 2009

Source distribution

ListComparator-0.1.tar.gz (10.9 kB), uploaded December 13, 2009

Hashes for ListComparator-0.1.tar.gz
Algorithm	Hash digest
SHA256	`1b17dad959d0963261a8e8e08c06018329d413461d0f0facd77fb206b66074fb`
MD5	`e8b3b57781101ab6eeeda7e9fb807e27`
BLAKE2-256	`6ec629c3bbc181c6b24dcfb8b46a76d66bf2c83f3802f727de727bbb41841688`
