nginxpla - A small and powerful real-time Python nginx access log parser and analyzer with a top-like style


Environment
- Console
Intended Audience
- Developers
- System Administrators
License
- OSI Approved :: MIT License

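Project description

Inspired by ngxtop.

nginxpla is a console nginx log parser and analyzer written in Python, with fully configurable reports and templates. Like ngxtop, it lets you build custom top-like reports from the metrics you choose. I have tried to make it as customizable and extensible as possible.

nginxpla is a powerful tool for troubleshooting and monitoring your nginx server here and now. It is not suited to long-term monitoring, because it uses sqlite3 under the hood and performance may degrade as a large amount of data accumulates. So, you have been warned.

nginxpla is a config-based utility. After the first run it creates a .nginxpla folder in the user's home directory, containing a configuration file in YAML format. When you run nginxpla, it loads the configuration, such as the log_format of the file you are analyzing and the templates with their modules. The configuration is flexible enough to analyze almost any line-by-line log that can be parsed with a regular expression. The structure is modular and includes several modules.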

1. Installation

pip install nginxpla
nginxpla --install
nano ~/.nginxpla/nginxpla.yaml

2. Usage

Usage:
    nginxpla <access-log-file> [options]
    nginxpla <access-log-file> [options] (print) <var> ...
    nginxpla (-h | --help)
    nginxpla --version

Options:
    -l <file>, --access-log <file>  access log file to parse.
    -f <format>, --log-format <format>  log format as specified in the log_format directive. [default: combined]
    -i <seconds>, --interval <seconds>  report interval when running in --top mode [default: 2.0]
    -t <template>, --template <template>  use template from config file [default: main]
    -m <modules>, --modules <modules>  comma separated module list [default: all]

    --info  print configuration info for access_log
    --top  watch for new lines as they are written to the access log.

    -g <var>, --group-by <var>  group by variable [default: ]
    -w <expr>, --having <expr>  having clause [default: 1]
    -o <var>, --order-by <var>  order of output for default query [default: count]
    -n <number>, --limit <number>  limit the number of records included in report [default: 10]
    -a <exp> ..., --a <exp> ...  add exp (must be aggregation exp: sum, avg, min, max, etc.) into output

    -v, --verbose  more verbose output
    -d, --debug  print every line and parsed record
    -h, --help  print this help message.
    --version  print version information.

    Advanced:
    -c <file>, --config <file>  nginxpla config file path.
    -e <filter-expression>, --filter <filter-expression>  filter in: only records satisfying the given expression are processed.
    -p <filter-expression>, --pre-filter <filter-expression>  in-filter expression to check in pre-parsing phase.
    -s <sql-request>, --sql <sql-request>  raw SQL in SQLite format. The table with the data is named log
    --fields <fields>  fields to import into the SQLite log table, for example, --fields user_agent,status

Examples:
    Print statistics for default template
    $ nginxpla access_log

    Select all indexed data from the database
    $ nginxpla access_log --sql 'SELECT * FROM log'

    Count requests by user agent and status
    $ nginxpla access_log --sql 'SELECT user_agent, status, count(1) AS count FROM log GROUP BY user_agent, status ORDER BY count DESC LIMIT 100' --fields user_agent,status
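
To make the --sql and --fields options more concrete, here is a minimal Python sketch of the underlying idea: parsed records are stored in an SQLite table named log, restricted to the chosen fields, and raw SQL runs against that table. The helper build_log_table and the sample records are hypothetical and are not taken from nginxpla's code.

```python
import sqlite3


def build_log_table(records, fields):
    """Create an in-memory SQLite table `log` holding only `fields`."""
    conn = sqlite3.connect(":memory:")
    columns = ", ".join(fields)
    conn.execute(f"CREATE TABLE log ({columns})")
    placeholders = ", ".join("?" for _ in fields)
    conn.executemany(
        f"INSERT INTO log ({columns}) VALUES ({placeholders})",
        [tuple(record.get(field) for field in fields) for record in records],
    )
    return conn


# Hypothetical parsed records; real ones would come from the access log.
records = [
    {"user_agent": "curl/7.68.0", "status": 200, "request_path": "/"},
    {"user_agent": "curl/7.68.0", "status": 200, "request_path": "/a"},
    {"user_agent": "Mozilla/5.0", "status": 404, "request_path": "/b"},
]

# Equivalent in spirit to: --fields user_agent,status
conn = build_log_table(records, ["user_agent", "status"])
rows = conn.execute(
    "SELECT user_agent, status, count(1) AS count FROM log "
    "GROUP BY user_agent, status ORDER BY count DESC LIMIT 100"
).fetchall()
print(rows)
```

Importing fewer fields keeps the table narrow, which is presumably why --fields exists alongside --sql.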

Configuration

After installation, configure the logs section:

logs:
    mydomain:
        log_path_regexp: 'mydomain\.access\.log'
        format: "default"
    second_domain_name:
        log_path_regexp: 'second_domain_name\.access\.log'
        format: "custom"
    fallback_to_combined:
        log_path_regexp: '.*'
        format: "combined"
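
The logs section can be read as an ordered list of path patterns with a catch-all at the end. The sketch below shows one way such a lookup might work; it assumes that entries are tried in order, that the first matching log_path_regexp wins, and that matching is a regex search over the path. The function pick_format is hypothetical.

```python
import re

# Mirrors the logs section above; dicts keep insertion order in Python 3.7+.
LOGS = {
    "mydomain": {
        "log_path_regexp": r"mydomain\.access\.log",
        "format": "default",
    },
    "second_domain_name": {
        "log_path_regexp": r"second_domain_name\.access\.log",
        "format": "custom",
    },
    "fallback_to_combined": {
        "log_path_regexp": r".*",
        "format": "combined",
    },
}


def pick_format(log_path, logs=LOGS):
    """Return the format name of the first entry whose regexp matches the path."""
    for entry in logs.values():
        if re.search(entry["log_path_regexp"], log_path):
            return entry["format"]
    return None


print(pick_format("/var/log/nginx/mydomain.access.log"))
print(pick_format("/var/log/nginx/other.log"))
```

Under these assumptions the '.*' entry has to come last, otherwise it would shadow the more specific patterns.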

If you use a custom nginx log_format, or you want to configure something differently, define the format in the formats section:

formats:
    default: '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_x_forwarded_for"'
    combined: '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"'
    custom: '$http_x_forwarded_for - [$time_local] "$host" "$request" $status ($bytes_sent) "$http_referer" "$uri $args" "$http_user_agent" [$request_time] [$upstream_response_time]'

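Important: after parsing, each $variable becomes a database column with the same name, which you can then use in queries.

The regex_formats section serves the same purpose as formats. If you are a regex guru, you can speed up parsing by supplying a regex. regex_formats takes precedence over the simple approach: if a format and a regex_format share the same name, the regex_format is used.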
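
The relationship between a format string and a regex can be illustrated with a short sketch. It converts an nginx log_format string into a regular expression with one named group per $variable, so that each variable ends up as a key (and later a column) with the same name. This is an illustration of the idea only, not nginxpla's parser; format_to_regex and the sample log line are hypothetical.

```python
import re


def format_to_regex(log_format):
    """Turn a log_format string into a regex with a named group per $variable."""
    pattern = ""
    # Split into literal text and $variables, keeping the variables.
    for token in re.split(r"(\$\w+)", log_format):
        if token.startswith("$"):
            pattern += f"(?P<{token[1:]}>.*?)"
        else:
            pattern += re.escape(token)
    return re.compile("^" + pattern + "$")


COMBINED = (
    '$remote_addr - $remote_user [$time_local] "$request" '
    '$status $body_bytes_sent "$http_referer" "$http_user_agent"'
)

# Hypothetical access log line in the combined format.
line = (
    '203.0.113.7 - - [15/Jul/2021:10:00:00 +0000] "GET /brand/acme HTTP/1.1" '
    '200 4399151 "-" "curl/7.68.0"'
)

record = format_to_regex(COMBINED).match(line).groupdict()
print(record["status"], record["request"])
```

A hand-written regex can replace the generic `.*?` groups with tighter character classes, which is the kind of speed-up a regex_formats entry makes possible.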

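SQL suffixes

For better visualization, I added suffixes. Append one to a column name in your SQL and every value in that column is formatted. The suffix itself is removed from the column name in the result table.

_human_size: size formatter, converts a number such as 4399151 to 4,20Mb

Example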

$ nginxpla access_log --fields request_path,body_bytes_sent query 'SELECT request_path, sum(body_bytes_sent) as bytes_sent_human_size FROM log GROUP BY request_path ORDER BY bytes_sent_human_size DESC LIMIT 10'
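
A suffix of this kind can be sketched in a few lines of Python. The example assumes 1024-based units and a decimal comma, which is consistent with 4399151 becoming 4,20Mb; the functions human_size and apply_suffix are hypothetical and only illustrate the behavior described above.

```python
SUFFIX = "_human_size"
UNITS = ["b", "Kb", "Mb", "Gb", "Tb"]


def human_size(num_bytes):
    """Format a byte count with 1024-based units and a decimal comma."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit that fits, or at the largest unit available.
        if value < 1024 or unit == UNITS[-1]:
            return f"{value:.2f}".replace(".", ",") + unit
        value /= 1024


def apply_suffix(column, values):
    """Strip the suffix from the column name and format the column's values."""
    if column.endswith(SUFFIX):
        return column[: -len(SUFFIX)], [human_size(v) for v in values]
    return column, list(values)


print(apply_suffix("bytes_sent_human_size", [4399151]))
```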

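Human-readable report column names

All column names in the SQL are converted to space-separated, capitalized strings in the report. In the SQL itself, however, you must use the original column names.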

$ nginxpla access_log --fields se,request_path --filter="se=='Google Bot'" query 'SELECT request_path as request_path_by_google_bot, count(1) as count FROM log GROUP BY request_path ORDER BY count DESC LIMIT 10'

| Request Path By Google Bot   |   Count |
|------------------------------+---------|
| /c/202060826/new             |      68 |
| /c/202060826/discount        |      29 |
| /c/202001900                 |      28 |
| /c/202001107                 |      22 |
| /c/1000008746                |      17 |
| /c/202060845                 |      17 |
| /c/202000010                 |      16 |
| /c/202061131                 |      16 |
| /c/202062183/new             |      16 |
| /c/202061132                 |      15 |

running for 18 seconds, 33923 records processed: 1789.62 req/sec
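
The column-name conversion seen in the table above can be approximated as follows. This is a sketch that assumes a plain split on underscores with each word capitalized; humanize is a hypothetical name.

```python
def humanize(column):
    """Turn a snake_case SQL column name into a space-separated title."""
    return " ".join(word.capitalize() for word in column.split("_"))


print(humanize("request_path_by_google_bot"))
print(humanize("count"))
```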

Print syntax

For simple queries you can use the print syntax:

nginxpla <access-log-file> [options] (print) <var> ...

The print-syntax parser does some useful magic: it orders the results and groups them automatically. The magic fields are important:

$ nginxpla access_log --limit=0 print se count

Example

# Uses Search Engine Module and Pattern Module

$ nginxpla access_log --filter="se != '-'" --limit=0 print se request_path_pattern count

| Se           | Request Path Pattern   |   Count |
|--------------+------------------------+---------|
| Yahoo Slurp  | Product                |  183522 |
| Yahoo Slurp  | Rubric                 |  106551 |
| Yahoo Slurp  | Brand                  |   18200 |
| Google Bot   | Rubric                 |   17549 |
| Google Bot   | Product                |   10959 |
| Google Bot   | Brand                  |    3019 |

running for 28 seconds, 361730 records processed: 12546.68 req/sec
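
One way to picture the grouping and ordering done by the print syntax is as a translation into SQL. The sketch below assumes that count is treated as an aggregate, that every other variable becomes a GROUP BY column, and that a limit of 0 means no limit. These are assumptions made for illustration, and print_to_sql is a hypothetical helper.

```python
def print_to_sql(variables, limit=10, table="log"):
    """Build a grouped, ordered query from print-style variables.

    Assumption: `count` is an aggregate, all other variables are
    GROUP BY columns, and limit=0 disables the LIMIT clause.
    """
    group_cols = [v for v in variables if v != "count"]
    select = ["count(1) AS count" if v == "count" else v for v in variables]
    sql = f"SELECT {', '.join(select)} FROM {table}"
    if group_cols:
        sql += f" GROUP BY {', '.join(group_cols)}"
    if "count" in variables:
        sql += " ORDER BY count DESC"
    if limit:
        sql += f" LIMIT {int(limit)}"
    return sql


print(print_to_sql(["se", "request_path_pattern", "count"], limit=0))
```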

4. Modules

Pattern module

Lets you define patterns for your request paths. For example, if all brand URLs in your project look like /brand/slug..., you can group them by pattern:

modules:
    pattern:
        package: "module.pattern"
        class: "PatternModule"
        ...
        options:
            ...
            brand:
                from: '^/brand/.*'
                to: "Brand"
            ...

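For the full module configuration, see the default config example.

Every URL that starts with /brand/ gets the value "Brand" in the request_path_pattern field, which you can use in reports, print, or queries: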

$ nginxpla access_log print request_path_pattern count
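
The from/to mapping can be sketched as a small function that tags each record. The second pattern, the default value "-", and the function name are hypothetical. Only the brand entry comes from the configuration above.

```python
import re

# Pattern table in the same from/to shape as the module options.
PATTERNS = [
    {"from": r"^/brand/.*", "to": "Brand"},
    {"from": r"^/c/\d+", "to": "Rubric"},  # hypothetical extra pattern
]


def add_request_path_pattern(record, patterns=PATTERNS, default="-"):
    """Set record['request_path_pattern'] from the first matching pattern."""
    path = record.get("request_path", "")
    record["request_path_pattern"] = default
    for pattern in patterns:
        if re.search(pattern["from"], path):
            record["request_path_pattern"] = pattern["to"]
            break
    return record


print(add_request_path_pattern({"request_path": "/brand/acme"}))
```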

ASN module

Uses GeoLite2-ASN.mmdb to add the asn and asn_name variables to each record. asn_name contains the company name from whois.

ASN module configuration

asn:
    label: "ASN Top:"
    package: "module.asn"
    class: "AsnModule"
    fields:
        - asn
        - asn_name
        - remote_addr
        - bytes_sent
        - request_time
    indexes:
        - asn_name
    sql: |
        SELECT
            asn                                         AS ASN,
            asn_name                                    AS Company,
            count(1)                                    AS Count,
            sum(bytes_sent)                             AS sum_bytes_sent_human_size,
            sum(request_time)                           AS total_time,
            avg(request_time)                           AS avg_time,
            count(CASE WHEN status_type = 2 THEN 1 END) AS '2xx',
            count(CASE WHEN status_type = 3 THEN 1 END) AS '3xx',
            count(CASE WHEN status_type = 4 THEN 1 END) AS '4xx',
            count(CASE WHEN status_type = 5 THEN 1 END) AS '5xx'
        FROM log
        GROUP BY asn_name
        HAVING %(--having)s
        ORDER BY %(--order-by)s DESC
        LIMIT  %(--limit)s
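
The %(--having)s style placeholders in the sql template look like Python %-formatting keyed by command-line option names. Assuming that is how they are filled, the sketch below substitutes the documented defaults (--having 1, --order-by count, --limit 10) into a shortened version of the query and runs it against an in-memory SQLite table. The sample rows are hypothetical.

```python
import sqlite3

# Shortened version of the module query, keeping the same placeholders.
SQL_TEMPLATE = """
SELECT asn_name AS company, count(1) AS count
FROM log
GROUP BY asn_name
HAVING %(--having)s
ORDER BY %(--order-by)s DESC
LIMIT %(--limit)s
"""

# Defaults from the usage text: --having 1, --order-by count, --limit 10.
arguments = {"--having": "1", "--order-by": "count", "--limit": "10"}
sql = SQL_TEMPLATE % arguments

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (asn, asn_name)")
conn.executemany(
    "INSERT INTO log VALUES (?, ?)",
    [(64496, "Example Net"), (64496, "Example Net"), (64511, "Sample ISP")],
)
rows = conn.execute(sql).fetchall()
print(rows)
```

Because the values are interpolated as text rather than bound as parameters, this approach only suits trusted command-line input.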

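Module API

How it works

When a line is parsed into variables, they are combined into a record. The record then passes through the modules (handle_record), and each module can change it or add something to it. After that, only part of the record goes into the database. Exactly which part depends on the fields key in the configuration file, which is necessary for optimization. Then report assembly begins: the report methods are called in the order specified in the configuration. The handle_report methods are then called in the same way, except that each receives the resulting report as its argument.

record: a dict parsed from a log line
report: the text of all reports
ModuleConfig: an object with the module settings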
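
The flow described above can be sketched as a tiny pipeline. This is a simplification under stated assumptions: a plain list stands in for the database, report receives the stored rows as an argument (the real report method takes none), and UpperCaseModule and run_pipeline are hypothetical names.

```python
class UpperCaseModule:
    """Hypothetical module: adds a field, reports on it, then decorates the report."""

    def handle_record(self, record):
        record["method_upper"] = record["method"].upper()
        return record

    def report(self, stored):
        methods = sorted({row["method_upper"] for row in stored})
        return "methods: " + ", ".join(methods)

    def handle_report(self, report):
        return report + "\n-- end of report --"


def run_pipeline(records, modules, stored_fields):
    """Pass records through modules, keep selected fields, then build the report."""
    stored = []
    for record in records:
        for module in modules:
            record = module.handle_record(record)
        # Only the configured fields are kept (stands in for the database).
        stored.append({key: record[key] for key in stored_fields})
    # Reports are assembled in module order, then post-processed in the same order.
    report = "\n".join(module.report(stored) for module in modules)
    for module in modules:
        report = module.handle_report(report)
    return report


result = run_pipeline(
    [{"method": "get"}, {"method": "post"}],
    [UpperCaseModule()],
    ["method_upper"],
)
print(result)
```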

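A module is just a small class with three methods and a constructor.

handle_record: takes a single argument, the record, and must return it. You can modify it.
report: returns the report text. You can use SQL to fetch data from the database. If the methods in config.store do not suit you, you can get the connection (config.store.conn()) and do whatever you want.
handle_report: receives the resulting report and must return it.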

Module example

"""
Simple Module

package: "module.simple"
class: "SimpleModule"

"""
from nginxpla.utils import generate_table
from nginxpla.module_config import ModuleConfig

class SimpleModule:
    def __init__(self, module_config: ModuleConfig):
        self.config = module_config

    def handle_record(self, record):
        # Add or change fields on the parsed record, then return it.
        record['some_variable'] = 'some_value'
        return record

    def report(self):
        # Run the module's configured SQL and render the result as a table.
        config = self.config
        header, data = config.storage.fetchtable(config.sql, config.arguments)
        return generate_table(header, data)

    def handle_report(self, report: str):
        # Receives the full report text and must return it.
        report += "something to append to the end of entire script's report"
        return report

Release history

0.0.6 (this version): July 15, 2021
0.0.5: July 15, 2021
0.0.4: July 5, 2021
0.0.2: July 5, 2021
0.0.1: July 5, 2021

Download files

Source distribution

nginxpla-0.0.6.tar.gz (23.8 kB), uploaded July 15, 2021

Hashes for nginxpla-0.0.6.tar.gz
Algorithm	Hash digest
SHA256	`e5108ae0af05718e00ad29a6a661b145a833eba0853e9b316e5ca800e93b0604`
MD5	`43c8c7025c0c53989c648053090c5f86`
BLAKE2b-256	`3e71abb5329a580424db72e36cdaea6c8356bb0a93eeb6ff4e77bc2b9c8b75d3`
