ELK alerting: research and practice

Background

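In the previous post on Filebeat pipelines, we used Filebeat to ship logs into Elasticsearch, so error logs now arrive continuously in near real time. But can we be notified when a particular kind of error occurs, for example through a WeCom (企业微信) group bot, SMS, or email? This post investigates and tries out the alerting features of ELK.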

Kibana Alerting

Kibana Alerting is now built into Kibana, but it is a paid feature.
For cracking the license, see the post "Elasticsearch 7.x Platinum license cracking in practice".


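  • Alerts and Actions (rules and connectors)
    Alerts run as a service inside Kibana. They hide the complex conditions behind a UI, so the feature set is fairly simple.
  • Watcher
    Watcher runs inside Elasticsearch. It supports more complex condition lookups, and the conditions can also be written in query DSL.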

Alerts and Actions (rules and connectors)

It only supports adding simple rules through the visual UI, so we will not go deeper into it here.

Watcher

A watch consists of 5 parts:

{
  "trigger": {},
  "input": {},
  "condition": {},
  "transform": {},
  "actions": {}
}

trigger

This defines how often the watch runs. For example:

"trigger": {
  "schedule": {
    "daily": {
      "at": [
        "9:45" // UTC, i.e. 17:45 in UTC+8
      ]
    }
  }
}

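Note that a cron expression or a specific time of day must be written in UTC, that is, local time (UTC+8) minus 8 hours, because the trigger schedule currently only supports UTC.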

Related links
https://www.elastic.co/guide/en/elasticsearch/reference/7.16/trigger-schedule.html
https://github.com/elastic/elasticsearch/issues/34659
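
The UTC conversion described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not part of Watcher itself: the helper names local_to_utc_hhmm and daily_trigger are made up for this example, and the offset of 8 hours assumes the UTC+8 zone used in this post.

```python
from datetime import datetime, timedelta, timezone


def local_to_utc_hhmm(local_hhmm: str, utc_offset_hours: int = 8) -> str:
    """Convert a local wall-clock time such as "17:45" into a UTC "H:MM" string."""
    hour, minute = (int(part) for part in local_hhmm.split(":"))
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    # The date is arbitrary; only the time of day matters for a daily schedule.
    local_dt = datetime(2022, 1, 1, hour, minute, tzinfo=local_tz)
    utc_dt = local_dt.astimezone(timezone.utc)
    return f"{utc_dt.hour}:{utc_dt.minute:02d}"


def daily_trigger(local_hhmm: str, utc_offset_hours: int = 8) -> dict:
    """Build the "trigger" part of a watch for a daily run at the given local time."""
    utc_time = local_to_utc_hhmm(local_hhmm, utc_offset_hours)
    return {"schedule": {"daily": {"at": [utc_time]}}}


if __name__ == "__main__":
    print(daily_trigger("17:45"))
```

For a local time of 17:45 in UTC+8 this yields the "9:45" value used in the trigger above.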

input

The input loads the data you want to evaluate. To search log data periodically, for example to query today's data:

"input": {
  "search": {
    "request": {
      "search_type": "query_then_fetch",
      "indices": [
        "<vpn-log-{now/d{YYYY-MM-dd}}>"
      ],
      "rest_total_hits_as_int": true,
      "body": {
        "size": 0,
        "query": {
          "bool": {
            "filter": {
              "range": {
                "@timestamp": {
                  "gte": "now/d",
                  "lte": "now",
                  "time_zone": "+08:00"
                }
              }
            }
          }
        }
      }
    }
  }
}
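
To make the two date expressions in this input more concrete, here is a rough Python approximation of what they resolve to at a given moment. It is a sketch under assumptions, not the Elasticsearch implementation: the function names are invented for this example, the index name is assumed to resolve in UTC (the default when the date math expression carries no time zone), and the range window is assumed to start at local midnight because of "time_zone": "+08:00".

```python
from datetime import datetime, timedelta, timezone


def daily_index(prefix: str, now: datetime) -> str:
    """Approximate the index name that <prefix{now/d{YYYY-MM-dd}}> points at (UTC date)."""
    return prefix + now.astimezone(timezone.utc).strftime("%Y-%m-%d")


def today_window(now: datetime, utc_offset_hours: int = 8):
    """Approximate the [now/d, now] range when rounding happens in a UTC+8 time zone."""
    local_now = now.astimezone(timezone(timedelta(hours=utc_offset_hours)))
    # now/d rounds down to the start of the day in the given time zone.
    start = local_now.replace(hour=0, minute=0, second=0, microsecond=0)
    return start, local_now


if __name__ == "__main__":
    run_time = datetime(2022, 3, 5, 9, 45, tzinfo=timezone.utc)  # 17:45 in UTC+8
    print(daily_index("vpn-log-", run_time))
    print(today_window(run_time))
```

With the 9:45 UTC schedule used in this post, the UTC date and the UTC+8 date are the same, so the daily index and the query window refer to the same day.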

condition

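The condition evaluates the data loaded into the watch to decide whether the watch should fire, for example when the total hit count is greater than 0: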

"condition": {
  "compare": {
    "ctx.payload.hits.total": {
      "gt": 0
    }
  }
}
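
As a mental model of the compare condition, the following Python sketch resolves the dotted path against a context dictionary and applies the operator. It is a simplified illustration rather than Watcher's own code: resolve_path and compare_met are hypothetical helpers, and the sample context only mimics the shape of ctx.payload produced by the input above.

```python
import operator

# Comparison operators supported by the compare condition, mapped to Python functions.
OPERATORS = {
    "eq": operator.eq,
    "not_eq": operator.ne,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def resolve_path(ctx: dict, path: str):
    """Follow a dotted path such as "ctx.payload.hits.total" into a nested dict."""
    parts = path.split(".")
    if parts[0] == "ctx":
        parts = parts[1:]
    value = ctx
    for part in parts:
        value = value[part]
    return value


def compare_met(compare: dict, ctx: dict) -> bool:
    """Return True when every comparison in the "compare" block holds."""
    for path, checks in compare.items():
        actual = resolve_path(ctx, path)
        for op_name, expected in checks.items():
            if not OPERATORS[op_name](actual, expected):
                return False
    return True


if __name__ == "__main__":
    sample_ctx = {"payload": {"hits": {"total": 3}}}  # made-up search result
    print(compare_met({"ctx.payload.hits.total": {"gt": 0}}, sample_ctx))
```

The actions only run when this check passes, so a day with no matching errors sends no notification.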

transform

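The transform loads its own result into ctx.payload, and that result does not have to be the same as what input loaded. This way the actions can read exactly the content we want to send in the notification.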

"transform": {
  "search": {
    "request": {
      "search_type": "query_then_fetch",
      "indices": [
        "<vpn-log-{now/d{YYYY-MM-dd}}>"
      ],
      "rest_total_hits_as_int": true,
      "body": {
        "query": {
          "bool": {
            "filter": {
              "range": {
                "@timestamp": {
                  "gte": "now/d",
                  "lte": "now",
                  "time_zone": "+08:00"
                }
              }
            }
          }
        },
        "aggs": {
          "topn": {
            "terms": {
              "field": "tags"
            },
            "aggs": {
              "source_ip_topn": {
                "terms": {
                  "field": "source_ip"
                }
              }
            }
          }
        }
      }
    }
  }
}
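
To see what the actions will find in ctx.payload after this transform, the sketch below walks the nested terms aggregation in Python. The payload is invented sample data that only follows the usual shape of a terms aggregation response (buckets with key and doc_count); the tag names and IP addresses are placeholders, and summarize is a hypothetical helper.

```python
# Invented sample of ctx.payload after the transform; real values come from Elasticsearch.
SAMPLE_PAYLOAD = {
    "hits": {"total": 5},
    "aggregations": {
        "topn": {
            "buckets": [
                {
                    "key": "auth_failed",
                    "doc_count": 3,
                    "source_ip_topn": {
                        "buckets": [
                            {"key": "10.0.0.1", "doc_count": 2},
                            {"key": "10.0.0.2", "doc_count": 1},
                        ]
                    },
                },
                {
                    "key": "timeout",
                    "doc_count": 2,
                    "source_ip_topn": {"buckets": [{"key": "10.0.0.3", "doc_count": 2}]},
                },
            ]
        }
    },
}


def summarize(payload: dict) -> list:
    """Flatten the tags -> source_ip aggregation into (tag, count, [(ip, count), ...]) rows."""
    rows = []
    for tag_bucket in payload["aggregations"]["topn"]["buckets"]:
        ips = [(b["key"], b["doc_count"]) for b in tag_bucket["source_ip_topn"]["buckets"]]
        rows.append((tag_bucket["key"], tag_bucket["doc_count"], ips))
    return rows


if __name__ == "__main__":
    for tag, count, ips in summarize(SAMPLE_PAYLOAD):
        print(tag, count, ips)
```

The outer buckets rank the error tags and each inner bucket list ranks the source IPs for that tag, which is the structure a notification template can iterate over.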

actions

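The real power of Watcher is that it can do something when the watch condition is met. The actions of a watch define what to do when the condition evaluates to true: send an email, call a third-party webhook, write a document to an Elasticsearch index, or log a message to the standard Elasticsearch log file. Here we send a message through a WeCom group bot webhook: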

"actions": {
  "wecom_webhook": {
    "webhook": {
      "scheme": "https",
      "host": "qyapi.weixin.qq.com",
      "port": 443,
      "method": "post",
      "path": "/cgi-bin/webhook/send",
      "params": {
        "key": "XXX"
      },
      "headers": {
        "Content-Type": "application/json"
      },
      "body": """{"msgtype":"text","text":{"content":"[VPN monitoring - daily error summary] - {{ctx.payload.hits.total}} errors so far today\n\n Top issues:\n\n{{#ctx.payload.aggregations.topn.buckets}} - {{key}}: {{doc_count}} times\n{{#source_ip_topn.buckets}} \t {{key}}: {{doc_count}} times\n{{/source_ip_topn.buckets}}\n{{/ctx.payload.aggregations.topn.buckets}}\n\nSee the Dashboard to locate the problem: http://it.dakewe.com/goto/fc2c30d43913c3bc066fd5b470b47953\nAccount/password: public_viewer"}}"""
    }
  }
}
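
Before wiring the webhook into a watch, it can help to reproduce the same HTTP request by hand. The Python sketch below builds (but does not send) an equivalent POST using only the standard library. build_wecom_request is a hypothetical helper, the key "XXX" is the same placeholder as above, and the request shape simply mirrors the scheme, host, path, params, headers, and text body of the action in this post.

```python
import json
import urllib.parse
import urllib.request

WECOM_WEBHOOK_URL = "https://qyapi.weixin.qq.com/cgi-bin/webhook/send"


def build_wecom_request(key: str, content: str) -> urllib.request.Request:
    """Build the POST request that mirrors the webhook action (text message type)."""
    url = WECOM_WEBHOOK_URL + "?" + urllib.parse.urlencode({"key": key})
    body = {"msgtype": "text", "text": {"content": content}}
    return urllib.request.Request(
        url=url,
        data=json.dumps(body, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_wecom_request("XXX", "test message from the watcher sketch")
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # To actually send it with a real key: urllib.request.urlopen(request)
```

Sending such a request with a real bot key is a quick way to confirm the key and the message format before debugging the Mustache template inside the watch.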

Complete example

{
  "trigger": {
    "schedule": {
      "daily": {
        "at": [
          "9:45"
        ]
      }
    }
  },
  "input": {
    "search": {
      "request": {
        "search_type": "query_then_fetch",
        "indices": [
          "<vpn-log-{now/d{YYYY-MM-dd}}>"
        ],
        "rest_total_hits_as_int": true,
        "body": {
          "size": 0,
          "query": {
            "bool": {
              "filter": {
                "range": {
                  "@timestamp": {
                    "gte": "now/d",
                    "lte": "now",
                    "time_zone": "+08:00"
                  }
                }
              }
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": {
      "ctx.payload.hits.total": {
        "gt": 0
      }
    }
  },
  "actions": {
    "wecom_webhook": {
      "webhook": {
        "scheme": "https",
        "host": "qyapi.weixin.qq.com",
        "port": 443,
        "method": "post",
        "path": "/cgi-bin/webhook/send",
        "params": {
          "key": "XXX"
        },
        "headers": {
          "Content-Type": "application/json"
        },
        "body": """{"msgtype":"text","text":{"content":"[VPN monitoring - daily error summary] - {{ctx.payload.hits.total}} errors so far today\n\n Top issues:\n\n{{#ctx.payload.aggregations.topn.buckets}} - {{key}}: {{doc_count}} times\n{{#source_ip_topn.buckets}} \t {{key}}: {{doc_count}} times\n{{/source_ip_topn.buckets}}\n{{/ctx.payload.aggregations.topn.buckets}}\n\nSee the Dashboard to locate the problem: http://it.dakewe.com/goto/fc2c30d43913c3bc066fd5b470b47953\nAccount/password: public_viewer"}}"""
      }
    }
  },
  "transform": {
    "search": {
      "request": {
        "search_type": "query_then_fetch",
        "indices": [
          "<vpn-log-{now/d{YYYY-MM-dd}}>"
        ],
        "rest_total_hits_as_int": true,
        "body": {
          "query": {
            "bool": {
              "filter": {
                "range": {
                  "@timestamp": {
                    "gte": "now/d",
                    "lte": "now",
                    "time_zone": "+08:00"
                  }
                }
              }
            }
          },
          "aggs": {
            "topn": {
              "terms": {
                "field": "tags"
              },
              "aggs": {
                "source_ip_topn": {
                  "terms": {
                    "field": "source_ip"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

Adding and simulating a watch

We can create and simulate the watch from Kibana.
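
Besides the Kibana UI, the same two steps can be done against the Watcher REST API: PUT _watcher/watch/<id> stores a watch and POST _watcher/watch/<id>/_execute runs it on demand. The Python sketch below only builds these requests so they can be inspected. The Elasticsearch address http://localhost:9200 and the watch id vpn_daily_summary are assumptions made for this example, and a secured cluster would additionally need authentication headers.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed address of the Elasticsearch node


def put_watch_request(watch_id: str, watch: dict) -> urllib.request.Request:
    """Build the request that creates or updates a watch."""
    return urllib.request.Request(
        url=f"{ES_URL}/_watcher/watch/{watch_id}",
        data=json.dumps(watch).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def execute_watch_request(watch_id: str, simulate_actions: bool = True) -> urllib.request.Request:
    """Build the request that runs a stored watch immediately.

    With simulate_actions=True, every action is simulated instead of really
    sending the notification.
    """
    body = {"action_modes": {"_all": "simulate"}} if simulate_actions else {}
    return urllib.request.Request(
        url=f"{ES_URL}/_watcher/watch/{watch_id}/_execute",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    skeleton = {"trigger": {}, "input": {}, "condition": {}, "transform": {}, "actions": {}}
    for request in (put_watch_request("vpn_daily_summary", skeleton),
                    execute_watch_request("vpn_daily_summary")):
        print(request.get_method(), request.full_url)
    # To send a request for real: urllib.request.urlopen(request)
```

Simulating the actions first keeps test runs from posting to the real WeCom group while the watch is still being tuned.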


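Series index

ElasticStack: installation
ElasticStack: elasticsearch
ElasticStack: logstash
elasticSearch: mapping
elasticSearch: introduction to analyzers
elasticSearch: analyzer practice notes
elasticSearch: custom synonym analyzer in practice
docker-elk cluster in practice
filebeat and logstash in practice
filebeat pipeline in practice
Elasticsearch 7.x Platinum license cracking in practice
ELK alerting: research and practice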