StarRocks Shared-Data (Storage-Compute Separation): Prometheus Service Discovery Based on Service Annotations

Background

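We previously set up monitoring by following our kube-prometheus-stack deployment notes, and it gives us good Grafana dashboards for the metrics of the k8s cluster.
We also have a monitoring need that comes from the data warehouse. In day-to-day warehouse administration I keep running into the following requirements: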

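  • Data cached to disk
    Our TKE cluster uses Tencent Cloud CFS as storage, and CFS is billed by usage. We therefore urgently need to know how much disk space the StarRocks cache actually occupies and whether it needs to be cleaned up.
  • Traffic between the data warehouse and object storage
    We need to keep a routine eye on the traffic and bandwidth between StarRocks and object storage.
  • Materialized view success or failure, with monitoring and alerting
    A large number of materialized views have been created in StarRocks. Whether their refreshes succeed or fail, and when, needs to be monitored properly.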

With these requirements in mind, let's try to solve these problems.

Configuring Prometheus metrics scraping for StarRocks

Following the StarRocks Cluster Integration With Prometheus and Grafana Service guide, we first configure Prometheus metrics scraping for StarRocks.

I installed StarRocks with the operator rather than with Helm, so according to the documentation my configuration is as follows:

Pay particular attention to spec.starRocksBeSpec.service.annotations and spec.starRocksFeSpec.service.annotations.

apiVersion: starrocks.com/v1
kind: StarRocksCluster
metadata:
  name: kube-starrocks
  # Note: the scrape configs later in this post keep only targets in the
  # "starrocks" namespace; make sure the two match in your cluster.
  namespace: default
spec:
  starRocksBeSpec:
    configMapInfo:
      configMapName: kube-starrocks-be-cm
      resolveKey: be.conf
    image: starrocks/be-ubuntu:3.3-latest
    limits:
      cpu: 4
      memory: 4Gi
    replicas: 1
    requests:
      cpu: 1
      memory: 2Gi
    service:
      annotations:
        # BE exposes metrics on its HTTP port
        prometheus.io/path: /metrics
        prometheus.io/port: "8040"
        prometheus.io/scrape: "true"
  starRocksFeSpec:
    configMapInfo:
      configMapName: kube-starrocks-fe-cm
      resolveKey: fe.conf
    image: starrocks/fe-ubuntu:3.3-latest
    limits:
      cpu: 4
      memory: 4Gi
    replicas: 1
    requests:
      cpu: 1
      memory: 2Gi
    service:
      annotations:
        # FE exposes metrics on its HTTP port
        prometheus.io/path: /metrics
        prometheus.io/port: "8030"
        prometheus.io/scrape: "true"
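
Before wiring up Prometheus, it can help to confirm that the annotated port really serves metrics. The following is a minimal sketch, assuming the endpoint returns the standard Prometheus text exposition format. The function names are illustrative, and the localhost URL in the usage note is a placeholder rather than part of the setup above.

```python
import urllib.request


def metric_names(exposition_text: str) -> set[str]:
    """Return the distinct metric names found in Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line is "<name>{labels} <value>" or "<name> <value>".
        name = line.split("{", 1)[0].split(" ", 1)[0]
        names.add(name)
    return names


def fetch_metric_names(url: str, timeout: float = 5.0) -> set[str]:
    """Fetch a /metrics endpoint and list the metric names it exposes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return metric_names(resp.read().decode("utf-8", errors="replace"))
```

For instance, after a kubectl port-forward to the FE Service, calling fetch_metric_names("http://localhost:8030/metrics") would list what the FE exports; the same check against port 8040 applies to the BE.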

Dynamic scraping based on Service annotations (reference)

prometheus-additional.yaml

- job_name: 'StarRocks_Cluster'
  kubernetes_sd_configs:
    - role: endpoints
  relabel_configs:
    # Keep only Services annotated with prometheus.io/scrape: "true"
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    # Optional prometheus.io/scheme annotation (http or https)
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
      action: replace
      target_label: __scheme__
      regex: (https?)
    # Metrics path from the prometheus.io/path annotation
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    # Replace the endpoint port with the one from prometheus.io/port
    - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
      action: replace
      target_label: __address__
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
    # Copy Service labels onto the scraped series
    - action: labelmap
      regex: __meta_kubernetes_service_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      action: keep
      regex: starrocks # keep only the starrocks namespace
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_service_name]
      action: replace
      target_label: kubernetes_name
    - source_labels: [__meta_kubernetes_pod_name]
      action: replace
      target_label: kubernetes_pod_name

For reference, the components can also be scraped by dedicated jobs that select targets by Service name and port name rather than by annotation:

scrape_configs:
  - job_name: starrocks-fe-monitor
    honor_labels: true
    scrape_interval: 15s
    metrics_path: /metrics
    scheme: http
    kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names:
            - starrocks
    relabel_configs:
      - source_labels:
          - __meta_kubernetes_endpoint_port_name
        regex: http
        action: keep
      - source_labels:
          - __meta_kubernetes_service_name
        regex: starrockscluster-fe-service
        action: keep
      - source_labels:
          - __meta_kubernetes_pod_node_name
        target_label: node
      - source_labels:
          - __meta_kubernetes_namespace
        target_label: namespace
      - source_labels:
          - __meta_kubernetes_service_name
        target_label: service
      - source_labels:
          - __meta_kubernetes_pod_name
        target_label: pod

scrape_configs:
  - job_name: starrocks-be-monitor
    honor_labels: true
    scrape_interval: 15s
    metrics_path: /metrics
    scheme: http
    kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names:
            - starrocks
    relabel_configs:
      - source_labels:
          - __meta_kubernetes_endpoint_port_name
        regex: webserver
        action: keep
      - source_labels:
          - __meta_kubernetes_service_name
        regex: starrockscluster-cn-service
        action: keep
      - source_labels:
          - __meta_kubernetes_pod_node_name
        target_label: node
      - source_labels:
          - __meta_kubernetes_namespace
        target_label: namespace
      - source_labels:
          - __meta_kubernetes_service_name
        target_label: service
      - source_labels:
          - __meta_kubernetes_pod_name
        target_label: pod
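
The least obvious rule in the StarRocks_Cluster job is the one that rewrites __address__. As a rough sketch of what it does: Prometheus joins the source label values with ";" and matches the regex against the whole joined string, so the rule can be imitated in Python as below. The function name and the IP addresses are made up for illustration.

```python
import re

# Same pattern as in the relabel rule. Prometheus anchors it at both ends,
# which fullmatch() reproduces here.
ADDRESS_RULE = re.compile(r"([^:]+)(?::\d+)?;(\d+)")


def rewrite_address(address: str, port_annotation: str) -> str:
    """Mimic the __address__ relabel rule: swap in the annotated port."""
    # source_labels values are concatenated with the default separator ";".
    joined = f"{address};{port_annotation}"
    match = ADDRESS_RULE.fullmatch(joined)
    if match is None:
        # With action "replace", a non-matching regex leaves the label unchanged.
        return address
    # Equivalent of replacement "$1:$2": host from group 1, port from group 2.
    return f"{match.group(1)}:{match.group(2)}"


print(rewrite_address("10.0.0.5:9030", "8030"))
print(rewrite_address("10.0.0.5", "8040"))
```

An endpoint whose Service carries no prometheus.io/port annotation yields an empty second value, the regex does not match, and the discovered address is kept as is.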

Configuring scraping with kube-prometheus-stack

If you installed Prometheus with kube-prometheus-stack, add the scrape configuration under either additionalScrapeConfigs or additionalScrapeConfigsSecret. For example:

  • Using additionalScrapeConfigsSecret
    kubectl create secret generic additional-configs --from-file=prometheus-additional.yaml -n monitoring

    # In the kube-prometheus-stack values this key sits under prometheus.prometheusSpec
    additionalScrapeConfigsSecret:
      enabled: true
      name: additional-configs
      key: prometheus-additional.yaml

  • Using additionalScrapeConfigs
    prometheus:
      prometheusSpec:
        additionalScrapeConfigs:
          - job_name: 'StarRocks_Cluster'
            kubernetes_sd_configs:
              - role: endpoints
            relabel_configs:
              - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
                action: keep
                regex: true
              - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
                action: replace
                target_label: __scheme__
                regex: (https?)
              - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
                action: replace
                target_label: __metrics_path__
                regex: (.+)
              - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
                action: replace
                target_label: __address__
                regex: ([^:]+)(?::\d+)?;(\d+)
                replacement: $1:$2
              - action: labelmap
                regex: __meta_kubernetes_service_label_(.+)
              - source_labels: [__meta_kubernetes_namespace]
                action: keep
                regex: starrocks # keep only the starrocks namespace
              - source_labels: [__meta_kubernetes_namespace]
                action: replace
                target_label: kubernetes_namespace
              - source_labels: [__meta_kubernetes_service_name]
                action: replace
                target_label: kubernetes_name
              - source_labels: [__meta_kubernetes_pod_name]
                action: replace
                target_label: kubernetes_pod_name

Once this is configured, we check the Prometheus web UI and can see that the targets are already being scraped normally.
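
The same check can be scripted against the Prometheus HTTP API instead of the web UI. The sketch below uses the /api/v1/targets endpoint and filters the active targets by job name. The function names are illustrative, and the Prometheus base URL is an assumption (for example, a local port-forward), not something taken from this setup.

```python
import json
import urllib.request


def summarize_targets(payload: dict, job: str) -> list[tuple[str, str]]:
    """Return (scrapeUrl, health) for each active target of the given job."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return [
        (t.get("scrapeUrl", ""), t.get("health", "unknown"))
        for t in targets
        if t.get("labels", {}).get("job") == job
    ]


def fetch_targets(prometheus_url: str, timeout: float = 5.0) -> dict:
    """Call Prometheus' /api/v1/targets endpoint and decode the JSON reply."""
    url = f"{prometheus_url}/api/v1/targets"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

With Prometheus reachable at a hypothetical http://localhost:9090, summarize_targets(fetch_targets("http://localhost:9090"), "StarRocks_Cluster") should list the FE and BE scrape URLs with health "up" when scraping works.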

Grafana dashboards

Following the Import StarRocks Grafana Dashboard documentation, I imported the Grafana template and found that it shows no data at all 🤣. For now we wait for an official fix from StarRocks.

Let's try a few other templates:
Dashboard templates

Other references

Kubernetes Monitoring: Prometheus Operator