11.2 Metric Collection

Istio provides a unified service-metric collection capability: with only simple configuration it can collect service metrics such as response time, request count, and response traffic. Istio uses Prometheus to obtain the metric data.

Example: counting the number of requests to a service:


1 apiVersion: config.istio.io/v1alpha2
 2 kind: metric
 3 metadata:
 4   name: myrequestcount
 5   namespace: istio-system
 6 spec:
 7   value: "1"
 8   dimensions:
 9     destination_service: destination.service.name | "unknown"
10     destination_namespace: destination.service.namespace | "unknown"
11     destination_version: destination.labels["version"] | "unknown"
12     response_code: response.code | 200
13   monitored_resource_type: '"UNSPECIFIED"'
14 ---
15 apiVersion: config.istio.io/v1alpha2
16 kind: prometheus
17 metadata:
18   name: myrequestcounthandler
19   namespace: istio-system
20 spec:
21   metrics:
22   - name: request_count
23     instance_name: myrequestcount.metric.istio-system
24     kind: COUNTER
25     label_names:
26     - destination_service
27     - destination_namespace
28     - destination_version
29     - response_code
30 ---
31 apiVersion: config.istio.io/v1alpha2
32 kind: rule
33 metadata:
34   name: request-count
35   namespace: istio-system
36 spec:
37   actions:
38   - handler: myrequestcounthandler.prometheus
39     instances:
40     - myrequestcount.metric

Lines 1–13 define a metric instance named myrequestcount. The value definition means that each request contributes a count of 1 to the collected metric. dimensions defines the identifying labels of the collected metric, used to distinguish different metric data. monitored_resource_type sets the monitored resource type to "UNSPECIFIED"; it is only meaningful for backends that support this field and can be ignored here.

Lines 15–29 define a prometheus adapter (handler) named myrequestcounthandler. metrics defines the metric data to be stored in Prometheus. name sets the metric name; by default it is stored in Prometheus as istio followed by the name, e.g. "istio_request_count", and the prefix can be changed with the namespace field. instance_name names the metric instance to read from and must be the fully qualified name. kind is the metric type used when storing the data in Prometheus, and label_names lists the labels to store with the metric, which must correspond to the dimensions defined in the metric instance.

Lines 31–40 define a rule named request-count, which sends the metric data collected by the myrequestcount instance to the myrequestcounthandler adapter for processing.
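Conceptually, the instance → handler → rule pipeline above means: each request produces a value (here 1) plus a set of dimension labels, and the handler accumulates that value into a counter keyed by the label combination, one Prometheus time series per combination. A minimal Python sketch of that accumulation (the sample requests and label values are illustrative, not produced by Istio):

```python
from collections import defaultdict

# Counter keyed by the label tuple, mimicking how the prometheus
# adapter turns each metric instance into a labeled time series.
request_count = defaultdict(int)

LABELS = ("destination_service", "destination_namespace",
          "destination_version", "response_code")

def record(attributes, value=1):
    """Accumulate `value` into the counter for this label combination,
    falling back to "unknown" for missing attributes."""
    key = tuple(attributes.get(name, "unknown") for name in LABELS)
    request_count[key] += value

# Two requests hit v1 of service-go, one hits v2: they land in
# two distinct series because the version label differs.
record({"destination_service": "service-go", "destination_namespace": "default",
        "destination_version": "v1", "response_code": 200})
record({"destination_service": "service-go", "destination_namespace": "default",
        "destination_version": "v1", "response_code": 200})
record({"destination_service": "service-go", "destination_namespace": "default",
        "destination_version": "v2", "response_code": 200})

print(request_count[("service-go", "default", "v1", 200)])  # → 2
print(request_count[("service-go", "default", "v2", 200)])  # → 1
```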

Example: measuring the amount of data transferred by a TCP service:


1 apiVersion: config.istio.io/v1alpha2
 2 kind: metric
 3 metadata:
 4   name: mytcpsentbytes
 5   namespace: default
 6 spec:
 7   value: connection.sent.bytes | 0
 8   dimensions:
 9     source_service: source.labels["app"] | source.workload.name | "unknown"
10     source_version: source.labels["version"] | "unknown"
11     destination_service: destination.service.name | "unknown"
12     destination_version: destination.labels["version"] | "unknown"
13   monitored_resource_type: '"UNSPECIFIED"'
14 ---
15 apiVersion: config.istio.io/v1alpha2
16 kind: metric
17 metadata:
18   name: mytcpreceivedbytes
19   namespace: default
20 spec:
21   value: connection.received.bytes | 0
22   dimensions:
23     source_service: source.labels["app"] | source.workload.name | "unknown"
24     source_version: source.labels["version"] | "unknown"
25     destination_service: destination.service.name | "unknown"
26     destination_version: destination.labels["version"] | "unknown"
27   monitored_resource_type: '"UNSPECIFIED"'
28 ---
29 apiVersion: config.istio.io/v1alpha2
30 kind: prometheus
31 metadata:
32   name: tcphandler
33   namespace: default
34 spec:
35   metrics:
36   - name: tcp_sent_bytes
37     instance_name: mytcpsentbytes.metric.default
38     kind: COUNTER
39     label_names:
40     - source_service
41     - source_version
42     - destination_service
43     - destination_version
44   - name: tcp_received_bytes
45     instance_name: mytcpreceivedbytes.metric.default
46     kind: COUNTER
47     label_names:
48     - source_service
49     - source_version
50     - destination_service
51     - destination_version
52 ---
53 apiVersion: config.istio.io/v1alpha2
54 kind: rule
55 metadata:
56   name: tcp-sent-received-bytes
57   namespace: default
58 spec:
59   match: context.protocol == "tcp"
60          && destination.service.namespace == "default"
61   actions:
62   - handler: tcphandler.prometheus
63     instances:
64     - mytcpreceivedbytes.metric
65     - mytcpsentbytes.metric

Lines 1–13 define a metric instance named mytcpsentbytes. The value definition uses the number of bytes sent over the TCP connection as the collected metric data; dimensions defines the identifying labels of the collected metric. monitored_resource_type sets the monitored resource type to "UNSPECIFIED".

Lines 15–27 define a metric instance named mytcpreceivedbytes, whose value uses the number of bytes received over the TCP connection as the collected metric data. The remaining fields are the same as in the earlier request-count example and are not repeated here.

Lines 29–51 define a prometheus adapter named tcphandler. Its fields are the same as in the earlier request-count example and are not repeated here.

Lines 53–65 define a rule named tcp-sent-received-bytes: when the request protocol is TCP and the destination service's namespace is default, the metric data collected by the mytcpsentbytes and mytcpreceivedbytes instances is sent to the tcphandler adapter for processing.
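The | in the dimension expressions above is the attribute-expression fallback operator: source.labels["app"] | source.workload.name | "unknown" evaluates to the first operand whose attribute is present, falling back to the trailing literal. A minimal Python sketch of that evaluation, assuming attributes arrive as a flat dict (the attribute names and sample values are illustrative):

```python
_MISSING = object()  # sentinel for "attribute not present on this request"

def attr(attributes, name):
    """Look up an attribute; return the sentinel when it is absent."""
    return attributes.get(name, _MISSING)

def fallback(*candidates):
    """Return the first present candidate, mimicking Istio's
    `a | b | "default"` attribute-expression operator."""
    for value in candidates:
        if value is not _MISSING:
            return value
    raise ValueError("no candidate present and no literal default given")

# Request where the "app" label is missing but the workload name is set:
attrs = {"source.workload.name": "fortio-deploy"}
source_service = fallback(attr(attrs, 'source.labels["app"]'),
                          attr(attrs, "source.workload.name"),
                          "unknown")
print(source_service)  # → fortio-deploy

# Request with neither attribute: the literal default is used.
source_version = fallback(attr({}, 'source.labels["version"]'), "unknown")
print(source_version)  # → unknown
```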

[Experiment]

1) Check whether Prometheus is deployed:


$ kubectl get deploy prometheus -n istio-system
NAME         DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
prometheus   1         1         1            1           21m
$ kubectl get svc prometheus -n istio-system
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP    PORT(S)      AGE
prometheus   ClusterIP   10.99.177.169   <none>         9090/TCP     21m

The output above shows that Prometheus is already deployed.

2) Create the Pod used to issue requests:


$ kubectl apply -f kubernetes/fortio.yaml

3) Create the HTTP metric collection rules for the service:


$ kubectl apply -f istio/telemetry/metric-http-request-count.yaml

4) Expose the Prometheus query web UI:


$ kubectl apply -f kubernetes/istio-prometheus-service.yaml

5) Send concurrent requests to the service:


$ kubectl exec fortio -c fortio /usr/local/bin/fortio -- load -curl http://service-python/env
HTTP/1.1 200 OK
content-type: application/json
content-length: 177
server: envoy
date: Fri, 18 Jan 2019 10:56:41 GMT
x-envoy-upstream-service-time: 1014
{"message":"python v2","upstream":[{"message":"lua v2","response_time":0.13},{"message":"node v2","response_time":1.0,"upstream":[{"message":"go v1","response_time":"0.32"}]}]}
$ kubectl exec fortio -c fortio /usr/local/bin/fortio -- load -qps 30 -n 300 -loglevel Error http://service-python/env
10:59:53 I logger.go:97> Log level is now 4 Error (was 2 Info)
Fortio 1.0.1 running at 30 queries per second, 2->2 procs, for 300 calls: http://service-python/env
Aggregated Sleep Time : count 296 avg -50.177311 +/- 36.53 min -130.220001356 max -0.261570142 sum -14852.4839
# range, mid point, percentile, count
>= -130.22 <= -0.26157 , -65.2408 , 100.00, 296
# target 50% -65.4611
WARNING 100.00% of sleep were falling behind
Aggregated Function Time : count 300 avg 1.8540739 +/- 1.99 min 0.075693251 max 7.331838352 sum 556.222183
# target  50%   0.45
# target  75%   3.13333
# target  90%   5.07522
# target  99%   7.10618
# target  99.9% 7.30927
Sockets used: 6 (for perfect keepalive, would be 4)
Code 200 : 298 (99.3 %)
Code 503 : 2 (0.7 %)
All done 300 calls (plus 0 warmup) 1854.074 ms avg, 2.1 qps

6) View the metric data in the Prometheus UI.

Visit http://11.11.11.111:32141/ and query with the expression istio_request_count; the Console tab shows metric data like the following:


istio_request_count{destination_namespace="default",destination_service="service-go",destination_version="v1",instance="10.244.2.6:42422",job="istio-mesh",response_code="200"}            272
istio_request_count{destination_namespace="default",destination_service="service-go",destination_version="v2",instance="10.244.2.6:42422",job="istio-mesh",response_code="200"}            282
...
istio_request_count{destination_namespace="default",destination_service="service-python",destination_version="v2",instance="10.244.2.6:42422",job="istio-mesh",response_code="200"}        256
istio_request_count{destination_namespace="istio-system",destination_service="istio-policy",destination_version="unknown",instance="10.244.2.6:42422",job="istio-mesh",response_code="200"}     38
istio_request_count{destination_namespace="istio-system",destination_service="istio-telemetry",destination_version="unknown",instance="10.244.2.6:42422",job="istio-mesh",response_code="200"}  862

View the graph, as shown in Figure 11-3.

Figure 11-3 Graph data

Use the following expression to query the request data for the service-python service:


istio_request_count{destination_service="service-python", destination_version="v1",response_code="200"}

The Console shows data like the following:


istio_request_count{destination_namespace="default",destination_service="service-python",destination_version="v1",instance="10.244.2.6:42422",job="istio-mesh",response_code="200"}  307
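Since istio_request_count is a cumulative counter, the raw values above only ever grow. To turn them into a per-second request rate, a typical Prometheus query wraps the counter in rate() over a time window; for example (the 5m window and label filter here are illustrative):

```
sum by (destination_version) (
  rate(istio_request_count{destination_service="service-python"}[5m])
)
```

The sum by (destination_version) aggregates the per-instance series so the Graph tab shows one rate line per service version.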

7) Clean up:


$ kubectl delete -f istio/telemetry/metric-http-request-count.yaml

8) Create the TCP metric collection rules for the service:


$ kubectl apply -f istio/telemetry/metric-tcp-data-size.yaml

9) Deploy the Redis server:


$ kubectl apply -f kubernetes/redis-server.yaml

10) Deploy the service-redis service:


$ kubectl apply -f service/redis/service-redis.yaml

11) Send concurrent requests to the service:


$ kubectl exec fortio -c fortio /usr/local/bin/fortio -- load -curl http://service-redis/env
HTTP/1.1 200 OK
content-type: text/plain; charset=utf-8
date: Fri, 18 Jan 2019 13:47:17 GMT
x-envoy-upstream-service-time: 661
server: envoy
transfer-encoding: chunked
800
# Server
redis_version:5.0.1
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:8a9d320088384235
redis_mode:standalone
os:Linux 3.10.0-693.5.2.el7.x86_64 x86_64
arch_bits:64
...
# Clients
connected_clients:1
client_recent_max_input_buffer:0
client_recent_max_output_buffer:0
blocked_clients:0
# Memory
used_memory:853752
...
$ kubectl exec fortio -c fortio /usr/local/bin/fortio -- load -qps 10 -n 100 -loglevel Error http://service-redis/env
13:47:28 I logger.go:97> Log level is now 4 Error (was 2 Info)
Fortio 1.0.1 running at 10 queries per second, 2->2 procs, for 100 calls: http://service-redis/env
Aggregated Function Time : count 100 avg 0.010159774 +/- 0.007213 min 0.004142955 max 0.043500525 sum 1.01597743
# target 50% 0.00816667
# target 75% 0.0105
# target 90% 0.0148
# target 99% 0.0426254
# target 99.9% 0.043413
Sockets used: 4 (for perfect keepalive, would be 4)
Code 200 : 100 (100.0 %)
All done 100 calls (plus 0 warmup) 10.160 ms avg, 10.0 qps

12) View the metric data in the Prometheus UI.

Visit http://11.11.11.111:32141/ and query with the expression istio_tcp_sent_bytes; the Console shows metric data like the following:


istio_tcp_sent_bytes{destination_service="redis",destination_version="v1",instance="10.244.2.6:42422",job="istio-mesh",source_service="service-redis",source_version="v1"}  326355

Query with the expression istio_tcp_received_bytes; the Console shows metric data like the following:


istio_tcp_received_bytes{destination_service="redis",destination_version="v1",instance="10.244.2.6:42422",job="istio-mesh",source_service="service-redis",source_version="v1"}  1400

13) Clean up:


$ kubectl delete -f service/redis/service-redis.yaml
$ kubectl delete -f kubernetes/redis-server.yaml
$ kubectl delete -f istio/telemetry/metric-tcp-data-size.yaml
$ kubectl delete -f kubernetes/fortio.yaml
$ kubectl delete -f kubernetes/istio-prometheus-service.yaml