Istio provides a unified mechanism for collecting service metrics: with only a small amount of configuration you can gather metrics such as response time, request count, and response traffic for your services. Istio uses Prometheus to store and query the collected metric data.
An example that counts service requests:
 1 apiVersion: config.istio.io/v1alpha2
 2 kind: metric
 3 metadata:
 4   name: myrequestcount
 5   namespace: istio-system
 6 spec:
 7   value: "1"
 8   dimensions:
 9     destination_service: destination.service.name | "unknown"
10     destination_namespace: destination.service.namespace | "unknown"
11     destination_version: destination.labels["version"] | "unknown"
12     response_code: response.code | 200
13   monitored_resource_type: '"UNSPECIFIED"'
14 ---
15 apiVersion: config.istio.io/v1alpha2
16 kind: prometheus
17 metadata:
18   name: myrequestcounthandler
19   namespace: istio-system
20 spec:
21   metrics:
22   - name: request_count
23     instance_name: myrequestcount.metric.istio-system
24     kind: COUNTER
25     label_names:
26     - destination_service
27     - destination_namespace
28     - destination_version
29     - response_code
30 ---
31 apiVersion: config.istio.io/v1alpha2
32 kind: rule
33 metadata:
34   name: request-count
35   namespace: istio-system
36 spec:
37   actions:
38   - handler: myrequestcounthandler.prometheus
39     instances:
40     - myrequestcount.metric
Lines 1-13 define a metric instance named myrequestcount. The value definition means that each request contributes a count of 1 to the collected metric. dimensions defines the identity labels attached to the collected data, which are used to distinguish different metric series. monitored_resource_type sets the monitored resource type to "UNSPECIFIED"; it only matters for backends that support this field and can be ignored here.
Lines 15-29 define a prometheus handler named myrequestcounthandler. metrics describes the metric data to be stored in Prometheus. name is the metric name; by default it is stored in Prometheus as istio_ followed by name, for example "istio_request_count", and the prefix can be changed by specifying the namespace field. instance_name identifies the metric instance to use and must be the fully qualified name. kind is the metric type used when storing the data in Prometheus, and label_names lists the labels to store in Prometheus, which must correspond to the dimensions defined in the metric instance.
Lines 31-40 define a rule named request-count. It states that the metric data collected by the myrequestcount instance is sent to the myrequestcounthandler adapter for processing.
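For example, to replace the default istio_ prefix, the handler spec can carry a namespace value. The snippet below is only a minimal sketch; the placement of namespace at the top level of spec is taken from the Prometheus adapter's configuration reference as the author understands it, so verify it against the adapter documentation for your Istio version:

apiVersion: config.istio.io/v1alpha2
kind: prometheus
metadata:
  name: myrequestcounthandler
  namespace: istio-system
spec:
  namespace: mymesh          # assumed field: the metric would then be exposed as mymesh_request_count
  metrics:
  - name: request_count
    instance_name: myrequestcount.metric.istio-system
    kind: COUNTER
    label_names:
    - destination_service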
An example that measures the amount of data transferred by a TCP service:
 1 apiVersion: config.istio.io/v1alpha2
 2 kind: metric
 3 metadata:
 4   name: mytcpsentbytes
 5   namespace: default
 6 spec:
 7   value: connection.sent.bytes | 0
 8   dimensions:
 9     source_service: source.labels["app"] | source.workload.name | "unknown"
10     source_version: source.labels["version"] | "unknown"
11     destination_service: destination.service.name | "unknown"
12     destination_version: destination.labels["version"] | "unknown"
13   monitored_resource_type: '"UNSPECIFIED"'
14 ---
15 apiVersion: config.istio.io/v1alpha2
16 kind: metric
17 metadata:
18   name: mytcpreceivedbytes
19   namespace: default
20 spec:
21   value: connection.received.bytes | 0
22   dimensions:
23     source_service: source.labels["app"] | source.workload.name | "unknown"
24     source_version: source.labels["version"] | "unknown"
25     destination_service: destination.service.name | "unknown"
26     destination_version: destination.labels["version"] | "unknown"
27   monitored_resource_type: '"UNSPECIFIED"'
28 ---
29 apiVersion: config.istio.io/v1alpha2
30 kind: prometheus
31 metadata:
32   name: tcphandler
33   namespace: default
34 spec:
35   metrics:
36   - name: tcp_sent_bytes
37     instance_name: mytcpsentbytes.metric.default
38     kind: COUNTER
39     label_names:
40     - source_service
41     - source_version
42     - destination_service
43     - destination_version
44   - name: tcp_received_bytes
45     instance_name: mytcpreceivedbytes.metric.default
46     kind: COUNTER
47     label_names:
48     - source_service
49     - source_version
50     - destination_service
51     - destination_version
52 ---
53 apiVersion: config.istio.io/v1alpha2
54 kind: rule
55 metadata:
56   name: tcp-sent-received-bytes
57   namespace: default
58 spec:
59   match: context.protocol == "tcp"
60     && destination.service.namespace == "default"
61   actions:
62   - handler: tcphandler.prometheus
63     instances:
64     - mytcpreceivedbytes.metric
65     - mytcpsentbytes.metric
Lines 1-13 define a metric instance named mytcpsentbytes. The value definition means the number of bytes sent over the TCP connection is used as the collected metric value; dimensions defines the identity labels of the collected metric, and monitored_resource_type sets the monitored resource type to "UNSPECIFIED".
Lines 15-27 define a metric instance named mytcpreceivedbytes. The value definition means the number of bytes received over the TCP connection is used as the collected metric value; the remaining fields are the same as in the earlier request-count example and are not repeated here.
Lines 29-51 define a prometheus handler named tcphandler. Its fields are the same as in the earlier request-count example and are not repeated here.
Lines 53-65 define a rule named tcp-sent-received-bytes. It states that when the request protocol is TCP and the destination service's namespace is default, the metric data collected by the mytcpsentbytes and mytcpreceivedbytes instances is sent to the tcphandler adapter for processing.
[Experiment]
1) Check whether Prometheus has been deployed:
$ kubectl get deploy prometheus -n istio-system
NAME         DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
prometheus   1         1         1            1           21m
$ kubectl get svc prometheus -n istio-system
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
prometheus   ClusterIP   10.99.177.169   <none>        9090/TCP   21m
The output above shows that Prometheus is already deployed.
2) Create the Pod used to send requests:
$ kubectl apply -f kubernetes/fortio.yaml
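The contents of kubernetes/fortio.yaml are not reproduced in this section. A minimal sketch of such a load-generator Pod might look like the following; the fortio/fortio image and the single-container layout are assumptions, and the repository's actual manifest may differ:

apiVersion: v1
kind: Pod
metadata:
  name: fortio
  labels:
    app: fortio
spec:
  containers:
  - name: fortio               # container name matches the -c fortio flag used later
    image: fortio/fortio       # assumed public image; the book's manifest may pin a specific version
    imagePullPolicy: IfNotPresent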
3) Create the HTTP metric collection rules for the services:
$ kubectl apply -f istio/telemetry/metric-http-request-count.yaml
4) Expose the Prometheus query web UI:
$ kubectl apply -f kubernetes/istio-prometheus-service.yaml
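kubernetes/istio-prometheus-service.yaml presumably exposes the Prometheus Service on a NodePort (port 32141 is used in the queries below). If you prefer not to create an extra Service, a port-forward works as an alternative; this is not the book's file, just another way to reach the UI:

$ kubectl -n istio-system port-forward svc/prometheus 9090:9090
# The Prometheus UI is then reachable at http://localhost:9090/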
5) Send concurrent requests to the service:
$ kubectl exec fortio -c fortio /usr/local/bin/fortio -- load -curl http://service-python/env
HTTP/1.1 200 OK
content-type: application/json
content-length: 177
server: envoy
date: Fri, 18 Jan 2019 10:56:41 GMT
x-envoy-upstream-service-time: 1014

{"message":"python v2","upstream":[{"message":"lua v2","response_time":0.13},{"message":"node v2","response_time":1.0,"upstream":[{"message":"go v1","response_time":"0.32"}]}]}
$ kubectl exec fortio -c fortio /usr/local/bin/fortio -- load -qps 30 -n 300 -loglevel Error http://service-python/env
10:59:53 I logger.go:97> Log level is now 4 Error (was 2 Info)
Fortio 1.0.1 running at 30 queries per second, 2->2 procs, for 300 calls: http://service-python/env
Aggregated Sleep Time : count 296 avg -50.177311 +/- 36.53 min -130.220001356 max -0.261570142 sum -14852.4839
# range, mid point, percentile, count >= -130.22 <= -0.26157 , -65.2408 , 100.00, 296
# target 50% -65.4611
WARNING 100.00% of sleep were falling behind
Aggregated Function Time : count 300 avg 1.8540739 +/- 1.99 min 0.075693251 max 7.331838352 sum 556.222183
# target 50% 0.45
# target 75% 3.13333
# target 90% 5.07522
# target 99% 7.10618
# target 99.9% 7.30927
Sockets used: 6 (for perfect keepalive, would be 4)
Code 200 : 298 (99.3 %)
Code 503 : 2 (0.7 %)
All done 300 calls (plus 0 warmup) 1854.074 ms avg, 2.1 qps
6) View the metric data in the Prometheus UI.
Open http://11.11.11.111:32141/ and query with the expression 'istio_request_count'; the Console tab shows metric data like the following:
istio_request_count{destination_namespace="default",destination_service="service-go",destination_version="v1",instance="10.244.2.6:42422",job="istio-mesh",response_code="200"} 272
istio_request_count{destination_namespace="default",destination_service="service-go",destination_version="v2",instance="10.244.2.6:42422",job="istio-mesh",response_code="200"} 282
...
istio_request_count{destination_namespace="default",destination_service="service-python",destination_version="v2",instance="10.244.2.6:42422",job="istio-mesh",response_code="200"} 256
istio_request_count{destination_namespace="istio-system",destination_service="istio-policy",destination_version="unknown",instance="10.244.2.6:42422",job="istio-mesh",response_code="200"} 38
istio_request_count{destination_namespace="istio-system",destination_service="istio-telemetry",destination_version="unknown",instance="10.244.2.6:42422",job="istio-mesh",response_code="200"} 862
The data can also be viewed as a graph, as shown in Figure 11-3.
Figure 11-3 Chart of the metric data
Use the following expression to query the request data of the service-python service:
istio_request_count{destination_service="service-python", destination_version="v1",response_code="200"}
The Console tab then shows the following data:
istio_request_count{destination_namespace="default",destination_service="service-python",destination_version="v1",instance="10.244.2.6:42422",job="istio-mesh",response_code="200"} 307
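Because istio_request_count is a counter, its raw value only ever grows. To see a per-second request rate broken down by destination service and response code, a PromQL query along the following lines can be used; the 1m window is an arbitrary choice for this illustration:

sum by (destination_service, response_code) (rate(istio_request_count[1m]))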
7) Clean up:
$ kubectl delete -f istio/telemetry/metric-http-request-count.yaml
8) Create the TCP metric collection rules for the services:
$ kubectl apply -f istio/telemetry/metric-tcp-data-size.yaml
9) Deploy the Redis server:
$ kubectl apply -f kubernetes/redis-server.yaml
10) Deploy the service-redis service:
$ kubectl apply -f service/redis/service-redis.yaml
11) Send concurrent requests to the service:
$ kubectl exec fortio -c fortio /usr/local/bin/fortio -- load -curl http://service-redis/env
HTTP/1.1 200 OK
content-type: text/plain; charset=utf-8
date: Fri, 18 Jan 2019 13:47:17 GMT
x-envoy-upstream-service-time: 661
server: envoy
transfer-encoding: chunked

800
# Server
redis_version:5.0.1
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:8a9d320088384235
redis_mode:standalone
os:Linux 3.10.0-693.5.2.el7.x86_64 x86_64
arch_bits:64
...
# Clients
connected_clients:1
client_recent_max_input_buffer:0
client_recent_max_output_buffer:0
blocked_clients:0

# Memory
used_memory:853752
...
$ kubectl exec fortio -c fortio /usr/local/bin/fortio -- load -qps 10 -n 100 -loglevel Error http://service-redis/env
13:47:28 I logger.go:97> Log level is now 4 Error (was 2 Info)
Fortio 1.0.1 running at 10 queries per second, 2->2 procs, for 100 calls: http://service-redis/env
Aggregated Function Time : count 100 avg 0.010159774 +/- 0.007213 min 0.004142955 max 0.043500525 sum 1.01597743
# target 50% 0.00816667
# target 75% 0.0105
# target 90% 0.0148
# target 99% 0.0426254
# target 99.9% 0.043413
Sockets used: 4 (for perfect keepalive, would be 4)
Code 200 : 100 (100.0 %)
All done 100 calls (plus 0 warmup) 10.160 ms avg, 10.0 qps
12) View the metric data in the Prometheus UI.
Open http://11.11.11.111:32141/ and query with the expression 'istio_tcp_sent_bytes'; the Console tab shows metric data like the following:
istio_tcp_sent_bytes{destination_service="redis",destination_version="v1",instance="10.244.2.6:42422",job="istio-mesh",source_service="service-redis",source_version="v1"} 326355
Query with the expression 'istio_tcp_received_bytes'; the Console tab shows metric data like the following:
istio_tcp_received_bytes{destination_service="redis",destination_version="v1",instance="10.244.2.6:42422",job="istio-mesh",source_service="service-redis",source_version="v1"} 1400
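As with the request counter, these byte counters increase monotonically, so a throughput view per source/destination pair can be derived with rate(); the 1m window below is only an example:

sum by (source_service, destination_service) (rate(istio_tcp_sent_bytes[1m]))
sum by (source_service, destination_service) (rate(istio_tcp_received_bytes[1m]))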
13) Clean up:
$ kubectl delete -f service/redis/service-redis.yaml
$ kubectl delete -f kubernetes/redis-server.yaml
$ kubectl delete -f istio/telemetry/metric-tcp-data-size.yaml
$ kubectl delete -f kubernetes/fortio.yaml
$ kubectl delete -f kubernetes/istio-prometheus-service.yaml