Istio入门与实战 11.3 日志收集

更多图书

11.3　日志收集

Istio提供了开箱即用的日志收集组件，可以把日志输出到Mixer组件上的标准输出或者存储到Mixer组件上的文件里。也可以把日志发送到Fluentd组件，然后收集存储到日志中心，用于检索分析日志，生产环境推荐使用这种方式来收集日志。

收集HTTP协议服务的请求日志输出到标准输出示例：

1 apiVersion: config.istio.io/v1alpha2
 2 kind: logentry
 3 metadata:
 4   name: myaccesslog
 5   namespace: istio-system
 6 spec:
 7   severity: '"Default"'
 8   timestamp: request.time
 9   variables:
10     source_ip: source.ip | ip("0.0.0.0")
11     destination_ip: destination.ip | ip("0.0.0.0")
12     source_user: source.principal | ""
13     method: request.method | ""
14     url: request.path | ""
15     protocol: request.scheme | "http"
16     response_code: response.code | 0
17     response_size: response.size | 0
18     request_size: request.size | 0
19     latency: response.duration | "0ms"
20   monitored_resource_type: '"UNSPECIFIED"'
21 ---
22 apiVersion: config.istio.io/v1alpha2
23 kind: stdio
24 metadata:
25   name: myaccessloghandler
26   namespace: istio-system
27 spec:
28  severity_levels:
29    info: 1
30  outputAsJson: true
31 ---
32 apiVersion: config.istio.io/v1alpha2
33 kind: rule
34 metadata:
35   name: myaccesslog-logstdio
36   namespace: istio-system
37 spec:
38   match: "true"
39   actions:
40    - handler: myaccessloghandler.stdio
41      instances:
42    - myaccesslog.logentry

第1~20行定义了名为了myaccesslog的logentry实例。severity表示收集的日志级别为Default，此字段供支持日志级别的适配器使用。timestamp表示使用请求的时间作为日志的时间。variables定义了日志收集的数据。monitored_resource_type定义了被监控资源的类型为"UNSPECIFIED"。

第22~30行定义了名为myaccessloghandler的stdio适配器。stdio适配器会把收集到的数据输出到Mixer组件的标准输出，severity_levels定义了日志的级别为info，outputAsJson设置为true表示以JSON的格式输出日志，可以通过设置outputLevel字段指定输出的级别。可以通过更多的配置选项输出日志到文件中并自动轮转日志，此次实验并没有使用这种方法，有兴趣的读者可以查看官方文档^[1] 。

第32~42行定义了名为myaccesslog-logstdio的规则，表明把myaccesslog实例收集到的日志发送给myaccessloghandler适配器处理。

收集服务请求日志发送到Fluentd组件示例：

1 apiVersion: config.istio.io/v1alpha2
 2 kind: logentry
 3 metadata:
 4   name: newlog
 5   namespace: istio-system
 6 spec:
 7   severity: '"info"'
 8   timestamp: request.time
 9   variables:
10     source: source.labels["app"] | source.workload.name | "unknown"
11     user: source.principal | "unknown"
12     destination:destination.labels["app"] | destination.service.name | "unknown"
13     response_code: response.code | 0
14     response_size: response.size | 0
15     latency: response.duration | "0ms"
16   monitored_resource_type: '"UNSPECIFIED"'
17 ---
18 apiVersion: config.istio.io/v1alpha2
19 kind: fluentd
20 metadata:
21   name: fluentdhandler
22   namespace: istio-system
23 spec:
24   address: "fluentd-es.logging:24224"
25 ---
26 apiVersion: config.istio.io/v1alpha2
27 kind: rule
28 metadata:
29   name: newlogtofluentd
30   namespace: istio-system
31 spec:
32   match: "true"
33   actions:
34    - handler: fluentdhandler.fluentd
35      instances:
36      - newlog.logentry

第1~16行定义了名为newlog的logentry实例。与收集HTTP协议服务的请求日志定义类似，在此不再赘述。

第18~24行定义了名为fluentdhandler的fluentd适配器。address定义了fluentd服务地址。

第26~36行定义了名为newlogtofluentd的规则，表明把newlog实例收集到的日志数据发送到fluentdhandler适配器。

【实验一】　测试日志输出到Mixer标准输出

1）创建用于请求的Pod：

$ kubectl apply -f kubernetes/fortio.yaml

2）创建服务日志收集到Mixer标准输出的规则：

$ kubectl apply -f istio/telemetry/log-http-access-log.yaml

3）打开新终端查看收集到的服务日志：

$ kubectl -n istio-system logs -f $(kubectl -n istio-system get pods -l istio-mixer-type=telemetry -o jsonpath='{.items[0].metadata.name}') -c mixer | grep \"instance\":\"myaccesslog.logentry.istio-system\"

4）另外打开一个终端并发请求服务：

$ kubectl exec fortio -c fortio /usr/local/bin/fortio -- load -curl http://service-python/env
HTTP/1.1 200 OK
content-type: application/json
content-length: 176
server: envoy
date: Fri, 18 Jan 2019 13:53:26 GMT
x-envoy-upstream-service-time: 595
{"message":"python v2","upstream":[{"message":"lua v2","response_time":0.1},{"message":"node v1","upstream":[{"message":"go v2","response_time":"0.30"}],"response_time":0.76}]}
$ kubectl exec fortio -c fortio /usr/local/bin/fortio -- load -qps 10 -n 100 -loglevel Error http://service-python/env
14:01:11 I logger.go:97> Log level is now 4 Error (was 2 Info)
Fortio 1.0.1 running at 10 queries per second, 2->2 procs, for 100 calls: http://service-python/env
Aggregated Sleep Time : count 96 avg -3.8814565 +/- 2.688 min -9.370066353 max 0.046896141 sum -372.619823
# range, mid point, percentile, count
>= -9.37007 <= -0.001 , -4.68553 , 98.96, 95
> 0.044 <= 0.0468961 , 0.0454481 , 100.00, 1
# target 50% -4.68553
WARNING 98.96% of sleep were falling behind
Aggregated Function Time : count 100 avg 0.73897862 +/- 0.4143 min 0.097562604 max 1.980383317 sum 73.8978625
# target 50%   0.669231
# target 75%   1
# target 90%   1.58823
# target 99%   1.94117
# target 99.9% 1.97646
Sockets used: 5 (for perfect keepalive, would be 4)
Code 200 : 99 (99.0 %)
Code 503 : 1 (1.0 %)
All done 100 calls (plus 0 warmup) 738.979 ms avg, 5.1 qps

5）清理：

$ kubectl delete -f istio/telemetry/log-http-access-log.yaml

【实验二】　测试日志输出到Fluentd标准输出

由于机器性能问题，本次实验环境只使用Fluentd收集日志，并输出到Fluentd的标准输出上，在生产环境中，把日志通过Fluentd收集，然后保存到ElasticSearch集群中，通过Kibana在Web中搜索查看分析日志。在机器性能足够的情况下，可以使用loging-stack.yaml部署EFK日志收集平台来模拟生产环境中的日志收集场景。

1）部署Fluentd：

$ kubectl apply -f kubernetes/loging-fluentd-stdout.yaml
$ kubectl get pod -n logging
NAME                          READY   STATUS    RESTARTS   AGE
fluentd-es-6cd547b4bc-sjqmk   1/1     Running   0          3m13s

2）创建服务日志发送到Fluentd的日志收集规则：

$ kubectl apply -f istio/telemetry/log-fluentd.yaml

3）打开终端查看服务日志：

$ kubectl -n logging logs -f $(kubectl -n logging get pods -l app=fluentd-es -o jsonpath='{.items[0].metadata.name}') | grep newlog.logentry.istio-system

4）另外打开一个终端并发请求服务：

$ kubectl exec fortio -c fortio /usr/local/bin/fortio -- load -curl http://service-python/env
HTTP/1.1 200 OK
content-type: application/json
content-length: 178
server: envoy
date: Fri, 18 Jan 2019 14:08:10 GMT
x-envoy-upstream-service-time: 667
{"message":"python v1","upstream":[{"message":"lua v1","response_time":1.03},{"message":"node v2","response_time":1.95,"upstream":[{"message":"go v2","response_time":"1.04"}]}]}
$ kubectl exec fortio -c fortio /usr/local/bin/fortio -- load -qps 10 -n 100 -loglevel Error http://service-python/env
14:08:51 I logger.go:97> Log level is now 4 Error (was 2 Info)
Fortio 1.0.1 running at 10 queries per second, 2->2 procs, for 100 calls: http://service-python/env
Aggregated Sleep Time : count 96 avg -3.8814565 +/- 2.688 min -9.370066353 max 0.046896141 sum -372.619823
# range, mid point, percentile, count
>= -9.37007 <= -0.001 , -4.68553 , 98.96, 95
> 0.044 <= 0.0468961 , 0.0454481 , 100.00, 1
# target 50% -4.68553
WARNING 98.96% of sleep were falling behind
Aggregated Function Time : count 100 avg 0.73897862 +/- 0.4143 min 0.097562604 max 1.980383317 sum 73.8978625
# target 50% 0.669231
# target 75% 1
# target 90% 1.58823
# target 99% 1.94117
# target 99.9% 1.97646
Sockets used: 5 (for perfect keepalive, would be 4)
Code 200 : 99 (99.0 %)
Code 503 : 1 (1.0 %)
All done 100 calls (plus 0 warmup) 738.979 ms avg, 5.1 qps

5）清理：

$ kubectl delete -f kubernetes/loging-fluentd-stdout.yaml
$ kubectl delete -f istio/telemetry/log-fluentd.yaml

【实验三】　测试日志输出到EFK

进行此实验步骤时，每台虚拟机分配了4核CPU和4G内存，资源不够的读者可以跳过此步骤实验。

1）部署EFK日志收集平台：

$ kubectl apply -f kubernetes/loging-stack.yaml

2）创建服务日志发送到Fluentd日志收集规则：

$ kubectl apply -f istio/telemetry/log-fluentd.yaml

3）并发请求服务：

$ kubectl exec fortio -c fortio /usr/local/bin/fortio -- load -curl http://service-python/env
HTTP/1.1 200 OK
content-type: application/json
content-length: 178
server: envoy
date: Fri, 18 Jan 2019 14:10:25 GMT
x-envoy-upstream-service-time: 179
{"message":"python v1","upstream":[{"message":"lua v1","response_time":0.15},{"message":"node v2","response_time":0.51,"upstream":[{"message":"go v1","response_time":"0.40"}]}]}
$ kubectl exec fortio -c fortio /usr/local/bin/fortio -- load -qps 10 -n 300 -loglevel Error http://service-python/env
14:11:50 I logger.go:97> Log level is now 4 Error (was 2 Info)
Fortio 1.0.1 running at 10 queries per second, 2->2 procs, for 300 calls: http://service-python/env
Aggregated Sleep Time : count 296 avg -33.639106 +/- 23.5 min -84.517839391 max -0.225980787 sum -9957.17528
# range, mid point, percentile, count
>= -84.5178 <= -0.225981 , -42.3719 , 100.00, 296
# target 50% -42.5148
WARNING 100.00% of sleep were falling behind
Aggregated Function Time : count 300 avg 1.4654142 +/- 1.06 min 0.009331473 max 7.796983286 sum 439.624248
# target 50%   1.26667
# target 75%   1.98095
# target 90%   2.81132
# target 99%   6.5
# target 99.9% 7.70789
Sockets used: 10 (for perfect keepalive, would be 4)
Code 200 : 293 (97.7 %)
Code 503 : 7 (2.3 %)
All done 300 calls (plus 0 warmup) 1465.414 ms avg, 2.6 qps

4）在Kibana上查看日志数据。

访问地址http://11.11.11.111:32142/ ，设置并查看日志。选择Management导航栏，点击Index Patterns创建Index Pattern，如图11-4所示。

填入logstash-*匹配ElasticSearch中存储的日志索引，如图11-5所示。

选择@timestamp作为时间字段，如图11-6所示。

在Discover导航栏查看日志，如图11-7所示。

图11-4　在Kibana上查看日志数据

图11-5　匹配ElasticSearch中存储的日志索引

图11-6　选择时间字段

5）清理：

$ kubectl delete -f istio/telemetry/log-fluentd.yaml
$ kubectl delete -f kubernetes/loging-stack.yaml
$ kubectl delete -f kubernetes/fortio.yaml

图11-7　在Discover导航栏查看日志

[1] 官方文档地址为https://istio.io/docs/reference/config/policy-and-telemetry/adapters/stdio/。