8.8 Rate Limiting

Istio provides two rate-limiting implementations: memory-based (memquota) and Redis-based (redisquota). The memory-based approach is only suitable for clusters that run a single Mixer instance, and because the quota counters are kept in memory, the rate-limiting state is lost whenever Mixer restarts. For production environments, the Redis-based approach is therefore recommended, as it persists the quota data. In Istio, a request that is rate-limited receives the 429 (Too Many Requests) response code.

Rate-limiting configuration is split into two parts: the client side and the Mixer side.

Client-side configuration:

·QuotaSpec defines the quota instance to use and the amount of quota each request consumes.

·QuotaSpecBinding binds a QuotaSpec to one or more services; rate limiting only takes effect for the services that are bound.

Mixer-side configuration:

·The quota instance defines how Mixer distinguishes requests when metering quota, i.e. the dimensions along which request data is collected.

·The memquota/redisquota adapter holds the memquota/redisquota configuration; based on the dimensions defined by the quota instance, it distinguishes requests and defines one or more quota limits.

·The rule defines when the quota instance should be dispatched to the memquota/redisquota adapter for processing.

1. Memory-based rate limiting

An example configuration is shown below:


1 apiVersion: "config.istio.io/v1alpha2"
 2 kind: quota
 3 metadata:
 4   name: requestcount
 5   namespace: istio-system
 6 spec:
 7   dimensions:
 8     source: request.headers["x-forwarded-for"] | "unknown"
 9     destination: destination.labels["app"] | destination.service.name | "unknown"
10     destinationVersion: destination.labels["version"] | "unknown"
11 ---
12 apiVersion: "config.istio.io/v1alpha2"
13 kind: memquota
14 metadata:
15   name: handler
16   namespace: istio-system
17 spec:
18   quotas:
19   - name: requestcount.quota.istio-system
20     maxAmount: 500
21     validDuration: 1s
22     overrides:
23     - dimensions:
24         destination: service-go
25       maxAmount: 50
26       validDuration: 1s
27     - dimensions:
28         destination: service-node
29         source: "10.28.11.20"
30       maxAmount: 50
31       validDuration: 1s
32     - dimensions:
33         destination: service-node
34       maxAmount: 20
35       validDuration: 1s
36     - dimensions:
37         destination: service-python
38       maxAmount: 2
39       validDuration: 5s
40 ---
41 apiVersion: config.istio.io/v1alpha2
42 kind: rule
43 metadata:
44   name: quota
45   namespace: istio-system
46 spec:
47   actions:
48   - handler: handler.memquota
49     instances:
50     - requestcount.quota
51 ---
52 apiVersion: config.istio.io/v1alpha2
53 kind: QuotaSpec
54 metadata:
55   name: request-count
56   namespace: istio-system
57 spec:
58   rules:
59   - quotas:
60     - charge: 1
61       quota: requestcount
62 ---
63 apiVersion: config.istio.io/v1alpha2
64 kind: QuotaSpecBinding
65 metadata:
66   name: request-count
67   namespace: istio-system
68 spec:
69   quotaSpecs:
70   - name: request-count
71     namespace: istio-system
72   services:
73   - name: service-go
74     namespace: default
75   - name: service-node
76     namespace: default
77   - name: service-python
78     namespace: default

Lines 1-10 define a quota instance named requestcount, which extracts the source, destination and destinationVersion values of each request so that the memquota adapter can distinguish quota buckets. The values are derived as follows:

·source takes the value of the request's x-forwarded-for header; if the header is absent, source falls back to "unknown".

·destination takes the value of the app label of the destination workload; if that label is absent, the destination.service.name attribute is used instead; if that is also unavailable, destination falls back to "unknown".

·destinationVersion takes the value of the version label of the destination workload; if the label is absent, destinationVersion falls back to "unknown".

Lines 12-39 define a memquota adapter named handler. The name field on line 19 refers to the quota instance defined above. Line 20 sets the default quota to 500 and line 21 sets the default quota period to 1s, so by default at most 500 requests per second are allowed. Lines 23-39 contain the specific overrides. Lines 23-26 allow at most 50 requests per second when destination is service-go. Lines 27-31 allow at most 50 requests per second when destination is service-node and source is "10.28.11.20". Lines 32-35 allow at most 20 requests per second when destination is service-node. Lines 36-39 allow at most 2 requests per 5 seconds when destination is service-python.

Lines 41-50 define a rule named quota. Because no match condition is specified, requests to all bound services are dispatched to the memquota adapter.

Lines 52-61 define a QuotaSpec named request-count, which specifies that each request charges one unit of quota against the requestcount quota instance.

Lines 63-78 define a QuotaSpecBinding named request-count, which binds the service-go, service-node and service-python services in the default namespace to the request-count QuotaSpec.

Within the overrides configured in the memquota adapter, matching starts from the first override; once an override matches, the remaining ones are not evaluated, and if none matches the default quota applies. Therefore the override defined on lines 27-31 must not be swapped with the one defined on lines 32-35: after swapping, the override on lines 27-31 would never be matched. When configuring overrides, the more specific a rule is, the earlier it should appear; otherwise the rate limiting may not behave as expected, as the sketch below illustrates.
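For illustration only, the following excerpt (not part of the configuration above) shows the problematic ordering: because the generic service-node override comes first, the more specific service-node plus source override beneath it is unreachable.

# Anti-pattern (illustrative): the generic override shadows the more specific one.
overrides:
- dimensions:
    destination: service-node      # matches every request to service-node ...
  maxAmount: 20
  validDuration: 1s
- dimensions:
    destination: service-node      # ... so this more specific override is never reached
    source: "10.28.11.20"
  maxAmount: 50
  validDuration: 1s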

For the full set of attribute values that a quota instance can use as dimensions to distinguish requests, refer to the official documentation [1].
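For illustration only (the instance name pathcount and the dimension names path and sourceWorkload below are made up for this sketch; the attribute expressions follow the standard attribute vocabulary), a quota instance could just as well distinguish requests by the request path and the calling workload:

# Illustrative only: bucket requests by path and by source workload.
apiVersion: "config.istio.io/v1alpha2"
kind: quota
metadata:
  name: pathcount
  namespace: istio-system
spec:
  dimensions:
    path: request.path | "unknown"
    sourceWorkload: source.workload.name | "unknown"
    destination: destination.labels["app"] | destination.service.name | "unknown"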

2. Redis-based rate limiting

An example configuration is shown below:


1 apiVersion: "config.istio.io/v1alpha2"
 2 kind: quota
 3 metadata:
 4   name: requestcount
 5   namespace: istio-system
 6 spec:
 7   dimensions:
 8     source: request.headers["x-forwarded-for"] | "unknown"
 9     destination: destination.labels["app"] | destination.workload.name | "unknown"
10     destinationVersion: destination.labels["version"] | "unknown"
11 ---
12 apiVersion: "config.istio.io/v1alpha2"
13 kind: redisquota
14 metadata:
15   name: handler
16   namespace: istio-system
17 spec:
18   redisServerUrl: redis-ratelimit.istio-system:6379
19   connectionPoolSize: 10
20   quotas:
21   - name: requestcount.quota.istio-system
22     maxAmount: 500
23     validDuration: 1s
24     bucketDuration: 500ms
25     rateLimitAlgorithm: ROLLING_WINDOW
26     overrides:
27     - dimensions:
28         destination: service-go
29       maxAmount: 50
30     - dimensions:
31         destination: service-node
32         source: "10.28.11.20"
33       maxAmount: 50
34     - dimensions:
35         destination: service-node
36       maxAmount: 20
37     - dimensions:
38         destination: service-python
39       maxAmount: 2
40 ---
41 apiVersion: config.istio.io/v1alpha2
42 kind: rule
43 metadata:
44   name: quota
45   namespace: istio-system
46 spec:
47   actions:
48   - handler: handler.redisquota
49     instances:
50     - requestcount.quota
51 ---
52 apiVersion: config.istio.io/v1alpha2
53 kind: QuotaSpec
54 metadata:
55   name: request-count
56   namespace: istio-system
57 spec:
58   rules:
59   - quotas:
60     - charge: 1
61       quota: requestcount
62 ---
63 apiVersion: config.istio.io/v1alpha2
64 kind: QuotaSpecBinding
65 metadata:
66   name: request-count
67   namespace: istio-system
68 spec:
69   quotaSpecs:
70   - name: request-count
71     namespace: istio-system
72   services:
73   - name: service-go
74     namespace: default
75   - name: service-node
76     namespace: default
77   - name: service-python
78     namespace: default

Lines 12-39 define a redisquota adapter named handler. Line 18 sets the Redis server address and line 19 sets the size of the Redis connection pool.

Line 22 sets the default quota to 500 and line 23 sets the default quota period to 1 second, so by default at most 500 requests per second are allowed.

Line 25 selects the rate-limiting algorithm. Two algorithms are available, FIXED_WINDOW and ROLLING_WINDOW, with FIXED_WINDOW being the default:

·The FIXED_WINDOW algorithm can allow the peak request rate to reach up to twice the configured rate, because a burst that straddles the boundary between two adjacent windows is counted against each window separately.

·The ROLLING_WINDOW algorithm provides better accuracy by subdividing the window into buckets of bucketDuration (line 24), at the cost of additional Redis resources.

Lines 27-39 define the specific overrides. Unlike memquota, redisquota does not allow an override to set its own quota period; every override uses the default period.

The rest of the configuration is identical to the memquota example. The following examples show further ways to apply rate limits.

(1) Condition-based rate limiting

The following configuration rate-limits only those requests that do not carry a user entry in their cookie:


apiVersion: config.istio.io/v1alpha2
kind: rule
metadata:
  name: quota
  namespace: istio-system
spec:
  match: match(request.headers["cookie"], "user=*") == false
  actions:
  - handler: handler.memquota
    instances:
    - requestcount.quota

(2) Rate limiting all services

The following configuration applies rate limiting to all services:


apiVersion: config.istio.io/v1alpha2
kind: QuotaSpecBinding
metadata:
  name: request-count
  namespace: istio-system
spec:
  quotaSpecs:
  - name: request-count
    namespace: istio-system
  services:
    - service: '*'

【Experiment】

This experiment uses the memory-based memquota adapter to test service rate limiting. If the Redis-based redisquota adapter were used instead, limited machine performance in the test environment could cause errors when Mixer accesses Redis, which in turn could cause requests to be rate-limited before the configured rate is reached and distort the results.
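For readers who do want to experiment with redisquota, the following is a minimal sketch of a Redis deployment matching the redisServerUrl (redis-ratelimit.istio-system:6379) used in the example above. It is an assumption for experimentation only, not a manifest from this book; a production setup should use an appropriately sized, persistent Redis.

# Illustrative only: a single-instance Redis for redisquota experiments.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-ratelimit
  namespace: istio-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis-ratelimit
  template:
    metadata:
      labels:
        app: redis-ratelimit
    spec:
      containers:
      - name: redis
        image: redis:4.0-alpine
        ports:
        - containerPort: 6379
---
apiVersion: v1
kind: Service
metadata:
  name: redis-ratelimit
  namespace: istio-system
spec:
  selector:
    app: redis-ratelimit
  ports:
  - port: 6379
    targetPort: 6379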

1) Deploy the remaining services:


$ kubectl apply -f service/node/service-node.yaml
$ kubectl apply -f service/lua/service-lua.yaml
$ kubectl apply -f service/python/service-python.yaml
$ kubectl get pod
NAME                                   READY     STATUS      RESTARTS   AGE
service-go-v1-7cc5c6f574-488rs         2/2       Running     0          15m
service-go-v2-7656dcc478-bfq5x         2/2       Running     0          15m
service-lua-v1-5c9bcb7778-d7qwp        2/2       Running     0          3m12s
service-lua-v2-75cb5cdf8-g9vht         2/2       Running     0          3m12s
service-node-v1-d44b9bf7b-z7vbr        2/2       Running     0          3m11s
service-node-v2-86545d9796-rgtxw       2/2       Running     0          3m10s
service-python-v1-79fc5849fd-xgfkn     2/2       Running     0          3m9s
service-python-v2-7b6864b96b-5w6cj     2/2       Running     0          3m15s

2) Start the Pod used for load testing:


$ kubectl apply -f kubernetes/fortio.yaml
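The contents of kubernetes/fortio.yaml are not reproduced here. A minimal sketch of an equivalent load-generator Pod might look like the following (the image tag is an assumption chosen to match the Fortio 1.0.1 output below, and automatic sidecar injection in the default namespace is assumed):

# Illustrative only: a simple Fortio load-generator Pod.
apiVersion: v1
kind: Pod
metadata:
  name: fortio
  labels:
    app: fortio
spec:
  containers:
  - name: fortio
    image: fortio/fortio:1.0.1
    ports:
    - containerPort: 8080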

3) Create the rate-limiting rules:


$ kubectl apply -f istio/resilience/quota-mem-ratelimit.yaml

4) Call the service-go service to verify that rate limiting takes effect:


$ kubectl exec fortio -c fortio /usr/local/bin/fortio -- load -curl http://service-go/env
HTTP/1.1 200 OK
content-type: application/json; charset=utf-8
date: Wed, 16 Jan 2019 15:33:02 GMT
content-length: 19
x-envoy-upstream-service-time: 226
server: envoy
{"message":"go v1"}
# 30 qps
$ kubectl exec fortio -c fortio /usr/local/bin/fortio -- load -qps 30 -n 300 -loglevel Error http://service-go/env
15:33:36 I logger.go:97> Log level is now 4 Error (was 2 Info)
Fortio 1.0.1 running at 30 queries per second, 2->2 procs, for 300 calls: http://service-go/env
Aggregated Function Time : count 300 avg 0.0086544419 +/- 0.005944 min 0.002929143 max 0.065596074 sum 2.59633258
# target 50% 0.007375
# target 75% 0.00938095
# target 90% 0.0115
# target 99% 0.0325
# target 99.9% 0.0647567
Sockets used: 4 (for perfect keepalive, would be 4)
Code 200 : 300 (100.0 %)
All done 300 calls (plus 0 warmup) 8.654 ms avg, 30.0 qps
# 50 qps
$ kubectl exec fortio -c fortio /usr/local/bin/fortio -- load -qps 50 -n 500 -loglevel Error http://service-go/env
15:34:17 I logger.go:97> Log level is now 4 Error (was 2 Info)
Fortio 1.0.1 running at 50 queries per second, 2->2 procs, for 500 calls: http://service-go/env
Aggregated Function Time : count 500 avg 0.0086848862 +/- 0.005076 min 0.00307391 max 0.05419281 sum 4.34244311
# target 50% 0.0075
# target 75% 0.00959459
# target 90% 0.0132857
# target 99% 0.03
# target 99.9% 0.0531446
Sockets used: 4 (for perfect keepalive, would be 4)
Code 200 : 500 (100.0 %)
All done 500 calls (plus 0 warmup) 8.685 ms avg, 50.0 qps
# 60 qps
$ kubectl exec fortio -c fortio /usr/local/bin/fortio -- load -qps 60 -n 600 -loglevel Error http://service-go/env
15:35:28 I logger.go:97> Log level is now 4 Error (was 2 Info)
Fortio 1.0.1 running at 60 queries per second, 2->2 procs, for 600 calls: http://service-go/env
Aggregated Function Time : count 600 avg 0.0090870522 +/- 0.008314 min 0.002537502 max 0.169680378 sum 5.45223134
# target 50% 0.00748529
# target 75% 0.0101538
# target 90% 0.0153548
# target 99% 0.029375
# target 99.9% 0.163872
Sockets used: 23 (for perfect keepalive, would be 4)
Code 200 : 580 (96.7 %)
Code 429 : 20 (3.3 %)
All done 600 calls (plus 0 warmup) 9.087 ms avg, 59.9 qps

5) Call the service-node service to verify that rate limiting takes effect:


$ kubectl exec fortio -c fortio /usr/local/bin/fortio -- load -curl http://service-node/env
HTTP/1.1 200 OK
content-type: application/json; charset=utf-8
content-length: 77
date: Wed, 16 Jan 2019 15:36:13 GMT
x-envoy-upstream-service-time: 1187
server: envoy
{"message":"node v2","upstream":[{"message":"go v1","response_time":"0.51"}]}
# 20 qps
$ kubectl exec fortio -c fortio /usr/local/bin/fortio -- load -qps 20 -n 200 -loglevel Error http://service-node/env
15:37:51 I logger.go:97> Log level is now 4 Error (was 2 Info)
Fortio 1.0.1 running at 20 queries per second, 2->2 procs, for 200 calls: http://service-node/env
Aggregated Sleep Time : count 196 avg -0.21285915 +/- 1.055 min -4.8433788589999995 max 0.190438028 sum -41.7203939
# range, mid point, percentile, count
>= -4.84338 <= -0.001 , -2.42219 , 18.37, 36
> 0.003 <= 0.004 , 0.0035 , 20.41, 4
> 0.011 <= 0.013 , 0.012 , 20.92, 1
> 0.015 <= 0.017 , 0.016 , 21.43, 1
> 0.069 <= 0.079 , 0.074 , 21.94, 1
> 0.089 <= 0.099 , 0.094 , 24.49, 5
> 0.099 <= 0.119 , 0.109 , 28.57, 8
> 0.119 <= 0.139 , 0.129 , 33.67, 10
> 0.139 <= 0.159 , 0.149 , 38.27, 9
> 0.159 <= 0.179 , 0.169 , 68.37, 59
> 0.179 <= 0.190438 , 0.184719 , 100.00, 62
# target 50% 0.166797
WARNING 18.37% of sleep were falling behind
Aggregated Function Time : count 200 avg 0.07655831 +/- 0.3601 min 0.007514854 max 5.046878744 sum 15.311662
# target 50% 0.0258696
# target 75% 0.045
# target 90% 0.104
# target 99% 0.55
# target 99.9% 5.0375
Sockets used: 4 (for perfect keepalive, would be 4)
Code 200 : 200 (100.0 %)
All done 200 calls (plus 0 warmup) 76.558 ms avg, 18.1 qps
# 30 qps
$ kubectl exec fortio -c fortio /usr/local/bin/fortio -- load -qps 30 -n 300 -loglevel Error http://service-node/env
15:38:36 I logger.go:97> Log level is now 4 Error (was 2 Info)
Fortio 1.0.1 running at 30 queries per second, 2->2 procs, for 300 calls: http://service-node/env
Aggregated Sleep Time : count 296 avg 0.035638851 +/- 0.1206 min -0.420611573 max 0.132597685 sum 10.5491
# range, mid point, percentile, count
>= -0.420612 <= -0.001 , -0.210806 , 24.66, 73
> -0.001 <= 0 , -0.0005 , 25.00, 1
...
# target 50% 0.0934
WARNING 24.66% of sleep were falling behind
Aggregated Function Time : count 300 avg 0.06131494 +/- 0.08193 min 0.001977589 max 0.42055696 sum 18.3944819
# target 50% 0.03
# target 75% 0.0628571
# target 90% 0.175
# target 99% 0.4
# target 99.9% 0.418501
Sockets used: 55 (for perfect keepalive, would be 4)
Code 200 : 249 (83.0 %)
Code 429 : 51 (17.0 %)
All done 300 calls (plus 0 warmup) 61.315 ms avg, 29.9 qps
# 30 qps
$ kubectl exec fortio -c fortio /usr/local/bin/fortio -- load -qps 30 -n 300 -loglevel Error -H "x-forwarded-for: 10.28.11.20" http://service-node/env
15:40:34 I logger.go:97> Log level is now 4 Error (was 2 Info)
Fortio 1.0.1 running at 30 queries per second, 2->2 procs, for 300 calls: http://service-node/env
Aggregated Sleep Time : count 296 avg -1.4901022 +/- 1.952 min -6.08576837 max 0.123485559 sum -441.070241
# range, mid point, percentile, count
>= -6.08577 <= -0.001 , -3.04338 , 69.59, 206
...
# target 50% -1.72254
WARNING 69.59% of sleep were falling behind
Aggregated Function Time : count 300 avg 0.1177745 +/- 0.4236 min 0.008494289 max 5.14910151 sum 35.332351
# target 50% 0.0346875
# target 75% 0.0985714
# target 90% 0.25
# target 99% 0.55
# target 99.9% 5.12674
Sockets used: 4 (for perfect keepalive, would be 4)
Code 200 : 300 (100.0 %)
All done 300 calls (plus 0 warmup) 117.775 ms avg, 24.7 qps
# 50 qps
$ kubectl exec fortio -c fortio /usr/local/bin/fortio -- load -qps 50 -n 500 -loglevel Error -H "x-forwarded-for: 10.28.11.20" http://service-node/env
15:45:31 I logger.go:97> Log level is now 4 Error (was 2 Info)
Fortio 1.0.1 running at 50 queries per second, 2->2 procs, for 500 calls: http://service-node/env
Aggregated Sleep Time : count 496 avg 0.0015264793 +/- 0.1077 min -0.382731569 max 0.078526418 sum 0.757133711
# range, mid point, percentile, count
>= -0.382732 <= -0.001 , -0.191866 , 25.40, 126
> -0.001 <= 0 , -0.0005 , 25.60, 1
...
> 0.069 <= 0.0785264 , 0.0737632 , 100.00, 34
# target 50% 0.0566056
WARNING 25.40% of sleep were falling behind
Aggregated Function Time : count 500 avg 0.039103632 +/- 0.05723 min 0.001972061 max 0.450959277 sum 19.5518159
# target 50% 0.0175385
# target 75% 0.0323529
# target 90% 0.0975
# target 99% 0.3
# target 99.9% 0.450719
Sockets used: 7 (for perfect keepalive, would be 4)
Code 200 : 497 (99.4 %)
Code 429 : 3 (0.6 %)
All done 500 calls (plus 0 warmup) 39.104 ms avg, 48.4 qps
# 60 qps
$ kubectl exec fortio -c fortio /usr/local/bin/fortio -- load -qps 60 -n 600 -loglevel Error -H "x-forwarded-for: 10.28.11.20" http://service-node/env
15:50:24 I logger.go:97> Log level is now 4 Error (was 2 Info)
Fortio 1.0.1 running at 60 queries per second, 2->2 procs, for 600 calls: http://service-node/env
Aggregated Sleep Time : count 596 avg -0.081667759 +/- 0.1592 min -0.626635518 max 0.064876123 sum -48.6739846
# range, mid point, percentile, count
>= -0.626636 <= -0.001 , -0.313818 , 51.01, 304
> 0 <= 0.001 , 0.0005 , 51.34, 2
...
> 0.059 <= 0.0648761 , 0.0619381 , 100.00, 14
# target 50% -0.0133888
WARNING 51.01% of sleep were falling behind
Aggregated Function Time : count 600 avg 0.04532505 +/- 0.04985 min 0.001904423 max 0.304644243 sum 27.1950299
# target 50% 0.0208163
# target 75% 0.07
# target 90% 0.1025
# target 99% 0.233333
# target 99.9% 0.303251
Sockets used: 19 (for perfect keepalive, would be 4)
Code 200 : 585 (97.5 %)
Code 429 : 15 (2.5 %)
All done 600 calls (plus 0 warmup) 45.325 ms avg, 59.9 qps

6) Call the service-python service to verify that rate limiting takes effect:


$ kubectl exec fortio -c fortio /usr/local/bin/fortio -- load -curl http://service-python/env
HTTP/1.1 200 OK
content-type: application/json
content-length: 178
server: envoy
date: Wed, 16 Jan 2019 15:47:30 GMT
x-envoy-upstream-service-time: 366
{"message":"python v2","upstream":[{"message":"lua v2","response_time":0.19},{"message":"node v2","response_time":0.18,"upstream":[{"message":"go v1","response_time":"0.02"}]}]}
$ kubectl exec fortio -c fortio /usr/local/bin/fortio -- load -qps 1 -n 10 -loglevel Error http://service-python/env
15:48:02 I logger.go:97> Log level is now 4 Error (was 2 Info)
Fortio 1.0.1 running at 1 queries per second, 2->2 procs, for 10 calls: http://service-python/env
Aggregated Function Time : count 10 avg 0.45553668 +/- 0.5547 min 0.003725253 max 1.4107851249999999 sum 4.55536678
# target 50% 0.18
# target 75% 1.06846
# target 90% 1.27386
# target 99% 1.39709
# target 99.9% 1.40942
Sockets used: 6 (for perfect keepalive, would be 4)
Code 200 : 5 (50.0 %)
Code 429 : 5 (50.0 %)
All done 10 calls (plus 0 warmup) 455.537 ms avg, 0.6 qps

From the results above, the following conclusions can be drawn:

·For service-go, when the qps is at or below 50, almost all requests pass normally; once the qps exceeds 50, some requests receive the 429 response code. This shows that the rate-limiting rule configured for service-go has taken effect.

·For service-node, with plain calls some requests receive the 429 response code once the qps exceeds 20. When the "x-forwarded-for: 10.28.11.20" header is added, 429 responses only appear once the qps exceeds 50. This shows that both rate-limiting rules configured for service-node have taken effect.

·For service-python, the configured limit allows only 2 requests every 5 seconds. When requests are sent at 1 qps, only 5 of the 10 requests in the run above succeed and the rest receive the 429 response code. This shows that the rate-limiting rule configured for service-python has also taken effect.

Istio implements rate limiting through quotas, but the enforcement is not perfectly precise and some deviation from the configured limits can occur; keep this in mind when using it.

7) Clean up:


$ kubectl delete -f kubernetes/fortio.yaml
$ kubectl delete -f istio/resilience/quota-mem-ratelimit.yaml
$ kubectl delete -f service/node/service-node.yaml
$ kubectl delete -f service/lua/service-lua.yaml
$ kubectl delete -f service/python/service-python.yaml

[1] The official documentation is available at https://istio.io/docs/reference/config/policy-and-telemetry/attribute-vocabulary/.