Istio provides two rate-limiting implementations: memory-based (memquota) and Redis-based (redisquota). The memory-based approach is only suitable for clusters running a single Mixer instance, and because the counters are kept in memory, all rate-limiting state is lost when Mixer restarts. For production environments the Redis-based approach is therefore recommended, since it persists the rate-limiting state. In Istio, requests rejected by rate limiting receive a 429 (Too Many Requests) response code.
Rate-limiting configuration is split into two parts: the client side and the Mixer side.
Client-side configuration:
·QuotaSpec defines a quota instance and how many quota units each request charges.
·QuotaSpecBinding binds a QuotaSpec to one or more services; rate limiting only takes effect for services bound this way.
Mixer-side configuration:
·The quota instance defines the dimensions along which Mixer distinguishes requests when metering quota, i.e. which request data is collected.
·The memquota/redisquota adapter holds the memquota/redisquota configuration; using the dimensions declared by the quota instance, it distinguishes requests and defines one or more quota limits.
·The rule defines when quota instances are dispatched to the memquota/redisquota adapter for processing.
1. Memory-based rate limiting
A usage example follows:
 1 apiVersion: "config.istio.io/v1alpha2"
 2 kind: quota
 3 metadata:
 4   name: requestcount
 5   namespace: istio-system
 6 spec:
 7   dimensions:
 8     source: request.headers["x-forwarded-for"] | "unknown"
 9     destination: destination.labels["app"] | destination.service.name | "unknown"
10     destinationVersion: destination.labels["version"] | "unknown"
11 ---
12 apiVersion: "config.istio.io/v1alpha2"
13 kind: memquota
14 metadata:
15   name: handler
16   namespace: istio-system
17 spec:
18   quotas:
19   - name: requestcount.quota.istio-system
20     maxAmount: 500
21     validDuration: 1s
22     overrides:
23     - dimensions:
24         destination: service-go
25       maxAmount: 50
26       validDuration: 1s
27     - dimensions:
28         destination: service-node
29         source: "10.28.11.20"
30       maxAmount: 50
31       validDuration: 1s
32     - dimensions:
33         destination: service-node
34       maxAmount: 20
35       validDuration: 1s
36     - dimensions:
37         destination: service-python
38       maxAmount: 2
39       validDuration: 5s
40 ---
41 apiVersion: config.istio.io/v1alpha2
42 kind: rule
43 metadata:
44   name: quota
45   namespace: istio-system
46 spec:
47   actions:
48   - handler: handler.memquota
49     instances:
50     - requestcount.quota
51 ---
52 apiVersion: config.istio.io/v1alpha2
53 kind: QuotaSpec
54 metadata:
55   name: request-count
56   namespace: istio-system
57 spec:
58   rules:
59   - quotas:
60     - charge: 1
61       quota: requestcount
62 ---
63 apiVersion: config.istio.io/v1alpha2
64 kind: QuotaSpecBinding
65 metadata:
66   name: request-count
67   namespace: istio-system
68 spec:
69   quotaSpecs:
70   - name: request-count
71     namespace: istio-system
72   services:
73   - name: service-go
74     namespace: default
75   - name: service-node
76     namespace: default
77   - name: service-python
78     namespace: default
Lines 1~10 define a quota instance named requestcount, which extracts the source, destination, and destinationVersion values from each request so the memquota adapter can tell quota buckets apart. The values are resolved as follows:
·source takes the value of the request's x-forwarded-for header; if the header is absent, it falls back to "unknown".
·destination takes the value of the target service's app label; if that is absent, it falls back to the destination.service.name attribute, and if that is also absent, to "unknown".
·destinationVersion takes the value of the target service's version label; if absent, it falls back to "unknown".
Lines 12~39 define a memquota adapter named handler. The name field on line 19 references the quota instance defined above. Line 20 sets the default quota to 500, and line 21 sets the default window to 1s, i.e. at most 500 requests per second by default. Lines 23~39 contain the specific overrides: lines 23~26 allow at most 50 requests per second when destination is service-go; lines 27~31 allow at most 50 requests per second when destination is service-node and source is "10.28.11.20"; lines 32~35 allow at most 20 requests per second when destination is service-node; and lines 36~39 allow at most 2 requests per 5 seconds when destination is service-python.
Lines 41~50 define a rule named quota. Because it specifies no match condition, all requests to the bound services are dispatched to the memquota adapter.
Lines 52~61 define a QuotaSpec named request-count, which specifies that each request charges one unit of the requestcount quota instance.
Lines 63~78 define a QuotaSpecBinding named request-count, which binds the service-go, service-node, and service-python services in the default namespace to the request-count QuotaSpec.
Among the overrides configured in the memquota adapter, matching starts from the first override; once one matches, the rest are skipped, and if none match, the default quota applies. Consequently the override on lines 27~31 must not be swapped with the one on lines 32~35: after such a swap, the override on lines 27~31 could never match. When writing overrides, place the more specific rules first, otherwise the limits may not behave as intended, as the mis-ordered sketch below illustrates.
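For instance, this hypothetical memquota fragment reverses the two service-node overrides from the example above; the broader rule now always wins and the source-specific rule becomes dead:

overrides:
# matches every request to service-node, including those from 10.28.11.20
- dimensions:
    destination: service-node
  maxAmount: 20
  validDuration: 1s
# never reached: shadowed by the broader rule above, so this client
# is limited to 20 requests per second instead of the intended 50
- dimensions:
    destination: service-node
    source: "10.28.11.20"
  maxAmount: 50
  validDuration: 1s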
For the full list of attribute values a quota instance can use to distinguish requests, see the official documentation [1] .
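As an illustration, a quota instance can key on other attributes from that vocabulary, such as the request path or HTTP method. The following sketch is hypothetical (the instance name pathcount and its dimension names are arbitrary; request.path and request.method are attributes listed in the vocabulary):

apiVersion: "config.istio.io/v1alpha2"
kind: quota
metadata:
  name: pathcount
  namespace: istio-system
spec:
  dimensions:
    # meter quota separately per URL path and HTTP method
    path: request.path | "unknown"
    method: request.method | "GET"
    destination: destination.labels["app"] | "unknown"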
2. Redis-based rate limiting
A usage example follows:
 1 apiVersion: "config.istio.io/v1alpha2"
 2 kind: quota
 3 metadata:
 4   name: requestcount
 5   namespace: istio-system
 6 spec:
 7   dimensions:
 8     source: request.headers["x-forwarded-for"] | "unknown"
 9     destination: destination.labels["app"] | destination.workload.name | "unknown"
10     destinationVersion: destination.labels["version"] | "unknown"
11 ---
12 apiVersion: "config.istio.io/v1alpha2"
13 kind: redisquota
14 metadata:
15   name: handler
16   namespace: istio-system
17 spec:
18   redisServerUrl: redis-ratelimit.istio-system:6379
19   connectionPoolSize: 10
20   quotas:
21   - name: requestcount.quota.istio-system
22     maxAmount: 500
23     validDuration: 1s
24     bucketDuration: 500ms
25     rateLimitAlgorithm: ROLLING_WINDOW
26     overrides:
27     - dimensions:
28         destination: service-go
29       maxAmount: 50
30     - dimensions:
31         destination: service-node
32         source: "10.28.11.20"
33       maxAmount: 50
34     - dimensions:
35         destination: service-node
36       maxAmount: 20
37     - dimensions:
38         destination: service-python
39       maxAmount: 2
40 ---
41 apiVersion: config.istio.io/v1alpha2
42 kind: rule
43 metadata:
44   name: quota
45   namespace: istio-system
46 spec:
47   actions:
48   - handler: handler.redisquota
49     instances:
50     - requestcount.quota
51 ---
52 apiVersion: config.istio.io/v1alpha2
53 kind: QuotaSpec
54 metadata:
55   name: request-count
56   namespace: istio-system
57 spec:
58   rules:
59   - quotas:
60     - charge: 1
61       quota: requestcount
62 ---
63 apiVersion: config.istio.io/v1alpha2
64 kind: QuotaSpecBinding
65 metadata:
66   name: request-count
67   namespace: istio-system
68 spec:
69   quotaSpecs:
70   - name: request-count
71     namespace: istio-system
72   services:
73   - name: service-go
74     namespace: default
75   - name: service-node
76     namespace: default
77   - name: service-python
78     namespace: default
Lines 12~39 define a redisquota adapter named handler. Line 18 sets the Redis connection address, and line 19 sets the Redis connection pool size.
Line 22 sets the default quota to 500, and line 23 sets the default window to 1 second, i.e. at most 500 requests per second by default.
Line 25 selects the rate-limiting algorithm. Two algorithms are available, FIXED_WINDOW and ROLLING_WINDOW, with FIXED_WINDOW being the default:
·The FIXED_WINDOW algorithm can admit up to twice the configured peak request rate.
·The ROLLING_WINDOW algorithm offers higher precision, at the cost of additional Redis resources (a fixed-window variant of the handler is sketched after this list).
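If the extra Redis overhead is not warranted, the handler can use the fixed-window algorithm instead. A minimal sketch under that assumption; bucketDuration only applies to ROLLING_WINDOW and is omitted here:

apiVersion: "config.istio.io/v1alpha2"
kind: redisquota
metadata:
  name: handler
  namespace: istio-system
spec:
  redisServerUrl: redis-ratelimit.istio-system:6379
  connectionPoolSize: 10
  quotas:
  - name: requestcount.quota.istio-system
    maxAmount: 500
    validDuration: 1s
    # FIXED_WINDOW is the default algorithm; stated explicitly for clarity
    rateLimitAlgorithm: FIXED_WINDOW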
Lines 27~39 define the specific overrides. Unlike memquota, redisquota does not allow an override to set its own window; every override uses the default validDuration.
The rest of the configuration is identical to the memquota setup. Two further usage patterns are shown below.
(1) Conditional rate limiting
The following configuration rate-limits only requests whose cookie does not contain user:
apiVersion: config.istio.io/v1alpha2
kind: rule
metadata:
  name: quota
  namespace: istio-system
spec:
  match: match(request.headers["cookie"], "user=*") == false
  actions:
  - handler: handler.memquota
    instances:
    - requestcount.quota
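A rule's match field accepts arbitrary attribute expressions, so other conditions work the same way. A hypothetical variant that applies quota checks only to GET requests (the rule name is arbitrary; request.method comes from the attribute vocabulary):

apiVersion: config.istio.io/v1alpha2
kind: rule
metadata:
  name: quota-get-only
  namespace: istio-system
spec:
  # dispatch only GET requests to the adapter for quota checks
  match: request.method == "GET"
  actions:
  - handler: handler.memquota
    instances:
    - requestcount.quota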
(2) Rate limiting all services
The following configuration applies rate limiting to every service:
apiVersion: config.istio.io/v1alpha2
kind: QuotaSpecBinding
metadata:
  name: request-count
  namespace: istio-system
spec:
  quotaSpecs:
  - name: request-count
    namespace: istio-system
  services:
  - service: '*'
[Experiment]
This experiment tests service rate limiting with the memory-based memquota adapter. With the Redis-based redisquota adapter, limited machine performance in the lab environment can cause Mixer's calls to Redis to fail, so requests get rate-limited before the configured rate is reached, skewing the results.
1) Deploy the remaining services:
$ kubectl apply -f service/node/service-node.yaml
$ kubectl apply -f service/lua/service-lua.yaml
$ kubectl apply -f service/python/service-python.yaml
$ kubectl get pod
NAME                                 READY   STATUS    RESTARTS   AGE
service-go-v1-7cc5c6f574-488rs       2/2     Running   0          15m
service-go-v2-7656dcc478-bfq5x       2/2     Running   0          15m
service-lua-v1-5c9bcb7778-d7qwp      2/2     Running   0          3m12s
service-lua-v2-75cb5cdf8-g9vht       2/2     Running   0          3m12s
service-node-v1-d44b9bf7b-z7vbr      2/2     Running   0          3m11s
service-node-v2-86545d9796-rgtxw     2/2     Running   0          3m10s
service-python-v1-79fc5849fd-xgfkn   2/2     Running   0          3m9s
service-python-v2-7b6864b96b-5w6cj   2/2     Running   0          3m15s
2) Start the pod used for load testing:
$ kubectl apply -f kubernetes/fortio.yaml
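For reference, a minimal sketch of what kubernetes/fortio.yaml might contain; this is an assumption (the actual file in the companion repository may differ), with pod and container both named fortio to match the kubectl exec commands used below:

apiVersion: v1
kind: Pod
metadata:
  name: fortio
  labels:
    app: fortio
spec:
  containers:
  - name: fortio
    # official fortio image; running "server" keeps the pod alive
    # so load tests can be launched into it via kubectl exec
    image: fortio/fortio
    args: ["server"]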
3) Create the rate-limiting rules:
$ kubectl apply -f istio/resilience/quota-mem-ratelimit.yaml
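Before generating load, it can be worth sanity-checking that the resources were created. A quick check using the fully qualified CRD names (quotas.config.istio.io is spelled out to avoid the Kubernetes shortname quota, which resolves to resourcequota):

$ kubectl get quotaspecs.config.istio.io,quotaspecbindings.config.istio.io -n istio-system
$ kubectl get quotas.config.istio.io,memquotas.config.istio.io,rules.config.istio.io -n istio-system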
4) Call the service-go service to verify that rate limiting takes effect:
$ kubectl exec fortio -c fortio /usr/local/bin/fortio -- load -curl http://service-go/env
HTTP/1.1 200 OK
content-type: application/json; charset=utf-8
date: Wed, 16 Jan 2019 15:33:02 GMT
content-length: 19
x-envoy-upstream-service-time: 226
server: envoy

{"message":"go v1"}

# 30 qps
$ kubectl exec fortio -c fortio /usr/local/bin/fortio -- load -qps 30 -n 300 -loglevel Error http://service-go/env
15:33:36 I logger.go:97> Log level is now 4 Error (was 2 Info)
Fortio 1.0.1 running at 30 queries per second, 2->2 procs, for 300 calls: http://service-go/env
Aggregated Function Time : count 300 avg 0.0086544419 +/- 0.005944 min 0.002929143 max 0.065596074 sum 2.59633258
# target 50% 0.007375
# target 75% 0.00938095
# target 90% 0.0115
# target 99% 0.0325
# target 99.9% 0.0647567
Sockets used: 4 (for perfect keepalive, would be 4)
Code 200 : 300 (100.0 %)
All done 300 calls (plus 0 warmup) 8.654 ms avg, 30.0 qps

# 50 qps
$ kubectl exec fortio -c fortio /usr/local/bin/fortio -- load -qps 50 -n 500 -loglevel Error http://service-go/env
15:34:17 I logger.go:97> Log level is now 4 Error (was 2 Info)
Fortio 1.0.1 running at 50 queries per second, 2->2 procs, for 500 calls: http://service-go/env
Aggregated Function Time : count 500 avg 0.0086848862 +/- 0.005076 min 0.00307391 max 0.05419281 sum 4.34244311
# target 50% 0.0075
# target 75% 0.00959459
# target 90% 0.0132857
# target 99% 0.03
# target 99.9% 0.0531446
Sockets used: 4 (for perfect keepalive, would be 4)
Code 200 : 500 (100.0 %)
All done 500 calls (plus 0 warmup) 8.685 ms avg, 50.0 qps

# 60 qps
$ kubectl exec fortio -c fortio /usr/local/bin/fortio -- load -qps 60 -n 600 -loglevel Error http://service-go/env
15:35:28 I logger.go:97> Log level is now 4 Error (was 2 Info)
Fortio 1.0.1 running at 60 queries per second, 2->2 procs, for 600 calls: http://service-go/env
Aggregated Function Time : count 600 avg 0.0090870522 +/- 0.008314 min 0.002537502 max 0.169680378 sum 5.45223134
# target 50% 0.00748529
# target 75% 0.0101538
# target 90% 0.0153548
# target 99% 0.029375
# target 99.9% 0.163872
Sockets used: 23 (for perfect keepalive, would be 4)
Code 200 : 580 (96.7 %)
Code 429 : 20 (3.3 %)
All done 600 calls (plus 0 warmup) 9.087 ms avg, 59.9 qps
5) Call the service-node service to verify that rate limiting takes effect:
$ kubectl exec fortio -c fortio /usr/local/bin/fortio -- load -curl http://service-node/env
HTTP/1.1 200 OK
content-type: application/json; charset=utf-8
content-length: 77
date: Wed, 16 Jan 2019 15:36:13 GMT
x-envoy-upstream-service-time: 1187
server: envoy

{"message":"node v2","upstream":[{"message":"go v1","response_time":"0.51"}]}

# 20 qps
$ kubectl exec fortio -c fortio /usr/local/bin/fortio -- load -qps 20 -n 200 -loglevel Error http://service-node/env
15:37:51 I logger.go:97> Log level is now 4 Error (was 2 Info)
Fortio 1.0.1 running at 20 queries per second, 2->2 procs, for 200 calls: http://service-node/env
Aggregated Sleep Time : count 196 avg -0.21285915 +/- 1.055 min -4.8433788589999995 max 0.190438028 sum -41.7203939
# range, mid point, percentile, count
>= -4.84338 <= -0.001 , -2.42219 , 18.37, 36
> 0.003 <= 0.004 , 0.0035 , 20.41, 4
> 0.011 <= 0.013 , 0.012 , 20.92, 1
> 0.015 <= 0.017 , 0.016 , 21.43, 1
> 0.069 <= 0.079 , 0.074 , 21.94, 1
> 0.089 <= 0.099 , 0.094 , 24.49, 5
> 0.099 <= 0.119 , 0.109 , 28.57, 8
> 0.119 <= 0.139 , 0.129 , 33.67, 10
> 0.139 <= 0.159 , 0.149 , 38.27, 9
> 0.159 <= 0.179 , 0.169 , 68.37, 59
> 0.179 <= 0.190438 , 0.184719 , 100.00, 62
# target 50% 0.166797
WARNING 18.37% of sleep were falling behind
Aggregated Function Time : count 200 avg 0.07655831 +/- 0.3601 min 0.007514854 max 5.046878744 sum 15.311662
# target 50% 0.0258696
# target 75% 0.045
# target 90% 0.104
# target 99% 0.55
# target 99.9% 5.0375
Sockets used: 4 (for perfect keepalive, would be 4)
Code 200 : 200 (100.0 %)
All done 200 calls (plus 0 warmup) 76.558 ms avg, 18.1 qps

# 30 qps
$ kubectl exec fortio -c fortio /usr/local/bin/fortio -- load -qps 30 -n 300 -loglevel Error http://service-node/env
15:38:36 I logger.go:97> Log level is now 4 Error (was 2 Info)
Fortio 1.0.1 running at 30 queries per second, 2->2 procs, for 300 calls: http://service-node/env
Aggregated Sleep Time : count 296 avg 0.035638851 +/- 0.1206 min -0.420611573 max 0.132597685 sum 10.5491
# range, mid point, percentile, count
>= -0.420612 <= -0.001 , -0.210806 , 24.66, 73
> -0.001 <= 0 , -0.0005 , 25.00, 1
...
# target 50% 0.0934
WARNING 24.66% of sleep were falling behind
Aggregated Function Time : count 300 avg 0.06131494 +/- 0.08193 min 0.001977589 max 0.42055696 sum 18.3944819
# target 50% 0.03
# target 75% 0.0628571
# target 90% 0.175
# target 99% 0.4
# target 99.9% 0.418501
Sockets used: 55 (for perfect keepalive, would be 4)
Code 200 : 249 (83.0 %)
Code 429 : 51 (17.0 %)
All done 300 calls (plus 0 warmup) 61.315 ms avg, 29.9 qps

# 30 qps
$ kubectl exec fortio -c fortio /usr/local/bin/fortio -- load -qps 30 -n 300 -loglevel Error -H "x-forwarded-for: 10.28.11.20" http://service-node/env
15:40:34 I logger.go:97> Log level is now 4 Error (was 2 Info)
Fortio 1.0.1 running at 30 queries per second, 2->2 procs, for 300 calls: http://service-node/env
Aggregated Sleep Time : count 296 avg -1.4901022 +/- 1.952 min -6.08576837 max 0.123485559 sum -441.070241
# range, mid point, percentile, count
>= -6.08577 <= -0.001 , -3.04338 , 69.59, 206
...
# target 50% -1.72254
WARNING 69.59% of sleep were falling behind
Aggregated Function Time : count 300 avg 0.1177745 +/- 0.4236 min 0.008494289 max 5.14910151 sum 35.332351
# target 50% 0.0346875
# target 75% 0.0985714
# target 90% 0.25
# target 99% 0.55
# target 99.9% 5.12674
Sockets used: 4 (for perfect keepalive, would be 4)
Code 200 : 300 (100.0 %)
All done 300 calls (plus 0 warmup) 117.775 ms avg, 24.7 qps

# 50 qps
$ kubectl exec fortio -c fortio /usr/local/bin/fortio -- load -qps 50 -n 500 -loglevel Error -H "x-forwarded-for: 10.28.11.20" http://service-node/env
15:45:31 I logger.go:97> Log level is now 4 Error (was 2 Info)
Fortio 1.0.1 running at 50 queries per second, 2->2 procs, for 500 calls: http://service-node/env
Aggregated Sleep Time : count 496 avg 0.0015264793 +/- 0.1077 min -0.382731569 max 0.078526418 sum 0.757133711
# range, mid point, percentile, count
>= -0.382732 <= -0.001 , -0.191866 , 25.40, 126
> -0.001 <= 0 , -0.0005 , 25.60, 1
...
> 0.069 <= 0.0785264 , 0.0737632 , 100.00, 34
# target 50% 0.0566056
WARNING 25.40% of sleep were falling behind
Aggregated Function Time : count 500 avg 0.039103632 +/- 0.05723 min 0.001972061 max 0.450959277 sum 19.5518159
# target 50% 0.0175385
# target 75% 0.0323529
# target 90% 0.0975
# target 99% 0.3
# target 99.9% 0.450719
Sockets used: 7 (for perfect keepalive, would be 4)
Code 200 : 497 (99.4 %)
Code 429 : 3 (0.6 %)
All done 500 calls (plus 0 warmup) 39.104 ms avg, 48.4 qps

# 60 qps
$ kubectl exec fortio -c fortio /usr/local/bin/fortio -- load -qps 60 -n 600 -loglevel Error -H "x-forwarded-for: 10.28.11.20" http://service-node/env
15:50:24 I logger.go:97> Log level is now 4 Error (was 2 Info)
Fortio 1.0.1 running at 60 queries per second, 2->2 procs, for 600 calls: http://service-node/env
Aggregated Sleep Time : count 596 avg -0.081667759 +/- 0.1592 min -0.626635518 max 0.064876123 sum -48.6739846
# range, mid point, percentile, count
>= -0.626636 <= -0.001 , -0.313818 , 51.01, 304
> 0 <= 0.001 , 0.0005 , 51.34, 2
...
> 0.059 <= 0.0648761 , 0.0619381 , 100.00, 14
# target 50% -0.0133888
WARNING 51.01% of sleep were falling behind
Aggregated Function Time : count 600 avg 0.04532505 +/- 0.04985 min 0.001904423 max 0.304644243 sum 27.1950299
# target 50% 0.0208163
# target 75% 0.07
# target 90% 0.1025
# target 99% 0.233333
# target 99.9% 0.303251
Sockets used: 19 (for perfect keepalive, would be 4)
Code 200 : 585 (97.5 %)
Code 429 : 15 (2.5 %)
All done 600 calls (plus 0 warmup) 45.325 ms avg, 59.9 qps
6) Call the service-python service to verify that rate limiting takes effect:
$ kubectl exec fortio -c fortio /usr/local/bin/fortio -- load -curl http://service-python/env
HTTP/1.1 200 OK
content-type: application/json
content-length: 178
server: envoy
date: Wed, 16 Jan 2019 15:47:30 GMT
x-envoy-upstream-service-time: 366

{"message":"python v2","upstream":[{"message":"lua v2","response_time":0.19},{"message":"node v2","response_time":0.18,"upstream":[{"message":"go v1","response_time":"0.02"}]}]}

$ kubectl exec fortio -c fortio /usr/local/bin/fortio -- load -qps 1 -n 10 -loglevel Error http://service-python/env
15:48:02 I logger.go:97> Log level is now 4 Error (was 2 Info)
Fortio 1.0.1 running at 1 queries per second, 2->2 procs, for 10 calls: http://service-python/env
Aggregated Function Time : count 10 avg 0.45553668 +/- 0.5547 min 0.003725253 max 1.4107851249999999 sum 4.55536678
# target 50% 0.18
# target 75% 1.06846
# target 90% 1.27386
# target 99% 1.39709
# target 99.9% 1.40942
Sockets used: 6 (for perfect keepalive, would be 4)
Code 200 : 5 (50.0 %)
Code 429 : 5 (50.0 %)
All done 10 calls (plus 0 warmup) 455.537 ms avg, 0.6 qps
The results above support the following conclusions:
·For service-go, at 50 qps or below virtually all requests succeed; above 50 qps some requests receive the 429 response code. The rate-limiting rule configured for service-go is therefore in effect.
·For service-node, plain calls start seeing 429 responses as soon as qps exceeds 20, but with the "x-forwarded-for: 10.28.11.20" header added, 429 responses only appear once qps reaches about 50. Both rate-limiting rules configured for service-node are therefore in effect.
·For service-python, which is limited to 2 requests per 5 seconds, driving it at 1 qps let only 5 of the 10 requests through; the rest received the 429 response code. The rate-limiting rule configured for service-python is therefore in effect as well.
Istio implements rate limiting through quotas, but enforcement is not perfectly precise and some deviation can occur, which is worth bearing in mind.
7) Clean up:
$ kubectl delete -f kubernetes/fortio.yaml
$ kubectl delete -f istio/resilience/quota-mem-ratelimit.yaml
$ kubectl delete -f service/node/service-node.yaml
$ kubectl delete -f service/lua/service-lua.yaml
$ kubectl delete -f service/python/service-python.yaml
[1] See the official documentation at https://istio.io/docs/reference/config/policy-and-telemetry/attribute-vocabulary/.