12.5 Troubleshooting

This section looks at how to investigate and resolve problems that arise in the mesh, focusing on troubleshooting techniques and where to start looking.

1. Routing rules not taking effect

If a routing rule does not produce the expected result after it is created, the following steps help locate why it is not taking effect and fix it.

1) Check that Pilot is running normally:


$ kubectl get pod -l app=pilot -n istio-system
NAME                           READY   STATUS    RESTARTS   AGE
istio-pilot-64958c46fc-cdk62   2/2     Running   0          26m

2) Check the overall state of the mesh:


$ istioctl proxy-status
PROXY                                                 CDS      LDS      EDS             RDS       PILOT                          VERSION
istio-egressgateway-7dc5cbbc56-rc5pl.istio-system     SYNCED   SYNCED   SYNCED (100%)   NOT SENT  istio-pilot-64958c46fc-cdk62   1.0.2
istio-ingressgateway-7958d776b5-qfpg7.istio-system    SYNCED   SYNCED   SYNCED (100%)   NOT SENT  istio-pilot-64958c46fc-cdk62   1.0.2
service-go-v1-7cc5c6f574-sj2mn.default                SYNCED   SYNCED   SYNCED (100%)   SYNCED    istio-pilot-64958c46fc-cdk62   1.0.2
service-go-v2-7656dcc478-f8zwc.default                SYNCED   SYNCED   SYNCED (100%)   SYNCED    istio-pilot-64958c46fc-cdk62   1.0.2

If a service instance does not appear in the output of this command, the instance has probably failed in some way and has not joined the mesh.
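
If an instance is missing, one quick check (a sketch; the Pod name is taken from the examples above) is to confirm that the sidecar was actually injected into the Pod:


# A sketch: list the containers in the Pod; istio-proxy should appear alongside the application container
$ kubectl get pod service-go-v1-7cc5c6f574-sj2mn -o jsonpath='{.spec.containers[*].name}'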

The fields in the output above are explained as follows:

·CDS: Cluster Discovery Service; roughly, the clusters of service instances.

·LDS: Listener Discovery Service; roughly, the ports a service instance listens on.

·EDS: Endpoint Discovery Service; roughly, the discovery of individual service instances.

·RDS: Route Discovery Service; routes requests to different clusters according to matching conditions.

The status values in the output above are explained as follows:

·SYNCED: the Envoy proxy has acknowledged the configuration most recently sent by Pilot.

·SYNCED (100%): Pilot has sent the information for all service instances in the cluster to the Envoy proxy.

·NOT SENT: Pilot has not sent any configuration to the Envoy proxy, usually because there is nothing to send.

·STALE: Pilot has sent an updated configuration to the Envoy proxy but has not yet received an acknowledgment, usually because the network connection between the Envoy proxy and Pilot has a problem or the machine is under load.

3) Check whether the Envoy proxy can connect to Pilot.

When mTLS is not enabled, use the following commands:


$ SERVICE_GO_POD_NAME=$(kubectl get pod -l app=service-go -o jsonpath={.items[0].metadata.name})
$ PILOT_POD_IP=$(kubectl get pod -l app=pilot -n istio-system -o jsonpath={.items[0].status.podIP})
$ kubectl exec $SERVICE_GO_POD_NAME -c istio-proxy -- curl -s http://$PILOT_POD_IP:15003/v1/registration

The commands produce output similar to the following:


[
  {
    "service-key": "grafana.istio-system.svc.cluster.local|http",
    "hosts": [
      {
        "ip_address": "10.244.1.7",
        "port": 3000
      }
    ]
  },
...
  {
    "service-key": "service-go.default.svc.cluster.local|http",
    "hosts": [
      {
        "ip_address": "10.244.1.15",
        "port": 80
      },
      {
        "ip_address": "10.244.2.10",
        "port": 80
      }
    ]
  },
...
  {
    "service-key": "zipkin.istio-system.svc.cluster.local|http",
    "hosts": [
      {
        "ip_address": "10.244.1.10",
        "port": 9411
      }
    ]
  }
]

4) Compare the configuration held by Pilot with the configuration held by Envoy:


$ istioctl proxy-status service-go-v1-7cc5c6f574-sj2mn.default
Clusters Match
Listeners Match
Routes Match

Under normal circumstances the output should look like the above; if any of the three report a mismatch, configuration synchronization from Pilot to Envoy has gone wrong.
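
When the configurations have drifted apart, one blunt but common remediation (a sketch, not the only option) is to delete the Pod so that the sidecar reconnects to Pilot and pulls a fresh configuration:


# A sketch: recreate the Pod to force a resync; its Deployment will start a replacement
$ kubectl delete pod service-go-v1-7cc5c6f574-sj2mn -n default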

5) Dig into the Envoy configuration.

View the cluster configuration of the service:


$ istioctl proxy-config clusters -n default service-go-v1-7cc5c6f574-sj2mn
SERVICE FQDN                                  PORT    SUBSET   DIRECTION   TYPE
BlackHoleCluster                              -       -        -           STATIC
...
jaeger-query.istio-system.svc.cluster.local   16686   -    outbound    EDS
kube-dns.kube-system.svc.cluster.local        53      -    outbound    EDS
kubernetes.default.svc.cluster.local          443     -    outbound    EDS
prometheus.istio-system.svc.cluster.local     9090    -    outbound    EDS
prometheus_stats                              -       -    -           STATIC
service-go.default.svc.cluster.local          80      -    inbound     STATIC
service-go.default.svc.cluster.local          80      -    outbound    EDS
...
$ istioctl proxy-config clusters service-go-v1-7cc5c6f574-sj2mn --fqdn service-go.default.svc.cluster.local -o json
[
    {
        "name": "inbound|80||service-go.default.svc.cluster.local",
        "connectTimeout": "1.000s",
        "hosts": [
            {
                "socketAddress": {
                    "address": "127.0.0.1",
                    "portValue": 80
                }
            }
        ],
        "circuitBreakers": {
            "thresholds": [
                {}
            ]
        }
    },
    {
        "name": "outbound|80||service-go.default.svc.cluster.local",
        "type": "EDS",
        "edsClusterConfig": {
            "edsConfig": {
                "ads": {}
            },
            "serviceName": "outbound|80||service-go.default.svc.cluster.local"
        },
        "connectTimeout": "1.000s",
        "circuitBreakers": {
            "thresholds": [
                {}
            ]
        }
    }
]

View the listener configuration of the service:


$ istioctl proxy-config listeners -n default service-go-v1-7cc5c6f574-sj2mn
ADDRESS                            PORT                      TYPE
...
10.108.252.151                     443                   TCP
10.108.252.151                     15011                 TCP
10.96.0.1                          443                   TCP
10.96.0.10                         53                    TCP
0.0.0.0                            9090                  HTTP
0.0.0.0                            80                    HTTP
...
$ istioctl proxy-config listeners -n default service-go-v1-7cc5c6f574-sj2mn --port 15001 -o json
[
    {
        "name": "virtual",
        "address": {
            "socketAddress": {
                "address": "0.0.0.0",
                "portValue": 15001
            }
        },
        "filterChains": [
            {
                "filters": [
                    {
                        "name": "envoy.tcp_proxy",
                        "config": {
                            "cluster": "BlackHoleCluster",
                            "stat_prefix": "BlackHoleCluster"
                        }
                    }
                ]
            }
        ],
        "useOriginalDst": true
    }
]

View the route configuration:


$ istioctl proxy-config routes -n default service-go-v1-7cc5c6f574-sj2mn -o json
...
    "routes": [
        {
            "match": {
                "prefix": "/"
            },
            "route": {
                "cluster":"outbound|80||service-go.default.svc.cluster.local",
                "timeout": "0.000s",
                "maxGrpcTimeout": "0.000s"
            },
...
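
If the full route dump is too noisy, it can usually be narrowed to a single route table with the --name flag (a sketch; this assumes the port-80 route shown above):


# A sketch: show only the route configuration named "80"
$ istioctl proxy-config routes -n default service-go-v1-7cc5c6f574-sj2mn --name 80 -o json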

View the bootstrap configuration:


$ istioctl proxy-config bootstrap -n default service-go-v1-7cc5c6f574-sj2mn -o json
{
    "bootstrap": {
        "node": {
            "id": "sidecar~10.244.2.10~service-go-v1-7cc5c6f574-sj2mn.default~default.
                 svc.cluster.local",
            "cluster": "service-go",
            "metadata": {
                "INTERCEPTION_MODE": "REDIRECT",
                "ISTIO_PROXY_SHA": "istio-proxy:6953ca783697da07ebe565322d12e9
                                   69280d8b03",
                "ISTIO_PROXY_VERSION": "1.0.2",
                "ISTIO_VERSION": "1.0.3",
                "POD_NAME": "service-go-v1-7cc5c6f574-sj2mn",
                "app": "service-go",
                "istio": "sidecar",
                "pod-template-hash": "7cc5c6f574",
                "version": "v1"
            },
        "buildVersion": "0/1.8.0-dev//RELEASE"
    },

View the logs of the service instance:


$ kubectl logs -f -n default service-go-v1-7cc5c6f574-sj2mn istio-proxy
...
[2019-01-19 04:48:58.770][11][info][main] external/envoy/source/server/server.cc:401] all clusters initialized. initializing init manager
[2019-01-19 04:48:58.935][11][info][upstream] external/envoy/source/server/lds_api.cc:80] lds: add/update listener '10.105.194.10_15011'
[2019-01-19 04:48:58.938][11][info][upstream] external/envoy/source/server/lds_api.cc:80] lds: add/update listener '10.105.194.10_853'
...
[2019-01-19 04:48:58.990][11][info][upstream] external/envoy/source/server/lds_api.cc:80] lds: add/update listener '0.0.0.0_15031'
[2019-01-19 04:48:59.003][11][info][upstream] external/envoy/source/server/lds_api.cc:80] lds: add/update listener '0.0.0.0_8060'
[2019-01-19 04:48:59.010][11][info][upstream] external/envoy/source/server/lds_api.cc:80] lds: add/update listener '0.0.0.0_8080'
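
If these logs are not detailed enough, the sidecar's log level can usually be raised through the Envoy admin interface on port 15000 (a sketch; remember to set it back to info afterwards):


# A sketch: raise the Envoy log level to debug, then restore it once done
$ kubectl exec service-go-v1-7cc5c6f574-sj2mn -c istio-proxy -- curl -s -X POST 127.0.0.1:15000/logging?level=debug
$ kubectl exec service-go-v1-7cc5c6f574-sj2mn -c istio-proxy -- curl -s -X POST 127.0.0.1:15000/logging?level=info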

2. mTLS issues

If a service becomes unreachable after mTLS is enabled and works again once mTLS is turned off, the following steps help find and fix the problem.

1) Check that Citadel is running normally:


$ kubectl get pod -l istio=citadel -n istio-system
NAME                               READY     STATUS      RESTARTS     AGE
istio-citadel-6955bc9cb7-dsl78     1/1       Running     0            36m

2) Check the certificates and keys:


$ kubectl exec $(kubectl get pod -l app=service-go -o jsonpath={.items[0].metadata.name}) -c istio-proxy -- ls /etc/certs
cert-chain.pem
key.pem
root-cert.pem
$ kubectl exec $(kubectl get pod -l app=service-go -o jsonpath={.items[0].metadata.name}) -c istio-proxy -- cat /etc/certs/cert-chain.pem | openssl x509 -text -noout  | grep Validity -A 2
    Validity
        Not Before: Nov 30 04:53:59 2018 GMT
        Not After : Feb 28 04:53:59 2019 GMT
$ kubectl exec $(kubectl get pod -l app=service-go -o jsonpath={.items[0].metadata.name}) -c istio-proxy -- cat /etc/certs/cert-chain.pem | openssl x509 -text -noout  | grep 'Subject Alternative Name' -A 1
            X509v3 Subject Alternative Name: 
                URI:spiffe://cluster.local/ns/default/sa/default
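
The SAN in the certificate encodes the service account of the workload. A quick consistency check (a sketch) is to compare it with the service account the Pod actually runs under:


# A sketch: the value printed here should match the "sa/..." part of the SPIFFE URI above
$ kubectl get pod $(kubectl get pod -l app=service-go -o jsonpath={.items[0].metadata.name}) -o jsonpath='{.spec.serviceAccountName}'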

3) Check the mTLS configuration.

View the default global mTLS policy:


$ kubectl get meshpolicy default -o yaml
apiVersion: authentication.istio.io/v1alpha1
kind: MeshPolicy
metadata:
...
  name: default
...
spec:
  peers:
  - mtls:
      mode: PERMISSIVE

Because Istio sets up a global mTLS policy by default at installation time, the status is shown as CONFLICT by default; but since the default policy mode is PERMISSIVE, the service can still be accessed normally.


$ istioctl authn tls-check service-go.default.svc.cluster.local
HOST:PORT  STATUS   SERVER   CLIENT  AUTHN POLICY  DESTINATION RULE
service-go.default.svc.cluster.local:80  CONFLICT  mTLS  HTTP  default/    -

The output of the istioctl authn tls-check command is interpreted as follows:

·HOST:PORT is the address and port of the service being checked.

·STATUS is the current mTLS status: OK means everything is normal, while CONFLICT means the server-side and client-side mTLS settings conflict and requests will fail.

·SERVER is the protocol used by the server side.

·CLIENT is the protocol used by the client side.

·AUTHN POLICY is the name of the authentication Policy in effect, in the form "policy name/namespace"; default/default means the default policy in the default namespace.

·DESTINATION RULE is the name of the DestinationRule in effect, in the form "rule name/namespace"; service-go/default means the DestinationRule named service-go in the default namespace (see the sketch after this list for how to pull these objects up).
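
When STATUS is CONFLICT, the objects named in the AUTHN POLICY and DESTINATION RULE columns are the place to look. A sketch of listing them for inspection; dump the specific objects named in those columns with kubectl get ... -o yaml:


# A sketch: list the authentication policies and DestinationRules that may apply to the service
$ kubectl get meshpolicies.authentication.istio.io
$ kubectl get policies.authentication.istio.io --all-namespaces
$ kubectl get destinationrules.networking.istio.io --all-namespaces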

[Experiment]

Simulate an mTLS fault and troubleshoot it.

1) Create a test Pod:


$ kubectl apply -f kubernetes/dns-test.yaml

2) Deliberately apply a bad mTLS policy:


$ kubectl apply -f istio/security/mtls-service-go-bad-rule.yaml

3) Check the authentication policy:


$ istioctl authn tls-check service-go.default.svc.cluster.local
HOST:PORT  STATUS  SERVER  CLIENT  AUTHN POLICY    DESTINATION RULE
service-go.default.svc.cluster.local:80  CONFLICT  HTTP   mTLS  service-go/default   service-go/default

The output above shows that the mTLS configuration of the service-go service is in conflict: the server side uses HTTP while the client side uses mTLS, so requests will fail.
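
For reference, a configuration of roughly this shape reproduces the conflict (a sketch only; not necessarily the exact content of istio/security/mtls-service-go-bad-rule.yaml):


$ cat <<EOF | kubectl apply -f -
apiVersion: authentication.istio.io/v1alpha1
kind: Policy
metadata:
  name: service-go
  namespace: default
spec:
  targets:
  - name: service-go
  # no peers/mtls section, so the server side keeps accepting plain HTTP
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: service-go
  namespace: default
spec:
  host: service-go.default.svc.cluster.local
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL   # clients are forced to mTLS -> conflicts with the plain-HTTP server
EOF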

4) Access the service:


$ kubectl exec dns-test -c dns-test -- curl -s http://service-go/env
upstream connect error or disconnect/reset before headers

5) Update the mTLS policy:


$ kubectl apply -f istio/security/mtls-service-go-on.yaml

6) Check the authentication status:


$ istioctl authn tls-check service-go.default.svc.cluster.local
HOST:PORT    STATUS    SERVER    CLIENT    AUTHN POLICY    DESTINATION RULE
service-go.default.svc.cluster.local:80    OK    mTLS    mTLS    service-go/default    service-go/default

7) Access the service:


$ kubectl exec dns-test -c dns-test -- curl -s http://service-go/env
{"message":"go v2"}

8) Clean up:


$ kubectl delete -f istio/security/mtls-service-go-on.yaml
$ kubectl delete -f kubernetes/dns-test.yaml

3. RBAC issues

If a service becomes unreachable after RBAC is enabled and works again once RBAC is turned off, the following steps help find and fix the problem.

1) Make sure RBAC is enabled correctly.

Enable RBAC:


$ kubectl apply -f istio/security/rbac-config-on.yaml
rbacconfig.rbac.istio.io/default created

View the RbacConfig:


$ kubectl get rbacconfigs.rbac.istio.io --all-namespaces
NAMESPACE       NAME          AGE
default         default       48s

Make sure there is exactly one RbacConfig instance and that it is named default; otherwise Istio disables RBAC and ignores all policies. If there are extra RbacConfig instances, delete all of them, as sketched below.
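
A sketch of removing a superfluous instance; the name and namespace are placeholders for whatever extra entries the listing above shows:


# A sketch: delete any RbacConfig other than the single instance named "default"
$ kubectl delete rbacconfigs.rbac.istio.io <extra-name> -n <namespace>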

2) Enable RBAC debug logging in Pilot.

Open another terminal and run the following command; it does not exit and keeps forwarding connections. Once you have finished setting the RBAC debug log level, press Ctrl+C to stop it:


$ kubectl port-forward $(kubectl -n istio-system get pods -l istio=pilot -o jsonpath='{.items[0].metadata.name}') -n istio-system 19876:9876

Use iptables to make the forwarded port reachable remotely:


$ sudo iptables -t nat -I PREROUTING -d 11.11.11.111 -p tcp --dport 9876 -j DNAT --to-destination 127.0.0.1:19876

Enable RBAC debug logging through the ControlZ interface.

Open http://11.11.11.111:9876/scopez/ and set the output level of the rbac scope to debug, as shown in Figure 12-6.

Figure 12-6 Setting the rbac output level to debug

Apply the RBAC policy for the service:


$ kubectl apply -f istio/security/rbac-service-go-user-agent-policy.yaml
servicerole.rbac.istio.io/service-viewer created
servicerolebinding.rbac.istio.io/bind-service-viewer created

Check Pilot's log output:


$ kubectl logs $(kubectl -n istio-system get pods -l istio=pilot -o jsonpath='{.items[0].metadata.name}') -c discovery -n istio-system | grep rbac
2019-01-19T03:07:07.413353Z  info  registering for apiVersion rbac.istio.io/v1alpha1
2019-01-19T05:00:56.365084Z  info rbac    no service role in namespace default
2019-01-19T05:00:56.365915Z  info    rbac  no service role binding in namespace default
2019-01-19T05:00:56.366458Z  info  rbac  built filter config for service-go.default.svc.cluster.local
2019-01-19T05:00:56.373868Z  info  rbac  no service role in namespace default
2019-01-19T05:00:56.373952Z  info  rbac  no service role binding in namespace default
2019-01-19T05:00:56.374160Z  info  rbac  built filter config for service-go.default.svc.cluster.local
2019-01-19T05:02:43.948543Z  debug  rbac  building filter config for {service-go.default.svc.cluster.local map[pod-template-hash:7656dcc478 version:v2 app:service-go] map[destination.name:service-go destination.namespace:default destination.user:default]}

3) Make sure Pilot has distributed the policy correctly.

Check the Envoy proxy's configuration:


$ kubectl exec  $(kubectl get pods -l app=service-go -o jsonpath='{.items[0].metadata.name}') -c istio-proxy -- curl localhost:15000/config_dump -s
...
{
  "name": "envoy.filters.http.rbac",
  "config": {
    "shadow_rules": {
      "policies": {}
    },
    "rules": {
      "policies": {
        "service-viewer": {
          "permissions": [
            {
              "and_rules": {
                "rules": [
                  {
                    "or_rules": {
                      "rules": [
                        {
                          "header": {
                            "exact_match": "GET",
                            "name": ":method"
                        }
...
        "principals": [
        {
          "and_ids": {
            "ids": [
              {
                "header": {
                  "name": "User-Agent",
                  "prefix_match": "RBAC-"
                }
              }
            ]
          }

In the "envoy.filters.http.rbac" section of the output above, check whether the Envoy proxy has received the RBAC policy and applied it to its configuration.
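
The full config_dump is large; a sketch for pulling out just the RBAC filter section (the -A value is arbitrary, adjust as needed):


# A sketch: show only the lines around the RBAC HTTP filter in the config dump
$ kubectl exec $(kubectl get pods -l app=service-go -o jsonpath='{.items[0].metadata.name}') -c istio-proxy -- curl -s localhost:15000/config_dump | grep -A 40 'envoy.filters.http.rbac'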

4) Make sure the Envoy proxy is enforcing the policy correctly.

Set the Envoy proxy's rbac log level to debug:


$ kubectl exec -ti $(kubectl get pod -l app=service-go -o jsonpath={.items[0].metadata.name}) -c istio-proxy -- curl -X POST 127.0.0.1:15000/logging?rbac=debug
active loggers:
...
    http: info
    http2: info
...
    rbac: debug
    redis: info

5) Create a test Pod:


$ kubectl apply -f kubernetes/dns-test.yaml

6) Access the service-go service:


$ kubectl exec dns-test -c dns-test -- curl -s http://service-go/env
RBAC: access denied
$ kubectl exec dns-test -c dns-test -- curl -s -H "User-Agent: RBAC-TEST" http://service-go/env
{"message":"go v2"}
$ kubectl exec dns-test -c dns-test -- curl -s -H "User-Agent: RBAC-TEST" http://service-go/env
{"message":"go v1"}

7) Check the Envoy proxy logs:


$ kubectl logs $(kubectl get pods -l app=service-go -o jsonpath='{.items[0].metadata.name}') -c istio-proxy
[2019-01-19 04:57:30.512][19][debug][rbac] external/envoy/source/extensions/filters/http/rbac/rbac_filter.cc:65] checking request: remoteAddress: 10.244.1.16:46074, localAddress: 10.244.1.15:80, ssl: none, headers: ':authority', 'service-go'
':path', '/env'
':method', 'GET'
'user-agent', 'curl/7.35.0'
...
}
[2019-01-19 04:58:18.446][19][debug][rbac] external/envoy/source/extensions/filters/http/rbac/rbac_filter.cc:78] shadow denied
[2019-01-19 04:58:18.456][19][debug][rbac] external/envoy/source/extensions/filters/http/rbac/rbac_filter.cc:112] enforced denied
[2019-01-19 05:00:12.409][20][debug][rbac] external/envoy/source/extensions/filters/http/rbac/rbac_filter.cc:65] checking request: remoteAddress: 10.244.1.16:46594, localAddress: 10.244.1.15:80, ssl: none, headers: ':authority', 'service-go'
':path', '/env'
':method', 'GET'
'accept', '*/*'
'user-agent', 'RBAC-TEST'
...
}
[2019-01-19 05:05:25.669][20][debug][rbac] external/envoy/source/extensions/filters/http/rbac/rbac_filter.cc:78] shadow denied
[2019-01-19 05:05:25.669][20][debug][rbac] external/envoy/source/extensions/filters/http/rbac/rbac_filter.cc:108] enforced allowed
[2019-01-19T05:05:25.669Z] "GET /env HTTP/1.1" 200 - 0 19 6 1 "-" "RBAC-TEST" "9f594a94-6be0-924d-83a0-39448f020701" "service-go" "127.0.0.1:80" inbound|80||service-go.default.svc.cluster.local - 10.244.1.15:80 10.244.1.16:46074

The logs show that the request with 'user-agent'='RBAC-TEST' was allowed (enforced allowed), while the request with 'user-agent'='curl/7.35.0' was denied (enforced denied).

8) Clean up:


# Restore the rbac log level to info via the ControlZ interface
# Delete the iptables rule
$ sudo iptables -t nat -D PREROUTING -d 11.11.11.111 -p tcp --dport 9876 -j DNAT --to-destination 127.0.0.1:19876
# Reset the Envoy proxy's rbac log level to info
$ kubectl exec -ti $(kubectl get pod -l app=service-go -o jsonpath={.items[0].metadata.name}) -c istio-proxy -- curl -X POST 127.0.0.1:15000/logging?rbac=info
$ kubectl delete -f istio/security/rbac-config-on.yaml
$ kubectl delete -f istio/security/rbac-service-go-user-agent-policy.yaml
$ kubectl delete -f kubernetes/dns-test.yaml

4. Metric or log collection issues

Only the case of failing metric collection is demonstrated here; troubleshooting log collection follows the same approach and is not shown separately.

1) Prepare the environment.

Create a test Pod:


$ kubectl apply -f kubernetes/fortio.yaml

Access the service:


$ kubectl exec fortio -c fortio /usr/local/bin/fortio -- load -qps 10 -n 100 -loglevel Error http://service-go/env
05:18:48 I logger.go:97> Log level is now 4 Error (was 2 Info)
Fortio 1.0.1 running at 10 queries per second, 2->2 procs, for 100 calls: http://service-go/env
Aggregated Function Time : count 100 avg 0.0067368204 +/- 0.003404 min 0.002795267 max 0.025809888 sum 0.673682036
# target 50%       0.00593333
# target 75%       0.00833333
# target 90%       0.0105
# target 99%       0.018
# target 99.9%     0.0257289
Sockets used: 4 (for perfect keepalive, would be 4)
Code 200 : 100 (100.0 %)
All done 100 calls (plus 0 warmup) 6.737 ms avg, 10.0 qps

2) Check whether Mixer is receiving report requests from the Envoy proxies.

Use the following commands to see how many report requests Mixer has received from the Envoy proxies:


$ TELEMETRY_IP=$(kubectl get svc istio-telemetry -n istio-system -o jsonpath='{.spec.clusterIP}')
$ curl -s $TELEMETRY_IP:9093/metrics | grep grpc_server_handled_total
# HELP grpc_server_handled_total Total number of RPCs completed on the server, regardless of success or failure.
# TYPE grpc_server_handled_total counter
grpc_server_handled_total{grpc_code="OK",grpc_method="Report",grpc_service="istio.mixer.v1.Mixer",grpc_type="unary"} 52

3) Check that the Mixer rules exist.

Use the following command to list the rules that have been created:


$ kubectl get rules --all-namespaces
NAMESPACE                  NAME                         AGE
istio-system               kubeattrgenrulerule          28d
istio-system               promhttp                     28d
istio-system               promtcp                      28d
istio-system               stdio                        28d
istio-system               stdiotcp                     28d
istio-system               tcpkubeattrgenrulerule       28d

4) Check that the Prometheus handler exists.

Use the following command to list the Prometheus handlers that have been created:


$ kubectl get prometheuses.config.istio.io --all-namespaces
NAMESPACE                  NAME                          AGE
istio-system               handler                       28d

5) Check that the Mixer metric instances exist.

Use the following command to list the metric instances that have been created:


$ kubectl get metrics.config.istio.io --all-namespaces
NAMESPACE                  NAME                         AGE
istio-system               requestcount                 28d
istio-system               requestduration              28d
istio-system               requestsize                  28d
istio-system               responsesize                 28d
istio-system               tcpbytereceived              28d
istio-system               tcpbytesent                  28d

6) Check for configuration errors.

Use the following commands to check whether any configuration errors exist; an entry with a non-zero count indicates a configuration error:


$ TELEMETRY_IP=$(kubectl get svc istio-telemetry -n istio-system -o jsonpath='{.spec.clusterIP}')
$ curl -s $TELEMETRY_IP:9093/metrics | grep config_error_count
# HELP mixer_config_adapter_info_config_error_count The number of errors encountered during processing of the adapter info configuration.
# TYPE mixer_config_adapter_info_config_error_count counter
...
mixer_config_instance_config_error_count{configID="17"} 1
mixer_config_instance_config_error_count{configID="2"} 0
...
mixer_config_rule_config_error_count{configID="15"} 0
mixer_config_rule_config_error_count{configID="16"} 0
mixer_config_rule_config_error_count{configID="17"} 1
...

7) Check the Mixer logs.

Use the following command to check the Mixer logs for errors or warnings. If necessary, Mixer's debug-level logging can also be enabled in the same way the Pilot RBAC log level was raised earlier (a sketch follows the log excerpt below):


$ kubectl logs -f -n istio-system $(kubectl get pod -l app=telemetry -n istio-system -o jsonpath='{.items[0].metadata.name}') mixer | egrep 'error|warn'
...
2018-12-28T07:16:30.155044Z    error    failed to evaluate expression for field 'Dimensions[source_service]'; unknown attribute source.service.name
2018-12-28T07:16:30.155882Z    error    Instance not found: instance='mytcpsentbytes.metric'
2018-12-28T07:16:30.156061Z    warn Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
2018-12-28T07:16:30.265742Z    error    adapters    adapter did not close all the scheduled daemons    {"adapter": "handler.kubernetesenv.istio-system"}
2018-12-28T07:16:37.171114Z    warn input set condition evaluation error: id='9', error='lookup failed: 'destination.service''
2018-12-28T07:16:47.173559Z    warn input set condition evaluation error: id='9', error='lookup failed: 'destination.service''
...
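
A sketch of exposing Mixer's ControlZ in the same way Pilot's was exposed earlier (the local port is arbitrary); the log levels can then be adjusted on the /scopez/ page:


# A sketch: forward Mixer's ControlZ port, then open http://127.0.0.1:19876/scopez/
$ kubectl port-forward $(kubectl -n istio-system get pods -l app=telemetry -o jsonpath='{.items[0].metadata.name}') -n istio-system 19876:9876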

8) Check whether Mixer dispatched metrics to the Prometheus handler.

Use the following commands to see how many times Mixer has dispatched metrics to the Prometheus handler:


$ TELEMETRY_IP=$(kubectl get svc istio-telemetry -n istio-system -o jsonpath='{.spec.clusterIP}')
$ curl -s $TELEMETRY_IP:9093/metrics | grep mixer_runtime_dispatch_count
# HELP mixer_runtime_dispatch_count Total number of adapter dispatches handled by Mixer.
# TYPE mixer_runtime_dispatch_count counter
mixer_runtime_dispatch_count{adapter="kubernetesenv",error="false",handler="handler.kubernetesenv.istio-system",meshFunction="kubernetes"} 884
mixer_runtime_dispatch_count{adapter="kubernetesenv",error="true",handler="handler.kubernetesenv.istio-system",meshFunction="kubernetes"} 0
mixer_runtime_dispatch_count{adapter="prometheus",error="false",handler="handler.prometheus.istio-system",meshFunction="metric"} 213
mixer_runtime_dispatch_count{adapter="prometheus",error="false",handler="tcphandler.prometheus.default",meshFunction="metric"} 83
mixer_runtime_dispatch_count{adapter="prometheus",error="true",handler="handler.prometheus.istio-system",meshFunction="metric"} 0
mixer_runtime_dispatch_count{adapter="prometheus",error="true",handler="tcphandler.prometheus.default",meshFunction="metric"} 0
mixer_runtime_dispatch_count{adapter="stdio",error="false",handler="handler.stdio.istio-system",meshFunction="logentry"} 213
mixer_runtime_dispatch_count{adapter="stdio",error="true",handler="handler.stdio.istio-system",meshFunction="logentry"} 0

9) Check the Prometheus configuration.

Use the following commands to inspect the Prometheus configuration and confirm that the Istio-related scrape jobs are present:


$ PROMETHEUS_IP=$(kubectl get svc prometheus -n istio-system -o jsonpath='{.spec.clusterIP}')
$ curl -s $PROMETHEUS_IP:9090/config | grep -B15 'istio-telemetry;prometheus'
scrape_configs:
- job_name: istio-mesh
    scrape_interval: 5s
    scrape_timeout: 5s
    metrics_path: /metrics
    scheme: http
    kubernetes_sd_configs:
    - api_server: null
        role: endpoints
        namespaces:
            names:
            - istio-system
    relabel_configs:
    - source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        separator: ;
        regex: istio-telemetry;prometheus
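
Finally, if everything above looks healthy, querying Prometheus directly for an Istio metric confirms end-to-end collection (a sketch; istio_requests_total is the standard request counter in Istio 1.0):


# A sketch: a non-empty result means metrics are flowing from Envoy through Mixer into Prometheus
$ PROMETHEUS_IP=$(kubectl get svc prometheus -n istio-system -o jsonpath='{.spec.clusterIP}')
$ curl -s "$PROMETHEUS_IP:9090/api/v1/query?query=istio_requests_total"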