Events: the real-time log of k8s

2019.04.16

Event

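What are k8s Events? Suppose you create a Deployment: the k8s components go to work one after another until the Pod has been created. From the moment the Deployment is declared to the moment the Pod is running, a series of Events is generated to tell the user what the current state is. Events exist to answer questions such as why a pod did not start (for example, no credentials were configured for the private registry) or why a pod was killed (for example, it exceeded its limit). An example: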

kubectl get ev
LAST SEEN   TYPE     REASON              OBJECT                        MESSAGE
3s          Normal   Scheduled           pod/nginx-698898f666-smg7t    Successfully assigned default/nginx-698898f666-smg7t to east1-monitor1
3s          Normal   Pulled              pod/nginx-698898f666-smg7t    Container image "nginx:alpine" already present on machine
3s          Normal   Created             pod/nginx-698898f666-smg7t    Created container nginx
3s          Normal   Started             pod/nginx-698898f666-smg7t    Started container nginx
3s          Normal   SuccessfulCreate    replicaset/nginx-698898f666   Created pod: nginx-698898f666-smg7t
3s          Normal   ScalingReplicaSet   deployment/nginx              Scaled up replica set nginx-698898f666 to 1
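
A listing like this can also be consumed programmatically. The following is a minimal sketch, assuming the events were exported with `kubectl get events -o json`; the embedded sample is hypothetical and trimmed to the fields used, and the third item (a Warning on `private-app-0`) is invented purely for illustration.

```python
import json
from collections import defaultdict

# Hypothetical sample in the shape of `kubectl get events -o json`.
# Only the fields read below are included; the Warning item is made up.
SAMPLE = """
{"items": [
  {"type": "Normal", "reason": "Scheduled", "count": 1,
   "involvedObject": {"kind": "Pod", "name": "nginx-698898f666-smg7t"},
   "message": "Successfully assigned default/nginx-698898f666-smg7t to east1-monitor1"},
  {"type": "Normal", "reason": "Started", "count": 1,
   "involvedObject": {"kind": "Pod", "name": "nginx-698898f666-smg7t"},
   "message": "Started container nginx"},
  {"type": "Warning", "reason": "Failed", "count": 3,
   "involvedObject": {"kind": "Pod", "name": "private-app-0"},
   "message": "hypothetical image pull failure"}
]}
"""


def group_by_object(event_list):
    """Map "kind/name" to the list of (type, reason, message) tuples."""
    grouped = defaultdict(list)
    for ev in event_list["items"]:
        obj = ev["involvedObject"]
        key = f'{obj["kind"].lower()}/{obj["name"]}'
        grouped[key].append((ev["type"], ev["reason"], ev["message"]))
    return dict(grouped)


def warnings_only(event_list):
    """Return only the events whose type is not Normal."""
    return [ev for ev in event_list["items"] if ev["type"] != "Normal"]


events = json.loads(SAMPLE)
for key, rows in group_by_object(events).items():
    print(key, [reason for _, reason, _ in rows])
```

Grouping by `involvedObject` mirrors the OBJECT column of the listing, and filtering on `type` is usually the quickest way to find the event that explains a failure.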

These records are stored in etcd and are kept for 1 hour by default.

/registry/events/default/nginx-698898f666-smg7t.1593fd25ca0c7e00
/registry/events/default/nginx-698898f666-smg7t.1593fd25f1ff3cca
/registry/events/default/nginx-698898f666-smg7t.1593fd25f4066f9f
/registry/events/default/nginx-698898f666-smg7t.1593fd25fd741cf6
/registry/events/default/nginx-698898f666.1593fd25c9abf7dc
/registry/events/default/nginx.1593fd25c9217fce

Picking one Event shows the complete record below. The fields worth noting include the name under involvedObject and the host under source.

apiVersion: v1
count: 1
eventTime: null
firstTimestamp: "2019-04-10T03:01:30Z"
involvedObject:
  apiVersion: v1
  fieldPath: spec.containers{nginx}
  kind: Pod
  name: nginx-698898f666-ldwql
  namespace: default
  resourceVersion: "80088"
  uid: f04efe9e-5b3c-11e9-aa09-00163e132347
kind: Event
lastTimestamp: "2019-04-10T03:01:30Z"
message: Container image "nginx:alpine" already present on machine
metadata:
  creationTimestamp: "2019-04-10T03:01:30Z"
  name: nginx-698898f666-ldwql.1593fdbe68a2e26d
  namespace: default
  resourceVersion: "80095"
  selfLink: /api/v1/namespaces/default/events/nginx-698898f666-ldwql.1593fdbe68a2e26d
  uid: f0b646be-5b3c-11e9-aa09-00163e132347
reason: Pulled
reportingComponent: ""
reportingInstance: ""
source:
  component: kubelet
  host: k8s
type: Normal
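
The hex suffix in `metadata.name` is not random. As far as I can tell, client-go builds the event name as the involved object's name plus the creation time in Unix nanoseconds, written in hex. The sketch below decodes it under that assumption; `split_event_name` is a hypothetical helper, not part of any Kubernetes library.

```python
from datetime import datetime, timezone


def split_event_name(event_name):
    """Split "<object name>.<hex suffix>" into (object_name, created_utc).

    Assumes the suffix is a Unix timestamp in nanoseconds, hex encoded.
    Object names may themselves contain dots, so split on the last one.
    """
    object_name, _, suffix = event_name.rpartition(".")
    nanos = int(suffix, 16)
    seconds, remainder = divmod(nanos, 1_000_000_000)
    created = datetime.fromtimestamp(seconds, tz=timezone.utc)
    # datetime only carries microseconds, so the last three digits are dropped.
    return object_name, created.replace(microsecond=remainder // 1000)


name, created = split_event_name("nginx-698898f666-ldwql.1593fdbe68a2e26d")
print(name, created.isoformat())
```

For the event shown above, the decoded time falls in the same second as its `firstTimestamp`, which is consistent with the assumption.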

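The flags related to Events are mainly set on the API Server, as shown below. The output also lists the audit flags. Audit records are events too, but they belong to a different apiVersion (audit.k8s.io) and are covered later.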

docker run -it --rm k8s.gcr.io/kube-apiserver:v1.14.0  kube-apiserver -h |grep event
      --audit-log-batch-buffer-size int             The size of the buffer to store events before batching and writing. Only used in batch mode. (default 10000)
      --audit-log-format string                     Format of saved audits. "legacy" indicates 1-line text format for each event. "json" indicates structured json format. Known formats are legacy,json. (default "json")
      --audit-log-mode string                       Strategy for sending audit events. Blocking indicates sending events should block server responses. Batch causes the backend to buffer and write events asynchronously. Known modes are batch,blocking,blocking-strict. (default "blocking")
      --audit-log-truncate-enabled                  Whether event and batch truncating is enabled.
      --audit-log-truncate-max-event-size int       Maximum size of the audit event sent to the underlying backend. If the size of an event is greater than this number, first request and response are removed, and if this doesn't reduce the size enough, event is discarded. (default 102400)
      --audit-log-version string                    API group and version used for serializing audit events written to log. (default "audit.k8s.io/v1")
      --audit-webhook-batch-buffer-size int         The size of the buffer to store events before batching and writing. Only used in batch mode. (default 10000)
      --audit-webhook-mode string                   Strategy for sending audit events. Blocking indicates sending events should block server responses. Batch causes the backend to buffer and write events asynchronously. Known modes are batch,blocking,blocking-strict. (default "batch")
      --audit-webhook-truncate-enabled              Whether event and batch truncating is enabled.
      --audit-webhook-truncate-max-event-size int   Maximum size of the audit event sent to the underlying backend. If the size of an event is greater than this number, first request and response are removed, and if this doesn't reduce the size enough, event is discarded. (default 102400)
      --audit-webhook-version string                API group and version used for serializing audit events written to webhook. (default "audit.k8s.io/v1")
      --oidc-groups-prefix string                         If provided, all groups will be prefixed with this value to prevent conflicts with other authentication strategies.
      --event-ttl duration                        Amount of time to retain events. (default 1h0m0s)
      
# Recommended for large clusters: sending events to a separate etcd reduces the load on the main etcd
      --etcd-servers-overrides strings           Per-resource etcd servers overrides, comma separated. The individual override format: group/resource#servers, where servers are URLs, semicolon separated.
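
The override format quoted in the help text (`group/resource#servers`, servers separated by semicolons, overrides separated by commas) is easy to get wrong by hand. Here is a minimal sketch that assembles the flag value; the etcd URLs are hypothetical, and the empty group name stands for the core API group that events belong to.

```python
def build_override(group, resource, servers):
    """Format one override as group/resource#server1;server2."""
    if not servers:
        raise ValueError("at least one etcd server URL is required")
    return f"{group}/{resource}#{';'.join(servers)}"


def build_flag(overrides):
    """Join several overrides into a single command-line flag."""
    return "--etcd-servers-overrides=" + ",".join(overrides)


# Hypothetical dedicated etcd cluster for events (core group, hence "").
events_override = build_override(
    "", "events",
    ["https://etcd-events-0:2379", "https://etcd-events-1:2379"],
)
print(build_flag([events_override]))
```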

Looking at the source, the event-related code (at least the reason strings used by the kubelet) seems to live in https://github.com/kubernetes/kubernetes/blob/release-1.14/pkg/kubelet/events/event.go.

Collection

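If these records of cluster changes are not collected, later debugging and tracing of problems becomes much harder. The open-source collection options currently available are: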

https://github.com/heptiolabs/eventrouter supports Kafka, S3 and other sinks
https://www.elastic.co/products/beats/metricbeat supports ES
https://github.com/kubernetes-retired/heapster supports Kafka, ES and InfluxDB (retired)
Kube-watch, which sends events to Slack and similar channels.

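The following uses metricbeat as the example, with output to ES. It is based on an official configuration; the main change is a custom index name, which requires an ES version newer than 6.4.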

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: metricbeat-deployment-config
  namespace: kube-system
  labels:
    k8s-app: metricbeat
data:
  metricbeat.yml: |-
    metricbeat.config.modules:
      # Mounted `metricbeat-daemonset-modules` configmap:
      path: ${path.config}/modules.d/*.yml
      # Reload module configs as they change:
      reload.enabled: false

    # processors:
    #   - add_cloud_metadata:

    # cloud.id: ${ELASTIC_CLOUD_ID}
    # cloud.auth: ${ELASTIC_CLOUD_AUTH}

    output.elasticsearch:
      hosts: ['${ELASTICSEARCH_HOST:elasticsearch}:${ELASTICSEARCH_PORT:9200}']
      index: "k8s-prod-%{[beat.version]}-%{+yyyy.MM.dd}"
      # username: ${ELASTICSEARCH_USERNAME}
      # password: ${ELASTICSEARCH_PASSWORD}
    setup.template:
      name: 'k8s-prod'
      pattern: 'k8s-prod-*'
      enabled: false
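
To make the `index` setting concrete, the sketch below imitates how the format string `k8s-prod-%{[beat.version]}-%{+yyyy.MM.dd}` expands for one document, and checks the result against the `setup.template` pattern. It is only an approximation in Python of what the Beat does internally; `render_index` and `matches_pattern` are hypothetical helpers, and the date is assumed to come from the document's `@timestamp` in UTC.

```python
import fnmatch
from datetime import datetime, timezone


def render_index(beat_version, when, prefix="k8s-prod"):
    """Expand prefix-%{[beat.version]}-%{+yyyy.MM.dd} for one timestamp."""
    return f"{prefix}-{beat_version}-{when.strftime('%Y.%m.%d')}"


def matches_pattern(index, pattern="k8s-prod-*"):
    """Check an index name against the setup.template pattern."""
    return fnmatch.fnmatchcase(index, pattern)


index = render_index("6.7.0", datetime(2019, 4, 9, 7, 32, 28, tzinfo=timezone.utc))
print(index, matches_pattern(index))
```

Because the date is part of the name, one index is created per day, which makes it simple to expire old events by deleting whole indices.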

A single event is recorded in ES as follows:

{
  "_index": "k8s-prod-6.7.0-2019.04.09",
  "_type": "doc",
  "_id": "9NUDAWoBg63YcA8TN-td",
  "_version": 1,
  "_score": 1,
  "_source": {
    "@timestamp": "2019-04-09T07:32:28.142Z",
    "event": {
      "dataset": "kubernetes.event"
    },
    "host": {
      "name": "hangzhou-k8s-prod-04"
    },
    "beat": {
      "name": "hangzhou-k8s-prod-04",
      "hostname": "hangzhou-k8s-prod-04",
      "version": "6.7.0"
    },
    "kubernetes": {
      "event": {
        "message": "Killing container with id docker://demo:Need to kill Pod",
        "reason": "Killing",
        "type": "Normal",
        "count": 10616,
        "involved_object": {
          "resource_version": "10016180",
          "name": "demo-8fd5f479-gmb4d",
          "kind": "Pod",
          "uid": "cba04cbc-4d0c-11e9-b6f6-00163e0f7ccb",
          "api_version": "v1"
        },
        "metadata": {
          "self_link": "/api/v1/namespaces/infra/events/demo-8fd5f479-gmb4d.158f635cd8e1c868",
          "generate_name": "",
          "uid": "48dfd324-4f74-11e9-b6f6-00163e0f7ccb",
          "resource_version": "12097622",
          "timestamp": {
            "created": "2019-03-26T03:07:26.000Z"
          },
          "name": "demo-8fd5f479-gmb4d.158f635cd8e1c868",
          "namespace": "infra"
        },
        "timestamp": {
          "first_occurrence": "2019-03-26T03:07:26.000Z",
          "last_occurrence": "2019-04-09T07:32:28.000Z"
        }
      }
    },
    "metricset": {
      "name": "event",
      "module": "kubernetes"
    }
  },
  "fields": {
    "kubernetes.event.timestamp.first_occurrence": [
      "2019-03-26T03:07:26.000Z"
    ],
    "kubernetes.event.timestamp.last_occurrence": [
      "2019-04-09T07:32:28.000Z"
    ],
    "kubernetes.event.metadata.timestamp.created": [
      "2019-03-26T03:07:26.000Z"
    ],
    "@timestamp": [
      "2019-04-09T07:32:28.142Z"
    ]
  }
}
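
The `count` field together with the two occurrence timestamps tells you how persistent a repeated event is. A minimal sketch, assuming the timestamps always carry a fractional part and a trailing `Z` as in the record above; `repeat_stats` is a hypothetical helper that takes the `kubernetes.event` object.

```python
from datetime import datetime

TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def repeat_stats(event):
    """Return (span_seconds, average_interval_seconds) for a repeated event.

    The average interval is None when the event was seen only once.
    """
    ts = event["timestamp"]
    first = datetime.strptime(ts["first_occurrence"], TS_FORMAT)
    last = datetime.strptime(ts["last_occurrence"], TS_FORMAT)
    span = (last - first).total_seconds()
    count = event["count"]
    average = span / (count - 1) if count > 1 else None
    return span, average


killing = {
    "count": 10616,
    "timestamp": {
        "first_occurrence": "2019-03-26T03:07:26.000Z",
        "last_occurrence": "2019-04-09T07:32:28.000Z",
    },
}
span, average = repeat_stats(killing)
print(f"seen over {span / 86400:.1f} days, about every {average:.0f} s")
```

For the `Killing` event above this gives a span of roughly two weeks with a repeat about every two minutes, a sign that the pod was stuck rather than being terminated once.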

Auditing

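Auditing is configured in two parts: a policy file, which decides what gets recorded, and a backend, either a log file or a webhook address, which decides where the records go. The following is a policy file: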

apiVersion: audit.k8s.io/v1beta1 # This is required.
kind: Policy
# Don't generate audit events for all requests in RequestReceived stage.
omitStages:
  - "RequestReceived"
rules:
  # The following requests were manually identified as high-volume and low-risk,
  # so drop them.
  - level: None
    users: ["system:kube-proxy"]
    verbs: ["watch"]
    resources:
      - group: "" # core
        resources: ["endpoints", "services"]
  - level: None
    users: ["system:unsecured"]
    namespaces: ["kube-system"]
    verbs: ["get"]
    resources:
      - group: "" # core
        resources: ["configmaps"]
  - level: None
    users: ["kubelet"] # legacy kubelet identity
    verbs: ["get"]
    resources:
      - group: "" # core
        resources: ["nodes"]
  - level: None
    userGroups: ["system:nodes"]
    verbs: ["get"]
    resources:
      - group: "" # core
        resources: ["nodes"]
  - level: None
    users:
      - system:kube-controller-manager
      - system:kube-scheduler
      - system:serviceaccount:kube-system:endpoint-controller
    verbs: ["get", "update"]
    namespaces: ["kube-system"]
    resources:
      - group: "" # core
        resources: ["endpoints"]
  - level: None
    users: ["system:apiserver"]
    verbs: ["get"]
    resources:
      - group: "" # core
        resources: ["namespaces"]
  # Don't log these read-only URLs.
  - level: None
    nonResourceURLs:
      - /healthz*
      - /version
      - /swagger*
  # Don't log events requests.
  - level: None
    resources:
      - group: "" # core
        resources: ["events"]
  # Secrets, ConfigMaps, and TokenReviews can contain sensitive & binary data,
  # so only log at the Metadata level.
  - level: Metadata
    resources:
      - group: "" # core
        resources: ["secrets", "configmaps"]
      - group: authentication.k8s.io
        resources: ["tokenreviews"]
  # Get responses can be large; skip them.
  - level: Request
    verbs: ["get", "list", "watch"]
    resources:
      - group: "" # core
      - group: "admissionregistration.k8s.io"
      - group: "apps"
      - group: "authentication.k8s.io"
      - group: "authorization.k8s.io"
      - group: "autoscaling"
      - group: "batch"
      - group: "certificates.k8s.io"
      - group: "extensions"
      - group: "networking.k8s.io"
      - group: "policy"
      - group: "rbac.authorization.k8s.io"
      - group: "settings.k8s.io"
      - group: "storage.k8s.io"
  # Default level for known APIs
  - level: RequestResponse
    resources:
      - group: "" # core
      - group: "admissionregistration.k8s.io"
      - group: "apps"
      - group: "authentication.k8s.io"
      - group: "authorization.k8s.io"
      - group: "autoscaling"
      - group: "batch"
      - group: "certificates.k8s.io"
      - group: "extensions"
      - group: "networking.k8s.io"
      - group: "policy"
      - group: "rbac.authorization.k8s.io"
      - group: "settings.k8s.io"
      - group: "storage.k8s.io"
  # Default level for all other requests.
  - level: Metadata

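Nothing is logged at the moment a request is received (the RequestReceived stage is omitted); recording starts only once the response headers have been sent.
The high-volume, redundant watch requests from kube-proxy, the get requests on nodes from kubelet and system:nodes, the operations on endpoints in kube-system by the kube components, and the get requests on namespaces from the apiserver are not audited.
Read-only URLs such as /healthz*, /version and /swagger* are not audited.
For secrets, configmaps and tokenreviews, which may contain sensitive or binary data, the level is set to Metadata. This level records only the user, timestamp, resource and verb of the request, without the request or response body.
For sensitive APIs such as authentication, rbac, certificates, autoscaling and storage, what is recorded depends on whether the request reads or writes: reads (get, list, watch) are logged at Request level, which includes the request body only, and everything else at RequestResponse level, which includes both the request and the response body.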
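
The behaviour described above follows from one property of the policy: rules are evaluated in order and the first match wins. The sketch below models that with a reduced copy of the rules. It is a simplification written for illustration, not the API server's implementation: it ignores `userGroups`, `nonResourceURLs` and `omitStages`, and the request dictionaries are a hypothetical shape.

```python
# Reduced copy of the policy above; order matters.
RULES = [
    {"level": "None", "users": ["system:kube-proxy"], "verbs": ["watch"],
     "resources": [{"group": "", "resources": ["endpoints", "services"]}]},
    {"level": "None",
     "resources": [{"group": "", "resources": ["events"]}]},
    {"level": "Metadata",
     "resources": [{"group": "", "resources": ["secrets", "configmaps"]},
                   {"group": "authentication.k8s.io",
                    "resources": ["tokenreviews"]}]},
    {"level": "Request", "verbs": ["get", "list", "watch"],
     "resources": [{"group": ""}, {"group": "apps"}]},
    {"level": "RequestResponse",
     "resources": [{"group": ""}, {"group": "apps"}]},
    {"level": "Metadata"},  # default for all other requests
]


def _resource_matches(rule_resources, group, resource):
    """An entry without a resources list matches the whole API group."""
    for entry in rule_resources:
        if entry["group"] != group:
            continue
        names = entry.get("resources")
        if not names or resource in names:
            return True
    return False


def audit_level(request, rules=RULES):
    """Return the level of the first rule whose every field matches."""
    for rule in rules:
        if "users" in rule and request["user"] not in rule["users"]:
            continue
        if "verbs" in rule and request["verb"] not in rule["verbs"]:
            continue
        if "resources" in rule and not _resource_matches(
                rule["resources"], request["group"], request["resource"]):
            continue
        return rule["level"]
    return "None"  # assumed fallback when no rule matches


print(audit_level({"user": "alice", "verb": "get",
                   "group": "", "resource": "secrets"}))
```

This is why the Metadata rule for secrets has to come before the broader Request and RequestResponse rules: placed after them, it would never be reached.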

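The following is a sample audit record. It shows the user system:apiserver issuing a get on the discovery endpoint of the scheduling.k8s.io/v1beta1 API group (see requestURI) and receiving a 200.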

{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "Metadata",
  "auditID": "774882bc-46e4-4ff5-9517-eb025a9bec0d",
  "stage": "ResponseComplete",
  "requestURI": "/apis/scheduling.k8s.io/v1beta1?timeout=32s",
  "verb": "get",
  "user": {
    "username": "system:apiserver",
    "uid": "2621ccb8-dcab-4b2b-bd74-a8e94ce4340d",
    "groups": [
      "system:masters"
    ]
  },
  "sourceIPs": [
    "127.0.0.1"
  ],
  "userAgent": "hyperkube/v1.13.4 (linux/amd64) kubernetes/c27b913",
  "responseStatus": {
    "metadata": {},
    "code": 200
  },
  "requestReceivedTimestamp": "2019-04-17T06:44:27.256764Z",
  "stageTimestamp": "2019-04-17T06:44:27.256837Z",
  "annotations": {
    "authorization.k8s.io/decision": "allow",
    "authorization.k8s.io/reason": ""
  }
}
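
Since each audit record is a single JSON object, reducing it to the few fields that matter when reading a log takes only a few lines. A minimal sketch, assuming one record per line and timestamps with microsecond precision as in the sample; `summarize` is a hypothetical helper and the embedded line is a trimmed copy of the record above.

```python
import json
from datetime import datetime, timedelta

TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"

# Trimmed copy of the sample record, as one log line.
LINE = json.dumps({
    "kind": "Event",
    "apiVersion": "audit.k8s.io/v1",
    "level": "Metadata",
    "stage": "ResponseComplete",
    "requestURI": "/apis/scheduling.k8s.io/v1beta1?timeout=32s",
    "verb": "get",
    "user": {"username": "system:apiserver", "groups": ["system:masters"]},
    "responseStatus": {"metadata": {}, "code": 200},
    "requestReceivedTimestamp": "2019-04-17T06:44:27.256764Z",
    "stageTimestamp": "2019-04-17T06:44:27.256837Z",
})


def summarize(line):
    """Extract who did what, the status code, and the elapsed time."""
    ev = json.loads(line)
    received = datetime.strptime(ev["requestReceivedTimestamp"], TS_FORMAT)
    stage = datetime.strptime(ev["stageTimestamp"], TS_FORMAT)
    return {
        "user": ev["user"]["username"],
        "verb": ev["verb"],
        "uri": ev["requestURI"],
        # responseStatus may be absent at early stages, hence .get().
        "code": ev.get("responseStatus", {}).get("code"),
        "elapsed_us": (stage - received) // timedelta(microseconds=1),
    }


print(summarize(LINE))
```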

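As you can see, the records are all JSON, which makes them very convenient to collect. The following is an example of collecting them with fluentd: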

    <source>
      @id k8s-audit.log
      @type tail
      path /var/log/kubernetes/kubernetes*
      pos_file /var/log/auditlog.pos
      tag audit.kubernetes.*
      read_from_head true
      <parse>
        @type multi_format
        <pattern>
          format json
          time_key time
          time_format %Y-%m-%dT%H:%M:%S.%NZ
        </pattern>
      </parse>
    </source>

    <match **>
      @id elasticsearch
      @type elasticsearch
      @log_level info
      type_name _doc
      include_tag_key true
      host 1.1.1.1
      port 9200
      logstash_format true
      logstash_prefix audit
      <buffer>
        @type file
        path /var/log/fluentd-buffers/kubernetes.system.buffer
        flush_mode interval
        retry_type exponential_backoff
        flush_thread_count 2
        flush_interval 5s
        retry_forever
        retry_max_interval 30
        chunk_limit_size 2M
        queue_limit_length 8
        overflow_action block
      </buffer>
    </match>
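
The buffer section combines `retry_type exponential_backoff` with `retry_max_interval 30`, so the wait between retries grows until it is capped at 30 seconds. The sketch below illustrates the resulting schedule. It assumes an initial wait of 1 second and a growth factor of 2, which I believe are the fluentd defaults, and it ignores the random jitter fluentd adds; `backoff_schedule` is a hypothetical helper.

```python
def backoff_schedule(retries, initial_wait=1.0, factor=2.0, max_interval=30.0):
    """Return the wait in seconds before each of the first `retries` retries.

    initial_wait and factor are assumed defaults; max_interval comes from
    retry_max_interval in the configuration above.
    """
    return [min(initial_wait * factor ** n, max_interval)
            for n in range(retries)]


print(backoff_schedule(7))
```

With `retry_forever` set, the output keeps retrying at the capped interval, and `overflow_action block` stops the tail input from reading further once the queue is full.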