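What are Kubernetes Events? Suppose you create a Deployment: the Kubernetes components each do their part in turn until the Pod is finally created. From the moment the Deployment is declared to the moment the Pod is up, a series of Events is generated to tell the user what is currently happening. Events exist to answer questions such as why a Pod did not start (for example, because no credentials were configured for the private registry) or why a Pod was killed (for example, because it exceeded its resource limit). An example: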

kubectl get ev
LAST SEEN TYPE REASON OBJECT MESSAGE
3s Normal Scheduled pod/nginx-698898f666-smg7t Successfully assigned default/nginx-698898f666-smg7t to east1-monitor1
3s Normal Pulled pod/nginx-698898f666-smg7t Container image "nginx:alpine" already present on machine
3s Normal Created pod/nginx-698898f666-smg7t Created container nginx
3s Normal Started pod/nginx-698898f666-smg7t Started container nginx
3s Normal SuccessfulCreate replicaset/nginx-698898f666 Created pod: nginx-698898f666-smg7t
3s Normal ScalingReplicaSet deployment/nginx Scaled up replica set nginx-698898f666 to 1
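
The listing above can also be post-processed in a script. What follows is only a minimal Python sketch, assuming the five-column layout shown above (LAST SEEN, TYPE, REASON, OBJECT, MESSAGE); the helper names `parse_event_rows` and `events_for` are made up for illustration, and for anything beyond a quick look `kubectl get events -o json` is a more robust input than scraped text.

```python
from typing import Dict, List

# Rows copied from the `kubectl get ev` output above.
SAMPLE_OUTPUT = """\
LAST SEEN TYPE REASON OBJECT MESSAGE
3s Normal Scheduled pod/nginx-698898f666-smg7t Successfully assigned default/nginx-698898f666-smg7t to east1-monitor1
3s Normal Pulled pod/nginx-698898f666-smg7t Container image "nginx:alpine" already present on machine
3s Normal Created pod/nginx-698898f666-smg7t Created container nginx
3s Normal Started pod/nginx-698898f666-smg7t Started container nginx
3s Normal SuccessfulCreate replicaset/nginx-698898f666 Created pod: nginx-698898f666-smg7t
3s Normal ScalingReplicaSet deployment/nginx Scaled up replica set nginx-698898f666 to 1
"""

COLUMNS = ["last_seen", "type", "reason", "object", "message"]


def parse_event_rows(text: str) -> List[Dict[str, str]]:
    """Split each data row into five columns; the message keeps its spaces."""
    rows = []
    for line in text.splitlines()[1:]:  # skip the header row
        if not line.strip():
            continue
        # maxsplit=4: everything after the fourth column is the message
        fields = line.split(None, 4)
        if len(fields) != len(COLUMNS):
            continue  # malformed row, ignore it
        rows.append(dict(zip(COLUMNS, fields)))
    return rows


def events_for(rows: List[Dict[str, str]], obj: str) -> List[Dict[str, str]]:
    """Return only the rows whose OBJECT column equals obj."""
    return [row for row in rows if row["object"] == obj]


if __name__ == "__main__":
    for row in events_for(parse_event_rows(SAMPLE_OUTPUT), "pod/nginx-698898f666-smg7t"):
        print(row["reason"], "-", row["message"])
```

Filtering on the OBJECT column gives the timeline of a single Pod, which is usually the first thing to look at when asking why it did not start.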

This information is stored in etcd, under keys like the ones below, and is retained for 1 hour by default.

/registry/events/default/nginx-698898f666-smg7t.1593fd25ca0c7e00
/registry/events/default/nginx-698898f666-smg7t.1593fd25f1ff3cca
/registry/events/default/nginx-698898f666-smg7t.1593fd25f4066f9f
/registry/events/default/nginx-698898f666-smg7t.1593fd25fd741cf6
/registry/events/default/nginx-698898f666.1593fd25c9abf7dc
/registry/events/default/nginx.1593fd25c9217fce
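
The keys above follow the pattern /registry/events/<namespace>/<object name>.<hex suffix>. The Python sketch below splits a key into those parts. It additionally assumes that the suffix is the event's creation time in Unix nanoseconds, written in hexadecimal; as far as I can tell this is how the client-go event recorder named events at the time, but treat it as an assumption rather than a guarantee. `parse_event_key` and `suffix_to_time` are hypothetical helper names.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class EventKey(NamedTuple):
    namespace: str
    object_name: str
    suffix: str


def parse_event_key(key: str) -> EventKey:
    """Split /registry/events/<namespace>/<object name>.<hex suffix>."""
    prefix = "/registry/events/"
    if not key.startswith(prefix):
        raise ValueError(f"not an event key: {key!r}")
    namespace, event_name = key[len(prefix):].split("/", 1)
    # Object names may contain dots themselves, so split on the last dot only.
    object_name, suffix = event_name.rsplit(".", 1)
    return EventKey(namespace, object_name, suffix)


def suffix_to_time(suffix: str) -> datetime:
    """Read the suffix as hexadecimal Unix nanoseconds (an assumption, see above)."""
    nanos = int(suffix, 16)
    return datetime.fromtimestamp(nanos // 1_000_000_000, tz=timezone.utc)


if __name__ == "__main__":
    key = parse_event_key(
        "/registry/events/default/nginx-698898f666-smg7t.1593fd25ca0c7e00"
    )
    print(key.namespace, key.object_name, suffix_to_time(key.suffix).isoformat())
```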

Pick any one Event and you can see its full content, shown below. The fields worth noting include the name under involvedObject and the host under source.

apiVersion: v1
count: 1
eventTime: null
firstTimestamp: "2019-04-10T03:01:30Z"
involvedObject:
  apiVersion: v1
  fieldPath: spec.containers{nginx}
  kind: Pod
  name: nginx-698898f666-ldwql
  namespace: default
  resourceVersion: "80088"
  uid: f04efe9e-5b3c-11e9-aa09-00163e132347
kind: Event
lastTimestamp: "2019-04-10T03:01:30Z"
message: Container image "nginx:alpine" already present on machine
metadata:
  creationTimestamp: "2019-04-10T03:01:30Z"
  name: nginx-698898f666-ldwql.1593fdbe68a2e26d
  namespace: default
  resourceVersion: "80095"
  selfLink: /api/v1/namespaces/default/events/nginx-698898f666-ldwql.1593fdbe68a2e26d
  uid: f0b646be-5b3c-11e9-aa09-00163e132347
reason: Pulled
reportingComponent: ""
reportingInstance: ""
source:
  component: kubelet
  host: k8s
type: Normal
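
To make the relationship between this object and the `kubectl get ev` columns concrete, here is a small Python sketch that rebuilds a row from the fields of the Event above. It is an approximation for illustration only: the age is printed in plain seconds, which is simpler than what kubectl actually renders, and `to_row` and `reported_by` are hypothetical helper names.

```python
from datetime import datetime, timezone

# Trimmed copy of the Event shown above, as a Python dict.
EVENT = {
    "type": "Normal",
    "reason": "Pulled",
    "message": 'Container image "nginx:alpine" already present on machine',
    "lastTimestamp": "2019-04-10T03:01:30Z",
    "involvedObject": {
        "kind": "Pod",
        "name": "nginx-698898f666-ldwql",
        "namespace": "default",
    },
    "source": {"component": "kubelet", "host": "k8s"},
}


def parse_time(value: str) -> datetime:
    """Parse the RFC 3339 UTC timestamps used in the Event."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def to_row(event: dict, now: datetime) -> str:
    """Build a 'LAST SEEN TYPE REASON OBJECT MESSAGE' style line."""
    age = int((now - parse_time(event["lastTimestamp"])).total_seconds())
    obj = event["involvedObject"]
    return (
        f'{age}s {event["type"]} {event["reason"]} '
        f'{obj["kind"].lower()}/{obj["name"]} {event["message"]}'
    )


def reported_by(event: dict) -> str:
    """Say which component on which host emitted the Event."""
    src = event["source"]
    return f'{src["component"]} on {src["host"]}'


if __name__ == "__main__":
    print(to_row(EVENT, datetime.now(timezone.utc)))
    print(reported_by(EVENT))
```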

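The Event-related flags are mostly set on the API Server, as shown below. The output also includes audit-related flags; audit records are events too, but they belong to a different apiVersion and will be covered in a later post.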

docker run -it --rm k8s.gcr.io/kube-apiserver:v1.14.0 kube-apiserver -h |grep event
--audit-log-batch-buffer-size int The size of the buffer to store events before batching and writing. Only used in batch mode. (default 10000)
--audit-log-format string Format of saved audits. "legacy" indicates 1-line text format for each event. "json" indicates structured json format. Known formats are legacy,json. (default "json")
--audit-log-mode string Strategy for sending audit events. Blocking indicates sending events should block server responses. Batch causes the backend to buffer and write events asynchronously. Known modes are batch,blocking,blocking-strict. (default "blocking")
--audit-log-truncate-enabled Whether event and batch truncating is enabled.
--audit-log-truncate-max-event-size int Maximum size of the audit event sent to the underlying backend. If the size of an event is greater than this number, first request and response are removed, and if this doesn't reduce the size enough, event is discarded. (default 102400)
--audit-log-version string API group and version used for serializing audit events written to log. (default "audit.k8s.io/v1")
--audit-webhook-batch-buffer-size int The size of the buffer to store events before batching and writing. Only used in batch mode. (default 10000)
--audit-webhook-mode string Strategy for sending audit events. Blocking indicates sending events should block server responses. Batch causes the backend to buffer and write events asynchronously. Known modes are batch,blocking,blocking-strict. (default "batch")
--audit-webhook-truncate-enabled Whether event and batch truncating is enabled.
--audit-webhook-truncate-max-event-size int Maximum size of the audit event sent to the underlying backend. If the size of an event is greater than this number, first request and response are removed, and if this doesn't reduce the size enough, event is discarded. (default 102400)
--audit-webhook-version string API group and version used for serializing audit events written to webhook. (default "audit.k8s.io/v1")
--oidc-groups-prefix string If provided, all groups will be prefixed with this value to prevent conflicts with other authentication strategies.
--event-ttl duration Amount of time to retain events. (default 1h0m0s)
# The flag below is recommended for large clusters; it reduces the load on the main etcd
--etcd-servers-overrides strings Per-resource etcd servers overrides, comma separated. The individual override format: group/resource#servers, where servers are URLs, semicolon separated.
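
As an illustration of the `group/resource#servers` format described in the help text, the Python sketch below assembles an override that sends Events to a separate etcd. Events live in the core API group, whose name is the empty string, so the entry starts with a bare slash. The etcd URLs are placeholders rather than real endpoints, and `etcd_override` is a hypothetical helper.

```python
from typing import Iterable


def etcd_override(group: str, resource: str, servers: Iterable[str]) -> str:
    """Build one group/resource#servers entry; server URLs are joined with ';'."""
    return f"{group}/{resource}#{';'.join(servers)}"


# Core group => empty group name. The hostnames below are made up.
value = etcd_override(
    "", "events", ["http://etcd-events-0:2379", "http://etcd-events-1:2379"]
)
flags = [f"--etcd-servers-overrides={value}", "--event-ttl=1h0m0s"]

if __name__ == "__main__":
    print(" ".join(flags))
```

Several overrides for different resources would be joined with commas, following the same help text.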

Judging from the source code, the Event-related code appears to be at https://github.com/kubernetes/kubernetes/blob/release-1.14/pkg/kubelet/events/event.go.

Collection and processing

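If these records of cluster changes are not collected, later debugging and tracing of problems become very inconvenient. The open-source collection options currently available are: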

  1. https://github.com/heptiolabs/eventrouter supports Kafka, S3, and others
  2. https://www.elastic.co/products/beats/metricbeat supports ES
  3. https://github.com/kubernetes-retired/heapster supports Kafka, ES, and InfluxDB (retired)

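There is also a project called Kube-watch, which sends Events to Slack and similar services. The rest of this section uses Metricbeat as the example, with output to ES. The configuration is adapted from an official one; the main change is a custom index name, which requires an ES version newer than 6.4.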

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: metricbeat-deployment-config
  namespace: kube-system
  labels:
    k8s-app: metricbeat
data:
  metricbeat.yml: |-
    metricbeat.config.modules:
      # Mounted `metricbeat-daemonset-modules` configmap:
      path: ${path.config}/modules.d/*.yml
      # Reload module configs as they change:
      reload.enabled: false
    # processors:
    #   - add_cloud_metadata:
    # cloud.id: ${ELASTIC_CLOUD_ID}
    # cloud.auth: ${ELASTIC_CLOUD_AUTH}
    output.elasticsearch:
      hosts: ['${ELASTICSEARCH_HOST:elasticsearch}:${ELASTICSEARCH_PORT:9200}']
      index: "k8s-prod-%{[beat.version]}-%{+yyyy.MM.dd}"
      # username: ${ELASTICSEARCH_USERNAME}
      # password: ${ELASTICSEARCH_PASSWORD}
    setup.template:
      name: 'k8s-prod'
      pattern: 'k8s-prod-*'
      enabled: false
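
The `index` setting combines a fixed prefix, the Beat version, and the event date. The Python sketch below mimics that expansion to show which index a given day's Events end up in, and checks the result against the `k8s-prod-*` template pattern. It only imitates the format string for illustration and is not how Metricbeat itself resolves it; `index_name` is a hypothetical helper.

```python
from datetime import date
from fnmatch import fnmatch

TEMPLATE_PATTERN = "k8s-prod-*"  # setup.template.pattern from the config above


def index_name(beat_version: str, day: date, prefix: str = "k8s-prod") -> str:
    """Mimic k8s-prod-%{[beat.version]}-%{+yyyy.MM.dd} for one day."""
    return f"{prefix}-{beat_version}-{day:%Y.%m.%d}"


if __name__ == "__main__":
    name = index_name("6.7.0", date(2019, 4, 9))
    print(name, fnmatch(name, TEMPLATE_PATTERN))
```

Because the date is part of the name, a new index is created every day, which makes it straightforward to expire old Events by deleting whole indices.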

A single Event is recorded in ES as follows:

{
  "_index": "k8s-prod-6.7.0-2019.04.09",
  "_type": "doc",
  "_id": "9NUDAWoBg63YcA8TN-td",
  "_version": 1,
  "_score": 1,
  "_source": {
    "@timestamp": "2019-04-09T07:32:28.142Z",
    "event": {
      "dataset": "kubernetes.event"
    },
    "host": {
      "name": "hangzhou-k8s-prod-04"
    },
    "beat": {
      "name": "hangzhou-k8s-prod-04",
      "hostname": "hangzhou-k8s-prod-04",
      "version": "6.7.0"
    },
    "kubernetes": {
      "event": {
        "message": "Killing container with id docker://demo:Need to kill Pod",
        "reason": "Killing",
        "type": "Normal",
        "count": 10616,
        "involved_object": {
          "resource_version": "10016180",
          "name": "demo-8fd5f479-gmb4d",
          "kind": "Pod",
          "uid": "cba04cbc-4d0c-11e9-b6f6-00163e0f7ccb",
          "api_version": "v1"
        },
        "metadata": {
          "self_link": "/api/v1/namespaces/infra/events/demo-8fd5f479-gmb4d.158f635cd8e1c868",
          "generate_name": "",
          "uid": "48dfd324-4f74-11e9-b6f6-00163e0f7ccb",
          "resource_version": "12097622",
          "timestamp": {
            "created": "2019-03-26T03:07:26.000Z"
          },
          "name": "demo-8fd5f479-gmb4d.158f635cd8e1c868",
          "namespace": "infra"
        },
        "timestamp": {
          "first_occurrence": "2019-03-26T03:07:26.000Z",
          "last_occurrence": "2019-04-09T07:32:28.000Z"
        }
      }
    },
    "metricset": {
      "name": "event",
      "module": "kubernetes"
    }
  },
  "fields": {
    "kubernetes.event.timestamp.first_occurrence": [
      "2019-03-26T03:07:26.000Z"
    ],
    "kubernetes.event.timestamp.last_occurrence": [
      "2019-04-09T07:32:28.000Z"
    ],
    "kubernetes.event.metadata.timestamp.created": [
      "2019-03-26T03:07:26.000Z"
    ],
    "@timestamp": [
      "2019-04-09T07:32:28.142Z"
    ]
  }
}
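
In the record above, `count`, `first_occurrence`, and `last_occurrence` show that the same Killing event kept recurring over a long period. The Python sketch below works out that period and flags such repeated events. The threshold of 100 is an arbitrary value chosen for illustration, and `occurrence_span` and `is_noisy` are hypothetical helpers, not part of Metricbeat or ES.

```python
from datetime import datetime, timedelta

# The kubernetes.event part of the document above, trimmed to the fields used here.
K8S_EVENT = {
    "reason": "Killing",
    "count": 10616,
    "involved_object": {"kind": "Pod", "name": "demo-8fd5f479-gmb4d"},
    "timestamp": {
        "first_occurrence": "2019-03-26T03:07:26.000Z",
        "last_occurrence": "2019-04-09T07:32:28.000Z",
    },
}


def parse_ts(value: str) -> datetime:
    """Parse timestamps of the form 2019-03-26T03:07:26.000Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")


def occurrence_span(event: dict) -> timedelta:
    """Time between the first and the last occurrence of the event."""
    ts = event["timestamp"]
    return parse_ts(ts["last_occurrence"]) - parse_ts(ts["first_occurrence"])


def is_noisy(event: dict, threshold: int = 100) -> bool:
    """Flag events repeated at least `threshold` times (arbitrary cut-off)."""
    return event["count"] >= threshold


if __name__ == "__main__":
    obj = K8S_EVENT["involved_object"]
    print(obj["kind"], obj["name"], K8S_EVENT["reason"],
          "repeated", K8S_EVENT["count"], "times over", occurrence_span(K8S_EVENT))
```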

Conclusion

In short, every event in a cluster deserves attention. Whether the goal is real-time alerting or later analysis, events are very useful.

Ref

Introduction to Kubernetes (K8s) Events (Part 1)

https://www.kubernetes.org.cn/1031.html