[kubernetes] 작업자가 CSINodeIfo 실패 : CSINode 주석 업데이트 오류

2 개월 전에 kubernetes 클러스터 1 마스터 및 2 명의 작업자 노드를 만들었습니다. 오늘 1 명의 작업자 노드가 실패하기 시작했으며 이유를 모르겠습니다. 나는 내 직원에게 특별한 일이 없다고 생각합니다.

나는 flannel과 kubeadm을 사용하여 클러스터를 만들었고 잘 작동했습니다.

노드를 설명하면 :

tommy@bxybackend:~$ kubectl describe node bxybackend-node01
Name:               bxybackend-node01
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=bxybackend-node01
                    kubernetes.io/os=linux
Annotations:        flannel.alpha.coreos.com/backend-data: {"VtepMAC":"06:ca:97:82:50:10"}
                    flannel.alpha.coreos.com/backend-type: vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager: true
                    flannel.alpha.coreos.com/public-ip: 10.168.10.4
                    kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Sun, 03 Nov 2019 09:41:48 -0600
Taints:             node.kubernetes.io/not-ready:NoExecute
                    node.kubernetes.io/not-ready:NoSchedule
Unschedulable:      false
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Wed, 11 Dec 2019 11:17:05 -0600   Wed, 11 Dec 2019 10:37:19 -0600   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Wed, 11 Dec 2019 11:17:05 -0600   Wed, 11 Dec 2019 10:37:19 -0600   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Wed, 11 Dec 2019 11:17:05 -0600   Wed, 11 Dec 2019 10:37:19 -0600   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            False   Wed, 11 Dec 2019 11:17:05 -0600   Wed, 11 Dec 2019 10:37:19 -0600   KubeletNotReady              Failed to initialize CSINodeInfo: error updating CSINode annotation: timed out waiting for the condition; caused by: the server could not find the requested resource
Addresses:
  InternalIP:  10.168.10.4
  Hostname:    bxybackend-node01
Capacity:
 cpu:                12
 ephemeral-storage:  102684600Ki
 hugepages-1Gi:      0
 hugepages-2Mi:      0
 memory:             14359964Ki
 pods:               110
Allocatable:
 cpu:                12
 ephemeral-storage:  94634127204
 hugepages-1Gi:      0
 hugepages-2Mi:      0
 memory:             14257564Ki
 pods:               110
System Info:
 Machine ID:                 3afa24bb05994ceaaf00e7f22b9322ab
 System UUID:                80951742-F69F-6487-F2F7-BE2FB7FEFBF8
 Boot ID:                    115fbacc-143d-4007-90e4-7fdcb5462680
 Kernel Version:             4.15.0-72-generic
 OS Image:                   Ubuntu 18.04.3 LTS
 Operating System:           linux
 Architecture:               amd64
 Container Runtime Version:  docker://18.9.7
 Kubelet Version:            v1.17.0
 Kube-Proxy Version:         v1.17.0
PodCIDR:                     10.244.1.0/24
PodCIDRs:                    10.244.1.0/24
Non-terminated Pods:         (2 in total)
  Namespace                  Name                           CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                  ----                           ------------  ----------  ---------------  -------------  ---
  kube-system                kube-flannel-ds-amd64-sslbg    100m (0%)     100m (0%)   50Mi (0%)        50Mi (0%)      8m31s
  kube-system                kube-proxy-c5gxc               0 (0%)        0 (0%)      0 (0%)           0 (0%)         8m52s
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests   Limits
  --------           --------   ------
  cpu                100m (0%)  100m (0%)
  memory             50Mi (0%)  50Mi (0%)
  ephemeral-storage  0 (0%)     0 (0%)
Events:
  Type     Reason                   Age                  From                           Message
  ----     ------                   ----                 ----                           -------
  Warning  SystemOOM                52m                  kubelet, bxybackend-node01     System OOM encountered, victim process: dotnet, pid: 12170
  Normal   NodeHasNoDiskPressure    52m (x12 over 38d)   kubelet, bxybackend-node01     Node bxybackend-node01 status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     52m (x12 over 38d)   kubelet, bxybackend-node01     Node bxybackend-node01 status is now: NodeHasSufficientPID
  Normal   NodeNotReady             52m (x6 over 23d)    kubelet, bxybackend-node01     Node bxybackend-node01 status is now: NodeNotReady
  Normal   NodeHasSufficientMemory  52m (x12 over 38d)   kubelet, bxybackend-node01     Node bxybackend-node01 status is now: NodeHasSufficientMemory
  Warning  ContainerGCFailed        52m (x3 over 6d23h)  kubelet, bxybackend-node01     rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Normal   NodeReady                52m (x13 over 38d)   kubelet, bxybackend-node01     Node bxybackend-node01 status is now: NodeReady
  Normal   NodeAllocatableEnforced  43m                  kubelet, bxybackend-node01     Updated Node Allocatable limit across pods
  Warning  SystemOOM                43m                  kubelet, bxybackend-node01     System OOM encountered, victim process: dotnet, pid: 9699
  Warning  SystemOOM                43m                  kubelet, bxybackend-node01     System OOM encountered, victim process: dotnet, pid: 12639
  Warning  SystemOOM                43m                  kubelet, bxybackend-node01     System OOM encountered, victim process: dotnet, pid: 16194
  Warning  SystemOOM                43m                  kubelet, bxybackend-node01     System OOM encountered, victim process: dotnet, pid: 19618
  Warning  SystemOOM                43m                  kubelet, bxybackend-node01     System OOM encountered, victim process: dotnet, pid: 12170
  Normal   Starting                 43m                  kubelet, bxybackend-node01     Starting kubelet.
  Normal   NodeHasSufficientMemory  43m (x2 over 43m)    kubelet, bxybackend-node01     Node bxybackend-node01 status is now: NodeHasSufficientMemory
  Normal   NodeHasSufficientPID     43m (x2 over 43m)    kubelet, bxybackend-node01     Node bxybackend-node01 status is now: NodeHasSufficientPID
  Normal   NodeNotReady             43m                  kubelet, bxybackend-node01     Node bxybackend-node01 status is now: NodeNotReady
  Normal   NodeHasNoDiskPressure    43m (x2 over 43m)    kubelet, bxybackend-node01     Node bxybackend-node01 status is now: NodeHasNoDiskPressure
  Normal   Starting                 42m                  kubelet, bxybackend-node01     Starting kubelet.

작업자에서 syslog를 보는 경우 :

Dec 11 11:20:10 bxybackend-node01 kubelet[19331]: I1211 11:20:10.552152   19331 kuberuntime_manager.go:981] updating runtime config through cri with podcidr 10.244.1.0/24
Dec 11 11:20:10 bxybackend-node01 kubelet[19331]: I1211 11:20:10.552162   19331 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
Dec 11 11:20:10 bxybackend-node01 kubelet[19331]: I1211 11:20:10.552352   19331 docker_service.go:355] docker cri received runtime config &RuntimeConfig{NetworkConfig:&NetworkConfig{PodCidr:10.244.1.0/24,},}
Dec 11 11:20:10 bxybackend-node01 kubelet[19331]: I1211 11:20:10.552600   19331 kubelet_network.go:77] Setting Pod CIDR:  -> 10.244.1.0/24
Dec 11 11:20:10 bxybackend-node01 kubelet[19331]: I1211 11:20:10.555142   19331 kubelet_node_status.go:70] Attempting to register node bxybackend-node01
Dec 11 11:20:10 bxybackend-node01 kubelet[19331]: I1211 11:20:10.652843   19331 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "kube-proxy" (UniqueName: "kubernetes.io/configmap/d6b534db-c32c-491b-a665-cf1ccd6cd089-kube-proxy") pod "kube-proxy-c5gxc" (UID: "d6b534db-c32c-491b-a665-cf1ccd6cd089")
Dec 11 11:20:10 bxybackend-node01 kubelet[19331]: I1211 11:20:10.753179   19331 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "xtables-lock" (UniqueName: "kubernetes.io/host-path/d6b534db-c32c-491b-a665-cf1ccd6cd089-xtables-lock") pod "kube-proxy-c5gxc" (UID: "d6b534db-c32c-491b-a665-cf1ccd6cd089")
Dec 11 11:20:10 bxybackend-node01 kubelet[19331]: I1211 11:20:10.753249   19331 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "lib-modules" (UniqueName: "kubernetes.io/host-path/d6b534db-c32c-491b-a665-cf1ccd6cd089-lib-modules") pod "kube-proxy-c5gxc" (UID: "d6b534db-c32c-491b-a665-cf1ccd6cd089")
Dec 11 11:20:10 bxybackend-node01 kubelet[19331]: I1211 11:20:10.753285   19331 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "kube-proxy-token-ztrh4" (UniqueName: "kubernetes.io/secret/d6b534db-c32c-491b-a665-cf1ccd6cd089-kube-proxy-token-ztrh4") pod "kube-proxy-c5gxc" (UID: "d6b534db-c32c-491b-a665-cf1ccd6cd089")
Dec 11 11:20:10 bxybackend-node01 kubelet[19331]: I1211 11:20:10.753316   19331 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "run" (UniqueName: "kubernetes.io/host-path/6a2299cf-63a4-4e96-8b3b-acd373de12c2-run") pod "kube-flannel-ds-amd64-sslbg" (UID: "6a2299cf-63a4-4e96-8b3b-acd373de12c2")
Dec 11 11:20:10 bxybackend-node01 kubelet[19331]: I1211 11:20:10.753342   19331 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "cni" (UniqueName: "kubernetes.io/host-path/6a2299cf-63a4-4e96-8b3b-acd373de12c2-cni") pod "kube-flannel-ds-amd64-sslbg" (UID: "6a2299cf-63a4-4e96-8b3b-acd373de12c2")
Dec 11 11:20:10 bxybackend-node01 kubelet[19331]: I1211 11:20:10.753461   19331 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "flannel-cfg" (UniqueName: "kubernetes.io/configmap/6a2299cf-63a4-4e96-8b3b-acd373de12c2-flannel-cfg") pod "kube-flannel-ds-amd64-sslbg" (UID: "6a2299cf-63a4-4e96-8b3b-acd373de12c2")
Dec 11 11:20:10 bxybackend-node01 kubelet[19331]: I1211 11:20:10.753516   19331 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "flannel-token-ts2qt" (UniqueName: "kubernetes.io/secret/6a2299cf-63a4-4e96-8b3b-acd373de12c2-flannel-token-ts2qt") pod "kube-flannel-ds-amd64-sslbg" (UID: "6a2299cf-63a4-4e96-8b3b-acd373de12c2")
Dec 11 11:20:10 bxybackend-node01 kubelet[19331]: I1211 11:20:10.753531   19331 reconciler.go:156] Reconciler: start to sync state
Dec 11 11:20:12 bxybackend-node01 kubelet[19331]: I1211 11:20:12.052813   19331 kubelet_node_status.go:112] Node bxybackend-node01 was previously registered
Dec 11 11:20:12 bxybackend-node01 kubelet[19331]: I1211 11:20:12.052921   19331 kubelet_node_status.go:73] Successfully registered node bxybackend-node01
Dec 11 11:20:13 bxybackend-node01 kubelet[19331]: E1211 11:20:13.051159   19331 csi_plugin.go:267] Failed to initialize CSINodeInfo: error updating CSINode annotation: timed out waiting for the condition; caused by: the server could not find the requested resource
Dec 11 11:20:16 bxybackend-node01 kubelet[19331]: E1211 11:20:16.051264   19331 csi_plugin.go:267] Failed to initialize CSINodeInfo: error updating CSINode annotation: timed out waiting for the condition; caused by: the server could not find the requested resource
Dec 11 11:20:18 bxybackend-node01 kubelet[19331]: E1211 11:20:18.451166   19331 csi_plugin.go:267] Failed to initialize CSINodeInfo: error updating CSINode annotation: timed out waiting for the condition; caused by: the server could not find the requested resource
Dec 11 11:20:21 bxybackend-node01 kubelet[19331]: E1211 11:20:21.251289   19331 csi_plugin.go:267] Failed to initialize CSINodeInfo: error updating CSINode annotation: timed out waiting for the condition; caused by: the server could not find the requested resource
Dec 11 11:20:25 bxybackend-node01 kubelet[19331]: E1211 11:20:25.019276   19331 csi_plugin.go:267] Failed to initialize CSINodeInfo: error updating CSINode annotation: timed out waiting for the condition; caused by: the server could not find the requested resource
Dec 11 11:20:46 bxybackend-node01 kubelet[19331]: E1211 11:20:46.772862   19331 csi_plugin.go:267] Failed to initialize CSINodeInfo: error updating CSINode annotation: timed out waiting for the condition; caused by: the server could not find the requested resource
Dec 11 11:20:46 bxybackend-node01 kubelet[19331]: F1211 11:20:46.772895   19331 csi_plugin.go:281] Failed to initialize CSINodeInfo after retrying
Dec 11 11:20:46 bxybackend-node01 systemd[1]: kubelet.service: Main process exited, code=exited, status=255/n/a
Dec 11 11:20:46 bxybackend-node01 systemd[1]: kubelet.service: Failed with result 'exit-code'.



답변

kubeadm을 설치하는 동안 kubelet, kubeadm 및 kubectl 패키지를 유지하고 실수로 업그레이드되지 않도록 다음 명령을 실행해야합니다.

$ sudo apt-mark hold kubelet kubeadm kubectl

시나리오를 재현했으며 클러스터에서 발생한 일은 3 일 전에 새로운 버전의 Kubernetes (v 1.17.0)가 출시되었고 kubelet이 실수로 업그레이드되었다는 것입니다.

새로운 Kubernetes에서는 CSI 에서 변경된 부분이 있으므로이 노드에 문제가있는 것입니다.

이 노드를 비우고 Kubernetes 1.16.2로 새 노드를 설정 한 다음 새 노드를 클러스터에 가입시키는 것이 좋습니다.

이 노드를 비우려면 다음을 실행해야합니다.

$ kubectl drain bxybackend-node01 --delete-local-data --force --ignore-daemonsets

선택적으로 다음 명령을 사용하여 kubelet을 이전 버전으로 다운 그레이드 할 수 있습니다.

$ sudo apt-get install kubelet=1.16.2-00

kubelet이 다시 업그레이드되지 않도록 표시하는 것을 잊지 마십시오 :

$ sudo apt-mark hold kubelet

이 명령 apt-mark showhold을 사용하여 보류 된 모든 패키지를 나열하고 kubelet, kubeadm 및 kubectl이 보류 상태인지 확인할 수 있습니다.

1.16.x에서 1.17.x로 업그레이드하려면 Kubernetes 설명서 에서이 가이드 를 따르십시오 . 나는 그것을 확인했고 의도 한대로 작동합니다.


답변

CentOS Linux 릴리스 7.7.1908에서도 이와 동일한 문제에 직면했습니다. 내 kubernetes 버전은 v1.16.3이고 “yum update”명령을 실행하고 kubernetes 버전을 v1.17.0으로 업그레이드했습니다. 그 후 “음 히스토리 실행 취소”아니오 “를 수행 한 다음 이전 kubernetes 버전으로 되돌아 가서 다시 작동하기 시작했습니다. 그 후 공식 업그레이드 방법을 따랐으며 이제 kubernetes v1.17.0이 문제없이 작동합니다.

root@kube-master1:/root>kubectl get no -o wide
NAME           STATUS   ROLES    AGE    VERSION   INTERNAL-IP       EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION               CONTAINER-RUNTIME
kube-master1   Ready    master   7d9h   v1.17.0   192.168.159.135   <none>        CentOS Linux 7 (Core)   3.10.0-1062.9.1.el7.x86_64   docker://1.13.1
kube-worker1   Ready    worker   7d9h   v1.17.0   192.168.159.136   <none>        CentOS Linux 7 (Core)   3.10.0-1062.9.1.el7.x86_64   docker://1.13.1
kube-worker2   Ready    worker   7d9h   v1.17.0   192.168.159.137   <none>        CentOS Linux 7 (Core)   3.10.0-1062.9.1.el7.x86_64   docker://1.13.1
root@kube-master1:/root>


답변