Unhealthy Nodes

If a node in the cluster goes down, the kube-controller-manager waits 5 minutes (by default) for it to come back online. If the node recovers within that window, the pods that were running on it are restarted and everything continues as before.

If the node doesn’t come back online within 5 minutes, it is considered unhealthy, and the pods on it that belonged to a ReplicaSet are recreated on other nodes. Any pod on that node that did not belong to a ReplicaSet is lost with the node.

The time the kube-controller-manager waits before declaring a node unhealthy is called the pod eviction timeout, and it is configured on the kube-controller-manager. If the node comes back online after the pod eviction timeout has expired, it comes back with no pods running on it.
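
The rules above can be sketched as a small decision function. This is only a simplified model of the behaviour described in these notes, not how the kube-controller-manager is implemented. The names Pod, outcome_after_outage and POD_EVICTION_TIMEOUT are made up for illustration, and treating a downtime of exactly 5 minutes as "timed out" is an assumption.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Dict, List, Optional

# Default pod eviction timeout from the notes (5 minutes).
POD_EVICTION_TIMEOUT = timedelta(minutes=5)


@dataclass
class Pod:
    """Hypothetical stand-in for a pod; owner is the ReplicaSet name, if any."""
    name: str
    owner: Optional[str] = None  # None means a standalone pod


def outcome_after_outage(
    pods: List[Pod],
    downtime: timedelta,
    timeout: timedelta = POD_EVICTION_TIMEOUT,
) -> Dict[str, str]:
    """Map each pod name to what happens to it after the node was down for `downtime`."""
    result = {}
    for pod in pods:
        if downtime < timeout:
            # Node came back in time: the pod is restarted on the same node.
            result[pod.name] = "restarted on same node"
        elif pod.owner is not None:
            # Node timed out: the ReplicaSet gets a replacement pod elsewhere.
            result[pod.name] = "recreated on another node"
        else:
            # Node timed out and nothing owns the pod: it is gone.
            result[pod.name] = "lost"
    return result


pods = [Pod("web-1", owner="web-rs"), Pod("debug")]
print(outcome_after_outage(pods, timedelta(minutes=2)))
print(outcome_after_outage(pods, timedelta(minutes=6)))
```

The first call models a 2 minute outage, where both pods stay with the node. The second models a 6 minute outage, where only the pod owned by a ReplicaSet survives, on a different node.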