

If you need to reboot a node (such as for a kernel upgrade, libc upgrade, hardware repair, etc.), and the downtime is brief, then when the kubelet restarts, it will attempt to restart the pods scheduled to it. If the reboot takes longer (the default time is 5 minutes, controlled by --pod-eviction-timeout on the controller-manager), then the node controller will terminate the pods that are bound to the unavailable node.
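If you want to see which eviction timeout your cluster actually uses, the check below is only a sketch that assumes a kubeadm-style cluster where the controller-manager runs as a static pod; the flag may simply be absent if the default is in effect.

$ grep pod-eviction-timeout /etc/kubernetes/manifests/kube-controller-manager.yaml
$ kubectl -n kube-system get pod -l component=kube-controller-manager \
    -o jsonpath='{.items[0].spec.containers[0].command}'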
If there is a corresponding replica set (or replication controller), then a new copy of the pod will be started on a different node. So, in the case where all pods are replicated, upgrades can be done without special coordination, assuming that not all nodes will go down at the same time.
If you want more control over the upgrading process, you may use the following workflow. Use kubectl drain to gracefully terminate all pods on the node while marking the node as unschedulable:

$ kubectl drain $NODENAME

This keeps new pods from landing on the node while you are trying to get them off. For pods with a replica set, the pod will be replaced by a new pod which will be scheduled to a new node. Additionally, if the pod is part of a service, then clients will automatically be redirected to the new pod. For pods with no replica set, you need to bring up a new copy of the pod and, assuming it is not part of a service, redirect clients to it. Once the maintenance work is done, make the node schedulable again:

$ kubectl uncordon $NODENAME
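Put together, a single maintenance pass over one node might look like the sketch below; the extra drain flags are assumptions worth checking against your kubectl version (DaemonSet pods cannot be evicted, and emptyDir data is lost when a pod is evicted).

$ kubectl drain $NODENAME --ignore-daemonsets --delete-emptydir-data
$ ssh $NODENAME sudo reboot     # or whatever kernel/libc/hardware work is needed
$ kubectl uncordon $NODENAME
$ kubectl get nodes             # the node should report Ready and be schedulable again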
Additionally, if the node is hosting etcd, then you need to be extra careful in terms of rolling upgrades of etcd and backing up the data.
Whenever you wish to reboot the OS on a particular node (master or worker), the Kubernetes cluster engine is not aware of that action; it keeps all the cluster-related events in the etcd key-value store, backing up the most recent data. As soon as you wish to carefully prepare a cluster node reboot, you may need to put that node into maintenance: drain it from scheduling and gracefully terminate all the existing pods. If you compose any relevant Kubernetes resource with a defined set of replicas, the ReplicationController guarantees that the specified number of pod replicas is running at any one time across the available nodes: it simply re-spawns pods that fail a health check or are deleted or terminated, to match the desired replica count. The next step is to mark the node unschedulable; run this command:

$ kubectl drain $NODENAME
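To see that re-spawning behaviour for yourself, one hypothetical experiment (the deployment name, image, and replica count are only examples) is:

$ kubectl create deployment web --image=nginx --replicas=3
$ kubectl get pods -o wide      # note which nodes the pods land on
$ kubectl drain $NODENAME --ignore-daemonsets
$ kubectl get pods -o wide      # the evicted pods reappear on the remaining nodes
$ kubectl uncordon $NODENAME

If your kubectl does not support --replicas on create deployment, create the deployment first and then run kubectl scale deployment web --replicas=3.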

In case of master nodes, which host etcd, you need to be extra careful in terms of rolling upgrades of etcd and backing up the data. As mentioned previously, we need to back up etcd. In addition to that, we need the certificates and, optionally, the kubeadm configuration file for easily restoring the master. If you set up your cluster using kubeadm (with no special configuration) you can do it similar to this, where <etcd-image> stands for the same etcd image and version that your cluster runs.

Backup certificates:

$ sudo cp -r /etc/kubernetes/pki backup/

Make etcd snapshot:

$ sudo docker run --rm -v $(pwd)/backup:/backup \
    --network host \
    -v /etc/kubernetes/pki/etcd:/etc/kubernetes/pki/etcd \
    --env ETCDCTL_API=3 \
    <etcd-image> \
    etcdctl --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
    --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
    snapshot save /backup/etcd-snapshot-latest.db

Backup kubeadm-config (optional):

$ sudo cp /etc/kubeadm/kubeadm-config.yaml backup/
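Before relying on the snapshot, it is worth checking that it is readable. This extra check is not part of the backup itself; it assumes the same <etcd-image> placeholder and backup folder as above:

$ sudo docker run --rm -v $(pwd)/backup:/backup \
    --env ETCDCTL_API=3 \
    <etcd-image> \
    etcdctl snapshot status /backup/etcd-snapshot-latest.db --write-out=table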

There are three commands in the example and all of them should be run on the master node. The first one copies the folder containing all the certificates that kubeadm creates. These certificates are used for secure communication between the various components in a Kubernetes cluster. The second one takes a snapshot of etcd. The final command is optional and only relevant if you use a configuration file for kubeadm; storing this file makes it easy to initialize the master with the exact same configuration as before when restoring it. Note that the contents of the backup folder should then be stored somewhere safe, where they can survive if the master is completely destroyed. If a master update goes wrong, you can then simply restore the old version of the master node.
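For the restore direction, the outline below is only a rough sketch of a kubeadm-style recovery, not a tested runbook; the paths, the <etcd-image> placeholder, and the exact kubeadm invocation are all assumptions to verify against your own setup.

$ sudo cp -r backup/pki /etc/kubernetes/          # put the saved certificates back; adjust to where your backup lives
$ sudo mv /var/lib/etcd /var/lib/etcd.old         # the restore target must not already exist
$ sudo docker run --rm -v $(pwd)/backup:/backup \
    -v /var/lib:/var/lib \
    --env ETCDCTL_API=3 \
    <etcd-image> \
    etcdctl snapshot restore /backup/etcd-snapshot-latest.db \
    --data-dir /var/lib/etcd
$ sudo kubeadm init \
    --ignore-preflight-errors=DirAvailable--var-lib-etcd \
    --config backup/kubeadm-config.yaml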

Doing a single backup manually may be a good first step, but you really need to make regular backups for them to be useful. The easiest way to do this is probably to take the commands from the example above, create a small script, and add a cron job that runs the script every now and then. But since we are running Kubernetes anyway, use a Kubernetes CronJob. This would allow you to keep track of the backup jobs inside Kubernetes just like you monitor your workloads. More information can be found here: backups-kubernetes.
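A sketch of such a CronJob is below. It assumes a cluster recent enough for batch/v1 CronJobs (roughly Kubernetes 1.21 and later; older clusters use batch/v1beta1), and every specific value in it (name, schedule, node labels, image, host paths) is an assumption to adapt: the job has to run on the control-plane node, use host networking to reach etcd on 127.0.0.1:2379, and mount the etcd client certificates plus a host directory to write the snapshot into.

$ kubectl apply -f - <<'EOF'
apiVersion: batch/v1
kind: CronJob
metadata:
  name: etcd-backup                 # hypothetical name
  namespace: kube-system
spec:
  schedule: "0 3 * * *"             # daily at 03:00; adjust to taste
  jobTemplate:
    spec:
      template:
        spec:
          hostNetwork: true         # reach etcd on the node's loopback address
          nodeSelector:
            node-role.kubernetes.io/control-plane: ""   # on older clusters the label is node-role.kubernetes.io/master
          tolerations:
            - key: node-role.kubernetes.io/control-plane
              operator: Exists
              effect: NoSchedule
          restartPolicy: OnFailure
          containers:
            - name: backup
              image: <etcd-image>   # same etcd image/version your cluster runs
              env:
                - name: ETCDCTL_API
                  value: "3"
              command:
                - etcdctl
                - --endpoints=https://127.0.0.1:2379
                - --cacert=/etc/kubernetes/pki/etcd/ca.crt
                - --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt
                - --key=/etc/kubernetes/pki/etcd/healthcheck-client.key
                - snapshot
                - save
                - /backup/etcd-snapshot-latest.db
              volumeMounts:
                - name: etcd-certs
                  mountPath: /etc/kubernetes/pki/etcd
                  readOnly: true
                - name: backup
                  mountPath: /backup
          volumes:
            - name: etcd-certs
              hostPath:
                path: /etc/kubernetes/pki/etcd
            - name: backup
              hostPath:
                path: /var/backups/etcd       # hypothetical host directory for snapshots
                type: DirectoryOrCreate
EOF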
