Kubernetes (K8s): an orchestration tool for containers in a cluster
scale nodes up and down
back up and restore data after failures
distribute workload across nodes
Because the word "Kubernetes" is hard to spell correctly, and there are 8 letters ("ubernete") between the "K" and the "s", the abbreviation "K8s" is used.
There are many other tools in the container ecosystem:
Prometheus is a monitoring tool for clusters.
Dashboard is a basic UI that runs inside the cluster; it's limited, so don't use it.
Lens is an IDE-like dashboard that runs on various OSes.
Grafana is a dashboard UI.
k9s is a command-line dashboard UI.
Terraform is a cloud setup tool that replaces each cloud service provider's graphical interface.
Note that a node's hostname cannot contain uppercase letters if you want to use Kubernetes. Use hostnamectl set-hostname to change the hostname.
Add the following lines to /etc/docker/daemon.json:
{
"insecure-registries" : ["localhost:32000"]
}
and then restart docker with: sudo systemctl restart docker
Assuming you have Ubuntu 22.04:
#!/bin/sh
# make sure docker is installed and we have permission
sudo usermod -a -G docker $USER
newgrp docker
# install microk8s to run locally
# stable version of 1.27
sudo snap install microk8s --classic --channel=1.27/stable
# setup user group
sudo usermod -a -G microk8s $USER
# refresh permission group
newgrp microk8s
# give access
sudo chown -f -R $USER ~/.kube
# check installation
microk8s status --wait-ready
# now install kubectl, although microk8s has its own, we will use system's kubectl
sudo apt-get install -y ca-certificates curl apt-transport-https
# get key
sudo mkdir /etc/apt/keyrings/
curl -fsSL https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-archive-keyring.gpg
# add repo
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
# install kubectl
sudo apt-get update && sudo apt-get install -y kubectl
# now we need to connect kubectl with microk8s
# this might need to be reset when your network status changes
# by making ~/.kube/config
microk8s config > ~/.kube/config
# should display correctly
kubectl cluster-info
# install helm
curl https://baltocdn.com/helm/signing.asc | gpg --dearmor | sudo tee /usr/share/keyrings/helm.gpg > /dev/null
sudo apt-get install -y apt-transport-https
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/helm.gpg] https://baltocdn.com/helm/stable/debian/ all main" | sudo tee /etc/apt/sources.list.d/helm-stable-debian.list
sudo apt-get update
sudo apt-get install -y helm
Go to Here and download the k9s_Linux_amd64.tar.gz file, then put the extracted k9s binary into /usr/local/bin. That's it.
Master Node: manages the Cluster Network; there can be multiple master nodes
API server (entrypoint): serves the UI and CLI, talks to etcd
Controller Manager: tracks Kubernetes' internal state, controls other controllers
Cloud Controller Manager: everything that has to do with 3rd-party cloud services
Kube Scheduler: distributes Pods to Nodes based on resource requirements and availability
etcd: snapshot database for recovery; holds the Status information used to meet the Specification
addons: such as DNS, Dashboard, Cluster-level logging, resource monitoring
Worker Nodes: do the actual work; one node can run multiple containers
Container Runtime: many runtimes implement the Kubernetes Container Runtime Interface (CRI)
Kubelet: manages the lifecycle of all Pods on the worker node
Kube Proxy: manages network rules
Pods
Starting with v1.19, Moby isn't needed, as you can use crictl to run commands directly on a node. Before version 20.10, Docker had its own container runtime, but it switched to containerd as its underlying container runtime for better modularity.
Node Pool: a group of virtual machines of the same size
Application: e.g. docker, database
Cluster IP: virtual IP address in Cluster Network
Pod: an abstraction of containers (so you don't have to always use docker), ephemeral
usually one Application per Pod
IP: each pod will get a new Cluster IP address upon creation
Service: a static Cluster IP address, that is not ephemeral, usually a Service per Application
ConfigMap & Secrets: like a per-cluster .env file
Volume: storage (either local or remote) attached to a Pod for persistent data such as a database (containers themselves can't store persistent data)
Kubernetes doesn't manage data persistence
Deployments: blueprint for how to create a kind of Pods at scale.
Don't use Deployments to create Pods that are stateful (e.g. databases) and need global consistency. Use StatefulSet instead.
StatefulSet: handles DB replication, scaling, and synchronized reads and writes.
In practice, don't use StatefulSet. You probably want to use 3rd party service for that.
The Master can be accessed through the UI (Dashboard), the API, or the CLI (kubectl).
Configuration has 3 components
manually specified Kind: e.g. Deployment, Service
manually specified Metadata
manually specified Specification (the Status, which is compared against the Specification, is generated automatically)
We directly use yaml or json configuration to launch Service, Deployment (which creates Pods), Secrets, and ConfigMap.
Use kubectl get all, kubectl get secret, and kubectl get configmap to see all launched Components. Use kubectl get node -o wide to see node info.
You can generate a .yaml config by using kubectl with --dry-run=client -o yaml, for example kubectl create deployment my-app --image=nginx --dry-run=client -o yaml.
Configurations are hard to write correctly; here are some great references:
Learn more about configurations Here
Tools:
K3s: lightweight, small production projects, pre-packed opinion
K3D: fast, preserve cluster state, wrapper of K3s in docker
MicroK8s: easy, quick feature enablement
Minikube: requires Hypervisor (like VirtualBox)
Docker Desktop: only one node, support Mac, Windows, heavy
Kind: simple, basic, can't preserve cluster state, runs on Docker Desktop
K0s
Context: which cluster you are working with and the metadata associated with that context. The information is stored in ~/.kube/config
Context Commands
kubectl config current-context # show which context is used currently
kubectl config get-contexts # show all available contexts
kubectl config use-context [contextName] # switch to context
kubectx # a third-party shortcut around "kubectl config" for switching contexts
kubectl config rename-context [oldName] [newName]
kubectl config delete-context [contextName]
Namespace Commands
kubectl get namespace # List all Namespace
kubectl get ns # List all Namespace
kubectl config set-context --current --namespace=[namespaceName] # switch to Namespace, all commands will send to that Namespace
kubectl create namespace [namespaceName]
kubectl delete namespace [namespaceName] # delete Namespace and all related Components
kubectl get pods --all-namespaces # list all pods in all Namespaces
kubectl get pods --namespace=[namespaceName] # list all pods in specific Namespace
kubectl get pods -n [namespaceName] # list all pods in specific Namespace
You can put namespace on any Component under the metadata section (in addition to name), so you can delete them in a batch. (A Namespace is itself a Component; see the sketch below.)
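For example, a minimal sketch (the names my-namespace, my-pod, and the nginx image are placeholders, not from the original):
apiVersion: v1
kind: Namespace
metadata:
  name: my-namespace
---
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
  namespace: my-namespace   # deleting the Namespace deletes this Pod with it
spec:
  containers:
    - name: app
      image: nginx:1.25     # placeholder image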
Labels: self-defined tags for each Component; they work with Selectors
Selectors: use labels to filter, select, and refer to Components (you can also pass --selector=[label] to get commands to filter results); see the sketch below
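A minimal sketch of labels on a Pod, assuming placeholder names and image:
apiVersion: v1
kind: Pod
metadata:
  name: my-app-pod
  labels:
    app: my-app            # self-defined labels
    tier: frontend
spec:
  containers:
    - name: my-app
      image: nginx:1.25    # placeholder image
A Selector such as app: my-app (or kubectl get pods --selector=app=my-app) would then match this Pod by its labels.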
Pods Commands
kubectl create -f [fileName.yaml] # create a pod
kubectl get pods -o wide # print all pods
kubectl describe pod [podName] # show pod info
kubectl get pod [podName] -o yaml # extract pod definition in yaml
kubectl exec -it [podName] -- sh # launch interactive `sh` shell in a pod
kubectl exec -it [podName] -c [containerName] -- sh # for multi-container pod
kubectl logs [podName] -c [containerName] # getting logs for a container
kubectl delete -f [fileName.yaml] # delete a pod
kubectl delete pod [podName] # delete a pod in 30s
kubectl delete pod [podName] --wait=false # delete a pod with no wait
kubectl delete pod [podName] --grace-period=0 --force # force delete a pod
kubectl port-forward [kind]/[serviceName] [localPort]:[podPort] # forward a local port to the service (like ssh port forwarding)
Node Commands
kubectl get nodes # see all Nodes
kubectl describe nodes [nodeName] # see status of a Node
Pod State:
Pending: accepted, not yet created
Running: bound to a node
Succeeded: exited with status 0
Failed: all containers exit and at least one exited with a non-zero status
Unknown: can't communicate
CrashLoopBackOff: repeatedly starting and crashing
Init Containers: executable logic that runs before the actual app starts (e.g. to install dependencies); see the sketch below
they run one after another; if one fails, it is restarted
they do not get probes (livenessProbe, readinessProbe, startupProbe)
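A minimal Init Container sketch, assuming placeholder images; the init step here just simulates installing dependencies:
apiVersion: v1
kind: Pod
metadata:
  name: app-with-init
spec:
  initContainers:
    - name: install-deps            # runs to completion before the app container starts
      image: busybox:1.36
      command: ["sh", "-c", "echo installing dependencies && sleep 5"]
  containers:
    - name: app
      image: nginx:1.25             # placeholder image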
IP Address
Containers in a Pod share IP address and Volume
Different Pods communicate through Service if within Cluster
External communication outside of Cluster goes through LoadBalancer (usually Cloud provider)
Workload: a Component that does the heavy lifting of running Pods; here is the inheritance hierarchy
Since setting up a database inside the cluster is hard and inefficient, and I prefer not to set up a 3rd-party cloud service here, databases are not covered in this tutorial.
ReplicaSets Commands
kubectl get rs # list ReplicaSets
kubectl describe rs [replicaSetName]
kubectl delete -f [fileName.yaml]
kubectl delete rs [replicaSetName]
Deployment:
replicas: number of pod instances
revisionHistoryLimit: number of previous revisions to keep
strategy.type = RollingUpdate: cycle through pods to update them (also see maxSurge and maxUnavailable, which default to 25%)
strategy.type = Recreate: kill all pods before creating new ones
A minimal Deployment manifest illustrating these fields is sketched below.
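A minimal Deployment sketch, assuming a placeholder name and nginx image:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3                   # number of pod instances
  revisionHistoryLimit: 5       # previous revisions kept for rollback
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%             # extra pods allowed during an update
      maxUnavailable: 25%       # pods that may be down during an update
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: nginx:1.25     # placeholder image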
Deployment Commands
kubectl get rs # list ReplicaSets
kubectl get deploy # list Deployments
kubectl describe deploy [deploymentName]
kubectl delete -f [fileName.yaml]
kubectl delete deploy [deploymentName]
kubectl apply -f [fileName.yaml] # to update a deployment (not `create`)
kubectl rollout status deployment [deploymentName] # get the progress of the update
kubectl rollout history deployment [deploymentName] # get history
kubectl rollout undo deployment [deploymentName] # rollback a deployment
kubectl rollout undo deployment [deploymentName] --to-revision=[revisionNumber] # rollback to a revision
DaemonSet Commands
kubectl get ds # list DaemonSets
kubectl describe ds [daemonSetName]
kubectl delete -f [fileName.yaml]
kubectl delete ds [daemonSetName]
Job Commands
kubectl get job # list Jobs
kubectl describe job [jobName]
kubectl delete -f [fileName.yaml]
kubectl delete job [jobName]
CronJob Commands
kubectl get cj # list CronJobs
kubectl describe cj [cronJobName]
kubectl delete -f [fileName.yaml]
kubectl delete cj [cronJobName]
Blue: Pods in production (v1)
Green: Pods in the new version (v2)
Blue-Green Deployment: when everything is ready, switch the selector of the Service from v1 to v2 (swapping blue and green); a sketch of the selector switch is shown below
downtime is minimal
requires 2x as many nodes in the cluster
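A sketch of the Service whose selector gets switched, assuming placeholder names and a version label on the Pods:
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  selector:
    app: my-app
    version: v2        # was v1 while "blue" was live; changing this flips traffic to "green"
  ports:
    - port: 80
      targetPort: 8080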
Service: for both external and internal communication
update a Service with the kubectl apply -f [fileName.yaml] command
static IP address
static DNS name ([serviceName].[namespace].svc.cluster.local)
types:
ClusterIP (internal, the default): reachable at http://[serviceName]:[port]; specify port (internal Service port) and targetPort (Pod port); see the example manifest after the commands below
NodePort (external): nodePort is statically defined or chosen from 30000-32767, reached via the Node's IP
Service Commands
kubectl apply -f [fileName.yaml] # to deploy a service
kubectl get svc # list Services
kubectl describe svc [serviceName]
kubectl delete -f [fileName.yaml]
kubectl delete svc [serviceName]
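A minimal Service sketch of the internal (ClusterIP) type, assuming placeholder names and ports:
apiVersion: v1
kind: Service
metadata:
  name: my-service        # reachable as http://my-service:80 inside the cluster
spec:
  type: ClusterIP         # the default, internal-only type
  selector:
    app: my-app           # placeholder label of the target Pods
  ports:
    - port: 80            # internal Service port
      targetPort: 8080    # Pod (container) port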
Volumes: a cluster-wide storage system outside the cluster (provided by a 3rd-party service); vendors create plugins according to the Container Storage Interface (CSI)
PersistentVolumeClaim: an interface and abstraction over storage that specifies how much storage a Pod may use at most
PersistentVolume: cluster-wide static volumes; only one PersistentVolumeClaim is allowed per PersistentVolume; capacity must be specified
StorageClass: cluster-wide dynamic volumes; multiple PersistentVolumeClaims are allowed per StorageClass; no need to specify capacity
Reclaim Policies:
Delete: delete the underlying data when the PersistentVolumeClaim is deleted (default for dynamically provisioned volumes)
Retain: keep the data when the PersistentVolumeClaim is deleted
Access Modes:
ReadWriteMany: all Pods can read and write
ReadOnlyMany: all Pods can only read
ReadWriteOnce: read-write for a single node (the one that mounts it first); Pods on other nodes cannot use it
States:
Available: no PersistentVolumeClaim has been created for the PersistentVolume or StorageClass
Bound: at least one PersistentVolumeClaim has been created for the PersistentVolume or StorageClass
Released: the PersistentVolumeClaim was deleted, but the resource has not yet been reclaimed by the cluster
Failed: automatic reclamation failed
A minimal PersistentVolume + PersistentVolumeClaim sketch is shown below.
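A minimal sketch of a static PersistentVolume plus a matching PersistentVolumeClaim; the hostPath, names, and sizes are placeholder assumptions:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-pv
spec:
  capacity:
    storage: 5Gi                            # capacity must be specified for a PersistentVolume
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain     # keep the data after the claim is deleted
  hostPath:
    path: /mnt/data                         # local path, for illustration only
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi                          # how much storage the Pod may claim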
ConfigMap: externalizes environment configuration as a Component; it can be created from
Manifests
Files
Directories
To pick up changes, containers have to restart so that K8s re-injects the environment values. You could mount the ConfigMap as a Volume to make it non-static, but then you need to read the values from files. A minimal sketch is shown below.
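A minimal sketch of a ConfigMap consumed as environment variables, assuming placeholder names and values:
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  DATABASE_HOST: "db.internal"    # placeholder values, like a per-cluster .env
  LOG_LEVEL: "info"
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: nginx:1.25           # placeholder image
      envFrom:
        - configMapRef:
            name: app-config      # containers must restart to see updated values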
ConfigMap Commands
kubectl apply -f [fileName.yaml] # to deploy a ConfigMap
kubectl create cm [configMapName] --from-file=[fileName.txt] # create imperatively from file
kubectl create cm [configMapName] --from-file=[directory/] # create imperatively from directory
kubectl get cm
kubectl get cm [configMapName] -o yaml # save the ConfigMap as a yaml file
kubectl delete -f [fileName.yaml]
Secrets: stored as base64-encoded strings (not secure); a minimal sketch is shown below
should be protected with role-based access control (RBAC)
could store secrets elsewhere
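A minimal Secret sketch (the name and value are placeholders); note the value is only base64-encoded, not encrypted:
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
type: Opaque
data:
  DB_PASSWORD: cGFzc3dvcmQ=   # base64 of "password" -- encoded, not encrypted
A container can then reference it via env.valueFrom.secretKeyRef or mount it as a Volume.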
Probes: how a Pod learns the status of the application inside a container (see the sketch below)
Types of Probes
Checking Methods
Probes are not Components, but are used by the Kubelet.
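A minimal sketch of the three probe types on one container, assuming the app exposes /healthz and /ready over HTTP on port 80 (placeholder paths and image):
apiVersion: v1
kind: Pod
metadata:
  name: probed-app
spec:
  containers:
    - name: app
      image: nginx:1.25            # placeholder image
      startupProbe:                # gives a slow-starting app time before the other probes run
        httpGet:
          path: /healthz
          port: 80
        failureThreshold: 30
        periodSeconds: 10
      livenessProbe:               # the Kubelet restarts the container if this fails
        httpGet:
          path: /healthz
          port: 80
        periodSeconds: 10
      readinessProbe:              # the Pod is removed from Service endpoints if this fails
        httpGet:
          path: /ready
          port: 80
        periodSeconds: 5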
HorizontalPodAutoscaling Commands
kubectl get hpa [horizontalPodAutoscalingName]
kubectl delete hpa [horizontalPodAutoscalingName]
kubectl delete -f [fileName.yaml]
To use it, you need to install metrics-server (and add --kubelet-insecure-tls if on Docker Desktop). A minimal HorizontalPodAutoscaler manifest is sketched below.
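A minimal HorizontalPodAutoscaler sketch targeting a Deployment, assuming placeholder names and a 70% CPU target:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app                   # the Deployment to scale
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out when average CPU exceeds 70%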
Firstly, make sure you have no swap
sudo swapoff -a
sudo nano /etc/fstab # and remove swap
Then install kubeadm kubelet kubectl kubernetes-cni on both the worker nodes and the master node, after you set up the repository following this guide.
Then run sudo kubeadm init only on the master node, and join the worker nodes with the command it outputs.
Then, copy the configuration file to $HOME/.kube/config on the master node (see Installation).
If you hit the problem [ERROR CRI]: container runtime is not running, then run the following:
sudo mv /etc/containerd/config.toml /etc/containerd/config.toml.bak
sudo systemctl restart containerd
sudo kubeadm init
In the end, you will get
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 104.171.203.218:6443 --token lzwo36.sjbuvacn3aube42l \
--discovery-token-ca-cert-hash sha256:b3b3db068180a8a7edf1d3025e77734974af735b421de4396ddd131ce61f4a65
Use the kubeadm join
commands on the worker nodes.
Then you can see
❯ kubectl get nodes
NAME STATUS ROLES AGE VERSION
104-171-203-218 NotReady control-plane 5m6s v1.27.4
We need to install Calico for its network security solution on master node:
kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.26.1/manifests/calico.yaml
Then wait for a while; you should see the nodes become "Ready".
You can choose to deploy directly:
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/cloud/deploy.yaml
Or use helm
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
# install a new Helm package (chart) of type ingress controller and name it my-nginx-ingress
# Ingress resources will use external IP for routing traffic.
helm install my-nginx-ingress ingress-nginx/ingress-nginx --set controller.publishService.enabled=true
Then verify
kubectl get pods -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx --watch
NAME READY STATUS RESTARTS AGE
ingress-nginx-admission-create-bktwj 0/1 Completed 0 15s
ingress-nginx-admission-patch-rmtcg 0/1 Completed 1 15s
ingress-nginx-controller-679bdfb778-zd5x9 0/1 Running 0 15s
It is kinda difficult to enable GPU on Kubernetes for a couple of reasons:
image and disk-space management is painful (docker system prune -a is helpful; tar-ing files with pigz might speed things up, but I never got the format to work)
much of the tooling is outdated (nvidia-docker, nvidia-docker2 at Here, or pytorch-operator)
some images need nvcc to compile
you may need the docker engine instead of the containerd engine to enable GPU
Before proceeding to the next sections, you need to make sure your Dockerfile is configured correctly. That means docker run -it --entrypoint /bin/bash [image-hash] should let you access nvidia-smi, and PyTorch should acknowledge the GPU. Note that you must use the image hash for local images. (Also, local images should not be tagged latest, otherwise they will always be pulled from Docker Hub.)
You can check GPU availability with:
import torch
torch.cuda.is_available()
To remove a docker container, do docker ps -a, then docker rm [containerId].
k8s-device-plugin
Follow k8s-device-plugin instructions.
Make sure which nvidia-container-toolkit and which nvidia-container-runtime both return a path (you might want to reinstall nvidia-container-toolkit), no matter which solution you are following.
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/libnvidia-container/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | sudo tee /etc/apt/sources.list.d/libnvidia-container.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
Edit /etc/docker/daemon.json to the following, and then run sudo systemctl restart docker (or you may choose to run sudo nvidia-ctk runtime configure instead, as it might auto-configure this file for you):
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
Edit /etc/containerd/config.toml to the following, and then run sudo systemctl restart containerd. Also do sudo systemctl daemon-reload.
version = 2
[plugins]
  [plugins."io.containerd.grpc.v1.cri"]
    [plugins."io.containerd.grpc.v1.cri".containerd]
      default_runtime_name = "nvidia"
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
          privileged_without_host_devices = false
          runtime_engine = ""
          runtime_root = ""
          runtime_type = "io.containerd.runc.v2"
          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
            BinaryName = "/usr/bin/nvidia-container-runtime"
You don't need to do kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.1/nvidia-device-plugin.yml, as we choose to follow microk8s' setup, because following k8s-device-plugin did not fully work for me.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nvidia-device-plugin-daemonset
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: nvidia-device-plugin-ds
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        name: nvidia-device-plugin-ds
    spec:
      tolerations:
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
      # Mark this pod as a critical add-on; when enabled, the critical add-on
      # scheduler reserves resources for critical add-on pods so that they can
      # be rescheduled after a failure.
      # See https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/
      priorityClassName: "system-node-critical"
      containers:
        - image: nvcr.io/nvidia/k8s-device-plugin:v0.14.1
          name: nvidia-device-plugin-ctr
          env:
            - name: FAIL_ON_INIT_ERROR
              value: "false"
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop: ["ALL"]
          volumeMounts:
            - name: device-plugin
              mountPath: /var/lib/kubelet/device-plugins
      volumes:
        - name: device-plugin
          hostPath:
            path: /var/lib/kubelet/device-plugins
If you do follow the above and run kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.1/nvidia-device-plugin.yml (it will create a DaemonSet in a different namespace; see kubectl get namespaces) and look at its log, you will get [factory.go:115] Incompatible platform detected. If you ignore this problem and try to schedule a GPU job, you will get 0/1 nodes are available: 1 Insufficient nvidia.com/gpu
Use microk8s inspect to check the microk8s installation.
When you have set up everything above, reboot the system, then do microk8s enable gpu, and you will get:
Infer repository core for addon gpu
Enabling NVIDIA GPU
Addon core/dns is already enabled
Addon core/helm3 is already enabled
Checking if NVIDIA driver is already installed
GPU 0: NVIDIA RTX A2000 8GB Laptop GPU (UUID: GPU-11886664-6f9b-bb01-311a-573dffe8aee4)
Using host GPU driver
"nvidia" has been added to your repositories
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "nvidia" chart repository
Update Complete. ⎈Happy Helming!⎈
NAME: gpu-operator
LAST DEPLOYED: Mon Sep 4 06:29:55 2023
NAMESPACE: gpu-operator-resources
STATUS: deployed
REVISION: 1
TEST SUITE: None
NVIDIA is enabled
The command will also add many pods. You should wait for a while (around 5 min), then do microk8s kubectl logs -n gpu-operator-resources -lapp=nvidia-operator-validator -c nvidia-operator-validator to verify that the configuration succeeded. You should get all validations are successful.
Now, when you do kubectl describe node -A | grep nvidia
, you should see:
nvidia.com/cuda.driver.major=520
nvidia.com/cuda.driver.minor=61
nvidia.com/cuda.driver.rev=05
nvidia.com/cuda.runtime.major=11
nvidia.com/cuda.runtime.minor=8
nvidia.com/gfd.timestamp=1693823531
nvidia.com/gpu.compute.major=8
nvidia.com/gpu.compute.minor=6
nvidia.com/gpu.count=1
nvidia.com/gpu.deploy.container-toolkit=true
nvidia.com/gpu.deploy.dcgm=true
nvidia.com/gpu.deploy.dcgm-exporter=true
nvidia.com/gpu.deploy.device-plugin=true
nvidia.com/gpu.deploy.driver=true
nvidia.com/gpu.deploy.gpu-feature-discovery=true
nvidia.com/gpu.deploy.node-status-exporter=true
nvidia.com/gpu.deploy.operator-validator=true
nvidia.com/gpu.family=ampere
nvidia.com/gpu.machine=21D6004WUS
nvidia.com/gpu.memory=8192
nvidia.com/gpu.present=true
nvidia.com/gpu.product=NVIDIA-RTX-A2000-8GB-Laptop-GPU
nvidia.com/gpu.replicas=1
nvidia.com/mig.capable=false
nvidia.com/mig.strategy=single
nvidia.com/gpu: 1
nvidia.com/gpu: 1
gpu-operator-resources nvidia-container-toolkit-daemonset-kjp55 0 (0%) 0 (0%) 0 (0%) 0 (0%) 28m
gpu-operator-resources nvidia-dcgm-exporter-kd852 0 (0%) 0 (0%) 0 (0%) 0 (0%) 28m
gpu-operator-resources nvidia-device-plugin-daemonset-fq2rk 0 (0%) 0 (0%) 0 (0%) 0 (0%) 28m
gpu-operator-resources nvidia-operator-validator-q9hz9 0 (0%) 0 (0%) 0 (0%) 0 (0%) 28m
nvidia.com/gpu 0 0
#!/bin/sh
# nvidia-container-toolkit
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/libnvidia-container/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | sudo tee /etc/apt/sources.list.d/libnvidia-container.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
# microk8s GPU
microk8s inspect
microk8s enable gpu
sudo nvidia-ctk runtime configure
sudo systemctl restart containerd
sudo systemctl restart docker
sudo systemctl daemon-reload
microk8s kubectl logs -n gpu-operator-resources -lapp=nvidia-operator-validator -c nvidia-operator-validator
# k9s
mkdir k9s
cd k9s
wget https://github.com/derailed/k9s/releases/download/v0.27.4/k9s_Linux_amd64.tar.gz
tar -xzf k9s_Linux_amd64.tar.gz
sudo cp k9s /usr/local/bin/k9s
cd ..
rm -rf k9s
For production and more advanced setup, follow Here
You might get the following error
Normal Scheduled 2m21s default-scheduler Successfully assigned ***-77c979f8bf-px4v9 to gke-***-389a7e33-t1hl Warning Failed 20s kubelet Error: context deadline exceeded Normal Pulled 7s (x3 over 2m20s) kubelet Container image "***" already present on machine Warning Failed 7s (x2 over 19s) kubelet Error: failed to reserve container name ***-77c979f8bf-px4v9_***": name "***-77c979f8bf-px4v9_***" is reserved for "818fcfef09165d91ac8c86ed88714bb159a8358c3eca473ec07611a51d72b140"
See Here
To deal with this issue, you might want to increase the load timeout by appending --runtime-request-timeout 30m0s (see all settings Here and instructions Here) to the file /var/snap/microk8s/current/args/kubelet. Then restart with microk8s stop, microk8s start.
To build and use images locally, you might do the following (don't use :latest
tag):
docker save frontend:3.4 > tmp-image.tar
microk8s ctr image import tmp-image.tar
If you find your image in microk8s ctr images ls
, then microk8s can load your image.
However, docker save and docker load are often bottlenecked by I/O, so we build a local registry. The local registry is faster for building and pushing, but slower during ContainerCreating, so in development, don't use the local registry.
# we need to enable this local registry function before hand
microk8s enable registry:size=20Gi
# to directly build image into local registry
docker build . -t localhost:32000/image_name:image_tag
# or import from existing image
docker tag some_hash_tag localhost:32000/image_name:image_tag
# finally push to registry
docker push localhost:32000/image_name
And then use image: localhost:32000/image_name:image_tag in kubernetes. To check the built images, check out curl http://localhost:32000/v2/_catalog
Pushing to this insecure registry may fail in some versions of Docker unless the daemon is explicitly configured to trust this registry. To address this we need to edit /etc/docker/daemon.json and add { "insecure-registries" : ["localhost:32000"] }, then do sudo systemctl restart docker (also, if docker fails to restart, you might check dockerd for error messages).
See Here for details.
K8s occupies a lot of disk space. We want to clean up.
To remove all stopped docker containers: docker rm $(docker ps -aq) (usually docker ps only displays running containers; -a includes stopped ones). It will refuse to remove running containers. (Or use docker rm $(docker ps --filter "status=exited" -q) to exclude dead and created containers.) -q means only display container IDs.
Then do docker image prune to remove dangling images if you don't want to remove them with docker image rm manually. To be more aggressive, use docker system prune:
WARNING! This will remove:
- all stopped containers
- all networks not used by at least one container
- all dangling images
- all dangling build cache
Similar to docker images
, you can also do microk8s ctr images ls
to see images imported to microk8s.
To remove all images in the microk8s: microk8s ctr images rm $(microk8s ctr images ls name~='localhost:32000' | awk {'print $1'})
I am too lazy to explain, but Here is a good video. Note that the volume persists even after you delete it. You cannot change the sizes in your configuration .yaml between delete and apply, unless your volume is empty.
Sometimes an application may request /dev/shm (shared memory) instead of regular memory. So you might need to do the following (see Here, Here, and Here):
volumeMounts:
  - mountPath: /dev/shm
    name: dshm
...
volumes:
  - name: dshm
    emptyDir:
      medium: Memory
      sizeLimit: 32Gi
To expose ports, you need microk8s enable ingress. Documentation is Here. But it doesn't work for me since I am not using the standard ports 80 and 443 for HTTP web requests.
Since I am on bare metal (not on AWS or GCP), I don't have a load-balancer service. I need to use MetalLB in order to use LoadBalancer; otherwise the external IP will stay pending unless you specify the externalIPs field.
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.13.11/config/manifests/metallb-native.yaml
Then create these two configs
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: metallb-pool
  namespace: metallb-system
spec:
  addresses:
    - 104.171.203.12-104.171.203.12
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: metallb-l2ad
  namespace: metallb-system
You can use the command kubectl get service -n ingress-nginx to see if ingress-nginx-controller (LoadBalancer) has its external IP assigned to 104.171.203.12. Make sure you read Concepts.
But you don't have to use LoadBalancer; NodePort would expose your port on the host machine as long as you specify a nodePort in the 30000-32767 range. This video is helpful for setting up LoadBalancer. Your containerPort should match targetPort; your port is what other Pods use to talk to this Service internally in the cluster; and your nodePort is what enables outside traffic (see the sketch below).
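A minimal NodePort Service sketch showing how port, targetPort, and nodePort relate (names and port numbers are placeholders):
apiVersion: v1
kind: Service
metadata:
  name: my-app-nodeport
spec:
  type: NodePort
  selector:
    app: my-app
  ports:
    - port: 80            # what other Pods use inside the cluster
      targetPort: 8080    # should match the containerPort of the Pod
      nodePort: 30080     # exposed on every Node's IP (30000-32767 range)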
Setting up an ssh server inside a k8s Pod is another annoying task:
host keys must exist (ssh-keygen -A), otherwise the ssh server cannot start and CMD ["/usr/sbin/sshd", "-De"] will fail. Also, you cannot generate these inside the container image because that would give you the same keys each time.
root user login has to be allowed.
to provide authorized_keys without bundling them into the Dockerfile, you can use Secrets, but that gives you a read-only file whose permissions are too broad, and the ssh server rejects it for this reason.
I have a private implementation at Here. I ended up executing a command using lifecycle.postStart.exec.command to append authorized_keys, and mounting the generated keys into a different place using Secrets, then changing the location of ssh_host_* to the mounted locations. A sketch of this shape follows.
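A minimal sketch of the lifecycle.postStart.exec.command approach described above; this is not the author's private implementation, and the image, Secret name, and paths are placeholder assumptions:
apiVersion: v1
kind: Pod
metadata:
  name: ssh-pod
spec:
  containers:
    - name: ssh
      image: my-ssh-image:1.0            # placeholder image that runs sshd
      lifecycle:
        postStart:
          exec:
            command:
              - sh
              - -c
              - cat /mnt/ssh-secrets/authorized_keys >> /root/.ssh/authorized_keys
      volumeMounts:
        - name: ssh-secrets
          mountPath: /mnt/ssh-secrets    # mounted in a different place, then appended/copied
          readOnly: true
  volumes:
    - name: ssh-secrets
      secret:
        secretName: ssh-keys             # placeholder Secret holding authorized_keys / ssh_host_* files
        defaultMode: 0400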
In order to enable TLS/SSL, you need to add CA-signed certificates to k8s. We will use cert-manager to do that.
We first install cert-manager in the cert-manager namespace.
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.3/cert-manager.yaml
kubectl get pods --namespace cert-manager
# there should be cert-manager, cert-manager-cainjector, and cert-manager-webhook
WARNING: to delete cert-manager, please follow Guide carefully otherwise you might break your machine permanently.
Then we need to add ACME issuer (Automated Certificate Management Environment (ACME) Certificate Authority server). cert-manager offers two challenge validations - HTTP01 and DNS01 challenges.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    # You must replace this email address with your own.
    # Let's Encrypt will use this to contact you about expiring
    # certificates, and issues related to your account.
    email: [email protected]
    disableAccountKeyGeneration: false # set to true if you don't want to generate a key
    server: https://acme-staging-v02.api.letsencrypt.org/directory # you can change to https://acme-v02.api.letsencrypt.org/directory for deployment instead of staging (testing)
    privateKeySecretRef:
      # Secret resource that will be used to store the account's private key. You don't need to create this resource.
      name: example-issuer-account-key
    # Add a single challenge solver, HTTP01 using nginx
    solvers:
      - http01:
          ingress:
            ingressClassName: nginx
kubectl get clusterissuer
cert-manager uses your existing Ingress or Gateway configuration in order to solve HTTP01 challenges.
Then re-configure the Ingress like the example below. Modify it as described in the comments. Make sure all access to the domain returns an HTTP response (not a socket or anything else).
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    # add an annotation indicating the issuer to use.
    cert-manager.io/cluster-issuer: letsencrypt-staging
  name: myIngress
  namespace: myIngress
spec:
  rules:
    - host: ssl.kokecacao.me
      http:
        paths:
          - pathType: Prefix
            path: /
            backend:
              service:
                name: myservice
                port:
                  number: 80
  tls: # < placing a host in the TLS config will determine what ends up in the cert's subjectAltNames
    - hosts:
        - ssl.kokecacao.me
      secretName: myingress-cert # < cert-manager will store the created certificate in this secret. You don't need to create this resource.
Then you should get myingress-cert
shown when you execute the following commands
kubectl apply -f ingress.yaml
kubectl get certificates --all-namespaces
kubectl get secrets --all-namespaces
kubectl describe certificaterequest
kubectl describe order
When everything is complete, change server to https://acme-v02.api.letsencrypt.org/directory because the Let's Encrypt staging environment is only for testing.
To switch, you may want to execute
kubectl delete certificates myingress-cert
kubectl delete secrets myingress-cert
If you use Cloudflare, make sure to set the SSL/TLS encryption mode to Full (strict).