What should a load balancer for VanillaStack look like?

Hi there,

I want to do the external load balancer install.
I’ve got 3 x master and 4 x worker nodes. All hosts are in DNS, pointing to the public interface of the host where the load balancer is to be installed.
What load balancer is recommended, and what would a simple configuration concept look like?

Best regards
Uli

Hi Uli,

internal:
You need a cluster IP that all your DNS records point to.
On the technical level, VanillaStack deploys HAProxy and keepalived to the master nodes and balances the traffic for the API and ingress.
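
For illustration only, the cluster IP part boils down to a keepalived VRRP instance like the sketch below (the VIP and the interface name are placeholders, not values from this setup; in the internal variant VanillaStack generates this for you on the masters):

vrrp_instance K8S_VIP {
    state BACKUP               # every master starts as BACKUP; the priority decides who holds the VIP
    interface eth0             # NIC that carries the cluster traffic
    virtual_router_id 51       # must be identical on all masters
    priority 100               # highest priority takes over the VIP
    advert_int 1
    virtual_ipaddress {
        192.168.99.200         # the cluster IP all DNS records point to
    }
}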

external:
You need an external LB (HAProxy, nginx, F5, whatever) that all your DNS records point to.
The load balancer needs the following rules:

LoadBalancer port 6443/tcp -> Master Nodes port 8443/tcp (Kubernetes API)
LoadBalancer port 80/tcp -> Worker Nodes port 30080/tcp (Ingress controller HTTP traffic)
LoadBalancer port 443/tcp -> Worker Nodes port 30443/tcp (Ingress controller HTTPS traffic)
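
A minimal HAProxy sketch of those rules, in plain TCP passthrough, could look like this (the IPs are examples; the master-side port must match the port your kube-apiserver actually binds to):

frontend k8s_api
    bind *:6443
    mode tcp
    default_backend k8s_api_masters

backend k8s_api_masters
    mode tcp
    balance roundrobin
    server master1 192.0.2.11:8443 check
    server master2 192.0.2.12:8443 check
    server master3 192.0.2.13:8443 check

# Ports 80 and 443 follow the same pattern, pointing at the worker nodes
# on NodePorts 30080 and 30443.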

That should be all.

Best regards
Kim


Hi again,

Does my external LB need SSL certs?
I assume not, because the SSL termination happens on the nodes.

Best regards,
Uli

You’re right.
SSL termination is done by the API server and the ingress controller; the external LB just passes the traffic through and needs no certificates of its own.
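
Once the cluster is up, an easy way to verify the passthrough is to look at which certificate gets presented through the LB (the hostname below is an example; use your API DNS record):

openssl s_client -connect api.www.domain.tld:6443 </dev/null 2>/dev/null | openssl x509 -noout -subject -issuer
# With plain TCP passthrough this prints the certificate generated by kubeadm on the
# masters (issuer CN = kubernetes), not anything installed on the load balancer host.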

I set up my HAProxy with mode tcp for the Kubernetes API and the ingress controller HTTPS traffic. The install fails with the following:

TASK [deploy-kubernetes : Init Cluster] ****************************************
fatal: [master1]: FAILED! => changed: true, rc: 1, msg: "non-zero return code"
start: 2020-10-22 13:58:25.136842, end: 2020-10-22 14:03:49.835916, delta: 0:05:24.699074

cmd:
kubeadm init --service-cidr 10.96.0.0/12 --token f8dnz7.meo7mhofka5k5ryj --node-name master1 --control-plane-endpoint api.www.sycosys.de:6443 --upload-certs --apiserver-bind-port 6443 --certificate-key 3b42f3b74c4fa5b9eb93ec19ba9e26045b65b42ee3af163203ac00eb4022e8be --ignore-preflight-errors=Port-6443 --skip-phases=addon/kube-proxy --image-repository harbor.cloudical.net/vanillastack --kubernetes-version latest

stderr:
I1022 13:58:25.889635 812 version.go:252] remote version is much newer: v1.20.0-alpha.3; falling back to: stable-1.19
W1022 13:58:26.328007 812 configset.go:348] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher

stdout:
[init] Using Kubernetes version: v1.19.3
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [api.www.sycosys.de kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local master1] and IPs [10.96.0.1 192.168.99.201]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost master1] and IPs [192.168.99.201 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost master1] and IPs [192.168.99.201 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.

    Unfortunately, an error has occurred:
        timed out waiting for the condition

    This error is likely caused by:
        - The kubelet is not running
        - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

    If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
        - 'systemctl status kubelet'
        - 'journalctl -xeu kubelet'

    Additionally, a control plane component may have crashed or exited when started by the container runtime.
    To troubleshoot, list all containers using your preferred container runtimes CLI.

    Here is one example how you may list all Kubernetes containers running in cri-o/containerd using crictl:
        - 'crictl --runtime-endpoint /var/run/crio/crio.sock ps -a | grep kube | grep -v pause'
        Once you have found the failing container, you can inspect its logs with:
        - 'crictl --runtime-endpoint /var/run/crio/crio.sock logs CONTAINERID'

Some of the corresponding error logs:

Oct 22 15:54:11 kubelet[1149]: I1022 15:54:11.112067    1149 kubelet_node_status.go:70] Attempting to register node master1
Oct 22 15:54:11 kubelet[1149]: E1022 15:54:11.113487    1149 kubelet_node_status.go:92] Unable to register node "master1" with API server: Post "https://api.www.domain.tld:6443/api/v1/nodes": EOF
Oct 22 15:54:11 kubelet[1149]: E1022 15:54:11.202281    1149 kubelet.go:2183] node "master1" not found

I’m not sure where to start debugging. Any hints welcome.

Best regards and thanks
Uli

Could you jump via SSH to the master node and check which containers are running?
sudo crictl images
sudo crictl ps -a
sudo crictl --runtime-endpoint /var/run/crio/crio.sock ps -a

Which Linux distribution are you using?
Do you have any proxies in your communication flow with regard to internet connectivity?
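
You could also check whether the API server answers at all, once directly against a master and once through the load balancer (hostname and IP taken from your logs above):

curl -k https://192.168.99.201:6443/healthz      # directly against master1
curl -k https://api.www.domain.tld:6443/healthz  # through the external LB

# Both should return "ok" on a default kubeadm setup (the /healthz endpoint is open to
# anonymous requests). If only the call through the LB fails, e.g. with the same EOF as
# in your kubelet log, the problem sits in the load balancer path, not in the cluster.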

Best regards
Kim

crictl images

harbor.cloudical.net/vanillastack/coredns                   1.7.0               bfe3a36ebd252       45.4MB
harbor.cloudical.net/vanillastack/etcd                      3.4.13-0            0369cf4303ffd       255MB
harbor.cloudical.net/vanillastack/kube-apiserver            v1.19.3             a301be0cd44bb       120MB
harbor.cloudical.net/vanillastack/kube-controller-manager   v1.19.3             9b60aca1d8180       112MB
harbor.cloudical.net/vanillastack/kube-proxy                v1.19.3             cdef7632a242b       120MB
harbor.cloudical.net/vanillastack/kube-scheduler            v1.19.3             aaefbfa906bd8       46.9MB
harbor.cloudical.net/vanillastack/pause                     3.2                 80d28bedfe5de       688kB
k8s.gcr.io/pause                                            3.2                 80d28bedfe5de       688kB

crictl ps -a

a01a5998a9a75       0369cf4303ffdb467dc219990960a9baa8512a54b0ad9283eaf55bd6c0adb934   7 minutes ago       Running             etcd                      0                   d046350236a02
00f9476c41d0f       aaefbfa906bd854407acc3495e8a3b773bb3770e4a36d836f7fd3255c299ab25   7 minutes ago       Running             kube-scheduler            0                   c87613ccf53a2
48d6dc3641979       a301be0cd44bb11162da49b9c55fc5d137f493bdefcf80226378204be403fa41   7 minutes ago       Running             kube-apiserver            0                   5d7cf042dbd84
2bd6e52666e1f       9b60aca1d8180ffa9754267ade44dbe267eadee60f2b946204b08a6c753ce6f6   7 minutes ago       Running             kube-controller-manager   0                   9412939f4f5ce

crictl --runtime-endpoint /var/run/crio/crio.sock ps -a
No difference to the output above.

I’m using Fedora 32. No proxy, except the load balancer (HAProxy), which acts as a router as well.
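
(For the record: since the LB host also routes the traffic for the nodes, IP forwarding has to be enabled on it, e.g.:)

sudo sysctl -w net.ipv4.ip_forward=1                                    # enable routing for the running system
echo "net.ipv4.ip_forward = 1" | sudo tee /etc/sysctl.d/99-router.conf  # persist across reboots (file name arbitrary)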

Could you post your HAProxy config?

#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
    log         127.0.0.1 local2

    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    maxconn     4000
    user        haproxy
    group       haproxy
    daemon

    # turn on stats unix socket
    stats socket /var/lib/haproxy/stats

    # utilize system-wide crypto-policies
    ssl-default-bind-ciphers PROFILE=SYSTEM
    ssl-default-server-ciphers PROFILE=SYSTEM

#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
    mode                    http
    log                     global
    option                  httplog
    option                  dontlognull
    option http-server-close
    option forwardfor       except 127.0.0.0/8
    option                  redispatch
    retries                 3
    timeout http-request    10s
    timeout queue           1m
    timeout connect         10s
    timeout client          1m
    timeout server          1m
    timeout http-keep-alive 10s
    timeout check           10s
    maxconn                 3000

listen port_80
    bind *:80
    mode http
    balance     roundrobin
    default-server check
    server  worker1 192.168.99.204:30080
    server  worker2 192.168.99.205:30080
    server  worker3 192.168.99.206:30080
    server  worker4 192.168.99.207:30080

listen port_6443
    bind *:6443
    mode tcp
    balance     roundrobin
    default-server check
    server  master1 192.168.99.201:8443
    server  master2 192.168.99.202:8443
    server  master3 192.168.99.203:8443    

listen port_443
    bind *:443
    mode tcp
    balance     roundrobin
    default-server check
    server  worker1 192.168.99.204:30443
    server  worker2 192.168.99.205:30443
    server  worker3 192.168.99.206:30443
    server  worker4 192.168.99.207:30443

I found that port 6443 is open on master1, so I’m guessing listen port_6443 should forward to port 6443 instead of 8443.
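
This is one way to check which port the API server actually binds on master1; it also matches the --apiserver-bind-port 6443 in the kubeadm command above:

sudo ss -tlnp | grep kube-apiserver
# shows kube-apiserver listening on :6443, so the servers in "listen port_6443"
# have to point at 6443, not 8443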

It works with the changed port. However, I’ve now got a new error that seems to be a known issue on GitHub already.

Thanks for helping
Uli