Deploy Talos Linux with Local VIP, Tailscale, Longhorn, MetalLB and Traefik
I wrote a dumb little script that does most of this:
https://github.com/joshrnoll/talos-scripts
Prerequisites
- Install talosctl:
brew install siderolabs/tap/talosctl
- Boot a VM to Talos ISO https://www.talos.dev/v1.9/introduction/getting-started/
- Decide on a cluster endpoint IP – this will be the VIP of the cluster. It should be in the same subnet your nodes will live in (the VIP relies on layer 2 connectivity)
- Decide on a cluster name. This is just a friendly name for your cluster, like 'nollhomelab'
Generate Config
- Go to factory.talos.dev to generate the image with desired extensions – in this case, iscsi-tools and tailscale
- Copy the schematic into a file schematic.yaml
customization:
  systemExtensions:
    officialExtensions:
      - siderolabs/iscsi-tools
      - siderolabs/tailscale
- Use this file to get the schematic ID from factory.talos.dev via curl
curl -X POST --data-binary @schematic.yaml https://factory.talos.dev/schematics
- Then copy the ID from the output:
{"id":"e2e3b54334c85fdef4d78e88f880d185e0ce0ba0c9b5861bb5daa1cd6574db9b"}
Using this ID, you can construct the installer image URL in this format:
factory.talos.dev/installer/{{ schematic ID }}:{{ talos version }}
Example:
factory.talos.dev/installer/e2e3b54334c85fdef4d78e88f880d185e0ce0ba0c9b5861bb5daa1cd6574db9b:v1.9.2
- Generate the config with:
talosctl gen config <name-of-your-cluster> https://<ip-address-of-first-node>:6443 --install-image=factory.talos.dev/installer/e2e3b54334c85fdef4d78e88f880d185e0ce0ba0c9b5861bb5daa1cd6574db9b:v1.9.2
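For reference (not spelled out above), talosctl gen config writes three files to the current directory: controlplane.yaml, worker.yaml, and talosconfig. Any worker nodes you add later take the worker config the same way the control plane node does below:
talosctl apply-config -f worker.yaml --insecure -n <ip-address-of-worker-node>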
- In the controlplane.yaml file, add the network interface configuration under the machine: network: section – it will look like this:
# be sure to remove {} after network:
network:
  interfaces:
    - deviceSelector:
        physical: true
      dhcp: true
      vip:
        ip: 10.0.30.25
*Note that 'physical: true' will select any physical network interface – this works when the machine has only one NIC.
From the docs: Since VIP functionality relies on etcd for elections, the shared IP will not come alive until after you have bootstrapped Kubernetes.
Apply and Bootstrap
- Apply the controlplane config to the first control node
talosctl apply-config -f controlplane.yaml --insecure -n <ip-address-of-first-node>
- Bootstrap the cluster
talosctl bootstrap -n <ip-address-of-node> -e <ip-address-of-node> --talosconfig=./talosconfig
- Get kubeconfig
talosctl kubeconfig -n <ip-address-of-node> -e <ip-address-of-VIP> --talosconfig=./talosconfig
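Not in the original steps, but a quick sanity check that the kubeconfig works and the node has registered:
kubectl get nodes -o wide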
Install Longhorn
- Apply longhorn mounts
talosctl patch machineconfig -p @longhorn-mounts.yaml -n <node-ip>
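The longhorn-mounts.yaml patch referenced above isn't shown here; a minimal sketch, following the extra-mount pattern from the Talos/Longhorn docs, bind-mounts /var/lib/longhorn into the kubelet:
# longhorn-mounts.yaml (sketch) – adjust if your setup differs
machine:
  kubelet:
    extraMounts:
      - destination: /var/lib/longhorn
        type: bind
        source: /var/lib/longhorn
        options:
          - bind
          - rshared
          - rw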
- Create longhorn namespace and add pod security labels
kubectl create ns longhorn-system && kubectl label namespace longhorn-system pod-security.kubernetes.io/enforce=privileged
- Install longhorn
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.7.2/deploy/longhorn.yaml
- Apply longhorn pod security policies
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/master/deploy/podsecuritypolicy.yaml
- Verify longhorn was installed
kubectl get pods \
--namespace longhorn-system \
--watch
- Output should look like this
NAME                                                READY   STATUS    RESTARTS   AGE
longhorn-ui-b7c844b49-w25g5                         1/1     Running   0          2m41s
longhorn-manager-pzgsp                              1/1     Running   0          2m41s
longhorn-driver-deployer-6bd59c9f76-lqczw           1/1     Running   0          2m41s
longhorn-csi-plugin-mbwqz                           2/2     Running   0          100s
csi-snapshotter-588457fcdf-22bqp                    1/1     Running   0          100s
csi-snapshotter-588457fcdf-2wd6g                    1/1     Running   0          100s
csi-provisioner-869bdc4b79-mzrwf                    1/1     Running   0          101s
csi-provisioner-869bdc4b79-klgfm                    1/1     Running   0          101s
csi-resizer-6d8cf5f99f-fd2ck                        1/1     Running   0          101s
csi-provisioner-869bdc4b79-j46rx                    1/1     Running   0          101s
csi-snapshotter-588457fcdf-bvjdt                    1/1     Running   0          100s
csi-resizer-6d8cf5f99f-68cw7                        1/1     Running   0          101s
csi-attacher-7bf4b7f996-df8v6                       1/1     Running   0          101s
csi-attacher-7bf4b7f996-g9cwc                       1/1     Running   0          101s
csi-attacher-7bf4b7f996-8l9sw                       1/1     Running   0          101s
csi-resizer-6d8cf5f99f-smdjw                        1/1     Running   0          101s
instance-manager-b34d5db1fe1e2d52bcfb308be3166cfc   1/1     Running   0          114s
engine-image-ei-df38d2e5-cv6nc
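Longhorn also installs a default StorageClass named longhorn, which the Traefik and uptime-kuma manifests later rely on; a quick check (not in the original steps):
kubectl get storageclass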
Install Nginx Ingress for Longhorn UI
- Install the NodePort version of ingress-nginx:
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/static/provider/baremetal/deploy.yaml
References: https://kubernetes.github.io/ingress-nginx/deploy/#bare-metal-clusters and https://kubernetes.github.io/ingress-nginx/deploy/baremetal/
- Create a basic auth file:
USER=<USERNAME_HERE>; PASSWORD=<PASSWORD_HERE>; echo "${USER}:$(openssl passwd -stdin -apr1 <<< ${PASSWORD})" >> auth
- Create a secret:
kubectl -n longhorn-system create secret generic basic-auth --from-file=auth
- Create an ingress manifest longhorn-ingress.yml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: longhorn-ingress
  namespace: longhorn-system
  annotations:
    # type of authentication
    nginx.ingress.kubernetes.io/auth-type: basic
    # prevent the controller from redirecting (308) to HTTPS
    nginx.ingress.kubernetes.io/ssl-redirect: 'false'
    # name of the secret that contains the user/password definitions
    nginx.ingress.kubernetes.io/auth-secret: basic-auth
    # message to display with an appropriate context why the authentication is required
    nginx.ingress.kubernetes.io/auth-realm: 'Authentication Required'
    # custom max body size for file uploading like backing image uploading
    nginx.ingress.kubernetes.io/proxy-body-size: 10000m
spec:
  ingressClassName: nginx
  rules:
    - http:
        paths:
          - pathType: Prefix
            path: "/"
            backend:
              service:
                name: longhorn-frontend
                port:
                  number: 80
- Create the ingress
kubectl -n longhorn-system apply -f longhorn-ingress.yml
- Get the ingress IP:
kubectl -n longhorn-system get ingress
NAME               CLASS   HOSTS   ADDRESS       PORTS   AGE
longhorn-ingress   nginx   *       10.0.30.176   80      45m
- Get the nodeport from the nginx controller:
kubectl get service -n ingress-nginx
NAME                                 TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
ingress-nginx-controller             NodePort    10.111.203.49   <none>        80:30136/TCP,443:31606/TCP   16m
ingress-nginx-controller-admission   ClusterIP   10.108.52.190   <none>        443/TCP                      16m
- Check connectivity by browsing to http://<ingress-address>:<nodeport> – in this example, http://10.0.30.176:30136
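The same check works from the terminal with the basic-auth credentials created earlier (IP and NodePort will differ on your cluster):
curl -u <USERNAME_HERE>:<PASSWORD_HERE> http://10.0.30.176:30136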
MetalLB
- Install with:
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.14.9/config/manifests/metallb-native.yaml
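The IPAddressPool and L2Advertisement resources below are validated by MetalLB's webhook, so it can help to wait for the MetalLB pods to be Ready first – a sketch using the wait command from the MetalLB/kind docs:
kubectl wait --namespace metallb-system --for=condition=ready pod --selector=app=metallb --timeout=90s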
- Create an IP pool and L2 advertisement resource in a YAML file metallb-config.yml:
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: lab-pool
  namespace: metallb-system
spec:
  addresses:
    - 10.0.30.200-10.0.30.220
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: l2advertisement
  namespace: metallb-system
spec:
  ipAddressPools:
    - lab-pool
- Run:
kubectl apply -f metallb-config.yml
- Services of type LoadBalancer should now have an external IP assigned from the address pool:
kubectl get services
NAME            TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE
kubernetes      ClusterIP      10.96.0.1      <none>        443/TCP        8d
nginx-service   LoadBalancer   10.107.0.134   10.0.30.200   80:31000/TCP   6d23h
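For reference, a minimal sketch of a Service that would be handed an address from lab-pool the way nginx-service was above (the nginx name and app: nginx selector are just placeholders):
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  type: LoadBalancer
  selector:
    app: nginx # placeholder – match your deployment's labels
  ports:
    - port: 80
      targetPort: 80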
Installing the Tailscale Kubernetes Operator
- Create the following tags in your tailnet policy file (ACLs):
"tagOwners": {
  "tag:k8s-operator": [],
  "tag:k8s": ["tag:k8s-operator"],
}
- Create an OAuth client with the Devices Core and Auth Keys scopes.
- Add Tailscale helm repo:
helm repo add tailscale https://pkgs.tailscale.com/helmcharts && helm repo update
- Install the Tailscale operator:
helm upgrade \
--install \
tailscale-operator \
tailscale/tailscale-operator \
--namespace=tailscale \
--create-namespace \
--set-string oauth.clientId="<OAuth client ID>" \
--set-string oauth.clientSecret="<OAuth client secret>" \
--wait
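Not in the original steps, but a quick way to confirm the operator pod came up (it should also appear as a device tagged tag:k8s-operator in the Tailscale admin console):
kubectl get pods -n tailscale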
- Create an auth key and a Kubernetes secret containing it:
apiVersion: v1
kind: Secret
metadata:
  name: tailscale-auth
stringData:
  TS_AUTHKEY: tskey-0123456789abcdef
Or from the CLI:
kubectl create secret generic tailscale-auth --from-literal=TS_AUTHKEY='tskey-auth-authkey-goes-here'
- Create a tailscale-rbac.yml file with the following contents:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tailscale
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: tailscale
rules:
  - apiGroups: [""]
    resourceNames: ["tailscale-auth"]
    resources: ["secrets"]
    verbs: ["get", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: tailscale
subjects:
  - kind: ServiceAccount
    name: tailscale
roleRef:
  kind: Role
  name: tailscale
  apiGroup: rbac.authorization.k8s.io
and run kubectl apply -f tailscale-rbac.yml
- Set the pod security labels to allow privileged pods in the tailscale namespace:
kubectl label namespace tailscale pod-security.kubernetes.io/enforce=privileged
Traefik
- Open question: could the Docker installation be used instead, with the Kubernetes ingress provider simply enabled?
- For helm values examples – https://github.com/traefik/traefik-helm-chart/blob/master/EXAMPLES.md
- Add the Helm repo (requires Helm to be installed):
helm repo add traefik https://helm.traefik.io/traefik
helm repo update
- Create traefik namespace
kubectl create namespace traefik
- Create a values.yaml file:
---
image:
  repository: traefik
  tag: v3.3.3
  pullPolicy: IfNotPresent

globalArguments:
  - "--global.sendanonymoususage=false"
  - "--global.checknewversion=false"

ports:
  websecure:
    tls:
      enabled: true
      certResolver: cloudflare
  web:
    redirections:
      entryPoint:
        to: websecure
        scheme: https
        permanent: true

persistence:
  enabled: true
  size: 128Mi
  storageClass: longhorn

deployment:
  initContainers:
    - name: volume-permissions
      image: busybox:latest
      command: ["sh", "-c", "touch /data/acme.json; chmod -v 600 /data/acme.json"]
      volumeMounts:
        - mountPath: /data
          name: data

service:
  enabled: true
  type: LoadBalancer
  annotations:
    tailscale.com/expose: "true"
  spec:
    loadBalancerClass: tailscale

certificatesResolvers:
  cloudflare:
    acme:
      email: [email protected]
      storage: /data/acme.json
      # caServer: https://acme-v02.api.letsencrypt.org/directory # prod (default)
      caServer: https://acme-staging-v02.api.letsencrypt.org/directory # staging
      dnsChallenge:
        provider: cloudflare
        # disablePropagationCheck: true # uncomment if you have issues pulling certificates through Cloudflare; setting this to true skips waiting for the TXT record to propagate to all authoritative name servers
        # delayBeforeCheck: 60s # uncomment along with disablePropagationCheck to give the TXT record time to be ready before verification is attempted
        resolvers:
          - "1.1.1.1:53"
          - "1.0.0.1:53"

env:
  - name: CF_DNS_API_TOKEN
    valueFrom:
      secretKeyRef:
        key: apiKey
        name: cloudflare-api-token

extraObjects:
  - apiVersion: v1
    kind: Secret
    metadata:
      name: cloudflare-api-token
      namespace: traefik
    type: Opaque
    stringData:
      email: [email protected]
      apiKey: VBydRIqspOKSaV_Mjk7d8_l1DKOhgyTZ7skMF8Qj

logs:
  general:
    level: DEBUG # --> Change back to ERROR after testing
  access:
    enabled: false
- Install traefik
helm install --namespace=traefik traefik traefik/traefik --values=values.yaml
- Get Traefik's Tailscale IP with:
kubectl get service -n traefik
NAME      TYPE           CLUSTER-IP     EXTERNAL-IP                                          PORT(S)                      AGE
traefik   LoadBalancer   10.98.226.75   100.115.176.29,traefik-traefik.mink-pirate.ts.net   80:32748/TCP,443:31958/TCP   26m
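If the ACME/Cloudflare DNS challenge misbehaves, the DEBUG log level set in values.yaml makes the Traefik logs the first place to look:
kubectl logs -n traefik deployment/traefik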
- Deployment example:
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: k8s-traefik-uptime-kuma
  namespace: uptime-kuma
  labels:
    app: k8s-traefik-uptime-kuma
spec:
  replicas: 1
  selector:
    matchLabels:
      app: k8s-traefik-uptime-kuma
  template:
    metadata:
      labels:
        app: k8s-traefik-uptime-kuma
    spec:
      containers:
        - name: k8s-traefik-uptime-kuma
          image: louislam/uptime-kuma
          ports:
            - containerPort: 3001
          volumeMounts: # Volume must be created along with volumeMount (see below)
            - name: k8s-traefik-uptime-kuma-data
              mountPath: /app/data # Path within the container, like the right side of a docker bind mount -- /tmp/data:/app/data
      volumes: # Defines a volume that uses an existing PVC (defined below)
        - name: k8s-traefik-uptime-kuma-data
          persistentVolumeClaim:
            claimName: k8s-traefik-uptime-kuma-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: k8s-traefik-uptime-kuma-service
  namespace: uptime-kuma
spec:
  selector:
    app: k8s-traefik-uptime-kuma
  ports:
    - protocol: TCP
      port: 3001
      targetPort: 3001
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: k8s-traefik-uptime-kuma-ingress
  namespace: uptime-kuma
spec:
  rules:
    - host: "k8s-traefik-uptime-kuma.nollhome.casa"
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: k8s-traefik-uptime-kuma-service
                port:
                  number: 3001
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: k8s-traefik-uptime-kuma-pvc
  namespace: uptime-kuma
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: longhorn # https://kubernetes.io/docs/concepts/storage/storage-classes/#default-storageclass
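The manifest assumes the uptime-kuma namespace already exists. Saved as, say, uptime-kuma.yml (filename is just an example), it can be applied with:
kubectl create namespace uptime-kuma
kubectl apply -f uptime-kuma.yml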
Traefik becomes the default ingress class, so no annotations are required on the ingress object. Ensure a DNS entry is created pointing k8s-traefik-uptime-kuma.nollhome.casa to Traefik's Tailscale IP. You can now reach uptime-kuma in the browser via https://k8s-traefik-uptime-kuma.nollhome.casa
Troubleshooting
Get Tailscale logs:
talosctl logs ext-tailscale -e <ip-address-of-endpoint> -n <ip-address-of-node>
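A few other commands that are generally useful when a node or extension misbehaves (not Tailscale-specific):
talosctl services -e <ip-address-of-endpoint> -n <ip-address-of-node>
talosctl dashboard -e <ip-address-of-endpoint> -n <ip-address-of-node>
kubectl get events -A --sort-by=.lastTimestamp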