Deploy Talos Linux with Local VIP, Tailscale, Longhorn, MetalLB and Traefik
I wrote a dumb little script that does most of this:
https://github.com/joshrnoll/talos-scripts
Prerequisites
- Install talosctl:
brew install siderolabs/tap/talosctl
- Boot a VM to Talos ISO https://www.talos.dev/v1.9/introduction/getting-started/
- Decide on a cluster endpoint IP – this will be the VIP of the cluster. It should be in the same subnet your nodes will live in (the VIP relies on layer 2 connectivity)
- Decide on a cluster name. This is just a friendly name for your cluster, like 'nollhomelab'
Generate Config
- Go to factory.talos.dev to generate the image with desired extensions – in this case, iscsi-tools and tailscale
- Copy the schematic into a file schematic.yaml
customization:
  systemExtensions:
    officialExtensions:
      - siderolabs/iscsi-tools
      - siderolabs/tailscale
- Use this file to get the schematic ID from factory.talos.dev via curl
curl -X POST --data-binary @schematic.yaml https://factory.talos.dev/schematics
- Then copy the ID from the output:
{"id":"e2e3b54334c85fdef4d78e88f880d185e0ce0ba0c9b5861bb5daa1cd6574db9b"}
Using this ID, you can construct the installer image URL in this format:
factory.talos.dev/installer/{{ schematic ID }}:{{ talos version }}
Example:
factory.talos.dev/installer/e2e3b54334c85fdef4d78e88f880d185e0ce0ba0c9b5861bb5daa1cd6574db9b:v1.9.2
- Generate the config with:
talosctl gen config <name-of-your-cluster> https://<ip-address-of-first-node>:6443 --install-image=factory.talos.dev/installer/e2e3b54334c85fdef4d78e88f880d185e0ce0ba0c9b5861bb5daa1cd6574db9b:v1.9.2
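For reference (not spelled out above), talosctl gen config writes three files to the current directory: controlplane.yaml, worker.yaml, and talosconfig. Any worker nodes you add later take the worker config the same way the control plane node does below:
talosctl apply-config -f worker.yaml --insecure -n <ip-address-of-worker-node>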
- In the controlplane.yaml file, add the network interface configuration under the machine: network: section – it will look like this:
# be sure to remove {} after network:
network:
  interfaces:
    - deviceSelector:
        physical: true
      dhcp: true
      vip:
        ip: 10.0.30.25
*Note that 'physical: true' will select any physical network interface – this works when the machine has only one NIC.
From the docs: Since VIP functionality relies on etcd for elections, the shared IP will not come alive until after you have bootstrapped Kubernetes.
Apply and Bootstrap
- Apply the controlplane config to the first control node
talosctl apply-config -f controlplane.yaml --insecure -n <ip-address-of-first-node>
- Bootstrap the cluster
talosctl bootstrap -n <ip-address-of-node> -e <ip-address-of-node> --talosconfig=./talosconfig
- Get kubeconfig
talosctl kubeconfig -n <ip-address-of-node> -e <ip-address-of-VIP> --talosconfig=./talosconfig
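Not in the original steps, but a quick sanity check that the kubeconfig works and the node has registered:
kubectl get nodes -o wide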
Install Longhorn
- Apply longhorn mounts
talosctl patch machineconfig -p @longhorn-mounts.yaml -n <node-ip>
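The longhorn-mounts.yaml patch referenced above isn't shown here; a minimal sketch, following the extra-mount pattern from the Talos/Longhorn docs, bind-mounts /var/lib/longhorn into the kubelet:
# longhorn-mounts.yaml (sketch) – adjust if your setup differs
machine:
  kubelet:
    extraMounts:
      - destination: /var/lib/longhorn
        type: bind
        source: /var/lib/longhorn
        options:
          - bind
          - rshared
          - rw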
- Create longhorn namespace and add pod security labels
kubectl create ns longhorn-system && kubectl label namespace longhorn-system pod-security.kubernetes.io/enforce=privileged
- Install longhorn
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.7.2/deploy/longhorn.yaml
- Apply longhorn pod security policies
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/master/deploy/podsecuritypolicy.yaml
- Verify longhorn was installed
kubectl get pods \
--namespace longhorn-system \
--watch
- Output should look like this
NAME                                                READY   STATUS    RESTARTS   AGE
longhorn-ui-b7c844b49-w25g5                         1/1     Running   0          2m41s
longhorn-manager-pzgsp                              1/1     Running   0          2m41s
longhorn-driver-deployer-6bd59c9f76-lqczw           1/1     Running   0          2m41s
longhorn-csi-plugin-mbwqz                           2/2     Running   0          100s
csi-snapshotter-588457fcdf-22bqp                    1/1     Running   0          100s
csi-snapshotter-588457fcdf-2wd6g                    1/1     Running   0          100s
csi-provisioner-869bdc4b79-mzrwf                    1/1     Running   0          101s
csi-provisioner-869bdc4b79-klgfm                    1/1     Running   0          101s
csi-resizer-6d8cf5f99f-fd2ck                        1/1     Running   0          101s
csi-provisioner-869bdc4b79-j46rx                    1/1     Running   0          101s
csi-snapshotter-588457fcdf-bvjdt                    1/1     Running   0          100s
csi-resizer-6d8cf5f99f-68cw7                        1/1     Running   0          101s
csi-attacher-7bf4b7f996-df8v6                       1/1     Running   0          101s
csi-attacher-7bf4b7f996-g9cwc                       1/1     Running   0          101s
csi-attacher-7bf4b7f996-8l9sw                       1/1     Running   0          101s
csi-resizer-6d8cf5f99f-smdjw                        1/1     Running   0          101s
instance-manager-b34d5db1fe1e2d52bcfb308be3166cfc   1/1     Running   0          114s
engine-image-ei-df38d2e5-cv6nc
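Longhorn also installs a default StorageClass named longhorn, which the Traefik and uptime-kuma manifests later rely on; a quick check (not in the original steps):
kubectl get storageclass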
Install Nginx Ingress for Longhorn UI
- Install the NodePort version of ingress-nginx:
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/static/provider/baremetal/deploy.yaml
References: https://kubernetes.github.io/ingress-nginx/deploy/#bare-metal-clusters and https://kubernetes.github.io/ingress-nginx/deploy/baremetal/
- Create a basic auth file:
USER=<USERNAME_HERE>; PASSWORD=<PASSWORD_HERE>; echo "${USER}:$(openssl passwd -stdin -apr1 <<< ${PASSWORD})" >> auth
- Create a secret:
kubectl -n longhorn-system create secret generic basic-auth --from-file=auth
- Create an ingress manifest longhorn-ingress.yml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: longhorn-ingress
  namespace: longhorn-system
  annotations:
    # type of authentication
    nginx.ingress.kubernetes.io/auth-type: basic
    # prevent the controller from redirecting (308) to HTTPS
    nginx.ingress.kubernetes.io/ssl-redirect: 'false'
    # name of the secret that contains the user/password definitions
    nginx.ingress.kubernetes.io/auth-secret: basic-auth
    # message to display with an appropriate context why the authentication is required
    nginx.ingress.kubernetes.io/auth-realm: 'Authentication Required'
    # custom max body size for file uploading like backing image uploading
    nginx.ingress.kubernetes.io/proxy-body-size: 10000m
spec:
  ingressClassName: nginx
  rules:
    - http:
        paths:
          - pathType: Prefix
            path: "/"
            backend:
              service:
                name: longhorn-frontend
                port:
                  number: 80
- Create the ingress
kubectl -n longhorn-system apply -f longhorn-ingress.yml
- Get the ingress IP:
kubectl -n longhorn-system get ingress
NAME               CLASS   HOSTS   ADDRESS       PORTS   AGE
longhorn-ingress   nginx   *       10.0.30.176   80      45m
- Get the nodeport from the nginx controller:
kubectl get service -n ingress-nginx
NAME                                 TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
ingress-nginx-controller             NodePort    10.111.203.49   <none>        80:30136/TCP,443:31606/TCP   16m
ingress-nginx-controller-admission   ClusterIP   10.108.52.190   <none>        443/TCP                      16m
- Check connectivity by browsing to http://<ingress-address>:<nodeport> – in this example, http://10.0.30.176:30136
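The same check works from the terminal with the basic-auth credentials created earlier (IP and NodePort will differ on your cluster):
curl -u <USERNAME_HERE>:<PASSWORD_HERE> http://10.0.30.176:30136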
MetalLB
- Install with:
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.14.9/config/manifests/metallb-native.yaml
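The IPAddressPool and L2Advertisement resources below are validated by MetalLB's webhook, so it can help to wait for the MetalLB pods to be Ready first – a sketch using the wait command from the MetalLB/kind docs:
kubectl wait --namespace metallb-system --for=condition=ready pod --selector=app=metallb --timeout=90s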
- Create an IP pool and L2 advertisement resource in a YAML file metallb-config.yml:
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: lab-pool
  namespace: metallb-system
spec:
  addresses:
    - 10.0.30.200-10.0.30.220
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: l2advertisement
  namespace: metallb-system
spec:
  ipAddressPools:
    - lab-pool
- Run:
kubectl apply -f metallb-config.yml
- Services of type LoadBalancer should now have an external IP assigned from the address pool:
kubectl get services
NAME            TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE
kubernetes      ClusterIP      10.96.0.1      <none>        443/TCP        8d
nginx-service   LoadBalancer   10.107.0.134   10.0.30.200   80:31000/TCP   6d23h
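For reference, a minimal sketch of a Service that would be handed an address from lab-pool the way nginx-service was above (the nginx name and app: nginx selector are just placeholders):
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  type: LoadBalancer
  selector:
    app: nginx # placeholder – match your deployment's labels
  ports:
    - port: 80
      targetPort: 80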
Installing the Tailscale Kubernetes Operator
- Create the following tags in your tailnet policy file (ACLs):
"tagOwners": {
  "tag:k8s-operator": [],
  "tag:k8s": ["tag:k8s-operator"],
}
- Create an OAuth client with the Devices Core and Auth Keys scopes.
- Add Tailscale helm repo:
helm repo add tailscale https://pkgs.tailscale.com/helmcharts && helm repo update
- Install the Tailscale operator:
helm upgrade \
--install \
tailscale-operator \
tailscale/tailscale-operator \
--namespace=tailscale \
--create-namespace \
--set-string oauth.clientId="<OAuth client ID>" \
--set-string oauth.clientSecret="<OAuth client secret>" \
--wait
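Not in the original steps, but a quick way to confirm the operator pod came up (it should also appear as a device tagged tag:k8s-operator in the Tailscale admin console):
kubectl get pods -n tailscale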
- Create an auth key and a Kubernetes secret containing it:
apiVersion: v1
kind: Secret
metadata:
  name: tailscale-auth
stringData:
  TS_AUTHKEY: tskey-0123456789abcdef
Or from the CLI:
kubectl create secret generic tailscale-auth --from-literal=TS_AUTHKEY='tskey-auth-authkey-goes-here'
- Create a tailscale-rbac.yml file with the following contents:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tailscale
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: tailscale
rules:
  - apiGroups: [""]
    resourceNames: ["tailscale-auth"]
    resources: ["secrets"]
    verbs: ["get", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: tailscale
subjects:
  - kind: ServiceAccount
    name: tailscale
roleRef:
  kind: Role
  name: tailscale
  apiGroup: rbac.authorization.k8s.io
and run kubectl apply -f tailscale-rbac.yml
- Set the pod security labels to allow privileged pods in the tailscale namespace:
kubectl label namespace tailscale pod-security.kubernetes.io/enforce=privileged
Traefik
- Open question: could the Docker installation be used instead, with the Kubernetes ingress provider simply enabled?
- For helm values examples – https://github.com/traefik/traefik-helm-chart/blob/master/EXAMPLES.md
- Add the Helm repo (requires Helm to be installed):
helm repo add traefik https://helm.traefik.io/traefik
helm repo update
- Create traefik namespace
kubectl create namespace traefik
- Create a values.yaml file:
---
image:
  repository: traefik
  tag: v3.3.3
  pullPolicy: IfNotPresent

globalArguments:
  - "--global.sendanonymoususage=false"
  - "--global.checknewversion=false"

ports:
  websecure:
    tls:
      enabled: true
      certResolver: cloudflare
  web:
    redirections:
      entryPoint:
        to: websecure
        scheme: https
        permanent: true

persistence:
  enabled: true
  size: 128Mi
  storageClass: longhorn

deployment:
  initContainers:
    - name: volume-permissions
      image: busybox:latest
      command: ["sh", "-c", "touch /data/acme.json; chmod -v 600 /data/acme.json"]
      volumeMounts:
        - mountPath: /data
          name: data

service:
  enabled: true
  type: LoadBalancer
  annotations:
    tailscale.com/expose: "true"
  spec:
    loadBalancerClass: tailscale

certificatesResolvers:
  cloudflare:
    acme:
      email: [email protected]
      storage: /data/acme.json
      # caServer: https://acme-v02.api.letsencrypt.org/directory # prod (default)
      caServer: https://acme-staging-v02.api.letsencrypt.org/directory # staging
      dnsChallenge:
        provider: cloudflare
        # disablePropagationCheck: true # uncomment if you have issues pulling certificates through Cloudflare; setting this to true skips waiting for the TXT record to propagate to all authoritative name servers
        # delayBeforeCheck: 60s # uncomment along with disablePropagationCheck to give the TXT record time to be ready before verification is attempted
        resolvers:
          - "1.1.1.1:53"
          - "1.0.0.1:53"

env:
  - name: CF_DNS_API_TOKEN
    valueFrom:
      secretKeyRef:
        key: apiKey
        name: cloudflare-api-token

extraObjects:
  - apiVersion: v1
    kind: Secret
    metadata:
      name: cloudflare-api-token
      namespace: traefik
    type: Opaque
    stringData:
      email: [email protected]
      apiKey: VBydRIqspOKSaV_Mjk7d8_l1DKOhgyTZ7skMF8Qj

logs:
  general:
    level: DEBUG # --> Change back to ERROR after testing
  access:
    enabled: false
- Install traefik
helm install --namespace=traefik traefik traefik/traefik --values=values.yaml
- Get Traefik's Tailscale IP with:
kubectl get service -n traefik
NAME      TYPE           CLUSTER-IP     EXTERNAL-IP                                          PORT(S)                      AGE
traefik   LoadBalancer   10.98.226.75   100.115.176.29,traefik-traefik.mink-pirate.ts.net   80:32748/TCP,443:31958/TCP   26m
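If the ACME/Cloudflare DNS challenge misbehaves, the DEBUG log level set in values.yaml makes the Traefik logs the first place to look:
kubectl logs -n traefik deployment/traefik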
- Deployment example:
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: k8s-traefik-uptime-kuma
  namespace: uptime-kuma
  labels:
    app: k8s-traefik-uptime-kuma
spec:
  replicas: 1
  selector:
    matchLabels:
      app: k8s-traefik-uptime-kuma
  template:
    metadata:
      labels:
        app: k8s-traefik-uptime-kuma
    spec:
      containers:
        - name: k8s-traefik-uptime-kuma
          image: louislam/uptime-kuma
          ports:
            - containerPort: 3001
          volumeMounts: # Volume must be created along with volumeMount (see below)
            - name: k8s-traefik-uptime-kuma-data
              mountPath: /app/data # Path within the container, like the right side of a docker bind mount -- /tmp/data:/app/data
      volumes: # Defines a volume that uses an existing PVC (defined below)
        - name: k8s-traefik-uptime-kuma-data
          persistentVolumeClaim:
            claimName: k8s-traefik-uptime-kuma-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: k8s-traefik-uptime-kuma-service
  namespace: uptime-kuma
spec:
  selector:
    app: k8s-traefik-uptime-kuma
  ports:
    - protocol: TCP
      port: 3001
      targetPort: 3001
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: k8s-traefik-uptime-kuma-ingress
  namespace: uptime-kuma
spec:
  rules:
    - host: "k8s-traefik-uptime-kuma.nollhome.casa"
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: k8s-traefik-uptime-kuma-service
                port:
                  number: 3001
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: k8s-traefik-uptime-kuma-pvc
  namespace: uptime-kuma
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: longhorn # https://kubernetes.io/docs/concepts/storage/storage-classes/#default-storageclass
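The manifest assumes the uptime-kuma namespace already exists. Saved as, say, uptime-kuma.yml (filename is just an example), it can be applied with:
kubectl create namespace uptime-kuma
kubectl apply -f uptime-kuma.yml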
Traefik becomes the default ingress class, so no annotations are required on the ingress object. Ensure a DNS entry is created pointing k8s-traefik-uptime-kuma.nollhome.casa to Traefik's Tailscale IP. You can now reach uptime-kuma in the browser via https://k8s-traefik-uptime-kuma.nollhome.casa
Troubleshooting
Get Tailscale logs:
talosctl logs ext-tailscale -e <ip-address-of-endpoint> -n <ip-address-of-node>
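A few other commands that are generally useful when a node or extension misbehaves (not Tailscale-specific):
talosctl services -e <ip-address-of-endpoint> -n <ip-address-of-node>
talosctl dashboard -e <ip-address-of-endpoint> -n <ip-address-of-node>
kubectl get events -A --sort-by=.lastTimestamp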