HA by default • Auto-upgrading • Cost-optimized
A highly optimized, easy-to-use, auto-upgradable Kubernetes cluster powered by k3s on MicroOS
deployed for peanuts on Hetzner Cloud
Getting Started • Features • Usage • Examples • Contributing
Hetzner Cloud offers exceptional value with data centers across Europe and the US. This project creates a highly optimized Kubernetes installation that is easy to maintain and secure, and that automatically upgrades both the nodes and Kubernetes itself, similar to GKE Autopilot.
We are not Hetzner affiliates, but we strive to be the optimal solution for deploying Kubernetes on their platform.
Built on the shoulders of giants:
- openSUSE MicroOS — Immutable container OS with automatic updates
- k3s — Certified, lightweight Kubernetes distribution
MicroOS highlights:

| Feature | Benefit |
|---|---|
| Immutable filesystem | Most of the OS is read-only—hardened by design |
| Auto-ban abusive IPs | SSH brute-force protection out of the box |
| Rolling release | Piggybacks on openSUSE Tumbleweed—always current |
| BTRFS snapshots | Automatic rollback if updates break something |
| Kured support | Safe, HA-aware node reboots |
k3s highlights:

| Feature | Benefit |
|---|---|
| Certified Kubernetes | Automatically synced with upstream k8s |
| Single binary | Deploy with one command |
| Batteries included | Built-in helm-controller |
| Easy upgrades | Via system-upgrade-controller |
| Platform | Installation Command |
|---|---|
| Homebrew (macOS/Linux) | brew install hashicorp/tap/terraform hashicorp/tap/packer kubectl hcloud |
| Arch Linux | yay -S terraform packer kubectl hcloud |
| Debian/Ubuntu | sudo apt install terraform packer kubectl |
| Fedora/RHEL | sudo dnf install terraform packer kubectl |
| Windows | choco install terraform packer kubernetes-cli hcloud |
Required tools: terraform or tofu, packer (initial setup only), kubectl, hcloud
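Before starting, you can confirm everything is on your PATH (a quick sanity check, not part of the official setup):

```bash
terraform version
packer version
kubectl version --client
hcloud version
```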
1️⃣ Create a Hetzner project at console.hetzner.cloud and grab an API token (Read & Write).

2️⃣ Generate an SSH key pair (passphrase-less ed25519), or see SSH options.

3️⃣ Run the setup script, which creates your project folder and MicroOS snapshot:
```bash
tmp_script=$(mktemp) && curl -sSL -o "${tmp_script}" https://raw.githubusercontent.com/kube-hetzner/terraform-hcloud-kube-hetzner/master/scripts/create.sh && chmod +x "${tmp_script}" && "${tmp_script}" && rm "${tmp_script}"
```

Fish shell version:

```fish
set tmp_script (mktemp); curl -sSL -o "$tmp_script" https://raw.githubusercontent.com/kube-hetzner/terraform-hcloud-kube-hetzner/master/scripts/create.sh; chmod +x "$tmp_script"; bash "$tmp_script"; rm "$tmp_script"
```

Save it as an alias for future use:

```bash
alias createkh='tmp_script=$(mktemp) && curl -sSL -o "${tmp_script}" https://raw.githubusercontent.com/kube-hetzner/terraform-hcloud-kube-hetzner/master/scripts/create.sh && chmod +x "${tmp_script}" && "${tmp_script}" && rm "${tmp_script}"'
```

What the script does:
```bash
mkdir /path/to/your/new/folder
cd /path/to/your/new/folder
curl -sL https://raw.githubusercontent.com/kube-hetzner/terraform-hcloud-kube-hetzner/master/kube.tf.example -o kube.tf
curl -sL https://raw.githubusercontent.com/kube-hetzner/terraform-hcloud-kube-hetzner/master/packer-template/hcloud-microos-snapshots.pkr.hcl -o hcloud-microos-snapshots.pkr.hcl
export HCLOUD_TOKEN="your_hcloud_token"
packer init hcloud-microos-snapshots.pkr.hcl
packer build hcloud-microos-snapshots.pkr.hcl
hcloud context create <project-name>
```

4️⃣ Customize your kube.tf (full reference in terraform.md).
5️⃣ Deploy:

```bash
cd <your-project-folder>
terraform init -upgrade
terraform validate
terraform apply -auto-approve
```

~5 minutes later, your cluster is ready! 🎉
⚠️ Once Terraform manages your cluster, avoid manual changes in the Hetzner UI. Use the `hcloud` CLI to inspect resources.
View cluster details:
```bash
terraform output kubeconfig
terraform output -json kubeconfig | jq
```

SSH into a control-plane node:

```bash
ssh root@<control-plane-ip> -i /path/to/private_key -o StrictHostKeyChecking=no
```

Restrict SSH access by configuring `firewall_ssh_source` in your kube.tf. See SSH docs for dynamic IP handling.

Use kubectl with the generated kubeconfig:

```bash
kubectl --kubeconfig clustername_kubeconfig.yaml get nodes
```

Or set it as your default:

```bash
export KUBECONFIG=/<path-to>/clustername_kubeconfig.yaml
```

Tip: If `create_kubeconfig = false`, generate it manually:

```bash
terraform output -raw kubeconfig > clustername_kubeconfig.yaml
```
Default is Flannel. Switch by setting `cni_plugin` to "calico" or "cilium".
Customize via `cilium_values` with Cilium helm values.
| Feature | Variable |
|---|---|
| Full kube-proxy replacement | disable_kube_proxy = true |
| Hubble observability | cilium_hubble_enabled = true |
Access Hubble UI:
```bash
kubectl port-forward -n kube-system service/hubble-ui 12000:80
# or with the Cilium CLI:
cilium hubble ui
```

Scaling nodes

Adjust `count` in any nodepool and run `terraform apply`. Constraints:
- First control-plane nodepool minimum: 1
- Drain nodes before removing them: `kubectl drain <node-name>` (a fuller example follows this list)
- Only remove nodepools from the end of the list
- Rename nodepools only when their count is 0
Advanced: Replace count with a nodes map for individual node control—see kube.tf.example.
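For the drain step above, a typical invocation includes a couple of standard kubectl flags:

```bash
# Evict workloads; DaemonSet pods are skipped and emptyDir data is discarded
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

# Optionally remove the Node object once the server itself is gone
kubectl delete node <node-name>
```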
Enable with autoscaler_nodepools. Powered by Cluster Autoscaler.
⚠️ Autoscaled nodes use a snapshot from the initial control plane. Ensure disk sizes match.
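If scaling doesn't behave as expected, tailing the autoscaler's logs usually explains its decisions. A quick check, assuming the usual deployment name (verify with `kubectl -n kube-system get deploy`):

```bash
# Deployment name is an assumption; adjust to what your cluster shows
kubectl -n kube-system logs deployment/cluster-autoscaler -f
```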
Default: 3 control-planes + 3 agents with automatic upgrades.
| Control Planes | Recommendation |
|---|---|
| 3+ (odd numbers) | Full HA with quorum maintenance |
| 2 | Disable auto OS upgrades, manual maintenance |
| 1 | Development only, disable auto upgrades |
See Rancher's HA documentation.
Handled by Kured—safe, HA-aware reboots. Configure timeframes via Kured options.
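To confirm Kured is running and watching for reboot sentinels, a quick check (resource name and label assumed from upstream Kured defaults):

```bash
kubectl -n kube-system get daemonset kured
kubectl -n kube-system logs -l name=kured --tail=20
```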
Managed by system-upgrade-controller. Customize the upgrade plan template.
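To inspect the plans the controller acts on (same namespace as the delete commands further below):

```bash
kubectl -n system-upgrade get plans
```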
```hcl
# Disable OS upgrades (required for <3 control planes)
automatically_upgrade_os = false

# Disable k3s upgrades
automatically_upgrade_k3s = false
```

Manual upgrade commands
Selective k3s upgrade:
```bash
kubectl label --overwrite node <node-name> k3s_upgrade=true
kubectl label node <node-name> k3s_upgrade-   # disable
```

Or delete the upgrade plans:

```bash
kubectl delete plan k3s-agent -n system-upgrade
kubectl delete plan k3s-server -n system-upgrade
```

Manual OS upgrade:
```bash
kubectl drain <node-name>
ssh root@<node-ip>
systemctl start transactional-update.service
reboot
```

Use the `kustomization_backup.yaml` file created during installation:
- Copy it to `kustomization.yaml`
- Update the source URLs to the latest versions
- Apply: `kubectl apply -k ./`
Most components use Helm Chart definitions via k3s Helm Controller.
Configure via helm values variables:
`cilium_values`, `traefik_values`, `nginx_values`, `longhorn_values`, `rancher_values`
See kube.tf.example for examples.
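Since add-ons go through the k3s Helm Controller, you can also list the HelmChart resources it manages to see what is deployed (CRD name per the k3s Helm Controller):

```bash
kubectl -n kube-system get helmcharts.helm.cattle.io
```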
Integrate Hetzner Robot servers via the dedicated server guide.
Use Kustomize for additional deployments:
- Create `extra-manifests/kustomization.yaml.tpl` (see the sketch below)
- Supports Terraform templating via `extra_kustomize_parameters`
- Applied after cluster setup with `kubectl apply -k`
Change folder name with extra_kustomize_folder. See example.
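A minimal sketch of such a template, written from your project folder (the manifest name `my-app.yaml` is a hypothetical placeholder):

```bash
mkdir -p extra-manifests
cat > extra-manifests/kustomization.yaml.tpl <<'EOF'
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - my-app.yaml
EOF
```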
Custom post-install actions (ArgoCD, etc.)
For CRD-dependent applications:
```hcl
extra_kustomize_deployment_commands = <<-EOT
  kubectl -n argocd wait --for condition=established --timeout=120s crd/appprojects.argoproj.io
  kubectl -n argocd wait --for condition=established --timeout=120s crd/applications.argoproj.io
  kubectl apply -f /var/user_kustomize/argocd-projects.yaml
  kubectl apply -f /var/user_kustomize/argocd-application-argocd.yaml
EOT
```

Useful Cilium commands
```bash
# Status
kubectl -n kube-system exec --stdin --tty cilium-xxxx -- cilium status --verbose

# Monitor traffic
kubectl -n kube-system exec --stdin --tty cilium-xxxx -- cilium monitor

# List services
kubectl -n kube-system exec --stdin --tty cilium-xxxx -- cilium service list
```

Cilium Egress Gateway with Floating IPs
Control outgoing traffic with static IPs:
```hcl
{
  name        = "egress",
  server_type = "cx23",
  location    = "nbg1",
  labels      = ["node.kubernetes.io/role=egress"],
  taints      = ["node.kubernetes.io/role=egress:NoSchedule"],
  floating_ip = true,
  count       = 1
}
```

Configure Cilium:
```hcl
locals {
  cluster_ipv4_cidr = "10.42.0.0/16"
}

cluster_ipv4_cidr = local.cluster_ipv4_cidr

cilium_values = <<-EOT
ipam:
  mode: kubernetes
k8s:
  requireIPv4PodCIDR: true
kubeProxyReplacement: true
routingMode: native
ipv4NativeRoutingCIDR: "10.0.0.0/8"
endpointRoutes:
  enabled: true
loadBalancer:
  acceleration: native
bpf:
  masquerade: true
egressGateway:
  enabled: true
MTU: 1450
EOT
```

Example policy:
```yaml
apiVersion: cilium.io/v2
kind: CiliumEgressGatewayPolicy
metadata:
  name: egress-sample
spec:
  selectors:
    - podSelector:
        matchLabels:
          org: empire
          class: mediabot
          io.kubernetes.pod.namespace: default
  destinationCIDRs:
    - "0.0.0.0/0"
  excludedCIDRs:
    - "10.0.0.0/8"
  egressGateway:
    nodeSelector:
      matchLabels:
        node.kubernetes.io/role: egress
    egressIP: { FLOATING_IP }
```

TLS with Cert-Manager (recommended)
Cert-Manager handles HA certificate management (Traefik CE is stateless).
- Configure your issuer (a minimal sketch follows the Ingress example below)
- Add annotations to your Ingress:
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-ingress
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt
spec:
  tls:
    - hosts:
        - "*.example.com"
      secretName: example-com-letsencrypt-tls
  rules:
    - host: "*.example.com"
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-service
                port:
                  number: 80
```

Full Traefik + Cert-Manager guide
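If you don't have an issuer yet, here is a minimal ClusterIssuer sketch matching the `letsencrypt` name used above. The email and ingress class are assumptions to adapt, and note that wildcard hosts like `*.example.com` actually require a DNS-01 solver rather than the HTTP-01 one shown here:

```bash
kubectl apply -f - <<'EOF'
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: you@example.com            # assumption: replace with your address
    privateKeySecretRef:
      name: letsencrypt-account-key
    solvers:
      - http01:
          ingress:
            class: traefik            # assumption: match your ingress controller
EOF
```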
Ingress-Nginx with HTTP challenge: add `lb_hostname = "cluster.example.org"` to work around this known issue.
Managing snapshots
Create:

```bash
export HCLOUD_TOKEN=<your-token>
packer build ./packer-template/hcloud-microos-snapshots.pkr.hcl
```

Delete:

```bash
hcloud image list
hcloud image delete <image-id>
```

Single-node development cluster
Set `automatically_upgrade_os = false` (attached volumes don't handle auto-reboots well).
Uses k3s service load balancer instead of external LB. Ports 80 & 443 open automatically.
Terraform Cloud deployment
- Create MicroOS snapshot in your project first
- Configure SSH keys as Terraform Cloud variables (mark private key as sensitive):
```hcl
ssh_public_key  = var.ssh_public_key
ssh_private_key = var.ssh_private_key
```

Password-protected keys: requires `local` execution mode with your own agent.
HelmChartConfig customization
```yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rancher
  namespace: kube-system
spec:
  valuesContent: |-
    # Your values.yaml customizations here
```

Works for all add-ons: Longhorn, Cert-Manager, Traefik, etc.
Encryption at rest (HCloud CSI)
Create secret:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: encryption-secret
  namespace: kube-system
stringData:
  encryption-passphrase: foobar
```

Create storage class:
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hcloud-volumes-encrypted
provisioner: csi.hetzner.cloud
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
parameters:
  csi.storage.k8s.io/node-publish-secret-name: encryption-secret
  csi.storage.k8s.io/node-publish-secret-namespace: kube-system
```
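To use it, request a volume through the new class. A minimal PVC sketch (name and size are placeholders):

```bash
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: encrypted-data   # placeholder name
spec:
  storageClassName: hcloud-volumes-encrypted
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
EOF
```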
Encryption at rest (Longhorn)
Create secret:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: longhorn-crypto
  namespace: longhorn-system
stringData:
  CRYPTO_KEY_VALUE: "your-encryption-key"
  CRYPTO_KEY_PROVIDER: "secret"
  CRYPTO_KEY_CIPHER: "aes-xts-plain64"
  CRYPTO_KEY_HASH: "sha256"
  CRYPTO_KEY_SIZE: "256"
  CRYPTO_PBKDF: "argon2i"
```

Create storage class:
```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn-crypto-global
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  nodeSelector: "node-storage"
  numberOfReplicas: "1"
  staleReplicaTimeout: "2880"
  fromBackup: ""
  fsType: ext4
  encrypted: "true"
  csi.storage.k8s.io/provisioner-secret-name: "longhorn-crypto"
  csi.storage.k8s.io/provisioner-secret-namespace: "longhorn-system"
  csi.storage.k8s.io/node-publish-secret-name: "longhorn-crypto"
  csi.storage.k8s.io/node-publish-secret-namespace: "longhorn-system"
  csi.storage.k8s.io/node-stage-secret-name: "longhorn-crypto"
  csi.storage.k8s.io/node-stage-secret-namespace: "longhorn-system"
```

Namespace-based architecture assignment
Enable admission controllers:
```hcl
k3s_exec_server_args = "--kube-apiserver-arg enable-admission-plugins=PodTolerationRestriction,PodNodeSelector"
```

Assign a namespace to an architecture:
```yaml
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    scheduler.alpha.kubernetes.io/node-selector: kubernetes.io/arch=amd64
  name: this-runs-on-amd64
```

With tolerations:
```yaml
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    scheduler.alpha.kubernetes.io/node-selector: kubernetes.io/arch=arm64
    scheduler.alpha.kubernetes.io/defaultTolerations: '[{ "operator" : "Equal", "effect" : "NoSchedule", "key" : "workload-type", "value" : "machine-learning" }]'
  name: this-runs-on-arm64
```

Backup and restore cluster (etcd S3)
Setup backup:
- Configure `etcd_s3_backup` in your kube.tf
- Add a k3s_token output:
output "k3s_token" {
value = module.kube-hetzner.k3s_token
sensitive = true
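After the next apply, the token can be read back whenever you need it:

```bash
terraform output -raw k3s_token
```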
Restore:
- Add restoration config to kube.tf:
```hcl
locals {
  k3s_token          = var.k3s_token
  etcd_version       = "v3.5.9"
  etcd_snapshot_name = "name-of-the-snapshot"
  etcd_s3_endpoint   = "your-s3-endpoint"
  etcd_s3_bucket     = "your-s3-bucket"
  etcd_s3_access_key = "your-s3-access-key"
  etcd_s3_secret_key = var.etcd_s3_secret_key
}

variable "k3s_token" {
  sensitive = true
  type      = string
}

variable "etcd_s3_secret_key" {
  sensitive = true
  type      = string
}

module "kube-hetzner" {
  k3s_token = local.k3s_token

  postinstall_exec = compact([
    (
      local.etcd_snapshot_name == "" ? "" :
      <<-EOF
        export CLUSTERINIT=$(cat /etc/rancher/k3s/config.yaml | grep -i '"cluster-init": true')
        if [ -n "$CLUSTERINIT" ]; then
          k3s server \
            --cluster-reset \
            --etcd-s3 \
            --cluster-reset-restore-path=${local.etcd_snapshot_name} \
            --etcd-s3-endpoint=${local.etcd_s3_endpoint} \
            --etcd-s3-bucket=${local.etcd_s3_bucket} \
            --etcd-s3-access-key=${local.etcd_s3_access_key} \
            --etcd-s3-secret-key=${local.etcd_s3_secret_key}
          mv /etc/rancher/k3s/k3s.yaml /etc/rancher/k3s/k3s.backup.yaml
          ETCD_VER=${local.etcd_version}
          case "$(uname -m)" in
            aarch64) ETCD_ARCH="arm64" ;;
            x86_64) ETCD_ARCH="amd64" ;;
          esac
          DOWNLOAD_URL=https://github.com/etcd-io/etcd/releases/download
          curl -L $DOWNLOAD_URL/$ETCD_VER/etcd-$ETCD_VER-linux-$ETCD_ARCH.tar.gz -o /tmp/etcd-$ETCD_VER-linux-$ETCD_ARCH.tar.gz
          tar xzvf /tmp/etcd-$ETCD_VER-linux-$ETCD_ARCH.tar.gz -C /usr/local/bin --strip-components=1
          nohup etcd --data-dir /var/lib/rancher/k3s/server/db/etcd &
          echo $! > save_pid.txt
          etcdctl del /registry/services/specs/traefik/traefik
          etcdctl del /registry/services/endpoints/traefik/traefik
          OLD_NODES=$(etcdctl get "" --prefix --keys-only | grep /registry/minions/ | cut -c 19-)
          for NODE in $OLD_NODES; do
            for KEY in $(etcdctl get "" --prefix --keys-only | grep $NODE); do
              etcdctl del $KEY
            done
          done
          kill -9 `cat save_pid.txt`
          rm save_pid.txt
        fi
      EOF
    )
  ])
}
```

- Set the environment variables:
```bash
export TF_VAR_k3s_token="..."
export TF_VAR_etcd_s3_secret_key="..."
```

- Run `terraform apply`
Pre-constructed private network (proxies)
resource "hcloud_network" "k3s_proxied" {
name = "k3s-proxied"
ip_range = "10.0.0.0/8"
}
resource "hcloud_network_subnet" "k3s_proxy" {
network_id = hcloud_network.k3s_proxied.id
type = "cloud"
network_zone = "eu-central"
ip_range = "10.128.0.0/9"
}
resource "hcloud_server" "your_proxy_server" { ... }
resource "hcloud_server_network" "your_proxy_server" {
depends_on = [hcloud_server.your_proxy_server]
server_id = hcloud_server.your_proxy_server.id
network_id = hcloud_network.k3s_proxied.id
ip = "10.128.0.1"
}
module "kube-hetzner" {
existing_network_id = [hcloud_network.k3s_proxied.id] # Note: brackets required!
network_ipv4_cidr = "10.0.0.0/9"
additional_k3s_environment = {
"http_proxy" : "http://10.128.0.1:3128",
"HTTP_PROXY" : "http://10.128.0.1:3128",
"HTTPS_PROXY" : "http://10.128.0.1:3128",
"CONTAINERD_HTTP_PROXY" : "http://10.128.0.1:3128",
"CONTAINERD_HTTPS_PROXY" : "http://10.128.0.1:3128",
"NO_PROXY" : "127.0.0.0/8,10.0.0.0/8,",
}
}Placement groups
Assign nodepools to placement groups:
```hcl
agent_nodepools = [
  {
    ...
    placement_group = "special"
  },
]
```

Legacy compatibility:
```hcl
placement_group_compat_idx = 1
```

For more than 10 nodes, use a map-based definition:
```hcl
agent_nodepools = [
  {
    nodes = {
      "0" : { placement_group = "pg-1" },
      "30" : { placement_group = "pg-2" },
    }
  },
]
```

Disable globally: `placement_group_disable = true`
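To verify how the groups ended up spread across physical hosts, a quick check with the hcloud CLI (group name is a placeholder):

```bash
hcloud placement-group list
hcloud placement-group describe <group-name>   # lists member servers
```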
Migrating from count to map-based nodes
Set `append_index_to_node_name = false` to avoid node replacement:
```hcl
agent_nodepools = [
  {
    name        = "agent-large",
    server_type = "cx33",
    location    = "nbg1",
    labels      = [],
    taints      = [],
    nodes = {
      "0" : {
        append_index_to_node_name = false,
        labels          = ["my.extra.label=special"],
        placement_group = "agent-large-pg-1",
      },
      "1" : {
        append_index_to_node_name = false,
        server_type     = "cx43",
        labels          = ["my.extra.label=slightlybiggernode"],
        placement_group = "agent-large-pg-2",
      },
    }
  },
]
```

Delete protection
Protect resources from accidental deletion via Hetzner Console/API:
```hcl
enable_delete_protection = {
  floating_ip   = true
  load_balancer = true
  volume        = true
}
```

Note: Terraform can still delete protected resources (the provider lifts the lock first).
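You can confirm the protection state from the CLI (resource names here are placeholders):

```bash
hcloud volume describe <volume-name>
hcloud load-balancer describe <lb-name>
```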
Private-only cluster (Wireguard)
Requirements:
- Pre-configured network
- NAT gateway with public IP (Hetzner guide)
- Wireguard VPN access (Hetzner guide)
- Route `0.0.0.0/0` through the NAT gateway
Configuration:
```hcl
existing_network_id = [YOURID]
network_ipv4_cidr   = "10.0.0.0/9"

# In all nodepools:
disable_ipv4 = true
disable_ipv6 = true

# For autoscaler:
autoscaler_disable_ipv4 = true
autoscaler_disable_ipv6 = true

# Optional private LB:
control_plane_lb_enable_public_interface = false
```

Private-only cluster (NAT Router)
Fully private setup with:
- Egress: Single NAT router IP
- SSH: Through bastion (NAT router)
- Control plane: Through LB or NAT router port forwarding
- Ingress: Through agents LB only
As of August 11, 2025, Hetzner has removed the legacy Router DHCP option; this module now automatically persists routes via the virtual gateway.
Fix SELinux issues with udica
Create targeted SELinux profiles instead of weakening cluster-wide security:
```bash
# Find the container
crictl ps

# Generate inspection data
crictl inspect <container-id> > container.json

# Create a profile
udica -j container.json myapp --full-network-access

# Install the module
semodule -i myapp.cil /usr/share/udica/templates/{base_container.cil,net_container.cil}
```

Apply it to your deployment:
```yaml
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        - name: my-container
          securityContext:
            seLinuxOptions:
              type: myapp.process
```

Thanks @carolosf!
Debugging

```bash
hcloud context create Kube-hetzner        # First time only
hcloud server list                        # Check nodes
hcloud network describe k3s               # Check network
hcloud load-balancer describe k3s-traefik # Check LB
```

SSH into a node:

```bash
ssh root@<control-plane-ip> -i /path/to/private_key -o StrictHostKeyChecking=no

# View k3s logs
journalctl -u k3s        # Control plane
journalctl -u k3s-agent  # Agent nodes

# Check config
cat /etc/rancher/k3s/config.yaml

# Check uptime
last reboot
uptime
```

Takedown:

```bash
terraform destroy -auto-approve
```

If destroy hangs (LB or autoscaled nodes), use the cleanup script:
```bash
tmp_script=$(mktemp) && curl -sSL -o "${tmp_script}" https://raw.githubusercontent.com/kube-hetzner/terraform-hcloud-kube-hetzner/master/scripts/cleanup.sh && chmod +x "${tmp_script}" && "${tmp_script}" && rm "${tmp_script}"
```
⚠️ This deletes everything including volumes. Dry-run option available.
Update version in your kube.tf and run terraform apply.
- Run `createkh` to get the new packer template
- Update the module version to `>= 2.0`
- Remove `extra_packages_to_install` and `opensuse_microos_mirror_link` (both moved to packer)
- Run `terraform init -upgrade && terraform apply`
Help wanted! Consider asking Hetzner to add MicroOS as a default image (not just ISO) at get.opensuse.org/microos. More requests = faster deployments for everyone!
- Fork the project
- Create your branch: `git checkout -b AmazingFeature`
- Point your kube.tf `source` to your local clone
- Useful commands:

```bash
../kube-hetzner/scripts/cleanup.sh
packer build ../kube-hetzner/packer-template/hcloud-microos-snapshots.pkr.hcl
```

- Update `kube.tf.example` if needed
- Commit: `git commit -m 'Add AmazingFeature'`
- Push: `git push origin AmazingFeature`
- Open a PR targeting the `staging` branch
This project includes agent skills in .claude/skills/ — reusable workflows for any AI coding agent (Claude Code, Cursor, Windsurf, Codex, etc.):
| Skill | Purpose |
|---|---|
| `/kh-assistant` | Interactive help for configuration and debugging |
| `/fix-issue <num>` | Guided workflow for fixing GitHub issues |
| `/review-pr <num>` | Security-focused PR review |
| `/test-changes` | Run terraform fmt, validate, plan |
PRs to improve these skills are welcome! See .claude/skills/ for the skill definitions.
If Kube-Hetzner saves you time and money, please consider supporting its development:
Your sponsorship directly funds:
🐛 Bug fixes and issue response
🚀 New features and improvements
📚 Documentation maintenance
🔒 Security updates and best practices
Every contribution matters. Thank you for keeping this project alive! 🙏
- k-andy — The starting point for this project
- Best-README-Template — README inspiration
- Hetzner Cloud — Outstanding infrastructure and Terraform provider
- HashiCorp — The amazing Terraform framework
- Rancher — k3s, the heart of this project
- openSUSE — MicroOS, next-level container OS
Made with ❤️ by the Kube-Hetzner community
