Description
Note: Claude Opus was used for testing the different configurations and for documentation.
When a Coder workspace template contains an `azurerm_kubernetes_cluster` resource, all `coder_metadata` items are unconditionally routed to that Kubernetes cluster resource in the Coder UI, regardless of what `resource_id` is set to in the Terraform configuration. As a result, the metadata does not display on the workspace card (agent card), because it ends up attached to a non-agent resource that is not prominently displayed.
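The expected wiring is straightforward; condensed from the reproduction below, the metadata explicitly targets the resource that hosts the agent:

```hcl
# Condensed sketch of the intent: attach metadata to the agent-hosting resource.
resource "coder_metadata" "network" {
  resource_id = module.ansible.resources.agent_installation[0].id # agent-hosting terraform_data resource

  item {
    key   = "VNet ID"
    value = azurerm_virtual_network.main.id
  }
}
```

With an AKS cluster present in the template, the UI attaches these items to `azurerm_kubernetes_cluster.cluster` instead.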
Evidence: side-by-side comparison
We have two templates that are structurally identical. Both use the same ansible submodule to provision a services VM with the Coder agent. The only difference is that the AKS template adds azurerm_kubernetes_cluster and azurerm_kubernetes_cluster_node_pool resources.
Working template (Azure VM -- no Kubernetes cluster resource)
- `coder_metadata.network` with `resource_id = module.ansible.resources.agent_installation[0].id`
- Coder API (`/api/v2/workspacebuilds/{id}/resources`) shows metadata attached to `terraform_data.agent_installation` (where the agent lives)
- Result: Metadata is visible on the workspace card
Broken template (AKS -- has Kubernetes cluster resource)
- Identical `coder_metadata.network` with identical `resource_id = module.ansible.resources.agent_installation[0].id`
- Coder API shows metadata attached to `azurerm_kubernetes_cluster.cluster` instead
- Result: Metadata is NOT visible on the workspace card
Minimal reproduction
Template that works (no azurerm_kubernetes_cluster):
resource "azurerm_linux_virtual_machine" "services" {
name = "my-services-vm"
# ... standard VM configuration
}
resource "coder_agent" "main" {
os = "linux"
arch = "amd64"
auth = "token"
}
resource "coder_agent_instance" "main" {
agent_id = coder_agent.main.id
instance_id = azurerm_linux_virtual_machine.services.id
}
# Ansible submodule provisions the VM via SSH and installs the Coder agent.
# It creates a terraform_data resource for the agent installation and
# returns it as module.ansible.resources.agent_installation[0].
module "ansible" {
source = "path/to/services-vm-ansible"
# ... connection and configuration inputs
coder_agent_configuration = {
enable_coder_agent = true
coder_agent_token = coder_agent.main.token
coder_agent_url = data.coder_workspace.me.access_url
}
}
resource "coder_metadata" "network" {
resource_id = module.ansible.resources.agent_installation[0].id
item {
key = "VNet ID"
value = azurerm_virtual_network.main.id
}
item {
key = "Services VM FQDN"
value = "my-vm.example.com"
}
}Result: Metadata displays on workspace card under the agent.
Template that breaks (add azurerm_kubernetes_cluster):
```hcl
# All of the above (VM, agent, ansible module, metadata) is identical, plus:
resource "azurerm_kubernetes_cluster" "cluster" {
  name                = "my-aks-cluster"
  location            = azurerm_resource_group.aks.location
  resource_group_name = azurerm_resource_group.aks.name
  dns_prefix          = "my-aks-cluster"

  default_node_pool {
    name       = "system"
    vm_size    = "Standard_B2ms"
    node_count = 3
  }

  identity {
    type = "SystemAssigned"
  }
}

resource "azurerm_kubernetes_cluster_node_pool" "user" {
  name                  = "user"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.cluster.id
  vm_size               = "Standard_B8ms"
  node_count            = 3
}
```

Result: Metadata is silently routed to `azurerm_kubernetes_cluster.cluster` instead of `terraform_data.agent_installation` and is not visible on the workspace card.
Terraform state verification
We verified the Terraform state for each test. In every case, `coder_metadata.network.resource_id` contained the correct UUID matching the intended target resource. For example, when targeting `terraform_data.agent_installation[0]`:
```json
{
  "type": "coder_metadata",
  "name": "network",
  "instances": [
    {
      "attributes": {
        "id": "58ab4b1e-2530-4506-a671-aa42b84824a6",
        "resource_id": "e1284081-7b81-418d-eb5f-b2e68adaa9d1",
        "hide": null,
        "icon": null,
        "item": [
          { "key": "VNet ID", "value": "/subscriptions/.../virtualNetworks/..." },
          { "key": "Services VM FQDN", "value": "my-vm.example.com" }
        ]
      }
    }
  ]
}
```

We confirmed that `e1284081-7b81-418d-eb5f-b2e68adaa9d1` matches `module.ansible.terraform_data.agent_installation[0].id` in the state.
However, the Coder API response from `GET /api/v2/workspacebuilds/{build_id}/resources` returns:
```text
terraform_data.agent_installation   [HAS AGENT, 0 metadata items]
azurerm_kubernetes_cluster.cluster  [0 agents, 12 metadata items]  <-- WRONG
```
The metadata should be on `terraform_data.agent_installation` (where the agent is), not on `azurerm_kubernetes_cluster.cluster`.
Exhaustive testing of resource_id targets
We tested every reasonable resource_id target. None of them changed the outcome -- metadata always ended up on azurerm_kubernetes_cluster.cluster:
| `resource_id` value | Expected target | Actual target (Coder API) |
|---|---|---|
| `module.ansible.resources.agent_installation[0].id` | `terraform_data.agent_installation` | `azurerm_kubernetes_cluster.cluster` |
| `coder_agent.main.id` | `coder_agent.main` | `azurerm_kubernetes_cluster.cluster` |
| `azurerm_linux_virtual_machine.services.id` | `azurerm_linux_virtual_machine.services` | `azurerm_kubernetes_cluster.cluster` |
(We also learned from terraform-provider-coder#305 that `coder_*` resources are not supported `resource_id` targets, but the other two should work.)
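For reference, a condensed sketch of the variants (only the `resource_id` line changed between runs; the `item` blocks were identical to the working example above):

```hcl
resource "coder_metadata" "network" {
  # Variant 1: the ansible module's terraform_data resource (intended target)
  resource_id = module.ansible.resources.agent_installation[0].id

  # Variant 2: the agent itself (unsupported per terraform-provider-coder#305)
  # resource_id = coder_agent.main.id

  # Variant 3: the services VM
  # resource_id = azurerm_linux_virtual_machine.services.id

  # ... identical item blocks omitted ...
}
```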
Provider version testing
We tested multiple combinations of provider versions with no change in behavior:
| Coder provider | azurerm provider | Result |
|---|---|---|
| v2.13.1 | v4.60.0 | Metadata on kubernetes cluster |
| v2.13.1 | v4.1.0 | Metadata on kubernetes cluster |
| v2.10.0 | v4.60.0 | Metadata on kubernetes cluster |
| v2.10.0 | v4.1.0 | Metadata on kubernetes cluster |
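Each combination was pinned in the template's `required_providers` block; a sketch for one row of the matrix (the version values were swapped per run):

```hcl
terraform {
  required_providers {
    coder = {
      source  = "coder/coder"
      version = "2.13.1" # also tested with 2.10.0
    }
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "4.60.0" # also tested with 4.1.0
    }
  }
}
```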
Other things tested (no effect)
- Verified that the Coder agent is running and connected on the services VM -- the agent works fine, and metadata batches update successfully in the agent logs
Environment
| Component | Version |
|---|---|
| Coder server | v2.24.3 |
| Coder Terraform provider | Tested with v2.10.0, v2.13.1 |
| azurerm Terraform provider | Tested with v4.1.0, v4.60.0 |
| Terraform | v1.12.2 |
Using this template:
```hcl
terraform {
  required_providers {
    coder = {
      source = "coder/coder"
    }
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.1"
    }
    kubernetes = {
      source = "hashicorp/kubernetes"
    }
  }
}

data "coder_workspace" "me" {}
data "coder_workspace_owner" "me" {}

data "kubernetes_secret" "azure-credentials" {
  metadata {
    name      = "azure-credentials"
    namespace = "coder"
  }
}

provider "azurerm" {
  subscription_id = "<redacted>"
  tenant_id       = data.kubernetes_secret.azure-credentials.data["tenant_id"]
  client_id       = data.kubernetes_secret.azure-credentials.data["client_id"]
  client_secret   = data.kubernetes_secret.azure-credentials.data["client_secret"]
  features {}
}

locals {
  name     = lower("meta-test-${data.coder_workspace.me.name}")
  location = "eastus"
  tags = {
    owner = data.coder_workspace_owner.me.name
    tool  = "coder-bug-repro"
  }
}

# ============================================================
# Resource Group
# ============================================================
resource "azurerm_resource_group" "main" {
  name     = "rg-${local.name}"
  location = local.location
  tags     = local.tags
}

# ============================================================
# Network (minimal)
# ============================================================
resource "azurerm_virtual_network" "main" {
  name                = "${local.name}-vnet"
  location            = local.location
  resource_group_name = azurerm_resource_group.main.name
  address_space       = ["10.0.0.0/8"]
  tags                = local.tags
}

resource "azurerm_subnet" "cluster" {
  name                 = "${local.name}-subnet"
  resource_group_name  = azurerm_resource_group.main.name
  virtual_network_name = azurerm_virtual_network.main.name
  address_prefixes     = ["10.1.0.0/16"]
}

# ============================================================
# AKS Cluster (the resource that triggers the bug)
# ============================================================
resource "azurerm_kubernetes_cluster" "cluster" {
  name                = local.name
  location            = local.location
  resource_group_name = azurerm_resource_group.main.name
  dns_prefix          = local.name

  default_node_pool {
    name           = "system"
    vm_size        = "Standard_B2s"
    node_count     = 1
    vnet_subnet_id = azurerm_subnet.cluster.id
  }

  identity {
    type = "SystemAssigned"
  }

  tags = local.tags
}

# ============================================================
# Coder Agent (runs locally via terraform_data, no real VM needed)
# ============================================================
resource "coder_agent" "main" {
  os   = "linux"
  arch = "amd64"
  auth = "token"

  metadata {
    display_name = "Test"
    key          = "test"
    script       = "echo hello"
    interval     = 60
    timeout      = 1
  }
}

# Use a terraform_data resource to simulate the agent installation
# (same pattern as our ansible submodule uses in production)
resource "terraform_data" "agent_installation" {
  input = {
    agent_token = coder_agent.main.token
  }
}

resource "coder_agent_instance" "main" {
  agent_id    = coder_agent.main.id
  instance_id = azurerm_kubernetes_cluster.cluster.id
}

# ============================================================
# coder_metadata - THIS IS WHAT WE'RE TESTING
# resource_id targets terraform_data.agent_installation
# but Coder routes it to azurerm_kubernetes_cluster.cluster
# ============================================================
resource "coder_metadata" "info" {
  resource_id = terraform_data.agent_installation.id

  item {
    key   = "VNet ID"
    value = azurerm_virtual_network.main.id
  }

  item {
    key   = "Cluster Name"
    value = azurerm_kubernetes_cluster.cluster.name
  }

  item {
    key   = "Resource Group"
    value = azurerm_resource_group.main.name
  }

  item {
    key   = "Bug Repro"
    value = "If you see this on azurerm_kubernetes_cluster instead of terraform_data.agent_installation, the bug is confirmed"
  }
}
```
Screenshot of the base case above:
