coder_metadata always routes to azurerm_kubernetes_cluster resource, ignoring resource_id #22103

@t1nfoil

Description

Note: we used Claude Opus to help test different configurations and write up the documentation.

When a Coder workspace template contains an azurerm_kubernetes_cluster resource, all coder_metadata items are unconditionally routed to that Kubernetes cluster resource in the Coder UI, regardless of what resource_id is set to in the Terraform configuration. As a result, the metadata does not display on the workspace card (agent card), because it ends up attached to a non-agent resource that is not prominently displayed.
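
For context, this is the association we expect from the coder_metadata resource: items attach to whichever resource's id matches resource_id. A minimal sketch (resource and item names here are illustrative, not taken from our templates):

  # Minimal sketch of the expected association (names are illustrative)
  resource "terraform_data" "agent_installation" {}

  resource "coder_metadata" "example" {
    # Expected: these items render on terraform_data.agent_installation in the UI
    resource_id = terraform_data.agent_installation.id

    item {
      key   = "Example"
      value = "example-value"
    }
  }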

Evidence: side-by-side comparison

We have two templates that are structurally identical. Both use the same ansible submodule to provision a services VM with the Coder agent. The only difference is that the AKS template adds azurerm_kubernetes_cluster and azurerm_kubernetes_cluster_node_pool resources.
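
For reference, the relevant shape of the ansible submodule is sketched below (simplified; variable and output names follow the references above, and the actual ansible-over-SSH provisioning is omitted):

  # Simplified sketch of the relevant part of the ansible submodule
  variable "coder_agent_configuration" {
    type = object({
      enable_coder_agent = bool
      coder_agent_token  = string
      coder_agent_url    = string
    })
  }

  resource "terraform_data" "agent_installation" {
    count = var.coder_agent_configuration.enable_coder_agent ? 1 : 0
    input = {
      agent_token = var.coder_agent_configuration.coder_agent_token
    }
  }

  output "resources" {
    value = {
      agent_installation = terraform_data.agent_installation
    }
  }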

Working template (Azure VM -- no Kubernetes cluster resource)

  • coder_metadata.network with resource_id = module.ansible.resources.agent_installation[0].id
  • Coder API (/api/v2/workspacebuilds/{id}/resources) shows metadata attached to terraform_data.agent_installation (where the agent lives)
  • Result: Metadata is visible on the workspace card

Broken template (AKS -- has Kubernetes cluster resource)

  • Identical coder_metadata.network with identical resource_id = module.ansible.resources.agent_installation[0].id
  • Coder API shows metadata attached to azurerm_kubernetes_cluster.cluster instead
  • Result: Metadata is NOT visible on the workspace card

Minimal reproduction

Template that works (no azurerm_kubernetes_cluster):

  resource "azurerm_linux_virtual_machine" "services" {                                                                                                                                                                                     
    name                  = "my-services-vm"                                                                                                                                                                                                
    # ... standard VM configuration
  }

  resource "coder_agent" "main" {
    os   = "linux"
    arch = "amd64"
    auth = "token"
  }

  resource "coder_agent_instance" "main" {
    agent_id    = coder_agent.main.id
    instance_id = azurerm_linux_virtual_machine.services.id
  }

  # Ansible submodule provisions the VM via SSH and installs the Coder agent.
  # It creates a terraform_data resource for the agent installation and
  # returns it as module.ansible.resources.agent_installation[0].
  module "ansible" {
    source = "path/to/services-vm-ansible"

    # ... connection and configuration inputs

    coder_agent_configuration = {
      enable_coder_agent = true
      coder_agent_token  = coder_agent.main.token
      coder_agent_url    = data.coder_workspace.me.access_url
    }
  }

  resource "coder_metadata" "network" {
    resource_id = module.ansible.resources.agent_installation[0].id

    item {
      key   = "VNet ID"
      value = azurerm_virtual_network.main.id
    }
    item {
      key   = "Services VM FQDN"
      value = "my-vm.example.com"
    }
  }

Result: Metadata displays on workspace card under the agent.

Template that breaks (add azurerm_kubernetes_cluster):

# All of the above (VM, agent, ansible module, metadata) is identical, plus:

  resource "azurerm_kubernetes_cluster" "cluster" {
    name                = "my-aks-cluster"
    location            = azurerm_resource_group.aks.location
    resource_group_name = azurerm_resource_group.aks.name
    dns_prefix          = "my-aks-cluster"

    default_node_pool {
      name       = "system"
      vm_size    = "Standard_B2ms"
      node_count = 3
    }

    identity {
      type = "SystemAssigned"
    }
  }

  resource "azurerm_kubernetes_cluster_node_pool" "user" {
    name                  = "user"
    kubernetes_cluster_id = azurerm_kubernetes_cluster.cluster.id
    vm_size               = "Standard_B8ms"
    node_count            = 3
  }

Result: Metadata is silently routed to azurerm_kubernetes_cluster.cluster instead of terraform_data.agent_installation. Not visible on workspace card.

Terraform state verification

We verified the Terraform state for each test. In every case, coder_metadata.network.resource_id contained the correct UUID matching the intended target resource. For example, when targeting terraform_data.agent_installation[0]:

{
  "type": "coder_metadata",
  "name": "network",
  "instances": [
    {
      "attributes": {
        "id": "58ab4b1e-2530-4506-a671-aa42b84824a6",
        "resource_id": "e1284081-7b81-418d-eb5f-b2e68adaa9d1",
        "hide": null,
        "icon": null,
        "item": [
          { "key": "VNet ID", "value": "/subscriptions/.../virtualNetworks/..." },
          { "key": "Services VM FQDN", "value": "my-vm.example.com" }
        ]
      }
    }
  ]
}

We confirmed e1284081-7b81-418d-eb5f-b2e68adaa9d1 matches module.ansible.terraform_data.agent_installation[0].id in the state.
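
As an extra sanity check (not part of the original templates), the two IDs can also be surfaced as outputs and compared directly after terraform apply:

  # Optional debugging outputs: compare the metadata target with the agent
  # installation resource id
  output "metadata_resource_id" {
    value = coder_metadata.network.resource_id
  }

  output "agent_installation_id" {
    value = module.ansible.resources.agent_installation[0].id
  }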

However, the Coder API response at GET /api/v2/workspacebuilds/{build_id}/resources returns:

terraform_data.agent_installation  [HAS AGENT, 0 metadata items]
azurerm_kubernetes_cluster.cluster [0 agents, 12 metadata items]  <-- WRONG

The metadata should be on terraform_data.agent_installation (where the agent is), not on azurerm_kubernetes_cluster.cluster.

Exhaustive testing of resource_id targets

We tested every reasonable resource_id target. None of them changed the outcome -- metadata always ended up on azurerm_kubernetes_cluster.cluster:

resource_id value                                  | Expected target                         | Actual target (Coder API)
-------------------------------------------------- | --------------------------------------- | -----------------------------------
module.ansible.resources.agent_installation[0].id  | terraform_data.agent_installation       | azurerm_kubernetes_cluster.cluster
coder_agent.main.id                                | coder_agent.main                        | azurerm_kubernetes_cluster.cluster
azurerm_linux_virtual_machine.services.id          | azurerm_linux_virtual_machine.services  | azurerm_kubernetes_cluster.cluster

(We also learned from terraform-provider-coder#305 that coder_* resources are not supported resource_id targets, but the other two should work.)
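
For completeness, the azurerm_linux_virtual_machine variant from the table amounts to the same coder_metadata block with only resource_id changed; per the table, the items still ended up on azurerm_kubernetes_cluster.cluster:

  resource "coder_metadata" "network" {
    # Variant: target the services VM directly instead of the ansible
    # agent_installation resource
    resource_id = azurerm_linux_virtual_machine.services.id

    item {
      key   = "VNet ID"
      value = azurerm_virtual_network.main.id
    }
    item {
      key   = "Services VM FQDN"
      value = "my-vm.example.com"
    }
  }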

Provider version testing

We tested multiple combinations of provider versions with no change in behavior:

Coder provider | azurerm provider | Result
-------------- | ---------------- | ------------------------------
v2.13.1        | v4.60.0          | Metadata on Kubernetes cluster
v2.13.1        | v4.1.0           | Metadata on Kubernetes cluster
v2.10.0        | v4.60.0          | Metadata on Kubernetes cluster
v2.10.0        | v4.1.0           | Metadata on Kubernetes cluster
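
For reference, a combination like the first row can be pinned explicitly in required_providers; an illustrative pin (the exact constraints used in our test templates may have differed):

  terraform {
    required_providers {
      coder = {
        source  = "coder/coder"
        version = "2.13.1"
      }
      azurerm = {
        source  = "hashicorp/azurerm"
        version = "4.60.0"
      }
    }
  }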

Other things tested (no effect)

  • Verified that the Coder agent is running and connected on the services VM -- the agent works fine, and metadata batches update successfully in the agent logs

Environment

Component                  | Version
-------------------------- | -------------------------------
Coder server               | v2.24.3
Coder Terraform provider   | tested with v2.10.0 and v2.13.1
azurerm Terraform provider | tested with v4.1.0 and v4.60.0
Terraform                  | v1.12.2

Using this template:

terraform {
  required_providers {
    coder = {
      source = "coder/coder"
    }
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.1"
    }
    kubernetes = {
      source = "hashicorp/kubernetes"
    }
  }
}

data "coder_workspace" "me" {}
data "coder_workspace_owner" "me" {}

data "kubernetes_secret" "azure-credentials" {
  metadata {
    name      = "azure-credentials"
    namespace = "coder"
  }
}

provider "azurerm" {
  subscription_id = "<redacted>"
  tenant_id       = data.kubernetes_secret.azure-credentials.data["tenant_id"]
  client_id       = data.kubernetes_secret.azure-credentials.data["client_id"]
  client_secret   = data.kubernetes_secret.azure-credentials.data["client_secret"]
  features {}
}

locals {
  name     = lower("meta-test-${data.coder_workspace.me.name}")
  location = "eastus"
  tags = {
    owner = data.coder_workspace_owner.me.name
    tool  = "coder-bug-repro"
  }
}

# ============================================================
# Resource Group
# ============================================================
resource "azurerm_resource_group" "main" {
  name     = "rg-${local.name}"
  location = local.location
  tags     = local.tags
}

# ============================================================
# Network (minimal)
# ============================================================
resource "azurerm_virtual_network" "main" {
  name                = "${local.name}-vnet"
  location            = local.location
  resource_group_name = azurerm_resource_group.main.name
  address_space       = ["10.0.0.0/8"]
  tags                = local.tags
}

resource "azurerm_subnet" "cluster" {
  name                 = "${local.name}-subnet"
  resource_group_name  = azurerm_resource_group.main.name
  virtual_network_name = azurerm_virtual_network.main.name
  address_prefixes     = ["10.1.0.0/16"]
}

# ============================================================
# AKS Cluster (the resource that triggers the bug)
# ============================================================
resource "azurerm_kubernetes_cluster" "cluster" {
  name                = local.name
  location            = local.location
  resource_group_name = azurerm_resource_group.main.name
  dns_prefix          = local.name

  default_node_pool {
    name           = "system"
    vm_size        = "Standard_B2s"
    node_count     = 1
    vnet_subnet_id = azurerm_subnet.cluster.id
  }

  identity {
    type = "SystemAssigned"
  }

  tags = local.tags
}

# ============================================================
# Coder Agent (runs locally via terraform_data, no real VM needed)
# ============================================================
resource "coder_agent" "main" {
  os   = "linux"
  arch = "amd64"
  auth = "token"

  metadata {
    display_name = "Test"
    key          = "test"
    script       = "echo hello"
    interval     = 60
    timeout      = 1
  }
}

# Use a terraform_data resource to simulate the agent installation
# (same pattern as our ansible submodule uses in production)
resource "terraform_data" "agent_installation" {
  input = {
    agent_token = coder_agent.main.token
  }
}

resource "coder_agent_instance" "main" {
  agent_id    = coder_agent.main.id
  instance_id = azurerm_kubernetes_cluster.cluster.id
}

# ============================================================
# coder_metadata - THIS IS WHAT WE'RE TESTING
# resource_id targets terraform_data.agent_installation
# but Coder routes it to azurerm_kubernetes_cluster.cluster
# ============================================================
resource "coder_metadata" "info" {
  resource_id = terraform_data.agent_installation.id

  item {
    key   = "VNet ID"
    value = azurerm_virtual_network.main.id
  }
  item {
    key   = "Cluster Name"
    value = azurerm_kubernetes_cluster.cluster.name
  }
  item {
    key   = "Resource Group"
    value = azurerm_resource_group.main.name
  }
  item {
    key   = "Bug Repro"
    value = "If you see this on azurerm_kubernetes_cluster instead of terraform_data.agent_installation, the bug is confirmed"
  }
}

Screenshot of the above base case: [image]
