Skip to content

Cache image-pull-secret reads in webhook via controller-runtime informer#7353

Merged
EngHabu merged 1 commit into
mainfrom
enghabu/webhook-secret-informer
May 7, 2026
Merged

Cache image-pull-secret reads in webhook via controller-runtime informer#7353
EngHabu merged 1 commit into
mainfrom
enghabu/webhook-secret-informer

Conversation

@EngHabu
Copy link
Copy Markdown
Contributor

@EngHabu EngHabu commented May 7, 2026

Tracking issue

Why are the changes needed?

The embedded secret manager's mutating webhook (flyteplugins/go/tasks/pluginmachinery/secret/embedded_secret_manager.go) issues two synchronous kube-apiserver Get calls per pod admission to resolve image pull secrets when EmbeddedSecretManagerConfig.ImagePullSecrets.Enabled is set:

  1. findAndAddImagePullSecret reads the reference corev1.Secret from the configured reference namespace.
  2. addImagePullSecretToPod reads the mirrored corev1.Secret from the pod's namespace (and creates it on first sight).

The controller-runtime client used for these reads is constructed via bare client.New(...), which is a direct (non-cached) client. Every admission round-trips to the API server, which is unnecessary load at scale (high pod churn across many namespaces).

What changes were proposed in this pull request?

Replace the bare client.New in flyteplugins/go/tasks/pluginmachinery/secret/secrets_injector.go with a cache-backed delegating client:

  • A controller-runtime informer cache is created and started for the lifetime of the webhook.
  • The cache watches Secrets cluster-wide; reads are served from the informer, writes (the Create of the mirrored secret) still go through the API server.
  • WaitForCacheSync blocks startup until the informer is ready, so the first admission isn't a guaranteed cache miss.

The existing in-process SecretValue cache (stdlibCache) is unrelated and is left unchanged — it caches values pulled from external secret managers (AWS / GCP / Azure / Vault), not the k8s Secret objects that back the image pull.

How was this patch tested?

  • go build ./flyteplugins/go/tasks/pluginmachinery/secret/... ./executor/... — clean.
  • go vet ./flyteplugins/go/tasks/pluginmachinery/secret/... ./executor/... — clean.
  • go test ./flyteplugins/go/tasks/pluginmachinery/secret/... -count=1 — all pass.

Manual verification of the runtime path (informer behavior under list/watch on Secrets, WaitForCacheSync blocking startup, informer-driven invalidation when reference secrets change) is recommended in a staging cluster before promoting.

Labels

  • changed

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

Companion PR in the v1 fork: unionai/flyte#969

The embedded secret manager's mutating webhook issues two synchronous
kube-apiserver Gets per pod admission to resolve image pull secrets
(one against the reference namespace, one against the pod namespace
for the mirrored copy). At admission scale this is unnecessary load
on the API server.

Replace the bare controller-runtime client.New with a cache-backed
delegating client. The cache is started for the lifetime of the
webhook and watches Secrets cluster-wide; reads are served from the
informer, writes still go through the API server.

Signed-off-by: Haytham Abuelfutuh <[email protected]>
Copilot AI review requested due to automatic review settings May 7, 2026 18:36
@github-actions github-actions Bot added the flyte2 label May 7, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR reduces kube-apiserver load from the embedded secret manager webhook by switching its controller-runtime client from direct API reads to cache-backed reads via a controller-runtime informer cache, primarily targeting repeated Secret Get calls during pod admission when image-pull-secret injection is enabled.

Changes:

  • Create and start a controller-runtime informer cache for Secrets during webhook initialization.
  • Build a controller-runtime client configured to use the informer cache for reads (while continuing to send writes directly to the API server).
  • Block webhook startup until the cache reports it has synced.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +73 to +80
go func() {
if err := secretInformerCache.Start(ctx); err != nil {
logger.Errorf(ctx, "secret informer cache stopped: %v", err)
}
}()
if !secretInformerCache.WaitForCacheSync(ctx) {
return nil, fmt.Errorf("secret informer cache failed to sync")
}
Comment on lines +66 to 87
secretInformerCache, err := ctrlcache.New(kubeConfig, ctrlcache.Options{
Scheme: ctrlRuntimeScheme,
})
if err != nil {
return nil, fmt.Errorf("failed to create informer cache: %w", err)
}

go func() {
if err := secretInformerCache.Start(ctx); err != nil {
logger.Errorf(ctx, "secret informer cache stopped: %v", err)
}
}()
if !secretInformerCache.WaitForCacheSync(ctx) {
return nil, fmt.Errorf("secret informer cache failed to sync")
}

ctrlRuntimeClient, err := client.New(kubeConfig, client.Options{
Scheme: ctrlRuntimeScheme,
Cache: &client.CacheOptions{
Reader: secretInformerCache,
},
})
Comment on lines +66 to +68
secretInformerCache, err := ctrlcache.New(kubeConfig, ctrlcache.Options{
Scheme: ctrlRuntimeScheme,
})
@EngHabu EngHabu requested review from AdilFayyaz, mhotan and pingsutw May 7, 2026 20:01
@EngHabu EngHabu merged commit 33a0fc8 into main May 7, 2026
47 of 48 checks passed
@EngHabu EngHabu deleted the enghabu/webhook-secret-informer branch May 7, 2026 20:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants