feat: add Apache Iggy scaler#7485

Open
lordmoocow wants to merge 20 commits into kedacore:main from lordmoocow:feature/apache-iggy

Conversation

@lordmoocow
@lordmoocow lordmoocow commented Feb 25, 2026

Implements an Apache Iggy scaler that autoscales workloads based on consumer group lag. The scaler connects to Iggy over the TCP binary protocol using the official Go SDK (v0.7.0), queries per-partition offsets, and computes lag as currentOffset - storedOffset for each monitored partition.

  • Scaler: Uses KEDA's TypedConfig with struct tags for metadata parsing and validation. Connects via iggycli.NewIggyClient with a cancellable context to manage the SDK's heartbeat goroutine lifecycle. Identifier objects are cached on the scaler struct to avoid per-poll allocation.
  • Lag calculation: Per-partition lags are computed sequentially (the SDK serializes requests via a mutex, so parallelisation yields no benefit). Total lag is capped at partitionCount * lagThreshold by default so desired replicas never exceed partition count, with overrides via allowIdleConsumers, limitToPartitionsWithLag, and ensureEvenDistributionOfPartitions.
  • Invalid offset handling: All GetConsumerOffset errors and nil responses are treated as "no committed offset" — the SDK doesn't expose typed errors to distinguish missing offsets from transient failures. Behavior is controlled by offsetResetPolicy and scaleToZeroOnInvalidOffset.
  • Persistent lag tracking: When excludePersistentLag is enabled, the scaler tracks storedOffset per partition across polling intervals. Partitions where the consumer offset hasn't changed are excluded from the scaling metric but still count toward activation.
  • Auth: Supports username/password or personal access token (mutually exclusive), resolved via TriggerAuthentication secret refs.
  • Tests: Unit tests covering metadata parsing, validation, and lag calculation. E2E tests covering all scaler features with isolated topics and consumer groups per scenario.
  • TLS: Not supported by the upstream SDK for TCP connections (v0.7.0). Encrypted transport requires a service mesh.
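
The lag calculation and default cap described above can be sketched roughly as follows. This is a minimal illustration and not the scaler's actual code: the `partitionOffsets` type and the `totalLag` function are hypothetical names, and the underflow guard is an assumption about how a stored offset ahead of the current offset would be handled.

```go
package main

import "fmt"

// partitionOffsets holds the two offsets the description subtracts.
// The type and field names are illustrative, not the scaler's own.
type partitionOffsets struct {
	currentOffset uint64 // newest offset written to the partition
	storedOffset  uint64 // offset last committed by the consumer group
}

// totalLag sums currentOffset - storedOffset over all partitions and,
// unless allowIdleConsumers is set, caps the sum at
// partitionCount * lagThreshold.
func totalLag(parts []partitionOffsets, lagThreshold uint64, allowIdleConsumers bool) uint64 {
	var total uint64
	for _, p := range parts {
		// Assumed guard: avoid unsigned underflow if the stored offset is ahead.
		if p.currentOffset > p.storedOffset {
			total += p.currentOffset - p.storedOffset
		}
	}
	if limit := uint64(len(parts)) * lagThreshold; !allowIdleConsumers && total > limit {
		return limit
	}
	return total
}

func main() {
	parts := []partitionOffsets{{100, 40}, {50, 50}, {30, 10}}
	fmt.Println(totalLag(parts, 10, false)) // 30: capped at 3 partitions * 10
	fmt.Println(totalLag(parts, 10, true))  // 80: uncapped, 60 + 0 + 20
}
```

With the cap in place, dividing the reported lag by lagThreshold can never ask for more replicas than there are partitions, which is the behaviour the description attributes to the default configuration.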

Checklist

  • When introducing a new scaler, I agree with the scaling governance policy
  • I have verified that my change is according to the deprecations & breaking changes policy
  • Tests have been added (if applicable)
  • Ensure make generate-scalers-schema has been run to update any outdated generated files
  • Changelog has been updated and is aligned with our changelog requirements, only when the change impacts end users
  • A PR is opened to update our Helm chart (repo) (if applicable, ie. when deployment manifests are modified)
  • A PR is opened to update the documentation on (repo) (if applicable)
  • Commits are signed with Developer Certificate of Origin (DCO - learn more)

Fixes #7484

Relates to kedacore/keda-docs#1708

@keda-automation keda-automation requested review from a team February 25, 2026 22:51
@snyk-io
snyk-io bot commented Feb 25, 2026

⚠️ Snyk checks are incomplete.

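| Status | Scan Engine | Critical | High | Medium | Low | Total (0) |
|--------|-------------|----------|------|--------|-----|-----------|
| ⚠️ | Open Source Security | 0 | 0 | 0 | 0 | See details |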

@github-actions
Thank you for your contribution! 🙏

Please understand that we will do our best to review your PR and give you feedback as soon as possible, but please bear with us if it takes a little longer than expected.

While you are waiting, make sure to:

  • Add an entry in our changelog in alphabetical order and link related issue
  • Update the documentation, if needed
  • Add unit & e2e tests for your changes
  • GitHub checks are passing
  • Is the DCO check failing? Here is how you can fix DCO issues

Once the initial tests are successful, a KEDA member will ensure that the e2e tests are run. Once the e2e tests have been successfully completed, the PR may be merged at a later date. Please be patient.

Learn more about our contribution guide.

lordmoocow and others added 11 commits February 25, 2026 23:22
Add github.com/apache/iggy/foreign/go v0.6.0 for the new Apache Iggy
scaler. This provides TCP-based client access to Iggy's consumer group
offset tracking API.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: Samuel J. Williams <[email protected]>
Implement a new KEDA scaler for Apache Iggy that autoscales based on
consumer group lag over the TCP binary protocol. Includes metadata
parsing with validation, GetMetricSpecForScaling, GetMetricsAndActivity
with per-partition offset tracking, lag calculation with partition
capping, and comprehensive unit tests.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: Samuel J. Williams <[email protected]>
Allow filtering which partitions are monitored for lag via
the partitionLimitation metadata parameter. Supports comma-separated
IDs and ranges (e.g., "1,2,3" or "1-4,8,10-12"). Iggy partitions
are 1-indexed.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: Samuel J. Williams <[email protected]>
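
One way to expand a partitionLimitation value such as "1-4,8,10-12" is sketched below. The function name `parsePartitionLimitation` and its error rules are assumptions made for illustration; the commit message only states the accepted syntax and that partitions are 1-indexed.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// parsePartitionLimitation expands a spec such as "1-4,8,10-12" into a
// sorted, de-duplicated list of partition IDs. The name and the exact
// validation rules are hypothetical.
func parsePartitionLimitation(spec string) ([]uint32, error) {
	seen := map[uint32]struct{}{}
	for _, field := range strings.Split(spec, ",") {
		field = strings.TrimSpace(field)
		lo, hi, isRange := strings.Cut(field, "-")
		if !isRange {
			hi = lo // a single ID is a range of length one
		}
		start, err := strconv.ParseUint(strings.TrimSpace(lo), 10, 32)
		if err != nil {
			return nil, fmt.Errorf("invalid partition %q: %w", field, err)
		}
		end, err := strconv.ParseUint(strings.TrimSpace(hi), 10, 32)
		if err != nil {
			return nil, fmt.Errorf("invalid partition %q: %w", field, err)
		}
		// Iggy partitions are 1-indexed, so 0 is rejected here.
		if start == 0 || end < start {
			return nil, fmt.Errorf("invalid partition range %q", field)
		}
		for id := start; id <= end; id++ {
			seen[uint32(id)] = struct{}{}
		}
	}
	ids := make([]uint32, 0, len(seen))
	for id := range seen {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	return ids, nil
}

func main() {
	ids, err := parsePartitionLimitation("1-4,8,10-12")
	fmt.Println(ids, err) // [1 2 3 4 8 10 11 12] <nil>
}
```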
When enabled, caps the maximum replica count to only the number of
partitions that actually have lag > 0, rather than the total partition
count. Prevents scaling to unused replicas when most partitions are
caught up.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: Samuel J. Williams <[email protected]>
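
The effect of limitToPartitionsWithLag can be pictured with the sketch below, which caps the reported lag by the number of partitions that actually have lag rather than by the total partition count. The function name `lagLimitedToLaggingPartitions` is hypothetical and the real scaler may structure this differently.

```go
package main

import "fmt"

// lagLimitedToLaggingPartitions caps total lag at
// (partitions with lag > 0) * lagThreshold, so the replica count derived
// from it cannot exceed the number of lagging partitions.
func lagLimitedToLaggingPartitions(lags []uint64, lagThreshold uint64) uint64 {
	var total, withLag uint64
	for _, lag := range lags {
		if lag > 0 {
			withLag++
			total += lag
		}
	}
	if limit := withLag * lagThreshold; total > limit {
		return limit
	}
	return total
}

func main() {
	// Four partitions, only one of which is behind.
	lags := []uint64{0, 0, 0, 500}
	fmt.Println(lagLimitedToLaggingPartitions(lags, 10)) // 10: enough for one replica
}
```

In this example a cap based on the total partition count would have allowed a lag of 40 (four replicas), even though only one partition has work to do.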
Track consumer offsets across polling intervals. When a partition's
consumer offset hasn't changed since the last check, exclude its lag
from the scaling metric while still counting it for activation. This
prevents scaling up for stuck consumers where adding replicas won't
help.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: Samuel J. Williams <[email protected]>
Handle new/reset consumer groups gracefully. offsetResetPolicy
(earliest/latest, default=latest) controls lag behavior when no
committed offset exists. scaleToZeroOnInvalidOffset (default=false)
controls whether to scale to zero or keep one replica alive for
new consumer groups.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: Samuel J. Williams <[email protected]>
…fPartitions

allowIdleConsumers removes the partition count cap, allowing scaling
beyond the number of partitions. ensureEvenDistributionOfPartitions
rounds replica count to the nearest factor of total partitions for
balanced assignment. Validates that allowIdleConsumers and
limitToPartitionsWithLag cannot be set simultaneously.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: Samuel J. Williams <[email protected]>
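
The factor-based rounding behind ensureEvenDistributionOfPartitions can be sketched as below. This is not the Kafka scaler's code that the PR reuses: `findFactors` and `nextBalancedReplicaCount` are hypothetical names, and the sketch assumes the rounding goes up to the next factor that can absorb the lag.

```go
package main

import (
	"fmt"
	"sort"
)

// findFactors returns all positive divisors of n in ascending order.
func findFactors(n int64) []int64 {
	var factors []int64
	for i := int64(1); i*i <= n; i++ {
		if n%i == 0 {
			factors = append(factors, i)
			if j := n / i; j != i {
				factors = append(factors, j)
			}
		}
	}
	sort.Slice(factors, func(a, b int) bool { return factors[a] < factors[b] })
	return factors
}

// nextBalancedReplicaCount picks the smallest factor of totalPartitions
// whose capacity (factor * lagThreshold) covers totalLag, so that every
// replica ends up with the same number of partitions.
func nextBalancedReplicaCount(totalLag, totalPartitions, lagThreshold int64) int64 {
	for _, f := range findFactors(totalPartitions) {
		if f*lagThreshold >= totalLag {
			return f
		}
	}
	return totalPartitions
}

func main() {
	// 12 partitions, threshold 10: a lag of 45 needs 5 replicas, but 5 does
	// not divide 12, so the count is rounded up to 6.
	fmt.Println(nextBalancedReplicaCount(45, 12, 10))  // 6
	fmt.Println(nextBalancedReplicaCount(500, 12, 10)) // 12
}
```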
- Remove dead earliest branch, document SDK limitation in comment
- Fix import ordering to 3-group KEDA convention
- Cache Identifier objects on scaler struct to avoid per-poll allocation
- Add context cancellation in Close() to stop SDK heartbeat goroutine
- Document error handling assumption in GetConsumerOffset
- Include consumerGroupId in metric name for uniqueness

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: Samuel J. Williams <[email protected]>
End-to-end tests covering earliest/latest offset reset policies,
scaleToZeroOnInvalidOffset, excludePersistentLag,
limitToPartitionsWithLag, and ensureEvenDistributionOfPartitions.
Each scenario uses isolated topics and consumer groups.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: Samuel J. Williams <[email protected]>
Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: Samuel J. Williams <[email protected]>
Signed-off-by: Samuel J. Williams <[email protected]>
@keda-automation keda-automation requested a review from a team February 26, 2026 12:04
@keda-automation keda-automation requested a review from a team February 26, 2026 12:08
@keda-automation keda-automation requested a review from a team February 26, 2026 14:40
Signed-off-by: Rick Brouwer <[email protected]>
@lordmoocow lordmoocow marked this pull request as ready for review February 26, 2026 19:56
upperBound := totalPartitions

if ensureEvenDistribution {
	nextFactor := getNextFactorThatBalancesConsumersToTopicPartitions(totalLag, totalPartitions, lagThreshold)
Author

I'm not sure whether this is acceptable: getNextFactorThatBalancesConsumersToTopicPartitions is defined in kafka_scaler.go, so this adds a dependency on code in another scaler.

(I was going to say the same for the offsetResetPolicy type used in the metadata but I just realised that wasn't even used in the end so I'll clean that up.)

We could potentially move this and the FindFactors function to a shared location. I wasn't sure what to call it, so I left it where it is, but I thought I'd mention it in case there is a preference.

lordmoocow and others added 5 commits March 11, 2026 21:04
…offset unavailable

scaleToZeroOnInvalidOffset=true was blindly returning lag=0 when GetConsumerOffset
returned nil/error (fresh consumer group), preventing scale-up even with pending
messages. Now consults the partition's high watermark from GetTopic() to determine
actual lag, matching the approach used by the Kafka scaler's earliest-offset path.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Signed-off-by: Samuel J. Williams <[email protected]>
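
The fallback described in this commit might look roughly like the sketch below. Everything here is an assumption for illustration: the `partitionLag` function, the use of a nil pointer to represent "no committed offset", treating the high watermark as the number of pending messages, and returning lagThreshold to keep one replica alive when scaleToZeroOnInvalidOffset is false. No Iggy SDK calls are shown.

```go
package main

import "fmt"

// partitionLag computes lag for one partition. stored is nil when no
// committed offset is available (for example, a fresh consumer group).
func partitionLag(stored *uint64, highWatermark, lagThreshold uint64, scaleToZeroOnInvalidOffset bool) uint64 {
	if stored != nil {
		if highWatermark > *stored {
			return highWatermark - *stored
		}
		return 0
	}
	if scaleToZeroOnInvalidOffset {
		// Assumed behaviour after the fix: nothing has been consumed, so
		// everything up to the high watermark counts as pending. An empty
		// partition still yields 0 and allows scale to zero.
		return highWatermark
	}
	// Assumed behaviour: report just enough lag to keep one replica alive.
	return lagThreshold
}

func main() {
	stored := uint64(40)
	fmt.Println(partitionLag(&stored, 100, 10, true)) // 60
	fmt.Println(partitionLag(nil, 100, 10, true))     // 100
	fmt.Println(partitionLag(nil, 0, 10, true))       // 0
	fmt.Println(partitionLag(nil, 100, 10, false))    // 10
}
```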
The v0.7.0 SDK restructures packages (iggycli→client, tcp→client/tcp)
and moves the Client interface to contracts. The heartbeat goroutine is
now managed internally by the SDK, so Close() replaces manual context
cancellation. No user-facing behavioral changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Signed-off-by: Samuel J. Williams <[email protected]>
Signed-off-by: Samuel J. Williams <[email protected]>
@lordmoocow lordmoocow force-pushed the feature/apache-iggy branch from 1ac3151 to 7b66680 Compare March 21, 2026 23:24
Development

Successfully merging this pull request may close these issues.

Apache Iggy Scaler

2 participants