Skip to content

🐛 Set recommended leaderelection settings#1663

Merged
dtfranz merged 1 commit intooperator-framework:mainfrom
thetechnick:recommended-leaderelection-settings
Jan 30, 2025
Merged

🐛 Set recommended leaderelection settings#1663
dtfranz merged 1 commit intooperator-framework:mainfrom
thetechnick:recommended-leaderelection-settings

Conversation

@thetechnick
Copy link
Copy Markdown
Contributor

@thetechnick thetechnick commented Jan 29, 2025

Description

Extensive e2e tests revealed that operator-controller and catalogd might run into leader election timeouts during cluster bootstrap, causing sporadic alerts being generated.

This commit uses recommended settings for leaderelection:

LeaseDuration: 15s -> 137s
RenewDeadline: 10s -> 107s
RetryPeriod:    2s ->  26s

Warning: This will increase potential down-time of our components to 163s in the worst case (up from 17s). (LeaseDuration + RetryPeriod)

Reviewer Checklist

  • API Go Documentation
  • Tests: Unit Tests (and E2E Tests, if appropriate)
  • Comprehensive Commit Messages
  • Links to related GitHub Issue(s)

@thetechnick thetechnick requested a review from a team as a code owner January 29, 2025 15:01
@netlify
Copy link
Copy Markdown

netlify bot commented Jan 29, 2025

Deploy Preview for olmv1 ready!

Name Link
🔨 Latest commit 92cece5
🔍 Latest deploy log https://app.netlify.com/sites/olmv1/deploys/679a79cecd73790008a545f2
😎 Deploy Preview https://deploy-preview-1663--olmv1.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@thetechnick
Copy link
Copy Markdown
Contributor Author

Cross-ref catalogd PR:
operator-framework/catalogd#519

@codecov
Copy link
Copy Markdown

codecov bot commented Jan 29, 2025

Codecov Report

Attention: Patch coverage is 46.66667% with 8 lines in your changes missing coverage. Please review.

Project coverage is 67.37%. Comparing base (158d974) to head (92cece5).
Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
catalogd/cmd/catalogd/main.go 0.00% 8 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1663      +/-   ##
==========================================
- Coverage   67.42%   67.37%   -0.05%     
==========================================
  Files          55       55              
  Lines        4632     4644      +12     
==========================================
+ Hits         3123     3129       +6     
- Misses       1284     1290       +6     
  Partials      225      225              
Flag Coverage Δ
e2e 53.29% <100.00%> (-0.01%) ⬇️
unit 54.30% <0.00%> (-0.15%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

// https://github.com/openshift/enhancements/blob/61581dcd985130357d6e4b0e72b87ee35394bf6e/CONVENTIONS.md#handling-kube-apiserver-disruption
LeaseDuration: ptr.To(137 * time.Second),
RenewDeadline: ptr.To(107 * time.Second),
RetryPeriod: ptr.To(26 * time.Second),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done :)

camilamacedo86
camilamacedo86 previously approved these changes Jan 29, 2025
Copy link
Copy Markdown
Contributor

@camilamacedo86 camilamacedo86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am OK with those changes 👍

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 29, 2025
@thetechnick thetechnick added the acknowledge-critical-fixes-only Indicates if the issuer of the label is OK with the policy. label Jan 29, 2025
@thetechnick thetechnick force-pushed the recommended-leaderelection-settings branch from 63e2507 to 3983002 Compare January 29, 2025 16:57
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jan 29, 2025
@LalatenduMohanty
Copy link
Copy Markdown
Member

@thetechnick Looks like more than 5000+ files changes for the PR, seems like a rebase gone wrong or something similar.

@camilamacedo86
Copy link
Copy Markdown
Contributor

camilamacedo86 commented Jan 29, 2025

Oh @thetechnick

I think you just overview.
This one is for upstream we should not add the vendor
The vendor is only for downstream,
Sorry :-(

_ "k8s.io/client-go/plugin/pkg/client/auth"
"k8s.io/klog/v2"
"k8s.io/klog/v2/textlogger"
"k8s.io/utils/ptr"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it make sense to try and avoid pulling in a new dependency just for a small helper that returns a pointer to value?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't a new dependency:

k8s.io/utils v0.0.0-20241210054802-24370beab758

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that 5000k lines vendor tricked me :)

@joelanford
Copy link
Copy Markdown
Member

@thetechnick can you remove vendor? (we don't vendor upstream)

@tmshort
Copy link
Copy Markdown
Contributor

tmshort commented Jan 29, 2025

+1 on removing vendor

Extensive e2e tests revealed that operator-controller might run into
leader election timeouts during cluster bootstrap, causing sporadic
alerts being generated.

This commit uses recommended settings for leaderelection
LeaseDuration: 15s -> 137s
RenewDeadline: 10s -> 107s
RetryPeriod:    2s ->  26s

Warning: This will increase potential down-time of catalogd to 163s in
the worst case (up from 17s). (LeaseDuration + RetryPeriod)
@tmshort tmshort force-pushed the recommended-leaderelection-settings branch from 3983002 to 92cece5 Compare January 29, 2025 18:56
Copy link
Copy Markdown
Contributor

@camilamacedo86 camilamacedo86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 29, 2025
Copy link
Copy Markdown
Member

@LalatenduMohanty LalatenduMohanty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@tmshort tmshort added this pull request to the merge queue Jan 29, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 29, 2025
@camilamacedo86 camilamacedo86 added this pull request to the merge queue Jan 29, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 29, 2025
@dtfranz dtfranz added this pull request to the merge queue Jan 29, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 29, 2025
@dtfranz dtfranz added this pull request to the merge queue Jan 30, 2025
Merged via the queue into operator-framework:main with commit 037b9e2 Jan 30, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

acknowledge-critical-fixes-only Indicates if the issuer of the label is OK with the policy. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants