Hyper core balancer #21

Merged
ddemlow merged 13 commits into ScaleComputing:master from wvcollenburg:HyperCoreBalancer
Nov 4, 2025

@wvcollenburg
Contributor

A script to load-balance VMs across a cluster based on CPU load. It supports anti-affinity and lets you set a threshold for maximum RAM consumption (defaults to 70%, as per discussion with Jones).
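
To illustrate the RAM threshold, here is a minimal sketch of a check that a migration would not push the target node above the limit. The `Node` class, its field names, and `fits_ram_limit` are hypothetical and not taken from the script; only the 70% default comes from the description above.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical node record; the script's own data model may differ.
    name: str
    cpu_pct: float        # current CPU utilisation, 0-100
    ram_total_gib: float  # installed RAM
    ram_used_gib: float   # RAM currently allocated to VMs


def fits_ram_limit(node: Node, vm_ram_gib: float, max_ram_pct: float = 70.0) -> bool:
    """Return True if placing the VM keeps the node at or below max_ram_pct."""
    projected_pct = (node.ram_used_gib + vm_ram_gib) / node.ram_total_gib * 100
    return projected_pct <= max_ram_pct


target = Node(name="node_1", cpu_pct=20.0, ram_total_gib=100.0, ram_used_gib=60.0)
print(fits_ram_limit(target, vm_ram_gib=8.0))   # True: projected usage is 68%
print(fits_ram_limit(target, vm_ram_gib=16.0))  # False: projected usage is 76%
```

A balancer would presumably run a check like this before every move, so that a node with a low CPU load but little free RAM is skipped.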

Anti-affinity is set up by tagging VMs that should not run on the same node with anti_[name_of_vm].
For example, if I have a SQL Always On cluster and want to make sure its members stay separated, I can tag them as follows:
server SQL01 gets the tag 'anti_SQL02' and server SQL02 gets the tag 'anti_SQL01'.
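
The tagging scheme above could be evaluated roughly as follows. This is a sketch under assumptions, not the script's actual code: the function names are hypothetical, tags are assumed to be plain strings, and VM names are assumed to match the tag suffix exactly (case-sensitive).

```python
ANTI_PREFIX = "anti_"


def anti_affinity_peers(tags):
    """Return the VM names that a VM must stay apart from, based on its anti_<name> tags."""
    return {
        tag[len(ANTI_PREFIX):]
        for tag in tags
        if tag.startswith(ANTI_PREFIX) and len(tag) > len(ANTI_PREFIX)
    }


def violates_anti_affinity(vm_name, vm_tags, vms_on_node):
    """Check whether placing vm_name on a node would break an anti-affinity rule.

    vms_on_node maps VM name -> list of tags for the VMs already on that node.
    A rule is honoured in both directions, so a tag on either VM is enough.
    """
    peers = anti_affinity_peers(vm_tags)
    for other_name, other_tags in vms_on_node.items():
        if other_name == vm_name:
            continue
        if other_name in peers or vm_name in anti_affinity_peers(other_tags):
            return True
    return False


node_vms = {"SQL02": ["anti_SQL01"], "WEB01": []}
print(violates_anti_affinity("SQL01", ["anti_SQL02"], node_vms))  # True
print(violates_anti_affinity("APP01", [], node_vms))              # False
```

Checking both directions means a pair stays separated even if only one of the two VMs carries the tag; whether the script does the same is not stated in this thread.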

@wvancollenburg
Contributor

Made some last-minute changes: replaced the readme with a more readable one and added error handling for failure scenarios.

@wvancollenburg
Contributor

Based on recommendations by Dave, I added a check for environment variables using the same format as Ansible. Also added the option to exclude one or more nodes from evaluation, for corner cases where one node in a cluster should be dedicated to a single task (e.g. a single GPU node in the cluster, Oracle licensing, etc.).
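
A minimal sketch of what such an environment check might look like. The variable names `SC_HOST`, `SC_USERNAME` and `SC_PASSWORD` are assumed here because the Scale Computing Ansible collection uses them; the names the script actually reads may differ. `EXCLUDE_NODES` and `load_config` are purely hypothetical.

```python
import os

# Assumed names, borrowed from the Ansible collection's convention.
REQUIRED_VARS = ("SC_HOST", "SC_USERNAME", "SC_PASSWORD")


def load_config(env=None):
    """Read connection settings and the node exclusion list from the environment."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    # EXCLUDE_NODES is a hypothetical comma-separated list, e.g. "node_1,node_3".
    excluded = {
        item.strip()
        for item in env.get("EXCLUDE_NODES", "").split(",")
        if item.strip()
    }
    return {
        "host": env["SC_HOST"],
        "username": env["SC_USERNAME"],
        "password": env["SC_PASSWORD"],
        "excluded_nodes": excluded,
    }


example_env = {
    "SC_HOST": "https://cluster.example",
    "SC_USERNAME": "admin",
    "SC_PASSWORD": "secret",
    "EXCLUDE_NODES": "node_1, node_3",
}
print(sorted(load_config(example_env)["excluded_nodes"]))  # ['node_1', 'node_3']
```

Failing early with a list of the missing variables keeps the script from starting a balancing run with incomplete credentials.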

Made all variables configurable through environment variables.
When a preferred node is removed, an error message appears on the cluster because it does not know where to move the VM.
Changed node_101 to node_1 to explain more clearly how node_x should be formatted.
Fixed an error where an excluded node could still be selected as the coolest node.
Added a check for an upgrade in progress.
When the primary node is down, connect to the other nodes.
Updated readme.md.
Also cleaned up some comments that were left over from debugging.
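
The excluded-node fix can be pictured with a short sketch: filter out excluded nodes before picking the node with the lowest CPU load. The function name and the shape of the input (node name mapped to CPU percentage) are assumptions for illustration, not the script's actual code.

```python
def coolest_node(cpu_by_node, excluded=frozenset()):
    """Return the eligible node with the lowest CPU load, or None if none is eligible."""
    # Drop excluded nodes first so they can never win the comparison.
    eligible = {name: cpu for name, cpu in cpu_by_node.items() if name not in excluded}
    if not eligible:
        return None
    return min(eligible, key=eligible.get)


loads = {"node_1": 10.0, "node_2": 35.0, "node_3": 50.0}
print(coolest_node(loads))                       # node_1
print(coolest_node(loads, excluded={"node_1"}))  # node_2
```

Filtering before the comparison, rather than checking the result afterwards, avoids the case where the overall coolest node is excluded and no fallback is chosen.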
@ddemlow
Member

ddemlow commented Nov 4, 2025

I've tested this and it works as expected for me. @mrmcphail

@ddemlow ddemlow self-assigned this Nov 4, 2025
@ddemlow ddemlow self-requested a review November 4, 2025 14:49
Member

@ddemlow ddemlow left a comment

Tested the latest commit; works as expected.

@ddemlow ddemlow merged commit 2efd478 into ScaleComputing:master Nov 4, 2025


3 participants