-
Notifications
You must be signed in to change notification settings - Fork 6.3k
Send node info when node is switched to ephemeral #21456
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds completion synchronization support to ACLK queries to ensure node information is fully transmitted to the cloud before proceeding with ephemeral node cleanup operations. This prevents race conditions where a node might be unregistered before the cloud receives its final state update.
Key changes:
- Added completion pointer field to aclk_query_t structure to track query completion
- Introduced
send_node_info_with_wait()andsend_node_update_with_wait()helper functions for synchronous ACLK operations - Updated
aclk_host_state_update()andaclk_update_node_info()signatures to accept completion parameters
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| src/database/sqlite/sqlite_aclk_node.h | Declares new helper functions for waiting on node info/state updates |
| src/database/sqlite/sqlite_aclk_node.c | Implements wait helpers and updates build_node_info to accept completion parameter |
| src/daemon/service.c | Uses new wait functions during orphan host cleanup to ensure cloud sync before unregistration |
| src/daemon/commands.c | Uses new wait functions in CLI commands for marking nodes ephemeral |
| src/aclk/aclk_query_queue.h | Adds completion field to aclk_query_t structure |
| src/aclk/aclk_query_queue.c | Marks completion as complete when freeing queries |
| src/aclk/aclk_contexts_api.h | Updates aclk_update_node_info signature to accept completion parameter |
| src/aclk/aclk_contexts_api.c | Passes completion to query for node info updates |
| src/aclk/aclk.h | Updates aclk_host_state_update signature to accept completion parameter |
| src/aclk/aclk.c | Propagates completion through node state update call chain |
Comments suppressed due to low confidence (1)
src/aclk/aclk_contexts_api.c:41
- If payload generation fails (line 39 returns NULL), the QUEUE_IF_PAYLOAD_PRESENT macro will call aclk_query_free(query) which will mark the completion as complete. However, this means the completion will be marked complete even though the node info was never actually sent. Callers of send_node_info_with_wait expect the wait to indicate successful transmission, but this completion will be marked on failure too. Consider either propagating the error to the caller or documenting this behavior clearly.
void aclk_update_node_info(struct update_node_info *info, struct completion *compl)
{
aclk_query_t *query = aclk_query_new(UPDATE_NODE_INFO);
query->completion = compl;
query->data.bin_payload.topic = ACLK_TOPICID_NODE_INFO;
query->data.bin_payload.payload = generate_update_node_info_message(&query->data.bin_payload.size, info);
query->data.bin_payload.msg_name = "UpdateNodeInfo";
QUEUE_IF_PAYLOAD_PRESENT(query);
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
fe3a80a to
5bd1982
Compare
…djust syntax inconsistencies
5bd1982 to
717646a
Compare
Summary
netdatacli remove-stale_nodenetdatacli mark-stale-nodes-ephemeralSummary by cubic
Send full node info to Cloud when a node is marked ephemeral, and wait for the update to finish before unregistering. This ensures Cloud has accurate data and avoids race conditions during ephemeral cleanup.
New Features
Refactors
Written for commit 717646a. Summary will update automatically on new commits.