@stelfrag stelfrag commented Dec 12, 2025

Summary
  • Submit complete node info when the node is switched to ephemeral by executing either of:
    netdatacli remove-stale_node
    netdatacli mark-stale-nodes-ephemeral

Summary by cubic

Send full node info to Cloud when a node is marked ephemeral, and wait for the update to finish before unregistering. This ensures Cloud has accurate data and avoids race conditions during ephemeral cleanup.

  • New Features

    • Send node info when switching to ephemeral via netdatacli remove-stale_node and mark-stale-nodes-ephemeral.
    • Add completion support to ACLK queries to wait for node info/state updates.
    • New helpers: send_node_info_with_wait and send_node_update_with_wait used during orphan/ephemeral cleanup.
  • Refactors

    • aclk_host_state_update and aclk_update_node_info now accept a completion pointer; propagated through aclk_query_t and completed on query free.
    • Updated call sites to pass NULL or a completion where needed.

Written for commit 717646a. Summary will update automatically on new commits.

Copilot AI left a comment

Pull request overview

This PR adds completion synchronization support to ACLK queries to ensure node information is fully transmitted to the cloud before proceeding with ephemeral node cleanup operations. This prevents race conditions where a node might be unregistered before the cloud receives its final state update.

Key changes:

  • Added completion pointer field to aclk_query_t structure to track query completion
  • Introduced send_node_info_with_wait() and send_node_update_with_wait() helper functions for synchronous ACLK operations
  • Updated aclk_host_state_update() and aclk_update_node_info() signatures to accept completion parameters
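
The completion-on-free pattern described in these changes can be sketched as follows. This is a minimal, self-contained illustration using pthreads, not the PR's actual code: the `struct completion` here is a simplified stand-in, and `demo_query_t`, `demo_query_free`, `demo_worker`, and `demo_send_with_wait` are hypothetical names chosen only for the example.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Simplified stand-in for a completion object (assumption, not the real one):
 * a flag protected by a mutex and signalled through a condition variable. */
struct completion {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool done;
};

static void completion_init(struct completion *c) {
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = false;
}

static void completion_mark_complete(struct completion *c) {
    pthread_mutex_lock(&c->lock);
    c->done = true;
    pthread_cond_broadcast(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

static void completion_wait_for(struct completion *c) {
    pthread_mutex_lock(&c->lock);
    while (!c->done)                      /* loop guards against spurious wakeups */
        pthread_cond_wait(&c->cond, &c->lock);
    pthread_mutex_unlock(&c->lock);
}

static void completion_destroy(struct completion *c) {
    pthread_cond_destroy(&c->cond);
    pthread_mutex_destroy(&c->lock);
}

/* Hypothetical, stripped-down query: only what the illustration needs. */
typedef struct demo_query {
    const char *msg_name;
    struct completion *completion;        /* NULL when no caller is waiting */
} demo_query_t;

static int sent_count = 0;                /* counts "transmitted" queries */

/* Freeing the query is the single point where a waiter gets released. */
static void demo_query_free(demo_query_t *q) {
    if (q->completion)
        completion_mark_complete(q->completion);
    free(q);
}

/* Stands in for the thread that processes queued queries. */
static void *demo_worker(void *arg) {
    demo_query_t *q = arg;
    sent_count++;                         /* pretend the message was sent */
    demo_query_free(q);
    return NULL;
}

/* Queue a query and block until it has been processed and freed.
 * Returns 0 on success, -1 if the query could not be created or queued. */
static int demo_send_with_wait(const char *msg_name) {
    struct completion c;
    completion_init(&c);

    demo_query_t *q = malloc(sizeof(*q));
    if (!q) {
        completion_destroy(&c);
        return -1;
    }
    q->msg_name = msg_name;
    q->completion = &c;

    pthread_t worker;
    if (pthread_create(&worker, NULL, demo_worker, q) != 0) {
        free(q);
        completion_destroy(&c);
        return -1;
    }

    completion_wait_for(&c);              /* returns only after demo_query_free ran */
    pthread_join(worker, NULL);           /* worker is done before c is destroyed */
    completion_destroy(&c);
    return 0;
}
```

In this sketch a caller would invoke `demo_send_with_wait("UpdateNodeInfo")` and run its unregister step only after the call returns, which mirrors the ordering the PR aims for. Passing a NULL completion keeps the fire-and-forget behavior for call sites that do not need to wait.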

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/database/sqlite/sqlite_aclk_node.h | Declares new helper functions for waiting on node info/state updates |
| src/database/sqlite/sqlite_aclk_node.c | Implements wait helpers and updates build_node_info to accept completion parameter |
| src/daemon/service.c | Uses new wait functions during orphan host cleanup to ensure cloud sync before unregistration |
| src/daemon/commands.c | Uses new wait functions in CLI commands for marking nodes ephemeral |
| src/aclk/aclk_query_queue.h | Adds completion field to aclk_query_t structure |
| src/aclk/aclk_query_queue.c | Marks completion as complete when freeing queries |
| src/aclk/aclk_contexts_api.h | Updates aclk_update_node_info signature to accept completion parameter |
| src/aclk/aclk_contexts_api.c | Passes completion to query for node info updates |
| src/aclk/aclk.h | Updates aclk_host_state_update signature to accept completion parameter |
| src/aclk/aclk.c | Propagates completion through node state update call chain |

Comments suppressed due to low confidence (1)

src/aclk/aclk_contexts_api.c:41

  • If payload generation fails (line 39 returns NULL), the QUEUE_IF_PAYLOAD_PRESENT macro calls aclk_query_free(query), which marks the completion as complete even though the node info was never sent. Callers of send_node_info_with_wait expect the wait to indicate successful transmission, but the wait also returns on failure. Consider either propagating the error to the caller or documenting this behavior clearly.
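void aclk_update_node_info(struct update_node_info *info, struct completion *compl)
{
    aclk_query_t *query = aclk_query_new(UPDATE_NODE_INFO);
    /* Attach the caller's completion (may be NULL); it is marked complete when the query is freed. */
    query->completion = compl;
    query->data.bin_payload.topic = ACLK_TOPICID_NODE_INFO;
    /* The payload is NULL if message generation fails. */
    query->data.bin_payload.payload = generate_update_node_info_message(&query->data.bin_payload.size, info);
    query->data.bin_payload.msg_name = "UpdateNodeInfo";
    /* Per the review note above: without a payload the query is freed instead of queued. */
    QUEUE_IF_PAYLOAD_PRESENT(query);
}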
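
One way to act on the "propagate the error" suggestion is to let the waiter see a final status instead of only learning that the query was freed. The sketch below is a hypothetical, single-threaded illustration of that idea. All names (`status_result`, `status_query_t`, `status_query_free`, `status_update_node_info`) are invented for the example, and the locking and signalling a real completion would need are deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical outcome values a waiter could inspect after the wait returns. */
enum status_code { STATUS_PENDING = 0, STATUS_SENT, STATUS_FAILED };

/* Hypothetical result slot that would travel alongside the completion. */
struct status_result {
    enum status_code status;
};

/* Hypothetical, stripped-down query carrying a payload and a result slot. */
typedef struct status_query {
    void *payload;
    size_t size;
    struct status_result *result;         /* NULL when the caller does not care */
} status_query_t;

/* The free path records WHY the query ended, not just that it ended. */
static void status_query_free(status_query_t *q, enum status_code final_status) {
    if (q->result)
        q->result->status = final_status;
    free(q->payload);
    free(q);
}

/* Builds the payload and "sends" it. Returns true only when a payload existed.
 * A NULL info string models a payload generation failure. */
static bool status_update_node_info(const char *info, struct status_result *result) {
    status_query_t *q = calloc(1, sizeof(*q));
    if (!q) {
        if (result)
            result->status = STATUS_FAILED;
        return false;
    }
    q->result = result;

    if (info) {
        q->size = strlen(info);
        q->payload = malloc(q->size + 1);
        if (q->payload)
            memcpy(q->payload, info, q->size + 1);
    }

    if (!q->payload) {
        /* No payload: release the waiter, but flag the failure. */
        status_query_free(q, STATUS_FAILED);
        return false;
    }

    /* Pretend the payload was transmitted here. */
    status_query_free(q, STATUS_SENT);
    return true;
}
```

With a result slot like this, a caller could skip or retry the unregister step when the status is `STATUS_FAILED` rather than treating every wakeup as a successful transmission. Whether that is preferable to simply documenting the current behavior is a design decision for the PR author.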
