@Karakatiza666
Contributor

Update the design of user popup

Add health indicator to user popup

@Karakatiza666
Contributor Author

[4 images]

@Karakatiza666 Karakatiza666 requested review from mihaibudiu and removed request for snkas January 5, 2026 16:44
@Karakatiza666 Karakatiza666 marked this pull request as ready for review January 5, 2026 16:44
Copilot AI review requested due to automatic review settings January 5, 2026 16:44
Contributor

Copilot AI left a comment

Pull request overview

This PR implements a Cluster Health monitoring page and updates the user profile popup design. The changes include a new health monitoring dashboard with event timeline visualization, health status indicators in the user popup menu, and various UI refinements.

Key changes:

  • New Cluster Health page (/health) with event timeline and incident tracking
  • Health status indicator added to user popup with visual indicators for API, compiler, and runner status
  • Updated user popup design with reorganized menu structure and health navigation

Reviewed changes

Copilot reviewed 23 out of 34 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| +page.svelte (health) | New health monitoring page with event timeline, filtering tabs, and incident logging |
| pipelineManager.ts | Added API endpoints for cluster event retrieval |
| health.ts | Core health event processing logic including event grouping and status conversion |
| date.ts | Added ceilToHour utility for time calculations |
| array.ts | Extended comparison and groupBy functions to support Date types |
| usePipelineManager.svelte.ts | Integrated cluster event API methods |
| useClusterHealth.svelte.ts | New composition for polling and managing cluster health status |
| VersionDisplay.svelte | Updated styling with logo integration |
| TabPerformance.svelte | Minor formatting cleanup |
| DarkModeSwitch.svelte | Added className prop for styling flexibility |
| PipelineConfigurationsPopup.svelte | Refactored inline handler to named function |
| AppHeader.svelte | Integrated health status display |
| HealthEventDetails.svelte | New component for displaying detailed health event information |
| EventLogList.svelte | New component for incident log display with filtering |
| DangerDialog.svelte | Minor color adjustment |
| ProfilePopupMenu.svelte | Removed (functionality consolidated into ProfileButton) |
| ProfileButton.svelte | Redesigned with health indicator, reorganized menu, and consolidated profile display |
| CurrentTenant.svelte | Enhanced to handle single-tenant display differently |
| StatusTimeline.svelte | New component for visual timeline representation of health events |
| CLAUDE.md | New documentation placeholder |
| SVG/font files | Added new icons (log-out, key, chevron-left) and updated icon fonts |
| cluster-monitoring.md | Updated documentation to clarify event sorting order |
Comments suppressed due to low confidence (3)

  • js-packages/web-console/src/routes/(system)/(authenticated)/health/+page.svelte:1
    Commented-out filter code should either be removed or properly documented if it's intended for future use. Leaving commented code can cause confusion.
  • js-packages/web-console/src/lib/functions/pipelines/health.ts:1
    Corrected spelling of 'abscence' to 'absence'.
  • js-packages/web-console/src/routes/(system)/(authenticated)/health/+page.svelte:1
    Events are being filtered twice: once in the StatusTimeline component and once in splitClusterEvents. Consider filtering once and passing the filtered results to both components to avoid redundant computation.
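
The "filter once" suggestion could look roughly like the sketch below. It is not the PR's code: the `ClusterEvent` shape, the time-window predicate, and `prepareHealthView` are hypothetical names used only to illustrate computing the filtered array once and sharing it.

```typescript
// Hypothetical event shape; the real type in the PR may differ.
interface ClusterEvent {
  id: string
  recordedAt: Date
  healthy: boolean
}

// Hypothetical predicate standing in for whatever filter the page applies.
const isWithinWindow = (from: Date, to: Date) => (e: ClusterEvent) =>
  e.recordedAt.getTime() >= from.getTime() && e.recordedAt.getTime() <= to.getTime()

// Filter once, then hand the same array to both consumers.
function prepareHealthView(events: ClusterEvent[], from: Date, to: Date) {
  const visible = events.filter(isWithinWindow(from, to))
  return {
    timelineEvents: visible, // input for the timeline component
    incidentEvents: visible.filter((e) => !e.healthy) // input for the incident list
  }
}
```

Both consumers then receive data derived from the same `visible` array, so the window filter runs a single time per update.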

@Karakatiza666
Contributor Author

[image]

@Karakatiza666 Karakatiza666 force-pushed the monitoring-ui branch 2 times, most recently from d5e70c6 to 34186a4 Compare January 5, 2026 17:29
Contributor

@mihaibudiu mihaibudiu left a comment

Are we trying to replace datadog or some other similar tools?
I hope we can avoid doing that.


export interface PipelineEvent {
type: PipelineEventType
timestamp: string // ISO 8601 date string
Contributor

could you define a type for this?

Contributor Author

I will remove this type; it is not actually needed at the moment.

return { ...message, text }
})
)
const healthMessage = $derived(
Contributor

What if there are multiple issues? Why is only the first one important?

Contributor Author

You will want to visit the health page if there are any issues with the cluster components. I figured displaying just one of them is sufficient: it gives a simpler UX (multiple things don't scream at you at the same time) and (slightly) simpler code.
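
A minimal sketch of that "show only the first problem" rule, under assumptions: the three component names come from the PR overview, while the `'Unhealthy'` status value, the check order, and `summarizeHealth` itself are hypothetical and not taken from the PR.

```typescript
// Only 'Healthy' appears in the reviewed code; 'Unhealthy' is an assumed placeholder.
type ComponentStatus = 'Healthy' | 'Unhealthy'

interface ClusterStatus {
  api: ComponentStatus
  compiler: ComponentStatus
  runner: ComponentStatus
}

// Returns a single message: the first unhealthy component in a fixed order,
// or an all-clear message when every component is healthy.
function summarizeHealth(status: ClusterStatus): string {
  const order: (keyof ClusterStatus)[] = ['api', 'compiler', 'runner']
  const firstUnhealthy = order.find((name) => status[name] !== 'Healthy')
  return firstUnhealthy === undefined
    ? 'All components are healthy'
    : `The ${firstUnhealthy} is unhealthy`
}
```

When both the compiler and the runner are unhealthy, only the compiler is mentioned; the full picture is left to the health page.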

}
export interface TimelineGroup {
startTime: number
Contributor

what are these times? How do they differ from the timestamps in the events?

Contributor Author

If multiple non-healthy events are reported back-to-back, they are visually grouped into a single prolonged event with startTime and endTime; you can still click through the individual events in a group.
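
The grouping described here can be sketched as follows. This is an illustration, not the PR's implementation: `TimelineEvent` is reduced to a hypothetical two-field shape, `endTime` is assumed to exist alongside `startTime`, and `groupUnhealthyRuns` is an invented name.

```typescript
// Simplified, hypothetical event shape (timestamps as epoch milliseconds).
interface TimelineEvent {
  timestamp: number
  healthy: boolean
}

interface TimelineGroup {
  startTime: number
  endTime: number
  events: TimelineEvent[]
}

// Collapses each run of consecutive non-healthy events into one group.
// A healthy event closes the current run.
function groupUnhealthyRuns(events: TimelineEvent[]): TimelineGroup[] {
  const sorted = [...events].sort((a, b) => a.timestamp - b.timestamp)
  const groups: TimelineGroup[] = []
  let current: TimelineEvent[] = []

  const flush = () => {
    if (current.length === 0) return
    groups.push({
      startTime: current[0].timestamp,
      endTime: current[current.length - 1].timestamp,
      events: current
    })
    current = []
  }

  for (const event of sorted) {
    if (event.healthy) flush()
    else current.push(event)
  }
  flush() // the last run may still be open
  return groups
}
```

Each group keeps its member events, so the UI can draw one prolonged bar and still let the user step through the individual events.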

events: TimelineEvent[]
}
interface Props {
Contributor

what does Props mean?

Contributor Author

A common abbreviation in the Svelte ecosystem for Svelte component properties, which are passed in from the parent component.

events: TimelineEvent[]
startAt: Date
endAt: Date
unitDurationMs: number // milliseconds
Contributor

can you document these fields?
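
One way the requested documentation might read is sketched below. The field meanings are guesses from the names and from the surrounding discussion, not confirmed by the PR; the reduced `TimelineEvent` shape and the `barCount` helper are hypothetical and only show how the fields relate to each other.

```typescript
// Hypothetical minimal event shape for this sketch.
interface TimelineEvent {
  timestamp: Date
}

interface Props {
  /** Events to place on the timeline (assumed to fall within the window). */
  events: TimelineEvent[]
  /** Start of the displayed time window. */
  startAt: Date
  /** End of the displayed time window. */
  endAt: Date
  /** Time span covered by a single bar, in milliseconds. */
  unitDurationMs: number
}

// Number of bars needed to cover the window, rounding a partial unit up.
function barCount({ startAt, endAt, unitDurationMs }: Props): number {
  return Math.ceil((endAt.getTime() - startAt.getTime()) / unitDurationMs)
}
```

With a 72-hour window and one-hour units this yields 72 bars, which matches the bucketing discussed later in the thread.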

type: toEventType(e.api_status),
description:
e.api_status === 'Healthy'
? 'The API server is healthy.'
Contributor

"is" or "was"? This may be in the past

Contributor Author

Here specifically we expect a non-healthy event to be ongoing, so the wording reflects that.

}

/**
* Groups sequential unhealthy events with the same tag as a single incident based on the abscence of healthy events between them
Contributor

absence

// segment that can be "open". So: active = (we ended with a non-empty current).
//
// Implementation detail: we already pushed current if non-empty. To know if last segment is open,
// we can re-check whether the last event in sorted is non-healthy.
Contributor

in sorted order

}

// 6) Global sort by timestampFrom
allGroups.sort((a, b) => a.timestampFrom.getTime() - b.timestampFrom.getTime())
Contributor

why aren't they sorted by construction?
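
The thread does not record an answer. One plausible reason, stated here as an assumption: if groups are built per tag and then concatenated, each per-tag list is sorted but the concatenation is not, so a final sort is still needed. The `Group` shape and `mergeGroups` below are hypothetical.

```typescript
// Hypothetical group shape; only timestampFrom appears in the reviewed code.
interface Group {
  tag: string
  timestampFrom: Date
}

// Each inner array is assumed to be sorted already; the concatenation is not,
// so a global sort restores chronological order across tags.
function mergeGroups(perTag: Group[][]): Group[] {
  const allGroups = ([] as Group[]).concat(...perTag)
  allGroups.sort((a, b) => a.timestampFrom.getTime() - b.timestampFrom.getTime())
  return allGroups
}
```

A k-way merge would avoid the sort, but for a few dozen groups the plain sort is simpler and the cost is negligible.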

<button
onclick={() => (drawer.value = !drawer.value)}
class="fd fd-book-open btn-icon flex preset-tonal-surface text-[20px]"
aria-label="Open extras drawer"
Contributor

what does this do?

Contributor Author

aria-label? It is a recommended feature to enhance accessibility - it provides a human-readable description to narration tools for interactive UI elements that don't have visually readable text

Contributor

"open extras drawer" does not strike me as something that helps blind users

@snkas
Contributor

snkas commented Jan 6, 2026

> Are we trying to replace datadog or some other similar tools? I hope we can avoid doing that.

@mihaibudiu We are limiting the scope of this as much as possible -- the goal is to have the user be made aware of whether any of the components are operational, and whether they have been in the recent past.

@Karakatiza666 Thanks for the updates, I think with the individual bars it looks clearer.

  • The incident popup on the right looks really great, it's a satisfying slide to investigate further
  • The most important thing in terms of functionality is that all events are shown and that they can be individually investigated. This needs to be a direct mapping, without any filtering done client side. We want to avoid a situation where some events cannot be found. I would suggest keeping the current bucketing, but showing the timestamps of all the events in a bucket when you hover over it.
  • Each bucket can be 1h, so there would be 72 bars. If there is no event in the bucket, it should be gray but still be there.
  • Also healthy events are interesting to investigate
  • When I trigger an error in the compiler (e.g., just adding panic!("") in src/compiler/main.rs), and then restart without it so that it recovers, it does not show the new event. This indicates there is some logic doing filtering, which we want to avoid.
  • Event timestamps should just be their exact timestamps, not ranges. On the bar x-axis legend, there could be on the left "72h ago" and on the right "Today"
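
The bucketing proposed in this list could be sketched as below. It is an illustration under assumptions: the `HealthEvent` shape, the `green`/`red` colors, and `bucketByHour` are hypothetical; only the 72 one-hour bars and the gray color for empty buckets come from the comment. No event is dropped inside the window, so every bar keeps the exact timestamps of its events.

```typescript
type BarColor = 'gray' | 'green' | 'red'

// Hypothetical event shape for this sketch.
interface HealthEvent {
  timestamp: Date
  healthy: boolean
}

interface Bucket {
  start: Date
  events: HealthEvent[]
  color: BarColor
}

const HOUR_MS = 60 * 60 * 1000

// Builds `hours` one-hour buckets ending at `windowEnd`.
// Each bucket covers [start, start + 1h); empty buckets stay gray.
function bucketByHour(events: HealthEvent[], windowEnd: Date, hours = 72): Bucket[] {
  const startMs = windowEnd.getTime() - hours * HOUR_MS
  const buckets: Bucket[] = Array.from({ length: hours }, (_, i): Bucket => ({
    start: new Date(startMs + i * HOUR_MS),
    events: [],
    color: 'gray'
  }))

  for (const event of events) {
    const index = Math.floor((event.timestamp.getTime() - startMs) / HOUR_MS)
    if (index < 0 || index >= hours) continue // outside the displayed window
    buckets[index].events.push(event)
  }

  for (const bucket of buckets) {
    if (bucket.events.length === 0) continue
    bucket.color = bucket.events.every((e) => e.healthy) ? 'green' : 'red'
  }
  return buckets
}
```

A hover tooltip could then list `bucket.events.map((e) => e.timestamp)` to show the exact timestamp of every event in the bar.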

@Karakatiza666
Contributor Author

We discussed the UX with Simon at length, but did not come to a single conclusion.

The most straightforward adjustments for the current PR are:

  • Show gray bars when data is missing
  • Add a toggle (off by default) to display healthy intervals in the events log list
  • Add per-hour bucketing of both healthy and non-healthy intervals
  • Highlight the incidents related to the selected bar in the time series
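
The last item, highlighting incidents related to the selected bar, amounts to an interval overlap test. A minimal sketch, assuming a hypothetical `Incident` shape with `from`/`to` dates and an invented `incidentsInBucket` helper:

```typescript
// Hypothetical incident shape: a closed time range with an identifier.
interface Incident {
  id: string
  from: Date
  to: Date
}

// Returns the incidents whose time range overlaps the selected bar,
// where the bar covers [bucketStart, bucketStart + bucketDurationMs).
function incidentsInBucket(
  incidents: Incident[],
  bucketStart: Date,
  bucketDurationMs: number
): Incident[] {
  const startMs = bucketStart.getTime()
  const endMs = startMs + bucketDurationMs
  return incidents.filter(
    (incident) => incident.from.getTime() < endMs && incident.to.getTime() >= startMs
  )
}
```

The same predicate works in the other direction: given a selected incident, the bars to highlight are those whose range overlaps it.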

Update the design of user popup

Add health indicator to user popup

Signed-off-by: Karakatiza666 <[email protected]>
… cross-selection between timeline and incidents

Signed-off-by: Karakatiza666 <[email protected]>
Contributor

@snkas snkas left a comment

I'm not familiar enough with Web Console code to review it, but the UI looks great and works well as far as I can test. Only one request: instead of "No telemetry" for the gray bars label, can it be "No data"?

@Karakatiza666
Contributor Author

Screencast.from.2026-01-09.19-50-15.webm

@gz
Contributor

gz commented Jan 9, 2026

I can't see the mouse in the video but it made me wonder what those progress bars that pop up and disappear on the right frame represent?

How do we distinguish between service degradation and major issue?

@Karakatiza666
Contributor Author

Karakatiza666 commented Jan 9, 2026

> those progress bars that pop up and disappear

They indicate on-demand loading of per-event data. Once the content loads, the progress bar disappears and the text content is shown (blank space in the video)

@Karakatiza666
Contributor Author

Karakatiza666 commented Jan 9, 2026

> How do we distinguish between service degradation and major issue?

We don't really yet; once we do, it'll be backend-driven.

@Karakatiza666 Karakatiza666 added this pull request to the merge queue Jan 9, 2026
@Karakatiza666 Karakatiza666 removed this pull request from the merge queue due to a manual request Jan 9, 2026
@Karakatiza666 Karakatiza666 added this pull request to the merge queue Jan 9, 2026
Merged via the queue into main with commit 977bdd1 Jan 9, 2026
1 check passed
@Karakatiza666 Karakatiza666 deleted the monitoring-ui branch January 9, 2026 18:10
5 participants