-
Notifications
You must be signed in to change notification settings - Fork 92
Implement Cluster Health page #5336
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
d20178c to
8d73e8c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR implements a Cluster Health monitoring page and updates the user profile popup design. The changes include a new health monitoring dashboard with event timeline visualization, health status indicators in the user popup menu, and various UI refinements.
Key changes:
- New Cluster Health page (
/health) with event timeline and incident tracking - Health status indicator added to user popup with visual indicators for API, compiler, and runner status
- Updated user popup design with reorganized menu structure and health navigation
Reviewed changes
Copilot reviewed 23 out of 34 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
+page.svelte (health) |
New health monitoring page with event timeline, filtering tabs, and incident logging |
pipelineManager.ts |
Added API endpoints for cluster event retrieval |
health.ts |
Core health event processing logic including event grouping and status conversion |
date.ts |
Added ceilToHour utility for time calculations |
array.ts |
Extended comparison and groupBy functions to support Date types |
usePipelineManager.svelte.ts |
Integrated cluster event API methods |
useClusterHealth.svelte.ts |
New composition for polling and managing cluster health status |
VersionDisplay.svelte |
Updated styling with logo integration |
TabPerformance.svelte |
Minor formatting cleanup |
DarkModeSwitch.svelte |
Added className prop for styling flexibility |
PipelineConfigurationsPopup.svelte |
Refactored inline handler to named function |
AppHeader.svelte |
Integrated health status display |
HealthEventDetails.svelte |
New component for displaying detailed health event information |
EventLogList.svelte |
New component for incident log display with filtering |
DangerDialog.svelte |
Minor color adjustment |
ProfilePopupMenu.svelte |
Removed (functionality consolidated into ProfileButton) |
ProfileButton.svelte |
Redesigned with health indicator, reorganized menu, and consolidated profile display |
CurrentTenant.svelte |
Enhanced to handle single-tenant display differently |
StatusTimeline.svelte |
New component for visual timeline representation of health events |
CLAUDE.md |
New documentation placeholder |
| SVG/font files | Added new icons (log-out, key, chevron-left) and updated icon fonts |
cluster-monitoring.md |
Updated documentation to clarify event sorting order |
Comments suppressed due to low confidence (3)
js-packages/web-console/src/routes/(system)/(authenticated)/health/+page.svelte:1
- Commented-out filter code should either be removed or properly documented if it's intended for future use. Leaving commented code can cause confusion.
js-packages/web-console/src/lib/functions/pipelines/health.ts:1 - Corrected spelling of 'abscence' to 'absence'.
js-packages/web-console/src/routes/(system)/(authenticated)/health/+page.svelte:1 - Events are being filtered twice - once in the StatusTimeline component and once in splitClusterEvents. Consider filtering once and passing the filtered results to both components to avoid redundant computation.
js-packages/web-console/src/lib/compositions/health/useClusterHealth.svelte.ts
Outdated
Show resolved
Hide resolved
js-packages/web-console/src/lib/components/auth/ProfileButton.svelte
Outdated
Show resolved
Hide resolved
js-packages/web-console/src/lib/components/health/EventLogList.svelte
Outdated
Show resolved
Hide resolved
ffb9c3d to
75c4fa5
Compare
d5e70c6 to
34186a4
Compare
mihaibudiu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we trying to replace datadog or some other similar tools?
I hope we can avoid doing that.
|
|
||
| export interface PipelineEvent { | ||
| type: PipelineEventType | ||
| timestamp: string // ISO 8601 date string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you define a type for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will remove this type, it is actually not needed currently
| return { ...message, text } | ||
| }) | ||
| ) | ||
| const healthMessage = $derived( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if there are multiple issues? Why is only the first one important?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You will want to visit the health page if there are any issues with the cluster components. I figured just displaying one of them is sufficient - simpler UX (multiple things don't scream at you at the same time) and (slightly) code.
| } | ||
| export interface TimelineGroup { | ||
| startTime: number |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what are these times? How do they differ from the timestamps in the events?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If multiple non-healthy events are reported back-to-back they are visually grouped in a single prolonged event with startTime and endTime, while you can still click through the individual events in a group
| events: TimelineEvent[] | ||
| } | ||
| interface Props { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does Props mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Common abbreviation in Svelte ecosystem meaning Svelte Component Properties - passed from parent component
| events: TimelineEvent[] | ||
| startAt: Date | ||
| endAt: Date | ||
| unitDurationMs: number // milliseconds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you document these fields?
| type: toEventType(e.api_status), | ||
| description: | ||
| e.api_status === 'Healthy' | ||
| ? 'The API server is healthy.' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"is" or "was"? This may be in the past
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here specifically we expect a non-healthy event to be ongoing, so the wording reflects that
| } | ||
|
|
||
| /** | ||
| * Groups sequential unhealthy events with the same tag as a single incident based on the abscence of healthy events between them |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
absence
| // segment that can be "open". So: active = (we ended with a non-empty current). | ||
| // | ||
| // Implementation detail: we already pushed current if non-empty. To know if last segment is open, | ||
| // we can re-check whether the last event in sorted is non-healthy. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in sorted order
| } | ||
|
|
||
| // 6) Global sort by timestampFrom | ||
| allGroups.sort((a, b) => a.timestampFrom.getTime() - b.timestampFrom.getTime()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why aren't they sorted by construction?
| <button | ||
| onclick={() => (drawer.value = !drawer.value)} | ||
| class="fd fd-book-open btn-icon flex preset-tonal-surface text-[20px]" | ||
| aria-label="Open extras drawer" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does this do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aria-label? It is a recommended feature to enhance accessibility - it provides a human-readable description to narration tools for interactive UI elements that don't have visually readable text
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"open extras drawer" does not strike me as something that helps blind users
@mihaibudiu We are limiting the scope of this as much as possible -- the goal is to have the user be made aware of whether any of the components are operational, and whether they have been in the recent past. @Karakatiza666 Thanks for the updates, I think with the individual bars it looks clearer.
|
|
We discussed the UX with Simon at length, but did not come to a single conclusion. The most straightforward adjustments for the current PR are:
|
Update the design of user popup Add health indicator to user popup Signed-off-by: Karakatiza666 <[email protected]>
Signed-off-by: Karakatiza666 <[email protected]>
Signed-off-by: Karakatiza666 <[email protected]>
… cross-selection between timeline and incidents Signed-off-by: Karakatiza666 <[email protected]>
34186a4 to
64b4e4c
Compare
snkas
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not familiar enough with Web Console code to review it, but the UI looks great and works well as far as I can test. Only one request: instead of "No telemetry" for the gray bars label, can it be "No data"?
Screencast.from.2026-01-09.19-50-15.webm |
…ack navigation Signed-off-by: Karakatiza666 <[email protected]>
|
I can't see the mouse in the video but it made me wonder what those progress bars that pop up and disappear on the right frame represent? How do we distinguish between service degradation and major issue? |
They indicate on-demand loading of per-event data. Once the content loads, the progress bar disappears and the text content is shown (blank space in the video) |
We don't really yet, once we do it'll be backend-driven |
Signed-off-by: Karakatiza666 <[email protected]>
b8cccb0 to
c44a923
Compare





Update the design of user popup
Add health indicator to user popup