Skip to content

fix(gui-client): explicitly catch panics inside Handler::run#12028

Open
thomaseizinger wants to merge 2 commits intomainfrom
fix/gui-client-service-crash
Open

fix(gui-client): explicitly catch panics inside Handler::run#12028
thomaseizinger wants to merge 2 commits intomainfrom
fix/gui-client-service-crash

Conversation

@thomaseizinger
Copy link
Member

@thomaseizinger thomaseizinger commented Feb 5, 2026

Right now, when Handler::run within the tunnel process panics, we take down the whole process. Unless the OS or the user restarts the process, they won't be able to launch Firezone anymore. Note that we also already have a panic-catching boundary for connlib itself (which also lives inside the tunnel process).

Even though these panics should be rare, they do happen and in that case, it is a bad user experience. To fix that, we now explicitly catch these panics, log an error and restart the IPC listening loop, allowing a new instance of the GUI client to connect.

@vercel
Copy link

vercel bot commented Feb 5, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
firezone Ready Ready Preview, Comment Feb 5, 2026 5:26am

Request Review

Comment on lines +122 to +127
#[cfg(not(test))]
let device_id =
device_id::get_or_create_client().context("Failed to read / create device ID")?;

#[cfg(test)]
let device_id = device_id::DeviceId::test();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not super happy about this but I think it strikes a good trade-off. We don't want to create the device ID outside because ipc_listen is already the top-level abstraction that is shared across Linux and Windows. Yet, inside the test, we don't want to interact with the local ID because it requires elevated permissions.

So even though it is not pretty, I think a bit a cfg code here strikes a nice balance.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR enhances the robustness of the GUI client tunnel service by adding panic handling to the Handler::run method. When a panic occurs in the handler, instead of terminating the entire process, the service now catches the panic, logs an error, and restarts the IPC listening loop, allowing new GUI clients to connect.

Changes:

  • Added panic catching with AssertUnwindSafe and catch_unwind around Handler::run to prevent process termination
  • Introduced SocketId parameter to ipc_listen for testability
  • Added test infrastructure including DeviceId::test() helper and ClientMsg::Panic variant
  • Improved span instrumentation by using .instrument() instead of .entered() for async functions

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
rust/libs/bin-shared/src/device_id.rs Added test() helper method to create test DeviceId instances
rust/gui-client/src-tauri/src/service/windows.rs Updated to pass SocketId::Tunnel parameter to ipc_listen
rust/gui-client/src-tauri/src/service/linux.rs Updated to pass SocketId::Tunnel parameter to ipc_listen
rust/gui-client/src-tauri/src/service.rs Core changes: panic handling, test support, conditional device_id creation, span instrumentation improvements, and new test case

Comment on lines +830 to +858
#[tokio::test]
async fn panic_inside_handler_doesnt_interrupt_service() {
let _guard = logging::test("debug");

let id = SocketId::Test(rand::random());

let handle = tokio::spawn(async move {
let (_, log_filter_reloader) = logging::try_filter::<()>("info").unwrap();
let mut signals = signals::Terminate::new().unwrap();

ipc_listen(
DnsControlMethod::default(),
&log_filter_reloader,
id,
&mut signals,
)
.await
});

let (_, mut tx) = ipc::connect::<ServerMsg, ClientMsg>(id, ipc::ConnectOptions::default())
.await
.unwrap();

tx.send(&ClientMsg::Panic).await.unwrap();

let _ = tokio::time::timeout(Duration::from_secs(1), handle)
.await
.unwrap_err(); // We want to timeout because that means the task is still running.
}
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test verifies that the service task continues running after a panic, but it doesn't verify that the service can accept a new connection. Consider adding an assertion that a second client can successfully connect after the panic to fully validate the fix. This would demonstrate that the IPC listening loop has truly restarted and is ready to accept new connections.

Copilot uses AI. Check for mistakes.
Comment on lines +28 to +35
impl DeviceId {
pub fn test() -> Self {
Self {
id: String::new(),
source: Source::HardwareId,
}
}
}
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The DeviceId::test() method is missing a #[cfg(test)] attribute. Since this method is only used for testing purposes, it should be conditionally compiled to avoid including test-only code in production builds.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant