fix(gui-client): explicitly catch panics inside Handler::run #12028

thomaseizinger wants to merge 2 commits into main

Conversation
```rust
#[cfg(not(test))]
let device_id =
    device_id::get_or_create_client().context("Failed to read / create device ID")?;

#[cfg(test)]
let device_id = device_id::DeviceId::test();
```
I am not super happy about this, but I think it strikes a good trade-off. We don't want to create the device ID outside, because `ipc_listen` is already the top-level abstraction that is shared across Linux and Windows. Yet, inside the test, we don't want to interact with the local ID, because it requires elevated permissions.

So even though it is not pretty, I think a bit of `cfg` code here strikes a nice balance.
Pull request overview

This PR enhances the robustness of the GUI client tunnel service by adding panic handling to the `Handler::run` method. When a panic occurs in the handler, instead of terminating the entire process, the service now catches the panic, logs an error, and restarts the IPC listening loop, allowing new GUI clients to connect.

Changes:
- Added panic catching with `AssertUnwindSafe` and `catch_unwind` around `Handler::run` to prevent process termination
- Introduced a `SocketId` parameter to `ipc_listen` for testability
- Added test infrastructure, including a `DeviceId::test()` helper and a `ClientMsg::Panic` variant
- Improved span instrumentation by using `.instrument()` instead of `.entered()` for async functions
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| rust/libs/bin-shared/src/device_id.rs | Added test() helper method to create test DeviceId instances |
| rust/gui-client/src-tauri/src/service/windows.rs | Updated to pass SocketId::Tunnel parameter to ipc_listen |
| rust/gui-client/src-tauri/src/service/linux.rs | Updated to pass SocketId::Tunnel parameter to ipc_listen |
| rust/gui-client/src-tauri/src/service.rs | Core changes: panic handling, test support, conditional device_id creation, span instrumentation improvements, and new test case |
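The span-instrumentation bullet refers to a known `tracing` pitfall: holding the guard from `.entered()` across an `.await` point leaves the span entered while the executor polls other tasks on the same thread. A hedged sketch of the pattern (the function and span names are illustrative, not the PR's actual code):

```rust
use tracing::{info_span, Instrument as _};

async fn handle_ipc_client() {
    // Wrong for async code: the guard would stay active across `.await`
    // points and leak the span onto unrelated tasks:
    //     let _guard = info_span!("handler").entered();
    //
    // Right: attach the span to the future itself, so it is entered
    // exactly while this future is being polled.
    async {
        tracing::info!("handling IPC client");
    }
    .instrument(info_span!("handler"))
    .await;
}
```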
```rust
#[tokio::test]
async fn panic_inside_handler_doesnt_interrupt_service() {
    let _guard = logging::test("debug");

    let id = SocketId::Test(rand::random());

    let handle = tokio::spawn(async move {
        let (_, log_filter_reloader) = logging::try_filter::<()>("info").unwrap();
        let mut signals = signals::Terminate::new().unwrap();

        ipc_listen(
            DnsControlMethod::default(),
            &log_filter_reloader,
            id,
            &mut signals,
        )
        .await
    });

    let (_, mut tx) = ipc::connect::<ServerMsg, ClientMsg>(id, ipc::ConnectOptions::default())
        .await
        .unwrap();

    tx.send(&ClientMsg::Panic).await.unwrap();

    let _ = tokio::time::timeout(Duration::from_secs(1), handle)
        .await
        .unwrap_err(); // We want to timeout because that means the task is still running.
}
```
The test verifies that the service task continues running after a panic, but it doesn't verify that the service can accept a new connection. Consider adding an assertion that a second client can successfully connect after the panic to fully validate the fix. This would demonstrate that the IPC listening loop has truly restarted and is ready to accept new connections.
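Concretely, the suggested assertion could look something like the following at the end of the test (a sketch reusing the test's own helpers; not part of the PR):

```rust
// After the panic, a second client should be able to connect,
// which proves the IPC listening loop actually restarted.
let (_, mut _tx2) = ipc::connect::<ServerMsg, ClientMsg>(id, ipc::ConnectOptions::default())
    .await
    .unwrap();
```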
```rust
impl DeviceId {
    pub fn test() -> Self {
        Self {
            id: String::new(),
            source: Source::HardwareId,
        }
    }
}
```
The `DeviceId::test()` method is missing a `#[cfg(test)]` attribute. Since this method is only used for testing purposes, it should be conditionally compiled to avoid including test-only code in production builds.
Right now, when `Handler::run` within the tunnel process panics, we take down the whole process. Unless the OS or the user restarts the process, they won't be able to launch Firezone anymore. Note that we already have a panic-catching boundary for connlib itself (which also lives inside the tunnel process).

Even though these panics should be rare, they do happen and, in that case, it is a bad user experience. To fix that, we now explicitly catch these panics, log an error, and restart the IPC listening loop, allowing a new instance of the GUI client to connect.
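The shape of the fix, as described, is roughly the following (an illustrative sketch: the `Handler` construction and surrounding loop are assumptions, and `catch_unwind` here is the future-aware `futures::FutureExt::catch_unwind` rather than the blocking `std::panic` version):

```rust
use futures::FutureExt as _;
use std::panic::AssertUnwindSafe;

loop {
    let mut handler = Handler::new(/* ... */).await?; // wait for a GUI client to connect

    // A panic now tears down only this handler, not the whole tunnel service.
    match AssertUnwindSafe(handler.run()).catch_unwind().await {
        Ok(result) => break result, // normal exit, e.g. on a terminate signal
        Err(_panic) => {
            tracing::error!("`Handler::run` panicked; restarting IPC listening loop");
            // Loop around and accept a new GUI client.
        }
    }
}
```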