Take the proposed resolution with a pinch of salt — drafted quickly with AI assistance. The diagnosis (sync websocket call blocking the event loop) is verified against live logs, but the exact fix shape deserves maintainer review.
Summary
send_nostr_dm() in lnbits/core/services/nostr.py uses the synchronous websocket.create_connection() inside an async function and iterates relays sequentially. One unreachable or slow relay blocks the asyncio event loop for up to ~30 seconds per attempt, making NWC Provider and other WebSocket-based extensions unresponsive during that time.
This is a separate event-loop-starvation bug from #3917 / #3918 (IN_FLIGHT payment polling). After applying the #3918 fix, this became the next visible bottleneck.
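The starvation mechanism can be reproduced without any relay. The sketch below is a minimal illustration, not LNbits code: `fake_create_connection` is a hypothetical stand-in that uses `time.sleep` to imitate a sync connect hanging on a dead relay, and a heartbeat task records how long the event loop goes without running anything else.

```python
import asyncio
import time


def fake_create_connection(relay: str, delay: float) -> None:
    """Hypothetical stand-in for a sync connect that hangs on a dead relay."""
    time.sleep(delay)  # blocks the calling thread, as a sync socket connect would


async def heartbeat(gaps: list, interval: float = 0.05) -> None:
    """Record the time between wake-ups; a large gap means the loop was stalled."""
    last = time.monotonic()
    while True:
        await asyncio.sleep(interval)
        now = time.monotonic()
        gaps.append(now - last)
        last = now


async def send_blocking(relays: list, delay: float) -> None:
    """Same shape as the reported loop: sync calls, one relay after another."""
    for relay in relays:
        fake_create_connection(relay, delay)  # no await, so the loop cannot switch tasks


async def measure_stall(relay_count: int = 3, delay: float = 0.2) -> float:
    """Return the longest heartbeat gap observed while the relays are tried."""
    gaps: list = []
    task = asyncio.create_task(heartbeat(gaps))
    await asyncio.sleep(0.1)  # let the heartbeat tick normally first
    await send_blocking([f"relay-{i}" for i in range(relay_count)], delay)
    await asyncio.sleep(0.1)  # give the heartbeat a chance to observe the stall
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return max(gaps)


if __name__ == "__main__":
    print(f"longest heartbeat gap: {asyncio.run(measure_stall()):.2f}s")
```

With three simulated relays at 0.2 s each, the heartbeat that should fire every 0.05 s is delayed by roughly the sum of the blocking calls. Other coroutines on the same loop would see the same delay.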
Offending code
lnbits/core/services/nostr.py, line ~30:
    ws_connections: list[WebSocket] = []
    for relay in relays:
        try:
            ws = create_connection(relay, timeout=2)  # sync call in async fn
            ws.send(notification)
            ws_connections.append(ws)
        except Exception as e:
            logger.warning(f"Error sending notification to relay {relay}: {e}")
    await asyncio.sleep(1)
Issues:
- create_connection() is synchronous: it blocks the event loop on TCP connect + TLS handshake
- The timeout=2 is not always honored (we observe 30s blocks in production against wss://relay.snort.social and wss://relay.nostr.band)
- Relays are tried sequentially: 3 dead relays = 3 × the blocking time, added up
- ws.send() is also sync
With 3 dead relays in the user's configured list, a single send_nostr_dm() call can block the event loop for 60–90 seconds.
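One possible reason a timeout=2 does not bound the whole call: Python socket timeouts apply to each blocking operation separately, not to a sequence of them. This is an assumption about the cause, not something verified against the websocket client's source, and it does not establish which stage stalls for 30 s in production. The sketch below shows the general behavior using only the standard library, with a hypothetical local server that trickles one byte at a time.

```python
import socket
import threading
import time


def trickle_server(listener: socket.socket, chunks: int, pause: float) -> None:
    """Accept one client and send one byte every `pause` seconds."""
    conn, _ = listener.accept()
    with conn:
        for _ in range(chunks):
            time.sleep(pause)
            conn.sendall(b"x")


def timed_read(timeout: float = 0.5, chunks: int = 5, pause: float = 0.2):
    """Read `chunks` bytes with a socket timeout; return (bytes_read, seconds)."""
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    server = threading.Thread(target=trickle_server, args=(listener, chunks, pause))
    server.start()

    start = time.monotonic()
    client = socket.create_connection(listener.getsockname(), timeout=timeout)
    received = b""
    with client:
        while len(received) < chunks:
            chunk = client.recv(1)  # every recv gets a fresh `timeout` budget
            if not chunk:  # server closed early
                break
            received += chunk
    elapsed = time.monotonic() - start

    server.join()
    listener.close()
    return len(received), elapsed


if __name__ == "__main__":
    count, seconds = timed_read()
    print(f"read {count} bytes in {seconds:.2f}s with a 0.5s socket timeout")
```

No single read exceeds the 0.5 s timeout, so no exception is raised, yet the exchange as a whole takes about twice that. A connect that involves several socket operations (TCP connect, TLS handshake, HTTP upgrade) could overrun its nominal timeout in the same way.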
Symptoms
- NWC list_transactions, get_balance, pay_invoice intermittently time out
- docker logs repeatedly shows:
    WARNING | Error sending notification to relay wss://nostr.wine: Handshake status 403 Forbidden
    WARNING | Error sending notification to relay wss://relay.snort.social: Connection timed out
    WARNING | Error sending notification to relay wss://relay.nostr.band: timed out
- Web UI works (HTTP served separately), masking the issue
- Only recovery from a pile-up is docker restart
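For anyone trying to confirm this class of problem independently of the relay warnings, asyncio's debug mode logs any callback that holds the loop longer than `loop.slow_callback_duration` (0.1 s by default). The sketch below is a generic illustration rather than an LNbits diagnostic: `blocking_handler` is a hypothetical coroutine that blocks with `time.sleep`, and the asyncio logger's warnings are captured in a list.

```python
import asyncio
import logging
import time


async def blocking_handler() -> None:
    """Hypothetical coroutine that blocks the loop, like a sync connect would."""
    time.sleep(0.3)


class ListHandler(logging.Handler):
    """Collect formatted log messages in memory."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


def find_slow_callbacks() -> list:
    """Run the handler with asyncio debug mode on and return its warnings."""
    log = logging.getLogger("asyncio")
    handler = ListHandler()
    previous_level = log.level
    log.addHandler(handler)
    log.setLevel(logging.WARNING)
    try:
        asyncio.run(blocking_handler(), debug=True)  # debug=True enables slow-callback logging
    finally:
        log.removeHandler(handler)
        log.setLevel(previous_level)
    return handler.messages


if __name__ == "__main__":
    for message in find_slow_callbacks():
        print(message)
```

Debug mode can also be switched on for a whole process with the `PYTHONASYNCIODEBUG=1` environment variable. It adds overhead, so it is better suited to a reproduction environment than to production.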
Proposed fix
Two options (either or both):
A. Offload sync calls via asyncio.to_thread and run concurrently:
    async def _publish(relay: str):
        try:
            # Run the blocking connect in a worker thread so the event loop stays free.
            ws = await asyncio.wait_for(
                asyncio.to_thread(create_connection, relay, timeout=2),
                timeout=3,
            )
            await asyncio.to_thread(ws.send, notification)
            return ws
        except Exception as e:
            logger.warning(f"Error sending notification to relay {relay}: {e}")
            return None

    # Publish to all relays concurrently instead of one after another.
    results = await asyncio.gather(*(_publish(r) for r in relays))
    ws_connections = [ws for ws in results if ws]
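To check that the Option A shape behaves as intended, here is a self-contained sketch that runs without any relay. `fake_create_connection` is a hypothetical stand-in for the real connect call: relays whose name starts with "dead" hang for 0.3 s and then fail, all others succeed immediately.

```python
import asyncio
import time
from typing import Optional


def fake_create_connection(relay: str, timeout: float) -> str:
    """Hypothetical stand-in for the sync connect; `timeout` is accepted but unused."""
    if relay.startswith("dead"):
        time.sleep(0.3)  # simulate a hang before the failure is reported
        raise TimeoutError(f"{relay}: timed out")
    return f"connection-to-{relay}"


async def publish(relay: str) -> Optional[str]:
    """Connect in a worker thread; return the connection, or None on failure."""
    try:
        return await asyncio.wait_for(
            asyncio.to_thread(fake_create_connection, relay, timeout=2),
            timeout=3,
        )
    except Exception:
        return None


async def publish_all(relays: list) -> list:
    """Try every relay concurrently and keep only the successful connections."""
    results = await asyncio.gather(*(publish(r) for r in relays))
    return [ws for ws in results if ws is not None]


async def demo():
    """Return (connections, elapsed seconds) for three dead relays and one live one."""
    start = time.monotonic()
    connections = await publish_all(["dead-1", "dead-2", "dead-3", "live-1"])
    return connections, time.monotonic() - start


if __name__ == "__main__":
    connections, elapsed = asyncio.run(demo())
    print(connections, f"{elapsed:.2f}s")
```

The three dead relays fail in parallel, so the total is close to one hang rather than three. One caveat for review: `asyncio.wait_for` only stops waiting. The worker thread keeps running until the sync call returns, so a connection that succeeds after the outer timeout would be left unclosed unless it is handled explicitly, and long hangs occupy threads in the default executor.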
B. Switch to an async websocket library (websockets or aiohttp) — larger change but removes the sync/async footgun entirely.
Option A is minimally invasive and fixes the starvation. Happy to open a PR with that approach.
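For comparison, a rough sketch of what Option B could look like. It is an illustration under assumptions, not a drop-in patch: the `connect` parameter is any callable that returns an async context manager yielding an object with an awaitable `send()`, and `FakeRelay` is a hypothetical test double so the example runs without a network or third-party packages.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def _send_once(connect, relay: str, notification: str) -> None:
    """Open a connection, send one message, and close it again."""
    async with connect(relay) as ws:
        await ws.send(notification)


async def publish(connect, relay: str, notification: str, timeout: float = 2.0) -> bool:
    """Bound connect + send together with one timeout; return True on success."""
    try:
        await asyncio.wait_for(_send_once(connect, relay, notification), timeout)
        return True
    except Exception as exc:
        logger.warning(f"Error sending notification to relay {relay}: {exc!r}")
        return False


async def publish_all(connect, relays: list, notification: str, timeout: float = 2.0) -> list:
    """Publish to all relays concurrently; one bool per relay, in input order."""
    tasks = (publish(connect, r, notification, timeout) for r in relays)
    return list(await asyncio.gather(*tasks))


class FakeRelay:
    """Hypothetical test double: relays named 'dead-*' never finish connecting."""

    def __init__(self, relay: str) -> None:
        self.relay = relay
        self.sent: list = []

    async def __aenter__(self) -> "FakeRelay":
        if self.relay.startswith("dead"):
            await asyncio.sleep(10)  # cancelled by wait_for when the timeout fires
        return self

    async def __aexit__(self, *exc_info) -> bool:
        return False

    async def send(self, message: str) -> None:
        self.sent.append(message)


if __name__ == "__main__":
    print(asyncio.run(publish_all(FakeRelay, ["dead-1", "live-1"], "msg", timeout=0.2)))
```

Because the connect is a coroutine here, the timeout cancels it outright instead of leaving a thread running. With the third-party websockets package, passing `websockets.connect` as `connect` should fit this shape, but that has not been tested against LNbits. This sketch also closes each connection right after sending, whereas the current code keeps connections open for a second first; whether relays need that delay is an open question for maintainers.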
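Related
- #3917 / #3918 (IN_FLIGHT payment polling): a separate event-loop-starvation bug
- lnbits/core/services/nostr.py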
Environment
- LNbits v1.5.3
- LNDRest backend
- Python 3.x, asyncio