Bug
tests/fixtures/utils.py provides is_port_open() which is used by fixture availability checks (e.g. spark_available(), and similar guards in other fixtures):
```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except (TimeoutError, OSError):
        return False
```
A naive TCP connect returns True for any service that accepts the connection — including system-level HTTP daemons (enterprise security agents, VPN clients, corporate endpoint tools) that bind common ports and respond with HTTP.
When a fixture then opens a binary protocol connection (Thrift for Spark, etc.) to that port, the driver misinterprets the HTTP response as binary framing data and hangs indefinitely — the test never times out, and the test suite stalls.
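A minimal reproduction of the false positive, using only the standard library. A throwaway `http.server` listener on an ephemeral loopback port stands in for the endpoint agent. The `FakeInterceptor` handler and the port choice are illustrative assumptions, not part of the test suite. The `is_port_open()` check quoted above reports that listener as open:

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    # Naive check, as in tests/fixtures/utils.py: any accepted connection counts.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except (TimeoutError, OSError):
        return False


class FakeInterceptor(BaseHTTPRequestHandler):
    """Hypothetical stand-in for an endpoint agent that speaks HTTP."""

    def do_GET(self):
        self.send_response(200)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence per-request logging


# Port 0 lets the OS pick a free ephemeral port on loopback.
server = HTTPServer(("127.0.0.1", 0), FakeInterceptor)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

result = is_port_open("127.0.0.1", port)
print(result)  # True: the check cannot tell this HTTP listener from a Thrift server

server.shutdown()
server.server_close()
```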
Root cause (concrete example)
A Qualys Cloud Agent (agentid-service) binds TCP port 10001 on all interfaces on macOS. It runs as root, so lsof -nP -iTCP:10001 shows nothing without sudo. is_port_open("localhost", 10001) returns True.
pyhive's Thrift binary transport then reads the first 5 bytes of the HTTP response (HTTP/) as a frame header: status byte 0x48 ('H') plus a 4-byte big-endian length 0x5454502F ('TTP/') = 1,414,811,695 bytes (about 1.3 GiB). The driver blocks waiting for that data. TCP stays open; the call never returns.
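The arithmetic can be checked with `struct`. This sketch assumes the 1-byte status plus 4-byte big-endian length layout described above:

```python
import struct

# First five bytes a client reads back when an HTTP daemon answers the socket.
header = b"HTTP/"

# Interpret them as a 1-byte status followed by a 4-byte big-endian payload length.
status, length = struct.unpack(">BI", header)

print(hex(status), status)          # 0x48 72
print(hex(length), length)          # 0x5454502f 1414811695
print(f"{length / 2**30:.2f} GiB")  # 1.32 GiB
```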
Fix
Replace the naive socket check with a protocol probe that distinguishes binary servers from HTTP interceptors:
```python
import socket


def probe_port(host: str, port: int, timeout: float = 0.5) -> str:
    """Classify a TCP port: 'refused', 'http', 'binary', or 'timeout'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)  # per-operation timeout for sendall/recv below
            try:
                # An HTTP daemon answers this immediately with a status line.
                sock.sendall(f"GET / HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii"))
                data = sock.recv(8)
                return "http" if data.startswith(b"HTTP/") else "binary"
            except socket.timeout:
                return "binary"  # binary protocol server: didn't respond to HTTP
    except ConnectionRefusedError:
        return "refused"
    except (TimeoutError, OSError):
        return "timeout"  # connect timed out or failed for another network reason
```
Fixture availability checks for binary protocol servers should use probe_port(host, port) == "binary" instead of is_port_open(host, port).
Two additional helpers are also useful:
find_safe_port(start, end) — scans for the first port returning 'refused' (truly free for compose port mapping)
find_thrift_port(host, start, end) — scans for the first port returning 'binary' (auto-discovers a running binary server without hardcoding the port)
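The two helpers are named above but not shown. The following is a minimal sketch of how they could sit on top of probe_port(). Several details are assumptions, not from the report:

- `end` is exclusive.
- Each helper returns `None` when nothing matches.
- find_safe_port() probes 127.0.0.1 (its `host` keyword is an addition).
- spark_available() is only a hypothetical guard with an assumed default of localhost:10001.

probe_port() is repeated so the block runs on its own.

```python
import socket
from typing import Optional


def probe_port(host: str, port: int, timeout: float = 0.5) -> str:
    """Classify a TCP port: 'refused', 'http', 'binary', or 'timeout'."""
    try:
        # create_connection leaves `timeout` set on the socket, so sendall/recv are bounded too.
        with socket.create_connection((host, port), timeout=timeout) as sock:
            try:
                sock.sendall(f"GET / HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii"))
                data = sock.recv(8)
                return "http" if data.startswith(b"HTTP/") else "binary"
            except socket.timeout:
                return "binary"  # accepted the connection but ignored the HTTP probe
    except ConnectionRefusedError:
        return "refused"
    except OSError:
        return "timeout"


def find_safe_port(start: int, end: int, host: str = "127.0.0.1") -> Optional[int]:
    """Return the first port in [start, end) that refuses connections, else None."""
    for port in range(start, end):
        if probe_port(host, port) == "refused":
            return port
    return None


def find_thrift_port(host: str, start: int, end: int) -> Optional[int]:
    """Return the first port in [start, end) whose listener looks binary, else None."""
    for port in range(start, end):
        if probe_port(host, port) == "binary":
            return port
    return None


def spark_available(host: str = "localhost", port: int = 10001) -> bool:
    # Hypothetical fixture guard: only a listener that ignores the HTTP probe counts.
    return probe_port(host, port) == "binary"
```

A listener that accepts and stays silent is classified as 'binary', so find_thrift_port() only rules out HTTP interceptors. It can still land on some other non-HTTP service. Scanning is sequential, and every non-refused port costs up to `timeout` seconds. A port reported as 'refused' can also be taken by another process before compose binds it.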
Detection
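```sh
# Detect HTTP interceptors on a port (no sudo required):
curl -sS -i --connect-timeout 2 --max-time 5 http://localhost:PORT/
# -i prints the status line even when the body is empty.
# Any HTTP response means the listener is an HTTP service, not your binary (Thrift) container.
# --max-time bounds the whole request: a Thrift server accepts the connection and never answers,
# so without it curl can block long after --connect-timeout has passed.

# See who owns any port (shows root-owned processes too):
netstat -anv | grep "\.PORT "
```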
Reproduced on: macOS 15, Podman Desktop, pyhive 0.7.0 against Spark Thrift Server.