The Python SDK for Provably. Adds a deterministic eval layer (verifiable guardrails — distinct from proof verification) to any Python agent: every outbound HTTP call is recorded, every claim handed off to another agent is evaluated against a trusted Provably query record, and the policy edge — which endpoints an agent is allowed to talk to at all — is enforced before the request leaves the process.
- Distribution name: provably-sdk (PyPI publish pending; see below)
- Import name: provably
- Source layout: src/provably/

- What it does
- Framework coverage
- Install
- Quick start
- Configuration
- The five pillars
- Public API
- Development
- Tests
- Docker
- Security model
- Status
flowchart LR
agent[Your agent]
patch[Provably interceptor<br/>monkey-patched requests + httpx]
reg[("trusted_endpoints<br/><i>your Postgres</i>")]
ext[(External API)]
store[("provably_intercepts<br/><i>your Postgres</i>")]
eval[Eval service<br/>verifiable guardrails]
rec[("Provably query record<br/><i>Provably backend</i>")]
agent -- "requests.get / httpx.post" --> patch
patch -- "1 trusted?" --> reg
reg -- "yes / BLOCKED" --> patch
patch -- "2 request" --> ext
ext -- "3 response" --> patch
patch -- "4 store row" --> store
patch -- "response" --> agent
agent -- "post_handoff(HandoffPayload)" --> eval
eval -- "evaluate_handoff:<br/>fetch query record" --> rec
eval -- "PASS / CAUGHT" --> agent
The flow, in order:
- Intercept + Police: every outbound requests / httpx / aiohttp call goes through the SDK's monkey-patched HTTP path. Inside the interceptor, before the request leaves the process, the URL is checked against the trusted_endpoints table. If the URL is not registered, the call is killed with RuntimeError("BLOCKED: ...") and never reaches the network.
- Capture + Store: if the endpoint is trusted, the request goes out, the response is captured (status + headers + raw body), canonicalized, and inserted by the interceptor into provably_intercepts. The agent only sees the response after the row is written.
- Hand off: when an agent finishes its work it builds a typed HandoffPayload (one HandoffClaim per external call, describing what the agent claims about that response) and ships it to the next agent / service via post_handoff(...).
- Eval: the receiving service runs evaluate_handoff(payload) (the SDK's verifiable-guardrails check; not to be confused with proof verification). For each claim the evaluator pulls the corresponding query record from the Provably backend and runs one of four deterministic comparisons (verbatim, field_extraction, schema_type, range_threshold); the result is PASS, CAUGHT, or ERROR per claim.

Nothing in this loop relies on a model self-evaluating its own output.
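
The ordering of the first two steps (check policy, send, store, then return) can be sketched in plain Python. This is only an illustration of the control flow, not the SDK's implementation: guarded_send, fake_transport, and the in-memory TRUSTED and INTERCEPTS containers are hypothetical stand-ins for the interceptor and the two Postgres tables.

```python
from typing import Callable

# Hypothetical in-memory stand-ins for the trusted_endpoints and
# provably_intercepts tables; the real SDK keeps both in Postgres.
TRUSTED = {"https://api.example.com/v1/data"}
INTERCEPTS: list[dict] = []


def guarded_send(url: str, send: Callable[[str], dict]) -> dict:
    """Check policy, send, store the row, then hand back the response."""
    if url not in TRUSTED:
        # Blocked before any network I/O or storage happens.
        raise RuntimeError(f"BLOCKED: {url} is not a trusted endpoint")
    response = send(url)  # the real transport call would go here
    INTERCEPTS.append({"url": url, "status": response["status"], "body": response["body"]})
    return response  # the caller sees the response only after the row exists


def fake_transport(url: str) -> dict:
    """Hypothetical transport that always answers 200."""
    return {"status": 200, "body": '{"ok": true}'}


guarded_send("https://api.example.com/v1/data", fake_transport)  # stored, then returned
try:
    guarded_send("https://unlisted.example/x", fake_transport)
except RuntimeError as exc:
    blocked_message = str(exc)  # the unlisted call never reached fake_transport
```

Because the policy check runs first, a blocked call leaves no row behind, which matches the description of enforcement happening before storage.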
| Component | Hosted by | Notes |
|---|---|---|
| trusted_endpoints table | You. Sits in whatever Postgres POSTGRES_URL points to. | The SDK ships the schema (ensure_trusted_endpoints_table), the policy check, and CRUD helpers; it does not host the registry. Same DB instance as provably_intercepts. |
| provably_intercepts table | You. Same Postgres as above. | Append-only. The interceptor inserts one row per outbound HTTP call, keyed by query_record_id so claims can be linked back. |
| Eval service | You. Any HTTP service that calls provably.evaluate_handoff(...) on the incoming payload. | The SDK gives you the function; you decide where to host it. |
| Provably query record | Provably. Fetched over HTTPS by the eval service using the integration_api_key from the handoff payload. | This is the source of truth the evaluator compares each claim against. |

The interceptor patches the central HTTP transport choke points, so coverage of agent frameworks follows automatically from which library a framework uses under the hood. As of v0.3.0:
Transport patches
| Transport | Patched at |
|---|---|
| requests | module-level get/post + Session.send |
| httpx | module-level get/post + Client.send + AsyncClient.send |
| aiohttp | ClientSession._request (soft dep: patches only when aiohttp is importable) |
| botocore / urllib3 | pending (see issue #10) |

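
Why patching a single choke point covers every caller can be shown with a toy class. The sketch below is an assumption-laden illustration: ToySession, install_patch, and observed are hypothetical names, and the real patches target the transport methods listed in the table above.

```python
import functools


class ToySession:
    """Hypothetical stand-in for a transport class such as requests.Session."""

    def send(self, url: str) -> str:
        return f"response from {url}"

    def get(self, url: str) -> str:
        # Convenience helpers funnel into send(), the choke point.
        return self.send(url)


observed: list[str] = []


def install_patch() -> None:
    """Wrap ToySession.send once; calling this again is a no-op."""
    original_send = ToySession.send
    if getattr(original_send, "_patched", False):
        return

    @functools.wraps(original_send)
    def patched_send(self, url):
        observed.append(url)  # record before delegating to the original
        return original_send(self, url)

    patched_send._patched = True
    ToySession.send = patched_send


install_patch()
install_patch()  # idempotent: the method is not wrapped twice
reply = ToySession().get("https://a.example")
```

Any library that reaches the network through the patched method is observed without knowing about the patch, which is why framework coverage follows from the underlying HTTP library.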
Agent / LLM frameworks
| Framework | Status | Notes |
|---|---|---|
| OpenAI SDK | ✅ | httpx |
| Anthropic SDK | ✅ | httpx |
| Pydantic AI | ✅ | delegates to AsyncOpenAI / AsyncAnthropic |
| LangChain | ✅ | delegates to provider SDKs |
| LangGraph | ✅ | same |
| LlamaIndex | ✅ | httpx via OpenAI SDK |
| AutoGen | ✅ | AsyncOpenAI |
| Haystack | ✅ | migrated to httpx (2024–25) |
| Phidata / Agno | ✅ | AsyncOpenAI / httpx[http2] |
| OpenAI Agents SDK | ✅ | httpx; e2e suite at tests/e2e/test_openai_agents_e2e.py; demo at examples/openai_agents/ |
| Google GenAI | ✅ | httpx default + optional aiohttp extra |
| LiteLLM | ✅ | aiohttp transport (default since v1.71) |
| DSPy | ✅ | LiteLLM only |
| smolagents | ✅ | OpenAI SDK / HF / LiteLLM paths covered |
| CrewAI | OpenAI/Anthropic ✅, LiteLLM fallback ✅, Bedrock provider ❌ (boto3) | |
| AWS Strands | ❌ | boto3/botocore → urllib3; tracked in issue #10 |
Out of scope for the HTTP interception layer (separate shipping units):
MCP servers, in-process LLMs (transformers, mlx_lm), gRPC (Google ADK
A2A), websockets, raw sockets.
Status: v0.2 — not yet published to PyPI. Install from source.
# from source (editable, recommended for now)
git clone [email protected]:ProvablyAI/provably-python-sdk.git
pip install -e ./provably-python-sdk
# or build a wheel
cd provably-python-sdk && python -m build
pip install dist/provably_sdk-0.2.0-py3-none-any.whl

When PyPI publishing lands the install will become:

pip install provably-sdk

The intended PyPI distribution name is provably-sdk. The import name is
provably. Requires Python 3.11+.
import provably
import requests
# One-call bootstrap: initialize runtime + install interceptor + enable recording
provably.configure_indexing(True)
response = requests.get("https://my-trusted-endpoint.example/data")
record = response.json()
payload = provably.HandoffPayload(
    provably_org_id="my-org",
    integration_api_key="...",
    task="discharge_summary",
    claims=[
        provably.HandoffClaim(
            action_name="lookup_patient",
            claimed_value=record,
            query_record_id="qr_123",
        ),
    ],
)
provably.post_handoff("https://my-eval-service.example", payload)

On the eval-service side:

import provably
result = provably.evaluate_handoff(
payload,
provably_base_url="https://api.provably.ai",
)
# outcome is "PASS", "CAUGHT", or "ERROR"
assert result["outcome"] in ("PASS", "CAUGHT", "ERROR")

The SDK reads configuration from environment variables. A typed
Provably(api_key=..., org_id=..., ...) client that replaces these globals is
planned (issue #2).
- Sign up at app.provably.ai.
- Create an organisation. Its id is what goes in PROVABLY_ORG_ID.
- In the left-side menu, go to Integrations and create one. The generated key is your PROVABLY_API_KEY.

Full product docs: provably.ai/docs.
| Variable | Used by | Required |
|---|---|---|
| PROVABLY_API_KEY | initialize_runtime, integration cache | yes |
| PROVABLY_ORG_ID | initialize_runtime, intercept allow-list | yes |
| PROVABLY_RUST_BE_URL | initialize_runtime, evaluator | yes |
| POSTGRES_URL | intercept storage, trusted endpoints, handoff preprocess | yes |
| PROVABLY_APP_UI_URL | optional UI deep-links | no |
| PROVABLY_QUERY_RESOLVE_MAX_WAIT_S | max seconds to wait for a query record to appear (default 15) | no |

POSTGRES_URL is a hard dependency today. Three SDK modules open Postgres
directly (provably.intercept._storage, provably.trusted_endpoints,
provably.handoff._preprocess). Issue #1 tracks moving them onto a
caller-injected connection and making psycopg2-binary optional.
import provably
# Option A: one-call bootstrap (recommended)
provably.configure_indexing(True) # bootstrap + interceptor + enable
provably.configure_indexing(False) # interceptor only, recording off
# Option B: step-by-step
provably.initialize_runtime() # one-time bootstrap; idempotent per process
provably.init_interceptor() # install monkey-patches for requests + httpx
provably.enable() # turn recording on (default after init_interceptor)

initialize_runtime() registers a Provably middleware, onboards the configured
Postgres database, ensures the provably_intercepts collection exists, and
warms an in-memory cache with the integration API key.
configure_indexing(enable_indexing) is the recommended single-call entry point.
Pass True to enable full indexing (bootstrap + intercept + record); pass False
to install the interceptor in passthrough mode (patches installed, recording off).
provably.enable() # default after init_interceptor()
provably.disable() # stop recording (patch stays installed)
provably.is_enabled() # bool
provably.set_interceptor_context( # tag the next intercept rows
    agent_id="cluster_a",
    action_name="lookup_patient",
    intercept_index=0,
)
provably.set_intercept_body_hook( # optional: (intercept_index, raw) -> what the caller sees
    lambda _idx, raw: {"user_edited": True},
)
provably.set_intercept_url_allowlist( # scope simulation hook to specific URLs
    ["https://my-trusted-api.example/v1"],
)

The interceptor records every successful requests.get/post and httpx.get/post
into provably_intercepts. The original wire response is stored first; the hook
only affects the object returned to application code, not the stored row.
set_intercept_url_allowlist scopes the simulation body hook to an explicit set
of URLs. Internal SDK calls (bootstrap API, handoff transport) are never passed
to the hook regardless of this setting.
⚠ The interceptor monkey-patches the global requests and httpx modules. This is intentional (every consumer in the process gets observed automatically), but it means hosts that need a request-scoped opt-out should wrap calls in disable() / enable() blocks.
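
One way to make the disable() / enable() wrapping harder to get wrong is a small context manager that restores the previous state even if the wrapped call raises. recording_paused is a hypothetical helper, not an SDK function; it takes the three toggles as arguments so the sketch stays self-contained.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def recording_paused(
    disable: Callable[[], None],
    enable: Callable[[], None],
    is_enabled: Callable[[], bool],
) -> Iterator[None]:
    """Pause recording for the duration of the block, then restore it."""
    was_enabled = is_enabled()
    if was_enabled:
        disable()
    try:
        yield
    finally:
        # Only re-enable if recording was on when we entered.
        if was_enabled:
            enable()
```

With the SDK this would presumably be used as `with recording_paused(provably.disable, provably.enable, provably.is_enabled): ...`. If the recording flag is process-wide, as the global patching suggests, other threads are paused for the same window.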
from provably import HandoffPayload, HandoffClaim, post_handoff
payload = HandoffPayload(
    provably_org_id="my-org",
    integration_api_key="key",
    claims=[HandoffClaim(action_name="get", claimed_value=..., query_record_id="qr_1")],
)
post_handoff("https://my-eval-service.example", payload, headers={"x-trace-id": "abc"})

post_handoff POSTs canonical JSON to {base_url}/handoffs/receive and raises
on any non-2xx response.
build_handoff_payload assembles a HandoffPayload automatically from the
interceptor's in-memory state — no manual claim construction needed:
from provably import build_handoff_payload, post_handoff
# fetch_and_claim is the raw JSON dict the LLM emitted
payload = build_handoff_payload(fetch_and_claim, run_id="run-001")
post_handoff("https://your-verifier.example.com", payload)

claim_contract generates the system-prompt text that tells an LLM how to
emit the correct HandoffClaim JSON shape:
from provably import claim_contract
system_prompt = claim_contract(
    action_names=["lookup_patient", "fetch_records"],
    wrapper_fields={"reasoning": "string"},
)

default_instructions and field_descriptions give you ready-made
instructions and per-field notes to embed in a receiving-agent prompt:
from provably import default_instructions, field_descriptions
# provably_indexing=True when you want the receiver to call the evaluator
instructions = default_instructions(provably_indexing=True)
guide = field_descriptions(provably_indexing=True)

from provably import evaluate_handoff
result = evaluate_handoff(payload, provably_base_url="https://api.provably.ai")
# {"outcome": "PASS" | "CAUGHT" | "ERROR", "per_claim": [...], "errors": [...]}

Outcome semantics:
- PASS: every claim's content matched its proven indexed value and every proof verified.
- CAUGHT: at least one claim disagreed with the indexed value or a proof failed.
- ERROR: the evaluator could not run (missing config, Provably backend unreachable, transient server error). Not evidence of tampering; the system was unhealthy, not the agent.

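
Because ERROR signals an unhealthy system rather than a bad claim, a receiving service will usually want to treat it differently from CAUGHT. The mapping below is one possible policy, offered as an assumption rather than SDK behavior; next_action and its return labels are hypothetical.

```python
def next_action(result: dict) -> str:
    """Map an evaluate_handoff-style result dict to a follow-up action."""
    outcome = result.get("outcome")
    if outcome == "PASS":
        return "accept"  # every claim matched
    if outcome == "CAUGHT":
        return "reject"  # at least one claim disagreed
    if outcome == "ERROR":
        return "retry"   # evaluator could not run; says nothing about the agent
    raise ValueError(f"unexpected outcome: {outcome!r}")
```

Keeping the retry path separate avoids penalizing an agent for a backend outage.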
CAUGHT means the indexed value the evaluator pulled from provably_intercepts doesn't match the claim. In practice when this surprises you, it's almost always one of:
- The tool body never ran. @function_tool (or any agent-framework decorator) only registers the function; you still need an agent loop (e.g. Runner.run(...)) to invoke it. Bare LLM calls don't execute tools.
- intercept_context(...) was called without a with statement. It's a context manager; a bare call is a no-op (see the function's docstring).
- agent_id mismatch. The agent_id you pass to intercept_context(...) inside the tool must match the intercept_agent_id you pass to build_handoff_payload(...) (default "fetch_and_claim"). On a mismatch the lookup misses and request_payload comes back empty.
- Wrong row-id helper. Use get_intercept_row_id(agent_id, action_name) to pick the row tagged with your action. take_last_intercept_row_id() returns the globally last insert (typically the final LLM POST), which is rarely what you want.

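
The second pitfall is easy to reproduce with any generator-based context manager. The sketch uses a hypothetical tag_context built with contextlib; whether intercept_context is implemented the same way is an assumption, but the observable behavior described above (a bare call does nothing) is the same.

```python
from contextlib import contextmanager

active_tags: list[str] = []


@contextmanager
def tag_context(agent_id: str):
    """Hypothetical stand-in for a tagging context manager."""
    active_tags.append(agent_id)
    try:
        yield
    finally:
        active_tags.pop()


tag_context("fetch_and_claim")      # bare call: the body never starts running
bare_call_tags = list(active_tags)  # nothing was tagged

with tag_context("fetch_and_claim"):
    inside_with_tags = list(active_tags)  # the tag is active only inside the block
```

The bare call merely creates a context-manager object; the setup code runs only when the with statement enters it.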
Comparison modes (the VerificationMode type):
| Mode | Comparison |
|---|---|
| verbatim | Canonical-JSON equality between claimed_value and the indexed payload (or json_path slice). |
| field_extraction | Equality on the value at json_path only. |
| schema_type | claimed_value is ignored; the value at json_path is validated against expected_json_schema. |
| range_threshold | Numeric claimed_value must equal the indexed numeric and lie in [range_min, range_max]. |

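
The four comparisons are simple enough to sketch in a few lines. This is an illustration under stated assumptions, not the evaluator's code: the dotted-path resolver stands in for whatever json_path syntax the SDK accepts, a plain isinstance check stands in for JSON Schema validation, and all function names are hypothetical.

```python
import json
from typing import Any


def canonical(value: Any) -> str:
    """Assumed canonical JSON: sorted keys, compact separators."""
    return json.dumps(value, sort_keys=True, separators=(",", ":"))


def at_path(doc: Any, dotted_path: str) -> Any:
    """Resolve a simplified dotted path such as 'patient.age' (assumed syntax)."""
    for key in dotted_path.split("."):
        doc = doc[int(key)] if isinstance(doc, list) else doc[key]
    return doc


def verbatim(claimed: Any, indexed: Any) -> bool:
    return canonical(claimed) == canonical(indexed)


def field_extraction(claimed: Any, indexed: Any, path: str) -> bool:
    return canonical(claimed) == canonical(at_path(indexed, path))


def schema_type(indexed: Any, path: str, expected_type: type) -> bool:
    # Stand-in for JSON Schema validation: only checks the Python type.
    return isinstance(at_path(indexed, path), expected_type)


def range_threshold(claimed: float, indexed: Any, path: str, lo: float, hi: float) -> bool:
    actual = at_path(indexed, path)
    return claimed == actual and lo <= actual <= hi


indexed = {"patient": {"age": 42, "name": "Ada"}}
```

Each check is a pure function of the claim and the indexed value, which is what makes the evaluation deterministic.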
from provably import outcome_from_trace, aggregate_outcome
# Extract verdict from a raw evaluate_handoff result dict
verdict = outcome_from_trace(result) # "PASS" | "CAUGHT" | None
# Roll up verification_results from a HandoffPayload
verdict = aggregate_outcome(payload) # "PASS" | "CAUGHT"

import psycopg2
from provably import (
    is_trusted_endpoint,
    list_trusted_endpoints,
    normalize_url_for_trust,
    ensure_trusted_endpoints_table,
)
conn = psycopg2.connect("...")
ensure_trusted_endpoints_table(conn)
ok = is_trusted_endpoint("https://api.example.com/v1/data", "my-org", conn)
rows = list_trusted_endpoints(conn, "my-org")

The registry is a single Postgres table (DDL embedded; created on first use).
URLs are normalized (lowercase scheme + host, default ports collapsed, trailing
slash dropped) before any read or write so that https://API.EXAMPLE.COM/x/
and https://api.example.com/x collide on the same row.
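
The normalization rules above can be approximated with urllib.parse. The function below is a sketch, not normalize_url_for_trust itself: keeping the query string, dropping the fragment, and ignoring userinfo and IPv6 literals are assumptions made for brevity.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url: str) -> str:
    """Lowercase scheme and host, collapse default ports, drop a trailing slash."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    # Keep the port only when it differs from the scheme's default.
    if port is None or port == DEFAULT_PORTS.get(scheme):
        netloc = host
    else:
        netloc = f"{host}:{port}"
    path = parts.path.rstrip("/")  # path case is preserved
    return urlunsplit((scheme, netloc, path, parts.query, ""))
```

Normalizing on both write and read is what lets two spellings of the same endpoint land on one registry row.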
Concrete URLs match exactly. To authorize a family of URLs with a single entry —
useful for templated routes like /customers/{id} or runtime-generated ids —
register the URL with FastAPI/Express-style placeholders:
| Placeholder | Matches | Example |
|---|---|---|
| {name} | exactly one path segment (no /) | https://api.example.com/customers/{id} matches …/customers/42 but not …/customers/42/orders |
| {name:path} | any subtree (including / separators) | https://api.example.com/customers/{rest:path} matches both …/customers/42 and …/customers/42/orders |

The placeholder name (id, rest, …) is purely descriptive and does not affect
matching. Plain URLs without { characters keep exact-match semantics — no
behavior change for existing entries.
-- Register a templated route once instead of enumerating every concrete id
INSERT INTO trusted_endpoints (org_id, normalized_url, display_label, entry_type)
VALUES ('my-org', 'https://api.example.com/customers/{id}', 'Customers (by id)', 'endpoint');

is_trusted_endpoint and the snapshot tamper-check inside evaluate_handoff
both honor the same matching rules, so a claim against …/customers/42 will
pass both gates when only the templated entry is registered.
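
The placeholder rules translate naturally into a regular expression. The sketch below shows one way to do that; template_to_regex and matches are hypothetical names, and requiring each placeholder to match at least one character is an assumption.

```python
import re

# {name} or {name:path}; the name itself does not affect matching.
_PLACEHOLDER = re.compile(r"\{[A-Za-z_][A-Za-z0-9_]*(:path)?\}")


def template_to_regex(template: str) -> re.Pattern[str]:
    """Compile a templated URL into a regex; literal parts are escaped."""
    pattern, pos = "", 0
    for m in _PLACEHOLDER.finditer(template):
        pattern += re.escape(template[pos:m.start()])
        # {name:path} spans separators; {name} stops at the next slash.
        pattern += ".+" if m.group(1) else "[^/]+"
        pos = m.end()
    pattern += re.escape(template[pos:])
    return re.compile(pattern)


def matches(template: str, url: str) -> bool:
    return template_to_regex(template).fullmatch(url) is not None
```

A template without placeholders compiles to a fully escaped literal, so plain URLs keep exact-match semantics.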
All public symbols are re-exported from the top-level provably namespace. See
src/provably/__init__.py for the full list.
from provably import (
    # init
    initialize_runtime,
    configure_indexing,
    # intercept
    init_interceptor, enable, disable, is_enabled,
    set_interceptor_context, set_intercept_body_hook,
    set_intercept_url_allowlist, take_last_intercept_row_id,
    # handoff types
    HandoffPayload, HandoffClaim, HandoffProofAction, HandoffProofBundle,
    BenchmarkRow, Outcome, VerificationMode,
    # handoff transport
    post_handoff,
    # handoff builders
    build_handoff_payload, DEFAULT_HANDOFF_TASK,
    claim_contract, default_instructions, field_descriptions,
    # eval
    evaluate_handoff, extract_indexed_from_query_record,
    outcome_from_trace, aggregate_outcome,
    # trusted endpoints
    is_trusted_endpoint, list_trusted_endpoints,
    check_claim_endpoints_are_trusted, normalize_url_for_trust,
    ensure_trusted_endpoints_table,
)

git clone [email protected]:ProvablyAI/provably-python-sdk.git
cd provably-python-sdk
uv sync --extra dev

uv run ruff check .
uv run pytest
python -m build # wheel + sdist into ./dist/

The SDK has no fastapi, langgraph, or LLM-vendor dependencies, and CI
should keep it that way; see docs/architecture.md
for the dependency rules.
The suite is split into two layers:
tests/
  unit/ # fast, hermetic, mocks for httpx + psycopg2
  e2e/ # drives real requests + httpx against a loopback HTTP server

uv run pytest tests/unit # ~0.2 s
uv run pytest tests/e2e # ~5 s (real http.server on a loopback port)
uv run pytest # both
uv run pytest -m "not e2e" # unit-equivalent inner loop

E2E tests register routes on a per-test FakeHttpServer and drive the real
requests / httpx patches against it. The Postgres-touching storage layer
is patched per-test, so the suite stays hermetic and runs without a live
database.
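
A per-test loopback server of this kind can be built from the standard library alone. The class below is a hypothetical sketch of the idea, not the repository's FakeHttpServer; it serves canned JSON for GET routes and binds to a free port chosen by the OS.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class LoopbackServer:
    """Hypothetical test helper: serves canned JSON routes on 127.0.0.1."""

    def __init__(self, routes: dict[str, dict]):
        class Handler(BaseHTTPRequestHandler):
            def do_GET(self):
                body = routes.get(self.path)
                payload = json.dumps(body if body is not None else {"error": "not found"}).encode()
                self.send_response(200 if body is not None else 404)
                self.send_header("Content-Type", "application/json")
                self.send_header("Content-Length", str(len(payload)))
                self.end_headers()
                self.wfile.write(payload)

            def log_message(self, *args):
                pass  # keep test output quiet

        self._httpd = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick a free port
        self._thread = threading.Thread(target=self._httpd.serve_forever, daemon=True)

    def __enter__(self):
        self._thread.start()
        host, port = self._httpd.server_address
        self.base_url = f"http://{host}:{port}"
        return self

    def __exit__(self, *exc):
        self._httpd.shutdown()
        self._httpd.server_close()
        self._thread.join()


def fetch_json(url: str) -> dict:
    # Bypass any proxy settings so the request stays on the loopback interface.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=5) as resp:
        return json.loads(resp.read())


with LoopbackServer({"/data": {"ok": True}}) as server:
    demo_result = fetch_json(server.base_url + "/data")
```

Binding to port 0 avoids collisions when tests run in parallel, and the context manager guarantees the socket is released after each test.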
The repo ships a multi-stage Dockerfile. Three build targets are exposed:
| Target | What it produces |
|---|---|
| builder | Wheel + sdist in /dist (used by the other stages, not run alone). |
| test | Wheel installed + dev tools; CMD runs ruff check && pytest -q. |
| runtime | Slim image with only the wheel; CMD smoke-imports provably. |

Run the full lint + test suite in a container:
docker build --target test -t provably-sdk:test .
docker run --rm provably-sdk:test

Smoke-import the runtime image:

docker build --target runtime -t provably-sdk:runtime .
docker run --rm provably-sdk:runtime

Use docker-compose.yml for local development runs (no database required;
tests are hermetic):

docker compose run --rm sdk # ruff + pytest
docker compose run --rm sdk pytest -q -m e2e # only e2e tests

- The interceptor monkey-patches the global requests.get/post and httpx.get/post. Code running in the host process is observed automatically; subprocesses and other languages are not.
- Trusted-endpoint enforcement happens before any row is inserted into provably_intercepts. A GET to an unlisted URL raises RuntimeError("BLOCKED: ...") and never reaches Postgres.
- The evaluator pulls query records over HTTPS using x-api-key from the payload's integration_api_key. Revoking that key revokes eval access for all in-flight handoffs.

v0.2 — see CHANGELOG.md. License: Proprietary —
see LICENSE.md.