Offload leaf search work to AWS Lambda functions #6157
fulmicoton-dd wants to merge 1 commit into main
Pull request overview
Adds an opt-in AWS Lambda “overflow” execution path for leaf split search to handle traffic spikes without adding more searcher nodes, including auto-deploy of the Lambda binary and per-split result integration into the existing partial result cache / incremental merge flow.
Changes:
- Introduces `quickwit-lambda-client` (invocation + auto-deploy + metrics) and `quickwit-lambda-server` (Lambda handler running Quickwit leaf search).
- Extends searcher configuration/context to support Lambda offloading, and updates leaf-search scheduling to split work between local permits and Lambda batches.
- Adds protobuf support for batched per-split responses plus docs and CI workflow for publishing the Lambda binary.
Reviewed changes
Copilot reviewed 43 out of 45 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| quickwit/quickwit-storage/src/cache/memory_sized_cache.rs | Adds a regression test for CacheConfig::no_cache() behavior. |
| quickwit/quickwit-serve/src/lib.rs | Initializes Lambda invoker on startup when searcher+lambda are configured. |
| quickwit/quickwit-serve/Cargo.toml | Adds dependency on quickwit-lambda-client. |
| quickwit/quickwit-search/src/tests.rs | Updates tests for SearcherContext::new(..., lambda_invoker) signature changes. |
| quickwit/quickwit-search/src/service.rs | Extends SearcherContext to carry an optional Lambda invoker. |
| quickwit/quickwit-search/src/search_permit_provider.rs | Adds offload-aware permit acquisition logic (threshold-based truncation). |
| quickwit/quickwit-search/src/root.rs | Minor tracing import/use adjustment. |
| quickwit/quickwit-search/src/list_terms.rs | Adjusts permit sizing collection (prep for offload-aware behavior). |
| quickwit/quickwit-search/src/lib.rs | Exposes new invoker module and re-exports LambdaLeafSearchInvoker. |
| quickwit/quickwit-search/src/leaf_cache.rs | Minor whitespace cleanup. |
| quickwit/quickwit-search/src/leaf.rs | Implements local-vs-Lambda scheduling, batching, parallel execution, and merge integration. |
| quickwit/quickwit-search/src/invoker.rs | Introduces LambdaLeafSearchInvoker trait abstraction. |
| quickwit/quickwit-proto/src/error.rs | Updates error header doc comment wording. |
| quickwit/quickwit-proto/src/codegen/quickwit/quickwit.search.rs | Adds LeafSearchResponses wrapper message to generated code. |
| quickwit/quickwit-proto/protos/quickwit/search.proto | Adds LeafSearchResponses proto definition. |
| quickwit/quickwit-lambda/README.md | Removes old deprecation stub text. |
| quickwit/quickwit-lambda-server/src/lib.rs | Defines Lambda server crate exports/modules. |
| quickwit/quickwit-lambda-server/src/handler.rs | Implements Lambda handler: decode request, run per-split searches, encode responses. |
| quickwit/quickwit-lambda-server/src/error.rs | Adds Lambda error types and conversions. |
| quickwit/quickwit-lambda-server/src/context.rs | Builds Lambda-optimized SearcherConfig from env and sets caches to no_cache. |
| quickwit/quickwit-lambda-server/src/config.rs | Adds a (currently empty) config stub file. |
| quickwit/quickwit-lambda-server/src/bin/leaf_search.rs | Provides Lambda binary entrypoint using lambda_runtime. |
| quickwit/quickwit-lambda-server/Cargo.toml | Adds Lambda server crate definition and dependencies/features. |
| quickwit/quickwit-lambda-client/src/metrics.rs | Adds Prometheus metrics for Lambda invocation and payload sizes. |
| quickwit/quickwit-lambda-client/src/lib.rs | Exposes deploy/invoker APIs and payload types. |
| quickwit/quickwit-lambda-client/src/invoker.rs | Implements AWS Lambda invocation + response decoding into per-split responses. |
| quickwit/quickwit-lambda-client/src/deploy.rs | Implements auto-deploy logic (version discovery/publish + GC). |
| quickwit/quickwit-lambda-client/build.rs | Downloads and embeds Lambda zip; computes content hash for versioning. |
| quickwit/quickwit-lambda-client/README.md | Documents Lambda release process and content-based versioning. |
| quickwit/quickwit-lambda-client/Cargo.toml | Adds Lambda client crate definition and dependencies/build deps. |
| quickwit/quickwit-config/src/node_config/serialize.rs | Extends serialization tests to cover lambda config. |
| quickwit/quickwit-config/src/node_config/mod.rs | Adds LambdaConfig, LambdaDeployConfig, SearcherConfig.lambda, and CacheConfig::no_cache(). |
| quickwit/quickwit-config/src/lib.rs | Re-exports lambda config types. |
| quickwit/quickwit-config/resources/tests/node_config/quickwit.yaml | Adds lambda section to test YAML config. |
| quickwit/quickwit-config/resources/tests/node_config/quickwit.toml | Adds lambda section to test TOML config. |
| quickwit/quickwit-config/resources/tests/node_config/quickwit.json | Adds lambda section to test JSON config. |
| quickwit/quickwit-config/Cargo.toml | Adds quickwit-common testsuite feature dep for config tests. |
| quickwit/quickwit-aws/src/lib.rs | Bumps AWS SDK behavior version used in defaults. |
| quickwit/Cargo.toml | Adds new crates to workspace members + workspace deps (lambda_runtime, ureq, zip, aws-sdk-lambda, aws-smithy-mocks). |
| quickwit/Cargo.lock | Locks new dependencies (aws-sdk-lambda, lambda_runtime, ureq, zip, etc.). |
| docs/configuration/lambda-config.md | Adds end-user documentation for lambda offloading + IAM + deployment. |
| LICENSE-3rdparty.csv | Updates third-party license list for new deps. |
| .github/workflows/publish_lambda.yaml | Adds workflow to build and draft-release the Lambda binary zip. |
| .github/actions/cross-build-binary/action.yml | Pins upload-to-github-release action by commit SHA. |
| .github/actions/cargo-build-macos-binary/action.yml | Pins upload-to-github-release action by commit SHA. |
Comments suppressed due to low confidence (1)
quickwit/quickwit-lambda-server/src/config.rs:17
`src/config.rs` appears to be an unused stub (not referenced via `mod config;` and contains only imports). If it’s not needed, it should be removed; if it is intended to host config parsing, wire it up and add the missing implementation.
use anyhow::Context as _;
use bytesize::ByteSize;
| use std::path::PathBuf; | ||
| /// URL to download the pre-built Lambda zip from GitHub releases. | ||
| /// This should be updated when a new Lambda binary is released. |
The hardcoded LAMBDA_ZIP_URL does not match the release naming described in quickwit-lambda-client/README.md / .github/workflows/publish_lambda.yaml (it has a lambda-ff6fdfa5 tag and quickwit-aws-lambda--aarch64.zip filename with a double dash). If left as-is this is likely to 404 and break builds; consider deriving the URL from the Quickwit version/tag convention or documenting why this specific tag/filename is expected.
| /// This should be updated when a new Lambda binary is released. | |
| /// | |
| /// Note: | |
| /// - This is intentionally pinned to a specific Lambda release tag | |
| /// (`lambda-ff6fdfa5`) and asset name (`quickwit-aws-lambda--aarch64.zip`), | |
| /// which may not follow the generic naming pattern described in | |
| /// `quickwit-lambda-client/README.md` or `.github/workflows/publish_lambda.yaml`. | |
| /// - If the Lambda binary is rebuilt or a new version is published, both the | |
| /// tag and the asset filename in this URL must be updated to match the | |
| /// current release naming used in CI. |
| 3. Publish a version with description format `quickwit:{version}-{sha1}` (e.g., `quickwit:0_8_0-fa752891`) | ||
| The description must match the format Quickwit expects, or it won't find the function version. |
Manual deployment instructions refer to quickwit:{version}-{sha1}, but the implementation uses an MD5-based LAMBDA_BINARY_HASH in the Lambda version description. Update these instructions/examples to match the actual description format so manual deployments remain discoverable by Quickwit.
| 3. Publish a version with description format `quickwit:{version}-{sha1}` (e.g., `quickwit:0_8_0-fa752891`) | |
| The description must match the format Quickwit expects, or it won't find the function version. | |
| 3. Publish a version with description format `quickwit:{version}-{LAMBDA_BINARY_HASH}` (e.g., `quickwit:0_8_0-3b5d5c3712955042212316173ccf37be`) | |
| The description must match the format Quickwit expects (including the MD5-based `LAMBDA_BINARY_HASH`), or it won't find the function version. |
| let mut split_search_guard = SplitSearchStateGuard::new(split_outcome_counters.clone()); | ||
| split_search_guard.set_state(SplitSearchState::CacheHit); | ||
| incremental_merge_collector.add_result(cached_response).ok(); | ||
| } else { |
process_partial_result_cache silently drops errors from incremental_merge_collector.add_result(...) via .ok(). Since add_result can fail (e.g., aggregation merge), this can hide corruption/incompatibility in cached responses and lead to incomplete results without any signal. At minimum log the error; ideally propagate it and fail the request.
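One way to act on this comment is sketched below. It is a minimal illustration, not the PR's code: `CachedResponse`, `MergeError`, `MergeCollector`, and `merge_cached` are hypothetical stand-ins for the real types in `quickwit-search`, and `eprintln!` stands in for whatever logging facility the crate uses.

```rust
/// Hypothetical stand-in for a cached per-split response.
struct CachedResponse {
    num_hits: u64,
    /// Simulates a response that the collector cannot merge.
    mergeable: bool,
}

/// Hypothetical error type; the real `add_result` error type may differ.
#[derive(Debug, PartialEq)]
struct MergeError(String);

/// Hypothetical stand-in for the incremental merge collector.
#[derive(Default)]
struct MergeCollector {
    total_hits: u64,
}

impl MergeCollector {
    fn add_result(&mut self, response: CachedResponse) -> Result<(), MergeError> {
        if !response.mergeable {
            return Err(MergeError("incompatible cached response".to_string()));
        }
        self.total_hits += response.num_hits;
        Ok(())
    }
}

/// Merges cached responses and propagates the first failure instead of
/// discarding it with `.ok()`.
fn merge_cached(
    collector: &mut MergeCollector,
    cached: Vec<CachedResponse>,
) -> Result<(), MergeError> {
    for response in cached {
        if let Err(err) = collector.add_result(response) {
            // Log before propagating so the failure leaves a trace.
            eprintln!("failed to merge cached leaf response: {:?}", err);
            return Err(err);
        }
    }
    Ok(())
}
```

With this shape, a cached response that fails to merge fails the request rather than silently producing incomplete results.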
The goal is to handle traffic spikes gracefully without provisioning additional searcher nodes: when the local search queue is saturated, overflow splits are transparently routed to Lambda for processing.
The offloading decision happens **on the leaf side**, inside the `SearchPermitProvider`. The permit provider already manages a bounded queue of pending split search tasks (gated by memory budget and download slots). When a leaf search request arrives, the provider checks the current queue depth against a configurable `offload_threshold`. If granting permits for all requested splits would exceed this threshold, only enough splits to fill up to the threshold are processed locally — the rest are marked for offloading.
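The threshold decision described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual `get_permits_with_offload` implementation: `split_local_and_offloaded` and its `pending` parameter are hypothetical names, and the real provider also accounts for memory budget and download slots.

```rust
/// Splits `num_requested` split searches into `(local, offloaded)` counts.
///
/// `pending` is the current depth of the local queue and `offload_threshold`
/// is the configured limit. Hypothetical helper, for illustration only.
fn split_local_and_offloaded(
    pending: usize,
    num_requested: usize,
    offload_threshold: usize,
) -> (usize, usize) {
    // Room left before the queue reaches the threshold (0 if already at or above it).
    let room = offload_threshold.saturating_sub(pending);
    // Fill the local queue up to the threshold, offload the remainder.
    let local = num_requested.min(room);
    (local, num_requested - local)
}
```

For example, with 90 pending tasks, a threshold of 100, and 30 requested splits, 10 splits stay local and 20 are offloaded. A threshold of 0 leaves no room, so every split is offloaded.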
The offloaded splits are batched (up to `max_splits_per_invocation` splits per batch, balanced by document count) and sent to Lambda in parallel. Each Lambda invocation runs the same leaf search code path and **returns per-split results individually**. This is important: the per-split responses are fed back into the `IncrementalCollector` and populate the **partial result cache**, so subsequent queries hitting the same splits benefit from cached results regardless of whether the split was searched locally or on Lambda.
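One possible way to build batches that respect `max_splits_per_invocation` while keeping document counts roughly even is a greedy largest-first assignment, sketched below. The PR text does not say which balancing strategy is used, so this is an assumption; `Split` and `batch_splits` are hypothetical names.

```rust
/// Hypothetical split descriptor: an id and its document count.
#[derive(Debug, Clone, PartialEq)]
struct Split {
    id: String,
    num_docs: u64,
}

/// Groups splits into the minimum number of batches of at most
/// `max_splits_per_invocation` splits, greedily balancing document counts.
fn batch_splits(mut splits: Vec<Split>, max_splits_per_invocation: usize) -> Vec<Vec<Split>> {
    assert!(max_splits_per_invocation > 0, "batch size must be positive");
    if splits.is_empty() {
        return Vec::new();
    }
    // Ceiling division: the fewest batches that can hold every split.
    let num_batches =
        (splits.len() + max_splits_per_invocation - 1) / max_splits_per_invocation;
    // Place the largest splits first so the small ones can even things out.
    splits.sort_by(|a, b| b.num_docs.cmp(&a.num_docs));
    let mut batches: Vec<Vec<Split>> = vec![Vec::new(); num_batches];
    let mut doc_totals = vec![0u64; num_batches];
    for split in splits {
        // Pick the batch with the fewest documents that still has a free slot.
        let target = (0..num_batches)
            .filter(|&i| batches[i].len() < max_splits_per_invocation)
            .min_by_key(|&i| doc_totals[i])
            .expect("total capacity covers every split");
        doc_totals[target] += split.num_docs;
        batches[target].push(split);
    }
    batches
}
```

Each resulting batch would then map to one Lambda invocation, and the invocations can run in parallel.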
Depending on the configuration, the Lambda function code can be **deployed automatically** at startup. The `quickwit-lambda-client` crate embeds a compressed Lambda binary at compile time. When `auto_deploy` is configured, Quickwit will:
1. Check if a published Lambda version matching the current binary already exists (identified by a description tag `quickwit:{version}-{hash}`)
2. Create or update the function and publish a new version if needed
3. Garbage-collect old versions (keeping the current one + 5 most recent)
This ensures the Lambda function always matches the running Quickwit version without any external deployment tooling. Manual deployment is also supported for users who prefer to manage Lambda functions through Terraform or other IaC tools.
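The version tagging and garbage-collection steps can be illustrated with the sketch below. It makes several assumptions: `PublishedVersion`, `version_description`, and `versions_to_delete` are hypothetical names, the dot-to-underscore conversion is inferred from the `quickwit:0_8_0-...` example in the docs, and "current one + 5 most recent" is read as keeping the current version plus the five newest other versions. No AWS calls are made here.

```rust
/// Hypothetical view of a published Lambda version.
#[derive(Debug, Clone)]
struct PublishedVersion {
    number: u64,
    description: String,
}

/// Builds the description tag `quickwit:{version}-{hash}`.
/// Replacing dots with underscores is an assumption based on the documented example.
fn version_description(quickwit_version: &str, binary_hash: &str) -> String {
    format!("quickwit:{}-{}", quickwit_version.replace('.', "_"), binary_hash)
}

/// Returns the version numbers to delete: everything except the current
/// version and the `keep_recent` newest other versions.
fn versions_to_delete(
    mut versions: Vec<PublishedVersion>,
    current_description: &str,
    keep_recent: usize,
) -> Vec<u64> {
    // Newest first.
    versions.sort_by(|a, b| b.number.cmp(&a.number));
    versions
        .into_iter()
        .filter(|version| version.description != current_description)
        .skip(keep_recent)
        .map(|version| version.number)
        .collect()
}
```

Matching on the description rather than on the version number is what lets a restarted node recognize an already published binary and skip redeployment.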
Lambda offloading is opt-in. Add a `lambda` section under `searcher` in the node configuration:
```yaml
searcher:
  lambda:
    offload_threshold: 100  # queue depth before offloading kicks in (0 = always offload)
    max_splits_per_invocation: 10
    auto_deploy:
      execution_role_arn: arn:aws:iam::123456789012:role/quickwit-lambda-role
      memory_size: 5 GiB
      invocation_timeout_secs: 15
```
- **`quickwit-lambda-client`**: Handles Lambda invocation (with metrics) and auto-deployment logic. Embeds the Lambda binary at build time.
- **`quickwit-lambda-server`**: The Lambda function handler itself — receives a `LeafSearchRequest`, runs `multi_index_leaf_search`, and returns per-split `LeafSearchResponse`s.
- **`quickwit-search`**: New `LambdaLeafSearchInvoker` trait; `SearchPermitProvider` gains `get_permits_with_offload` to split work between local and offloaded; `leaf.rs` orchestrates local and Lambda tasks in parallel.
- **`quickwit-config`**: New `LambdaConfig` and `LambdaDeployConfig` structs under `SearcherConfig`.
- **`quickwit-serve`**: Initializes the Lambda invoker at startup when configured.
- **`quickwit-proto`**: New `LeafSearchResponses` wrapper message for batched per-split responses.
build fix
trinity-1686a left a comment
review still in progress
| ### Lambda execution role | ||
| The Lambda function requires an execution role with S3 read access to your index data. CloudWatch logging permissions are not required. |
why is CloudWatch mentioned?
| payload: BASE64_STANDARD.encode(&request_bytes), | ||
| }; | ||
| let payload_json = serde_json::to_vec(&payload) |
LeafSearchRequest should implement (De)Serialize, if we're forced to use json, i think direct json is probably cleaner than json(base64(protobuf(request)))
| /// URL to download the pre-built Lambda zip from GitHub releases. | ||
| /// This should be updated when a new Lambda binary is released. | ||
| const LAMBDA_ZIP_URL: &str = "https://github.com/quickwit-oss/quickwit/releases/download/lambda-ff6fdfa5/quickwit-aws-lambda--aarch64.zip"; |
as with all resources downloaded from the internet, i think we should check the hash is the one expected (if someone's account gets compromised, we don't want people to be able to bait&switch what lambda is going to eventually get executed)
(ftaod, i don't mean that in the "make sure the lambda client is compatible with the lambda-server and use a hash as a versioning mechanism", but "let's ensure that any resource that is referenced in tree can be authenticated as being the same as the one intended by the person that added that reference in tree")
| let mut split_search_futures: Vec<tokio::task::JoinHandle<_>> = | ||
| Vec::with_capacity(all_splits.len()); | ||
| for (leaf_req_idx, split) in all_splits { | ||
| let leaf_request_ref = &leaf_search_request.leaf_requests[leaf_req_idx]; |
it took me multiple readings of the code to understand why all_splits existed and why we needed to index leaf_search_request.leaf_requests here.
I think moving the creation of the LeafSearchRequest inside the flat_map that created all_splits would be a lot easier to understand
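A rough sketch of the restructuring suggested here, using simplified hypothetical types (`LeafRequest`, `SingleSplitRequest`, and `single_split_requests` are not the PR's actual definitions): the per-split request is built inside the `flat_map`, so no index into `leaf_requests` has to be carried around.

```rust
/// Hypothetical, simplified leaf request: one index and the splits to search.
#[derive(Debug, Clone, PartialEq)]
struct LeafRequest {
    index_uid: String,
    split_ids: Vec<String>,
}

/// Hypothetical per-split request, self-contained so no index lookup is needed later.
#[derive(Debug, Clone, PartialEq)]
struct SingleSplitRequest {
    index_uid: String,
    split_id: String,
}

/// Builds one request per split directly inside the `flat_map`, instead of
/// collecting `(leaf_req_idx, split)` pairs and indexing back afterwards.
fn single_split_requests(leaf_requests: &[LeafRequest]) -> Vec<SingleSplitRequest> {
    leaf_requests
        .iter()
        .flat_map(|leaf_request| {
            leaf_request
                .split_ids
                .iter()
                .map(move |split_id| SingleSplitRequest {
                    index_uid: leaf_request.index_uid.clone(),
                    split_id: split_id.clone(),
                })
        })
        .collect()
}
```

The spawning loop can then iterate over the resulting requests without referring back to the original `leaf_requests` vector.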
| let searcher_context = ctx.searcher_context.clone(); | ||
| let storage_resolver = ctx.storage_resolver.clone(); | ||
| split_search_futures.push(tokio::task::spawn(multi_index_leaf_search( |
i'm not a fan of using tokio::spawn for this kind of task. imo this should be a JoinSet, with some logic to preserve ordering. wdyt?
| // allow offload to lambda | ||
| // https://github.com/quickwit-oss/quickwit/issues/6150 |
it's not clear that this is a todo
| permit_sender: oneshot::Sender<Vec<SearchPermitFuture>>, | ||
| RequestWithOffload { | ||
| permit_sizes: Vec<u64>, | ||
| /// Maximum number of pending requests. If granting permits all |
| /// Maximum number of pending requests. If granting permits all | |
| /// Maximum number of pending requests. If granting all |