Allow adding/editing files not referenced by config#7275
Conversation
d6c5834 to
4a901d3
Compare
132b322 to
8a20cf5
Compare
4a901d3 to
e68483f
Compare
217dc60 to
65af36b
Compare
2bb0d77 to
234904b
Compare
5ab0cc8 to
de96668
Compare
de96668 to
4d3a06b
Compare
234904b to
d383465
Compare
63ec7ab to
cf0aad6
Compare
|
@BugBot review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cf0aad6fb8
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| qb.push_values(free_files, |mut b, (path, body)| { | ||
| let hash = blake3::hash(body.as_bytes()).as_bytes().to_vec(); | ||
| b.push_bind(Uuid::now_v7()) | ||
| .push_bind(path) | ||
| .push_bind(body) |
There was a problem hiding this comment.
Deduplicate free-file inserts before writing stored_files
write_free_files_in_tx unconditionally inserts every free file on every apply, but this path never reuses an existing (file_path, content_hash) row like write_collected_files does. Re-saving an unchanged editor state repeatedly will accumulate duplicate active rows for the same file path, and those rows are only tombstoned when the path is removed entirely, so table size and GET /internal/config_toml scan cost grow with save count.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
The CTE in write_free_files_in_tx already dedups on the editor's "content unchanged" path: its WHERE NOT EXISTS (SELECT 1 ... WHERE file_path = input.file_path AND content_hash = input.content_hash AND deleted_at IS NULL) clause skips the insert when an active row with the same (file_path, content_hash) already exists, so re-saving an unchanged editor state is a no-op on that path.
Content-change dedup is separately covered — the paired UPDATE ... FROM new_rows ... WHERE t.id != n.id tombstones any previously-active row for the same file_path, and write_collected_files (reached via extra_templates for free files too) now does the same tombstone step for referenced files (d177b54). Postgres regression test added in 0641cb8.
If there's a scenario you think is still doubled up, happy to look — but I don't see one in the current code.
| SELECT DISTINCT ON (file_path) file_path, source_body | ||
| FROM tensorzero.stored_files | ||
| WHERE deleted_at IS NULL | ||
| ORDER BY file_path, created_at DESC | ||
| ", |
There was a problem hiding this comment.
Add a deterministic tie-breaker to editor file selection
This DISTINCT ON (file_path) query orders only by created_at DESC, so when multiple active rows for the same path share a timestamp, Postgres may return either row nondeterministically. In this change, one apply can create multiple same-path rows in a single transaction, and NOW() is transaction-stable, so ties are realistic; if tied rows differ, editor path_contents (and therefore CAS signatures) can flip unpredictably across reads.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
we should reject multiple rows with the same path
There was a problem hiding this comment.
Done — load_editor_path_contents now returns an error (in b736742) instead of silently picking a row when stored_files has more than one active row for a file_path. Also dropped the DISTINCT ON (and the no-longer-needed created_at DESC column from idx_stored_files_editor_latest) — the write path is the single source of truth for maintaining the at-most-one-active-row invariant.
There was a problem hiding this comment.
Addressed — the DISTINCT ON is gone entirely. load_editor_path_contents now does a plain SELECT file_path, source_body WHERE deleted_at IS NULL and errors if any file_path appears more than once (b736742). The write path maintains the "at most one active row per file_path" invariant: free files via the CTE's paired INSERT … RETURNING + UPDATE tombstone, referenced files via write_collected_files's new tombstone step (d177b54). Pre-migration duplicates are backfilled (tombstoned) in 1d20517 so the invariant holds from day one. Postgres regression test in 0641cb8.
There was a problem hiding this comment.
✅ Bugbot reviewed your changes and found no new issues!
Comment @cursor review or bugbot run to trigger another review on this PR
Reviewed by Cursor Bugbot for commit cf0aad6. Configure here.
c3591e3 to
a1dcf7c
Compare
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 1d20517. Configure here.
Per review: the write path maintains the invariant that at most one active row exists per file_path, so we should surface a duplicate as a data integrity error rather than silently picking a row via `DISTINCT ON` + tie-breaker. Drops the `created_at DESC` column from the partial index since the read no longer sorts. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
The `load_editor_path_contents` integrity check added in b736742 assumes the write path maintains "at most one active row per `file_path`". The free-file CTE already maintains this, but `write_collected_files` (used for referenced files) inserted new rows on content change without tombstoning old ones, which is what caused CI failures once the read path started rejecting duplicates. Make `write_collected_files` maintain the invariant: - Filter existing-row lookup on `deleted_at IS NULL` so we never reuse a tombstoned UUID (which would leave the path with zero active rows). - After inserting new rows, tombstone any OTHER active rows for the processed `file_path`s. Config consumers read files by UUID without a `deleted_at` filter, so older function/tool/eval versions still resolve correctly. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Two follow-ups from the review: 1. `write_free_files_in_tx` tombstones any active row whose `file_path` is not in `all_new_paths`. Previously `all_new_paths` only contained what the client sent in `request.path_contents`, so an apply that omitted a canonical file from `path_contents` would tombstone its still-referenced active row. Union the canonical paths into `all_new_paths` so canonical files are always protected regardless of what the client includes. 2. Add a Postgres sqlx test asserting that editing a tool's referenced file body through `write_stored_config` twice leaves exactly one active row for the path (old version tombstoned, new version carries the updated content). This locks in the `write_collected_files` tombstone fix from d177b54. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Before this migration, `write_collected_files` could leave multiple rows with the same `file_path` (when a referenced file's content was edited, a new row was inserted without tombstoning the old one). Once the `deleted_at` column is added all of those rows default to `deleted_at IS NULL`, which would break the new editor read path: it enforces "at most one active row per `file_path`" and now returns a data-integrity error rather than silently picking a row. Tombstone every non-latest row per `file_path` in the same migration, using `(created_at DESC, id DESC)` so the row we keep active matches what the previous `DISTINCT ON` read-time tie-breaker would have returned. The invariant holds from migration time forward and the editor GET doesn't 500 on any deployment that already had duplicates. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Cursor review flagged the error message for using single-quoted `{:?}`
output around the file path and leaving `stored_files` / `file_path`
unquoted. Project convention prefers backticks for technical terms in
error messages and user-visible strings. Switch to backtick-wrapped
identifiers and `{}` formatting.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
a9a479e to
2a05d98
Compare

We add "deleted_at" to stored files so we can show all latest version of the same file that are not deleted, regardless of whether it's referenced in the config toml. This allows persisting files that are created but not yet referenced in the config toml (users may want to write a file, save it, then change the config to reference it).
Note
Medium Risk
Adds a new soft-delete column and changes stored-file write semantics, which affects Postgres data integrity and editor-visible behavior. Risk is mitigated by a backfill migration plus expanded e2e/DB tests, but mistakes could hide files in the editor or break CAS signatures.
Overview
The config editor now persists and returns “free” files (files present in
path_contentsbut not referenced by the TOML) by treatingtensorzero.stored_filesas the source of truth for the editor’s file list.This introduces soft-delete/tombstoning for stored files via a new
deleted_atcolumn and enforces an invariant of at most one active row perfile_path:write_collected_filesnow reuses only non-tombstoned rows and tombstones superseded active versions, andload_editor_path_contentsloads all non-tombstoned files and errors on duplicates.GET /internal/config_tomlandPOST /internal/config_toml/applyare updated to merge DB-loaded editor files with canonical referenced files for consistentbase_signaturecomputation, andapplynow tombstones removed files and upserts free files in-transaction. Adds a migration backfill + index and new e2e/UI tests covering free-file add/edit/remove and rename persistence.Reviewed by Cursor Bugbot for commit 2a05d98. Bugbot is set up for automated code reviews on this repo. Configure here.