Skip to content

Allow adding/editing files not referenced by config#7275

Draft
shuyangli wants to merge 8 commits into
mainfrom
sl/allow-files-not-referenced-by-config
Draft

Allow adding/editing files not referenced by config#7275
shuyangli wants to merge 8 commits into
mainfrom
sl/allow-files-not-referenced-by-config

Conversation

@shuyangli
Copy link
Copy Markdown
Contributor

@shuyangli shuyangli commented Apr 13, 2026

We add "deleted_at" to stored files so we can show all latest version of the same file that are not deleted, regardless of whether it's referenced in the config toml. This allows persisting files that are created but not yet referenced in the config toml (users may want to write a file, save it, then change the config to reference it).


Note

Medium Risk
Adds a new soft-delete column and changes stored-file write semantics, which affects Postgres data integrity and editor-visible behavior. Risk is mitigated by a backfill migration plus expanded e2e/DB tests, but mistakes could hide files in the editor or break CAS signatures.

Overview
The config editor now persists and returns “free” files (files present in path_contents but not referenced by the TOML) by treating tensorzero.stored_files as the source of truth for the editor’s file list.

This introduces soft-delete/tombstoning for stored files via a new deleted_at column and enforces an invariant of at most one active row per file_path: write_collected_files now reuses only non-tombstoned rows and tombstones superseded active versions, and load_editor_path_contents loads all non-tombstoned files and errors on duplicates.

GET /internal/config_toml and POST /internal/config_toml/apply are updated to merge DB-loaded editor files with canonical referenced files for consistent base_signature computation, and apply now tombstones removed files and upserts free files in-transaction. Adds a migration backfill + index and new e2e/UI tests covering free-file add/edit/remove and rename persistence.

Reviewed by Cursor Bugbot for commit 2a05d98. Bugbot is set up for automated code reviews on this repo. Configure here.

@shuyangli shuyangli force-pushed the sl/allow-files-not-referenced-by-config branch from d6c5834 to 4a901d3 Compare April 13, 2026 18:34
@shuyangli shuyangli force-pushed the sl/ui-config-editor branch 2 times, most recently from 132b322 to 8a20cf5 Compare April 13, 2026 19:09
@shuyangli shuyangli force-pushed the sl/allow-files-not-referenced-by-config branch from 4a901d3 to e68483f Compare April 13, 2026 19:09
@shuyangli shuyangli force-pushed the sl/ui-config-editor branch 2 times, most recently from 217dc60 to 65af36b Compare April 13, 2026 19:50
@shuyangli shuyangli force-pushed the sl/allow-files-not-referenced-by-config branch 2 times, most recently from 2bb0d77 to 234904b Compare April 13, 2026 21:00
@shuyangli shuyangli force-pushed the sl/ui-config-editor branch 2 times, most recently from 5ab0cc8 to de96668 Compare April 13, 2026 21:14
@shuyangli shuyangli force-pushed the sl/ui-config-editor branch from de96668 to 4d3a06b Compare April 13, 2026 21:18
@shuyangli shuyangli force-pushed the sl/allow-files-not-referenced-by-config branch from 234904b to d383465 Compare April 13, 2026 21:18
Base automatically changed from sl/ui-config-editor to main April 13, 2026 22:35
@shuyangli shuyangli force-pushed the sl/allow-files-not-referenced-by-config branch from 63ec7ab to cf0aad6 Compare April 14, 2026 14:44
@shuyangli shuyangli marked this pull request as ready for review April 14, 2026 14:44
@shuyangli
Copy link
Copy Markdown
Contributor Author

@BugBot review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cf0aad6fb8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +383 to +387
qb.push_values(free_files, |mut b, (path, body)| {
let hash = blake3::hash(body.as_bytes()).as_bytes().to_vec();
b.push_bind(Uuid::now_v7())
.push_bind(path)
.push_bind(body)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Deduplicate free-file inserts before writing stored_files

write_free_files_in_tx unconditionally inserts every free file on every apply, but this path never reuses an existing (file_path, content_hash) row like write_collected_files does. Re-saving an unchanged editor state repeatedly will accumulate duplicate active rows for the same file path, and those rows are only tombstoned when the path is removed entirely, so table size and GET /internal/config_toml scan cost grow with save count.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CTE in write_free_files_in_tx already dedups on the editor's "content unchanged" path: its WHERE NOT EXISTS (SELECT 1 ... WHERE file_path = input.file_path AND content_hash = input.content_hash AND deleted_at IS NULL) clause skips the insert when an active row with the same (file_path, content_hash) already exists, so re-saving an unchanged editor state is a no-op on that path.

Content-change dedup is separately covered — the paired UPDATE ... FROM new_rows ... WHERE t.id != n.id tombstones any previously-active row for the same file_path, and write_collected_files (reached via extra_templates for free files too) now does the same tombstone step for referenced files (d177b54). Postgres regression test added in 0641cb8.

If there's a scenario you think is still doubled up, happy to look — but I don't see one in the current code.

Comment on lines +984 to +988
SELECT DISTINCT ON (file_path) file_path, source_body
FROM tensorzero.stored_files
WHERE deleted_at IS NULL
ORDER BY file_path, created_at DESC
",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Add a deterministic tie-breaker to editor file selection

This DISTINCT ON (file_path) query orders only by created_at DESC, so when multiple active rows for the same path share a timestamp, Postgres may return either row nondeterministically. In this change, one apply can create multiple same-path rows in a single transaction, and NOW() is transaction-stable, so ties are realistic; if tied rows differ, editor path_contents (and therefore CAS signatures) can flip unpredictably across reads.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should reject multiple rows with the same path

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — load_editor_path_contents now returns an error (in b736742) instead of silently picking a row when stored_files has more than one active row for a file_path. Also dropped the DISTINCT ON (and the no-longer-needed created_at DESC column from idx_stored_files_editor_latest) — the write path is the single source of truth for maintaining the at-most-one-active-row invariant.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed — the DISTINCT ON is gone entirely. load_editor_path_contents now does a plain SELECT file_path, source_body WHERE deleted_at IS NULL and errors if any file_path appears more than once (b736742). The write path maintains the "at most one active row per file_path" invariant: free files via the CTE's paired INSERT … RETURNING + UPDATE tombstone, referenced files via write_collected_files's new tombstone step (d177b54). Pre-migration duplicates are backfilled (tombstoned) in 1d20517 so the invariant holds from day one. Postgres regression test in 0641cb8.

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit cf0aad6. Configure here.

@shuyangli shuyangli force-pushed the sl/allow-files-not-referenced-by-config branch 2 times, most recently from c3591e3 to a1dcf7c Compare April 17, 2026 13:51
Comment thread crates/tensorzero-core/src/endpoints/internal/config_toml.rs Outdated
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 1d20517. Configure here.

Comment thread crates/tensorzero-core/src/db/postgres/stored_config_queries.rs
shuyangli and others added 8 commits April 24, 2026 17:20
Per review: the write path maintains the invariant that at most one
active row exists per file_path, so we should surface a duplicate as a
data integrity error rather than silently picking a row via
`DISTINCT ON` + tie-breaker. Drops the `created_at DESC` column from
the partial index since the read no longer sorts.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
The `load_editor_path_contents` integrity check added in b736742
assumes the write path maintains "at most one active row per
`file_path`". The free-file CTE already maintains this, but
`write_collected_files` (used for referenced files) inserted new rows
on content change without tombstoning old ones, which is what caused
CI failures once the read path started rejecting duplicates.

Make `write_collected_files` maintain the invariant:
- Filter existing-row lookup on `deleted_at IS NULL` so we never reuse
  a tombstoned UUID (which would leave the path with zero active rows).
- After inserting new rows, tombstone any OTHER active rows for the
  processed `file_path`s. Config consumers read files by UUID without a
  `deleted_at` filter, so older function/tool/eval versions still
  resolve correctly.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Two follow-ups from the review:

1. `write_free_files_in_tx` tombstones any active row whose `file_path`
   is not in `all_new_paths`. Previously `all_new_paths` only contained
   what the client sent in `request.path_contents`, so an apply that
   omitted a canonical file from `path_contents` would tombstone its
   still-referenced active row. Union the canonical paths into
   `all_new_paths` so canonical files are always protected regardless
   of what the client includes.

2. Add a Postgres sqlx test asserting that editing a tool's referenced
   file body through `write_stored_config` twice leaves exactly one
   active row for the path (old version tombstoned, new version
   carries the updated content). This locks in the
   `write_collected_files` tombstone fix from d177b54.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Before this migration, `write_collected_files` could leave multiple
rows with the same `file_path` (when a referenced file's content was
edited, a new row was inserted without tombstoning the old one). Once
the `deleted_at` column is added all of those rows default to
`deleted_at IS NULL`, which would break the new editor read path: it
enforces "at most one active row per `file_path`" and now returns a
data-integrity error rather than silently picking a row.

Tombstone every non-latest row per `file_path` in the same migration,
using `(created_at DESC, id DESC)` so the row we keep active matches
what the previous `DISTINCT ON` read-time tie-breaker would have
returned. The invariant holds from migration time forward and the
editor GET doesn't 500 on any deployment that already had duplicates.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Cursor review flagged the error message for using single-quoted `{:?}`
output around the file path and leaving `stored_files` / `file_path`
unquoted. Project convention prefers backticks for technical terms in
error messages and user-visible strings. Switch to backtick-wrapped
identifiers and `{}` formatting.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants