
feat+fix: C# support, investigation-grade trace, semantic search, execution flows, channels, cross-repo intelligence#162

Open
Koolerx wants to merge 44 commits into DeusData:main from Koolerx:fix/csharp-and-trace-improvements

Conversation


Koolerx commented Mar 27, 2026

Summary

40 commits adding major features and fixing critical bugs across the MCP handler, extraction layer, pipeline, store, and Cypher engine. Developed while stress-testing against large enterprise codebases and running real investigation scenarios.

Highlights: C# blast radius analysis (0 → 16 callers), execution flow detection (1 → 300 flows), hybrid BM25+vector semantic search, camelCase token splitting, process deduplication, route deduplication, cross-repo search/trace/impact analysis across 54 repos.


Phase 1: Core Fixes and Features (Commits 1-29)

Bug Fixes (Commits 1-6)

  1. trace_call_path Class → Method resolution — BFS resolves through DEFINES_METHOD edges for Class/Interface nodes
  2. detect_changes use-after-free — Switched to strcpy variants for stack buffer reuse
  3. Route path validation — Blocklist filter for vendored/minified JS false positives
  4. C# inheritance via base_list — INHERITS edges: 210 → 1,588 (7.5x)
  5. Crash on 0-edge nodes + fuzzy name fallback — Heap-allocated traversal array, substring fallback
  6. Class has_method node ID — Fixed DEFINES_METHOD BFS to use Class node ID

New Features (Commits 7-17)

  7. get_architecture returns full analysis
  8. Louvain clustering with semantic labels
  9. Hapi.js route extraction (0 → 1,665 routes)
  10. BM25 full-text search via SQLite FTS5
  11. Execution flow detection (BFS + Louvain, 300 flows)
  12. Socket.IO + EventEmitter channel detection
  13. get_impact blast radius with risk assessment
  14-17. Cypher JSON properties, investigation-grade trace, C# delegate/event resolution, C# channel constants

Gap Closure (Commits 18-29)

  18. get_impact Class over Constructor resolution for C#
  19. Entry point detection for C#/Java class methods (1 → 280 flows)
  20. Channel dedup + count(DISTINCT) + SQL injection fix
  21. Cypher NOT EXISTS subquery (dead-code detection in <1s)
  22. Cross-repo channel query + has_property in trace
  23. C# property extraction with HAS_PROPERTY edges (19K+ properties)
  24. C/C++ CALLS edge attribution to enclosing function scope
  25. C++ entry point heuristics (WinMain, DllMain, etc.)
  26. HANDLES + HTTP_CALLS in process detection BFS
  27. Route→Function resolution + relaxed process detection (4 → 61 flows)
  28. Resolve relative import paths (153 → 11,770 IMPORTS, 77x)
  29. CommonJS require() import extraction

Phase 2: Search Quality (Commits 30-33)

30. Process participation in search_graph results

Each BM25 search result now includes the execution flows it participates in.

31. JS/TS constant resolution for Socket.IO channel detection

Resolves const EVENT = "foo" references in socket.emit(EVENT) patterns. Channels detected: 6 → 17 per repo.

32. Expose BM25 query and sort_by params in search_graph schema

The FTS5 BM25 search path existed but query was not declared in the tool's inputSchema. AI agents couldn't discover or use it. Now exposed with full documentation.

33. Pure BM25 relevance ranking + camelCase token splitting

  • Removed fan_in popularity boost from BM25 ranking — popular-but-irrelevant functions no longer outrank relevant matches
  • Added cbm_camel_split() SQLite function (updateCloudClient → update Cloud Client), enabling word-level BM25 matching
  • Switched FTS5 to contentless mode (content='') — required for camelCase split tokens to match correctly at query time

Phase 3: Process and Route Quality (Commits 34 and 38)

34. Deduplicate entry points + [module] prefix on process labels

  • Entry point dedup: Route resolution added same function N times when N routes pointed to same handler file. 61 → 17 processes.
  • Module labels: funcA → funcZ becomes [controllers] funcA → funcZ, instantly navigable among 50+ flows.

38. Route node deduplication — eliminate ghost nodes

Three sources of 3x route duplication: the Express and Hapi extractors both matching the same patterns, plus module-level extraction with an empty qualified name. Fixed with (method, path) dedup. 1,665 → 555 routes, 0 ghosts.


Phase 4: Semantic Vector Search (Commits 35-37)

35. Hybrid BM25+vector semantic search via external embedding API

Full semantic search architecture:

  • Embeddings table in SQLite with cbm_cosine_sim() custom function
  • HTTP embedding client via Mongoose to any OpenAI-compatible /v1/embeddings endpoint
  • RRF merge (k=60) combining BM25 keyword results with vector cosine similarity
  • generate_embeddings MCP tool for manual trigger
  • semantic_results field in search_graph output for vector-only matches

Configuration: CBM_EMBEDDING_URL, CBM_EMBEDDING_MODEL, CBM_EMBEDDING_DIMS env vars. No new dependencies — uses vendored Mongoose (HTTP) and yyjson (JSON).

36. Fix use-after-free in semantic result strings

yyjson_mut_obj_add_str (borrows pointer) → yyjson_mut_obj_add_strcpy (copies string).

37. Auto-generate embeddings during full indexing

When CBM_EMBEDDING_URL is configured, the pipeline auto-generates embeddings after process and channel detection. Zero-friction: repos indexed while the embedding server is running get embeddings automatically.


Phase 5: Cross-Repo Intelligence (Commits 39-40)

39. Unified cross-repository index

New _cross_repo.db built by scanning all per-project databases:

  • 134K node stubs (Function/Method/Class/Interface/Route from all repos)
  • 526 channel references across 13 projects
  • 134K embedding vectors copied for cross-repo semantic search
  • BM25 FTS5 index with camelCase splitting across all repos
  • 12 cross-repo channel matches automatically detected (emit in A, listen in B)
  • Build time: ~2 seconds for 54 repos

New MCP tools: build_cross_repo_index, trace_cross_repo. Auto-rebuilds after every index_repository.

40. Cross-repo search, flow tracing with call chains, and impact analysis

Cross-repo search (search_graph with project="*"):
Hybrid BM25+vector search across all 54 repos in a single call. Returns results with both short project name and full project_id for follow-up queries.

Enhanced trace_cross_repo with call chains:
When a channel filter is provided, traces depth-2 upstream callers of the emitter and depth-2 downstream callees of the listener. Handles Class→Method resolution and (file-level) listener fallback via channels table lookup.

Cross-repo impact analysis (get_impact with cross_repo=true):
After per-repo BFS, checks if d=1 impacted symbols emit channels to other repos. For each affected channel, opens consumer project DB, traces downstream from listener, returns cross_repo_impacts array.


Testing

All 40 commits compile clean with -Wall -Wextra -Werror. 2,586 existing tests pass. Stress-tested against:

  • Large C# monolith (~128K nodes) — class hierarchy, delegates, 19K properties, 278 flows, 152 channels
  • Node.js/TS Hapi.js monorepo (~146K nodes) — 11,770 IMPORTS, 300 flows, 212 channels, 555 routes
  • React/TS monorepo (~9K nodes) — 300 flows, 344 IMPORTS
  • Java/Vert.x service (~15K nodes) — 300 flows from Java entry points
  • C++ library (~570 nodes) — Function→Function CALLS attribution
  • 54 repos with 134K embeddings — cross-repo semantic search, channel tracing, impact analysis

New Files

  • src/pipeline/embedding.c / embedding.h — Semantic embedding generation + RRF merge
  • src/store/cross_repo.c / cross_repo.h — Cross-repo index, search, channel matching, trace helper

Configuration (new env vars)

Variable              Default            Purpose
CBM_EMBEDDING_URL     (none)             OpenAI-compatible /v1/embeddings endpoint
CBM_EMBEDDING_MODEL   nomic-embed-text   Embedding model name
CBM_EMBEDDING_DIMS    768                Vector dimensions

Your Name added 7 commits March 27, 2026 12:57
When trace_call_path targets a Class or Interface node, the BFS now resolves
through DEFINES_METHOD edges to find the actual callable methods, then runs
BFS from each method and merges results. Previously, tracing a class name
returned 0 results because Class nodes have no direct CALLS edges — only
their Method children do.

Also expands edge types to include HTTP_CALLS and ASYNC_CALLS alongside
CALLS for broader cross-service coverage.

Node selection improved: when multiple nodes share the same name (e.g. a
Class and its constructor Method), prefer the Class for resolution since
constructors rarely have interesting outbound CALLS.

Tested: C# class tracing went from 0 to 87 callees and
8 callers. TS repos unchanged at 50 callers.
…free

detect_changes was using yyjson_mut_arr_add_str / yyjson_mut_obj_add_str
which borrow pointers. The file name came from a stack buffer reused each
fgets() iteration, and node names were freed by cbm_store_free_nodes before
serialization. This caused corrupted output with null bytes embedded in
filenames (e.g. 'CLAUDE.md\0\0\0ings.json').

Switch to yyjson_mut_arr_add_strcpy / yyjson_mut_obj_add_strcpy which copy
the strings into yyjson's internal allocator, making them safe across the
buffer reuse and free boundaries.
Vendored/minified JS files (tsc.js, typescript.js) inside non-JS repos
produce false positive routes when the Express route extractor matches
JS operators and keywords as route paths.

Add a validation filter that rejects:
- JS/TS operators: !, +, ++, -, --, :, ~
- JS/TS keywords: void, null, true, false, throw, this, typeof, etc.
- Single-character non-slash paths (*, ?, #)
- Paths with no alphanumeric or slash characters

Also trims leading/trailing whitespace before comparison to catch
'void ' and 'throw ' variants from minified source.

Tested: Routes went from 42 (20 garbage) to 22 real routes in test C# repo.
The tree-sitter C# grammar represents class inheritance via 'base_list'
child nodes (e.g. 'class Foo : Bar, IBaz'). The extract_base_classes
function didn't handle this node type, causing most C# inheritance to
be missed.

Add explicit traversal of base_list children, extracting type identifiers
from both direct identifier nodes and wrapper nodes (simple_base_type,
primary_constructor_base_type). Generic type arguments are stripped for
resolution (List<int> → List).

Tested: INHERITS edges went from 210 to 1,588 in test C# repo (7.5x improvement).
Verified results include real C# domain classes (e.g.
ClassA→BaseClassB, TestSuite→TestsBase, etc.).
The get_architecture MCP handler was only returning node/edge label counts
(identical to get_graph_schema). The store has a full architecture analysis
function cbm_store_get_architecture() that computes languages, hotspots,
routes, entry points, packages, clusters, and layers — but it was never
called from the MCP handler.

Wire all architecture aspects into the response:
- languages: file counts per language
- hotspots: highest fan-in functions
- routes: HTTP route definitions
- entry_points: main/handler functions
- packages: top-level module groupings
- clusters: Louvain community detection results

Use strcpy variants for all architecture strings since they're freed by
cbm_store_architecture_free before any potential reuse.

Tested: get_architecture went from 0 for all fields to 10 languages,
10 hotspots, 13 routes, 20 entry points, 15 packages.
The cbm_louvain() function was fully implemented but never called.
Add arch_clusters() that loads all callable nodes and CALLS edges,
runs Louvain community detection, groups results by community ID,
and populates cbm_cluster_info_t with member counts and top-5 nodes
per cluster sorted by largest communities first.

Wire into cbm_store_get_architecture() dispatch for the 'clusters' aspect.
Cap output at 20 clusters. Top nodes per cluster are selected by iterating
community members (degree-based sorting can be added later).

Tested: Test C# repo went from 0 to 20 clusters. Largest cluster has 3,205 members
(test code), second has 1,881 (core API functions).
Add cbm_extract_hapi_routes() that handles the Hapi.js route registration
pattern: { method: 'GET', path: '/api/...', handler: ... }. Uses a
mini-parser that finds method:/path: property pairs within the same object
literal by tracking enclosing brace scope. Also extracts handler references.

Wired into both the prescan (parallel) path in pass_parallel.c and the
disk fallback path in pass_httplinks.c for both per-function and
module-level source scanning.

Tested: Test TS/Hapi repo went from 0 to 1,665 routes.
CBM now finds every route definition AND API call site, compared to
only 12 from external service proxy routes with the previous tool.
Koolerx force-pushed the fix/csharp-and-trace-improvements branch from 33a7d1d to 58fff9e on March 27, 2026 18:03
Koolerx changed the title from "fix: C# support improvements and MCP handler bug fixes" to "fix+feat: C# support, MCP bug fixes, Hapi routes, Louvain clustering" Mar 27, 2026
Your Name added 10 commits March 27, 2026 14:31
Add a nodes_fts FTS5 virtual table synced via triggers for INSERT/UPDATE/DELETE.
Enable SQLITE_ENABLE_FTS5 in both production and test Makefile flags.

New 'query' parameter on search_graph: when set, uses FTS5 MATCH with
bm25() ranking instead of regex matching. Multi-word queries are tokenized
into OR terms for broad matching (e.g. 'authentication middleware' matches
nodes containing either word, ranked by relevance).

The direct B-tree dump pipeline bypasses SQLite triggers, so add a bulk
FTS5 backfill step after indexing:
  INSERT INTO nodes_fts SELECT id, name, qualified_name, label, file_path FROM nodes

Add cbm_store_exec() public API for raw SQL execution.
Falls back gracefully to regex path if FTS5 is unavailable.

Tested: 'authentication middleware' query returns 242 ranked results
(was 0). 'session recording upload' returns 4,722 ranked results with
relevant routes, controllers, and constants at the top.
… + Louvain

Add process detection as a post-indexing pass that discovers cross-community
execution flows:

1. Find all entry point nodes (is_entry_point=true or Route label)
2. Load CALLS edges and run Louvain community detection
3. BFS from each entry point to depth 8, max 200 visited nodes
4. Identify the deepest node that crosses a Louvain community boundary
5. Name the flow 'EntryPoint → Terminal' with process_type=cross_community
6. Store to new processes + process_steps tables

New schema: 'processes' table (id, project, label, process_type, step_count,
entry_point_id, terminal_id) and 'process_steps' table (process_id, node_id, step).

New store API: cbm_store_detect_processes(), cbm_store_list_processes(),
cbm_store_get_process_steps() with corresponding free functions.

New MCP tool: list_processes returns up to 300 processes ordered by step count.

Tested: TS/Hapi monorepo detects 300 cross-community processes, matching
the flow count from competing tools. Examples: 'ssoCallbackHandler →
catchUnexpectedResponse', 'exportCourse → sendSQSMessage'.
Detect emit/listen channel patterns in JS/TS/Python source files during
indexing. Extracts socket.emit/on, io.emit/on, emitter.emit/on patterns
with a regex scanner that identifies receiver names against a whitelist
of known channel communicators (socket, io, emitter, eventBus, etc.).

Filters out generic Node.js stream events (error, close, data, etc.)
and classifies transport as 'socketio' or 'eventemitter' based on
receiver name.

New schema: 'channels' table (project, channel_name, direction, transport,
node_id, file_path, function_name) with indexes on channel_name and project.

New store API: cbm_store_detect_channels() scans source from disk for all
indexed Function/Method/Module nodes in JS/TS/Python files.
cbm_store_find_channels() queries by project and/or channel name with
partial matching. Automatic cross-repo matching at query time (no link step).

New MCP tool: get_channels returns matched channels with emitter/listener
info, filterable by channel name and project.

Tested: TS monorepo detects 210 channel references including Socket.IO
subscribe/unsubscribe flows between UI and server.
node_prop() previously returned empty string for any property not in the
hardcoded column list (name, qualified_name, label, file_path, start_line,
end_line). Now falls through to json_extract_prop() on the node's
properties_json field for unknown properties.

Enables Cypher queries like:
  WHERE n.is_entry_point = 'true'
  WHERE n.is_test = '1'
  WHERE n.confidence > '0.5'

Also adds 'file' as an alias for 'file_path' and 'id' for the node ID.

Tested: 'MATCH (n:Function) WHERE n.is_entry_point = true' returns 10
controller handlers (previously 0).
…results

QFix 1 — trace_call_path disambiguation + file paths:
- When multiple callable symbols match, includes a 'candidates' array
  with name, label, file_path, line for each (like IDE go-to-definition)
- Every BFS result node now includes file_path, label, start_line
- Adds matched_file, matched_label, matched_line to the root response

QFix 2 — domain-weighted flow terminal naming:
- Reduced BFS max_results from 200 to 50 to prevent generic utility
  functions from becoming terminals
- Terminal candidates scored by: name length (domain names are longer),
  CamelCase bonus, domain verb bonus (Handler, Controller, Service, etc.),
  penalty for generic names (update, get, set, findOne, push, etc.)
- Result: 2/300 flows end in generic names (was ~280/300)
- Step count range: 3-51 (was 3-201)

QFix 3 — FTS5 search structural filtering:
- Exclude File/Module/Folder/Section/Variable/Project nodes from results
- Structural boost: Function/Method +10, Class/Interface/Type +5, Route +8
- High fan-in bonus: nodes with >5 CALLS in-degree get +3
- Result: 'authentication middleware' returns verifyJwt, apiMiddleware,
  createAuthRequestConfig (was returning Folder/Module/Section noise)
Gap 1 — Semantic cluster labels:
Replace auto-numbered 'Cluster_N' with directory-derived semantic labels.
For each cluster, sample up to 50 member file paths, extract the most common
non-generic directory segment (skip src/lib/dist/test/node_modules/shared),
capitalize and TitleCase the result. Falls back to 'Cluster_N' when no
directory has >= 3 occurrences.
Result: 'Services', 'Components', 'Controllers', 'Storage', 'Models',
'Stores', 'Scenarios', 'Courses' — matching competing tool quality.

Gap 2 — Process participation in trace_call_path:
After BFS traversal, query the processes table to find all execution flows
the traced function participates in (as entry point, terminal, or by name
substring match in the flow label). Includes up to 20 flows with label,
process_type, and step_count directly in the trace response — no separate
tool call needed.
…ss steps

Major rewrite of trace_call_path output for investigation-grade quality:

Categorized edges (Fixes A+D):
- incoming: { calls: [...], imports: [...], extends: [...] }
- outgoing: { calls: [...], has_method: [...], extends: [...] }
- Separate transitive_callers for depth > 1 (avoids noise in main results)
Each category queried independently via single-hop BFS on specific edge types.

Broader caller coverage (Fix A):
- Include USAGE and RAISES edges alongside CALLS for incoming queries
- Query both the Class node and its methods as BFS roots
- Result: MeteorError upstream goes from 9 to 39 callers

Noise elimination (Fix C):
- Default depth 1 for categorized results (direct only)
- Transitive callers isolated in separate field, capped at 50
- No more 106 render() methods polluting results

New get_impact tool (Fix F):
- BFS upstream/downstream with depth-grouped results
- d1_will_break / d2_likely_affected / d3_may_need_testing
- Risk assessment: LOW / MEDIUM / HIGH / CRITICAL based on d1 count
- Affected processes cross-referenced by name
- Tested: protectedUpdate returns CRITICAL (38 direct, 162 transitive)

New get_process_steps tool (Fix E):
- Returns ordered step list for a specific process ID
- Each step includes name, qualified_name, file_path
- Enables step-by-step flow debugging
Fix crash (double-free) when tracing nodes with 0 in-degree and 0 out-degree
(e.g. Type nodes, empty Class stubs). Detect early via cbm_store_node_degree
and return basic match info without attempting BFS traversal. Also move the
traversal result array from stack to heap to prevent stack smashing with
many start IDs.

Add fuzzy name fallback: when exact name match returns 0 results, run a
regex search with '.*name.*' pattern and return up to 10 suggestions with
name, label, file_path, line. This handles cases like searching for
'RecordingSession' when only 'ContinuousRecordingSessionDataGen' exists.
Three fixes for C# delegate and event subscription patterns that were
invisible to the call graph:

Fix 1 — Bare method reference subscription:
  event += MethodName creates a CALLS edge from the subscribing method
  to the handler. Detects assignment_expression with += operator where
  the RHS is an identifier or member_access_expression.
  e.g. socket.OnConnected += SocketOnConnected

Fix 2 — Delegate .Invoke() resolution:
  delegate?.Invoke(args) resolved to 'Invoke' which matches nothing.
  Now detects conditional_access_expression and member_access_expression
  where the method is 'Invoke', extracts the receiver (delegate property)
  name as the call target instead.
  e.g. OnConnected?.Invoke(this, e) → CALLS edge to 'OnConnected'

Fix 3 — Lambda event body scope attribution:
  Lambda expressions inside += assignments no longer create a new scope
  boundary. Calls inside the lambda body are attributed to the enclosing
  method that subscribes the event, not to an anonymous lambda scope.
  This means all handler logic is correctly attributed to the method
  that registers the event subscription.
  e.g. socket.OnError += (s, e) => { ErrorOnce(...); } attributes
  the ErrorOnce call to the method containing the += statement.

Tested on C# codebase: SocketOnConnected gained 1 incoming caller
(from += subscription) and 1 outgoing call (from ?.Invoke resolution).
InitializeExternalClient gained 10 additional outgoing calls from
lambda body attribution (30 total, up from 20).
Fix A — Class node 0-degree early exit:
The crash guard that returns early for nodes with 0 CALLS edges was
incorrectly catching Class/Interface nodes that have DEFINES_METHOD and
INHERITS edges (cbm_store_node_degree only counts CALLS). Re-add the
is_class_like exemption so Class nodes always proceed to DEFINES_METHOD
resolution. Cap method resolution to 5 methods to prevent excessive BFS.

Fix A2 — has_method uses Class node ID:
The DEFINES_METHOD BFS was using method start_ids (from class resolution)
as the BFS root, but DEFINES_METHOD edges go FROM the Class TO Methods.
Use the original Class node ID for the has_method query.
Result: 30 methods found (GitNexus: 29), extends chain shown.

Fix B1 — Add .cs to channel detection file filter:
Channel detection SQL now includes .cs files alongside JS/TS/Python.

Fix B2 — C# channel extraction with constant resolution:
New cbm_extract_csharp_channels() in httplink.c that handles:
- const string CONSTANT = "value" → builds name-to-value map
- .Emit(CONSTANT, ...) → resolves to string value, marks as emit
- .OnRequest<T>(CONSTANT, ...) → resolves to string value, marks as listen
- .Emit("literal", ...) → direct string literal matching
Result: 73 channel references, 35 unique channels in C# repo (was 0).
Koolerx changed the title from "fix+feat: C# support, MCP bug fixes, Hapi routes, Louvain clustering" to "feat+fix: C# support, investigation-grade trace output, BM25 search, execution flows, channel detection" Mar 27, 2026
DeusData added the enhancement, parsing/quality, and language-request labels Mar 29, 2026
Your Name added 7 commits March 29, 2026 12:49
…st radius

When a Class and its Constructor share the same name (common in C#/Java),
get_impact previously picked the Constructor (which has 0 incoming CALLS),
yielding empty blast radius results for any class query.

Now mirrors trace_call_path's disambiguation logic:
- Prefers Class node over same-named Constructor/Method
- Expands through DEFINES_METHOD edges to get all method node IDs
- Runs BFS from each method and merges results (dedup by closest hop)
- Caps at 30 methods per class (vs trace's 5) for comprehensive coverage
- Improved affected_processes matching: checks d=1 caller names too

Tested on a 26K-node C# monolith: 'UserService' went from 0 callers
to 16 direct callers, 19 total affected, HIGH risk, 20 affected processes.
Previously only JS/TS exports and lowercase 'main' were recognized as
entry points, causing 0 execution flows for C#/Java repos.

Changes:
- Case-insensitive main detection (strcasecmp) — fixes C# 'Main' and
  Java 'main' in both extract_func_def and push_method_def paths
- C# Windows Service lifecycle: OnStart, OnStartImpl, Run, Execute,
  Configure, ConfigureServices
- C# ASP.NET decorators: [HttpGet], [HttpPost], [Route], [ApiController]
- C# test decorators: [TestMethod], [Fact], [Test]
- Java patterns: start, configure, init, run, handle
- Java Spring/JAX-RS: @RequestMapping, @GetMapping, @PostMapping, etc.
- Java JUnit/lifecycle: @Override, @Test, @Scheduled, @Bean

Critical fix: push_method_def() (class methods) was missing entry point
detection entirely — only extract_func_def() (standalone functions) had it.

Tested: C# monolith 1→69 flows, Java/Vert.x repo 0→300 flows,
C# desktop app 2→280 flows + 33 routes discovered.
Channel deduplication:
- Added UNIQUE index on channels(project, channel_name, direction,
  file_path, function_name) to prevent duplicate rows at insert time
- Changed INSERT to INSERT OR IGNORE
- Added DISTINCT to all channel SELECT queries
- Fixed SQL injection in channel DELETE (was snprintf, now parameterized)

Cypher count(DISTINCT ...):
- Parser now accepts DISTINCT keyword inside aggregate functions:
  count(DISTINCT n.name), count(DISTINCT n.file_path), etc.
- Added distinct_arg flag to cbm_return_item_t
- Executor tracks seen values per-column and only increments count
  for unique values when distinct_arg is set
- Proper cleanup of distinct_seen arrays in both WITH and RETURN paths

Enables queries like:
  MATCH (caller)-[e]->(n) WHERE e.type = 'CALLS'
  RETURN count(DISTINCT n.name) as unique_callees
Adds WHERE NOT EXISTS { MATCH (caller)-[e]->(n) WHERE e.type = 'CALLS' }
support for anti-join queries like dead-code detection.

Parser: extends parse_not_expr to recognize NOT EXISTS { MATCH ... WHERE ... }
as a correlated subquery. Creates EXPR_NOT_EXISTS expression node with
sub_pattern and sub_where fields.

Executor: two evaluation paths for performance:
- Fast path (O(1) per node): when inner pattern has exactly 1 hop and one
  endpoint is bound from outer scope, directly queries edges by source/target
  ID. No full node scan needed.
- Slow path: full subquery expansion for complex/multi-hop patterns.

Threading: eval_expr and eval_where now accept (store, project, max_rows)
parameters to support correlated subquery expansion. All 5 call sites updated.

Enables queries like:
  MATCH (n:Function) WHERE NOT EXISTS { MATCH (caller)-[e]->(n) WHERE e.type = 'CALLS' }
  RETURN n.name, n.file_path LIMIT 20

Tested: finds 10 dead functions in a 216-function JS codebase in <1 second.
Cross-repo channels: when get_channels is called without a project parameter,
iterates ALL indexed project .db files in the cache directory, queries each
for matching channels, and merges results. Enables cross-service message flow
tracing (e.g., find all repos that emit/listen on 'UserCreated').

has_property in trace: trace_call_path now includes outgoing.has_property
section for Class/Interface nodes, showing all property nodes linked via
HAS_PROPERTY edges — property name, file path, and line number.
Extracts property_declaration, indexer_declaration, event_declaration, and
event_field_declaration from C# class bodies as 'Property' label nodes.
Previously these were completely invisible to the knowledge graph.

Creates HAS_PROPERTY edges from Class → Property in both parallel and serial
indexing paths (pass_parallel.c, pass_definitions.c).

Extracted metadata: property name, qualified name, file path, line range,
declared type (from type field), decorators, export status.

Tested: C# monolith (26K nodes) gained 3,470 Property nodes and 6,943 new
edges including HAS_PROPERTY. trace_call_path now shows 5 properties for
a typical service class.
…cope

C/C++ function_definition nodes have no 'name' field — the name is buried
in a declarator chain (function_definition → declarator → function_declarator
→ declarator → identifier). Both compute_func_qn() in extract_unified.c
and func_node_name() in helpers.c used ts_node_child_by_field_name('name')
which returns NULL for C/C++, causing all CALLS edges to be attributed to
the File node instead of the containing Function.

Fix: walk the C/C++ declarator chain (up to 8 levels) to find the identifier.
Handles: identifier, field_identifier, qualified_identifier, scoped_identifier.
Also unwraps template_declaration → function_definition for C++ templates.

Fixes C, C++, CUDA, and GLSL function scope resolution.

Tested: C++ desktop app went from 0 Function→Function CALLS edges to 10,
enabling process detection from entry points for the first time.
Your Name added 5 commits March 29, 2026 14:50
Adds entry point detection for C/C++ patterns in both extract_func_def
and push_method_def paths:
- WinMain, wWinMain, wmain, _tmain (Win32 console/GUI apps)
- DllMain (DLL entry points)
- InitInstance, OnInitDialog (MFC framework entry points)

These join the existing case-insensitive main() detection to cover
the full spectrum of C/C++ application architectures.
Process detection now follows HANDLES, HTTP_CALLS, and ASYNC_CALLS edges
in addition to CALLS when building Louvain communities and running BFS
from entry points. Previously only CALLS edges were traversed, making
Express/Hapi route→handler flows invisible to process detection.

Changes:
- Louvain edge loading query: type IN ('CALLS','HANDLES','HTTP_CALLS','ASYNC_CALLS')
- BFS from entry points: 4 edge types instead of 1

Tested: Express monorepo with 158 routes went from 3 to 4 detected flows,
with routes now participating in community detection.
Two fixes to dramatically increase detected execution flows:

1. Route→Function resolution (step 1b): Route nodes have 0 outgoing edges
(only incoming HANDLES from Module nodes), so BFS from Routes went nowhere.
Now resolves each Route entry point through the HANDLES edge to find the
Module, then looks up Functions in the same file — those become the real
BFS starting points. This connects HTTP API routes to their handler logic.

2. Relaxed cross-community requirement: previously, flows were only created
when BFS crossed a Louvain community boundary. Now flows with ≥3 steps are
kept even within a single community, picking the deepest non-generic node
as terminal. This catches Express-style flat patterns (route → controller →
storage → db) that stay within one community.

Results:
- Express monorepo: 4 → 61 flows (route handlers now visible)
- C# service: 69 → 78 flows (+9 intra-community flows)
- JS service: 65 → 70 flows (+5 intra-community flows)
- TS monolith: 300 (capped, no change)
Root cause: cbm_pipeline_fqn_module() received raw import paths like
'./utils/trace' or '../controllers/auth' and converted them directly to QNs
without resolving against the importing file's directory. The resulting QN
never matched any Module node, so IMPORTS edges were silently dropped.

New function cbm_pipeline_resolve_import_path() in fqn.c:
- Resolves ./ and ../ segments against the importer's directory
- Normalizes path (collapses a/b/../c → a/c)
- Bare module specifiers (no ./ prefix) pass through unchanged

Extension probing in pass_parallel.c and pass_definitions.c:
- After resolving the path, tries exact match first
- Then probes: .js, .ts, .tsx, .jsx, .mjs, .mts, .css, .scss, .json
- Then probes /index variants: /index.js, /index.ts, /index.tsx, etc.
- Then probes C/C++ headers: .h, .hpp, .hh
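The resolve-and-normalize step above can be sketched in a few lines of C. This is an illustrative model of the rule (collapse `./` and `../` against the importer's directory, pass bare specifiers through), not the actual fqn.c implementation; the function name and signature are made up.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Resolve `spec` against `importer_dir`, collapsing ./ and ../ segments.
   Bare specifiers (no leading '.') pass through unchanged. */
void resolve_import(const char *importer_dir, const char *spec,
                    char *out, size_t cap) {
    if (spec[0] != '.') {                 /* bare module specifier */
        snprintf(out, cap, "%s", spec);
        return;
    }
    char buf[1024];
    snprintf(buf, sizeof buf, "%s/%s", importer_dir, spec);

    /* Split on '/', keep a stack of segments; '..' pops, '.' is a no-op */
    const char *segs[64];
    int n = 0;
    for (char *tok = strtok(buf, "/"); tok; tok = strtok(NULL, "/")) {
        if (strcmp(tok, ".") == 0) continue;
        if (strcmp(tok, "..") == 0) { if (n > 0) n--; continue; }
        if (n < 64) segs[n++] = tok;
    }
    out[0] = '\0';
    for (int i = 0; i < n; i++) {
        strncat(out, segs[i], cap - strlen(out) - 1);
        if (i + 1 < n) strncat(out, "/", cap - strlen(out) - 1);
    }
}
```

So `../utils/trace` imported from `src/controllers` resolves to `src/utils/trace`, which can then match a Module node (after extension probing).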

Results:
- JS service: 0 → 335 IMPORTS edges
- TS monolith: 153 → 11,770 IMPORTS edges (77x increase)
- TS/React monorepo: 0 → 344 IMPORTS edges
- TS/Electron app: 1 → 161 IMPORTS edges
The ES module import walker (walk_es_imports) only handled 'import' statements
but not CommonJS 'require()' calls. JS codebases using require() had zero
imports extracted.

Adds require() detection in walk_es_imports:
- Detects variable_declarator/assignment_expression with require() call value
- Handles: const X = require('Y') (default import)
- Handles: const { A, B } = require('Y') (destructured import via object_pattern)
- Handles: const [A, B] = require('Y') (array destructured)
- Supports shorthand_property_identifier_pattern and pair_pattern variants
- Falls back to path_last() for unnamed requires

Also adds variable_declaration and expression_statement to js_import_types
in lang_specs.c, catching 'var X = require()' patterns (older JS codebases).

Tested: JS service went from 0 to 335 IMPORTS with both ESM and CJS detected.
Koolerx changed the title from "feat+fix: C# support, investigation-grade trace output, BM25 search, execution flows, channel detection" to "feat+fix: C# support, investigation-grade trace, BM25 search, execution flows, channels, IMPORTS resolution" on Mar 29, 2026
Your Name added 9 commits March 29, 2026 22:20
BM25 search results now include a 'processes' array showing which execution
flows each result symbol participates in (up to 5 per symbol). Uses a
single prepared statement with process_steps JOIN for efficiency.

This closes the gap with flow-grouped search: users can see not just the
symbol name and file, but which end-to-end flows it belongs to.

Requires sqlite3.h include in mcp.c for direct SQLite access to the
process_steps table via cbm_store_get_db().
…tion

Two-pass channel extraction for JavaScript/TypeScript:

Pass 1 (existing): regex matches string-literal channels: socket.on('Name', ...)
Pass 2 (new): resolves constant-name channels: socket.on(CONSTANT_NAME, ...)
  - Collects const NAME = 'value' mappings from full file source
  - Matches .emit/.on/.once with bare SCREAMING_CASE identifiers
  - Resolves constants to their string values
  - Handles method chaining (.on() without explicit receiver)
  - Filters short names (<3 chars) to avoid false positives

File-level pass in store.c reads complete JS/TS files (up to 512KB) for
constant resolution, since per-node snippets don't include file-scope
constants.

Result: JS service went from 6 channels (test tool only) to 17 channels
including all production Socket.IO events: WebRtcSdp, WebRtcIce,
CaptureNodeStatusUpdate, RecordedFileUpdate, RecordingSessionUpdate, etc.
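The Pass 2 lookup described above amounts to resolving a bare identifier against the collected constant map, with the short-name filter applied first. A minimal sketch (struct and function names are illustrative, not the store.c code):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct { const char *name, *value; } const_map_t;

/* Resolve a bare identifier channel argument (e.g. WEBRTC_SDP) to its
   string value. Returns NULL for unknown or too-short (<3 char) names. */
const char *resolve_channel_const(const const_map_t *map, int n,
                                  const char *ident) {
    if (strlen(ident) < 3) return NULL;   /* filter short false positives */
    for (int i = 0; i < n; i++)
        if (strcmp(map[i].name, ident) == 0) return map[i].value;
    return NULL;
}
```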
The search_graph handler already supports FTS5 BM25 full-text search via a
'query' parameter and sort control via 'sort_by', but neither was declared
in the tool's inputSchema. AI agents calling search_graph had no way to
discover or use these parameters.

Adds to inputSchema:
- query: BM25 full-text search with structural boosting (Function/Method
  +10, Route +8, Class +5, high-fan-in +3). Filters out File/Folder/Module/
  Variable/Project noise. Tokenizes input as OR terms for broad matching.
- sort_by: 'relevance' (default with query), 'name', 'file_path'

Updates tool description to document both search modes:
  (1) query='terms' for ranked full-text discovery
  (2) name_pattern='regex' for exact pattern matching
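The structural boost values documented above can be modeled as a simple label lookup. This is a sketch of the scoring rule only (the real logic lives in the search_graph SQL); the function name is made up:

```c
#include <assert.h>
#include <string.h>

/* Additive rank boost per node label, per the scheme documented above.
   File/Folder/Module/Variable/Project nodes are filtered out before
   ranking, so they never reach this stage. */
double label_boost(const char *label) {
    if (strcmp(label, "Function") == 0 || strcmp(label, "Method") == 0)
        return 10.0;                      /* executable symbols rank highest */
    if (strcmp(label, "Route") == 0) return 8.0;
    if (strcmp(label, "Class") == 0) return 5.0;
    return 0.0;
}
```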
Two search quality fixes:

1. Remove fan_in popularity boost from BM25 ranking.
   The fan_in>5 clause added +3.0 to the rank of heavily-called functions,
   causing popular-but-irrelevant results (e.g. 'update' with fan_in=222) to
   outrank relevant matches. search_graph with query= is now pure BM25
   relevance + label-type differentiation only. in_deg/out_deg are still
   returned in results for display but do not affect sort order.

2. Add cbm_camel_split() SQLite function for FTS5 indexing.
   FTS5's unicode61 tokenizer treats 'createSession' as a single token
   'createsession'. Searching for 'session' would not match it.
   cbm_camel_split() expands camelCase names into space-separated tokens:
   'createSession' → 'createSession create Session'
   'HTMLParser' → 'HTMLParser HTML Parser'
   The original name is preserved as the first token for exact-match queries.

   Applied in: FTS5 triggers (INSERT/DELETE/UPDATE on nodes table) and
   the bulk FTS5 backfill after full indexing.

3. Switch FTS5 from content='nodes' (external content) to content=''
   (contentless). External content mode re-verifies matches against the
   source table at query time, which re-tokenizes the original name and
   fails to match the split tokens. Contentless mode trusts the inverted
   index directly. Trade-off: highlight()/snippet() unavailable (never used).

Requires full reindex to rebuild FTS5 tables with new schema + tokens.
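The splitting rule from item 2 can be sketched as follows. This models only the token expansion described above (original name first, then camelCase segments); the real cbm_camel_split is registered as a SQLite function and this standalone version is illustrative:

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Expand a camelCase name into "name tok1 tok2 ...". The exact original
   name is kept as the first token so exact-match queries still work. */
void camel_split(const char *name, char *out, size_t cap) {
    size_t len = strlen(name), start = 0;
    snprintf(out, cap, "%s", name);               /* exact name first */
    for (size_t i = 1; i <= len; i++) {
        int boundary =
            (i == len) ||
            /* lower->Upper: createSession -> create|Session */
            (islower((unsigned char)name[i-1]) && isupper((unsigned char)name[i])) ||
            /* Upper run ending before lower: HTMLParser -> HTML|Parser */
            (i + 1 < len && isupper((unsigned char)name[i-1]) &&
             isupper((unsigned char)name[i]) && islower((unsigned char)name[i+1]));
        /* Skip the redundant whole-word emit for single-token names */
        if (boundary && !(start == 0 && i == len)) {
            size_t n = strlen(out);
            snprintf(out + n, cap - n, " %.*s", (int)(i - start), name + start);
            start = i;
        }
    }
}
```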
… labels

Two process detection improvements:

1. Deduplicate entry points after Route resolution (step 1c).
   Route resolution (step 1b) resolves each Route to ALL functions in the
   handler file. When N routes point to the same file with M functions,
   each function appeared N times in the entry point list. The BFS loop
   then generated N identical processes per entry point.

   Fix: O(n^2) dedup pass over ep_ids[] after Route resolution, before BFS.
   Compacts the array in-place, frees duplicate name strings.

2. Add [module] prefix to process labels for navigability.
   Labels were just entry -> terminal which is useless for finding
   the right flow among 50+ processes. Now derives the parent directory
   from the entry point file_path and prepends it as a module tag.

   Before: funcA -> funcZ
   After:  [controllers] funcA -> funcZ

Tested on multiple repos with significant dedup ratios.
Requires full reindex to regenerate process tables.
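The label derivation from item 2 is just "take the parent directory name from the entry point's file_path and prepend it". A minimal sketch, assuming POSIX-style `/` paths (the function name is illustrative):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "[parent_dir] entry -> terminal"; falls back to the plain
   "entry -> terminal" form when file_path has no directory component. */
void process_label(const char *file_path, const char *entry,
                   const char *terminal, char *out, size_t cap) {
    const char *end = strrchr(file_path, '/');
    if (end) {
        const char *start = end;
        while (start > file_path && start[-1] != '/') start--;
        snprintf(out, cap, "[%.*s] %s -> %s",
                 (int)(end - start), start, entry, terminal);
    } else {
        snprintf(out, cap, "%s -> %s", entry, terminal);
    }
}
```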
…ng API

Adds semantic vector search to close the fundamental vocabulary mismatch
gap: natural language queries like 'institution name update' can now find
symbols like updateCloudClient even when the keywords don't overlap.

Architecture:
- Embeddings generated via HTTP POST to OpenAI-compatible /v1/embeddings
  endpoint (Ollama, llamafile, OpenAI — configured via CBM_EMBEDDING_URL)
- Stored as float32 BLOBs in a new 'embeddings' table in existing SQLite DB
- Cosine similarity computed via registered cbm_cosine_sim() SQLite function
- Brute-force search: fast enough for <100K vectors at 384-768 dims
- RRF (Reciprocal Rank Fusion, k=60) merges BM25 + vector results

New files:
- src/pipeline/embedding.h — API for config, text gen, HTTP client, RRF merge
- src/pipeline/embedding.c — Full implementation using Mongoose HTTP + yyjson

Changes:
- src/store/store.c: embeddings table schema, cbm_cosine_sim() function,
  embedding CRUD + vector_search, batch upsert
- src/store/store.h: cbm_vector_result_t, embedding function declarations
- src/mcp/mcp.c: generate_embeddings tool, hybrid search in search_graph
  (vector-only results appear in 'semantic_results' field)
- Makefile.cbm: added embedding.c to build

Configuration (env vars):
  CBM_EMBEDDING_URL   — Base URL (e.g., http://localhost:11434/v1)
  CBM_EMBEDDING_MODEL — Model name (default: nomic-embed-text)
  CBM_EMBEDDING_DIMS  — Vector dimensions (default: 768)

Usage:
  1. Start Ollama: ollama pull nomic-embed-text && ollama serve
  2. Set env: export CBM_EMBEDDING_URL=http://localhost:11434/v1
  3. Generate: generate_embeddings({project: '...', force: false})
  4. Search: search_graph({query: 'institution name update'})
     → BM25 results + semantic_results (vector-only matches)

When CBM_EMBEDDING_URL is not set, everything works as before (BM25-only).
No new dependencies — uses already-vendored Mongoose (HTTP) and yyjson (JSON).
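The RRF merge step above follows the standard Reciprocal Rank Fusion formula with k=60: each result's fused score is the sum of 1/(k + rank) over every list it appears in, so items ranked by both BM25 and the vector search float to the top. A sketch of the scoring rule (signature illustrative, not the embedding.c API):

```c
#include <assert.h>

/* RRF score for one id across two ranked id lists (rank is 1-based).
   Ids present in both lists accumulate both reciprocal terms. */
double rrf_score(int id, const int *bm25_ids, int n_bm25,
                 const int *vec_ids, int n_vec) {
    const double k = 60.0;
    double s = 0.0;
    for (int i = 0; i < n_bm25; i++)
        if (bm25_ids[i] == id) s += 1.0 / (k + i + 1);
    for (int i = 0; i < n_vec; i++)
        if (vec_ids[i] == id) s += 1.0 / (k + i + 1);
    return s;
}
```

For example, an id ranked #3 by BM25 and #1 by vector search scores 1/63 + 1/61, beating an id ranked #1 in only one list (1/61).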
…-free

The semantic_results section used yyjson_mut_obj_add_str (borrows pointer)
then called cbm_node_free_fields which freed those strings. The yyjson doc
then held dangling pointers, producing garbage in the JSON output.

Fix: use yyjson_mut_obj_add_strcpy (copies string) for all node fields
in the vector-only result loop.
When CBM_EMBEDDING_URL is configured, the full indexing pipeline now
automatically generates semantic embeddings after process and channel
detection. Uses force=false so existing embeddings are preserved —
only new/missing nodes get embedded.

This makes semantic search zero-friction: repos indexed while the
embedding server is running get embeddings automatically. No manual
generate_embeddings call needed.

Pipeline order: FTS5 backfill → processes → channels → embeddings
… duplicates

Route extraction produced 3x duplicates for every JS/TS route:
- 2 real copies (express_routes + hapi_routes both matching same pattern)
- 1 ghost copy (module-level extraction with empty qualified_name)

Root causes in pass_parallel.c prescan_routes():
1. Function-level: cbm_extract_express_routes and cbm_extract_hapi_routes
   both extract the same Express-style routes from the same function body.
2. Module-level: cbm_extract_express_routes/hapi_routes called with "" as
   qualified_name, producing ghost nodes with empty file_path and QN
   starting with a leading dot (.route.GET.path).

Fix: deduplicate routes by (method, path) after Phase 1 collection, before
prefix resolution. When duplicates exist, prefer the entry with a non-empty
qualified_name (function-level entry wins over module-level ghost).

Verified: 3:1 dedup ratio. 0 ghosts, 0 duplicates after fix.
All routes have correct file_path and qualified_name.
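The (method, path) dedup rule above, as a minimal in-place sketch. The struct and function are illustrative, not the pass_parallel.c code; the key point is that on a collision the entry with a non-empty qualified_name wins:

```c
#include <assert.h>
#include <string.h>

typedef struct { const char *method, *path, *qn; } route_t;

/* In-place O(n^2) dedup by (method, path); returns the new count.
   When duplicates collide, keep the non-empty qualified_name entry. */
int dedup_routes(route_t *r, int n) {
    int out = 0;
    for (int i = 0; i < n; i++) {
        int j;
        for (j = 0; j < out; j++)
            if (strcmp(r[j].method, r[i].method) == 0 &&
                strcmp(r[j].path, r[i].path) == 0)
                break;
        if (j < out) {
            /* duplicate: function-level entry beats module-level ghost */
            if (r[j].qn[0] == '\0' && r[i].qn[0] != '\0') r[j] = r[i];
        } else {
            r[out++] = r[i];
        }
    }
    return out;
}
```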
Koolerx force-pushed the fix/csharp-and-trace-improvements branch from f7f94be to 70bc64f on March 30, 2026 at 17:42
Your Name added 2 commits March 30, 2026 14:15
… flow tracing

Adds cross-repo capabilities to CBM by building a unified _cross_repo.db
that aggregates node stubs, channels, and embeddings from all per-project
databases. This bridges the per-project isolation gap without compromising
the security model (no ATTACH DATABASE — data is copied, not linked).

New files:
- src/store/cross_repo.h — API: build, search, channel matching, stats
- src/store/cross_repo.c — Implementation (~550 lines)
  - cbm_cross_repo_build(): scans all project DBs, copies nodes (134K),
    channels (526), embeddings (134K) into _cross_repo.db. Build time: ~2s.
  - cbm_cross_repo_search(): BM25 FTS5 + vector search + RRF merge across
    all repos in a single query. CamelCase token splitting enabled.
  - cbm_cross_repo_match_channels(): finds emit/listen pairs across repos
    (12 unique cross-repo channels detected, 127 individual flow matches)
  - cbm_cross_repo_get_info(): stats about the cross-repo index

New MCP tools:
- build_cross_repo_index: manually trigger cross-repo index rebuild
- trace_cross_repo: trace message channels across repositories, showing
  which services produce and consume each channel with file+function detail

Pipeline integration:
- Auto-rebuilds _cross_repo.db after every index_repository (adds ~2s)
- Cross-repo DB is always fresh without manual intervention

Cross-repo channel matching detects:
- Socket.IO channels emitted in one service and listened in another
- EventEmitter patterns across repos
- File-level and function-level attribution for each endpoint

Performance: 54 repos, 134K nodes, 134K embeddings copied in 2.1 seconds.
No new dependencies — uses existing SQLite3 and custom functions.
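The emit/listen pairing rule above reduces to: a channel is cross-repo when some project emits it and a different project listens on it. A sketch over an in-memory channel table (the real matching runs as SQL against _cross_repo.db; struct and names are illustrative):

```c
#include <assert.h>
#include <string.h>

typedef struct { const char *channel, *project, *direction; } chan_t;

/* Returns 1 if `channel` has an emitter in one project and a listener
   in a different project, 0 otherwise. */
int is_cross_repo(const chan_t *c, int n, const char *channel) {
    for (int i = 0; i < n; i++) {
        if (strcmp(c[i].channel, channel) || strcmp(c[i].direction, "emit"))
            continue;
        for (int j = 0; j < n; j++)
            if (strcmp(c[j].channel, channel) == 0 &&
                strcmp(c[j].direction, "listen") == 0 &&
                strcmp(c[i].project, c[j].project) != 0)
                return 1;                 /* emit and listen in different repos */
    }
    return 0;
}
```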
…nalysis

Adds three major cross-repo capabilities building on the _cross_repo.db
infrastructure from the previous commit:

1. Cross-repo search (search_graph with project='*'):
   Dispatches to cbm_cross_repo_search() which runs BM25+vector+RRF
   against the unified cross-repo index (134K nodes, 134K embeddings).
   Returns results with both short project name and full project_id
   for follow-up per-repo queries.

2. Enhanced trace_cross_repo with call chains:
   When a channel filter is provided, traces upstream callers of the
   emitter and downstream callees of the listener, depth 2 per hop.
   New cbm_cross_repo_trace_in_project() helper opens project DBs
   ad-hoc, resolves Class→Method and (file-level) listeners, runs
   cbm_store_bfs, returns structured step arrays.
   Without channel filter: returns channel matches only (unchanged).

3. Cross-repo impact analysis (get_impact with cross_repo=true):
   After per-repo BFS, checks if any d=1 impacted symbols emit
   channels in cross_channels. For each affected channel, opens
   consumer project DB, traces downstream from the listener function,
   returns cross_repo_impacts array with consumer repo, listener
   function, downstream affected count, and trace steps.

New functions in cross_repo.c:
- cbm_cross_repo_trace_in_project: open DB, resolve function,
  BFS, return steps. Handles Class→Method resolution and file-level
  listener fallback via channels table node_id lookup.
- cbm_cross_trace_free: cleanup for trace step arrays.

Changes in mcp.c:
- handle_search_graph: early-exit to handle_cross_repo_search when
  project='*'
- handle_cross_repo_search: new handler with hybrid search + JSON output
- handle_trace_cross_repo: enhanced with call chain tracing per match
- handle_get_impact: cross_repo=true triggers channel + consumer trace

No new dependencies. Uses existing cbm_store_bfs and cbm_store_open_path_query.
Koolerx changed the title from "feat+fix: C# support, investigation-grade trace, BM25 search, execution flows, channels, IMPORTS resolution" to "feat+fix: C# support, investigation-grade trace, semantic search, execution flows, channels, cross-repo intelligence" on Mar 30, 2026
…h + channel dedup

Three fixes:

1. Cross-repo get_impact (project='*'):
   Searches _cross_repo.db for the target symbol across all repos.
   For each repo containing it, opens the per-project DB, resolves
   Class→Method, runs BFS, and collects depth-grouped impact results.
   Returns per-repo risk assessment with combined risk level.

2. Cross-repo trace_call_path (project='*'):
   Same pattern — finds the function in all repos, traces both
   inbound (callers) and outbound (callees) at depth 2 per repo.
   Uses cbm_cross_repo_trace_in_project() for BFS with Class→Method
   and file-level resolution.

3. Channel dedup safety net:
   After channel detection Pass 2 (file-level constants), deletes
   file-level ghost entries when a named function entry already exists
   for the same (channel_name, file_path, project, direction).
   Applied in both per-project detect_channels (store.c) and
   cross-repo build (cross_repo.c). Currently deletes 0 rows
   (no actual ghosts in current data), but prevents future duplicates
   if both extractors detect the same pattern.

All three tools now support project='*' for cross-repo queries:
- search_graph(project='*') — cross-repo search
- get_impact(project='*') — cross-repo blast radius
- trace_call_path(project='*') — cross-repo call chain trace

Koolerx commented Mar 30, 2026

@DeusData, many improvements have been made to the project. I know there are over 40 commits, but please go ahead and give it a test run.

DeusData (Owner) commented Mar 30, 2026

Hey @Koolerx, thx! Will check. Currently mainly involved in clearing technical debt. Will come back to this asap. Likely will analyze your changes and will extract what makes sense in a separate commit where I will list you as co-author so that you will be listed as contributor. Hope thats fine for u :)


Koolerx commented Mar 30, 2026

Hey @Koolerx, thx! Will check. Currently mainly involved in clearing technical debt. Will come back to this asap. Likely will analyze your changes and will extract what makes sense in a separate commit where I will list you as co-author so that you will be listed as contributor. Hope thats fine for u :)

all sounds well to me


DeusData commented Mar 30, 2026

But in general: Why the embeddings? Can you give a bit more reasoning for this? @Koolerx? What does this add to a coding agent? I already have investigated using embeddings but was not so convinced. Coding agents can efficiently query themselves already GraphDBs

Upstream changes: dependabot CI action version bumps (checkout, cache,
upload-artifact, attest-build-provenance, codeql-action).

Conflict resolutions:
- mcp.c (3 regions): Kept our fuzzy fallback + added upstream's new
  'mode' and 'param_name' free() calls. Kept our matched_file/candidates/
  zero-edge check + integrated upstream's mode-based edge_types parsing.
  Kept our strcpy comment.
- pass_parallel.c (1 region): Removed dead prescan functions (extract_lines,
  prescan_http_sites, prescan_routes, prescan_config_refs, prescan_add_route,
  is_route_source_lang) — these were added by our branch but upstream removed
  the calling code and type definitions. They're now unused dead code.

Koolerx commented Mar 30, 2026

But in general: Why the embeddings? Can you give a bit more reasoning for this? @Koolerx? What does this add to a coding agent? I already have investigated using embeddings but was not so convinced. Coding agents can efficiently query themselves already GraphDBs

The graph handles most queries really well. Roughly 85% of the time, BM25 plus the knowledge graph is all you need. The embeddings close a specific gap in the remaining cases, and it's worth understanding exactly what that gap is before dismissing it.

The gap is vocabulary mismatch. When the user's search terms don't appear anywhere in symbol names, qualified names, or file paths, BM25 returns nothing useful. The graph can only find things when tokens overlap; that's just how keyword search works.

We ran this head to head. Query: "start video recording" against a media services repo. BM25 found startSession, startRecordingSession, startCaptureNodeSession: good results, with keyword matches on "start", "recording", and "session".

But the vector layer surfaced updateVideoConductorSession, videoTrimmingFailureNotification, videoRemuxNotification, and RecordingStorage: conceptually related symbols with zero keyword overlap. The BM25 results are the functions you'd change. The vector results are the functions that would break if you changed them. That's the blast radius that keyword search can't surface on its own.

For a coding agent specifically, the agent doesn't know what it doesn't know. If search_graph returns 29 results, the agent assumes that's the full picture and moves on. The vector layer surfaces what the agent would have missed entirely. We tested "error handling": BM25 found 29 functions with "error" in the name. But the functions that actually handle errors (constructors in error classes, catch blocks in controllers) don't have "error" or "handling" in their names. Vector search found 20 additional symbols the agent never would have seen.

Where it doesn't matter (and I want to be honest about this): if you know the exact symbol name, BM25 is sufficient. If you're tracing callers and callees, graph BFS is the right tool, period. If the codebase uses consistent naming conventions, BM25 plus the camelCase token splitting we added covers it. Embeddings don't help with any of those cases.

The implementation cost is pretty minimal: about 200 lines of C, a cosine similarity function plus an HTTP client to any OpenAI-compatible endpoint. Zero new dependencies; we used the already-vendored Mongoose for HTTP and yyjson for JSON. It's fully opt-in via the CBM_EMBEDDING_URL env var. When that's not set, everything works exactly as before: pure BM25. A brute-force cosine scan over 134K vectors takes under 10ms, so no ANN index is needed at this scale.

The honest limitation is that quality depends on the embedding model. A bad model gives bad results. And it requires a running embedding server (Ollama, llamafile, whatever), so it's not self-contained like the graph. That's why we made it opt-in rather than default. It's a power-user feature for people who need discovery across vocabulary boundaries, not a replacement for the graph.

Bottom line: if CBM is used purely for graph traversal (trace callers, impact analysis, process flows), embeddings add nothing. If it's used for discovery, "find code related to X" where X is described in natural language, that's where embeddings close the gap that BM25 structurally cannot.

Your Name added 2 commits March 31, 2026 09:24
….Property WRITES

Three C# code intelligence fixes:

1. Register Interface nodes in the symbol registry.
   The registry filter only accepted Function/Method/Class labels. Interface
   nodes were never registered, so resolve_as_class('IExamService') always
   returned NULL. INHERITS edges from C# classes to their interfaces were
   never created (0 → Interface out of 1588 total INHERITS in the largest
   C# repo).

   Fix: add 'Interface' to the registration filter in both pass_definitions.c
   (sequential path) and pass_parallel.c (parallel path, cbm_build_registry_from_cache).

   Additionally, in pass_semantic.c, when the resolved base type is a Class
   but the name follows C# interface convention (I + uppercase), search for
   an Interface node with the same name and prefer it. This handles cross-
   language name collisions (e.g. C++ ILogger class vs C# ILogger interface).

   Result: CometExamService → IExamService INHERITS edge now created.

2. BFS interface method bridging in trace_call_path and get_impact.
   When tracing callers of a concrete method (e.g. PerformSessionAction),
   BFS now also checks if the method's enclosing class INHERITS from an
   Interface. If so, it finds the interface's method with the same name
   and runs additional BFS from that node. This surfaces callers that use
   the interface type (ISimCoordinator.PerformSessionAction) which
   previously showed 0 callers.

   Applied to both handle_trace_call_path (incoming.calls section) and
   handle_get_impact (multi-method BFS merge).

3. Expand WRITES detection for this.Property = value.
   handle_readwrites() only accepted bare identifiers as assignment LHS.
   C#/Java/TypeScript this.PropertyName = value has LHS as
   member_access_expression, which was skipped.

   Fix: add branch for member_access_expression / member_expression LHS.
   When the receiver is this/base/self, extract the property name and push
   as a write. Handles C# this_expression, Java this, TypeScript this,
   Python self.

   Expected: hundreds of additional WRITES edges from method bodies to
   class properties in C# repos.

Requires full reindex of C# repos to rebuild INHERITS edges and WRITES.
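The "I + uppercase" naming heuristic from item 1 is a two-character check. A sketch (function name illustrative, not the pass_semantic.c code):

```c
#include <assert.h>
#include <ctype.h>

/* C# interface naming convention: leading 'I' followed by an uppercase
   letter (IExamService, ILogger). Used to prefer an Interface node over
   a same-named Class on cross-language collisions. Safe for "I" alone:
   name[1] is then '\0', which fails isupper(). */
int looks_like_csharp_interface(const char *name) {
    return name[0] == 'I' && isupper((unsigned char)name[1]);
}
```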
The incremental pipeline (pipeline_incremental.c) deletes the old DB and
dumps the merged graph via cbm_gbuf_dump_to_sqlite, which bypasses SQLite
triggers. The FTS5 table was left empty (0 rows) after every incremental
reindex, making search_graph return 0 results for all queries.

Fix: after persisting file hashes (the last step before returning), rebuild
the FTS5 index with DELETE + INSERT from nodes using cbm_camel_split for
camelCase token splitting.

Before: full reindex → 672 FTS rows, incremental → 0 FTS rows
After:  full reindex → 672 FTS rows, incremental → 672 FTS rows

Labels

- enhancement: New feature or request
- language-request: Request for new language support
- parsing/quality: Graph extraction bugs, false positives, missing edges
