feat(knowledge): connectors, user exclusions, expanded tools & airtable integration #3230
waleedlatif1 wants to merge 10 commits into staging from
Conversation
Greptile Summary

This PR introduces a comprehensive knowledge base connector system — a sync engine that periodically pulls documents from external sources (Confluence, GitHub, Google Drive, Jira, Linear, Notion, Airtable) into knowledge bases via SHA-256 change detection. It also adds six new knowledge tools (list_documents, list_chunks, delete_document, delete_chunk, update_chunk, list_tags). Several previously reported issues have been resolved in this iteration.
Key issues that remain:
Confidence Score: 3/5
Important Files Changed
Sequence Diagram

sequenceDiagram
participant Cron as Cron Scheduler
participant SyncRoute as /api/knowledge/connectors/sync
participant Dispatch as dispatchSync()
participant Trigger as Trigger.dev
participant Engine as executeSync()
participant Connector as ConnectorRegistry
participant Source as External Source
participant DB as Database
participant Storage as StorageService
Cron->>SyncRoute: GET (every 5 min)
SyncRoute->>DB: Query active connectors WHERE nextSyncAt <= now
SyncRoute->>Dispatch: dispatchSync(connectorId) per due connector
alt Trigger.dev available
Dispatch->>Trigger: knowledgeConnectorSync.trigger(payload)
Trigger->>Engine: executeSync(connectorId)
else Fallback
Dispatch->>Engine: executeSync() fire-and-forget
end
Engine->>DB: Lock connector (status='syncing')
Engine->>DB: Insert sync log (status='started')
Engine->>DB: Fetch OAuth token via refreshAccessTokenIfNeeded
Engine->>Connector: listDocuments(token, sourceConfig, cursor)
Connector->>Source: Paginated API calls (up to 500 pages)
Source-->>Connector: ExternalDocument[]
Connector-->>Engine: ExternalDocumentList
Engine->>DB: Load existing docs (connectorId, deletedAt IS NULL)
Engine->>DB: Load excluded docs (userExcluded=true, deletedAt IS NULL)
loop For each external document
alt New document
Engine->>Storage: uploadFile(content as .txt)
Engine->>DB: INSERT document record
Engine->>Engine: processDocumentAsync() fire-and-forget
else Content hash changed
Engine->>Storage: uploadFile(updated content)
Engine->>DB: UPDATE document record in-place
Engine->>Engine: processDocumentAsync() fire-and-forget
else Unchanged
Engine->>Engine: docsUnchanged++
end
end
alt Full sync mode
Engine->>DB: Soft-delete stale docs (deletedAt=now)
end
Engine->>DB: UPDATE sync log (status='completed', counts)
Engine->>DB: UPDATE connector (status='active', nextSyncAt)
Last reviewed commit: 4453edb
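For orientation, the per-document branch in the diagram's loop can be sketched in a few lines of TypeScript. This is a minimal illustration of SHA-256 change detection, not the PR's actual executeSync; the type names and helpers below are assumptions.

```typescript
import { createHash } from 'crypto'

interface ExternalDocument {
  externalId: string
  content: string
}

interface ExistingDoc {
  id: string
  contentHash: string
}

// Hash the fetched content, mirroring the diagram's change detection.
function computeSha256(content: string): string {
  return createHash('sha256').update(content, 'utf8').digest('hex')
}

// Per-document reconciliation mirroring the diagram's loop: insert new docs,
// update docs whose hash changed, skip unchanged and user-excluded docs.
function reconcileDocument(
  doc: ExternalDocument,
  existing: ExistingDoc | undefined,
  excludedIds: Set<string>
): 'inserted' | 'updated' | 'unchanged' | 'skipped' {
  // Documents the user explicitly excluded are never re-imported.
  if (excludedIds.has(doc.externalId)) return 'skipped'

  const hash = computeSha256(doc.content)

  if (!existing) {
    // uploadFile(content) + INSERT document record + processDocumentAsync()
    return 'inserted'
  }
  if (existing.contentHash !== hash) {
    // uploadFile(content) + UPDATE document record in place + processDocumentAsync()
    return 'updated'
  }
  return 'unchanged'
}
```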
Cursor Bugbot has reviewed your changes and found 2 potential issues.
transformResponse: async (response, params): Promise<KnowledgeDeleteDocumentResponse> => {
  const result = await response.json()

  return {
    success: true,
    output: {
      documentId: params?.documentId ?? '',
      message: result.data?.message ?? 'Document deleted successfully',
    },
transformResponse swallows API errors silently
response.json() is called unconditionally and success: true is always returned regardless of whether the API responded with a 4xx or 5xx. A caller trying to delete a document that doesn't exist (404) or is unauthorized (403) will receive success: true with no indication of the failure. This is especially problematic for destructive operations.
Suggested change:

transformResponse: async (response, params): Promise<KnowledgeDeleteDocumentResponse> => {
  if (!response.ok) {
    const error = await response.text()
    throw new Error(`Failed to delete document: ${response.status} ${error}`)
  }
  const result = await response.json()
  return {
    success: true,
    output: {
      documentId: params?.documentId ?? '',
      message: result.data?.message ?? 'Document deleted successfully',
    },
  }
},
transformResponse: async (response, params): Promise<KnowledgeDeleteChunkResponse> => {
  const result = await response.json()

  return {
    success: true,
    output: {
      chunkId: params?.chunkId ?? '',
      documentId: params?.documentId ?? '',
      message: result.data?.message ?? 'Chunk deleted successfully',
    },
  }
},
transformResponse swallows API errors silently
Same issue as delete_document.ts — success: true is always returned regardless of HTTP status. A failed chunk delete (404, 403, 500) is reported as success.
Suggested change:

transformResponse: async (response, params): Promise<KnowledgeDeleteChunkResponse> => {
  if (!response.ok) {
    const error = await response.text()
    throw new Error(`Failed to delete chunk: ${response.status} ${error}`)
  }
  const result = await response.json()
  return {
    success: true,
    output: {
      chunkId: params?.chunkId ?? '',
      documentId: params?.documentId ?? '',
      message: result.data?.message ?? 'Chunk deleted successfully',
    },
  }
},
transformResponse: async (response, params): Promise<KnowledgeUpdateChunkResponse> => {
  const result = await response.json()
  const chunk = result.data || {}

  return {
    success: true,
    output: {
      documentId: params?.documentId ?? '',
      id: chunk.id ?? '',
      chunkIndex: chunk.chunkIndex ?? 0,
      content: chunk.content ?? '',
      contentLength: chunk.contentLength ?? 0,
      tokenCount: chunk.tokenCount ?? 0,
      enabled: chunk.enabled ?? true,
      updatedAt: chunk.updatedAt ?? null,
    },
  }
},
transformResponse swallows API errors silently
success: true is always returned regardless of HTTP status. If the chunk doesn't exist (404) or the request is malformed (422), the tool reports success and returns empty/default values for all fields. The caller has no way to detect the failure.
Suggested change:

transformResponse: async (response, params): Promise<KnowledgeUpdateChunkResponse> => {
  if (!response.ok) {
    const error = await response.text()
    throw new Error(`Failed to update chunk: ${response.status} ${error}`)
  }
  const result = await response.json()
  const chunk = result.data || {}
  return {
    success: true,
    output: {
      documentId: params?.documentId ?? '',
      id: chunk.id ?? '',
      chunkIndex: chunk.chunkIndex ?? 0,
      content: chunk.content ?? '',
      contentLength: chunk.contentLength ?? 0,
      tokenCount: chunk.tokenCount ?? 0,
      enabled: chunk.enabled ?? true,
      updatedAt: chunk.updatedAt ?? null,
    },
  }
},
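The same missing status check appears in all three tools, so a shared helper could centralize it. A minimal sketch, assuming a hypothetical assertOk helper that is not part of this PR:

```typescript
// Hypothetical shared helper: throw a descriptive error for non-2xx responses
// before any transformResponse handler tries to parse the body.
async function assertOk(response: Response, action: string): Promise<void> {
  if (!response.ok) {
    const error = await response.text()
    throw new Error(`Failed to ${action}: ${response.status} ${error}`)
  }
}

// Usage inside a transformResponse handler:
// await assertOk(response, 'delete document')
// const result = await response.json()
```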
if (params.limit) queryParams.set('limit', String(params.limit))
if (params.offset) queryParams.set('offset', String(params.offset))
offset: 0 is treated as falsy and silently dropped
if (params.limit) and if (params.offset) use truthiness checks, so a value of 0 for offset is never appended to the query string. While the server likely defaults to 0 anyway, this makes it impossible to explicitly request the first page when the LLM passes offset: 0, and creates a behavioral inconsistency.
Suggested change:

if (params.limit != null) queryParams.set('limit', String(params.limit))
if (params.offset != null) queryParams.set('offset', String(params.offset))
if (folderId?.trim()) {
  parts.push(`'${folderId.trim()}' in parents`)
}
folderId injected into Drive query without escaping single quotes
The Drive API q parameter uses single-quoted string literals. A folderId value containing a single quote (e.g. abc' OR '1'='1) would break out of the literal and potentially corrupt the query. Drive folder IDs are typically alphanumeric so exploitation is unlikely in practice, but since this is user-controlled input, defensive escaping is recommended:
Suggested change:

if (folderId?.trim()) {
  const safeFolderId = folderId.trim().replace(/\\/g, '\\\\').replace(/'/g, "\\'")
  parts.push(`'${safeFolderId}' in parents`)
}
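For illustration, the same escaping can be pulled into a small helper and exercised with the malicious input from the description. The helper name here is hypothetical, not from the PR:

```typescript
// Hypothetical helper: escape backslashes and single quotes for use inside
// a single-quoted Google Drive query literal.
function escapeDriveQueryLiteral(value: string): string {
  return value.replace(/\\/g, '\\\\').replace(/'/g, "\\'")
}

// "abc' OR '1'='1" becomes "abc\' OR \'1\'=\'1", so the value can no longer
// break out of its quoted literal in the Drive q parameter.
const folderId = "abc' OR '1'='1"
const clause = `'${escapeDriveQueryLiteral(folderId.trim())}' in parents`
console.log(clause) // 'abc\' OR \'1\'=\'1' in parents
```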
Summary
- userExcluded for connector docs via SQL CASE
- New knowledge tools: list_documents, list_chunks, delete_document, delete_chunk, update_chunk, list_tags
- Airtable integration: list_bases and get_base_schema tools
- Query hooks under hooks/queries/kb/ and hooks/queries/oauth/
- Schema: userExcluded column, document fields

Type of Change
Testing
Tested manually
Checklist