
fix(kb): parse HTML with unstructured and fail fast on unsupported types #174

Open
sqhyz55 wants to merge 1 commit into xorbitsai:main from sqhyz55:fix/kb-html-parsing-errors

Conversation

sqhyz55 (Collaborator) commented Mar 17, 2026

Add HTML/HTM support to the unstructured parser path and tighten default parser routing to avoid selecting PDF-only parsers for incompatible extensions. Return a clear 422 error when ingestion receives an allowed upload type that has no available parser.
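
The routing and fail-fast behavior described above could look roughly like the sketch below. It is illustrative only and not taken from the diff: the registry contents, the upload allow-list, `UnsupportedFileTypeError`, and `select_parser` are all hypothetical names, and the real code may structure this differently.

```python
from pathlib import Path

# Hypothetical registry: parser name -> extensions that parser can handle.
PARSER_EXTENSIONS = {
    "pypdf": {".pdf"},
    "unstructured": {".pdf", ".docx", ".html", ".htm"},
}

# Hypothetical upload allow-list; it may be wider than what the parsers cover.
ALLOWED_UPLOAD_EXTENSIONS = {".pdf", ".docx", ".html", ".htm", ".epub"}


class UnsupportedFileTypeError(Exception):
    """Raised when an allowed upload type has no available parser."""

    status_code = 422  # an API layer would map this to the HTTP response

    def __init__(self, extension: str) -> None:
        super().__init__(f"No parser available for '{extension}' files")
        self.extension = extension


def select_parser(filename: str, preferred=("pypdf", "unstructured")) -> str:
    """Return the first preferred parser that supports the file's extension."""
    extension = Path(filename).suffix.lower()
    for name in preferred:
        # Skip parsers that do not declare support for this extension,
        # so a PDF-only parser is never chosen for an HTML file.
        if extension in PARSER_EXTENSIONS.get(name, set()):
            return name
    # Fail fast instead of falling back to an incompatible default parser.
    raise UnsupportedFileTypeError(extension)
```

With this shape, `select_parser("page.html")` skips the PDF-only entry and picks the unstructured path, while an allowed but unparseable type such as `.epub` raises the 422-style error before any parsing starts.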

Fixes #161

XprobeBot added the bug label Mar 17, 2026
sqhyz55 requested a review from rogercloud Mar 17, 2026 13:05

Labels

bug Something isn't working

Development

Successfully merging this pull request may close these issues.

bug(kb): HTML file parsing fails - PyPdfParser only supports PDF.

2 participants