andyfe76/Page-Layout-LLM-context

Page layout extraction for LLM context

Converts PDF, Excel, and HTML files to plain text while preserving the page layout.

When using RAG with LLMs, you normally do not have access to the layout position of the text extracted from pages. With this approach the extracted text keeps its layout, so the LLM can be instructed to look for specific information using positional instructions, e.g. "extract the purchase order number from the top right, after the text 'Order #:'".

res = convert_pdf("sample_table.pdf")

res = convert_xls("sample.xls")
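To illustrate what "preserving layout" can mean in practice, here is a minimal sketch of one possible approach: snapping positioned words onto a character grid so that horizontal and vertical gaps survive as whitespace. This is not the repository's implementation. The `render_layout` function, the `(text, x, y)` word format, the grid sizes, and the sample values are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: (text, x, y), with x and y in page points
# and the origin at the top-left corner of the page.
Word = Tuple[str, float, float]


def render_layout(words: List[Word],
                  char_width: float = 6.0,
                  line_height: float = 12.0) -> str:
    """Place each word on a character grid so relative positions become whitespace."""
    rows: Dict[int, List[Tuple[int, str]]] = {}
    for text, x, y in words:
        row = int(round(y / line_height))  # vertical position -> line index
        col = int(round(x / char_width))   # horizontal position -> column index
        rows.setdefault(row, []).append((col, text))
    if not rows:
        return ""

    lines = []
    for row in range(max(rows) + 1):
        line = ""
        for col, text in sorted(rows.get(row, [])):
            if line and len(line) >= col:
                # The previous word already reaches this column: keep one space.
                line += " "
            else:
                # Pad with spaces up to the target column.
                line = line.ljust(col)
            line += text
        lines.append(line)  # rows without words stay as empty lines
    return "\n".join(lines)


# Made-up sample words, as a PDF parser might report them.
sample = [
    ("ACME Corp", 0, 0),
    ("Order #:", 300, 0),
    ("PO-1234", 360, 0),
    ("Item", 0, 24),
    ("Qty", 120, 24),
]
page = render_layout(sample)
print(page)
```

In the rendered text, "Order #:" lands far to the right of the first line, so a prompt that refers to the "top right" of the page has something concrete to match against.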
