Based on Foxit Quick PDF Library, Python interface
Updated Apr 4, 2020 - Python
A simple demonstration of how you can implement retrieval augmented generation (RAG) for a book.
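A solution that automatically generates high-quality question-answering (QA) data from PDF documents using GPU-accelerated processing and efficiently fine-tunes LLMs. It generates domain-specific QA pairs with the Unstructured library and AWS Bedrock Claude, and trains a lightweight model with the LoRA technique.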
Converts scanned and ordinary documents into MP3 speech using Amazon Polly
A Telegram bot that extracts text from PDFs and also extracts images of PDF pages. Made with Python
Document Intelligence Platform — Extract, refine, and query documents with vision LLMs and config-driven RAG.
NLP PDF mining: extracting text from PDFs
A resume parser that extracts key details from PDF files using Groq's LLM
CLI for merging PDF contexts.
Highlights the key matches between a given PDF and the description text
A PDF text extractor, processor and formatter. Supports regex based exclusions and other niceties.
A console app that finds text in a PDF and reports the page number
Tests of OCR and RAG with LLMs
UnchainedText: Break free from PDFs! Easily extract raw text to .txt for preprocessing.
A local, Python-based GUI toolbox for common PDF operations such as merge, split, scan, OCR, and document preprocessing. Fully offline, extensible, and open source.
An AI-powered invoice and receipt analyzer that extracts structured invoice data from images (JPG/PNG) and PDF documents using a Large Language Model (LLM).
This repository implements an end-to-end NLP pipeline for legal documents, including OCR-based text extraction, neural language modeling from scratch (NumPy), sentence and document embeddings, extractive and abstractive summarization, grammar refinement, and semantic case similarity retrieval using cosine similarity.
Serverless OCR & PDF Text Extraction microservice for Personal AI Factory v1. Built with TypeScript and Vercel Serverless Functions, using pdf-parse and node-fetch for high-performance parsing of machine-readable PDFs. Supports extracting clean text from textual PDFs and exposes a clean HTTP API returning structured JSON output for downstream n8n.
Multiple File Format (PDF/DOC/DOCX/XLSX/XLS/CSV) Text Extraction Utility Project in Java Programming Language
GPU-accelerated batch PDF text extraction wrapper for marker-pdf on NVIDIA Grace Blackwell.