Document Parsing API & SDKs

GroupDocs.Parser is a document parsing and data extraction API. It extracts text, metadata, barcodes, structured fields, images, tables, and document entities from PDFs, Office files, emails, eBooks, and archives. It is built for search indexing, compliance, data capture, and content ingestion workflows.

📰 Latest Parser News & Updates

See the latest release notes on NuGet and Maven Central for parser engine improvements, faster template-based extraction, and better table detection.
Updated sample apps show invoice data extraction, email parsing, and PDF text extraction scenarios.
New how-tos in the documentation cover templated parsing and container file processing.

📂 Supported Platforms & Repository Groups

🌐 .NET Document Parsing (C#, ASP.NET, WinForms)

High-performance APIs for document parsing on .NET Framework and .NET Core.

GroupDocs.Parser-for-.NET: Core C# API for text, metadata, tables, and template-based extraction.
Samples & Demos: Explore runnable examples in the repository to parse PDFs, DOCX, XLSX, PPTX, MSG/EML, EPUB, ZIP, and more.

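// Quick .NET Parsing Example
using System;

using (var parser = new GroupDocs.Parser.Parser("invoice.pdf"))
{
    // Extract plain text from the document.
    // GetText() returns null when the format does not support text extraction.
    using (var reader = parser.GetText())
    {
        Console.WriteLine(reader == null ? "Text extraction isn't supported" : reader.ReadToEnd());
    }
}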

☕ Java Document Parsing (Maven, Spring)

Native Java library for text, metadata, and structured data extraction.

GroupDocs.Parser-for-Java: Java API for PDF/Office/email parsing, table detection, and template-driven extraction.

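// Quick Java Parsing Example
try (com.groupdocs.parser.Parser parser = new com.groupdocs.parser.Parser("contract.docx")) {
    // Closing the reader with try-with-resources; a null reader is skipped safely
    try (java.io.Reader reader = parser.getText()) {
        // A null reader means text extraction is not supported for this format
        if (reader != null) {
            char[] buffer = new char[2048];
            int read;
            // Stream the extracted text to stdout in 2048-character chunks
            while ((read = reader.read(buffer)) != -1) {
                System.out.print(new String(buffer, 0, read));
            }
        }
    }
}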

🐍 Python Document Parsing (Python via .NET)

Cross-platform Python bindings for text, metadata, and structured data extraction.

GroupDocs.Parser-for-Python-via-.NET: Python API for PDF/Office/email parsing, table detection, template-based field extraction, and attachments.

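# Quick Python Parsing Example
from groupdocs.parser import Parser

with Parser("sample.pdf") as parser:
    # Assumes the snake_case method naming used by the Python via .NET bindings
    text = parser.get_text()
    # Guard against formats where text extraction is not supported
    print(text if text is not None else "Text extraction isn't supported")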

🧠 Business Use-Cases

Invoice & receipt data extraction: pull totals, dates, vendors, and line items via templates.
Email & attachment parsing: extract headers, bodies, attachments, and metadata from MSG/EML.
Contract analysis: capture clauses, signatures, and key fields from DOCX/PDF.
PDF table extraction: pull line items and financial tables from PDFs (see table extraction sample).
Content migration: normalize mixed file types into structured outputs.
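
To make the invoice use-case above more concrete, here is a minimal sketch of turning already-extracted plain text into structured fields. It uses only the Python standard library and is not the GroupDocs.Parser template API. The field names, the regular expressions in FIELD_PATTERNS, and the extract_invoice_fields helper are all hypothetical and assume a simple "Label: value" layout with ISO dates.

```python
import re
from datetime import date
from decimal import Decimal

# Hypothetical patterns for illustration only; real invoices vary widely.
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"\bInvoice\s*(?:No\.?|#)\s*[:\-]?\s*(\S+)", re.IGNORECASE),
    "date": re.compile(r"\bDate\s*:\s*(\d{4})-(\d{2})-(\d{2})", re.IGNORECASE),
    # \b keeps "Subtotal:" from matching the "Total:" pattern
    "total": re.compile(r"\bTotal\s*:\s*\$?([\d,]+\.\d{2})", re.IGNORECASE),
}


def extract_invoice_fields(text):
    """Return a dict with whichever of the known fields appear in the text."""
    fields = {}

    match = FIELD_PATTERNS["invoice_number"].search(text)
    if match:
        fields["invoice_number"] = match.group(1)

    match = FIELD_PATTERNS["date"].search(text)
    if match:
        year, month, day = (int(part) for part in match.groups())
        fields["date"] = date(year, month, day)

    match = FIELD_PATTERNS["total"].search(text)
    if match:
        # Strip thousands separators and keep exact decimal precision for money
        fields["total"] = Decimal(match.group(1).replace(",", ""))

    return fields


if __name__ == "__main__":
    sample_text = (
        "ACME Corp\n"
        "Invoice No: INV-1042\n"
        "Date: 2024-03-15\n"
        "Subtotal: $1,100.00\n"
        "Total: $1,234.50\n"
    )
    print(extract_invoice_fields(sample_text))
```

In practice the input string would come from a text extraction call such as the quick examples above. Regex post-processing like this is brittle compared with position-aware templates, so treat it as a starting point only.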

✅ API Key Features & Benefits

High-fidelity text extraction for PDF, DOC/DOCX, XLS/XLSX, PPT/PPTX, HTML, RTF, TXT, EPUB.
Template-based extraction to capture labeled fields, tables, and repeating blocks reliably.
Table recognition with cell-by-cell extraction for spreadsheets and tabular PDFs.
Metadata parsing (built-in and custom) for compliance and governance.
Container support for ZIP, OST/PST, MSG/EML, and attachments within archived files.
Image & embedded object extraction for logos, signatures, and inline graphics.
Page-level & area-limited parsing to target specific regions for faster processing.
Performance & scaling tuned for server-side, multi-document workloads.
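
As a rough illustration of the container idea in the list above, the sketch below walks a ZIP archive and classifies each entry by file extension so it could be routed to a suitable parser. It relies only on the Python standard library (zipfile) and does not use the GroupDocs.Parser container API. The HANDLERS table, list_container_entries, and build_demo_archive are hypothetical names introduced for this example.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Hypothetical routing table: file extension -> kind of parser to use
HANDLERS = {
    ".pdf": "document",
    ".docx": "document",
    ".eml": "email",
    ".msg": "email",
    ".zip": "container",
}


def list_container_entries(zip_bytes):
    """Return (name, kind, size_in_bytes) for each file entry in a ZIP archive."""
    entries = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no content to parse
            suffix = PurePosixPath(info.filename).suffix.lower()
            kind = HANDLERS.get(suffix, "unknown")
            entries.append((info.filename, kind, info.file_size))
    return entries


def build_demo_archive():
    """Build a small in-memory ZIP so the sketch runs without any files on disk."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("invoices/jan.pdf", b"%PDF-1.4 demo")
        archive.writestr("mail/hello.eml", b"Subject: hi\n\nbody")
        archive.writestr("notes.bin", b"\x00\x01")
    return buffer.getvalue()


if __name__ == "__main__":
    for name, kind, size in list_container_entries(build_demo_archive()):
        print(f"{name}: {kind} ({size} bytes)")
```

Entries classified as "container" could be passed back through the same function to handle nested archives. Unknown extensions are reported rather than dropped, so nothing is skipped silently.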

🆘 Technical Support & Resources

Documentation: Comprehensive Guides and Tutorials.
Support: Expert help at the GroupDocs Free Support Forum.
Evaluation: Get a Temporary License for full feature testing.
Live Demo: Try parsing online at GroupDocs.Parser apps.

🏷️ Tags

groupdocs-parser document-parser pdf-parser text-extraction data-extraction metadata-parser email-parser invoice-parsing table-extraction template-based-parsing content-ingestion document-ai search-indexing enterprise-parsing
