zerox

Uncategorized

getomni-ai

OCR & Document Extraction using vision models

12.1k Stars

829 Forks

ocr, pdf

TypeScript

MIT License

Project Description

Zerox OCR converts documents (PDF, DOCX, images, and other formats) into Markdown for AI ingestion. It first converts each file into a series of images, then passes those images to a vision model (for example OpenAI, Azure OpenAI, AWS Bedrock, or Google Gemini) and aggregates the extracted Markdown into a single output. Key features include support for multiple file types, structured data extraction, per-page processing, and customizable system prompts. Zerox is available as both a Node.js and a Python package, with options for maintaining format, concurrent processing, and error handling. It is well suited to documents with complex layouts, tables, and charts, which makes it a practical choice for document-to-text conversion in AI workflows.
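The per-page flow described above (page images in, Markdown out, pages processed concurrently, failures handled per page) can be sketched roughly as follows. This is a minimal illustration, not Zerox's actual implementation or API: `convertPages`, `PageToMarkdown`, `PageResult`, and the `concurrency` parameter are hypothetical names, and the vision-model call is represented by an injected function so the sketch stays self-contained.

```typescript
// Hypothetical types for illustration only; none of these names come from Zerox.
type PageImage = string; // e.g. a path to an already-rendered page image
type PageToMarkdown = (image: PageImage, pageNumber: number) => Promise<string>;

interface PageResult {
  page: number;
  content: string;
  error?: string;
}

interface ConversionResult {
  markdown: string;
  pages: PageResult[];
}

// Sends each page image to a caller-supplied model function with bounded
// concurrency, then joins the successful pages in their original order.
async function convertPages(
  images: PageImage[],
  pageToMarkdown: PageToMarkdown,
  concurrency = 4,
): Promise<ConversionResult> {
  const pages: PageResult[] = new Array(images.length);
  let next = 0;

  // Each worker repeatedly claims the next unprocessed page index.
  async function worker(): Promise<void> {
    while (next < images.length) {
      const index = next++;
      try {
        const content = await pageToMarkdown(images[index], index + 1);
        pages[index] = { page: index + 1, content };
      } catch (err) {
        // A failed page is recorded rather than aborting the whole document.
        const message = err instanceof Error ? err.message : String(err);
        pages[index] = { page: index + 1, content: "", error: message };
      }
    }
  }

  const workerCount = Math.max(1, Math.min(concurrency, images.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));

  const markdown = pages
    .filter((p) => p.error === undefined)
    .map((p) => p.content)
    .join("\n\n");

  return { markdown, pages };
}
```

In real use, `pageToMarkdown` would wrap a request to whichever vision model is configured. Recording errors per page, instead of rejecting the whole conversion, is one possible reading of the "error handling" option mentioned above.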