markitdown
Uncategorized · microsoft
Python tool for converting files and office documents to Markdown.
Project Description
MarkItDown is a lightweight Python utility that converts a variety of file formats to Markdown for use with large language models (LLMs) and text-analysis pipelines. Supported formats include PDF, PowerPoint, Word, Excel, images, audio, HTML, and more, and conversion preserves document structure such as headings, lists, and tables. The output is intended for machine consumption rather than high-fidelity, human-oriented document conversion.

MarkItDown provides a command-line interface, a Python API, and Docker support, with optional dependencies that enable support for specific file types. It also integrates with Azure Document Intelligence and supports third-party plugins that extend its functionality. It can be installed with pip, and contributions are welcome through issues, pull requests, and plugin development.