microsoft/markitdown

Python tool for converting files and office documents to Markdown.

55.1k Stars
2.8k Forks
185 Issues
10.0k Contributors
193 Watchers
Topics: langchain, openai, autogen-extension, autogen, markdown, microsoft-office, pdf
Language: Python
License: MIT License (SPDX: MIT)

Project Description

MarkItDown is a lightweight Python utility that converts a variety of file formats to Markdown for use with Large Language Models (LLMs) and text analysis pipelines. Supported formats include PDF, PowerPoint, Word, Excel, images, audio, and HTML, among others. The conversion preserves document structure such as headings, lists, and tables. The output is intended for machine consumption rather than as a high-fidelity rendering for human readers. MarkItDown provides a command-line interface, a Python API, and Docker support, and the dependencies for specific file types are optional. It also integrates with Azure Document Intelligence and supports third-party plugins. It is installed with pip, and contributions are welcome as issues, pull requests, and plugins.
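
The Python API mentioned above can be sketched roughly as follows. This is a minimal example, assuming the package has been installed with `pip install 'markitdown[all]'` and that the installed version exposes the `MarkItDown` class with a `convert()` method whose result carries a `text_content` attribute. The helper name `convert_to_markdown` is hypothetical and not part of the library.

```python
def convert_to_markdown(path):
    """Return the Markdown text MarkItDown produces for the file at `path`.

    `convert_to_markdown` is a hypothetical helper written for this sketch;
    only `MarkItDown`, `convert()`, and `text_content` come from the library.
    """
    # Imported inside the function so this module still loads on machines
    # where the markitdown package has not been installed yet.
    from markitdown import MarkItDown

    converter = MarkItDown()
    result = converter.convert(str(path))
    # The conversion result exposes the generated Markdown as plain text.
    return result.text_content
```

Calling `convert_to_markdown("report.docx")` (the file name is a placeholder) would return the document as a Markdown string. Which formats work depends on the optional dependencies that are installed.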
