DeepSeek OCR Transforms Documents into Structured Data with AI

Experience the power of DeepSeek OCR - an open-source AI model that converts complex documents, PDFs, and images into clean Markdown. Try the official demo below, and stay tuned for our upcoming API service with enhanced features.

DeepSeek OCR is powered by cutting-edge AI technology

Hugging Face

PyTorch

vLLM

DeepSeek LLM

SAM ViT-B

CLIP-L

DeepSeek OCR Performance Metrics

Built for Speed and Accuracy

DeepSeek OCR is engineered for production-grade document processing with adaptive resolution modes, high-performance vLLM inference, and efficient token usage. Whether you're processing simple receipts or complex academic papers, DeepSeek OCR scales to meet your needs.

Resolution Modes

From Tiny (512px) to Gundam (dynamic)

2500+

Tokens per Second

High-performance inference on A100-40G

400

Max Vision Tokens

In Large mode (1280×1280), the highest native resolution

DeepSeek OCR Features

Everything You Need

Explore how DeepSeek OCR delivers end-to-end document intelligence, from precise recognition to structure-aware conversion.

Document to Markdown

Transform any document into clean, structured Markdown that preserves headings, tables, lists, and semantic layout. DeepSeek OCR understands document structure, not just text, which makes it well suited to content migration, documentation workflows, and knowledge base creation. The Markdown output is ready for version control, static site generators, or content management systems.

DeepSeek OCR document to Markdown output preview
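Because the output is plain Markdown, downstream tooling can stay simple. A minimal sketch, assuming the converted text uses ATX-style `#` headings: `split_sections` and `Section` below are hypothetical helpers (not part of DeepSeek OCR) that group a converted page under its headings, for example before loading it into a knowledge base.

```python
import re
from dataclasses import dataclass, field
from typing import List

# ATX headings: 1-6 '#' characters, whitespace, the title, optional closing '#'s.
HEADING = re.compile(r"^(#{1,6})\s+(.*?)(?:\s+#+)?\s*$")
FENCE = "`" * 3  # marker that opens/closes a fenced code block


@dataclass
class Section:
    level: int  # 0 for text that precedes the first heading
    title: str
    lines: List[str] = field(default_factory=list)

    @property
    def body(self) -> str:
        return "\n".join(self.lines).strip()


def split_sections(markdown: str) -> List[Section]:
    """Group Markdown lines under the nearest preceding ATX heading."""
    sections = [Section(level=0, title="")]
    in_fence = False
    for line in markdown.splitlines():
        # Toggle fence state so '#' lines inside code blocks are not headings.
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match:
            sections.append(Section(level=len(match.group(1)), title=match.group(2)))
        else:
            sections[-1].lines.append(line)
    # Drop the leading pseudo-section if nothing came before the first heading.
    if not sections[0].body:
        sections.pop(0)
    return sections
```

Fenced code blocks are skipped when detecting headings, so a `#` comment inside a code sample is not mistaken for a section title.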

Why Choose DeepSeek OCR?

Powerful Benefits

DeepSeek OCR delivers unique advantages that set it apart from traditional OCR solutions and commercial alternatives.

Open Source Freedom

DeepSeek OCR is completely open source and free to use. Deploy on your own infrastructure without licensing fees, API limits, or vendor lock-in. The model is available on GitHub and Hugging Face for self-hosting, customization, and commercial use.

Context-Aware Intelligence

Unlike traditional pattern-matching OCR engines, DeepSeek OCR uses vision-language models to understand document context. This enables error correction using surrounding text, semantic understanding of document structure, and intelligent format conversion that preserves meaning, not just characters.

Production-Ready Performance

Built for real-world workloads, DeepSeek OCR supports high-throughput batch processing, streaming output, and efficient memory usage. With vLLM on an A100-40G, PDF processing reaches roughly 2,500 tokens per second, and smaller resolution modes reduce the vision-token cost for budget-conscious cloud deployments.

Flexible Resolution Modes

Choose the balance between accuracy and efficiency that fits your documents: from the lightweight Tiny mode (64 vision tokens) for simple text to Gundam mode, whose adaptive multi-crop strategy handles complex academic papers. Lower-resolution modes cost fewer tokens; higher-resolution modes recover finer detail.
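The trade-off between modes comes down to vision tokens per page. The table below is a sketch based on the mode sizes published in the DeepSeek-OCR README at the time of writing (Tiny 512×512, Small 640×640, Base 1024×1024, Large 1280×1280, and Gundam as n 640×640 tiles plus one 1024×1024 view). `NATIVE_MODES`, `gundam_tokens`, and `largest_mode_within` are hypothetical helpers, and the Gundam estimate ignores any separator tokens the implementation may add.

```python
from typing import Dict, Optional, Tuple

# Native-resolution modes: name -> (square input side in pixels, vision tokens).
# Values follow the DeepSeek-OCR README at the time of writing; re-check them
# against the current release.
NATIVE_MODES: Dict[str, Tuple[int, int]] = {
    "tiny": (512, 64),
    "small": (640, 100),
    "base": (1024, 256),
    "large": (1280, 400),
}


def gundam_tokens(num_tiles: int) -> int:
    """Approximate Gundam cost: n 640px tiles plus one 1024px global view.

    Ignores any separator/newline tokens the real implementation may insert.
    """
    return num_tiles * NATIVE_MODES["small"][1] + NATIVE_MODES["base"][1]


def largest_mode_within(budget: int) -> Optional[str]:
    """Return the highest-resolution native mode whose token count fits the budget."""
    fitting = [(tokens, name) for name, (_, tokens) in NATIVE_MODES.items() if tokens <= budget]
    return max(fitting)[1] if fitting else None
```

For example, `largest_mode_within(300)` returns `"base"`, and a page split into four Gundam tiles costs about 4 × 100 + 256 = 656 vision tokens, more than any native mode.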

Comprehensive Document Support

Handle diverse document types including PDFs, scanned images, photographs, screenshots, and handwritten notes. DeepSeek OCR processes multilingual content, mathematical formulas, tables, charts, and complex layouts with consistent accuracy across different input formats.
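In practice, mixed inputs need routing before OCR: PDFs have to be rendered to page images, while image files can go straight to the model. The sketch below assumes that split; `route_inputs` and the suffix list are hypothetical, and the rasterization step itself is left to whichever PDF library you use.

```python
from pathlib import Path
from typing import Dict, Iterable, List

# Hypothetical list of image formats passed to the model directly.
# Assumption: PDFs are rasterized to page images before OCR.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp", ".bmp", ".tif", ".tiff"}


def route_inputs(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group input files into 'pdf', 'image', and 'unsupported' buckets by extension."""
    routes: Dict[str, List[str]] = {"pdf": [], "image": [], "unsupported": []}
    for raw in paths:
        suffix = Path(raw).suffix.lower()  # case-insensitive: ".PDF" -> ".pdf"
        if suffix == ".pdf":
            routes["pdf"].append(raw)
        elif suffix in IMAGE_SUFFIXES:
            routes["image"].append(raw)
        else:
            routes["unsupported"].append(raw)
    return routes
```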

Easy Integration

Whether you prefer a Python API, command-line tools, or REST integration through our upcoming API service, DeepSeek OCR offers multiple deployment options. Use the Transformers library for simple scripts, vLLM for production workloads, or the upcoming API service for cloud-based processing without infrastructure management.
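For the Transformers route, the project README loads the model with `trust_remote_code=True` and calls an `infer` method supplied by the model's own code. The sketch below follows that pattern as we understand it: the prompt strings, the `infer` keyword arguments, and the Gundam settings (`base_size=1024`, `image_size=640`, `crop_mode=True`) are taken from the README at the time of writing and should be checked against the current release. `PROMPTS` and `run_ocr` are hypothetical names. The README additionally passes `_attn_implementation='flash_attention_2'` to `from_pretrained`; add it if FlashAttention is installed.

```python
from typing import Dict

# Prompt strings as shown in the DeepSeek-OCR README at the time of writing
# (including the trailing space); verify against the current README.
PROMPTS: Dict[str, str] = {
    "markdown": "<image>\n<|grounding|>Convert the document to markdown. ",
    "free_ocr": "<image>\nFree OCR. ",
}


def run_ocr(image_file: str, output_path: str, task: str = "markdown"):
    """Sketch of single-image inference with Hugging Face Transformers.

    Imports are deferred so this module loads without a GPU stack installed.
    Calling it requires a CUDA GPU and downloads the weights on first use.
    """
    import torch
    from transformers import AutoModel, AutoTokenizer

    model_name = "deepseek-ai/DeepSeek-OCR"
    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
    model = AutoModel.from_pretrained(model_name, trust_remote_code=True, use_safetensors=True)
    model = model.eval().cuda().to(torch.bfloat16)
    # `infer` comes from the model's remote code, not from Transformers itself.
    # These size settings select Gundam mode; save_results writes to output_path.
    return model.infer(
        tokenizer,
        prompt=PROMPTS[task],
        image_file=image_file,
        output_path=output_path,
        base_size=1024,
        image_size=640,
        crop_mode=True,
        save_results=True,
    )
```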

DeepSeek OCR Use Cases

Built for Every Scenario

From academic research to business automation, DeepSeek OCR handles diverse document processing challenges with consistent accuracy and efficiency.

Academic Papers

Extract complete text, mathematical formulas, citations, and figure captions from academic papers and research documents. DeepSeek OCR recognizes LaTeX math notation, chemical formulas, and complex equations, making it ideal for literature review, knowledge management, and digital library creation. Process thesis documents, journal articles, and conference papers while maintaining academic formatting and structure.

Business Documents

Digitize invoices, contracts, reports, and business correspondence with structure-aware OCR that understands tables, headers, and hierarchical layouts. DeepSeek OCR automates data entry, enables searchable document archives, and accelerates business process automation. Perfect for accounts payable processing, contract management, and compliance documentation.

Scanned Images

Convert old scanned documents, handwritten notes, and low-quality images into clean, editable text. DeepSeek OCR's vision-language model handles image noise, skewed scans, and varying quality levels to produce searchable text datasets. Ideal for archival digitization, historical document preservation, and legacy data migration projects.

Charts & Figures

Extract data from charts, bar graphs, line plots, diagrams, and infographics for analysis and reporting. DeepSeek OCR understands visual data representation beyond text, capturing labels, legends, axis values, and trend information. Transform visual business intelligence into structured data for further processing and analytics workflows.

DeepSeek OCR Architecture

Powered by State-of-the-Art AI

DeepSeek OCR combines state-of-the-art vision processing with powerful language models to deliver accurate, efficient document understanding. The technology stack is optimized for production use, balancing accuracy, speed, and resource efficiency.

DeepSeek OCR performance benchmark chart

Vision Encoders

DeepSeek OCR's vision encoder captures both global document layout and fine-grained text detail. It chains a SAM ViT-B encoder, whose windowed attention handles high-resolution local detail, with a CLIP-L encoder, whose global attention models the overall page; a convolutional compressor between the two reduces the number of vision tokens by 16×. This two-level design keeps text extraction accurate in complex documents with mixed content types, varied fonts, and intricate formatting, and the encoder is optimized for document processing rather than general image understanding.

Multi-scale feature extraction
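The token counts behind the resolution modes follow from this encoder design. A sketch of the arithmetic, assuming (as described in the DeepSeek-OCR paper) 16×16 patches in the SAM ViT-B stage and a two-layer convolutional compressor with kernel 3, stride 2, and padding 1 before the CLIP-L stage. `conv_out` and `vision_tokens` are hypothetical helpers that compute shapes only, not the model itself.

```python
def conv_out(size: int, kernel: int = 3, stride: int = 2, padding: int = 1) -> int:
    """Output side length of a 2-D convolution (standard shape formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def vision_tokens(image_side: int, patch: int = 16, conv_layers: int = 2) -> int:
    """Vision tokens for a square image under the assumed encoder pipeline.

    1. The SAM-style stage splits the image into patch x patch tiles.
    2. Each stride-2 convolution halves the token grid in both dimensions,
       so two layers cut the token count 16x before the CLIP-style stage.
    """
    grid = image_side // patch
    for _ in range(conv_layers):
        grid = conv_out(grid)
    return grid * grid
```

The results match the mode token counts: 64 for 512px (Tiny), 100 for 640px, 256 for 1024px, and 400 for 1280px (Large).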

DeepSeek LLM

At the core of DeepSeek OCR is a DeepSeek-3B mixture-of-experts language model (about 570M active parameters) that brings contextual understanding to OCR. Unlike traditional pattern-matching OCR, the language model can correct errors using context, understand document semantics, and generate structured output formats such as Markdown. This enables features like grounding, reference extraction, and format-aware text generation.

Supports grounding, reference, and multi-modal reasoning
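Grounding output interleaves text with location tags. A minimal parser, assuming the model emits `<|ref|>label<|/ref|><|det|>[[x1, y1, x2, y2]]<|/det|>` and that coordinates are normalized to a 0 to 999 grid; both the markup and the grid size are assumptions to verify against real output, which is why `grid` is a parameter. `parse_grounding` is a hypothetical helper.

```python
import ast
import re
from typing import List, Tuple

# Assumed grounding markup: <|ref|>label<|/ref|><|det|>[[x1, y1, x2, y2], ...]<|/det|>
# with coordinates normalized to a 0-999 grid. Both are assumptions.
GROUNDING = re.compile(r"<\|ref\|>(.*?)<\|/ref\|><\|det\|>(.*?)<\|/det\|>", re.DOTALL)

Box = Tuple[int, int, int, int]


def parse_grounding(text: str, width: int, height: int, grid: int = 999) -> List[Tuple[str, Box]]:
    """Extract (label, pixel box) pairs, rescaling grid coordinates to the image size."""
    results: List[Tuple[str, Box]] = []
    for label, boxes_literal in GROUNDING.findall(text):
        # literal_eval only accepts Python literals, so it never executes code.
        for x1, y1, x2, y2 in ast.literal_eval(boxes_literal):
            box = (
                round(x1 / grid * width),
                round(y1 / grid * height),
                round(x2 / grid * width),
                round(y2 / grid * height),
            )
            results.append((label.strip(), box))
    return results
```

Using `ast.literal_eval` rather than `eval` means malformed or unexpected model output raises an error instead of executing anything.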

vLLM High-Performance Inference

DeepSeek OCR uses the vLLM inference engine for production-grade performance. With continuous batching, PagedAttention-based memory management, and GPU optimization, vLLM enables streaming output and high-throughput batch processing. On an A100-40G, PDF processing reaches roughly 2,500 tokens per second.

~2500 tokens/s throughput on A100-40G
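Throughput in tokens per second translates into pages per hour only once you fix how many tokens a page produces. A back-of-envelope sketch, assuming the ~2,500 tokens/s figure counts generated output tokens; `pages_per_hour` is a hypothetical helper, and the per-page token counts used with it are illustrative, not measurements.

```python
def pages_per_hour(tokens_per_second: float, output_tokens_per_page: float) -> float:
    """Back-of-envelope page throughput from aggregate generation throughput."""
    if output_tokens_per_page <= 0:
        raise ValueError("output_tokens_per_page must be positive")
    # 3600 seconds per hour, divided by the tokens each page consumes.
    return tokens_per_second * 3600 / output_tokens_per_page
```

At a hypothetical 1,000 output tokens per page, 2,500 tokens/s gives 2,500 × 3,600 / 1,000 = 9,000 pages per hour; pages that generate twice as many tokens halve that figure.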

Dynamic Resolution

Gundam mode is DeepSeek OCR's dynamic-resolution mode. Instead of resizing every page to one fixed resolution, it combines n local 640×640 crops with one 1024×1024 global view, so the vision-token allocation scales with the size and shape of the page. This multi-crop strategy preserves detail in dense content such as formulas and tables, at the cost of more tokens than the native modes, which makes it a good fit for varied and demanding documents.

Gundam mode with multi-crop strategy
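A simplified sketch of how a multi-crop scheme like Gundam mode can choose its crops, assuming InternVL-style aspect-ratio tiling (an assumption about the approach, not DeepSeek OCR's code). It picks the tile grid, between 2 and 9 tiles as described in the DeepSeek-OCR paper, whose shape best matches the page, breaks ties toward fewer tiles, and estimates tokens as 100 per 640×640 tile plus 256 for the 1024×1024 global view. All function names are hypothetical, and the released code exposes its own tile limits.

```python
from typing import List, Tuple

Grid = Tuple[int, int]  # (columns, rows)


def candidate_grids(min_tiles: int = 2, max_tiles: int = 9) -> List[Grid]:
    """All (columns, rows) grids whose tile count lies in [min_tiles, max_tiles]."""
    return [
        (cols, rows)
        for cols in range(1, max_tiles + 1)
        for rows in range(1, max_tiles + 1)
        if min_tiles <= cols * rows <= max_tiles
    ]


def choose_grid(width: int, height: int, min_tiles: int = 2, max_tiles: int = 9) -> Grid:
    """Pick the grid whose columns/rows ratio is closest to the page's width/height.

    Simplification: ties are broken toward fewer tiles (fewer tokens).
    """
    aspect = width / height
    return min(
        candidate_grids(min_tiles, max_tiles),
        key=lambda g: (abs(g[0] / g[1] - aspect), g[0] * g[1]),
    )


def gundam_token_estimate(width: int, height: int) -> int:
    """n local 640x640 tiles at 100 tokens each plus one 1024x1024 global view at 256."""
    cols, rows = choose_grid(width, height)
    return cols * rows * 100 + 256
```

For an A4 page scanned at 1240×1754, the sketch picks a grid of 2 columns by 3 rows, i.e. about 6 × 100 + 256 = 856 vision tokens.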

Frequently Asked Questions

Got Questions?

Find answers to the most common DeepSeek OCR questions, from supported formats to deployment options.

What is DeepSeek OCR?

What is currently available?

What file formats are supported?

How accurate is the OCR compared to other solutions?

Can it handle handwritten text and non-English languages?

What are the different resolution modes?

Will there be an API available?

What kind of hardware is required?

How does pricing work?

Can I contribute to the open-source version?