DeepSeek OCR Transforms Documents into Structured Data with AI
Experience the power of DeepSeek OCR - an open-source AI model that converts complex documents, PDFs, and images into clean Markdown. Try the official demo below, and stay tuned for our upcoming API service with enhanced features.
DeepSeek OCR is powered by cutting-edge AI technology
DeepSeek OCR Performance Metrics
Built for Speed and Accuracy
DeepSeek OCR is engineered for production-grade document processing with adaptive resolution modes, high-performance vLLM inference, and efficient token usage. Whether you're processing simple receipts or complex academic papers, DeepSeek OCR scales to meet your needs.
Resolution Modes
From Tiny (512px) to Gundam (dynamic)
Tokens per Second
High-performance inference on A100-40G
Max Vision Tokens
Support for high-resolution documents
DeepSeek OCR Features
Everything You Need
Explore how DeepSeek OCR delivers end-to-end document intelligence, from precise recognition to structure-aware conversion.

Why Choose DeepSeek OCR?
Powerful Benefits
DeepSeek OCR delivers unique advantages that set it apart from traditional OCR solutions and commercial alternatives.
Open Source Freedom
DeepSeek OCR is completely open source and free to use. Deploy on your own infrastructure without licensing fees, API limits, or vendor lock-in. The model is available on GitHub and Hugging Face for self-hosting, customization, and commercial use.
Context-Aware Intelligence
Unlike traditional pattern-matching OCR engines, DeepSeek OCR uses vision-language models to understand document context. This enables error correction using surrounding text, semantic understanding of document structure, and intelligent format conversion that preserves meaning, not just characters.
Production-Ready Performance
Built for real-world workloads, DeepSeek OCR supports high-throughput batch processing, streaming outputs, and efficient memory usage. Process hundreds of pages per minute on modern GPUs with vLLM optimization, or use smaller resolution modes for cost-effective cloud deployment.
Flexible Resolution Modes
Choose the optimal balance between accuracy and efficiency for your specific documents. From lightweight Tiny mode (64 tokens) for simple text to powerful Gundam mode with adaptive multi-crop for complex academic papers, DeepSeek OCR adapts to your needs without sacrificing quality.
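As a rough illustration of the accuracy versus efficiency trade-off, the sketch below estimates the vision token budget for a square input. It assumes tokens scale as (side / 64) squared, a relationship chosen only because it reproduces the Tiny figure quoted on this page (512px, 64 tokens). The function name and the other side lengths are hypothetical and are not taken from the DeepSeek OCR code.

```python
def estimate_vision_tokens(side_px: int) -> int:
    """Estimate vision tokens for a square side_px x side_px input.

    Assumption: one token per 64x64 pixel block, which matches the
    Tiny mode figure (512px -> 64 tokens). Not the project's own code.
    """
    if side_px <= 0 or side_px % 64 != 0:
        raise ValueError("side_px must be a positive multiple of 64")
    blocks_per_side = side_px // 64
    return blocks_per_side * blocks_per_side


if __name__ == "__main__":
    # 512 is the Tiny mode size from this page; the other sizes are illustrative.
    for side in (512, 640, 1024, 1280):
        print(side, estimate_vision_tokens(side))
```

Under this assumption, doubling the side length quadruples the token count, which is why a smaller mode can be noticeably cheaper for simple text.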
Comprehensive Document Support
Handle diverse document types including PDFs, scanned images, photographs, screenshots, and handwritten notes. DeepSeek OCR processes multilingual content, mathematical formulas, tables, charts, and complex layouts with consistent accuracy across different input formats.
Easy Integration
Whether you prefer a Python API, command-line tools, or REST API integration, DeepSeek OCR provides multiple deployment options. Use the Transformers library for simple scripts, vLLM for production workloads, or our upcoming API service for cloud-based processing without infrastructure management.
DeepSeek OCR Use Cases
Built for Every Scenario
From academic research to business automation, DeepSeek OCR handles diverse document processing challenges with consistent accuracy and efficiency.
Extract complete text, mathematical formulas, citations, and figure captions from academic papers and research documents. DeepSeek OCR recognizes LaTeX math notation, chemical formulas, and complex equations, making it ideal for literature review, knowledge management, and digital library creation. Process thesis documents, journal articles, and conference papers while maintaining academic formatting and structure.
Digitize invoices, contracts, reports, and business correspondence with structure-aware OCR that understands tables, headers, and hierarchical layouts. DeepSeek OCR automates data entry, enables searchable document archives, and accelerates business process automation. Perfect for accounts payable processing, contract management, and compliance documentation.
Convert old scanned documents, handwritten notes, and low-quality images into clean, editable text. DeepSeek OCR's vision-language model handles image noise, skewed scans, and varying quality levels to produce searchable text datasets. Ideal for archival digitization, historical document preservation, and legacy data migration projects.
Extract data from charts, bar graphs, line plots, diagrams, and infographics for analysis and reporting. DeepSeek OCR understands visual data representation beyond text, capturing labels, legends, axis values, and trend information. Transform visual business intelligence into structured data for further processing and analytics workflows.
DeepSeek OCR Architecture
Powered by State-of-the-Art AI
DeepSeek OCR combines state-of-the-art vision processing with powerful language models to deliver accurate, efficient document understanding. The technology stack is optimized for production use, balancing accuracy, speed, and resource efficiency.

DeepSeek OCR employs sophisticated vision encoders that capture both global document layout and fine-grained text details. This dual-level understanding ensures accurate text extraction even in complex documents with mixed content types, varying fonts, and intricate formatting. The encoder architecture is specifically optimized for document processing rather than general image understanding.
Multi-scale feature extraction
At the core of DeepSeek OCR is a powerful language model that brings contextual understanding to OCR. Unlike traditional pattern-matching OCR, the LLM can correct errors using context, understand document semantics, and generate structured output formats like Markdown. This enables intelligent features like grounding, reference extraction, and format-aware text generation.
Supports grounding, reference, and multi-modal reasoning
DeepSeek OCR uses vLLM, an open-source inference and serving engine for large language models, to deliver production-grade performance. With continuous batching, efficient memory management, and GPU optimization, vLLM enables streaming outputs and high-throughput batch processing. On high-performance hardware like A100 GPUs, it can process thousands of pages per hour with consistent sub-second latency.
~2500 tokens/s throughput on A100-40G
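To relate a token throughput figure to a page rate, a back-of-the-envelope conversion helps. The sketch below is only an estimate: the 2500 tokens/s value is the figure quoted above, while the 1000 tokens per page average and the function name are assumptions made for illustration. Real page rates depend on document density and on which tokens the throughput figure counts.

```python
def pages_per_hour(tokens_per_second: float, tokens_per_page: float) -> float:
    """Convert token throughput into an hourly page estimate."""
    if tokens_per_second <= 0 or tokens_per_page <= 0:
        raise ValueError("both arguments must be positive")
    return tokens_per_second * 3600 / tokens_per_page


if __name__ == "__main__":
    # 2500 tokens/s is the quoted figure; 1000 tokens per page is an
    # assumed average, not a measured value.
    print(pages_per_hour(2500, 1000))  # 9000.0
```

Denser pages lower the estimate proportionally: at an assumed 2000 tokens per page, the same throughput gives 4500 pages per hour.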
Gundam mode is DeepSeek OCR's adaptive resolution system. Instead of using a fixed resolution for all documents, Gundam mode dynamically adjusts vision token allocation to the document. Its multi-crop strategy balances accuracy on dense content (formulas, tables) with efficiency on simpler sections, so performance holds up across varied document types.
Gundam mode with multi-crop strategy
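The general idea of a multi-crop strategy can be sketched as covering a page with fixed-size local crops and adding one global view. The code below is a simplified illustration, not DeepSeek OCR's actual cropping algorithm: the 640px tile size, the per-tile and global token counts, and both function names are hypothetical.

```python
import math
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def tile_boxes(width: int, height: int, tile: int = 640) -> List[Box]:
    """Cover a width x height page with tile x tile crops, row by row.

    Crops on the right and bottom edges are clipped to the page bounds.
    The default tile size is an assumption for illustration.
    """
    if width <= 0 or height <= 0 or tile <= 0:
        raise ValueError("width, height and tile must be positive")
    cols = math.ceil(width / tile)
    rows = math.ceil(height / tile)
    boxes: List[Box] = []
    for r in range(rows):
        for c in range(cols):
            left, top = c * tile, r * tile
            boxes.append((left, top, min(left + tile, width), min(top + tile, height)))
    return boxes


def crop_token_budget(n_tiles: int, tokens_per_tile: int, global_tokens: int) -> int:
    """Total vision tokens: every local crop plus one global view."""
    return n_tiles * tokens_per_tile + global_tokens


if __name__ == "__main__":
    boxes = tile_boxes(1280, 1920)
    # Token counts below are hypothetical placeholders.
    print(len(boxes), crop_token_budget(len(boxes), 100, 256))
```

A larger page produces more local crops and therefore a larger token budget, while a small page needs only a few crops; this is the sense in which a multi-crop scheme adapts its cost to the input.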
Frequently Asked Questions
Got Questions?
Find answers to the most common DeepSeek OCR questions, from supported formats to deployment options.