Mistral AI has introduced OCR 4, a new model built to read documents such as PDFs, Word files, and PowerPoint presentations. The release matters because the model is not limited to extracting text: it also tries to understand how a document is organized on the page.
That shift turns OCR 4 into more than a simple text-recognition tool. According to the source, it can identify document elements, assign roles to them, and provide confidence scores that help users judge how reliable the output may be.
What OCR 4 is designed to read
OCR 4 is aimed at common document formats that often contain more than continuous paragraphs. The source specifically names PDFs, Word files, and PowerPoint presentations, which can include mixed layouts, tables, titles, equations, and other structured elements.
Traditional OCR output can be useful when the only goal is to recover readable text. But many documents are not just streams of words. Their meaning often depends on where content appears, how sections are separated, and whether a block is a heading, a table, an equation, or another type of element.
OCR 4 addresses that by reading both the text and the surrounding page structure. This gives downstream systems more context than a plain transcript would provide.
Why block classification changes the output
The major difference described in the source is OCR 4's ability to classify blocks on the page. Instead of only returning raw text, the model identifies where elements sit and what role they play.
The source lists examples including a title, a table, an equation, and a signature. That classification can help divide a document into meaningful parts automatically, which is important when the document needs to be searched or processed by an AI system.
For search systems, structured output can make it easier to separate headings from body text or tables from narrative sections. For AI agents, the same structure can help preserve the logic of the original document rather than flattening everything into one undifferentiated text layer.
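As a rough illustration of how a search pipeline might use classified blocks, the sketch below groups blocks by type so that headings and tables can be indexed separately from body text. The record shape (`type`, `text`) and the label names are assumptions made for this example; the source does not describe OCR 4's actual output format.

```python
from collections import defaultdict

# Hypothetical block records; field names and labels are illustrative only,
# not OCR 4's documented output schema.
blocks = [
    {"type": "title", "text": "Quarterly Report"},
    {"type": "paragraph", "text": "Revenue grew in the third quarter."},
    {"type": "table", "text": "Region | Revenue"},
    {"type": "paragraph", "text": "Costs remained flat."},
    {"type": "signature", "text": "J. Doe"},
]


def group_by_type(blocks):
    """Group block texts by their assigned type, preserving document order."""
    grouped = defaultdict(list)
    for block in blocks:
        grouped[block["type"]].append(block["text"])
    return dict(grouped)


grouped = group_by_type(blocks)
print(grouped["title"])
print(len(grouped["paragraph"]))
```

Grouping like this would let a search system weight titles differently from narrative text, or send tables to a separate parser, without guessing at structure from a flat transcript.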
The model also provides confidence scores. These scores estimate how certain OCR 4 is about each word or page it reads. That is useful because document extraction can produce uncertain results, and a confidence signal gives users another way to decide where review may be needed.
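One way a confidence signal could drive review is a simple threshold: anything the model is less sure about gets routed to a person. The sketch below assumes word-level scores on a 0-to-1 scale and an arbitrary cutoff of 0.80; neither the scale nor the data format comes from the source.

```python
# Hypothetical word-level results as (word, confidence) pairs; the 0-to-1
# scale and the pair format are assumptions, not OCR 4's documented output.
words = [
    ("Invoice", 0.99),
    ("No.", 0.97),
    ("8O45", 0.61),
    ("Total", 0.98),
    ("$1,2OO", 0.55),
]

REVIEW_THRESHOLD = 0.80  # Arbitrary cutoff chosen for this example.


def flag_for_review(words, threshold=REVIEW_THRESHOLD):
    """Return the words whose confidence falls below the threshold."""
    return [word for word, confidence in words if confidence < threshold]


needs_review = flag_for_review(words)
print(needs_review)
```

In practice the cutoff would need tuning against real documents, since a threshold that is too strict floods reviewers and one that is too loose lets errors through.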
Language coverage and reported testing
Mistral AI says OCR 4 supports 170 languages and, according to the source, works well with less common ones.
The performance claim in the article comes from a blind test with over 600 documents. In that test, independent reviewers preferred OCR 4's results 72 percent of the time over competing models, the company says.
Those details show how Mistral AI is positioning the model. The claim is not only that OCR 4 can read many languages, but also that reviewers preferred its results in a comparison the source describes as blind.
At the same time, the source frames the 72 percent figure as a company claim. That distinction matters: the article reports what Mistral AI says about the model's performance, rather than presenting an independently reproduced benchmark inside the article itself.
Where OCR 4 is available and what it costs
OCR 4 is available through the API, Mistral Studio, and Microsoft Foundry. That gives users several routes to access the model depending on whether they are building software, using Mistral AI's own environment, or working through Microsoft Foundry.
The listed price is $4 per 1,000 pages, or $2 in batch mode.
For organizations processing large document collections, the gap between standard and batch pricing may matter. The source does not provide details about batch mode beyond the price, so which option fits will depend on what each access channel supports.
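To make the pricing difference concrete, the sketch below estimates the cost of a large job under both rates. It assumes the $2 batch price uses the same per-1,000-pages unit as the standard price and that billing is prorated per page; the source confirms neither detail.

```python
from decimal import Decimal

# Prices as reported in the article, in US dollars per 1,000 pages.
# Assumption: the $2 batch price uses the same per-1,000-pages unit.
STANDARD_PRICE_PER_1000 = Decimal("4")
BATCH_PRICE_PER_1000 = Decimal("2")


def estimate_cost(pages, batch=False):
    """Estimate the cost in dollars, assuming per-page prorated billing."""
    rate = BATCH_PRICE_PER_1000 if batch else STANDARD_PRICE_PER_1000
    return rate * Decimal(pages) / Decimal(1000)


standard = estimate_cost(250_000)
batched = estimate_cost(250_000, batch=True)
print(f"standard: ${standard}, batch: ${batched}")
```

Under these assumptions, a 250,000-page archive would cost $1,000 at the standard rate and $500 in batch mode.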
What this means for document AI workflows
OCR 4's main promise is that document understanding can start earlier in the pipeline. If a model can return text, layout position, element type, and confidence scores together, then later systems do not have to infer all of that structure from plain text alone.
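A minimal sketch of what such a combined record could look like follows. Every field name here is hypothetical, and sorting by page and vertical position is a simplification that ignores multi-column layouts; the source does not specify how OCR 4 represents position or reading order.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """One page element. All fields are hypothetical, for illustration only."""
    page: int          # 1-based page number
    top: float         # vertical position on the page (0.0 = top edge)
    kind: str          # element type, e.g. "title", "table", "equation"
    text: str          # recognized text
    confidence: float  # assumed 0-to-1 score


def to_outline(blocks):
    """Render blocks in reading order as labeled lines for a downstream system."""
    ordered = sorted(blocks, key=lambda b: (b.page, b.top))
    return [f"[p{b.page} {b.kind} {b.confidence:.2f}] {b.text}" for b in ordered]


blocks = [
    Block(page=1, top=0.40, kind="table", text="Region | Revenue", confidence=0.93),
    Block(page=2, top=0.85, kind="signature", text="J. Doe", confidence=0.71),
    Block(page=1, top=0.05, kind="title", text="Quarterly Report", confidence=0.99),
]

outline = to_outline(blocks)
for line in outline:
    print(line)
```

Carrying type, position, and confidence on every record means a later stage can filter, reorder, or flag content directly instead of reconstructing that information from plain text.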
That can be especially relevant when documents contain multiple kinds of information in one file. A presentation slide, a formatted Word document, or a PDF with tables and signatures may lose important meaning if converted into a simple sequence of words.
By classifying blocks, OCR 4 creates output that is closer to the original structure of the document. By adding confidence scores, it also gives users a signal about where the model believes its reading is stronger or weaker.
The result is a document model positioned for search systems and AI agents, not just archive conversion. Based on the source, Mistral AI is presenting OCR 4 as a step toward making documents easier for software to interpret, organize, and act on.