LLMs for OCR and the move towards using LLMs for contextual retrieval

February 10, 2025

This article compares the cost and accuracy of different LLMs for OCR and document extraction (https://www.sergey.fyi/articles/gemini-flash-2). Gemini 2.0 Flash is a noticeable step change in performance: it converts PDF to Markdown at 6,000 pages per dollar with near-perfect accuracy. As prices keep being driven down, this strengthens the argument that semantic search may not be necessary at all if we can just use an LLM for contextual retrieval. Some papers and code have been calling this reasoning-augmented generation or hypothetical document embeddings: https://github.com/superagent-ai/reag + https://github.com/texttron/hyde
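To make the idea concrete, here is a minimal sketch of LLM-based contextual retrieval in the reasoning-augmented style: instead of ranking chunks by embedding similarity, every document is handed to a model that decides whether it is relevant and pulls out the useful passage. This is not the code from either repository above. The `Judge` callable, `contextual_retrieve`, and `keyword_judge` are hypothetical names made up for illustration, and `keyword_judge` is only a toy keyword matcher standing in for a real LLM call. The `ocr_cost_usd` helper simply applies the 6,000 pages per dollar figure quoted above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical interface: given a query and a full document, return the
# passage relevant to the query, or None if the document is irrelevant.
# In practice this would wrap an LLM call; no real API is assumed here.
Judge = Callable[[str, str], Optional[str]]


@dataclass
class Hit:
    doc_id: str
    passage: str


def contextual_retrieve(query: str, docs: Dict[str, str], judge: Judge) -> List[Hit]:
    """Ask the judge about every document and keep the ones it deems relevant."""
    hits: List[Hit] = []
    for doc_id, text in docs.items():
        passage = judge(query, text)
        if passage is not None:
            hits.append(Hit(doc_id, passage))
    return hits


def keyword_judge(query: str, text: str) -> Optional[str]:
    """Toy stand-in for an LLM: first sentence sharing a word with the query."""
    terms = {w.lower() for w in query.split()}
    for sentence in text.split(". "):
        words = {w.strip(".,").lower() for w in sentence.split()}
        if terms & words:
            return sentence.strip()
    return None


def ocr_cost_usd(pages: int, pages_per_dollar: float = 6000.0) -> float:
    """Estimated PDF-to-Markdown cost at the quoted pages-per-dollar rate."""
    return pages / pages_per_dollar


if __name__ == "__main__":
    docs = {
        "a": "Gemini converts PDF to Markdown. It is cheap.",
        "b": "Cats sleep all day.",
    }
    print(contextual_retrieve("markdown conversion", docs, keyword_judge))
    print(ocr_cost_usd(12000))
```

The trade-off is that this makes one model call per document per query, which is why it only becomes attractive as per-token prices fall. Swapping `keyword_judge` for a function that prompts an actual model is the only change needed to try it for real.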