Python
from Real Python
1 week ago
Python MarkItDown: Convert Documents Into LLM-Ready Markdown - Real Python
MarkItDown rapidly converts PDFs, Office files, images, HTML, audio, and URLs into LLM-ready Markdown for fast AI-pipeline integration.
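As a rough illustration of the conversion described above, the following is a minimal sketch, assuming the third-party `markitdown` package is installed and that its `MarkItDown().convert(...)` call returns a result exposing `text_content`. The helper names `markitdown_converter` and `convert_many` and the file name in the usage note are hypothetical and do not come from the article.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable


def markitdown_converter() -> Callable[[str], str]:
    """Build a function that converts one source to Markdown text.

    Assumption: the `markitdown` package is installed. It is imported
    lazily here so the rest of this sketch loads without it.
    """
    from markitdown import MarkItDown

    md = MarkItDown()
    # convert() returns a result object; text_content holds the Markdown.
    return lambda source: md.convert(source).text_content


def convert_many(sources: Iterable[str], convert: Callable[[str], str]) -> Dict[str, str]:
    """Hypothetical helper: run `convert` on each source, keyed by file name."""
    return {Path(source).name: convert(source) for source in sources}
```

A call might look like `convert_many(["report.pdf"], markitdown_converter())`, where `report.pdf` is a placeholder path. Passing the converter in as an argument keeps the batching logic independent of the library, so a stub can stand in for it during testing.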
In a Tuesday post on its Engineering blog, four Grab staffers explained that the company needs to accurately extract information from ID cards, driver's licenses, and registration certificates for compliance chores like know-your-customer checks. Grab tried Optical Character Recognition (OCR) systems, but its chosen tech "struggled with the variety of document templates it had to process." It's 2025, so the org investigated whether large language models could solve its problem.
The Snipping Tool in Windows 11 provides a handy way to capture screenshots of text, images, and other items that appear on your screen. But sometimes you might want to learn more about the item you've captured. For that, the tool now offers a visual search engine that uses Bing to dig up information on the content in your screenshot.
You can send Agenda Hero an image or a PDF with info about events or even paste over entire paragraphs of text, and in the blink of an eye, it'll identify and extract all the relevant details and put 'em into proper calendar event format. From there, you can add any and all events it's identified into your calendar (Google Calendar as well as Outlook, Apple, or practically any other platform) with a single click.
If you need to convert a stack of PDFs into Word, Excel, or PowerPoint, or turn files and images into polished PDFs, consider it done. The software handles batch conversions and keeps layouts, formatting, and links intact. Plus, with built-in OCR, even scanned or image-based documents become fully editable - perfect for modernizing old files or pulling data from printed pages.
OCR has taken unprecedented steps to streamline its functions according to demand: for example, amid a growing volume of Title IX complaints, OCR partnered with the Department of Justice to expeditiously investigate sex-based discrimination claims.