DeepSeek's new model sees text differently, opening new possibilities for enterprise AI | Fortune
Briefly

"The DeepSeek-OCR model fundamentally reimagines how large language models process information by compressing text into visual representations. Instead of feeding text into a language model as tokens, DeepSeek has converted it into images. The result is up to ten times more efficient and opens the door for much larger context windows-the amount of text a language model can actively consider at once when generating a response. This could also mean a new and cheaper way for enterprise customers to harness the power of AI."
"Early tests have shown impressive results. For every 10 text tokens, the model only needs 1 "vision token" to represent the same information with 97% accuracy, the researchers wrote in their technical paper. Even when compressed up to 20 times, the accuracy is still about 60%. This means the model can store and handle 10 times more information in the same space, making it especially good for long documents or letting the AI understand bigger sets of data at once."
DeepSeek-OCR compresses textual input into visual representations: instead of feeding a language model text tokens, it renders the text as images and encodes those as vision tokens. This cuts input size dramatically, requiring roughly one vision token per ten text tokens while preserving about 97% of the information at 10:1 compression and around 60% at 20:1. The approach lets a model store and process up to ten times more information in the same token budget, expanding feasible context windows and improving handling of long documents and large datasets, and it could lower compute and storage costs for enterprise customers.
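As a rough illustration of the arithmetic behind these figures, the sketch below works through what a 10:1 compression ratio implies for token counts. The ratios (10:1 at ~97% accuracy, 20:1 at ~60%) come from the quoted claims; the function names, page counts, and token budget are hypothetical assumptions for illustration only.

```python
# Back-of-the-envelope sketch of the compression arithmetic described above.
# Only the 10:1 and 20:1 ratios come from the article; everything else
# (function names, document sizes, budgets) is an illustrative assumption.

def vision_tokens_needed(text_tokens: int, compression_ratio: int = 10) -> int:
    """Approximate vision tokens needed to represent `text_tokens` of text."""
    return -(-text_tokens // compression_ratio)  # ceiling division

def effective_text_capacity(token_budget: int, compression_ratio: int = 10) -> int:
    """Text tokens representable within a fixed vision-token budget."""
    return token_budget * compression_ratio

# Example: a 300-page report at roughly 500 text tokens per page.
doc_tokens = 300 * 500                              # 150,000 text tokens
print(vision_tokens_needed(doc_tokens))             # ~15,000 vision tokens at 10:1
print(effective_text_capacity(128_000))             # a 128K budget spans ~1.28M text tokens
```

The same budget stretches twice as far at 20:1 compression, but per the reported results, fidelity drops to around 60%, so the trade-off between capacity and accuracy would need to be weighed per use case.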
Read at Fortune