We used Reducto and it did struggle with long documents. We process financial documents that run over 300 pages, and Gemini 3 Flash is producing high-accuracy extracts very quickly.
raunakchowdhuri 21 hours ago [-]
We've made a lot of changes in the past few months that make our standard extract much, much better, as well as Deep Extract for documents even longer than that. We'd love for you to give it a try!
willwjack 1 day ago [-]
Any learnings from deploying agents at such massive scale?
raunakchowdhuri 21 hours ago [-]
The big one is that LLMs get lazy on repetitive tasks. They'll skip rows or consolidate entries instead of grinding through every last one. So you need verify-and-re-extract loops rather than single-pass processing. Breaking work into sub-agent chunks with explicit correctness criteria defined upfront (e.g., "line items must sum to the stated total") lets the system self-verify autonomously. At scale (28M+ fields), this approach actually outperformed expert human labelers!
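To make the verify-and-re-extract idea concrete, here is a minimal sketch in Python. It is not Reducto's implementation or API: `extract_with_verification`, `items_sum_to_total`, and the `extract` callable (a stand-in for whatever LLM call does the extraction) are hypothetical names chosen for illustration. The correctness criterion is the one from the example above, line items summing to the stated total.

```python
from typing import Callable, Dict, List, Tuple

LineItems = List[Dict[str, float]]


def items_sum_to_total(items: LineItems, stated_total: float, tol: float = 0.01) -> bool:
    """Correctness criterion defined upfront: line items must sum to the stated total."""
    return abs(sum(item["amount"] for item in items) - stated_total) <= tol


def extract_with_verification(
    chunk: str,
    stated_total: float,
    extract: Callable[[str, int], LineItems],  # hypothetical extractor (e.g. an LLM call)
    max_attempts: int = 3,
) -> Tuple[LineItems, int]:
    """Re-run extraction on one chunk until the criterion holds or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        items = extract(chunk, attempt)
        if items_sum_to_total(items, stated_total):
            return items, attempt
    raise ValueError(f"extraction failed verification after {max_attempts} attempts")
```

Each chunk handed to a sub-agent would get its own call, so a skipped or merged row shows up as a failed check and triggers another pass instead of silently ending up in the output. A sum check only catches errors that change the total, so real criteria would likely need to be stricter.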
We're soon releasing an open dataset of challenging structured extraction tasks as a starting point for anyone who wants to run comparisons!
vikp and the Datalab team have done great work in the space, but their structured extraction product is closer to our baseline /extract API, since both are single-pass extractions.
Deep Extract is more accurate than any structured extraction product we've tried, but the approach comes with a very clear cost/latency tradeoff compared to single-pass extraction. We have free credits if you'd like to do a side-by-side.
nbnn 8 hours ago [-]
Irud
cyanydeez 24 hours ago [-]
I like to play "guess which open-source LLM package is that XKCD comic."
Looks like it's something like: https://huggingface.co/docs/transformers/model_doc/layoutxlm