NHacker Next
OpenAI's In-House Data Agent (openai.com)
tillvz 13 minutes ago [-]
Trust & explainability is the biggest issue here.

We've been building natural-language analytics at Veezoo (https://www.veezoo.com/) for 10 years, and what we've found is that straight Text-to-SQL doesn't scale. If the AI writes SQL directly, you're building on a probabilistic foundation: when a CFO asks for revenue, the number can't just be correct 99% of the time. And you can't get the CFO to read SQL to verify it.

We're solving that with an abstraction layer (a Knowledge Graph) in between: the AI translates natural language into a semantic query language, which then compiles deterministically to SQL.

At the same time, the semantic query can be deterministically translated back into a plain-language explanation for the business user, so they can easily verify that the result matches their intent.

Business logic lives in the Knowledge Graph and the compiler ensures every query adheres to it 100%, every time. No AI is involved in that step.
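A minimal sketch of that split, with hypothetical metric and dimension definitions standing in for the Knowledge Graph (none of these names are Veezoo's actual API): the model only ever emits a constrained semantic query, and everything after that point is deterministic.

```python
from dataclasses import dataclass

# Hypothetical knowledge-graph entries: the model may only reference these.
METRICS = {"revenue": "SUM(order_total)"}
DIMENSIONS = {"month": "DATE_TRUNC('month', order_date)"}

@dataclass(frozen=True)
class SemanticQuery:
    metric: str
    group_by: str

def compile_to_sql(q: SemanticQuery) -> str:
    """Deterministic: the same semantic query always compiles to the same SQL."""
    if q.metric not in METRICS or q.group_by not in DIMENSIONS:
        raise ValueError("query uses a term outside the knowledge graph")
    return (f"SELECT {DIMENSIONS[q.group_by]} AS {q.group_by}, "
            f"{METRICS[q.metric]} AS {q.metric} FROM orders GROUP BY 1")

def explain(q: SemanticQuery) -> str:
    """Deterministic back-translation into plain language for the business user."""
    return f"Total {q.metric}, broken down by {q.group_by}."

q = SemanticQuery(metric="revenue", group_by="month")
sql = compile_to_sql(q)
explanation = explain(q)
```

The point of the sketch is that no model output reaches the SQL string directly; a query naming an unknown metric fails loudly instead of hallucinating a definition.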

Veezoo Architecture: https://docs.veezoo.com/veezoo/architecture-overview

Leynos 6 minutes ago [-]
Don't you still need to unit test and version control the SQL artefact that is produced? You need to be able to see which query was used on which date and how it was validated.

(Prompts need to be version controlled too, of course)
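One lightweight way to get that audit trail (illustrative only, not any particular product's API): hash and timestamp every generated artefact together with the prompt that produced it and how it was validated.

```python
import datetime
import hashlib

def log_query(sql: str, prompt: str, validated_by: str, log: list) -> dict:
    """Append an auditable record: which SQL ran, when, and how it was checked."""
    entry = {
        "sql": sql,
        "sql_sha256": hashlib.sha256(sql.encode()).hexdigest(),
        "prompt": prompt,  # prompts are versioned alongside the SQL
        "validated_by": validated_by,
        "run_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }
    log.append(entry)
    return entry

audit_log = []
entry = log_query(
    sql="SELECT SUM(order_total) FROM orders",
    prompt="what was total revenue?",
    validated_by="golden-SQL comparison",
    log=audit_log,
)
```

The content hash makes it cheap to answer "was this the exact query that ran on that date?" even if the SQL text is later regenerated.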

maxchehab 27 minutes ago [-]
Trust is the hardest part to scale here.

We're building something similar and found that no matter how good the agent loop is, you still need "canonical metrics" that are human-curated. Otherwise non-technical users (marketing, product managers) are playing a guessing game with high-stakes decisions, and they can't verify the SQL themselves.

Our approach:

1. We control the data pipeline and work with a discrete set of data sources whose schemas are consistent across customers.
2. We benchmark extensively so the agent uses a verified metric when one exists, falls back to raw SQL when it doesn't, and captures those gaps as "opportunities" for human review.

Over time, most queries hit canonical metrics. The agent becomes less of a SQL generator and more of a smart router from user intent -> verified metric.
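That router might look something like this sketch (the metric names and the fallback generator are hypothetical):

```python
# Human-curated canonical metrics the agent prefers over generated SQL.
CANONICAL_METRICS = {
    "monthly active users": "SELECT COUNT(DISTINCT user_id) FROM events_monthly",
}

opportunities = []  # unmet intents, captured for human review

def route(intent: str, generate_sql) -> tuple:
    """Prefer a verified metric; fall back to raw SQL and record the gap."""
    if intent in CANONICAL_METRICS:
        return "canonical", CANONICAL_METRICS[intent]
    opportunities.append(intent)  # an "opportunity" to curate a new metric
    return "raw_sql", generate_sql(intent)

kind1, sql1 = route("monthly active users", lambda i: "-- agent-generated SQL")
kind2, sql2 = route("churn by cohort", lambda i: "-- agent-generated SQL")
```

Every fallback is itself a signal: the `opportunities` list is the curation backlog that shrinks the probabilistic surface over time.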

The "Moving fast without breaking trust" section resonates; their eval system with golden SQL is essentially the same insight: you need ground truth to catch drift.
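A toy version of a golden-SQL check, using an in-memory SQLite fixture (a real eval would run against warehouse fixtures): the agent's query passes only if its result matches the result of the human-curated golden query.

```python
import sqlite3

def matches_golden(agent_sql: str, golden_sql: str) -> bool:
    """Run both queries on the same fixture data and compare results."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_total REAL)")
    conn.executemany("INSERT INTO orders VALUES (?)", [(10.0,), (32.5,)])
    return conn.execute(agent_sql).fetchall() == conn.execute(golden_sql).fetchall()

ok = matches_golden(
    "SELECT SUM(order_total) FROM orders",
    "SELECT SUM(order_total) FROM orders",  # the human-curated golden query
)
drifted = matches_golden(
    "SELECT COUNT(*) FROM orders",  # a wrong query is caught by the fixture
    "SELECT SUM(order_total) FROM orders",
)
```

Comparing result sets rather than SQL text is the important bit: two syntactically different queries can agree, and a tiny textual change can silently change the answer.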

Wrote about the tradeoffs here: https://www.graphed.com/blog/update-2

qsort 15 minutes ago [-]
Very, very good stuff here. I think a possible missing piece is explaining how the results were computed. Here they seem to be relying on the fact that users are somewhat technical (which is fine for OpenAI -- it's an internal agent, after all) and can at least read SQL, but how you'd structure the interaction with nontechnical users is an interesting design problem.

When working on data systems you quickly realize that often how the question was answered (how the metric is defined, what data was taken into account and so on) is just as important as the answer.

sjsishah 55 minutes ago [-]
Given my personal experience with various BI systems, I think an AI agent like this is the perfect use case. These systems already operate on multiple layers of being wrong: layer 1, your query is likely wrong; layer 2, your interpretation of the data is likely wrong.

Mix them together and you’re already deep in make believe land, so letting AI take over step 1 seems like a perfect fit.

I was hoping to read this article and be surprised by how OpenAI was able to solve the reliability problem, but alas.

0xferruccio 1 hour ago [-]
At Amplitude we built Moda, which is super similar to this.

Our chief engineer Wade gave an awesome demo to Claire Vo some months back here: https://www.youtube.com/watch?v=9Q9Yrj2RTkg

I use this basically every day, asking all sorts of questions.

htrp 29 minutes ago [-]
Data problems are not tech problems but rather org problems.
exogenousdata 12 minutes ago [-]
So true. In my career (anecdotally), I’ve never encountered a data problem where the answer was ‘you didn’t choose this tech/language/product over another.’ It always comes down to decisions of governance and ownership. It’s Conway’s Law all the way down.
spiderfarmer 22 minutes ago [-]
I'm more interested in Kimi's In-House Data Agent