What I'm Finding About LLM Code Style and Token Costs (jimmont.com)

38 points by jimmont 1 day ago | 17 comments

bombcar 1 day ago [-]

I just had Claude try to process an RSS feed and it was about to ZALGΌ IS TOƝȳ THË PO NY itself and I pointed that out and it immediately said "Wordpress has a json interface, I'll use that".

You need to know the shape of the solution ...
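A minimal sketch of that shape, assuming the site sits at the domain root and exposes the standard WordPress REST route `/wp-json/wp/v2/posts`. The `FeedItem` type and the `postsUrl` and `toFeedItems` helpers are hypothetical names for illustration, not what Claude actually wrote in that session.

```typescript
// Subset of the fields the WordPress REST API returns for a post.
interface WpPost {
  link: string;
  date: string;
  title: { rendered: string };
}

// Hypothetical shape for whatever the feed consumer needs.
interface FeedItem {
  title: string;
  url: string;
  published: string;
}

// Build the posts URL for a site; perPage maps to the per_page query parameter.
function postsUrl(site: string, perPage = 10): string {
  const url = new URL("/wp-json/wp/v2/posts", site);
  url.searchParams.set("per_page", String(perPage));
  return url.toString();
}

// Map already-parsed JSON to feed items: plain property access, no regex over markup.
function toFeedItems(posts: WpPost[]): FeedItem[] {
  return posts.map((p) => ({
    title: p.title.rendered, // may still contain HTML entities
    url: p.link,
    published: p.date,
  }));
}
```

In use, the result of `fetch(postsUrl(site))` would be read with `.json()` and passed to `toFeedItems`, so no markup parsing is involved at any step.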

anttiharju 1 day ago [-]

Context about tony the pony

https://stackoverflow.com/questions/1732348/regex-match-open...

jimbokun 17 hours ago [-]

> Have you tried using an XML parser instead?

Balinares 1 day ago [-]

Thank you for helpfully de-darmoking that for the less pony-attuned of us.

vadansky 1 day ago [-]

It feels like the Photoshop paint bucket tool.

If you draw a sloppy circle and fill it in, it'll "escape" and try to paint the whole canvas (and back in the day would get my slow computer stuck until I spam "esc").

You have to be able to draw a good circle to use it.

joshka 11 hours ago [-]

I think the reasonable thing that needs to happen here is that frontier labs need to look at ways to incorporate better, up-to-date gilded chunks of gold data into their post-training. I'd love to be able to throw a chunk of scenarios and expected outputs for using various tooling and have this applied to gpt-next / opus-next. I recently did a test with having Claude do this for generating a skill for jujutsu vcs that was based on eval-ing and ablating many, many instructions over hundreds of scenarios. This is good, but skill-based post-training costs tokens for every user rather than at the model level.

Maybe there should be a submit your post training corpus here thing somewhere.

E.g. this really comes down to advice changing in short time frames that aren't represented in data that satisfies knowledge cutoff which can be as long as 14 months in some things or more for older models. It's not just the problem of knowledge, but the grading of seen output. The models have been trained to produce the older style code because it compiles and solves the problems.

Here, the reframe that is likely worth noting is that "Deno and runtimes like Cloudflare Workers implement the Web API surface natively". That's the strongest single thing that would help steer an agent to correctly write code for the code in question (assuming the Web API surfaces that are key are in distribution). Add something like: "Where there may be reasonable obvious updates that can be used in 2026, use them" ...
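To make that concrete, here is a rough sketch of a handler written only against Web-standard globals (`URL`, `Request`, `Response`). The `handle` name and the greeting payload are made up for illustration; the point is only that nothing here depends on a framework or on Node-specific modules.

```typescript
// Uses only Web-standard globals; no framework or Node-specific imports.
function handle(request: Request): Response {
  if (request.method !== "GET") {
    return new Response("Method Not Allowed", {
      status: 405,
      headers: { Allow: "GET" },
    });
  }
  // URL and URLSearchParams replace hand-rolled query-string parsing.
  const url = new URL(request.url);
  const name = url.searchParams.get("name") ?? "world";
  return new Response(JSON.stringify({ greeting: `Hello, ${name}` }), {
    headers: { "content-type": "application/json" },
  });
}
```

The same function should plug in unchanged as `Deno.serve(handle)` on Deno or as `export default { fetch: handle }` in a Cloudflare Worker, which is the kind of output the reframe is meant to steer toward.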

datadrivenangel 1 day ago [-]

Code comments are an especially brutal source of cruft and bloat, and they confuse the coding agents.

And it feels like Claude Code has gotten more verbose with the multiline comments lately.

lelanthran 1 day ago [-]

It's really hard for me to parse LLM-generated prose in blog posts - the reasoning is disjointed, logic is split everywhere.

Is it already too late to have humans just write down what they are thinking instead of passing it through a lossy expander?

allanmacgregor 21 hours ago [-]

Human writing will come at a premium. I don't think LLM-generated prose goes away at all; the scary part is that I'm starting to catch people adopting the same tropes and language patterns in their speech/writing. Manufactured contrarianism, for example, seems rampant in social media and blog posts ... but I guess nobody is talking about that :D.

ftaisdeal 1 day ago [-]

Excellent article, with impeccable analysis, that will fundamentally change how I work with Claude myself. I have already learned to give Claude both a "do" and a "don't" in order to limit unpleasant surprises.

username135 20 hours ago [-]

More than half my battle using things like codex comes from removing unnecessary code checks and verbose logic. Even when prompted, it just can't help itself. It's a willful beast.

joshka 11 hours ago [-]

Click the thumbs down report on the output (to better train the model in future to produce better code / provide signal to the people doing training that a common behavior is problematic) and then stick some guidance in your AGENTS.md. It can be fairly whack-a-mole at times on this sort of thing, but sharpening your personal saw with AGENTS.md is an effective approach to generally doing better.

Izkata 1 day ago [-]

The "Form data" section is doing two completely different things: the large one is two different implementations of a React component mixed together, while the short one doesn't store "data" anywhere for later use like the React one does.

Edit: Similar with the "UI components" section, the long one is missing the UI while the short one is UI without the trigger to activate it. You'd probably combine the two, using state from the first to control the UI in the second (replacing the contents of the useEffect with the dialog API to get the modal effect).
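Roughly what I mean by the combination, as a sketch rather than the article's code. `DialogLike` and `syncDialog` are hypothetical names; `DialogLike` only covers the part of `HTMLDialogElement` used here, so the helper can be exercised without a DOM.

```typescript
// Minimal surface of HTMLDialogElement that the helper relies on.
interface DialogLike {
  open: boolean;
  showModal(): void;
  close(): void;
}

// Intended as the body of a useEffect that depends on `isOpen`:
// it brings the native dialog in line with the React state.
function syncDialog(dialog: DialogLike | null, isOpen: boolean): void {
  if (!dialog) return; // ref not attached yet
  if (isOpen && !dialog.open) {
    dialog.showModal(); // native modal: backdrop, rest of the page inert, Esc closes
  } else if (!isOpen && dialog.open) {
    dialog.close();
  }
}
```

In the component this would be called as `useEffect(() => syncDialog(ref.current, isOpen), [isOpen])`, with `ref` attached to the `<dialog>` element from the short version and `isOpen` coming from the state in the long version.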

jimmont 1 day ago [-]

I've been reviewing my experience using LLMs in order to improve results and reduce churn and token usage. The gap between what they produce and what I'd normally do turns out to be a significant source of output cost and regressions, and the article surfaces a bit of why that happens and how to fix it. Notably, Claude is remarkably bad at this, producing errors even when directed toward modern Web solutions, which cut token use a lot, occasionally by close to 90%. That, together with the frustrating churn, led me to review how I'm working and what is happening, and to write this article.

defytonofficial 1 day ago [-]

This matches my experience. I've been using OpenRouter with GPT-4o for an image verification service, and the prompt engineering choices have a measurable impact on cost.

One thing I found: asking the model to respond in structured JSON (with a strict schema) vs free-form text cuts token output by ~40% on average. The model stops "explaining itself" and just gives you the answer.

Also noticed that including a reference image in vision calls roughly doubles the input cost but improves accuracy enough that you save on retries. Net cost ended up lower for my use case.

Curious if you've measured the difference between asking for "concise" output vs actually constraining the response format.
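For anyone wanting to try the structured-output approach, a minimal sketch of the request shape, assuming an OpenAI-style chat completions endpoint that accepts `response_format` with a `json_schema`. The `Verdict` fields and the `buildRequest` and `parseVerdict` helpers are hypothetical, not my production code.

```typescript
// Hypothetical result shape for an image verification call.
interface Verdict {
  verified: boolean;
  confidence: number;
}

// JSON Schema describing Verdict; additionalProperties: false keeps the output tight.
const verdictSchema = {
  type: "object",
  properties: {
    verified: { type: "boolean" },
    confidence: { type: "number" },
  },
  required: ["verified", "confidence"],
  additionalProperties: false,
} as const;

// Request body in the OpenAI-style chat completions format.
function buildRequest(model: string, prompt: string) {
  return {
    model,
    messages: [{ role: "user", content: prompt }],
    response_format: {
      type: "json_schema",
      json_schema: { name: "verdict", strict: true, schema: verdictSchema },
    },
  };
}

// Parse the model's message content and reject anything off-schema.
function parseVerdict(content: string): Verdict {
  const data: unknown = JSON.parse(content);
  if (typeof data !== "object" || data === null) {
    throw new Error("response is not a JSON object");
  }
  const { verified, confidence } = data as Record<string, unknown>;
  if (typeof verified !== "boolean" || typeof confidence !== "number") {
    throw new Error("response does not match the schema");
  }
  return { verified, confidence };
}
```

With the schema in place the reply is just the two fields, which is where the output-token saving comes from; the parser is there because a schema on the request does not replace checking the response.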

jimmont 1 day ago [-]

That's an excellent idea I plan to try, thanks (re: using structured JSON with schema). The most success I've had is saying "be brief" or an explicit size, like one line, or do not explain, etc. I haven't measured other instructions so extensively. They do work but the more specific the better. Other strategies around outputs that are more natural language seem to be hands-down the direction to take, and get away from the machine language habits we've used in the past.

It's super interesting seeing this new practice emerging and more or less inventing parts of it along the way. Right now I'm at the place where my brute force and elaborate explanations were reaching their limit and in the frustration just realized I need to take a few days and try to figure out the tool.

Across all these the pattern seems entirely that the constraints bound the probability space, whether it's the format like you suggested, or the instruction we give, including the space we point it toward (Web APIs, runtime, schema, etc). In all instances where it's not working the solution seems to be what does the pattern reduce to, and what specifics are the do/don't to go with that, and most of the time the results improve immediately. Your tip seems excellent for this. An easy-button.

linzhangrun 1 day ago [-]

Why still use GPT-4o?