So many people have given Apple a hard time for not focusing enough on AI.
Seems that the UX will be enough to win over users and investors
harrouet 1 days ago [-]
This is Apple commoditizing LLMs while keeping control of the UX.
They are a hardware company and will keep selling the best machine for AI use. Well done.
tedggh 1 days ago [-]
Benedict Evans may be right after all; frontier models look more and more like telecom companies in the 90s. Billions and billions of investment in infrastructure while others further up the stack captured all the value.
CuriouslyC 1 days ago [-]
There will be frontier models that are non-commoditized, but they'll be kept guarded and hidden away, and you'll only get the final result, so that they can't be distilled and their harness can't be reverse engineered. They'll be billed like employees, rather than like a tool.
hedora 1 days ago [-]
The non-commodity network services of the early 1990s and the non-commodity 3D graphics hardware of the mid-1990s made the same argument.
ls612 21 hours ago [-]
They didn’t have the security state backing up their business thesis at gunpoint.
yandie 1 days ago [-]
I doubt that. What stops the Chinese labs from figuring it out? It’s not like these models are fundamentally different from each other
CuriouslyC 1 days ago [-]
If all you have is the starting point and the finishing point, the lack of the path taken from one point to another limits your ability to train models that can efficiently recreate the work, and increases its cost enough that it's possible the US labs can progress capabilities faster than Chinese labs can distill that behavior.
OpenAI and Anthropic may have gone silent on how they build their models, but other companies have different incentives.
lumost 21 hours ago [-]
This just looks like a capex problem. There is no evidence that Anthropic has secret sauce above and beyond access to capital. If there is secret sauce, it's unclear that it changes the required amount of capital by all that much.
China will spend all of the money required to catch up, Google and OpenAI will both spend money to catch up as well. NVidia and others will not allow a frontier lab to become the AI bottleneck.
sealeck 1 days ago [-]
> lack of the path taken from one point to another limits your ability to train models that can efficiently recreate the work
Isn’t this the problem inference (training) a model is designed to solve :)))
jmalicki 1 days ago [-]
It is!
And it's a hard problem.
What's an easier form of training is being able to see the intermediate results and train to imitate them.
wahnfrieden 1 days ago [-]
That’s already the case. Chinese ingenuity allowed them to achieve what they did without access to reasoning outputs
throw10920 15 hours ago [-]
This has got to be satire. Everyone, especially Singaporeans, knows what "Chinese ingenuity" really is.
wahnfrieden 9 hours ago [-]
It’s merely descriptive of ingenuity required to distill models back into reasoning models without having any of the chain of thought. You underestimate the original work required because of biases
mingqiz 1 days ago [-]
Isn't that what they are doing already? The model is already guarded and hidden, and I only get to send it what I want and talk with it to clarify my requirements. And I can switch to a different provider for cheaper/better results.
lacy_tinpot 1 days ago [-]
They tried to do that with operating systems and the browser.
throwaway85825 21 hours ago [-]
The economically useful frontier models will be fine tuned on data to make them useful for a specific project or task.
naravara 1 days ago [-]
I think this will be isolated to highly specialized fields where training data will need to be selectively curated.
greenavocado 1 days ago [-]
Everything can be distilled, it will just become more painful
alecco 1 days ago [-]
In spite of their deeper pockets, massive datacenters, colossal amounts of user data, and hundreds of thousands of top developers, even Amazon, Meta, Microsoft, and Google are well behind.
I think Evans is completely wrong. There are only 2 truly frontier models. (at least for now). And Anthropic seems to be leaving OpenAI behind so there might be only 1 in the near future. (which is scary/dangerous)
ksec 1 days ago [-]
>I think Evans is completely wrong.
I wish there were a case where I found Evans to be wrong. As far as my memory serves me, I have failed to record a single one.
I disagree that Amazon, Meta, Microsoft, and Google are "well" behind. If anything, the frontier model advantage seems to be 6 - 9 months at best. And the Chinese models are all doing well.
As one of Steve Jobs's lines goes, "It is a feature, not a product." Even if Apple were a generation or a year behind the frontier models, the advantage of being the default is enough to hold on to a lot of its users.
To put it simply, even if OpenAI or Anthropic were better, there is zero chance they would topple Apple in hardware sales, users, or ecosystem. On the other hand, even if Apple's AI were 6 - 9 months or a generation behind, most users would settle for it, and that would damage OpenAI / Anthropic.
ak_111 24 hours ago [-]
Just off the top of my head (and I don't even follow his takes that closely): check his takes on Magic Leap, which he consistently promoted using quite dramatic language (along with the entire AR space), and check how it panned out.
overfeed 1 days ago [-]
> On the other hand, even if Apple's AI were 6 - 9 months or a generation behind,
Do you mean Google's AI with Apple wrappers? Apple's in-house AI is further behind Google, and very far from the frontier according to your ranking. IMO, Google is on the frontier - I recall Altman calling for an OpenAI all-hands-on-deck when Gemini was released because of how good it was compared to ChatGPT. I also suspect Google has the lowest operating expenses due to scale, experience and luck/planning (TPUs). There will come a time when AI investments slow down, and the cost of revenue will become more important.
alecco 1 days ago [-]
Even their own employees get frustrated if they can't use Claude or Codex. 6-9 months is a big difference and I think it's closer to 9 than 6. And never mind the harness etc are also many months behind.
geodel 1 days ago [-]
This is just wishful thinking. I am sure someone from the gossip media will also find Apple employees who are ready to leave their jobs if Apple disallows Claude usage.
If anything, Apple should notice that Anthropic has a really good marketing team, and it would be no shame if they picked up a trick or two from them.
throwaway98797 1 days ago [-]
people use outlook when gmail exists.
employees will always suffer.
hedora 1 days ago [-]
Remember the implicit “pareto” in “frontier models”.
Anthropic and OpenAI are far behind state of the art for the entire curve except the “extremely expensive for barely measurable improvements” part.
GLM is probably the third most expensive frontier model (benchmarks and reviews will say for sure), and is apparently ~Opus 4.6 for 10% the inference cost.
The last I checked, qwen was still owning the 24-32GiB RAM range (it runs reasonably without a GPU!) and somewhere around 3.5-4 generation models.
Also, even anthropic says Mythos ~= ChatGPT 5.5, so it’s unlikely either one is leaving the other behind. The big problem they both have is they asked for the government to gate keep model releases and use cases, and their wish was granted.
That’s knocked them back 6 months already. Anthropic’s only frontier offering has been taken down.
tedggh 1 days ago [-]
I use both Claude and Codex and don’t see any meaningful difference between the two. My use case is modeling semi-complex physical processes (energy and manufacturing) in code for simulations. I also have to do a fair amount of automation via scripting in Python or PowerShell for manipulating data, as well as legacy code analysis (C, Fortran, COBOL). Given I provide the models with the information and documentation they need, both perform very similarly. I recently did a full codebase review (for design patterns and vulnerabilities) and both Codex and Fable agreed 100% about the most critical findings. I do very little front end development, although some of my automation scripts have TUIs and again no problem with either Claude or Codex generating them for me. At this point I go with the less expensive, which seems to be Codex. With the $100 plan I rarely hit the limits. With Claude I max out my plan in about 4-6 hours of work.
joenot443 1 days ago [-]
Did you find much of a difference between Fable and Opus?
thrill 1 days ago [-]
Yes. Fable is much more organized and consistent at taking small bites of the (sorry) apple when solving a problem. Specifically I'm talking about a machine learning problem I'd been working on for awhile with Opus and it was (and is, again) constantly stating that all the signal is exploited, everything is now overfit, etc, etc, etc. The first day I pointed Fable at the situation I got a 10% improvement by paying attention to the little details that Opus instead took slightly negative results and extrapolated to "fully exploited". I've had to drop back, again, to forcing Opus to explain what it's looked at and the detail it has quietly assumed away.
It's like the difference between talking to the two smartest kids in a class, where one really belongs a grade higher and the other hasn't yet learned to ask the questions that encourage it to dig in that little bit more for the additional multi-order effects.
yfontana 1 days ago [-]
Had a very similar experience. Opus went "look, t-sne shows your features are neatly clustered" (it didn't) and left it at that. Fable didn't fully explore the problem/data, but it did go much further, implementing models to check for correlations and adjust feature clusters. Opus was able to finish the job after Fable was cut, but required much prodding (doing exactly what you described: pointing it towards things that look off and asking it, are you sure that's all there is to this?).
tedggh 1 days ago [-]
I have used Fable only once, to do an in-depth codebase review of a complex system. I asked it to flag deviations from a particular design and also compile a list of vulnerabilities. It took about 15-20 minutes. The result was very similar to Codex for the most critical findings: different suggestions on how to address them, but it found exactly the same critical issues as Codex. This is still not a good test to evaluate Fable. But my feeling is that the latest models are all pretty good and now it comes down to your personal setup and workflow; that’s where you can get the productivity gains IMO. It’s like picking between macOS and Windows as a development environment. For some Windows sucks and for others it is the opposite, but both groups of people can be equally productive if they know their environments well and know how to work around their respective limitations.
hedora 1 days ago [-]
I constantly hit safety blocks in Fable (I’m trying to write secure software, which is equivalent to finding security holes, so banned).
I didn’t use it on big enough tasks to notice any improvement.
I had been hitting plan limits pretty regularly, but fixed it by changing my workflow. That also increased the success rate of claude by an order of magnitude.
awongh 23 hours ago [-]
That's true now, but long-term (maybe just a few years) it doesn't seem feasible for the status quo to continue from a financial point of view.
Spend for compute seems like it needs to increase to get the next iterations of models, and even if they IPO the money might run out before they can solidify their revenue streams.
All while Google just needs to survive long enough with their good-enough models and do it without really putting themselves in any existential financial risk.
And ideally the chinese models are also still there keeping everyone honest.
The true dystopic worst case is a Google monopoly on cutting edge AI.
embedding-shape 1 days ago [-]
> I think Evans is completely wrong. There are only 2 truly frontier models. (at least for now). And Anthropic seems to be leaving OpenAI behind so there might be only 1 in the near future. (which is scary/dangerous)
Truly fascinating ecosystem and community in general, as experiences differ so wildly. Anthropic's models seem far behind OpenAI's to me, especially when you get into "Pro" territory, and there doesn't seem to be any worthy competition to Pro Mode available at all.
And this is said by someone who uses both platforms and spends a lot of the day interacting with agents and LLMs in various ways. The interesting part is that you probably do too, and what you share probably lines up with what you experience! Yet we come away with basically opposite takeaways :) I don't think either of us is wrong either, somehow.
haellsigh 1 days ago [-]
I agree with what you're saying.
I have a Claude plan for work and I prefer using Claude more than any other LLM I've tried.
Having recently tried the Codex 100€ plan with GPT-5.5 in high/xhigh, I don't think it's worse than the Opus models, just different.
I've noticed that depending on how you talk to it, you get wildly different outputs. This seems to happen less with Opus: it mostly understands what I want. GPT is often a bit too literal.
Just my two cents.
embedding-shape 1 days ago [-]
> I've noticed that depending on how you talk to it, you get wildly different outputs. This seems to happen less with Opus: it mostly understands what I want. GPT is often a bit too literal.
Yeah, exact prompting matters a lot, seemingly more than people think. There are definitely tradeoffs in how literally the models take the prompts: on one hand, it's useful for a model to ignore its own instinct when you know better, so it doesn't go chasing geese randomly; on the other hand, it's sometimes useful when it self-directs, such as when you misworded something and it's obvious from the context that you meant something different. They're basically good at different things.
Really agree every model isn't equal and they aren't as interchangeable without adjusting how you prompt them as people seem to think.
WarmWash 1 days ago [-]
People use a model as their daily driver, get very familiar with it and its behavior, and then go and use another model and have a hard time. It's very difficult to separate "the model is bad" from "the model works differently".
JumpCrisscross 1 days ago [-]
> It's very difficult to separate "the model is bad" from "the model works differently"
At which point it’s fair to reject the commoditization label.
Also missing from these discussions are e.g. Qwen, which is at least as good as one back from OpenAI or Anthropic’s frontiers.
embedding-shape 1 days ago [-]
> Also missing from these discussions are e.g. Qwen, which is at least as good as one back from OpenAI or Anthropic’s frontiers.
They're missing in the discussion because the ones you can run locally, aren't actually "one step away from other closed-source labs" in practice when you use them. They might benchmark as such, but they're sadly far from living up to those scores except for very specific use cases, even when you have, say, 96GB of VRAM available to run the bigger models that most (at home) consumers won't be able to run.
JumpCrisscross 1 days ago [-]
> the ones you can run locally, aren't actually "one step away from other closed-source labs"
And they probably won’t be for at least another decade. Comparing like with like, flagship model running on the best hardware it can run on, Qwen is close.
embedding-shape 1 days ago [-]
> Qwen is close
I wish so badly this was true, but sadly today it just isn't.
JumpCrisscross 1 days ago [-]
To be clear, I’m relaying my subjective experience comparing Opus and Qwen.
computerex 1 days ago [-]
For HPC/ai work opus blows gpt away, it’s no competition.
embedding-shape 21 hours ago [-]
As someone who just spent the last three days (tried using both, ended up using mostly Codex) implementing DiffusionGemma in Rust, I think they're more or less equal when it comes to machine learning and AI. They get stuck at different points, but wouldn't say one is a clear winner over the other. HPC I have no idea so I'll take your word for it :)
alecco 1 days ago [-]
When you say "Pro" territory, do you include Fable?
embedding-shape 1 days ago [-]
You mean the model that was available for a whole three days? No, I played around with it a tiny bit, but not much more than that. I guess time will tell if it gets close.
jimbokun 1 days ago [-]
Is Google behind? The general opinions I read suggest Gemini is very competitive with Anthropic and OpenAI's top models.
wolttam 1 days ago [-]
I think it's highly likely that there will remain one or two companies on the very bleeding edge of AI development for the foreseeable future.
But what I think a lot of people miss is that the market for the truly bleeding edge (developing bio-tech, building the most sophisticated software stacks (probably with a tilt towards simulation, GPU kernel optimization, etc)) is not the whole market.
There's a plethora of use-cases for models that are not on the bleeding edge. If I can solve my relatively simple problems with an off-the-shelf model for a minuscule fraction of the cost of the frontier, I'm going to.
thewebguyd 1 days ago [-]
Anecdotal case in point, but writing mostly enterprise CRUD in C#, I've gotten plenty of mileage out of Sonnet, very rarely do I need to use Opus.
It's somewhat of a myth that you need the most advanced, expensive model for software development.
johsole 23 hours ago [-]
There was a time when Opus was the only model really worth using, I think that was maybe 4.4 or 4.5, but I agree Sonnet is pretty good now and can be used quite often.
nxobject 16 hours ago [-]
> And Anthropic seems to be leaving OpenAI behind so there might be only 1 in the near future.
Well, in domains like SWE where Anthropic's putting in the effort. I don't think they'll make the claims that OpenAI makes about how their models are pushing the life sciences forward, for example.
bushbaba 1 days ago [-]
I'm perfectly happy at claude opus 4.6. All improvements since then have not meaningfully improved my day to day. If i can get 4.6 on my laptop for 5-10k, i'd gladly start shifting my ~1k/month Anthropic spend over.
Some of the harnesses even let you run a local model for most things, and only pay for the latest frontier models when needed, which cuts down cost drastically.
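A rough payback calculation for the trade-off described above, as a sketch only. It assumes the figures quoted in the comment ($5-10k of hardware against roughly $1k/month of API spend) and ignores electricity, resale value, and model quality differences. `fraction_moved` is a hypothetical knob for the share of spend that actually shifts to the local model.

```python
import math


def payback_months(hardware_cost, monthly_api_spend, fraction_moved=1.0):
    """Whole months until the hardware cost is covered by avoided API spend.

    fraction_moved is an assumed share of the monthly spend that moves
    to the local model; the rest still goes to a hosted frontier model.
    """
    if not 0 < fraction_moved <= 1:
        raise ValueError("fraction_moved must be in (0, 1]")
    monthly_savings = monthly_api_spend * fraction_moved
    return math.ceil(hardware_cost / monthly_savings)


if __name__ == "__main__":
    for cost in (5_000, 10_000):
        full = payback_months(cost, 1_000)
        partial = payback_months(cost, 1_000, fraction_moved=0.8)
        print(f"${cost}: {full} months if all spend moves, {partial} if 80% moves")
```

Under these assumptions the hardware pays for itself in well under two years, which is presumably why the "good enough model on my own machine" argument keeps coming up.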
afavour 1 days ago [-]
Maybe I’m alone in thinking this but I think the long term victor will be the one that works out pricing best.
Fable might well be a better model but it’s too expensive for everyday AI use. Definitely if we’re talking about the kind of stuff you’re going to want to do on your phone. Even for coding, I’m not going to reach for Fable (well, when I can…) for 95% of the work I do.
I don’t believe a mature AI industry is going to have a one size fits all, single winner.
tedggh 1 days ago [-]
Yes, and pricing is one of the features of a commodity: because users can jump back and forth between services, it becomes a pricing race to the bottom. Agree also that you don’t need the best model all the time. You could have the most powerful model draft the design, requirements, guidelines, policies or whatnot, then have the lower-tier models execute it. Then you can again have the most powerful model do the testing and review and give feedback; rinse and repeat. Just like in the real world, you don’t need an entire staff of lead engineers.
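The plan / execute / review loop described above could be sketched roughly as follows. Everything here is hypothetical: a "model" is just a function from prompt to text, the prompt prefixes are made up, and the two stub models exist only so the loop can be run without any provider SDK.

```python
from typing import Callable, Tuple

# Hypothetical model interface: prompt in, text out.
Model = Callable[[str], str]


def tiered_run(task: str, strong: Model, cheap: Model,
               max_rounds: int = 3) -> Tuple[str, int]:
    """Plan with the strong model, execute with the cheap one,
    and let the strong model review until it approves."""
    plan = strong(f"PLAN: {task}")
    feedback = ""
    result = ""
    for round_no in range(1, max_rounds + 1):
        result = cheap(f"EXECUTE: {plan}\nFEEDBACK: {feedback}")
        verdict = strong(f"REVIEW: {result}")
        if verdict.startswith("OK"):
            return result, round_no
        feedback = verdict  # feed the review back into the next attempt
    return result, max_rounds


def stub_strong(prompt: str) -> str:
    """Stand-in for the expensive model: plans, then approves drafts with tests."""
    if prompt.startswith("PLAN:"):
        return "1. parse input 2. write tests"
    return "OK" if "tests added" in prompt else "REJECT: add tests"


def stub_cheap(prompt: str) -> str:
    """Stand-in for the cheap model: only adds tests once told to."""
    return "draft with tests added" if "add tests" in prompt else "draft"


if __name__ == "__main__":
    print(tiered_run("build a CSV importer", stub_strong, stub_cheap))
```

With these stubs the cheap model is called twice and the strong model three times (one plan, two reviews), which is the cost shape the comment is arguing for: the expensive model touches the work only at the edges.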
zitterbewegung 1 days ago [-]
It is much better. Imagine if the whole Manhattan project could have been outsourced and cost you nothing. I expect that open source models will be at or almost at parity by 2030 and running on consumer devices.
HPsquared 1 days ago [-]
Market phenomena like this are a bit like the Manhattan project in that you pay for it, and make use of it, whether you want to or not. It's functionally very similar to the government doing something.
axus 1 days ago [-]
Last I checked the telcos made plenty of money in the 90s. Should Verizon be getting a cut of my Claude Pro subscription, since I use FIOS to access it?
tedggh 24 hours ago [-]
I haven’t fact checked, but according to Evans big telecom builders didn’t make a lot of money after all the capacity investment. Some actually went bankrupt or got acquired as distressed assets. Big tech was very profitable monetizing that same infrastructure.
hylaride 23 hours ago [-]
Some went bankrupt, with Worldcom being the most famous example...though that was fraud. But even those that remained had large amounts of debt that never ends as there's always CAPEX for upgrades to networks to fund (both fixed and wireless). Now a lot of the debt is also from some of them going on media ownership adventures, but even those that didn't eventually got folded into larger companies (eg Sprint).
Most of the ones that survived did so due to being able to pick up distressed assets and at values that could then be profitably monetized - a move that it would not surprise me to see repeat itself in the LLM space (we'll see).
colechristensen 1 days ago [-]
This is what everybody is TRYING to do. They built something and will do everything they can to charge outsized rent on it far past the value it provides to take revenue from anyone downstream.
The fact that telcos couldn't charge rent was a primary reason the Internet was so successful.
Remember $0.10 per text message? You bet in some alternate timeline AT&T charges $0.10 per webpage visit and we're stuck on 100kbps connections because the monopoly doesn't want to innovate.
enos_feedler 21 hours ago [-]
He denies comparing them to telecom companies, and even says so at various points in his writing. Instead he compares their usage to the usage of mobile data.
paulsutter 20 hours ago [-]
Try Mythos
post-it 1 days ago [-]
> while keeping control of the UX.
Extremely tangential, but this is my favourite upshot of AI. For decades, companies have been walling off their services and forcing us into their fuckass UIs. Now over the course of the last twelve months, suddenly everything has an MCP and I can use it through my command line chat interface.
Any company that doesn't adapt gets so hammered by people's AI-DIY web scrapers that they have no choice but to cave.
swingboy 1 days ago [-]
Does “the best machine for AI use” apply here considering these models are still server-side?
embedding-shape 1 days ago [-]
The play here seems pretty evident, if I may assume. Apple creates an interface that is generalized enough so you can easily swap models, and while Claude is preferred by Apple today, it may be any provider or even local models in the future, and the APIs the developers use remain the same, so "migration" becomes easier.
ABS 1 days ago [-]
For the on-device model, yes: it runs on the Neural Engine (at the moment), so a newer chip means faster, cheaper local inference. For the server-side path that this Claude package is about, your machine is irrelevant, since it's a network call. The same API covers both, so "best machine for AI" only bites when the session is actually local.
But we can imagine that the balance of what's on-device vs what's remote will move continuously towards the former as time, improved HW and improved local models keep progressing
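To make the "same API covers both" point concrete, here is a minimal sketch of a backend-agnostic session. This is not Apple's Foundation Models API and not the Anthropic package; every class and method name below is hypothetical, and the two backends are stand-ins that only tag their output.

```python
from typing import Protocol


class LanguageModel(Protocol):
    """Hypothetical interface that any backend has to satisfy."""

    def respond(self, prompt: str) -> str: ...


class OnDeviceModel:
    """Stand-in for a local model; here the user's hardware would matter."""

    def respond(self, prompt: str) -> str:
        return f"[local] {prompt}"


class RemoteModel:
    """Stand-in for a server-side model; a real one would make a network call."""

    def __init__(self, name: str) -> None:
        self.name = name

    def respond(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


class Session:
    """App code talks to a Session and never names a concrete backend."""

    def __init__(self, model: LanguageModel) -> None:
        self.model = model

    def ask(self, prompt: str) -> str:
        return self.model.respond(prompt)


def summarize(session: Session, text: str) -> str:
    # Identical call path whether the session is local or remote.
    return session.ask(f"Summarize: {text}")


if __name__ == "__main__":
    print(summarize(Session(OnDeviceModel()), "release notes"))
    print(summarize(Session(RemoteModel("remote-model")), "release notes"))
```

Swapping the backend is a one-line change where the session is constructed, which is the sense in which the local machine only matters when the session is actually local.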
brookst 1 days ago [-]
I would think so, as “use” doesn’t specify implementation. If you use a word processor it may be running locally or remotely.
From a user’s perspective, it doesn’t matter.
WorldMaker 1 days ago [-]
Apple's been trying to make the marketing appeal that "Private Cloud Compute" is also a hardware project. Given it seems to rely on low-level details of device Hardware Security Modules, it's maybe even at least a little bit more than just "marketing spin".
5701652400 23 hours ago [-]
looks like it is not "Private iCloud Compute" at all.
Anthropic literally says "Requests go directly from your app to the Claude API; Apple is not in the request path and does not see prompts or responses." — Apple straight up lied
Tagbert 22 hours ago [-]
No, that post is about Claude for Foundation Models. That is not the same as Apple Intelligence.
The Swift package for Claude for Foundation Models is about sending calls to Claude. That has nothing to do with Apple's models, which do use local models and models on Private Cloud Compute.
Your accusation that "Apple straight up lied" is based on misunderstanding TFA.
halJordan 1 days ago [-]
It's been clear for years now that eventually ai will be embedded at the os level. Apple even recognized it way back when they first introduced Apple Intelligence. Yes they're commoditizing llms or whatever. But this has been a user facing feature they've been iterating on for years now
amelius 1 days ago [-]
Now we only need to commoditize the hardware.
hedora 1 days ago [-]
Check out AMD’s offerings.
They’re typically a bit better on high TDP stuff, and a bit worse on low TDP. They mostly match in the middle. I have a $500 AMD NUC and a slightly older $2000 MBP. Inference throughput is within 2x.
The comparison is a little messy: AMD currently maxes out at 128GB of RAM vs Apple’s discontinued 512. Apple has nothing to rival the Steam Deck.
jimbokun 1 days ago [-]
This is what originally made Microsoft the most lucrative tech company of its day.
Android succeeded at this to an extent with phones, but Apple has been able to keep its products differentiated enough in the minds of consumers to maintain their premium pricing. So far.
Danox 1 days ago [-]
Vertical computer company operating system plus hardware under one roof.
klausa 1 days ago [-]
How is this Apple keeping control of the UX?
matwood 1 days ago [-]
The betas of the next OS's include a Siri AI chatbot, and the AI features are built into various parts of the OS. A user has no idea what model is powering any of it - Apple controls the UX.
mr_toad 1 days ago [-]
I’ll be curious to see if they make the models accessible to Shortcuts, like they do with the current models.
klausa 1 days ago [-]
I'm aware. How is this relevant to the posted article?
embedding-shape 1 days ago [-]
The article is about (from the eyes of a user) white-labeled usage of Claude models on Apple devices, this subthread is about white-labeled usage of LLMs on Apple devices, how is it not relevant?
klausa 1 days ago [-]
Because that's not what the article is about; this is about a unified API for the _app developers_ to access different kind of models.
That API has no user-facing components, and has no influence over UX of what the end-users are interacting with.
The users won't know if you used Foundation Models API or integrated with OpenAI/Anthropic/Gemini SDK directly.
embedding-shape 1 days ago [-]
> The users won't know if you used Foundation Models API or integrated with OpenAI/Anthropic/Gemini SDK directly.
That's the point! That's the whole "white-labeling" part, and what the commentator earlier is talking about. You're very close in understanding the context here!
klausa 1 days ago [-]
I’m sorry, so your position now is that “being completely invisible to the users” is “controlling the UX”?
embedding-shape 1 days ago [-]
I think you're taking the written words a bit too literally here. Read it with a more lax filter and less literal word-meaning, and I think the original comment will become a bit clearer.
klausa 1 days ago [-]
You know what, I've been a bit too snipe-y in my previous comments, and it led to the discussion devolving in unproductive ways.
I'd genuinely like to understand where you're coming from more.
I think we're all in agreement that this framework is very much about letting developers swap the models easily, and treat them as commodities. That seems pretty obvious.
I still don't see, however, how this has anything to do with controlling the UX (or the new Siri, for that matter! The new Siri doesn't use Anthropic models, and there are no extension points for it to do so; that's pretty much the whole reason why it won't be available in the EU).
Help me see your point of view!
embedding-shape 1 days ago [-]
Thanks for the patience!
The way I see it, it isn't about what is there right now, today, but about what intent it signals, or what path Apple is planning. Yes, today it's ClaudeForFoundationModels, but the FoundationModels stuff will be used to allow switching between models, probably without users noticing. Who knows what Apple will ultimately surface to users, but it tends to go in the direction of less user control.
But there are a lot of assumptions, guesses and extrapolation in that. I think you're right if you focus only on what's there right now, rather than trying to "see into the future", which harrouet basically started doing with their root comment.
geodel 1 days ago [-]
I don't know if it helps, but one way to look at it is as product branding. Apple is branding the product, so it supposedly has more value to customers, as the brand stands for quality, awareness, trust, etc. This is as opposed to the 100 little components in a computer, which may be from different brands and which Apple may switch from year to year without the user noticing. So those component makers have little power over Apple.
Same is happening to Claude software package as it would stand behind branded Apple foundation models. From pure software developer thinking this is exactly what Claude offered here so where is the issue? Issue is in larger space where Apple could take steps to block Claude out of their ecosystem if they so wish at some point and there is little Claude / Anthropic would do if Apple Foundation is the only thing that Apple consumers would know about.
klausa 1 days ago [-]
That framing would make sense to me if the thing being discussed was Apple letting _end users_ somehow access Claude models white-labeled as "Apple Foundation Model", sure? Or even letting _developers_ access Apple-hosted Claude or something?
But this is very much _not_ what this is.
Apple showed a bunch of new APIs at WWDC last week. One of these is a way for developers to interact with LLMs in a way that lets you easily swap out models (with a bunch of other niceties around it), including swapping between on-device and remote models.
This is _Anthropic_ (not Apple!) shipping their support for that framework, so you can also switch between different Anthropic models using the same APIs you'd use to swap between a local or PCC model.
I expect OpenAI will probably ship their shims in the next couple of weeks too? (You can probably vibe-code one in half an hour if you point Codex at the Anthropic one, tbh).
(Apple also doesn't use "Apple Foundation Model" anywhere in the user-facing marketing materials AFAICT, this is strictly developer facing terminology, but I could be wrong?)
My impression is that people are _wildly_ misunderstanding what this _actually_ is, and running wild with speculation/interpretation.
butlike 1 days ago [-]
I can't reply to your child comment for whatever reason, but Siri is part of the Apple Foundation Models framework. The idea is that no matter what backend the developer uses, the end user will always say "Hey Siri." This is analogous to controlling the UX. Siri is independent of whichever model the app developer uses.
klausa 1 days ago [-]
No, Siri is entirely separate from this framework.
Are you thinking about Intents? That lets Siri interact with data (and perform some actions in them) from your apps, but it is something completely different.
You can definitely expose things from your app via Intents that will end up calling an external arbitrary LLM somewhere, but it does not require using Foundation Models API whatsoever.
kcb 1 days ago [-]
It's Apple, so it's some revolutionary big brained play, and not just yet another llm sdk.
wuliwong 1 days ago [-]
I think there is an opportunity for a new hardware company to enter the market. I know this is just hypothetical, but I believe that AI is revolutionary enough that a new approach to hardware and UI/UX will enable far more value to be derived from it. I think the incumbents like Apple will stick to their familiar platforms and could get beaten out by a new competitor that is AI native to the core. Maybe? ¯\_(ツ)_/¯
dlev_pika 1 days ago [-]
Apple’s play was a masterclass - unsure how deliberate it was, or how much of a choice they actually had, but it’s turning out pretty well IMO.
Now if they can further reinforce their angle on Privacy, they might continue to be what they are (or more)
post-it 1 days ago [-]
> a Swift package that makes Claude available as a server-side language model in Apple's Foundation Models framework
Ahh I was hoping for the opposite: all of the existing features of Claude Code but somehow running locally on my laptop's neural engine. A pipe dream on an M2 with 8 GB of RAM, but I had a flicker of hope there.
inickt 1 days ago [-]
Check out this WWDC session. Obviously not going to compete with the frontier models (and I think 8GB is too small anyways), but Apple did demo MLX + OpenCode.
You can use OpenCode or Pi with SSD streaming so it technically will have all the features, just unbearably slow.
FuriouslyAdrift 1 days ago [-]
I've found most of the frontier coding models require somewhere between 300GB and 1TB to run with full capabilities.
godzillabrennus 1 days ago [-]
If only we could buy 1TB of unified memory in a Mac for $1k-$2k in total hardware costs. Apple would basically be able to extinguish the entirety of the market cap for Nvidia, OpenAI, Anthropic, and others all at once.
In 10 years, I hope my MacBook Pro can run today's frontier models and has 1TB of unified Memory.
shadowpho 1 days ago [-]
Why can’t Apple launch a $50k product for $1k? Everyone would buy it!
tempoponet 22 hours ago [-]
To go further down this pipe dream - Anthropic / OpenAI would buy them all and still price out the consumer. There's no end-run in this scenario.
connicpu 1 days ago [-]
The Nvidia GB300 DGX Station, which isn't even going to hit 1TB total memory, is expected to launch at almost $100k. Bit of a pipe dream with memory prices where they're at.
FuriouslyAdrift 22 hours ago [-]
There are multiple server systems available right around the $100k range that have 512GB of GPU RAM right now (4x AMD Instinct MI300A)
GIGABYTE G383-R80-AAP1 for example
jayd16 1 days ago [-]
They want you to buy four 256GB Studios and link them with ThunderBolt.
Danox 1 days ago [-]
Yes, particularly if that memory is designed and engineered by Apple in house like Apple Silicon in house and manufactured by TSMC on shore somewhere in the United States.
manoDev 1 days ago [-]
I’m bullish on Apple because of that. Tech waves always oscillate between mainframe/thin-client models at first, then commodity hardware catches up. Apple is well positioned to deliver that with the M series; all it takes is for the current AI bubble to pop a bit and memory costs to go down.
dboreham 1 days ago [-]
The people who train the frontier models want to recover their costs, so they're not going to let you do that.
bigyabai 1 days ago [-]
> Apple would basically be able to extinguish the entirety of the market cap for Nvidia
I don't think you understand why people buy Nvidia hardware if you're beating the "just add more dual channel DDR, bro" drum. Apple wouldn't even be able to extinguish AMD with a product like that, it's all slow memory being fed into a raster-first GPU architecture.
pstuart 1 days ago [-]
The work on LLM in a Flash will probably help, and Apple's NVMe architecture, which is well suited to maximizing throughput, could allow their devices to work better on larger models than other vendors' devices.
ABS 1 days ago [-]
[flagged]
jubilanti 1 days ago [-]
> all of the existing features of Claude Code but somehow running locally on my laptop's neural engine
You can use environment variables to have claude code query literally any endpoint you choose as long as it has a compatible API.
5701652400 23 hours ago [-]
I would not mind if the cloud was actually the user's private iCloud. Users pay for it, and it runs on Apple servers next to where users already store their iPhotos. That would be a really elegant solution.
..but instead we get Claude, hosted who-knows-where. Maybe in xAI datacenters? Maybe in Amazon somewhere? Who knows..
While I'm happy with Apple introducing this abstraction, my main concern was with local models.
I'd love to use Gemma4 as an example, but think of the user: if 10 apps each use the same model and each downloads it, the phone will be bloated.
I still don't understand whether Apple provides a way for multiple apps to use the same on-device model (without tricky namespaces and permissions).
I didn't see anything suggesting that's the case.
scosman 1 days ago [-]
I think that's what they are trying to avoid. If you need on-device intelligence, their pitch was "The model the device already has is best", and if you need something more specific an adapter (aka, a fine-tune/lora) is best.
They were wrong when their on-device model was way behind. They still might be right in the long term.
While multiple apps I use might need Gemma 4 E4B, I use dozens of apps and app devs can choose from hundreds of models. A shared cache might reduce size a little when there's overlap, but the core problem still exists. If each app chooses its own model, disk usage and memory swapping explode.
It's probably better for device manufacturers to bake in a default. I'm not proposing they limit you from using others, but one shared default might be the best developer/user experience for 99% of apps.
- Being warm in memory is the single biggest perf speedup you can get, and a default is much more likely to be warm.
- "Best model" is usually "best model for this device" given both RAM and compute. A developer can't test every device but Apple can/will.
- Each model needs to be optimized for the hardware (what's running on ANE, what's running on Metal, what's running on CPU). The default gets optimized.
- If you need a custom model, a LoRA is probably best (30MB, benefits from all of the above)
You could say the default should be swappable, but that's more a Linux ideal than an Apple one so I doubt we ever see that. Plus there are real downsides: intentional or not, prompts end up optimized to the model they are developed for, so swapping the default system model would degrade every app.
scotty79 1 days ago [-]
But models aren't universally best, especially small ones. For text Gemma is great. For vision qwen3.6 is amazing.
scosman 18 hours ago [-]
I think the point is: if every app chooses the best model for their use case, my phone is hosed (disk, memory swap, memory). A good-enough default might be better for the user than each app having the best possible.
jtfrench 1 days ago [-]
That's a great opportunity for Apple to provide a universal unique model ID protocol and some shared storage space to allow devs to register models.
I see an ID-based ability suggesting `modelId`, but in the current docs I cannot find any context for it. The other limit is that it suggests Swift Packages, but I'm not seeing any model management hints similar to Docker/Ollama/etc. where:
- An application can ask for a specific model and use it if available. If not, it can ask to download it (or try some fallback / alternative).
- The user can manage models. So as a user I can clean out unused models (and non-techies get something similar to offloading apps that have been unused for some period of time).
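For what it's worth, the two behaviours in that list can be sketched in a few lines. This is a toy Python sketch, not anything Apple ships or documents: `ModelRegistry`, `request`, `offload_unused`, and the model IDs are all made-up names, and the "download" is just a dictionary insert standing in for a real fetch.

```python
from dataclasses import dataclass, field


@dataclass
class ModelRegistry:
    """Hypothetical device-wide registry keyed by a universal model ID."""
    installed: dict = field(default_factory=dict)  # model_id -> tick of last use
    clock: int = 0  # logical clock, advanced on every request

    def request(self, model_id, fallbacks=(), allow_download=False):
        """Return the ID of the model the app should use, or None."""
        self.clock += 1
        # Prefer the requested model, then any fallback that is already installed.
        for candidate in (model_id, *fallbacks):
            if candidate in self.installed:
                self.installed[candidate] = self.clock  # mark as recently used
                return candidate
        if allow_download:
            self.installed[model_id] = self.clock  # stand-in for a real download
            return model_id
        return None

    def offload_unused(self, max_idle):
        """Remove models idle for more than max_idle ticks and return their IDs."""
        stale = [m for m, used in self.installed.items()
                 if self.clock - used > max_idle]
        for m in stale:
            del self.installed[m]
        return stale
```

Because every app would go through the same registry, ten apps asking for the same ID share one copy on disk, and the offloading step mirrors how unused apps get offloaded today.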
klausa 1 days ago [-]
The apps can use the system-provided on-device model using the same framework and APIs, but there are no affordances to deduplicate custom models between apps.
satvikpendem 1 days ago [-]
That is exactly what foundation models are, yes. Same in Android with AICore which uses Gemma underneath, apps can query the LLM and receive responses back rather than bundling in their own model.
trvz 1 days ago [-]
Do you guys not have phones (with at least 1TB of storage)?
rock_artist 1 days ago [-]
Who’s “you guys”? A developer from the Bay Area? A student with a MacBook Neo? Or John Appleseed, who bought a basic iPhone 17e?
I have a Mac with 4TB of storage but it’s still annoying when every new AI app I try installs its own virtual environment with a fresh copy of Python, PyTorch, other duplicate libraries, and then models on top of that.
DrScientist 1 days ago [-]
As an occasional python user I'm always amazed and frustrated that it seems that the only way to be able to use/build anything is to create a whole separate environment.
And now that everybody does this, I guess the incentive to stop breaking stuff reduces even further.
Might as well have static binaries.
simondotau 1 days ago [-]
The meme phrase “it’s fractally wrong” applies to the entire python ecosystem, IMHO. Virtual environments are just another layer of this fractal wrongness in the layer cake of ecosystem awfulness.
It’s a nice language though.
kstrauser 1 days ago [-]
That’s exactly how NPM works, and how Cargo works by default. You can make npm install stuff globally, but that’s not recommended except for things like CLI tooling. Cargo builds every project in its own separate target/ directory unless you manually configure it to share that dir between builds. In both cases, the default is to isolate your current project from everything else on the system.
The main difference is that Python used to make you know that the virtualenv existed. Now `uv run` and `poetry run` abstract that away so you don’t have to interact with it if you don’t want to.
DrScientist 6 hours ago [-]
I understand it's a meme that operates well outside Python. Python seems particularly bad because many packages have system dependencies in addition to package-to-package dependencies.
I'm just speculating that it's a self-reinforcing pattern: compatibility problems lead to isolated builds, which reduce people's concern for backwards compatibility, which makes isolated builds ever more important.
Maybe it's fine - a trade off that allows greater velocity of development, it just seems attention to backwards compatibility is becoming a thing of the past.
whstl 1 days ago [-]
I have a couple small apps that have a (non-LLM) model, and originally the models and code were in PyTorch, built by Python devs.
The original plan was to ship Python. However I found out I can migrate them to CoreML, and now it's a model file + Swift code. I got some massive performance improvements as well.
Of course, this doesn't work at all for non-Mac environments, but it was nice to be able to do it. (Also doesn't solve the duplicate large models problem)
hedora 1 days ago [-]
It’d be nice if there was a standard like ~/.local/llm/hugging-face-name.gguf or something.
Python heaviness is a more fundamental problem.
ac29 1 days ago [-]
If you use uv, python apps use a shared cache which helps a lot.
fragmede 1 days ago [-]
No? iPhones don't come standard with that much storage.
Ok but don't expect Anthropic to help with local models, that'll be something apple rolls out themselves if at all
taneq 1 days ago [-]
Sounds ripe for block-level deduplication. :D Or an API that lets you request a model and handles caching.
GeekyBear 1 days ago [-]
This isn't Claude specific. Developers can also write apps that call Google's server based Gemini models.
> At WWDC, Apple announced that it's opening its Foundation Models framework to third-party cloud model providers. Starting with iOS 27, macOS 27, iPadOS 27, visionOS 27 and watchOS 27, model providers can implement the new public LanguageModel protocol to provide a common interface for model inference. We've made Gemini models available to the Foundation Models framework through the Firebase Apple SDK.
This provides a fully native development experience — cloud-hosted Gemini models can plug directly into the Foundation Models framework using the same API. That means the on-device Apple model and cloud-hosted Gemini models sit behind a shared API surface, so you can easily swap between local and cloud inference to fit your use case.
The important part is Apple rebranding “OpenAI-compatible API” to “language model protocol” and I think we should all rally around this immediately before we’re cursed with that awful tongue twister.
Is this Apple encouraging developers to go through their api abstraction layer to use LLMs so that when they launch their own (which I think we’ve heard they’ve been spending lots of money on training and might be somehow involved with Siri or current Apple AI?) that they can easily help devs make a seamless transition? Or is it just a developer nicety or something else?
tarcon 1 days ago [-]
Apple has some clever mechanics to protect user data. I had to work with App tracking stuff lately and their approach to keeping user details private with anonymized cohorts (SKAN, Differential Privacy) before reporting tracking events to third party platforms was surprisingly well thought out. There is value in having them in your loop if you care about privacy.
HDThoreaun 1 days ago [-]
My read of the ATT stuff is basically that it forced all the apps to use meta ad tracking because they’re the only ones who figured out how to serve relevant ads despite it.
drivebyhooting 1 days ago [-]
Figured out = do the forbidden PII join anyway with their partners in “clean rooms”.
HDThoreaun 1 days ago [-]
Right, the lesson here is that if you make rules with exploitable loopholes you're probably only going to end up strengthening malicious actors who are willing to exploit loopholes.
willis936 1 days ago [-]
It would be cool if they offered some kind of prompt sanitation option.
klausa 1 days ago [-]
This is support for a new framework that ships with reality/mac/iPad/watch/tv/iOS 27 (and that they've promised to open-source later in the year, so presumably you'll also be able to lean on this if you ship Swift on your backend).
The framework's whole deal is that it lets you use the same API to target either the device built-in models, the Apple-hosted online models (Private Cloud Compute), or write your own shims to call out to arbitrarily hosted online models.
You can then dynamically route your calls to a different kind of model/provider, using system APIs, without having to write your own abstraction layer over "I want to use local model for this, but I want to use Claude for that", or having to integrate your own API integration with Anthropic/OpenAI APIs.
It abstracts things like tool calling in one place; and has a bunch of other niceties/oddities (it keeps the same "transcript" going, even if you dynamically switch providers/models during a session) and some other things.
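To make the shape of that concrete, here is a rough analogy in Python. It is not Apple's Swift API: `LanguageModel`, `LocalModel`, `RemoteModel`, and `Session` are invented names, and the "models" just echo the prompt. The point is only that the session owns the transcript, so switching providers mid-session does not reset it.

```python
from typing import Protocol


class LanguageModel(Protocol):
    """Anything with a respond() method can sit behind the session."""
    def respond(self, prompt: str) -> str: ...


class LocalModel:
    # Stand-in for an on-device model; it only tags and echoes the prompt.
    def respond(self, prompt: str) -> str:
        return f"[local] {prompt}"


class RemoteModel:
    # Stand-in for a shim that would call a hosted model over the network.
    def __init__(self, name: str):
        self.name = name

    def respond(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


class Session:
    def __init__(self, model: LanguageModel):
        self.model = model
        self.transcript = []  # kept across model switches

    def ask(self, prompt: str, model: LanguageModel = None) -> str:
        if model is not None:
            self.model = model  # route this and later calls to another provider
        reply = self.model.respond(prompt)
        self.transcript.append((prompt, reply))
        return reply
```

The app code only ever talks to `Session.ask`; which provider answers is a per-call decision rather than a separate integration.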
claud_ia 1 days ago [-]
[dead]
pprotas 1 days ago [-]
The cynic (or realist?) in me thinks this abstraction layer is Apple's way of making sure that users give their own Apple Intelligence credit for the underlying LLM functionality, even if another company is actually providing the LLM.
_the_inflator 1 days ago [-]
Assembled in Cupertino once more. ;)
coldtea 1 days ago [-]
Yeah, Apple just designs and writes the SoC, CPU, graphics unit, neural unit, compiler (Swift), OS, graphics layer, 3D API, core libs from graphics to persistence, filesystem, broadband chip, and a few more things besides...
saagarjha 1 days ago [-]
Notably good models are not on that list.
Danox 1 days ago [-]
AI models in the end are just commodities. The computer-using public is not going to pay for them directly; in short, they’re not gonna bail out OpenAI, Meta, Google, Microsoft, Anthropic.
geden 1 days ago [-]
Neither are other capex heavy items like chip fabs.
coldtea 1 days ago [-]
Yeah, they also don't mine their own steel and copper. Such mere assemblers!
coldtea 1 days ago [-]
Yeah, that totally makes them merely assemblers then /s
bigyabai 1 days ago [-]
Apple Silicon is broadly unused for LLM training. Arguably, Apple isn't even helping to assemble real-world AI models, just the thin client hardware.
Gareth321 1 days ago [-]
This is clearly because they plan to monetise AI in the future, and they don't want competition.
Danox 1 days ago [-]
They have competition, Microsoft and Nvidia, Google and Huawei long term…
NorwegianDude 1 days ago [-]
A dark, but not totally unfair take: It makes it easier for Apple to take payment for the models others provide, and even allows Apple, if they want to, to use the data to build a dataset for training their own models based on how users use third party models. It's only on Apple devices this API is used, so they split up the market by not letting developers use the same system if they want things to work on iOS, locking users even more in.
oefrha 1 days ago [-]
Call it Intelligence Store and charge… wait for it… 30%.
cush 1 days ago [-]
This is genuinely the only way Apple will make it out of the intelligence era alive and not become the next IBM
aesthesia 1 days ago [-]
From the linked docs page:
> Requests go directly from your app to the Claude API; Apple is not in the request path and does not see prompts or responses. Usage is billed to your Anthropic account at standard API pricing. Your app decides when to use Claude and when to use Apple's on-device model: pass whichever model you want to each session.
thombles 1 days ago [-]
There are already on-device models that you can use through this framework as a developer. Claude would just be an additional one.
FinnKuhn 1 days ago [-]
Maybe they plan to have the providers pay for being the default model? So basically, what Google is doing right now for search engines. The difference however is that Google is making money with additional search requests while AIs are (as of now) losing money with additional requests. I don't see the business case for them yet though.
mathisfun123 1 days ago [-]
> which I think we’ve heard they’ve been spending lots of money on training and might be somehow involved with Siri or current Apple AI
Lol bro this is literally it this is the model they've been training (was Apple Foundation model not a big enough hint?)
mcintyre1994 1 days ago [-]
I think this is just Apple planning for their on-device models getting better, which makes sense given they have access to Gemini now. If developers use this for all their code calling an external LLM, then as Apple's model becomes more capable and covers more use cases it'll be easy to switch to it at individual call sites. That'll give apps better UX and save developers money on a bill that Apple doesn't get a cut of.
embedding-shape 1 days ago [-]
> That'll give apps better UX and save developers money on a bill that Apple doesn't get a cut of.
In other words, it's unlikely to happen as there is no money in it. Better for Apple to create some new subscription "AI" and "AI-lite" plans people can subscribe to, and since Apple is a company and we all know what those care about, it's unlikely to become a utopia of local models running on your phone.
criddell 1 days ago [-]
How does using Gemini lead to better on-device models?
halJordan 1 days ago [-]
Apple is distilling models from Gemini
Danox 1 days ago [-]
Gemini is just a stopgap like using Intel processors or Qualcomm modems.
Danox 1 days ago [-]
UX is just another word for ecosystem building, which is what Apple does best in comparison to their competition and also doesn’t hurt to do hardware to go along with it. Microsoft and Nvidia aren’t teaming up for nothing.
VadimPR 1 days ago [-]
How can you practically use this in software if you're to deploy this to users? Asking a user to create and enter their own API key is a bar too high for good UX.
hajile 1 days ago [-]
The even bigger hurdle is selling token based pricing to normal (non-dev) users.
"You pay an indeterminate amount of money to ask a question and you might not even get the response you want without spending even more money" doesn't appeal to most people who aren't gamblers, and explaining how "thank you" at the end of a long exchange can be expensive due to context is an even harder thing for an average person to swallow.
Token cost going up/down like a yo-yo also doesn't help. Normal users NEED fixed costs and don't want to expend energy constantly keeping up with the AI meta. "My subscription lasted much longer last month" isn't a winning problem either.
I think Apple is correct that Local LLM for most things is the future.
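To put rough numbers on the "thank you" point: a back-of-the-envelope Python sketch, assuming every request resends the whole conversation as input and ignoring prompt caching and output-token charges. The $3 per million input tokens is a made-up price for illustration, not any provider's actual rate, and both function names are hypothetical.

```python
def request_cost(history_tokens, new_tokens, usd_per_million_input):
    """Input cost of one request that resends the full history plus the new message."""
    return (history_tokens + new_tokens) * usd_per_million_input / 1_000_000


def input_tokens_billed(turn_sizes):
    """Total input tokens billed over a conversation where each turn resends everything so far."""
    billed, history = 0, 0
    for size in turn_sizes:
        history += size    # the conversation grows by this turn's tokens
        billed += history  # and the whole thing is sent again
    return billed


# A 3-token "thank you" after a 50,000-token exchange, at the assumed price:
thank_you = request_cost(50_000, 3, usd_per_million_input=3.0)  # about $0.15
```

Under those assumptions the billed input grows roughly quadratically with the number of turns, which is the part that is hard to explain to someone who just sees "I typed two words".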
nate 1 days ago [-]
Ugh. It really is. I have allihat.com which is the only safari extension (i think still) that talks to claude. And it's well sought for. But you as a user have to enter a friggin claude api key. :( And I still don't grok their TOS around this. Like you can still type: ```setup-token Set up a long-lived authentication token (requires Claude subscription)``` but this seems like a trap? :) Who's using this? Doesn't this like insta break their TOS if you use that anywhere?
Right now for allihat.com I just let people use the Apple model locally if you don't feel like using the claude key. And my conversions to paying user shot up like 3x! But it really isn't a replacement obviously to claude. I was hoping Apple would make proxying to Claude some kind of thing they do for me so I also don't have to proxy to my own server just to try and manage API to Claude usage.
daralthus 1 days ago [-]
ppl pay for this?
Maxious 1 days ago [-]
> For production, route requests through your own back end with .proxied
The same way you did it before — by proxying the requests to your backend.
cush 1 days ago [-]
Users don’t give an API key. The docs show how to set up your backend proxy.
otter0 1 days ago [-]
First Microsoft broke kayfabe by putting "Copilot is for entertainment purposes only" in the Copilot terms of use and putting warnings in Copilot for Excel: "avoid using COPILOT for ... any task requiring accuracy or reproducibility ... Tasks with legal, regulatory or compliance implications".
Then Apple quietly refuses to participate by not investing tens or hundreds of billions in creating a competing LLM. Sure, they resell Claude for the marks or utilize Gemini to placate the gullible fools but they know what's up.
> Requests go directly from your app to the Claude API; Apple is not in the request path and does not see prompts or responses.
I know this is from a developer perspective. But as a consumer this is just funny.
saretup 1 days ago [-]
Why?
zkmon 1 days ago [-]
A coding agent is itself an imposed layer. Now they are adding one more layer? Many times I think of a coding agent as the vendor supervisor from the body shops of the 90's who promises the customer everything under the sky and thrashes the poor contractor to deliver. Coding agents consume 10x more tokens, just like the gap between what body shops charged their customers and what they paid the contractors. As a simple test, the same task that makes the model run out of context length when used via a coding agent runs fine when prompted directly.
Layers are luxury and remove control and transparency.
klausa 1 days ago [-]
You wouldn't use this when building a coding agent.
hedora 1 days ago [-]
How else will I run my coding agent on your Mac without having you download a second LLM and double your memory usage?
_pdp_ 1 days ago [-]
From an app developer's standpoint, why would anyone ship Claude keys like that ... or am I missing something? From a consumer's standpoint, I guess they can use their own keys, but that is not very user friendly, as you can imagine.
nl 1 days ago [-]
it says:
Proxy (production)
For production, route requests through your own back end with .proxied. The relay at baseURL adds the Claude API credential server-side, so the app ships no key. The headers you provide are sent on every request so your proxy can authorize the caller.
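As a rough illustration of what such a relay does on each request, here is a Python sketch of just the header handling. The `x-app-token` header, the token set, and the function name are hypothetical choices for this example; `x-api-key` and `anthropic-version` are the headers the Claude API expects. A real proxy would also forward the body, rate-limit, and log.

```python
import hmac


def build_upstream_headers(client_headers, valid_app_tokens, api_key):
    """Authorize the caller, then return the headers to send upstream.

    The app never holds api_key; it is added here, server-side.
    """
    token = client_headers.get("x-app-token", "")  # hypothetical app-issued token
    # Constant-time comparison so the check does not leak token contents via timing.
    authorized = any(
        hmac.compare_digest(token.encode(), valid.encode())
        for valid in valid_app_tokens
    )
    if not authorized:
        raise PermissionError("caller not authorized")
    # Build the upstream headers from scratch so nothing from the client leaks through.
    return {
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    }
```

Building the upstream headers from scratch, rather than copying the client's, means the app's own token never reaches the model provider and the provider key never reaches the app.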
This seems smart. Apple, despite not really leading in AI themselves, are right on the hot path of where developers are going to yolo slop into the ecosystem. Makes a tonne of sense to define a nice clean API that places like Anthropic can build on top of and expose to developers.
It's also smart for them to make sure the billing is going direct from Anthropic to the developer. The initial thought is "That means Apple's not taking a cut", but from the other side of it, developers who use this API are going to have to expose that cost to customers somehow, and that translates to subscription/InAppPurchase etc., on top of which Apple will get its 30%.
mark_l_watson 1 days ago [-]
I think Apple has a fairly good plan for supplying a common API and default on device models.
What confuses me about this article is: The code examples (Python, Ruby, etc.) look to me like the original Anthropic APIs, not Apple’s abstraction. Did I miss something?
pgt 1 days ago [-]
I’m surprised to see the model names hardcoded as an enum (e.g. `.sonnet4_6`), instead of a string with model discovery so that the user can select their preferred model without having to get a new app version through the App Store to support newer models.
klausa 1 days ago [-]
>Model identifiers are values of ClaudeModel. Use a compiled-in constant, or construct one with explicit capabilities for an ID that isn't compiled in yet (see Capabilities):
Special emphasis on the "isn't compiled in yet" and "or construct one" bit.
theopsimist 1 days ago [-]
Is this included in the free AI tier for small developers? Big news if so
21-DOT-DEV 1 days ago [-]
> Usage is billed to your Anthropic account at standard API pricing.
While expected, it’s still a bummer.
isoprophlex 1 days ago [-]
The pricing squeezes will continue until token spend improves!
gregman1 1 days ago [-]
So actually the most successful AI was OpenRouter Intelligence? Pronounced as OÏ.
ryanshrott 1 days ago [-]
Shared daemon is the only way this makes sense on-device. A 3B model at 4-bit is roughly 2GB - three apps loading their own copies would eat an 8GB phone.
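The arithmetic behind that estimate, as a small Python sketch. It counts raw weights only; the gap between the 1.5 GB it computes and the "roughly 2GB" above would be runtime overhead such as the KV cache, which this sketch does not try to model. `weight_bytes` is just a helper name for this example.

```python
def weight_bytes(params, bits_per_weight):
    """Bytes needed for the weights alone (no KV cache, no runtime overhead)."""
    return params * bits_per_weight // 8


GB = 1_000_000_000

one_copy = weight_bytes(3_000_000_000, 4)  # 3B params at 4 bits each = 1.5 GB
three_copies = 3 * one_copy                # 4.5 GB if three apps each load a copy
```

Even before overhead, three private copies take more than half of an 8GB phone's RAM, which is the case for one shared, already-loaded model.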
HelloUsername 1 days ago [-]
Does "Apple Intelligence" need to be Turned On for this as well?
cush 1 days ago [-]
Since Claude is technically a subscription, Apple will slowly weasel their way into skimming 30% of the token spend
hmokiguess 1 days ago [-]
How does it work now though? There is a Claude app on iOS
londons_explore 1 days ago [-]
> A key bundled into an app is extractable from the shipping binary, and anyone who extracts it can make requests billed to your account. Use .apiKey for development only, and switch to a proxy before release.
I don't like this model. Then all the user data is visible to the proxy.
Far better would be some kind of micro payment architecture where a wallet is on the user's device and coins are attached to each request.
We just need to live in the alternate universe where micro payments succeeded.
me551ah 1 days ago [-]
So where does the api key reside? You can’t ship it on the iOS client since anyone can read and abuse it
Misleading title. This is about Claude for Apple Foundation Models, not about Apple Foundation Models
simianwords 1 days ago [-]
Serious question: this looks like a thin library on an API. Why is it a big deal?
hedora 1 days ago [-]
Shared daemon (as others pointed out), and, later shared revenue, probably with Apple receiving payments to ship ad-laden, “editorialized” models. Hopefully, it’ll go the other way, and Apple will subsidize high quality model training.
xducn1 1 days ago [-]
[flagged]
64lamei 1 days ago [-]
[flagged]
mlpicker 1 days ago [-]
What I'm curious about is whether this is actually on-device. Apple's framework caps local models around 3B params last I looked, and Claude is way bigger than that. So either there's some hybrid setup I haven't seen documented, or this is mostly a Claude SDK in FM clothing. Anyone tried it on a plane?
brookst 1 days ago [-]
Read the linked article? It is absolutely a cloud service. Neither Apple nor Anthropic is suggesting otherwise
ABS 1 days ago [-]
it's cloud, the doc is explicit that requests go straight to api.anthropic.com with Apple not in the way.
So Claude via FM dies offline while Apple's on-device SystemLanguageModel (the ~3B one) keeps working. It isn't really a hybrid: the framework just has both implement the same LanguageModel protocol, so "local 3B" and "remote frontier model" become a one-argument swap when you create a LanguageModelSession.
IMHO what's worth internalising is that the two share an API but nothing else: the on-device path runs on Apple's Neural Engine and costs battery (you can watch ANE power ramp while it works) while the cloud path costs API credits/tokens and does zero local compute. Same code, opposite cost model.
swordlucky666 1 days ago [-]
[flagged]
tonyoconnell 1 days ago [-]
What it is
Apple's Foundation Models framework (shipping in iOS 27 / macOS 27 this fall) is the standard Swift API for on-device AI — the same API Apple uses for their own small model. This package makes Claude plug into that same API as a drop-in swap.
// Apple's on-device model
let session = LanguageModelSession(model: SystemLanguageModel.default)
// Claude — same API, just different model constructor
let session = LanguageModelSession(model: ClaudeLanguageModel(name: .sonnet4_6, auth: auth))
One API, two tiers. You write your app once against the Foundation Models protocol. On-device model handles fast/free/private tasks; Claude handles heavy reasoning, long context, or capability gaps — you swap the model, not your code.
You don't call the Anthropic API directly. Apple's framework handles streaming, tool calling, and structured output (@Generable) — you just get Claude's capability through it.
stackedinserter 1 days ago [-]
I'm not sure if I want to touch anything Anthropic anymore.
hedora 1 days ago [-]
OpenAI is worse from a public policy standpoint, and apparently Fable was yanked at Amazon’s request.
Enough is enough. I’m seriously evaluating open models this week.
hit8run 1 days ago [-]
Why would I want a nerfed model?
insumanth 1 days ago [-]
This was expected.
Apple will carefully choose what & how people can use AI in their ecosystem and will make sure of it. I hope "Apple Foundation Models" Eco-system grows with support from major model providers.
China will spend all of the money required to catch up, Google and OpenAI will both spend money to catch up as well. NVidia and others will not allow a frontier lab to become the AI bottleneck.
Isn’t this the problem inference (training) a model is designed to solve :)))
And it's a hard problem.
What's an easier form of training is being able to see the intermediate results and train to imitate them.
I think Evans is completely wrong. There are only 2 truly frontier models. (at least for now). And Anthropic seems to be leaving OpenAI behind so there might be only 1 in the near future. (which is scary/dangerous)
I wish there was a case where I found Evans to be wrong. As far as my memory serves me, I have failed to record a single one.
I disagree that Amazon, Meta, Microsoft, and Google are "well" behind. If anything the frontier model advantage seems to be at best 6 - 9 months. And the Chinese models are all doing well.
One of Steve Jobs's lines: "It is a feature, not a product." Even if Apple were a generation or a year behind the frontier models, the advantage of being the default is enough to hold a lot of its users.
To put it simply, even if OpenAI or Anthropic were better, there is zero chance they would topple Apple in hardware sales, users or ecosystem. On the other hand, even if Apple's AI were 6 - 9 months or a generation behind, most users would settle for it, and that would damage OpenAI / Anthropic.
Do you mean Google's AI with Apple wrappers? Apple's in-house AI is further behind Google, and very far from the frontier according to your ranking. IMO, Google is on the frontier - I recall Altman calling for an OpenAI all-hands-on-deck when Gemini was released because of how good it was compared to ChatGPT. I also suspect Google has the lowest operating expenses due to scale, experience and luck/planning (TPUs), there will come a time when AI investments will slow down, and the cost of revenue will become more important.
If anything, Apple should notice that Anthropic has a really good marketing team, and it would be no shame if they picked up a trick or two from them.
employees will always suffer.
Anthropic and OpenAI are far behind state of the art for the entire curve except the “extremely expensive for barely measurable improvements” part.
GLM is probably the third most expensive frontier model (benchmarks and reviews will say for sure), and is apparently ~Opus 4.6 for 10% the inference cost.
The last I checked, qwen was still owning the 24-32GiB RAM range (it runs reasonably without a GPU!) and somewhere around 3.5-4 generation models.
Also, even anthropic says Mythos ~= ChatGPT 5.5, so it’s unlikely either one is leaving the other behind. The big problem they both have is they asked for the government to gate keep model releases and use cases, and their wish was granted.
That’s knocked them back 6 months already. Anthropic’s only frontier offering has been taken down.
It's like the difference between talking to the two smartest kids in a class, where one really belongs a grade higher and the other hasn't yet learned to ask the questions that encourage it to dig in that little bit more for the additional multi-order effects.
I didn’t use it on big enough tasks to notice any improvement.
I had been hitting plan limits pretty regularly, but fixed it by changing my workflow. That also increased the success rate of claude by an order of magnitude.
Spend for compute seems like it needs to increase to get the next iterations of models, and even if they IPO the money might run out before they can solidify their revenue streams.
All while Google just needs to survive long enough with their good-enough models and do it without really putting themselves in any existential financial risk.
And ideally the chinese models are also still there keeping everyone honest.
The true dystopic worst case is a Google monopoly on cutting edge AI.
Truly fascinating ecosystem and community in general, as experiences differ so wildly. Anthropic's models seem far behind OpenAI to me, especially when you get into "Pro" territory, and there doesn't seem to be any worthy competition to Pro Mode available at all.
And this is said by someone who uses both platforms and spends a lot of the day interacting with agents and LLMs in various ways. The interesting part is that you probably do too, and what you share probably lines up with what you experience! Yet we come away with basically opposite takeaways :) I don't think either of us is wrong either, somehow.
I've noticed that depending on how you talk to it, you get wildly different outputs. This seems to happen less with Opus: it mostly understands what I want. GPT is often a bit too literal.
Just my two cents.
Yeah, exact prompting matters a lot, seemingly more than people think. There are definitely tradeoffs in how literally the models take the prompts. On one hand, it's useful for the models to ignore their own instincts when you know better, so they don't go chasing geese randomly. On the other hand, it's sometimes useful when they self-direct, for example when you misworded something and it's obvious from the context that you meant something different. They're basically good at different things.
Really agree: models aren't equal, and they aren't as interchangeable as people seem to think unless you adjust how you prompt them.
At which point it’s fair to reject the commoditization label.
Also missing from these discussions are e.g. Qwen, which is at least as good as one back from OpenAI or Anthropic’s frontiers.
They're missing in the discussion because the ones you can run locally aren't actually "one step away from other closed-source labs" in practice when you use them. They might benchmark as such, but they're sadly far away from measuring up to those scores except for very specific use cases, even when you have, say, 96GB of VRAM available to run the bigger models that most (at home) consumers won't be able to run.
And they probably won’t be for at least another decade. Comparing like with like, flagship model running on the best hardware it can run on, Qwen is close.
I wish so badly this was true, but sadly today it just isn't.
But what I think a lot of people miss is that the market for the truly bleeding edge (developing bio-tech, building the most sophisticated software stacks (probably with a tilt towards simulation, GPU kernel optimization, etc)) is not the whole market.
There's a plethora of use-cases for models that are not on the bleeding edge. If I can solve my relatively simple problems with an off-the-shelf model for a minuscule fraction of the cost of the frontier, I'm going to.
It's somewhat of a myth that you need the most advanced, expensive model for software development.
Well, in domains like SWE where Anthropic's putting in the effort. I don't think they'll make the claims that OpenAI makes about how their models are pushing the life sciences forward, for example.
Some of the harnesses even let you run a local model for most things, and only pay for the latest frontier models when needed, which cuts down cost drastically.
Fable might well be a better model but it’s too expensive for everyday AI use. Definitely if we’re talking about the kind of stuff you’re going to want to do on your phone. Even for coding, I’m not going to reach for Fable (well, when I can…) for 95% of the work I do.
I don’t believe a mature AI industry is going to have a one size fits all, single winner.
Most of the ones that survived did so due to being able to pick up distressed assets and at values that could then be profitably monetized - a move that it would not surprise me to see repeat itself in the LLM space (we'll see).
The fact that telcos couldn't charge rent was a primary reason the Internet was so successful.
Remember $0.10 per text message? You bet in some alternate timeline AT&T charges $0.10 per webpage visit and we're stuck on 100kbps connections because the monopoly doesn't want to innovate.
Extremely tangential, but this is my favourite upshot of AI. For decades, companies have been walling off their services and forcing us into their fuckass UIs. Now over the course of the last twelve months, suddenly everything has an MCP and I can use it through my command line chat interface.
Any company that doesn't adapt gets so hammered by people's AI-DIY web scrapers that they have no choice but to cave.
But we can imagine that the balance of what's on-device vs what's remote will move continuously towards the former as time, improved HW and improved local models keep progressing
From a user’s perspective, it doesn’t matter.
Anthropic literally says "Requests go directly from your app to the Claude API; Apple is not in the request path and does not see prompts or responses." — Apple straight up lied
The Swift package for Claude for Foundation Models is about sending calls to Claude. That has nothing to do with Apple's models, which do use local models and models on Private Cloud Compute.
Your accusation that "Apple straight up lied" is based on misunderstanding TFA.
They’re typically a bit better on high TDP stuff, and a bit worse on low TDP. They mostly match in the middle. I have a $500 AMD NUC and a slightly older $2000 MBP. Inference throughput is within 2x.
The comparison is a little messy: AMD currently maxes out at 128GB of RAM vs Apple’s discontinued 512. Apple has nothing to rival the Steam Deck.
Android succeeded at this to an extent with phones, but Apple has been able to keep its products differentiated enough in the minds of consumers to maintain their premium pricing. So far.
That API has no user-facing components, and has no influence over UX of what the end-users are interacting with.
The users won't know if you used Foundation Models API or integrated with OpenAI/Anthropic/Gemini SDK directly.
That's the point! That's the whole "white-labeling" part, and what the commentator earlier is talking about. You're very close in understanding the context here!
I'd genuinely like to understand where you're coming from more.
I think we're all in agreement that this framework is very much about letting developers swap the models easily, and treat them as commodities. That seems pretty obvious.
I still don't see, however, how this has anything to do with controlling the UX (or the new Siri, for that matter! The new Siri doesn't use Anthropic models, and there are no extension points for it to do so; that's pretty much the whole reason why it won't be available in the EU).
Help me see your point of view!
The way I see it, it isn't about what is there right now, but what intent it signals, or what path Apple is planning. Yes, today it's ClaudeForFoundationModels, but the FoundationModels stuff will be used to allow switching between models, probably without users noticing. And who knows what Apple will ultimately surface to users; it tends to go in the direction of less user control.
But there are a lot of assumptions, guesses and extrapolation in that. I think you're right if you focus only on what's there right now, rather than trying to "see into the future", which harrouet basically started doing with their root comment.
The same is happening to the Claude software package, as it would stand behind Apple-branded foundation models. From a pure software developer's point of view, this is exactly what Claude offered here, so where is the issue? The issue is in the larger space: Apple could take steps to block Claude out of its ecosystem at some point if it so wishes, and there is little Claude / Anthropic could do if Apple Foundation is the only thing that Apple consumers know about.
But this is very much _not_ what this is.
Apple showed a bunch of new APIs at WWDC last week. One of these is a way for developers to interact with LLMs that lets you easily swap out models (with a bunch of other niceties around it), including swapping between on-device and remote models.
This is _Anthropic_ (not Apple!) shipping their support for that framework, so you can also switch between different Anthropic models using the same APIs you'd use to swap between a local or PCC model.
I expect OpenAI will probably ship their shims in the next couple of weeks too? (You can probably vibe-code one in half an hour if you point Codex at the Anthropic one, tbh).
(Apple also doesn't use "Apple Foundation Model" anywhere in the user-facing marketing materials AFAICT, this is strictly developer facing terminology, but I could be wrong?)
My impression is that people are _wildly_ misunderstanding what this _actually_ is, and running wild with speculation/interpretation.
Are you thinking about Intents? That lets Siri interact with data (and perform some actions in them) from your apps, but it is something completely different.
You can definitely expose things from your app via Intents that will end up calling an external arbitrary LLM somewhere, but it does not require using Foundation Models API whatsoever.
Now if they can further reinforce their angle on Privacy, they might continue to be what they are (or more)
Ahh I was hoping for the opposite: all of the existing features of Claude Code but somehow running locally on my laptop's neural engine. A pipe dream on an M2 with 8 GB of RAM, but I had a flicker of hope there.
https://developer.apple.com/videos/play/wwdc2026/232/ https://www.youtube.com/watch?v=wykPErJ8M-8
In 10 years, I hope my MacBook Pro can run today's frontier models and has 1TB of unified Memory.
GIGABYTE G383-R80-AAP1 for example
I don't think you understand why people buy Nvidia hardware if you're beating the "just add more dual channel DDR, bro" drum. Apple wouldn't even be able to extinguish AMD with a product like that, it's all slow memory being fed into a raster-first GPU architecture.
You can use environment variables to have claude code query literally any endpoint you choose as long as it has a compatible API.
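A rough sketch of that in Python, assuming Claude Code honours the `ANTHROPIC_BASE_URL` and `ANTHROPIC_AUTH_TOKEN` environment variables (worth checking against the current docs). The local URL and token are placeholders, and the actual launch is left commented out:

```python
import os

def claude_code_env(base_url, token, base_env=None):
    """Return a copy of the environment with the endpoint override applied."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumed variable names; the endpoint must speak a compatible API.
    env["ANTHROPIC_BASE_URL"] = base_url
    env["ANTHROPIC_AUTH_TOKEN"] = token
    return env

# Hypothetical local server standing in for "any endpoint you choose".
env = claude_code_env("http://localhost:8080", "dummy-token",
                      base_env={"PATH": "/usr/bin"})

# To actually launch with the override:
# import subprocess; subprocess.run(["claude"], env=env)
```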
..but instead we get Claude, hosted who-knows-where. maybe in X-AI datacenters? maybe in Amazon somewhere? who knows..
I'd love to use Gemma4 as an example. But think of the user: if 10 apps each use the same model and each downloads it, the phone will be bloated.
I still don't understand whether Apple provides a way for multiple apps to use the same on-device model (without tricky namespaces and permissions).
I didn't see anything suggesting that's the case.
They were wrong when their on-device model was way behind. They still might be right in the long term.
While multiple apps I use might need Gemma 4 E4B, I use dozens of apps, and app devs can choose from hundreds of models. A shared cache might reduce size a little when there's overlap, but the core problem still exists: if each app chooses its own model, disk usage and memory swapping explode.
It's probably better for device manufacturers to bake in a default. I'm not proposing they limit you from using others, but one shared default might be the best developer/user experience for 99% of apps.
- Being warm in memory is the single biggest perf speedup you can get, and a default is much more likely to be warm.
- "Best model" is usually "best model for this device" given both RAM and compute. A developer can't test every device but Apple can/will.
- Each model needs to be optimized for the hardware (what's running on ANE, what's running on Metal, what's running on CPU). The default gets optimized.
- If you need a custom model, a LoRA is probably best (30MB, benefits from all of the above)
You could say the default should be swappable, but that's more a linux ideal than an Apple one so I doubt we ever see that. Plus there are real downsides: intentional or not, prompts end up optimized to the model they are developed for, so swapping the default system model would degrade every app.
- Application can ask for specific model, if available use it. if not, ask to download it (or try some fallback / alternative)
- User can manage models. So as a user I can clean unused models (and for non-techie have something similar to offloading apps when unused for some period of time).
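The two bullets above (apps request a specific model, users can clean up unused ones) could look roughly like the sketch below. This is not any real OS API: `SharedModelStore`, the model IDs, and the sizes are all made up for illustration.

```python
import time
from dataclasses import dataclass, field

@dataclass
class ModelEntry:
    size_mb: int
    users: set = field(default_factory=set)   # apps that asked for this model
    last_used: float = 0.0

class SharedModelStore:
    """Hypothetical system-wide store: one copy per model, shared by all apps."""

    def __init__(self):
        self.models = {}

    def request(self, app_id, model_id, size_mb, now=None):
        """Register use of a model; return True if it had to be downloaded."""
        now = time.time() if now is None else now
        downloaded = model_id not in self.models
        entry = self.models.setdefault(model_id, ModelEntry(size_mb))
        entry.users.add(app_id)
        entry.last_used = now
        return downloaded

    def disk_usage_mb(self):
        return sum(e.size_mb for e in self.models.values())

    def offload_unused(self, max_idle_seconds, now=None):
        """Drop models nobody has used recently, like offloading unused apps."""
        now = time.time() if now is None else now
        stale = [m for m, e in self.models.items()
                 if now - e.last_used > max_idle_seconds]
        for m in stale:
            del self.models[m]
        return stale
```

Ten apps requesting the same model ID cost one download here, which is the dedup part. It does nothing for the case where every app picks a different model.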
And now given everybody now does this I guess the incentive to stop breaking stuff reduces even further.
Might as well have static binaries.
It’s a nice language though.
The main difference is that Python used to make you know that the virtualenv existed. Now `uv run` and `poetry run` abstract that away, so you don’t have to interact with it if you don’t want to.
I'm just speculating that it's a self-reinforcing pattern: compatibility problems lead to isolated builds, which reduce people's concern for backwards compatibility, which makes isolated builds ever more important.
Maybe it's fine - a trade off that allows greater velocity of development, it just seems attention to backwards compatibility is becoming a thing of the past.
The original plan was to ship Python. However I found out I can migrate them to CoreML, and now it's a model file + Swift code. I got some massive performance improvements as well.
Of course, this doesn't work at all for non-Mac environments, but it was nice to be able to do it. (Also doesn't solve the duplicate large models problem)
Python heaviness is a more fundamental problem.
> At WWDC, Apple announced that it's opening its Foundation Models framework to third-party cloud model providers. Starting with iOS 27, macOS 27, iPadOS 27, visionOS 27 and watchOS 27, model providers can implement the new public LanguageModel protocol to provide a common interface for model inference. We've made Gemini models available to the Foundation Models framework through the Firebase Apple SDK.
This provides a fully native development experience — cloud-hosted Gemini models can plug directly into the Foundation Models framework using the same API. That means the on-device Apple model and cloud-hosted Gemini models sit behind a shared API surface, so you can easily swap between local and cloud inference to fit your use case.
https://blog.google/innovation-and-ai/technology/developers-...
Protocol in this context means a Swift language feature, like interface in some other languages: https://docs.swift.org/swift-book/documentation/the-swift-pr...
The framework's whole deal is that it lets you use the same API to target either the device built-in models, the Apple-hosted online models (Private Cloud Compute), or write your own shims to call out to arbitrarily hosted online models.
You can then dynamically route your calls to a different kind of model/provider, using system APIs, without having to write your own abstraction layer over "I want to use local model for this, but I want to use Claude for that", or having to integrate your own API integration with Anthropic/OpenAI APIs.
It abstracts things like tool calling in one place; and has a bunch of other niceties/oddities (it keeps the same "transcript" going, even if you dynamically switch providers/models during a session) and some other things.
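To make the "shared protocol, swappable provider, one transcript" idea concrete, here is a toy Python analogue. It is not Apple's Swift API: every class and method name below is invented for illustration, and the "models" just echo the prompt.

```python
from typing import Protocol

class LanguageModel(Protocol):
    """Stand-in for a shared interface that every provider implements."""
    def respond(self, transcript: list[str]) -> str: ...

class OnDeviceModel:
    def respond(self, transcript: list[str]) -> str:
        return f"[local] {transcript[-1]}"

class CloudModel:
    def respond(self, transcript: list[str]) -> str:
        return f"[cloud] {transcript[-1]}"

class Session:
    """Keeps one transcript going even when the model changes between calls."""

    def __init__(self):
        self.transcript: list[str] = []

    def ask(self, model: LanguageModel, prompt: str) -> str:
        self.transcript.append(prompt)
        reply = model.respond(self.transcript)
        self.transcript.append(reply)
        return reply

session = Session()
first = session.ask(OnDeviceModel(), "summarize this note")
second = session.ask(CloudModel(), "now plan the refactor")
```

The calling code never branches on provider; the only thing that changes between the two calls is the model argument.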
> Requests go directly from your app to the Claude API; Apple is not in the request path and does not see prompts or responses. Usage is billed to your Anthropic account at standard API pricing. Your app decides when to use Claude and when to use Apple's on-device model: pass whichever model you want to each session.
Lol bro this is literally it this is the model they've been training (was Apple Foundation model not a big enough hint?)
In other words, it's unlikely to happen as there is no money in it. Better for Apple to create some new subscription "AI" and "AI-lite" plans people can subscribe to, and since Apple is a company and we all know what those care about, it's unlikely to become a utopia of local models running on your phone.
"You pay an indeterminate amount of money to ask a question and you might not even get the response you want without spending even more money" doesn't appeal to most people who aren't gamblers, and explaining how "thank you" at the end of a long exchange can be expensive due to context is an even harder thing for an average person to swallow.
Token cost going up/down like a yo-yo also doesn't help. Normal users NEED fixed costs and don't want to expend energy constantly keeping up with the AI meta. "My subscription lasted much longer last month" isn't a winning problem either.
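The "thank you" point is easy to see with back-of-envelope arithmetic: each turn resends the whole conversation as input. The sketch below assumes no prompt caching and uses a made-up price of $3 per million input tokens, so the dollar figure is illustrative only.

```python
def turn_input_cost(history_tokens, message_tokens, usd_per_million_input):
    """Input cost of one turn when the full history is resent with it."""
    billed = history_tokens + message_tokens
    return billed * usd_per_million_input / 1_000_000

# A 2-token "thank you" after a 50,000-token exchange (hypothetical numbers).
late = turn_input_cost(50_000, 2, 3.0)

# The same 2 tokens as the very first message of a conversation.
early = turn_input_cost(0, 2, 3.0)
```

With these numbers the late message is billed for 50,002 input tokens against 2 for the early one, which is the part that is hard to explain to a normal user.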
I think Apple is correct that Local LLM for most things is the future.
Right now for allihat.com I just let people use the Apple model locally if they don't feel like using a Claude key. And my conversions to paying users shot up like 3x! But it obviously isn't really a replacement for Claude. I was hoping Apple would make proxying to Claude something they do for me, so I don't have to proxy through my own server just to manage Claude API usage.
Apple is offering developers with less than 2 million downloads free AI models via their servers https://techcrunch.com/2026/06/08/apple-bets-cheaper-ai-will...
Then Apple quietly refuses to participate by not investing tens or hundreds of billions in creating a competing LLM. Sure, they resell Claude for the marks or utilize Gemini to placate the gullible fools but they know what's up.
https://www.microsoft.com/en-us/microsoft-copilot/for-indivi...
https://support.microsoft.com/en-US/Excel/copilot-function
I know this is from a developer perspective. But as a consumer this is just funny.
Layers are luxury and remove control and transparency.
Proxy (production)
For production, route requests through your own back end with .proxied. The relay at baseURL adds the Claude API credential server-side, so the app ships no key. The headers you provide are sent on every request so your proxy can authorize the caller.
https://platform.claude.com/docs/en/cli-sdks-libraries/libra...
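For what the relay side of that might involve, here is a minimal sketch of the header handling only. The `x-api-key` and `anthropic-version` headers are what the Claude API expects; the `x-app-token` header, the token set, and the function name are assumptions made up for this example, and a real relay would also forward the request body and stream the response.

```python
def relay_headers(incoming, allowed_tokens, api_key):
    """Authorize the caller, then build the headers for the upstream request."""
    # Hypothetical app-defined header the client sends on every request.
    token = incoming.get("x-app-token")
    if token not in allowed_tokens:
        raise PermissionError("caller not authorized")
    # The credential is added here, server-side, so the app ships no key.
    return {
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    }

upstream = relay_headers({"x-app-token": "user-123"}, {"user-123"}, "server-secret")
```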
It's also smart for them to make sure the billing is going direct from Anthropic to the developer. The initial thought is "That means Apple's not taking a cut", but from the other side of it, developers who use this API are going to have to expose that cost to customers somehow, and that translates to subscription/InAppPurchase etc. on top of which Apple will get its 30%.
What confuses me about this article is: The code examples (Python, Ruby, etc.) look to me like the original Anthropic APIs, not Apple’s abstraction. Did I miss something?
Special emphasis on the "isn't compiled in yet" and "or construct one" bit.
While expected, it’s still a bummer.
I don't like this model. Then all the user data is visible to the proxy.
Far better would be some kind of micro payment architecture where a wallet is on the users device and coins are attached to each request.
We just need to live in the alternate universe where micro payments succeeded.
They are.
so Claude via FM dies offline while Apple's on-device SystemLanguageModel (the ~3B one) keeps working. It isn't a hybrid really: the framework just has both implement the same LanguageModelSession protocol so "local 3B" and "remote frontier model" become a one-argument swap.
IMHO what's worth internalising is that the two share an API but nothing else: the on-device path runs on Apple's Neural Engine and costs battery (you can watch ANE power ramp while it works) while the cloud path costs API credits/tokens and does zero local compute. Same code, opposite cost model.
Apple's Foundation Models framework (shipping in iOS 27 / macOS 27 this fall) is the standard Swift API for on-device AI — the same API Apple uses for their own small model. This package makes Claude plug into that same API as a drop-in swap.
One API, two tiers. You write your app once against the Foundation Models protocol. On-device model handles fast/free/private tasks; Claude handles heavy reasoning, long context, or capability gaps. You swap the model, not your code. You don't call the Anthropic API directly. Apple's framework handles streaming, tool calling, and structured output (@Generable); you just get Claude's capability through it.
Enough is enough. I’m seriously evaluating open models this week.