It's good that there is a new player on the market, though I take benchmark tables with a grain of salt. Speaking of the model presentation, it's funny to see how clearly their website is inspired by other AI companies' blogs, with the extra innovation of a hijacked scrollbar.
keeda 2 hours ago [-]
> Second, clean data. MAI-Thinking-1 was trained on clean and appropriately licensed data, with AI-generated content excluded from pre-training. This matters for quality, provenance, and control. If we cannot account for what shaped a model, we cannot fully understand its behavior or credibly improve it.
Shots fired?
It would be interesting to see how far "clean data" can go on the scaling laws.
foresterre 17 minutes ago [-]
I would really like to see what "appropriately licensed data" means. I can't imagine they didn't copy all the open repos on GitHub, and I can't imagine they asked for permission or are now reproducing the license texts from those repos. It sounds hand-wavy.
P.S. A fairly basic website otherwise, but it unfortunately seems to be hacking scroll for no good reason.
stingraycharles 13 minutes ago [-]
I assume they took the actual repos' licenses into account. I don't understand why they should ask for permission when the license would already allow it.
rocqua 7 minutes ago [-]
Which licenses allow usage for training? MIT, BSD, etc. likely do. But I would expect it gets weird for all the various copyleft licenses.
supermdguy 4 minutes ago [-]
It's interesting because their last model series (Phi) was based around the thesis that high-quality synthetic data is better than a large pre-training corpus.
vdfs 1 hours ago [-]
I doubt any lab would say otherwise, they all _claim_ to use licensed data
keeda 56 minutes ago [-]
Maybe, but Microsoft, through their partnership with OpenAI, is already involved in major copyright lawsuits. That is probably a driving force for this move, actually... I doubt they would want to tempt fate while those lawsuits are on-going.
onlyrealcuzzo 2 hours ago [-]
I'm interested how much "Clean Data" is synthetic data from "unclean" models...
bicx 44 minutes ago [-]
So, laundered data?
ertgbnm 1 hours ago [-]
> with AI-generated content excluded from pre-training.
> without distillation from third-party models
sounds like zero unless they are lying.
zamalek 1 hours ago [-]
> with AI-generated content excluded from pre-training.
Though this is largely impossible these days, unless they pre-trained on pre-AI era data.
saghm 17 minutes ago [-]
"how many of those shapes are rectangles?" "sounds like zero unless they are squares"
Adding "unless" to a statement makes it vacuous if the latter clause is weaker than the first clause. I find it hard to believe that a company willing to violate licenses would have scruples about lying about it.
rocqua 3 minutes ago [-]
Not vacuous, but tautological.
Which is different, because tautologies can actually be quite directly informative. Whereas vacuous truths tend to be oblique.
Also, “Microsoft is lying” is not a logically stronger statement, because they might be lying about something other than whether they distilled or trained on AI output.
chongli 9 minutes ago [-]
Adding "unless" to a statement makes it vacuous if the latter clause is weaker than the first clause
I think that's the point. "How do I say they're lying without outright saying they're lying?"
It's a common rhetorical trick.
xavriley 2 hours ago [-]
“ We trained it from the ground up on enterprise grade, clean and commercially licensed data, without distillation from third-party models.”
azinman2 1 hours ago [-]
aka all of GitHub OSS
ChicagoDave 20 minutes ago [-]
Yeah this is exactly what I was thinking.
Alifatisk 23 minutes ago [-]
> MAI-Thinking-1 is built with enterprise readiness in mind. It supports long context with a 256k token window
Isn’t 1M becoming the norm?
vb-8448 5 minutes ago [-]
1M is only marketing; in my experience, quality drops noticeably above 150k.
Claude Code will suggest starting a new session or compacting if you go above 100k.
stingraycharles 12 minutes ago [-]
Yes it is, but I can imagine that they want to start out a bit smaller to see how well things scale, and/or did not yet have the time to work on optimizing for the large context windows.
droidjj 9 minutes ago [-]
I struggle to get quality results from the frontier models at contexts > 256k anyway.
Centigonal 18 minutes ago [-]
> MAI-Thinking-1 is a 35B-active, ~1T-total parameters, sparse Mixture of Experts model, a smaller inference footprint than much larger models.
This seemingly nonsensical sentence (of course this will have a smaller inference footprint than larger models) suggests this model's competitors have larger inference footprints and total parameter sizes.
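To put the 35B-active / ~1T-total figures in perspective, here is a rough back-of-the-envelope sketch. It assumes the common approximation of about 2 FLOPs per active parameter per token for a forward pass (ignoring attention and expert-routing overhead), and it only counts weight storage, not KV cache or activations. The function names are made up for illustration; the parameter counts are the ones quoted from the announcement.

```python
# Rough arithmetic for a sparse Mixture-of-Experts model, using the quoted
# figures: ~35B active parameters and ~1T total parameters.

ACTIVE_PARAMS = 35e9   # parameters used per token (quoted figure)
TOTAL_PARAMS = 1e12    # approximate total parameters (quoted figure)


def active_fraction(active: float, total: float) -> float:
    """Share of the weights that participate in each token's forward pass."""
    return active / total


def forward_flops_per_token(active: float) -> float:
    """Rule-of-thumb estimate (assumption): ~2 FLOPs per active parameter per token."""
    return 2 * active


def weight_memory_gb(total: float, bytes_per_param: float) -> float:
    """GB needed to hold all weights; every expert must be resident, not just the active ones."""
    return total * bytes_per_param / 1e9


if __name__ == "__main__":
    print(f"active fraction: {active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.1%}")
    print(f"forward FLOPs/token: {forward_flops_per_token(ACTIVE_PARAMS):.2e}")
    for label, nbytes in [("bf16", 2), ("int8", 1), ("4-bit", 0.5)]:
        print(f"weights at {label}: ~{weight_memory_gb(TOTAL_PARAMS, nbytes):,.0f} GB")
```

With these inputs, only about 3.5% of the weights participate in each token, yet all ~1T parameters still have to be resident: roughly 2,000 GB at bf16 or 500 GB at 4-bit. That suggests the "smaller inference footprint" claim is about per-token compute more than memory.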
pixeldash928 3 hours ago [-]
Looks like the OAI divergence is finally taking place. Seems like the comparisons are mainly with Opus 4.6 and GPT 5.4 though. Still, exciting to see a new frontier player.
i_have_an_idea 1 hours ago [-]
Is it a frontier player though, or perhaps a new benchmaxxed model? People were saying similar things about Grok but it ultimately amounted to little.
wasabi991011 43 minutes ago [-]
"preferred by humans over Sonnet 4.6" makes it pretty clearly not benchmaxxed though.
At least when you define benchmaxxed as "good in benchmarks but not human preference".
dude250711 13 minutes ago [-]
Post 4.6 Anthropic models do not exactly have a stellar reputation, so that choice is smart.
kaicianflone 12 minutes ago [-]
Is that a pretext zoom effect when changing screen dimensions? Very cool.
BeetleB 58 minutes ago [-]
Based on the first table, why would I pick this over GLM?
missedthecue 38 minutes ago [-]
Because your employer might make you exclusively use enterprise copilot.
BeetleB 8 minutes ago [-]
As long as my employer is footing the bill, fine.
For personal stuff this release is not noteworthy.
gigatexal 7 minutes ago [-]
Anyone believing those benchmark numbers from a 35B model?
jeffdn 3 minutes ago [-]
It says right at the top, 35B active, 1T total.
lordmauve 2 hours ago [-]
We need to see DeepSWE scores. SWE Bench Pro is junk.
hartator 41 minutes ago [-]
I like it so much when a website hijacks the way my scroll works. This is truly innovative.
wmf 1 hours ago [-]
At least there shouldn't be any complaints about benchmaxing this time.
i_have_an_idea 1 hours ago [-]
Just because it is performing rather poorly by comparison, it doesn’t mean it isn’t benchmaxxed. It can still be worse than it appears.
wasabi991011 41 minutes ago [-]
It isn't benchmaxxed because they are using human preference as an evaluation.
kstenerud 1 hours ago [-]
They've hijacked scrolling. They've hijacked the spacebar. It flickers like crazy when I try to move through the article. Trying to get through it is an exercise in madness.
t-sauer 1 hours ago [-]
I do not understand how scroll hijacking is still a thing. Who thinks this is a better experience?
maelito 55 minutes ago [-]
Designers.
grassfedgeek 35 minutes ago [-]
Even without flicker it is very distracting. Why do people think this is a good idea?
AirMax98 1 hours ago [-]
I normally don't comment on matters of taste like this, but wow this is brutal. It's like someone threw the site in a vat of molasses.
blisstonia 45 minutes ago [-]
I gave up after the first scroll.
aniceperson 1 hours ago [-]
there is also a gap between the header and the top of the page... they should ask the AI to make it better a few more times...
vcryan 38 minutes ago [-]
It really looks like they used Claude to design this webpage. I guess the color taupe is the marker of good AI today.
Handy-Man 27 minutes ago [-]
Inflection AI
bossyTeacher 2 hours ago [-]
7 models launched. 5 models in the dropdown. Only 4 actually usable :(
About time Microsoft joined the fray. After the OpenAI divorce, it really looked like Microsoft was going to become another Uber.
giancarlostoro 1 hours ago [-]
They still own 27% of OpenAI, this IPO will feed them a lot of easy cash.
simjnd 3 hours ago [-]
Absolutely disgusting scroll jacking, even when "Accessibility mode" is turned on
dang 2 hours ago [-]
I'm sure most of us agree, but:
"Please don't complain about tangential annoyances—e.g. article or website formats, name collisions, or back-button breakage. They're too common to be interesting."
https://news.ycombinator.com/newsguidelines.html