holistio 17 hours ago [-]
You pay $200/month to Anthropic, $200/month to OpenAI, $200/month to Cursor, $200/month to Google, and seeing that it didn't come to a nice round $1024/month, you pay $200/month to Sakana to coordinate it all, because why not.
While you're at it, feel free to send me $200 as well, I'll generate a crypto address ending with "AI".
holistio 17 hours ago [-]
TIL: I just found out that base58 disallows I (capital i), l (lowercase L), O (capital o) and 0 (zero), so I could only generate GrxoJt4eNXE2QaQ55iPSa7hhiYdzCo8ZeAuokmh2Cai.
(don't send anything, sharing only because of the base58 fun fact I didn't know)
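For anyone who wants to check the fun fact, here is a minimal sketch. It assumes the Bitcoin-style base58 alphabet (the thread does not say which variant the address above uses), and the helper names `b58encode` and `is_base58` are made up for illustration.

```python
# Bitcoin-style base58 alphabet (assumed variant): digits and letters
# with 0 (zero), O (capital o), I (capital i) and l (lowercase L) removed.
BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode(data: bytes) -> str:
    """Encode bytes as base58 by repeated division of the big-endian integer."""
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = BASE58_ALPHABET[rem] + out
    # Each leading zero byte is conventionally written as the first symbol, "1".
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out


def is_base58(text: str) -> bool:
    """True if every character belongs to the alphabet above."""
    return all(ch in BASE58_ALPHABET for ch in text)
```

With this alphabet a vanity suffix of "AI" is impossible because "I" is not a valid symbol, while the lowercase "ai" in the address above is fine.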
IdiotSavage 9 hours ago [-]
More fun facts:
Omitting those characters makes it good for generating passwords if they need to be typed in by hand.
Double-clicking a base58 string always selects the whole string and it doesn't wrap accidentally, thanks to missing / and +, so it's also convenient to copy and paste.
wasabi991011 3 hours ago [-]
Unfortunately, no special characters means that a base58 string will often be rejected as a secure enough password.
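The rejection is about character-class rules rather than strength. As a rough sketch of the arithmetic, assuming every character is drawn uniformly at random from the 58 symbols (the function names are illustrative only):

```python
import math

BITS_PER_CHAR = math.log2(58)  # roughly 5.86 bits per base58 character


def base58_entropy_bits(length: int) -> float:
    """Entropy of a uniformly random base58 string of the given length."""
    return length * BITS_PER_CHAR


def chars_needed(target_bits: float) -> int:
    """Smallest length whose entropy reaches the target number of bits."""
    return math.ceil(target_bits / BITS_PER_CHAR)
```

Under that assumption, `chars_needed(128)` gives the length at which a random base58 string carries at least 128 bits, regardless of whether a password form insists on a punctuation character.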
ricardobeat 7 hours ago [-]
My current setup:
$20/month: Claude Code
$10/month: Minimax
$16/month: Xiaomi Mimo
$10/month: Opencode Go
Opus at low/medium effort generates plans. Then several coordinator/worker pairs are possible: DeepSeek v4 Pro + Minimax M3, Mimo v2.5 Pro + Mimo v2.5, Mimo + Minimax, Sonnet 4.6 + Haiku. I've been running hundreds of long multi-agent sessions, topped up extra credits here and there, but haven't reached $200/month spend yet. Relying entirely on Claude/Codex feels like a waste of cash now.
robertwt7 16 hours ago [-]
at this point I might just try Neuralwatt and see how many requests I can get with GLM5.2. I've read a lot of reviews saying that it's very cheap to run using Neuralwatt cloud
bicx 8 hours ago [-]
I wish I only paid $200/mo for Anthropic! Multiply that by 20x.
blks 7 hours ago [-]
What are you getting out of it at $4000/month?
maxdo 6 hours ago [-]
i burned ~20k+/mo on codex.
blks 4 hours ago [-]
Did you make that money back?
maxdo 2 hours ago [-]
It is hard to stretch every single token to a win but …
The two major deals it was meant for are still up in the air; if we win, sure, that's a 60x win
JumpCrisscross 13 hours ago [-]
Does it work? I’m less interested in economics than fit with an MVP.
Or use OpenRouter and switch to the model you want to use (I think so).
ljlolel 16 hours ago [-]
Or TrustedRouter if you want privacy and open source
yorwba 14 hours ago [-]
You ought to realize that shilling your product in the comments doesn't exactly come across as trustworthy.
JumpCrisscross 13 hours ago [-]
Disclosing affiliation hasn’t been a legal thing for a while. It’s reputational. Knowing that firm spams is a black mark.
smusamashah 11 hours ago [-]
Oh! I thought TrustedRouter was a joke/sarcasm. Very wrong placement of the comment.
ljlolel 9 hours ago [-]
It’s all open source and I say that it’s mine in all the sibling comments above
rvz 17 hours ago [-]
Pay $0 to run a local model or even a cheap DeepSeek V4 model via their API which is close to free per million tokens.
These prices are just going to get raced to $0.
goodmythical 56 minutes ago [-]
Where do you acquire free hardware and free electricity such that you can host local models for $0?
a2128 13 hours ago [-]
I used to have a $20/mo ChatGPT subscription and now I spend $12 per year using Kimi models on OpenRouter, and that's with zero-data-retention-only providers (some models sometimes have free providers with scary tracking). Maybe I just don't use that many tokens, I don't fill the context with more than what's needed for a specific request, but it goes to show how these subscriptions can be an absolute ripoff. The thought of spending 200x that is insane to me
mark_l_watson 9 hours ago [-]
The beauty of your approach: when people are not paying for an expensive subscription, they can decide to use models less and not feel like they are leaving money on the table.
holistio 17 hours ago [-]
Maybe. But for now it's fascinating how $200/month has kind of become a normal tier.
It's similar to how AirPods normalised all of us having $300+ headphones. All of us would have scoffed at the idea a decade ago.
p1esk 17 hours ago [-]
Many people here spent a lot more than $300 on headphones long before AirPods appeared.
mc3301 16 hours ago [-]
Those were hobbyists, audiophiles, professionals, artists (recording, performing, etc.).
They are talking about a much larger group of people.
klausa 15 hours ago [-]
I think OP meant noise-cancelling headphones, which were fairly ubiquitous in tech circles in open offices; before Apple launched AirPods.
uberex 14 hours ago [-]
Airpods Inc. would be very high up SP500 as a standalone business.
holistio 17 hours ago [-]
I had a really nice Sennheiser before that, too. But now you hop on the subway and everybody sports one.
mark_l_watson 9 hours ago [-]
But it is not all about cost: models like DeepSeek v4 flash (I use the US company Fireworks.ai and also buy tokens directly from DeepSeek) are very fast, with very low latency while working.
Would you want to use a text editor that updates the screen very slowly? Kind of the same thing for using agentic systems as coding assistants: don’t want a ‘sluggish’ experience.
erispoe 8 hours ago [-]
I have, mostly, long running autonomous tasks, so it doesn't matter how slow inference is. If I optimize for latency it means I'm turning into the limiting factor.
sofixa 16 hours ago [-]
The Sony WH-1000XM series and the Bose QC35 were the standard quality headphones years before AirPods were a thing, and both retailed at $300+.
holistio 15 hours ago [-]
Of course, premium headphones existed before. I have a WH-1000XM4 sitting right next to me.
But your aunt Josie didn't have one. Now Apple is selling 80 million units / year and the ~$300 price tag has become normal. Before that, most people had headphones that were 10 times cheaper.
Hamuko 14 hours ago [-]
$300 isn’t what AirPods cost though. You can get a pair of AirPods 4 for $129 on Apple.com, and I presume that is still the most popular model. If you’re paying ~$300, you are buying premium headphones.
holistio 6 hours ago [-]
The base model where I live (Central Europe) is $194. The Pro is $357. The Max is $779.
I just averaged it out.
qainsights 7 hours ago [-]
Not everyone can run local models. The hardware is also expensive and will be outdated soon as the models evolve.
kijin 17 hours ago [-]
Not while the hardware required to run a local model at an acceptable speed costs way more than $200.
Guess what, the big players are hoarding all the RAM and GPUs so that other people can't afford decent hardware. It's working out beautifully for them!
sofixa 16 hours ago [-]
> Not while the hardware required to run a local model at an acceptable speed costs way more than $200
It's $200/month. You have to take into account energy costs and all the rest of a system, but if you break even within 1-2 years ($2400-$4800) it'd be a pretty good deal. And $4000 buys you a pretty decent system.
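The break-even claim can be sketched as simple arithmetic. The $4000 and $200 figures come from the comment above; the $40/month running cost in the usage note is a made-up placeholder, not a measured number.

```python
def break_even_months(
    hardware_cost: float,
    subscription_per_month: float,
    running_cost_per_month: float = 0.0,
) -> float:
    """Months until buying hardware is cheaper than keeping the subscription.

    running_cost_per_month covers energy and similar costs of the local box.
    """
    saving = subscription_per_month - running_cost_per_month
    if saving <= 0:
        raise ValueError("running costs eat the whole saving; no break-even")
    return hardware_cost / saving
```

Ignoring running costs, `break_even_months(4000, 200)` is 20 months, inside the 1-2 year window; with a hypothetical $40/month for electricity, `break_even_months(4000, 200, 40)` stretches to 25 months.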
kijin 10 hours ago [-]
Sure, if you're going to keep using it long term.
But it's a hefty upfront investment for people who just want to experiment. The good thing about $200/month subscriptions is that you can cancel them any time and cut your losses. Not so with a $4000 computer that loses half of its resale value as soon as you plug it in.
I think the current sweet spot for people who don't already own a high-end gaming PC is to rent a server with a beefy GPU from Hetzner et al. and run local models there.
emodendroket 6 hours ago [-]
[dead]
audreyt 17 hours ago [-]
Happy user here, pairing it with Composer 2.5, with Fugu Ultra as advisor and Fugu as planner. For scope/architecture it’s closer to useful Fable-style orchestration than to one chat thread.
I've been shipping production on archive.tw with Fugu Ultra in /advisor on oh-my-pi.
Advisor doesn’t slow the loop if the driver stays fast. Worth it if your harness can split advisor from worker.
Bombthecat 6 hours ago [-]
Which software are you using to do that?
Edit: nevermind, but which plugin or so?
da_grift_shift 12 hours ago [-]
Yo dawg, I heard you like agents, so we put agents in yo agents so you can burn tokens while you burn tokens.
quanto 13 hours ago [-]
There are so many derisive comments here.
David Ha, CEO and co-founder, was one of the youngest managing directors at Goldman Sachs before doing ML at Google. His ML publications were considered top-notch almost a decade ago. I had high hopes for him when he raised money and founded Sakana.
I do agree with some comments here that perhaps this particular product is not well thought out. I also agree with the criticism that David calls Sakana a frontier AI lab while making money just selling AI B2B applications to Japanese businesses. I also agree with the assessment that Sakana has abrasive and antagonistic, sometimes openly hostile, recruiting tactics. I also agree that his then-impressive publications may have lost their luster in the age of LLMs.
However, the man is clearly driven; and he and his team may have more to offer in future. I admire the man for not taking the conventional AI-research career path.
ainch 10 hours ago [-]
Indeed. The world models research many labs are now chasing was to some degree ignited by David Ha and Schmidhuber's 2018 paper.
More broadly, Sakana is pursuing a refreshingly distinct research path, with their focus on evolutionary methods, biological intelligence (e.g. continuous thought machines) and open publication.
hsaliak 13 hours ago [-]
so he's the quintessential brilliant jerk, ok
11 hours ago [-]
epsteingpt 13 hours ago [-]
Kind of shocking - a model comes out that beats mythos and offers a reasonable price and it ... gets downvoted?
Probably taking hate from both sides - OpenAI / Claude fans who are undercutting its moat. Chinese open-model fans that want it to be cheaper.
But it's a genuine accomplishment to hit those benchmarks and offer a reasonable plan?
Bizarre reaction TBH.
actionfromafar 11 hours ago [-]
Which model is that? What is it named?
tagawa 5 hours ago [-]
Not OP but I believe it’s Fugu (7B). According to [0]: “Fugu itself is a trained coordinator LLM.”
Looking at the technical report I'm a bit confused. The improvement from using their orchestrator models seems minimal (in some cases lower than just the model which I'm assuming is in the orchestrator's pool?). Maybe it's sort of acting as an additional reasoning step upfront? Sort of like how if you asked Claude to create a plan for how best to prompt itself, you would probably end up with a better result than just the base prompt.
Also, from the technical report, looks like they're training on the output of Claude Code, etc. I'm guessing this doesn't violate TOS because they're technically not a directly competing model. This brings me to what I see as the main risk with this service, which is that it seems like an easy thing for a frontier lab to make obsolete, either by models beginning to converge in terms of strengths or by improving their own harnesses to include more of this meta-reasoning.
cortesi 17 hours ago [-]
As a developer outside the US I think it's vital to have alternatives to OpenAI and Anthropic, but sadly this is not it. For $200/month you get < 3 hours of use per week, the API is extremely slow, and the output quality in my tests is nowhere near Fable. It's nowhere remotely near usable as a day-to-day workhorse. Very disappointing.
u seem to be the only one who used it here - how did it compare to opus and gpt5.5? in theory it should be at least on par if not better at times right.
cortesi 14 hours ago [-]
I only had time to use it for a couple of deep reviews of large Rust projects, and a few agentic coding tasks (implement plan X, refactor Y in fashion Z) before my quota ran out. My impression is that the reviews were quite strong - maybe Opus 4.8+ or around GPT 5.5 (for my particular use case) - but very slow. For implementation I found it weaker, it made a few mistakes that I haven't seen frontier models make in a long time.
NetOpWibby 14 hours ago [-]
I'm glad eager people like you test for lazy people like me
personjerry 1 hours ago [-]
I love when they put a black box in front of the other black boxes so I can get a questionably better black box for slower service and more money!
blixt 8 hours ago [-]
I tried running this for some market research for my startup and it did a pretty nice job. It didn't necessarily find any obscure data, and it seemed to rely on older data than what I could find myself. On top of this, it had the same sycophantic tendencies as most LLMs these days (explaining why your idea is great and riffing on that), which I find to be unnecessary use of resources.
All put together, paying ~$60 to get a hit-or-miss report seems a bit excessive, but obviously as the models they use under the hood get better it becomes more and more worth it, assuming they also improve their grounding/search capabilities.
I'm a big fan of Sakana though, and have followed David Ha / @hardmaru since the world models papers (with the racing car game and the Doom clone), which were incredible at the time.
epsteingpt 17 hours ago [-]
Beta user: they piloted OpenRouter fusion before it was seen as the viable step. Everyone's understood for months now that having different models check each other is the best path forward.
This gets you that in a nice neat package, without the underlying tinkering mechanics.
If (big iff) the usage mechanics work out, then this is actually a really good anti-big-model strategy.
They'll be incentivized for your success, not token-maximizing for their investors.
The team is super smart too. What's not to like?
Wishing them the best on launch.
prodigycorp 16 hours ago [-]
if you've used codex or claude, how do the usage limits on fugu feel compared to the pro plans on either? honestly wouldn't mind subscribing to this if it's as generous as what codex is giving me monthly, which seems unrealistic.
epsteingpt 16 hours ago [-]
Hard to say - since I used it in Beta with free credits, where the usage felt more 'Opus' than 'ChatGPT' but more efficient token wise. Switching models every time is annoying.
But their paid plans I'm not sure yet - planning to subscribe and can let you know.
Almost no chance it will be as generous as OpenAI though. They just don't have the money :-)
Lwrless 11 hours ago [-]
Got myself the $20 subscription and tried it out. The 5-hour limit runs out surprisingly fast. Quality is okay but it feels slow, and even with my $20 Claude subscription on Fable, the credit usage ends up being lower. Fable usually catches issues in my Opus 4.8-generated code that I'd miss otherwise, but Fugu didn't. Makes me wonder if it's really at the Fable level. Hard to see the value here.
soma8088 43 minutes ago [-]
$40/month on Kiro, I hardly ever hit my credit limit
embedding-shape 18 hours ago [-]
> Frontier-level performance without single-vendor dependency. [...] Plug collective intelligence directly into your workflows today with a single API.
Do multiple vendors run this "single API", or how is this not replacing one single-vendor dependency with another single-vendor dependency?
prodigycorp 17 hours ago [-]
ngl, I thought sakana.ai was doing cooler stuff than this. that said, the release of a product like this makes sense because it follows your natural intuition when using these models. The best way to use LLMs is to have at least two in your pocket, because the models do a good job at covering each others assets and filling in obvious model-specific blindspots.
it's interesting that they're offering it in the form of fixed cost subscription plans too. My impression was that the first party providers can do this because they have api inference margins to the tune of 80ish percent. Anyone else orchestrating on top of these models has to pass through these costs or eat them themselves.
jordemort 9 hours ago [-]
Fugu, eh? So there’s a nonzero chance this thing might kill me?
monkeydust 11 hours ago [-]
Imho there are two dimensions here: Firstly different LLMs and secondly the strategy in which you break down the problem in an agentic fashion (e.g. break up to separate agents with own persona and then judge evaluates across all agents). You can of course mix-up the dimensions as well and that's what I have been tinkering* with for a good few months with some success. This was all done using home-brew setup running on openrouter.
Personally I prefer understanding the dimensions and the interplay and controlling it, though I can see why openrouter and others are now offering this as a solved solution.
Just be careful when you start outsourcing too much of your intelligence needs to a blackbox.
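The two dimensions described above (which model, and which persona/strategy) can be pictured as a grid whose answers a judge then evaluates. This is only a sketch with stand-in callables; it is not the rightmind setup, and `persona_grid`, the prompt wording and the `judge` signature are all assumptions for illustration.

```python
from itertools import product
from typing import Callable, Dict, List

Model = Callable[[str], str]              # stand-in for a real LLM call
Judge = Callable[[str, List[str]], str]   # sees the task and every answer


def persona_grid(
    task: str,
    models: Dict[str, Model],
    personas: List[str],
    judge: Judge,
) -> str:
    """Run every (model, persona) pair on the task, then let a judge decide."""
    answers = []
    for (model_name, model), persona in product(models.items(), personas):
        prompt = f"You are {persona}.\n{task}"
        answers.append(f"{model_name}/{persona}: {model(prompt)}")
    return judge(task, answers)
```

The number of calls grows as models times personas plus one for the judge, which is one concrete reason to keep control over both dimensions rather than hand them to a black box.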
This is interesting. Would you share a few ways in which you're using this in your workflow? What if you were to start a new project and test and build it out from scratch - how do you work this approach in without bogging everything down (including the simple things) with overanalysis?
GolfPopper 18 hours ago [-]
This is a joke, right?
NitpickLawyer 16 hours ago [-]
Not necessarily. There were some tests last year-ish from hf that showed that simply alternating (randomly) between claude and gpt (whatever their versions were at the time) on a task produced better results than either of them individually. So during a task, the first call was sent to one, then the other and so on.
There's also the concept of "smart routing" requests based on some heuristics / embeddings. You'd get "simple" tasks handled by smaller (cheaper) models and use a bigger model to curate / sort / merge the results.
There's a lot of things to try here. I wouldn't personally pay for this service, but I don't think it's "a joke"...
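The random-alternation idea is simple enough to sketch. This assumes each model is just a callable that takes the running context and returns the next step; the function name and the fake models in the usage note are hypothetical, and the linked experiment may have been set up differently.

```python
import random
from typing import Callable, List

Model = Callable[[str], str]  # stand-in for a real LLM call


def alternate_models(
    task: str,
    models: List[Model],
    steps: int,
    rng: random.Random,
) -> List[str]:
    """At each step pick one model at random and feed it the transcript so far."""
    transcript = [task]
    for _ in range(steps):
        model = rng.choice(models)
        context = "\n".join(transcript)
        transcript.append(model(context))
    return transcript[1:]  # only the model outputs, not the original task
```

Passing in a seeded `random.Random` keeps a run reproducible, e.g. `alternate_models("fix the bug", [fake_claude, fake_gpt], 4, random.Random(0))` with two fake callables.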
Is there any official source that could confirm whether Fable (or Mythos) is parallelized test-time compute (like GPT 5.5 Pro) or a sparse Mixture-of-Experts (MoE) transformer combined with a multi-agent, inference-time compute scaling architecture (Gemini 3.1 Deep Think)?
mark_l_watson 9 hours ago [-]
Nice idea but expensive. It looks like they don’t add very low cost models like DeepSeek v4 flash into their mix.
After a few months of spending money on the best frontier models, now I am spending time using DeepSeek v4 flash as my workhorse, and flipping to more capable (but still very inexpensive) open models on an as-needed basis. We all make our own tool selection decisions, but for me, I feel happier and enjoy working more following the very fast response and ultra low cost path.
ljlolel 9 hours ago [-]
We found that an all open source fusion was 1/3 the price and better than Fable
Brilliant. What this actually is, is a swarm, albeit a very small one. I'm wondering if for research specifically, swarm size (on higher temp?) would outweigh model size.
At least, for the initial data gathering phase. You'd probably want a sequence of progressively larger models to filter it.
Have you guys tested it on anything other than research?
ljlolel 4 hours ago [-]
Working on bio coding and cybersecurity benchmarks now
david_shi 17 hours ago [-]
Their research around building a domain specific model is pretty cool, it's kind of like Karpathy's autoresearch but pointed at deciding the optimal model to use at each step of the inference.
If cost becomes an even bigger problem being able to choose "best performance possible" or "strong but cost effective" will be useful.
OpenRouter Fusion is basically ask N models + synthesizer step.
This is: ask a special orchestrator they built, which sits in front of a bunch of models, which model would suit the request best.
Regular Fugu seems to be just "pick the best model and route the request there"
Fugu Ultra can generate like a little mini workflow/plan instead to achieve a result
1. Ask GPT to derive the math.
2. Ask Opus to check for implementation/security issues.
3. Ask Gemini to synthesize or resolve disagreement.
4. Return final answer.
I could be wrong but seems to be that at a glance, so I think it's more dynamic than OpenRouter Fusion.
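To make the three shapes concrete, here is a rough sketch of the guess above, using fake models that only log their names. Nothing here reflects how OpenRouter or Sakana actually implement it; `fusion`, `route`, `run_workflow` and `make_stub` are hypothetical names, and in the real products the `pick` function and the `plan` would presumably come from an LLM.

```python
from typing import Callable, Dict, List, Tuple

Model = Callable[[str], str]  # stand-in for a real LLM call


def make_stub(name: str, log: List[str]) -> Model:
    """Fake model: records that it was called and returns a canned reply."""
    def model(prompt: str) -> str:
        log.append(name)
        return f"{name}: ok"
    return model


def fusion(prompt: str, models: Dict[str, Model], synthesizer: Model) -> str:
    """Fusion-style: ask every model, then one synthesizer merges the drafts."""
    drafts = [f"[{name}] {model(prompt)}" for name, model in models.items()]
    return synthesizer(prompt + "\n\nDrafts:\n" + "\n".join(drafts))


def route(prompt: str, models: Dict[str, Model], pick: Callable[[str], str]) -> str:
    """Router-style: an orchestrator picks one model and forwards the request."""
    return models[pick(prompt)](prompt)


def run_workflow(prompt: str, models: Dict[str, Model], plan: List[Tuple[str, str]]) -> str:
    """Workflow-style: run (model, instruction) steps in order, chaining outputs."""
    result = prompt
    for model_name, instruction in plan:
        result = models[model_name](f"{instruction}\n\n{result}")
    return result


def demo() -> List[str]:
    """Run the four-step example from the comment and return the call order."""
    log: List[str] = []
    models = {name: make_stub(name, log) for name in ("gpt", "opus", "gemini")}
    plan = [
        ("gpt", "Derive the math."),
        ("opus", "Check for implementation/security issues."),
        ("gemini", "Synthesize or resolve disagreement."),
    ]
    run_workflow("some task", models, plan)
    return log
```

The difference in cost is visible in the call counts: `fusion` always makes N + 1 calls, `route` makes one (plus whatever the picker costs), and `run_workflow` makes as many as the plan has steps.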
links to two papers with at least enough apparent quality and novelty to get into ICLR 2026
> So basically... openrouter
:skull:
i now really wonder how many people of the public understood my thesis defense lol
19 hours ago [-]
agalamli 4 hours ago [-]
i've seen many AI models, tried some. i'm genuinely interested in trying this kind of model/architecture. however i'm a little confused about the pricing.
chvid 16 hours ago [-]
This would have been much more interesting and impactful if it had relied on open source models rather than commercial models that are only available via an API.
The reasoning chains could have been used, and the resulting combined model could easily and effectively have been distilled.
claw-el 15 hours ago [-]
Will Le Chat try to eat Sakana?
There is Le Chaton Fat and then there is Sakana Fugu too..
olmo23 14 hours ago [-]
> Le Chaton Fat
For others looking around: LCF is a meme model, it's not real. It's a joke.
andai 7 hours ago [-]
See also: OpenRouter Fusion, similar idea, although it seems limited to internet research tasks? (Unclear, maybe someone who has used it can elaborate.)
What's nice is that OpenRouter included a pareto graph showing the cost as well as the performance. (But not time, unfortunately -- model fusion adds a large factor to round trip time.) Benchmarks are a lot less helpful without that.
OpenRouter: Surpassing frontier performance with fusion (blog post with benchmarks)
I did my own last weekend in a few lines of Python, though I haven't tested it much yet. (Looking for some very hard, very cheap benchmarks, if such a thing exists!)
adamnemecek 18 hours ago [-]
Seems kinda underwhelming considering they raised like $400M.
ffsm8 17 hours ago [-]
400m is the new 400k!
Just look at the other company evaluations and how much they raised vs what they delivered
itemize123 15 hours ago [-]
it's just one of their products right
nickandbro 19 hours ago [-]
Very interesting. I wonder if it functions similarly to how OpenRouter's fusion API does. Hopefully it doesn't take too long to respond.
Looks like Fusion calls a bunch of models and then uses an LLM to synthesize the results, and pass to another model for final output.
Fugu looks like it's doing something different? Using an LLM earlier on in the flow as an orchestrator to decide which other LLMs to call. More coordinator than simply synthesizing results, and more "agentic".
It's interesting because it's all exposed behind a single OpenAI compatible endpoint (Responses API?) and so then presumably someone could use this for one of their single agents. Now you have agent-of-agents, nested in some sense. The token usage increases accordingly!
panorama22 9 hours ago [-]
Is this the beginning of the Hyperion TechnoCore?
bprasanna 18 hours ago [-]
Isn't this what perplexity is?
JumpCrisscross 13 hours ago [-]
Is Perplexity still a daily driver for a lot of folks?
dancemethis 8 hours ago [-]
Sakana is certainly a choice of a name. In Portuguese, it means anything from "scoundrel" to "sleazebag".
puttycat 18 hours ago [-]
Can someone explain this in layman terms? I don't understand any of it
Basically, if you combine a bunch of near-frontier models (like GPT 5.5, etc) you can get performance that sometimes surpasses top line models like Claude's Fable.
Sakana seems to have a separate approach using a domain specific model to perform the model routing step.
chenzhekl 16 hours ago [-]
But it's priced the same as frontier models. Why do I not directly pay for frontier models?
david_shi 13 hours ago [-]
This is a charitable read, but I think that being able to pick from a panoply of models will actually yield much better results in the long run.
The same model that has been post-trained to operate for hours as a Linux admin will be incapable of writing a heartfelt email, but with something like Fugu, you'd get both the Linux admin for driving the browser harness and the smaller writing specialist model for drafting the email itself.
And it's gonna be like that in this space for the next decade-plus.
rvz 17 hours ago [-]
Just letting you guys know that the model is not a moat.
nixosbestos 18 hours ago [-]
AI noob question, is this like Amp? I just use Amp, I ask it to do neat stuff and it does it. I desperately need to invest in my AI skills but every day I open two new tabs and add it to "AI stuff" folder, and then go back to drowning in work to do.
71bw 15 hours ago [-]
And yet, as per usual...
Not yet available in the EU/EEA while we work toward compliance with GDPR and EU-specific regulations.
lmz 12 hours ago [-]
The price to pay for claiming your laws are applicable worldwide.
17 hours ago [-]
midasdf 8 hours ago [-]
[dead]
thibault00 8 hours ago [-]
[dead]
8 hours ago [-]
ljlolel 18 hours ago [-]
I’ve also developed and open-sourced a Mythos-level model using fusion/synthesis on TrustedRouter
Yeah, I was trying to parse their "defense policy"
https://sakana.ai/company-info/defense-policy.html?lang=en
But it seems like a lot of words to say we have no policy and we'll just go along with the powers that be. They rely on deferring to the pacifist constitution, which the current administration is moving mountains to try and change. And when it does, you can bet they will not want to give up their defense contracts.
nickandbro 15 hours ago [-]
I imagine if it was Deepseek partnering with the CCP it would be different?
chenzhekl 15 hours ago [-]
I was just stating facts about Sakana, and that was enough to trigger you? For the same reason, I don’t use GPT either. At least for now, DeepSeek has no ties to the defense sector. And don’t talk as if the CCP were the devil. The U.S. president is the world’s biggest arms dealer, after all.
[0] https://dev.classmethod.jp/en/articles/sakana-fugu-ga-first-...
https://x.com/cortesi/status/2068898694238486658
* https://github.com/monkeydust/rightmind
https://news.ycombinator.com/item?id=44630724
They randomly alternated between frontier LLMs and got a massive boost to performance on cybersecurity tasks.
Is there any official source that could confirm whether Fable (or Mythos) is parallelized test-time compute (like GPT 5.5 Pro) or a sparse Mixture-of-Experts (MoE) transformer combined with a multi-agent, inference-time compute scaling architecture (Gemini 3.1 Deep Think)?
After a few months of spending money on the best frontier models, now I am spending time using DeepSeek v4 flash as my workhorse, and flipping to more capable (but still very inexpensive) open models on an as-needed basis. We all make our own tool selection decisions, but for me, I feel happier and enjoy working more following the very fast response and ultra low cost path.
https://trustedrouter.com/blog/open-fusion-beats-fable-5
At least, for the initial data gathering phase. You'd probably want a sequence of progressively larger models to filter it.
Have you guys tested it on anything other than research?
If cost becomes an even bigger problem being able to choose "best performance possible" or "strong but cost effective" will be useful.
https://arxiv.org/pdf/2512.04695
EDIT: Found something here https://dev.classmethod.jp/en/articles/sakana-fugu-ga-first-...
The idea is to ask a special orchestrator they built, which sits in front of a bunch of models, which model would suit the request best.
Regular Fugu seems to be just "pick the best model and route the request there"
Fugu Ultra can generate like a little mini workflow/plan instead to achieve a result
1. Ask GPT to derive the math.
2. Ask Opus to check for implementation/security issues.
3. Ask Gemini to synthesize or resolve disagreement.
4. Return final answer.
I could be wrong but seems to be that at a glance, so I think it's more dynamic than OpenRouter Fusion.
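The mini workflow described above can be sketched as a list of steps run in order. This is a guess at the general shape, not Sakana's implementation: `Step`, `run_plan`, and `make_stub` are hypothetical names, the plan is hard-coded rather than generated by an orchestrator, and the "models" are offline stubs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A "model" here is just a function from prompt text to reply text.
Model = Callable[[str], str]


@dataclass
class Step:
    model: str        # key into the model registry
    instruction: str  # what this step asks the model to do


def run_plan(task: str, plan: List[Step], models: Dict[str, Model]) -> str:
    """Run steps in order, feeding every earlier reply into the next prompt."""
    transcript = [f"Task: {task}"]
    answer = ""
    for step in plan:
        prompt = "\n".join(transcript + [f"Instruction: {step.instruction}"])
        answer = models[step.model](prompt)
        transcript.append(f"[{step.model}] {answer}")
    # The last step's reply is returned as the final answer.
    return answer


def make_stub(name: str) -> Model:
    """Stub model so the sketch runs offline; a real one would call a vendor API."""
    return lambda prompt: f"{name} saw {len(prompt.splitlines())} lines"


plan = [
    Step("gpt", "Derive the math."),
    Step("opus", "Check for implementation/security issues."),
    Step("gemini", "Synthesize or resolve disagreement."),
]
models = {name: make_stub(name) for name in ("gpt", "opus", "gemini")}
print(run_plan("example task", plan, models))  # prints "gemini saw 4 lines"
```

Each later step sees the whole transcript so far, which is also why token usage grows with every step added to the plan.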
https://www.databricks.com/blog/introducing-omnigent-meta-ha...
> So basically... openrouter
:skull:
i now really wonder how many people in the audience understood my thesis defense lol
The reasoning chains could have been used, and the resulting combined model could easily and effectively have been distilled.
For others looking around: LCF is a meme model, it's not real. It's a joke.
What's nice is that OpenRouter included a pareto graph showing the cost as well as the performance. (But not time, unfortunately -- model fusion adds a large factor to round trip time.) Benchmarks are a lot less helpful without that.
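For anyone who wants to build the same kind of cost/performance view from their own benchmark runs, here is a small sketch of computing a Pareto frontier. The entries below are invented placeholders, not OpenRouter's numbers.

```python
from typing import List, NamedTuple


class Result(NamedTuple):
    name: str
    cost: float   # e.g. dollars per task; lower is better
    score: float  # benchmark score; higher is better


def pareto_frontier(results: List[Result]) -> List[Result]:
    """Keep results that no other result beats on both cost and score."""
    frontier = []
    best_score = float("-inf")
    # Sweep cheapest first; among equal cost, best score first.
    for r in sorted(results, key=lambda r: (r.cost, -r.score)):
        # Keep r only if it scores higher than everything at least as cheap.
        if r.score > best_score:
            frontier.append(r)
            best_score = r.score
    return frontier


# Made-up data purely for illustration.
results = [
    Result("a", 1.0, 60.0),
    Result("b", 2.0, 55.0),
    Result("c", 3.0, 80.0),
    Result("d", 10.0, 82.0),
    Result("e", 12.0, 81.0),
]
print([r.name for r in pareto_frontier(results)])  # prints ['a', 'c', 'd']
```

Adding round-trip time as a third axis would need a full dominance check across all three dimensions instead of this single sweep.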
OpenRouter: Surpassing frontier performance with fusion (blog post with benchmarks)
https://news.ycombinator.com/item?id=48525392
OpenRouter Fusion API
https://news.ycombinator.com/item?id=48537641
See also: Sibling comment with an open source implementation
https://news.ycombinator.com/item?id=48624782#48629598
I did my own last weekend in a few lines of Python, though I haven't tested it much yet. (Looking for some very hard, very cheap benchmarks, if such a thing exists!)
We open sourced it all
and will be releasing a similar orchestrator next week on TrustedRouter
Looks like Fusion calls a bunch of models, then uses an LLM to synthesize the results and passes them to another model for final output.
Fugu looks like it's doing something different? Using an LLM earlier on in the flow as an orchestrator to decide which other LLMs to call. More coordinator than simply synthesizing results, and more "agentic".
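A rough sketch of the difference between the two shapes as described above, using stub functions in place of real models. Neither function is taken from OpenRouter or Sakana; `fuse`, `orchestrate`, and `pick` are hypothetical names, and the fusion side is simplified to a single synthesizer call.

```python
from typing import Callable, Dict, List

# A "model" here is just a function from prompt text to reply text.
Model = Callable[[str], str]
Picker = Callable[[str, List[str]], List[str]]


def fuse(prompt: str, panel: Dict[str, Model], synthesizer: Model) -> str:
    """Fusion-style: ask every panel model, then have one model merge the drafts."""
    drafts = [f"[{name}] {model(prompt)}" for name, model in panel.items()]
    merged_prompt = prompt + "\n\nCandidate answers:\n" + "\n".join(drafts)
    return synthesizer(merged_prompt)


def orchestrate(prompt: str, panel: Dict[str, Model], pick: Picker) -> str:
    """Orchestrator-style: decide first which models to call, then call only those."""
    chosen = pick(prompt, list(panel))
    replies = [panel[name](prompt) for name in chosen]
    return "\n".join(replies)


# Offline stubs; real models would be API calls.
panel = {"a": lambda p: "draft A", "b": lambda p: "draft B"}
synthesizer = lambda p: f"merged {p.count('[')} drafts"

print(fuse("q", panel, synthesizer))                  # prints "merged 2 drafts"
print(orchestrate("q", panel, lambda p, n: n[:1]))    # prints "draft A"
```

In this sketch fusion always costs one call per panel model plus the synthesizer, while the orchestrator only pays for the models it picks (plus the picking step itself, if that is also an LLM).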
It's interesting because it's all exposed behind a single OpenAI compatible endpoint (Responses API?) and so then presumably someone could use this for one of their single agents. Now you have agent-of-agents, nested in some sense. The token usage increases accordingly!
Basically, if you combine a bunch of near-frontier models (like GPT 5.5, etc) you can get performance that sometimes surpasses top line models like Claude's Fable.
Sakana seems to have a separate approach using a domain specific model to perform the model routing step.
The same model that has been post-trained to operate for hours as a Linux admin will be incapable of writing a heartfelt email, but with something like Fugu, you'd get both the Linux admin for driving the browser harness and the smaller writing specialist model for drafting the email itself.
https://trustedrouter.com/blog/fusion-evals-open-source
https://japannews.yomiuri.co.jp/politics/defense-security/20...
Like every company based in China they are under the control of the Chinese state, which is an armed entity known to use violence.
https://openai.com/index/our-agreement-with-the-department-o...