Hey, if Apple can continue to compare Apple Silicon Macs to random, ill-defined "PCs," then turnabout is fair play.
Apple and Qualcomm can (and will, and do) play whatever games they choose. This is to be expected from first-party performance claims.
Hmmm... Re: "This thing toasts the M3 MacBook Air" ... this seems a conveniently selective comparison. How does this compare to chips with a comparable core count, like the M3 Pro? Why do these benchmarks include CPU comparisons to Apple Silicon, yet omit a similar comparison for graphics?
Hats off to Qualcomm Marketing.
Yeah it’s also a clickbait framing by the reviewer and headline writer. Just going by his own testing results that he provides, the M3 is significantly faster (25%) in single thread and a bit slower (about 16%) in multi thread, with a lower performance-core count. “Smokes” the M3… that’s just dishonest.
I dunno. It’s a 15” laptop that costs less than a 15” MBA, and has comparable battery life and weight. That seems like the correct compare.
Re: "This thing toasts the M3 MacBook Air" ... this seems a conveniently selective comparison. How does this compare to chips with a comparable core count, like the M3 Pro? Who do these benchmarks include CPU comparisons to Apple Silicon, yet omit a similar compare re: graphics?
Hats off to Qualcomm Marketing.
What we all should expect from external media and reviewers is to take a skeptical eye toward those claims, offering their readers/viewers a deeper, more considered, and independent look. Ideally one that goes beyond the first-party briefing materials and talking points.
That said, it only wins in multicore while the MBA is well ahead on single core performance so ‘toasts’ is… what’s the word I’m looking for???… oh… bullshit.
The hiker doesn’t have to outrun the bear, just the other hikers.
Re: "This thing toasts the M3 MacBook Air" ... this seems a conveniently selective comparison. How does this compare to chips with a comparable core count, like the M3 Pro? Who do these benchmarks include CPU comparisons to Apple Silicon, yet omit a similar compare re: graphics?
Hats off to Qualcomm Marketing.
So far it seems like better temperature, battery life, and performance altogether; Intel or AMD can eke out more performance, but temperature and battery life suffer. Models that match battery life seem to have lower performance:
A new dawn for laptops? I’ve tested the Asus Vivobook S 15 (S5507) and there’s no doubt in my mind | Expert Reviews
The first Snapdragon X Elite laptop has amazing battery life, a superb screen and great performance – Windows ARM laptops are here to stay (www.expertreviews.co.uk)
I've read that AMD are considering ARM-based laptop offerings as well.
I think it'll probably hinge to quite a significant degree on ARM software availability. Electron apps that still require x86 sounds painful.
Electron apps are painful, regardless of platform. It seems unstable and crash prone (in my experience).
They've been threatening to do that on and off for years; it hasn't ever really gone anywhere. It might be the case, though, that between Qualcomm, and probably Nvidia if they ever need revenue streams other than the gravy train du jour, they take the hits to improve compatibility, allowing AMD to slide in relatively easily with the hard work already done. And if AMD wants to get the ball rolling, I think they can probably do it quickly simply by licensing the ARM cores for the first generation.
The majority of AMD's laptop SoCs, certainly the low-power variants, are monolithic dies – so this likely would not be a chiplet play. I'd be curious about the extent to which AMD could simply 'drop in' ARM cores and keep the bulk of the remaining blocks (GPU, NPU, decode, etc.) relatively untouched.
If they were to use chiplets they could replace an existing CCD with an ARM CCD.
This thought exercise is about an ARM-based laptop chip, and AMD has traditionally chosen monolithic dies for its low-power laptop offerings.
In other words I’m not sure why you think this wouldn’t be a chiplet. The issue really is whether they have a superior in-house core, or whether they will use stock ARM (assuming this happens).
I mean it would take a new SoC but those other things would be reusable.
Totally. Just wondering what the effort might be accounting for endianness, changes to the fabric between blocks, etc. Maybe it's minimal. How much would the memory controllers need to change? Stuff like that.
Like MacOS, Windows on ARM runs little-endian.
Ah, I had assumed they were going to target the 10W-60W range, not only the low power designs.
I'm 99% sure it will be Cortex-X925, which will be competitive with anything else around.
I thought 10 wide was crazy a decade ago. Man how technology progresses.
Do we know how the Qualcomm chips compare on price to current AMD and Intel offerings? I would expect ARM to start replacing x86 in laptops, but obviously pricing would be a big factor in whether that happens, and how quickly. Would AMD be able to make ARM chips more cheaply than they make x86 CPUs? I assume just licensing ARM designs would be cheaper than rolling your own, but if you’re AMD selling standard designs, you’re also sliding further down the value chain, at risk from any outfit that manages to bang out the same designs a bit cheaper.
Well now there’s this…
Researchers upend AI status quo by eliminating matrix multiplication in LLMs
Running AI models without matrix math means far less power consumption—and fewer GPUs? (arstechnica.com)
A few highlights:
In the paper, the researchers mention BitNet (the so-called "1-bit" transformer technique that made the rounds as a preprint in October) as an important precursor to their work. According to the authors, BitNet demonstrated the viability of using binary and ternary weights in language models, successfully scaling up to 3 billion parameters while maintaining competitive performance.
The researchers' approach involves two main innovations: first, they created a custom LLM and constrained it to use only ternary values (-1, 0, 1) instead of traditional floating-point numbers, which allows for simpler computations. Second, the researchers replaced the computationally expensive self-attention mechanism in traditional language models with a simpler, more efficient unit (that they called a MatMul-free Linear Gated Recurrent Unit—or MLGRU) that processes words sequentially using basic arithmetic operations instead of matrix multiplications.
These changes, combined with a custom hardware implementation to accelerate ternary operations through an FPGA chip, allowed the researchers to achieve what they claim is performance comparable to state-of-the-art models while reducing energy use.
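To make the excerpt above concrete, here is a rough numpy sketch of the general idea, my own illustration rather than anything from the paper (the function names and the absmean-style quantization are assumptions borrowed from the BitNet line of work): once the weights are constrained to {-1, 0, 1} plus a single scale factor, a "matmul" against them reduces to additions and subtractions.

```python
import numpy as np

def quantize_ternary(w, eps=1e-5):
    # BitNet-style "absmean" quantization (illustrative): scale by the
    # mean absolute weight, then round and clip every entry to {-1, 0, 1}.
    scale = np.mean(np.abs(w)) + eps
    return np.clip(np.round(w / scale), -1, 1).astype(np.int8), scale

def ternary_linear(x, w_t, scale):
    # Equivalent to x @ (scale * w_t), but since every weight is -1, 0 or +1,
    # each output is just (sum of inputs with weight +1) minus
    # (sum of inputs with weight -1): no multiplications by weights.
    out = np.zeros((x.shape[0], w_t.shape[1]), dtype=np.float32)
    for j in range(w_t.shape[1]):
        out[:, j] = x[:, w_t[:, j] == 1].sum(axis=1) - x[:, w_t[:, j] == -1].sum(axis=1)
    return scale * out

# Sanity check against an ordinary matmul over the same quantized weights.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16)).astype(np.float32)
w_t, scale = quantize_ternary(rng.standard_normal((16, 8)) * 0.1)
assert np.allclose(ternary_linear(x, w_t, scale), x @ (scale * w_t), atol=1e-4)
```

A real kernel would keep the ternary weights packed (two bits each) and vectorize the adds; the paper's MLGRU replacement for attention isn't sketched here.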
Yes? Why do you bring this up?
As a follow-up to the discussion of low bit-depth model research back in April.
I find the model research especially fascinating now that the PC industry is on the verge of rolling out NPUs to the Windows space… just as strong indicators emerge that current NPU architectures could soon be rendered obsolete.
I don’t think obsolete is the right way to think about it. Over time new computing modalities tend to be additive. If trinary LLMs take off, those matmul units aren’t going to be obsolete - they’re just not going to be used for that type of LLM. Someone will find something else they’re optimal for (I mean matmul is a fundamentally pretty useful thing to be able to do.) And if I were a betting man I’d bet that someone will come up with a mixed model that uses both matmul and trinary to advantage.
Additive, sure. Obsolete in the sense that they’ll be missing optimizations to run these newer, wildly more efficient models well – e.g. what the Santa Cruz team currently has implemented in FPGA. Hybrid approaches may exist, but if the all-ternary approach proves applicable to many types of models (that’s a giant IF), I could easily imagine an Apple jumping ship completely. The power and memory savings would be too dramatic for them to ignore.
That’s like saying a 20xx GPU is obsolete because of a 30xx GPU.
Ternary logic doesn’t mean abandoning old HW. Notice in the linked research paper how they were able to run their ternary neural net on an NVIDIA GPU?
You can perform ternary logic using a binary system, the same way you can do decimal notation or floating point. Rather than -1, 0, and 1 you use two bits and map the values to 00, 01, and 11. In other words INT2 is a valid substitute for true ternary HW.
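A throwaway sketch of that packing, just to make it concrete; the 00/01/11 code assignment below is an arbitrary choice of mine, not any real hardware's INT2 format:

```python
ENCODE = {0: 0b00, 1: 0b01, -1: 0b11}   # 0b10 simply goes unused
DECODE = {code: value for value, code in ENCODE.items()}

def pack_ternary(values):
    # Four ternary weights fit in one byte at two bits apiece.
    packed = bytearray()
    for i in range(0, len(values), 4):
        byte = 0
        for j, v in enumerate(values[i:i + 4]):
            byte |= ENCODE[v] << (2 * j)
        packed.append(byte)
    return bytes(packed)

def unpack_ternary(packed, count):
    # Inverse of pack_ternary: recover `count` values from {-1, 0, 1}.
    values = []
    for byte in packed:
        for j in range(4):
            if len(values) == count:
                return values
            values.append(DECODE[(byte >> (2 * j)) & 0b11])
    return values

weights = [1, -1, 0, 0, -1, 1, 1]
assert unpack_ternary(pack_ternary(weights), len(weights)) == weights  # 7 weights in 2 bytes
```

That works out to a quarter of a byte per weight, versus two bytes each at FP16, before any of the arithmetic savings.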
And if there is a gap until INT2 is HW accelerated, Apple and Nvidia both already have INT4 in their latest designs.
… it looks like Nvidia have made up their minds about how to build a more efficient next-generation Nvidia GPU (an obsolete name when graphics barely gets a second thought now).
No. It’s like saying a 10xx GPU is obsolete because you want to use raytracing.
The NPU accelerates all kinds of models for all kinds of uses - of which LLMs are just one. We’re deep into diminishing returns of throwing transistors at single threaded performance. There’s no pressing need to radically increase the transistor count in the GPU. And process node improvements are giving Apple (and everyone else) more and more transistors to play with.
Of course. That’s a backwards compatibility observation/argument. Apple ‘abandons’ old HW all the time, they just do so on a 6-7 year timetable.
If Apple found they needed to migrate their NPU architecture, they would just do so. The Neural Engine is a black box, and Apple controls every mechanism that interfaces with it. Their own system models would be migrated immediately, and the only compatibility ‘required’ would be third-party apps with their own CoreML models. With deprecation and developer outreach these could be moved in as soon as two OS revisions. IMO
Why do you compare BitLinear to raytracing?
That seems a non sequitur here. Any implementation of BitLinear is likely to be on vanilla NVIDIA HW because that’s the HW that exists today, and leaves a window open for future HW to improve its performance and energy efficiency.
I’m saying that the whole point of NVIDIA’s GPGPU/CUDA focus for the past 17 years is that they would create a new CUDA API and library, first, to support it and then release new HW in two years to accelerate it even more. Nothing is obsoleted, because the existing HW is in fact capable of running a BitLinear network.
All of this adds up to replacing matmul with ternary, instead of just adding it on, being extremely unlikely.
To reinforce what ev9_tarantula just said, when announcing Blackwell, NVIDIA used INT4 as its performance metric to show how much faster than Hopper it was. This was because Hopper only goes down to INT8. This ternary approach is basically INT2 (yeah, yeah, oversimplification I'm sure), so that could be NVIDIA's next evolution.
It's kind of wild that the researchers allocated three bits here to represent -1, 0, and 1.
NOTE: Today I learned INT4 means 4 bytes and INT1 means 1 byte, so really we need a different term than INT2 to specify 2 bits.
No it doesn't.
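For what it's worth, in the GPU/ML usage of these names (unlike, say, SQL's int4) the number is a bit width, not a byte count: INT8 is an 8-bit integer and INT4 a 4-bit one, two of which share a byte. A quick illustration (my own, not any particular library's layout):

```python
def pack_int4_pair(a, b):
    # Two signed 4-bit values (range -8..7) share a single byte.
    assert -8 <= a <= 7 and -8 <= b <= 7
    return ((a & 0xF) << 4) | (b & 0xF)

def unpack_int4(nibble):
    # Sign-extend a 4-bit two's-complement value back to a Python int.
    return nibble - 16 if nibble & 0x8 else nibble

byte = pack_int4_pair(5, -3)
assert (unpack_int4(byte >> 4), unpack_int4(byte & 0xF)) == (5, -3)
```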