Geekbench has a history of measuring the OS along with the CPU+compiler combination. For any given x86 machine, scores under Windows are usually notably lower than under Linux. Presumably the Unix-based macOS also looks a bit better here than Windows on ARM.

For 1T, though, that's not quite accurate. For nT, I agree with you: schedulers, power limits, governors, etc. make it a different story.

Geekbench 1T is remarkably consistent across different OSes, which few other modern CPU benchmarks can claim.

From Phoronix's extensive Windows vs. Linux testing:

AMD 7995WX
Windows 11 23H2 GB6.1 1T: 2720 (100%)
Ubuntu 23.10 GB6.1 1T: 2691 (98.9%)

AMD 7800X3D
Windows 11 22H2 GB6 1T: 2666 (100%)
Ubuntu 23.04 GB6 1T: 2689 (100.9%)

AMD 7840U
Windows 11 23H2 GB6.1 1T: 2601 (100%)
Ubuntu 22.04 GB6.1 1T: 2645 (101.7%)

AMD 3970X
Windows 10 1909 GB5 1T: 1316 (100%)
Ubuntu 19.10 GB5 1T: 1337 (101.6%)

Geomean of Ubuntu vs. Windows is within 1% (100.8%). It'd be good to see some Intel and Qualcomm tests added here, but virtually everything I've seen is user-submitted tests, and there's just a significant margin of error there.
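For anyone who wants to check that number, here's the arithmetic as a small Python snippet (a sketch only, using the Phoronix scores quoted above):

```python
# Geometric mean of the four Ubuntu-vs-Windows Geekbench 1T ratios above.
from math import prod

ratios = [
    2691 / 2720,  # 7995WX,  GB6.1 1T
    2689 / 2666,  # 7800X3D, GB6   1T
    2645 / 2601,  # 7840U,   GB6.1 1T
    1337 / 1316,  # 3970X,   GB5   1T
]
geomean = prod(ratios) ** (1 / len(ratios))
print(f"Ubuntu vs. Windows geomean: {geomean:.3f}")  # ~1.008, i.e. ~100.8%
```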
 
This seems very odd. Restricted access to some critical IP, or different priorities in the design?

Qualcomm did aim for many more cores, so perhaps that influenced their uArch decisions. The Snapdragon X Elite has both 1) the highest-IPC uArch in a Windows device and 2) the highest P-core count in a Windows thin-and-light.
 

continuum

Ars Legatus Legionis
94,897
Moderator
Don't Intel laptops already have much better battery life than AMD? Or did things switch around with the latest chip generation?
They have been pretty comparable for the past few years; it comes down to implementation details* more than anything else.


Compare like for like in the Lenovo ThinkPad T16 Gen 2 and, seriously, they're damned close.

* = as we've discussed plenty here on Ars already, there are so many different knobs to turn that can make massive differences…
 

BigLan

Ars Tribunus Angusticlavius
6,907
Thanks, Continuum. I don't pay much attention to the mobile space; the last laptop I bought was a Ryzen 7 4700U, which I think was back when manufacturers were putting AMD in their low-end stuff, which all had pretty poor batteries.

My work-provided stuff has always been Intel, but the endpoint stuff seems to eat battery life compared to reviewed units. I'm currently living with ~4-5 hours of battery life on an HP Studio Z. The i9-13900H performs great, but combined with the RTX card I have no use for, it's a struggle to make it through a half-day meeting. I'd switch to a Mac Pro but am too deep in the MS world between Excel and Power BI.
 
It's the catch-22 of needing to sell hardware now, while the software needs to at least handle the GPU hardware better under Prism emulation and/or the developers of (mostly 3rd-party) apps need to ship mature ARM ports, which many haven't done (or have indicated they won't do for now).

All of that could be fixed in software, but I don't expect miracles to happen overnight.

It's nice that The Verge largely confirmed the Qualcomm and Microsoft promises that other posters here were... ahem... dubious about. Performance does seem to be roughly in the same ballpark as the non-Pro/Max M3 when running native code: faster in multicore, slower in single-core, but not significantly worse. Battery life is roughly equivalent. x64 emulation isn't in quite as good a place under Qualcomm/Windows as it is under Apple Silicon/macOS, but it seems like it's going to improve. It's somewhere in the ballpark of "good enough" for general laptop computing and close enough to a current M2 or M3 MacBook Air in terms of capabilities.

If I'm spending my own money, then giving Microsoft and 3rd-party developers (including Linux devs) time to catch up to the hardware seems prudent, but they do have to sell these in enough quantity to justify making more versions in the future and to further motivate developers to port their software.
 
If I'm spending my own money, then giving Microsoft and 3rd-party developers (including Linux devs) time to catch up to the hardware seems prudent, but they do have to sell these in enough quantity to justify making more versions in the future and to further motivate developers to port their software.
Exactly right. Promising, but too early to spend my own money on one unless Strix Point is disappointing or you strongly prioritize battery life.
 
Exactly right. Promising, but too early to spend my own money on one unless Strix Point is disappointing or you strongly prioritize battery life.

Yeah, but "strongly prioritizing battery life" isn't wrong. Windows users (who could have become Mac users at any point) are now giving Apple Silicon MacBooks a look because the battery life is so good.

Strix Point may get us to the point where a fast CPU arch also delivers a long enough runtime, but for now, pricing the Snapdragon X starting around $1k is a good start, and the cheaper it is, the better the value proposition. Many laptops are already "good enough" for compute, but a long runtime is a selling point. I'm conservative with my own money, but I can see consumers today being a good fit for this product, and I could definitely see businesses and other organizations starting to do pilot programs.
 

continuum

Ars Legatus Legionis
94,897
Moderator
One thousand percent agree: battery life is incredibly important in mobile. The question is whether, what, 20% more battery life is enough to move the needle, or whether people are happy to run in low-performance mode to get something even in the ballpark of Apple laptops.
I know I'm personally an edge case, but as someone who has a high-resolution screen as a requirement (currently on a 14" 4K LED laptop; it looks like 2.8K OLED is going to be my next one, and I'm very curious how the new LG 13" 2.8K OLED will go this fall) and who consistently finds the battery life of laptops with such screens disappointing...

.... yeah, I dunno if 20% more would be enough. Real-world, I get between 2 and 4 hours under my typical workloads; if I could reduce that variance (and, say, get 4 hours dependably), that would be appealing. But that's not quite the same question...
 
IMO, the most compelling benefit of Apple's Mx chips is the rough equivalence to top-bin single-threaded x86 performance at much lower power. You can always prop up your multithreaded numbers by throwing more cores at the problem, but it's much harder to improve ST.

It looks like the Nuvia cores largely replicate that benefit (ST is a generation behind Apple's stuff, but still better than Zen4 mobile) and I'd bet the power numbers will come in at under 10w.

IMO the fact that Apple, Qualcomm, and ARM have all converged on wide, lower-clocked designs is a pretty strong vindication of that approach. Intel and AMD still seem determined to live in the 5 GHz+ regime, and that will become increasingly dubious when their ST lead disappears (or at least becomes marginal).
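As a rough back-of-envelope for why wide-and-slow pays off: a deliberately crude first-order model (illustrative numbers only; it ignores static power, real V/f curves, and the extra area/capacitance a wider core costs), assuming dynamic power ~ C·V²·f and V scaling roughly with f near the top of the curve:

```python
# Two hypothetical cores with equal single-thread throughput (perf ~ IPC * clock):
#   "narrow/fast": IPC 1.0 at 5.0 GHz
#   "wide/slow"  : IPC 1.5 at ~3.33 GHz
# Neither is a real product; the point is the shape of the tradeoff.

def relative_dynamic_power(freq_ghz, ref_ghz=5.0):
    """Dynamic power relative to the reference clock, assuming V scales ~ f,
    so P ~ C * V^2 * f ~ f^3."""
    return (freq_ghz / ref_ghz) ** 3

cores = {
    "narrow/fast": {"ipc": 1.0, "ghz": 5.0},
    "wide/slow":   {"ipc": 1.5, "ghz": 5.0 / 1.5},
}

for name, c in cores.items():
    perf = c["ipc"] * c["ghz"]                 # identical by construction
    power = relative_dynamic_power(c["ghz"])
    print(f"{name:12s} perf={perf:.2f}  relative dynamic power={power:.2f}")
# -> the wide core matches throughput at roughly (1/1.5)^3 ~= 0.30x the dynamic power.
```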
 

fitten

Ars Legatus Legionis
52,249
Subscriptor++
Intel's latest Lunar Lake designs extended the width over previous cores while keeping high clocks (and that will likely be the case for the upcoming P-cores as well). There were studies in the past that examined the diminishing returns of 'wideness', but I can't think of what to search for to find the main one I'm thinking of right now.
 

charliebird

Ars Tribunus Militum
1,894
Subscriptor++
Reviews are trickling out:

I don't agree with the below paragraph from the review at all. Apple is a completely different beast than Microsoft because they fully control both the hardware and the software. Maybe the Windows platform will get stronger with the competition between Arm and x86, but I can't imagine software vendors going wholesale Arm-native anytime soon.


If you don’t venture beyond the top Windows apps, you’ll probably have a great experience like I did in terms of performance and battery life. For anything more, you’ll need to check to make sure your apps are compatible and run well. If this latest Windows on Arm push is as successful as Apple’s M1 silicon, that’s an issue that should eventually disappear.
 
Intel's latest Lunar Lake designs extended the width over previous cores while keeping high clocks (and that will likely be the case for the upcoming P-cores as well). There were studies in the past that examined the diminishing returns of 'wideness', but I can't think of what to search for to find the main one I'm thinking of right now.
For perspective, Lunar Lake is 8-wide now, which is where the M1 was in 2020. IIRC the M4 and Cortex-X925 are 10-wide.

Of course increasing width has diminishing returns, just like increasing structure sizes does, or pretty much any other microarchitectural feature that exists on a continuum, and treating width as a proxy for microarchitectural sophistication is problematic. But judging from results, the optimal point on the brainiac/speed-racer spectrum appears to sit substantially closer to the former than where Intel and AMD's current and upcoming designs sit.

We'll have to see what the power numbers are like for Strix Point and Lunar Lake, but I doubt it'll be a transformative improvement.
 
Of course increasing width has diminishing returns, just like increasing structure sizes does, or pretty much any other microarchitectural feature that exists on a continuum, and treating width as a proxy for microarchitectural sophistication is problematic. But judging from results, the optimal point on the brainiac/speed-racer spectrum appears to sit substantially closer to the former than where Intel and AMD's current and upcoming designs sit.
I wonder if that approach works as well for a variable-length ISA like x86, which has to be decoded sequentially. Intel has made the decoders more and more complex and added bigger and bigger uop caches, but it seems like they're already at the point of diminishing returns. ARM is as well, but since fixed-length decoding can be done in parallel, the costs of going wider on the front end are much lower.
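A toy illustration of that decode asymmetry (not any real ISA's encoding; the length rule below is made up), assuming no predecode bits, length markers, or uop cache:

```python
# Finding instruction boundaries in a fixed-length ISA is embarrassingly
# parallel; in a variable-length ISA, each boundary depends on having
# decoded the length of the previous instruction first.

def boundaries_fixed(code: bytes, inst_len: int = 4) -> list[int]:
    """Every offset is known up front, so all decode slots can start at once."""
    return list(range(0, len(code), inst_len))

def boundaries_variable(code: bytes) -> list[int]:
    """Made-up encoding: the low 3 bits of the first byte give length-1
    (1..8 bytes). The scan is inherently serial: offset N+1 depends on
    the length decoded at offset N."""
    offsets, pc = [], 0
    while pc < len(code):
        offsets.append(pc)
        pc += 1 + (code[pc] & 0x7)
    return offsets

code = bytes(range(64))
print(boundaries_fixed(code)[:6])     # [0, 4, 8, 12, 16, 20]
print(boundaries_variable(code)[:6])  # [0, 1, 3, 7, 15, 23] -- a serial walk
```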

I don't agree with the below paragraph from the review at all. Apple is a completely different beast than Microsoft because they fully control both the hardware and the software. Maybe the Windows platform will get stronger with the competition between Arm and x86, but I can't imagine software vendors going wholesale Arm-native anytime soon.
MS's problem is that they have underinvested in the Windows platform for a long time, while the applications people run are fragmented across a huge number of APIs, from modern WPF, to older but still managed .NET, to things still using decades-old Win32. Operating systems like Android and macOS/iOS are designed to aggressively power-manage applications so as to avoid spending cycles unnecessarily, and to schedule things onto high-power cores only when absolutely necessary. Windows programs (including many of MS's own apps) are more like the Android 1.x days, where apps do whatever they want and the OS counts on them to save power themselves. Everyone else spent the last 15 years figuring out how to solve these problems while Windows stayed in the early 2000s.

I see this every day on my Meteor Lake laptop running the latest Win11: I'll have Word/PowerPoint/Excel, a couple of mostly idle web pages, PDFs open in Edge, and Discord, but the CPU power draw and battery life oscillate wildly because the software doesn't communicate well to the OS how things need to be scheduled. When it works, it's great, but then there are days I'll lose 20% of the battery typing in a Word document over lunch, something I could have done on a Pentium 4. Apple controlling both the hardware and software matters so much less than them having invested enough in just their software platform.
 
Intel's latest Lunar Lake designs extended the width over previous cores while keeping high clocks (and that will likely be the case for the upcoming P-cores as well). There were studies in the past that examined the diminishing returns of 'wideness', but I can't think of what to search for to find the main one I'm thinking of right now.
I remember reading a paper from a millennium long gone, where the authors simulated a "perfect" CPU. It was infinitely wide (i.e. an unlimited number of execution units for each type of instruction), branch prediction always guessed right, all instructions had single-cycle latency (implying that all memory accesses were cache hits), and so on. Their perfect simulated CPU was only limited by causality; i.e. it could not magically guess values before it had actually computed them.

On this simulated processor they ran a workload that was both important and difficult (for real CPUs) to execute quickly: a compiler. Over the complete compiler run, the "perfect" CPU reached an average IPC (instructions per clock) of roughly 2000.


The researcher's next step was to introduce limits into their simulation. The less perfect (but still way beyond realistically feasible) simulated CPU was 2000 wide: up to 2000 instructions could be executed in any single clock cycle, but not more. The rationale was that the first experiment suggested that this should be enough on average to run the compiler near the "causal limit".

So they made a run with the 2000-wide simulated CPU and got an effective IPC of ... drumroll ... eight. Just eight instructions per clock cycle executed on average over the whole compiler run.

On closer inspection, the researchers found that the "perfect" CPU got most of its speed from its ability to look arbitrarily far into the future of the running program. So it found independent work even across compiler phases, and so on; this allowed it to be extremely bursty, with an individual clock cycle potentially executing millions of instructions, preemptively making up for many stalled cycles later on.

The 2000 wide CPU could not come anywhere near such burst benefits.

(I have unsuccessfully tried to find this paper at least four times since I first read it. Sigh.)


BTW, reality has since surpassed even the perfect simulated CPU a little bit. Nowadays we do things like instruction fusion, where (causally) dependent instructions are executed not in subsequent clock cycles but in a single clock cycle; this is actually one clock cycle faster than the perfect CPU above. And our CPUs often have fairly powerful SIMD execution units, which sometimes deliver the performance of a CPU much wider than what we actually have.

In practice, any complicated workload that reaches even an average IPC of 1.0 on a real, 8-wide machine is already fairly rare. Some optimized workloads or very regular algorithms can break an IPC of 2.0 on a real CPU core, but that almost always involves a lot of brain cycles and a lot of work to get there.

The vast majority of program code has never been tuned to that point, and average IPCs below 1.0 are commonplace.
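Out of curiosity, here's a toy sketch in the same spirit (definitely not that paper's methodology; the synthetic trace below is just 2000 independent dependency chains laid out back to back, standing in for "independent work across compiler phases"). Even this crude model reproduces the qualitative result: the dataflow limit is in the thousands, while a bounded window and issue width collapse to single digits.

```python
# Toy ILP limit study: compare the dataflow ("perfect") limit against a
# machine with a finite instruction window and issue width.

def make_trace(num_chains=2000, chain_len=50):
    """Synthetic program: independent chains laid out one after another in
    program order. Each instruction depends on its predecessor in its chain."""
    trace = []
    for _ in range(num_chains):
        start = len(trace)
        for i in range(chain_len):
            trace.append([start + i - 1] if i > 0 else [])
    return trace

def ipc_dataflow_limit(trace):
    """Unlimited width and lookahead, single-cycle ops: runtime equals the
    critical-path length of the dependency graph."""
    finish = [0] * len(trace)
    for i, deps in enumerate(trace):
        finish[i] = 1 + max((finish[d] for d in deps), default=0)
    return len(trace) / max(finish)

def ipc_windowed(trace, window=256, width=8):
    """Only the oldest `window` not-yet-retired instructions are visible,
    at most `width` issue per cycle, retirement is in order."""
    done = {}                      # index -> cycle it executed in
    oldest, cycle = 0, 0
    while oldest < len(trace):
        cycle += 1
        slots = width
        for i in range(oldest, min(oldest + window, len(trace))):
            if slots == 0:
                break
            if i not in done and all(done.get(d, cycle) < cycle for d in trace[i]):
                done[i] = cycle
                slots -= 1
        while oldest < len(trace) and oldest in done:
            oldest += 1
    return len(trace) / cycle

t = make_trace()
print(f"dataflow limit IPC : {ipc_dataflow_limit(t):7.1f}")  # ~2000
print(f"256-window, 8-wide : {ipc_windowed(t):7.1f}")        # single digits
```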
 
I wonder if that approach works as well for a variable-length ISA like x86, which has to be decoded sequentially. Intel has made the decoders more and more complex and added bigger and bigger uop caches, but it seems like they're already at the point of diminishing returns. ARM is as well, but since fixed-length decoding can be done in parallel, the costs of going wider on the front end are much lower.
Yeah, I think this definitely is part of the reason. There's an additional cost in the form of the uop cache that x86 designs pay, but that isn't a complete mitigation because otherwise Intel wouldn't have increased the decode width (there must be enough important cases where the uop cache hit rate is low enough for the decode to occasionally become a bottleneck). And the fact that AMD has moved to clustered decode in Zen 5 is an indicator that this problem is not entirely trivial to solve.

My guess is that the direct impact of this stuff isn't huge, but I'd bet there's an indirect cost as well, that of constraining the possible design space.
 

fitten

Ars Legatus Legionis
52,249
Subscriptor++
I remember reading a paper from a millennium long gone, where the authors simulated a "perfect" CPU. It was infinitely wide (i.e. an unlimited number of execution units for each type of instruction), branch prediction always guessed right, all instructions had single-cycle latency (implying that all memory accesses were cache hits), and so on. Their perfect simulated CPU was only limited by causality; i.e. it could not magically guess values before it had actually computed them.

On this simulated processor they ran a workload that was both important and difficult (for real CPUs) to execute quickly: a compiler. Over the complete compiler run, the "perfect" CPU reached an average IPC (instructions per clock) of roughly 2000.


The researcher's next step was to introduce limits into their simulation. The less perfect (but still way beyond realistically feasible) simulated CPU was 2000 wide: up to 2000 instructions could be executed in any single clock cycle, but not more. The rationale was that the first experiment suggested that this should be enough on average to run the compiler near the "causal limit".

So they made a run with the 2000-wide simulated CPU and got an effective IPC of ... drumroll ... eight. Just eight instructions per clock cycle executed on average over the whole compiler run.

On closer inspection, the researchers found that the "perfect" CPU got most of its speed from its ability to look arbitrarily far into the future of the running program. So it found independent work even across compiler phases, and so on; this allowed it to be extremely bursty, with an individual clock cycle potentially executing millions of instructions, preemptively making up for many stalled cycles later on.

The 2000 wide CPU could not come anywhere near such burst benefits.

(I have unsuccessfully tried to find this paper at least four times since I first read it. Sigh.)


BTW, reality has since surpassed even the perfect simulated CPU a little bit. Nowadays we do things like instruction fusion, where (causally) dependent instructions are executed not in subsequent clock cycles but in a single clock cycle; this is actually one clock cycle faster than the perfect CPU above. And our CPUs often have fairly powerful SIMD execution units, which sometimes deliver the performance of a CPU much wider than what we actually have.

In practice, any complicated workload that reaches even an average IPC of 1.0 on a real, 8-wide machine is already fairly rare. Some optimized workloads or very regular algorithms can break an IPC of 2.0 on a real CPU core, but that almost always involves a lot of brain cycles and a lot of work to get there.

The vast majority of program code has never been tuned to that point, and average IPCs below 1.0 are commonplace.
That's exactly the one I was thinking about, thanks. Once branches, etc. were put into the picture, the IPC was pretty much limited. I also recently saw someone looking at x86 code in general who found that compiler-generated code tends to have a branch around every seven instructions or so.
 
Not sure how great a comparison this is (I think he may have used some x86 apps for a test or two without saying so), but it's an interesting Surface vs. Surface comparison.


When is that guy going to notice that he has Steam running in the background on the Intel?

What I see either way is that my M1 MBP, which I remind you shipped in 2020, still smokes both of these things on the workload that matters to me, namely Speedometer 3.0.
 

fitten

Ars Legatus Legionis
52,249
Subscriptor++
What I see either way is that my M1 MBP, which I remind you shipped in 2020, still smokes both of these things on the workload that matters to me, namely Speedometer 3.0.
That's fine, but lots of us don't care what the M1 MBP does... it doesn't run workloads that matter to us. These aren't being made for or marketed to you guys. It's not like you'd switch to them even if they were significantly better than the M1 MBP :)
 

BigLan

Ars Tribunus Angusticlavius
6,907
I literally couldn't give a shit about gaming on a laptop... yet 75% of these reviewers act as if this is the only thing that matters.
Same. It might give some clue about what potential ARM-based desktop performance would look like if that ever materializes, but my take is that MS has aimed this round of "Windows on ARM" at catching up to Apple's M[x] chips' performance and improving battery life. Gaming needs a discrete GPU and goes through the translation layer, which destroys both of those benefits.
 

fitten

Ars Legatus Legionis
52,249
Subscriptor++
Same. It might give some clue about what potential ARM-based desktop performance would look like if that ever materializes, but my take is that MS has aimed this round of "Windows on ARM" at catching up to Apple's M[x] chips' performance and improving battery life. Gaming needs a discrete GPU and goes through the translation layer, which destroys both of those benefits.

Yep... an ARM-based computer that I could hook up to a dGPU for gaming might be interesting. Put one of those in a box with an RTX 30x0 or 40x0 card and game. I'd even like an ARM mini PC for a Linux box. I have a laptop for work (I actually have a personal laptop as well, but other than powering it up for updates every so often, I haven't really used it in a year... I bought it to take some classes at the local university, but that didn't work out as I had planned, so I have no real use for it), and I do work things on the work laptop (which do not include gaming). I have a gaming PC I play games on (as well as doing various projects for fun... I also have a Linux PC for playing around with).
 
I literally couldn't give a shit about gaming on a laptop... yet 75% of these reviewers act as if this is the only thing that matters.
Gaming on a laptop doesn't make or break the mobile Windows experience. But gaming on ARM will make or break the success of Windows on ARM for consumers at home.

Can Microsoft drag business PCs to ARM? Maybe. Can Microsoft drag home PCs to ARM? Not without games. And MS cannot abandon x86 gaming as long as Proton on Linux exists in its current form and quality.
 
That's fine, but lots of us don't care what the M1 MBP does... it doesn't run workloads that matter to us. These aren't being made for or marketed to you guys. It's not like you'd switch to them even if they were significantly better than the M1 MBP :)
I am not sure what you mean. I always use the machine that runs Chrome the best. If someone sold a faster one I would use it.
 

Paladin

Ars Legatus Legionis
32,552
Subscriptor
I am not sure what you mean. I always use the machine that runs Chrome the best. If someone sold a faster one I would use it.
Chrome? That's your benchmark? Like 'most tabs open wins' or something? :unsure: How are you able to make that kind of distinction when you want to buy something?

Or is this some kind of advanced sarcasm that I am too primitive to comprehend?
 
I literally couldn't give a shit about gaming on a laptop... yet 75% of these reviewers act as if this is the only thing that matters.
High-performance machines make good gaming machines. If it can't game well, it likely won't do anything else high-performance well.

See the Baldur's Gate section, which is massively CPU-limited on any system.