The Lunar Lake Thread

redleader

Ars Legatus Legionis
35,019

30% faster for 20% more power ... meaning that if a load needs P-cores, you can boost your energy efficiency by ~10% with hyperthreading. That is a big deal for a server, where energy efficiency matters more than saving a little bit of silicon. The real-world difference is likely larger, both because servers see longer memory access times to DRAM and cache and because these are marketing slides meant to argue against SMT.
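For anyone who wants the arithmetic behind that ~10%, taking the slide's +30% throughput for +20% power at face value and assuming both SMT threads stay loaded:

(perf/W with SMT) ÷ (perf/W without SMT) = 1.30 / 1.20 ≈ 1.08

i.e. roughly 8-10% more work per joule for the same job, before the memory-latency effects mentioned above.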
 
  • Like
Reactions: continuum

hobold

Ars Tribunus Militum
2,657
Big.little has easily won this race. Apple/Arm does this well,
Again, not a clear-cut winner. The big.LITTLE approach uses less silicon area than P-cores with SMT2, but - at least in theory, with everything else being equal (i.e. not comparing different processors with different ISAs made on different generations of silicon) - the fat P-cores have higher per-thread performance across a larger range of thread counts, and still deliver very competitive throughput once the secondary SMT threads are added in.

Apple silicon is looking good mostly because Apple is always exclusively on TSMC's best node; the gap narrowed significantly when Ryzen mobile 7000/8000 jumped to TSMC 4nm. (Then there are all the different design trade-offs and targets, where AMD and Intel would rather melt their CPUs while Apple would rather ditch active cooling.)

It's just like the good ol' times when Intel won by default by having the best transistors. Now Apple has played that game for a few years, but that won't last forever. Intel can afford TSMC's best, too, and has swallowed its pride. Even AMD is finally rich enough to buy TSMC's best. But we customers might not actually be rich enough, at least not in large enough numbers, so AMD will use those cutting-edge wafers for Epyc and AI accelerators. Without Apple's reality distortion field, there might not be enough customers willing to pay twice the price for half the RAM and half the mass storage.

Lunar Lake will be interesting in that it is a serious attempt to aim an x86 design at a very different product target. I am not sure if Microsoft can do their part on the software side to really challenge Apple, but we'll see.
 
  • Like
Reactions: EnglishBob

w00key

Ars Praefectus
5,907
Subscriptor
Apple silicon is looking good mostly because Apple is always exclusively on TSMC's best node
Hahahahaha. Wow.

I'm not going to repeat all the analysis done since the M1 surprised the world, but here's a thread with the major points: https://news.ycombinator.com/item?id=25257932

Note that IPC doesn't scale with the fab node; it comes purely from design choices. A massive out-of-order chip doesn't suddenly appear just because you pay for 3nm while others are on 4nm.
 

hobold

Ars Tribunus Militum
2,657
Note that IPC doesn't scale with the fab node; it comes purely from design choices. A massive out-of-order chip doesn't suddenly appear just because you pay for 3nm while others are on 4nm.
That's not what I wrote, not what I implied, and not what I meant.

Faster, smaller, lower-power transistors are a tide that lifts all boats equally. For a long time, Intel was in the lead. Then TSMC took the lead, and Apple alone paid up to enjoy the benefits exclusively.
 

redleader

Ars Legatus Legionis
35,019
IIRC, they developed a new AVX (AVX-10) that does all of the things AVX-512 does, but works better with their big-little CPUs....
Don't know if they're including that on Lunar Lake (at least).
These don't support AVX-512 or AVX10. Arrow Lake does support AVX-512/AVX10-512, but it will probably be disabled on the consumer versions of the chip since it doesn't yet work with the E-cores. My guess is that Panther Lake is the first time the E and P cores support the same AVX flavors, although Intel's plans are clear as mud.
 

Aeonsim

Ars Scholae Palatinae
1,057
Subscriptor++
IIRC, they developed a new AVX (AVX-10) that does all of the things AVX-512 does, but works better with their big-little CPUs....
Don't know if they're including that on Lunar Lake (at least).
From what I remember of the technical commentary, AVX10 just provides a standard way to bundle all the various AVX/AVX-512 features together, with a guarantee that if you have a specific AVX10 level you will always have a known range of features. AVX10 also allows different physical implementations, so you can have two AVX10 CPUs with massively different performance characteristics at the same clock speed: one could be AVX10/128 (a 128-bit back end for the AVX extensions) and the other AVX10/512 (full 512-bit AVX-512). They should both support the same functions, but one might be 4x faster than the other.

Chips and Cheese goes into it in far more detail, in particular the issues with AVX10/128, which targets 128-bit state but still requires some 256-bit support for AVX2 backwards compatibility.
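For a rough sense of what "same functions, different width" means in practice, here's a minimal sketch using plain SSE and AVX-512F intrinsics as stand-ins for the 128-bit and 512-bit ends of the length range (the function names and toy loop are mine, not from the spec or the article):

```c
/* Sketch only: both functions perform the identical element-wise add; the
 * 512-bit one simply handles 4x as many floats per instruction, which is
 * roughly where the "same functions, up to 4x faster" gap between an
 * AVX10/128-class and an AVX10/512-class part would come from.
 * Build with e.g. gcc -O2 -mavx512f; n is assumed to be a multiple of 16
 * to keep the example short. */
#include <immintrin.h>

void add_f32_128(const float *a, const float *b, float *out, int n) {
    for (int i = 0; i < n; i += 4)        /* 4 floats per 128-bit op */
        _mm_storeu_ps(out + i,
                      _mm_add_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
}

void add_f32_512(const float *a, const float *b, float *out, int n) {
    for (int i = 0; i < n; i += 16)       /* 16 floats per 512-bit op */
        _mm512_storeu_ps(out + i,
                         _mm512_add_ps(_mm512_loadu_ps(a + i),
                                       _mm512_loadu_ps(b + i)));
}
```

Same operations at the source level; the AVX10 level only changes how wide each of those operations is allowed to be.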
 

redleader

Ars Legatus Legionis
35,019
So you can have two AVX10 CPUs with massively different performance characteristics at the same clock speed: one could be AVX10/128 (a 128-bit back end for the AVX extensions) and the other AVX10/512 (full 512-bit AVX-512). They should both support the same functions, but one might be 4x faster than the other.
The 128/256/512 level is about what the front end knows how to decode. If an AVX10/256 instruction shows up on a /128 CPU, the front end doesn't understand it and you crash. The back end is a separate question: Intel's E-cores going forward are going to have a 512-bit-wide back end, but they might still only support AVX10/256.
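The practical consequence of "the front end doesn't understand it and you crash" is that portable binaries gate wide code paths behind a runtime check. A hedged sketch of that pattern is below; the kernel stubs are placeholders, and I'm using the classic GCC/Clang feature-bit helper rather than spelling out AVX10's own CPUID level/width reporting.

```c
/* Dispatch sketch: never call code compiled for a wider vector length than
 * the CPU advertises, or the first such instruction faults (SIGILL).
 * __builtin_cpu_supports() is the GCC/Clang helper for the legacy
 * AVX/AVX-512 feature bits; the three kernels are placeholder stubs. */
#include <stdio.h>

static void kernel_512bit(void) { puts("running the 512-bit build"); }
static void kernel_256bit(void) { puts("running the 256-bit build"); }
static void kernel_scalar(void) { puts("running the scalar fallback"); }

int main(void) {
    __builtin_cpu_init();                    /* populate the feature flags */
    if (__builtin_cpu_supports("avx512f"))
        kernel_512bit();
    else if (__builtin_cpu_supports("avx2"))
        kernel_256bit();
    else
        kernel_scalar();
    return 0;
}
```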