How to optimize a build for a chess engine?

BillFoster

Ars Tribunus Militum
2,221
Okay, sorry for the weird question. But I play a little chess, and one of the things that vaguely annoys me is that chess engines are kinda slow. Suppose I want to analyze a position to a certain depth; it takes me a while to do that. It's not a really long time, but it adds up. So I am planning a new build (to be completed in November or December). My budget goal is $2500, not to exceed $3000. I just bought a 1440p, 120 Hz monitor, so I think that's pretty good and I don't need to upgrade. I also have KBM, etc. from my old box. I don't really expect to need anything except the box itself.

I figure that with that budget, I have some room to splurge on some elements of the machine. I am probably going to go with an RTX 4070 for the GPU, because I do some video work and I've found that CUDA is the fastest hardware acceleration. The CPU I'm not dead set on; either the 13700K or 13900K makes sense to me. I want a -K CPU instead of a -KF to have Quick Sync available, because it does some hardware encoding that NVENC doesn't do.

The big question for me is, does RAM speed matter for chess engines? Is faster RAM going to get me to a desired move depth faster, or will it just be a waste of money? Any idea if 32GB of RAM will be enough, or should I go to 64? If I have to choose between more RAM and faster RAM, which is optimal? (I assume more is always better than fast, but I admit ignorance.)

If it matters, I use stockfish for position analysis.
 

BillFoster

Ars Tribunus Militum
2,221
A bit of Googling turns up nothing about RAM speed. There's general agreement that more RAM is good because it allows a larger hash, but it seems like most people aren't very detailed about this. It's just a "get as much as you can, more is better" kind of thing.

Maybe not surprisingly, a lot of people just brush the question off entirely and say, if you want to do this seriously, you need to buy cloud computing time. And if you're not serious, then it doesn't matter what computer you're running stockfish on because they're all fine. Which is maybe true, but still kind of frustrating.

Well, I guess I'll just get whatever DDR5 seems reasonable in general case benchmarks and assume it's fine. General consensus seems to be that I'll be bottlenecked by threads more than anything else anyway, so maybe it really doesn't matter.
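For what it's worth, the "bigger hash" advice can be turned into a rough number. This is only a sketch under loose assumptions (a transposition-table entry costs on the order of 16 bytes, and you'd want roughly one slot per node searched during a long think; neither figure is official Stockfish guidance):

```python
# Rough hash-table sizing sketch. Assumptions (mine, not official Stockfish
# guidance): a transposition-table entry costs on the order of 16 bytes,
# and you'd like roughly one slot per node searched during a long think.

def suggested_hash_mb(nodes_per_second, think_time_s, bytes_per_entry=16):
    """Smallest power-of-two hash size (in MB) covering one long think."""
    total_nodes = nodes_per_second * think_time_s
    raw_mb = total_nodes * bytes_per_entry / (1024 ** 2)
    mb = 1
    while mb < raw_mb:
        mb *= 2
    return mb

# e.g. ~25 Mnps sustained for a five-minute analysis:
print(suggested_hash_mb(25_000_000, 300))  # 131072 MB, i.e. 128 GB
```

The heuristic is deliberately generous (searches revisit positions, so you can get away with far less), but it shows why "more RAM" is the stock answer: hash demand scales with both engine speed and think time, and in practice you just cap it at whatever free RAM you have.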
 

Anonymous Chicken

Ars Scholae Palatinae
1,134
Subscriptor
General consensus seems to be that I'll be bottlenecked by threads more than anything else anyway, so maybe it really doesn't matter.
Bring on the E-cores! If you're bound on threads, I guess you should grab the 13900 and worry less about the RAM. Nothing like 16+16 threads to mask a bit of memory latency.
 

BillFoster

Ars Tribunus Militum
2,221
General consensus seems to be that I'll be bottlenecked by threads more than anything else anyway, so maybe it really doesn't matter.
Bring on the E-cores! If you're bound on threads, I guess you should grab the 13900 and worry less about the RAM. Nothing like 16+16 threads to mask a bit of memory latency.
If what I'm reading is correct, the 13900k should be an absolute monster for stockfish. And people are saying that now the GPU is going to matter because it'll accelerate the new neural net code in stockfish. So I'm guessing maybe I'll get use out of the 4070 outside of gaming and video rendering?

Man, this is going to be great. Stockfish is going to crush me faster and more efficiently than ever! Not that it was ever in any danger...
 

Anonymous Chicken

Ars Scholae Palatinae
1,134
Subscriptor
General consensus seems to be that I'll be bottlenecked by threads more than anything else anyway, so maybe it really doesn't matter.
Bring on the E-cores! If you're bound on threads, I guess you should grab the 13900 and worry less about the RAM. Nothing like 16+16 threads to mask a bit of memory latency.
If what I'm reading is correct, the 13900k should be an absolute monster for stockfish. And people are saying that now the GPU is going to matter because it'll accelerate the new neural net code in stockfish. So I'm guessing maybe I'll get use out of the 4070 outside of gaming and video rendering?

Man, this is going to be great. Stockfish is going to crush me faster and more efficiently than ever! Not that it was ever in any danger...
Watts per defeat?
 

Jehos

Ars Legatus Legionis
55,555
General consensus seems to be that I'll be bottlenecked by threads more than anything else anyway, so maybe it really doesn't matter.
Bring on the E-cores! If you're bound on threads, I guess you should grab the 13900 and worry less about the RAM. Nothing like 16+16 threads to mask a bit of memory latency.
Related to this, seriously consider going with an AMD CPU instead of Intel. Intel is still focused on getting you the fastest single-thread performance, AMD is focused on giving you lots and lots of the same cores. That plays exactly into how Stockfish works.

From what I can tell, Stockfish will scale up to supercomputers. Per the Wikipedia page, it'll scale to 1024 threads and 32 TB of RAM. What that means for you is you really need to spend your money on cores and RAM. I'm not sure how the model tuning works, but based on those max specs it looks like each thread can conceivably use 32GB of RAM. Obviously that's way more than you'll likely put in your system, but I'd say a 16/32 processor and 64GB or 128GB RAM isn't unreasonable for your use case.
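Those limits make for a quick back-of-envelope (the 13900K/64GB figures below are my own illustration of a realistic desktop build, not from the Wikipedia page):

```python
# Back-of-envelope from the Wikipedia limits quoted above, plus what a
# realistic desktop build gives you (the 13900K/64GB numbers are my own
# illustration, not from the page).
max_threads = 1024
max_ram_gb = 32 * 1024          # 32 TB expressed in GB

per_thread_gb = max_ram_gb / max_threads
print(per_thread_gb)            # 32.0 GB per thread at the engine's limits

build_threads, build_ram_gb = 32, 64   # e.g. a 13900K-class box with 64 GB
print(build_ram_gb / build_threads)    # 2.0 GB per thread in practice
```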

Stockfish is one of those oddball use cases where you really do want what other people would call too much RAM and too many cores.

Incidentally, if you ever decide to REALLY go nuts, you could absolutely drop $5k-6k on a top-end Threadripper system and see real performance gains.

Edit: Dangit, now I'm drooling over the idea of like a 32/64 Threadripper, a quarter terabyte of RAM, and a RTX 4090. Just an absolute beast of a workstation.
 

Jehos

Ars Legatus Legionis
55,555
Here are some benchmarks for Stockfish on Linux. Unfortunately, it doesn't look to have some of the latest processors on it :(

https://openbenchmarking.org/test/pts/stockfish

I am seeing a 7950X, a 12900K, a 5950X, and so forth in that list. What am I missing?
Nothing.

Good to see my guess was confirmed. Top-end Ryzen is the only desktop-class processor that gets you out of that mid tier. Intel doesn't have anything to compete until you get into a Xeon, and even those get smoked by Threadrippers and Epycs.
 

fitten

Ars Legatus Legionis
52,250
Subscriptor++
Here are some benchmarks for Stockfish on Linux. Unfortunately, it doesn't look to have some of the latest processors on it :(

https://openbenchmarking.org/test/pts/stockfish

I am seeing a 7950X, a 12900K, a 5950X, and so forth in that list. What am I missing?

It doesn't have any of the 13x00 processors in the list yet, although those probably won't change the 7950X being at the top of the consumer-level processors. At most I would expect them to challenge the 5950X. So yeah, if you want top-of-the-line consumer right now, the 7950X is your CPU.
 

Paladin

Ars Legatus Legionis
32,552
Subscriptor
If core count is a primary factor in performance, then a workstation-class machine (server in tower form) might be best, if you get one with dual sockets and high-end CPUs. You can get tons of RAM slots that way as well. A used machine should fit the budget easily. Whether it's faster with a single top-end consumer CPU vs. a used machine with two high-end CPUs from a couple years back is another question.
 

BillFoster

Ars Tribunus Militum
2,221
General consensus seems to be that I'll be bottlenecked by threads more than anything else anyway, so maybe it really doesn't matter.
Bring on the E-cores! If you're bound on threads, I guess you should grab the 13900 and worry less about the RAM. Nothing like 16+16 threads to mask a bit of memory latency.
Related to this, seriously consider going with an AMD CPU instead of Intel. Intel is still focused on getting you the fastest single-thread performance, AMD is focused on giving you lots and lots of the same cores. That plays exactly into how Stockfish works.

From what I can tell, Stockfish will scale up to supercomputers. Per the Wikipedia page, it'll scale to 1024 threads and 32 TB of RAM. What that means for you is you really need to spend your money on cores and RAM. I'm not sure how the model tuning works, but based on those max specs it looks like each thread can conceivably use 32GB of RAM. Obviously that's way more than you'll likely put in your system, but I'd say a 16/32 processor and 64GB or 128GB RAM isn't unreasonable for your use case.

Stockfish is one of those oddball use cases where you really do want what other people would call too much RAM and too many cores.

Incidentally, if you ever decide to REALLY go nuts, you could absolutely drop $5k-6k on a top-end Threadripper system and see real performance gains.

Edit: Dangit, now I'm drooling over the idea of like a 32/64 Threadripper, a quarter terabyte of RAM, and a RTX 4090. Just an absolute beast of a workstation.
The big thing is, I want an Intel processor because then I can use QSV to encode more kinds of video.
 

BillFoster

Ars Tribunus Militum
2,221
Ugh. You should run this in the cloud. Lots of cores.
But then I'd have to pay for the cloud compute time, and that introduces costs that I'm not really interested in. I can bundle this into my overall build and then not worry about the cost. I don't want some ongoing thing where I have to ask myself, "Do I really want to analyze this position? How much is it worth to me?" That's just decision fatigue waiting to happen.
 

fitten

Ars Legatus Legionis
52,250
Subscriptor++
General consensus seems to be that I'll be bottlenecked by threads more than anything else anyway, so maybe it really doesn't matter.
Bring on the E-cores! If you're bound on threads, I guess you should grab the 13900 and worry less about the RAM. Nothing like 16+16 threads to mask a bit of memory latency.
Related to this, seriously consider going with an AMD CPU instead of Intel. Intel is still focused on getting you the fastest single-thread performance, AMD is focused on giving you lots and lots of the same cores. That plays exactly into how Stockfish works.

From what I can tell, Stockfish will scale up to supercomputers. Per the Wikipedia page, it'll scale to 1024 threads and 32 TB of RAM. What that means for you is you really need to spend your money on cores and RAM. I'm not sure how the model tuning works, but based on those max specs it looks like each thread can conceivably use 32GB of RAM. Obviously that's way more than you'll likely put in your system, but I'd say a 16/32 processor and 64GB or 128GB RAM isn't unreasonable for your use case.

Stockfish is one of those oddball use cases where you really do want what other people would call too much RAM and too many cores.

Incidentally, if you ever decide to REALLY go nuts, you could absolutely drop $5k-6k on a top-end Threadripper system and see real performance gains.

Edit: Dangit, now I'm drooling over the idea of like a 32/64 Threadripper, a quarter terabyte of RAM, and a RTX 4090. Just an absolute beast of a workstation.
The big thing is, I want an Intel processor because then I can use QSV to encode more kinds of video.

If you want Intel for something specific like that, then the 13900K is the top consumer CPU in that line. The other option would be to see if you can find software that can use either a dGPU (like the 30x0 or 40x0 series from Nvidia) or the iGPU in the 7950X to do what you want.
 

BillFoster

Ars Tribunus Militum
2,221
If you want Intel for something specific like that, then the 13900K is the top consumer CPU in that line. The other option would be to see if you can find software that can use either a dGPU (like the 30x0 or 40x0 series from Nvidia) or the iGPU in the 7950X to do what you want.
I don't know if there's some software I can run on a 4070 to do the stuff that NVENC doesn't do, but I've never heard of such a thing. I know a few people who are in the same position I am and they're using Intel specifically because NVENC doesn't do anything with 4:2:2 color, and QSV will encode it. In the major leagues they just use 4:4:4 and NVENC can do that, but I don't have a camera that shoots 4:4:4. I can do 4:2:2 with an external recorder, or I can do 4:2:0 internally.
 
General consensus seems to be that I'll be bottlenecked by threads more than anything else anyway, so maybe it really doesn't matter.
Bring on the E-cores! If you're bound on threads, I guess you should grab the 13900 and worry less about the RAM. Nothing like 16+16 threads to mask a bit of memory latency.
Related to this, seriously consider going with an AMD CPU instead of Intel. Intel is still focused on getting you the fastest single-thread performance, AMD is focused on giving you lots and lots of the same cores. That plays exactly into how Stockfish works.

From what I can tell, Stockfish will scale up to supercomputers. Per the Wikipedia page, it'll scale to 1024 threads and 32 TB of RAM. What that means for you is you really need to spend your money on cores and RAM. I'm not sure how the model tuning works, but based on those max specs it looks like each thread can conceivably use 32GB of RAM. Obviously that's way more than you'll likely put in your system, but I'd say a 16/32 processor and 64GB or 128GB RAM isn't unreasonable for your use case.

Stockfish is one of those oddball use cases where you really do want what other people would call too much RAM and too many cores.

Incidentally, if you ever decide to REALLY go nuts, you could absolutely drop $5k-6k on a top-end Threadripper system and see real performance gains.

Edit: Dangit, now I'm drooling over the idea of like a 32/64 Threadripper, a quarter terabyte of RAM, and a RTX 4090. Just an absolute beast of a workstation.
Damn, I still wish I had access to that scientific cluster from my first professional IT job, which had very similar specs (1000+ threads, 32 TB RAM). One of the 100Gbps NICs alone probably cost more than a top-of-the-line Threadripper machine. I was allowed to run whatever load I wanted, with as many cores and as much RAM as I wanted, as long as it fit in 48 hours of wall time. It certainly helped to sit across the room from the guy who managed it. :D If I still had access to it, I'd run a few simulations for OP and maybe get a benchmark to put on that website.
 

Paladin

Ars Legatus Legionis
32,552
Subscriptor
Ugh. You should run this in the cloud. Lots of cores.
But then I'd have to pay for the cloud compute time, and that introduces costs that I'm not really interested in. I can bundle this into my overall build and then not worry about the cost. I don't want some ongoing thing where I have to ask myself, "Do I really want to analyze this position? How much is it worth to me?" That's just decision fatigue waiting to happen.
I get that, and agree to a point. Just want to make sure it is clear that you only pay for what you use, not some ongoing subscription whether you use it or not. You could be looking at pennies most months and a few dollars other months. A long way from a $2500 budget. Depends on how much you use, of course.
 

BillFoster

Ars Tribunus Militum
2,221
Ugh. You should run this in the cloud. Lots of cores.
But then I'd have to pay for the cloud compute time, and that introduces costs that I'm not really interested in. I can bundle this into my overall build and then not worry about the cost. I don't want some ongoing thing where I have to ask myself, "Do I really want to analyze this position? How much is it worth to me?" That's just decision fatigue waiting to happen.
I get that, and agree to a point. Just want to make sure it is clear that you only pay for what you use, not some ongoing subscription whether you use it or not. You could be looking at pennies most months and a few dollars other months. A long way from a $2500 budget. Depends on how much you use, of course.
Sure, I agree with that. And I wouldn't build a $2500 box just to run chess analysis. But I'm going to build the box for video editing, and I also do some FPS gaming. So it's going to come in around $2500 anyway. I figured, if I can use it for Chess analysis when I'm not rendering video just because I have some machine down time for whatever reason (and honestly, I'm probably expecting more down time if anything if the 4070 lives up to the hype) then that's cool. And I'm willing to put slightly more money into the box if it makes it significantly more capable at Chess.

Right now for me, that's looking like an upgrade from the 13700k to the 13900k. And that's not like... a ton of money. I think I can still stay within my budget goals with a 13900k.
 

BillFoster

Ars Tribunus Militum
2,221
Okay, I take it back. I take it all back. Building a box to do chess analysis is apparently totally obsolete and dead. I just found out that Chessify.me offers FREE analysis that will do a frickin' 1 million nodes per second. And if I upgrade to even their lowest paid tier ($80/year; not too bad), I can get 10 million nodes per second.

That is like... INSANE. That is insane speed that I can't hope to even get close to on a home box. And surprisingly affordable!

The scariest thing is, you can pay for up to 1 billion nodes/second. That is... so insane. That is so, so insane.
 

Jehos

Ars Legatus Legionis
55,555
Okay, I take it back. I take it all back. Building a box to do chess analysis is apparently totally obsolete and dead. I just found out that Chessify.me offers FREE analysis that will do a frickin' 1 million nodes per second. And if I upgrade to even their lowest paid tier ($80/year; not too bad), I can get 10 million nodes per second.

That is like... INSANE. That is insane speed that I can't hope to even get close to on a home box. And surprisingly affordable!
Er...the Ryzen 9 7950X I was suggesting does 75 million per second. The i9-13900K you were looking at does like 43 million per second. A billion per second is just three dual-socket Epyc servers.

A million per second is like a Raspberry Pi 4.

You really should read over that chart fitten linked.
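To put those nodes-per-second figures side by side (using only the numbers quoted in this thread, not fresh measurements), here's how long a fixed billion-node think would take at each speed:

```python
# How long a fixed billion-node think takes at the speeds quoted in this
# thread (the nps figures are the thread's numbers, not fresh measurements).
budget = 1_000_000_000  # nodes

speeds = {
    "Chessify free tier": 1_000_000,
    "Chessify paid tier": 10_000_000,
    "i9-13900K (quoted)": 43_000_000,
    "Ryzen 9 7950X (quoted)": 75_000_000,
}

for name, nps in speeds.items():
    print(f"{name}: {budget / nps:.0f} s")
```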
 

BillFoster

Ars Tribunus Militum
2,221
Since the embargo is lifted, the 13x00 processors are on the chart now.
I’m surprised… why isn’t the 13900 even more dominant over the 13600? I’d have thought the extra e-cores would make a bigger difference.

EDIT: If anything, I'm once again not sure what to do. Because now it seems like the marginal cost of going from a 13600 to 13900 might not be worth the additional performance.
 

Mister E. Meat

Ars Tribunus Angusticlavius
7,241
Subscriptor
Okay, I take it back. I take it all back. Building a box to do chess analysis is apparently totally obsolete and dead. I just found out that Chessify.me offers FREE analysis that will do a frickin' 1 million nodes per second. And if I upgrade to even their lowest paid tier ($80/year; not too bad), I can get 10 million nodes per second.

That is like... INSANE. That is insane speed that I can't hope to even get close to on a home box. And surprisingly affordable!
Er...the Ryzen 9 7950X I was suggesting does 75 million per second. The i9-13900K you were looking at does like 43 million per second. A billion per second is just three dual-socket Epyc servers.

A million per second is like a Raspberry Pi 4.

You really should read over that chart fitten linked.
Here's a chart with older hardware on it - https://sites.google.com/site/computers ... benchmarks. My old 6600K apparently would do about 10 million nodes per second. Apparently a Samsung Galaxy S4 does about 1 million nodes per second. So yeah, not a lot compared to what any sort of modern machine can do.
 

BillFoster

Ars Tribunus Militum
2,221
Power limitations most likely.

Could also be memory bandwidth, or it could be a combination of both.

Any idea what clock speeds your 9750H is sustaining with stockfish?
According to the Intel Extreme Tuning Utility, I'm getting 6 cores at 3.76 GHz. It is power-limit throttling, which is not really surprising. But it's also running at 89°C, so there's not a ton of thermal headroom left.

I'm not surprised that I'm not getting the same scores that a desktop 9900k would get, but 25%? That seems really low. I'm running my laptop with the fans maxed, and it's on a fan riser to increase airflow.
 

Jehos

Ars Legatus Legionis
55,555
I installed Stockfish on a machine with an Intel 9750H, and I'm only getting about 3.75 million nodes/second. That's obviously good, but why so low? If a 9900K should get 20 million, according to this, why am I getting so much less? I only have 6 cores, okay. But that doesn't explain the performance shortfall alone.
Laptop processor. Those are significantly slower than their desktop counterparts to prevent them overheating and/or sucking down the battery in no time.

It's silly marketing that they even put the same model numbers on them.
 

cogwheel

Ars Tribunus Angusticlavius
6,691
Subscriptor
I installed Stockfish on a machine with an Intel 9750H, and I'm only getting about 3.75 million nodes/second. That's obviously good, but why so low? If a 9900K should get 20 million, according to this, why am I getting so much less? I only have 6 cores, okay. But that doesn't explain the performance shortfall alone.
Laptop processor. Those are significantly slower than their desktop counterparts to prevent them overheating and/or sucking down the battery in no time.

It's silly marketing that they even put the same model numbers on them.
The 9750H uses the exact same core design as the 9900K, on the exact same fab process, and supports the same RAM channel count and official max RAM speeds. Since we know the clock speed BillFoster's laptop runs at with all cores loaded, and the core counts, we've already factored out the power and thermal differences between the CPUs themselves. Our expected performance difference is (4.7 GHz¹ / 3.76 GHz) × (8c / 6c), or 1.67x. The rest of the difference isn't due to "laptop processor"; it's due to the rest of the system design and/or software configuration.


¹ Max all-core turbo for the 9900K, unless overclocked. Even if overclocked, you won't be able to overclock it enough to explain much more of the difference seen.
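That estimate, spelled out as arithmetic:

```python
# The expected-ratio arithmetic spelled out: all-core clock ratio times
# core-count ratio between the desktop 9900K and the laptop 9750H.
desktop_clock, laptop_clock = 4.7, 3.76   # GHz, sustained all-core
desktop_cores, laptop_cores = 8, 6

ratio = (desktop_clock / laptop_clock) * (desktop_cores / laptop_cores)
print(f"{ratio:.2f}x")  # 1.67x
```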
 

BillFoster

Ars Tribunus Militum
2,221
The plot thickens somewhat. I checked threads, and that all seems correct. I'm running the latest version of Stockfish (downloaded directly from the website). I'm not running anything significant that should take up that many clock cycles (just some windows of Chrome).

But I was able to get my machine up to a hair over 5 million nodes/second by increasing the size of the hash table. Now Stockfish is maxing out my RAM and running 15-20% faster. That's great, but it's still not where the benchmarks say it should be, so I'm still kind of confused there.
 

BillFoster

Ars Tribunus Militum
2,221
An update: I built a new box. 13900k, RTX 3080, 64GB of DDR5-4800. I can get right around 27 million nodes/second in Stockfish 15.1, which feels pretty damn insane. It's still bizarrely not that much better than the 20 million nodes/second that a 9900k should be getting. But this is a brand new box, with the OS freshly installed. I'm not sure what's going on, but it seems like the benchmarks being reported are somehow out of alignment with how I'm doing analysis.

In any event, I suppose I'm happy enough. This new box analyzes positions like a fiend.
 