Which new computer language for fun?

ShuggyCoUk · Mar 18, 2024

koala said:
OK, my definition of "hiding remote" is different from "the name makes it clear it's a remote call".

It's not a definition. The clincher parts are the important bits, but names always matter.

koala said:
I did mention paging in my post, but really I think paging is typically an antipattern somewhere (either in your code, or in the API itself).

Nope, pagination is batching. It could be done really badly of course but it's a remote call to some other system that has to maintain resources on holding open some arbitrary length reply to your request. There are costs to that and pushing those to the client is normally a good idea to scale because it's normally cheap for the client (they only have one or few such requests so trivial data structures just work) and if the client stops/forgets there's no need for the server to have any timeout implementation/reaping.
Batching in distributed systems is normally a big win in API design. The smallest unit of work to the domain may not match the most efficient unit of request/response.
Also pagination shows you when your failure can occur on the boundary not at some arbitrary point calling Next on an iterator that is hiding this. You want to wrap the result in some code that de-paginates, cool go ahead. But apis that just hide that so you have to just deal with it are bad.

koala said:
I have a similar opinion wrt. to error handling/retries, although in principle, I prefer non-API specific abstractions (e.g. in Python using something like https://pypi.org/project/retry/ ), instead of each API client having its own mechanism. However, I'm undecided on this, because I see some drawbacks to this approach and I haven't played enough in this space to have a clear opinion.

I would hope that the plugin model on githubs api allows for using retry with it. In languages with a sane error handling path (exceptions are one such, but consistent ResultOrError return types also work - which is why you want them baked into the standard library and APIs up front) the 'plugin' is right there and should be minimal/no effort for APIs to integrate with.
Designing things to be idempotent (either naturally or through techniques like one shot request identifiers) makes such designs simpler

koala · Mar 18, 2024

Pagination is EXTREMELY hard to get right, in the face of modifications. So if you grab page one of a query, someone makes a modification... that can invalidate what "page 2" is. In theory, if you want to return consistent results, as soon as someone makes a paginated query, you should get some kind of snapshot of the state, so you can return consistent results.

I guess most APIs just ostrich this, and make everything using paging unreliable (e.g. you might miss items).

And likely, making things consistent is just harder than sending out all results in one batch. At some scales, I think making 10 requests of size 50 is more expensive for everyone, that making a single request of size 500 (which is not that much!).

...

My conflict with a generic retry function is for instance keeping sessions. If I implement a generic retry function, then it's complex that this integrates easily and properly with the underlying state of the function being retried (e.g. preserving a session across requests).

ShuggyCoUk · Mar 18, 2024

koala said:
Pagination is EXTREMELY hard to get right, in the face of modifications. So if you grab page one of a query, someone makes a modification... that can invalidate what "page 2" is. In theory, if you want to return consistent results, as soon as someone makes a paginated query, you should get some kind of snapshot of the state, so you can return consistent results.

Anything is hard to get right in the face oof modifications...
In dotnet for example if something modifies a list during iteration it throws on the next iteration.

Pretending that can't happen, or forcing the server to try to pretend it hasn't is a massive cost (potentially a long running database transaction in the worst case).

If you have snapshot isolation, or a bi-temporal structure of some other form, then pagination can become comparatively simple. The follow up requests indicate which snapshot/point in time they are from and can be reliably sourced. Attempts to access a snapshot thats aged out in some way result in a clean failure telling you why.

koala said:
I guess most APIs just ostrich this, and make everything using paging unreliable (e.g. you might miss items).

if you arrange the pagination on something with a decent ordering (say creation in a single database) then you can avoid this, more comples systems can solve this under sharding too but the request state increases in size and complexity). This then is designing the API to conform to a contract that can be effectively correct. It pushes any ordering on the results to the client if it wants to reorder them in some way, it's all a trade off.

koala said:
And likely, making things consistent is just harder than sending out all results in one batch. At some scales, I think making 10 requests of size 50 is more expensive for everyone, that making a single request of size 500 (which is not that much!).

Absolutely - someone choosing the batch size badly just make sit worse. This stuff is hard which I think is making my point.
Go slow clients shafting the server are a thing if you don't though.

koala said:
My conflict with a generic retry function is for instance keeping sessions. If I implement a generic retry function, then it's complex that this integrates easily and properly with the underlying state of the function being retried (e.g. preserving a session across requests).

Indeed, if there's session level state nothing generic will solve it. But session level state is hard and definitely shouldn't be hidden away without making the session the thing which is retired (which is not amenable to simple stack bound based semantics).
This is where you start making decisions on whether the potential performance gains of a session level API (like NFS4) outweigh the simplicity of something like NFS3) it's a balance, anything hiding that away is likely to be either a source of problems, or a source of poor performance.

In distributed NAS land there's a bunch of people/groups who stick to NFS3 because it can scale in certain ways more pleasantly (since that session level state isn't something that needs to be distributed)

ShuggyCoUk · Mar 18, 2024

I think file systems are a brilliant example of how trying to pretend something is like a local one is both hugely useful and hugely flawed

It's hugely useful because things JustWork (good or bad) your UX is exactly the same (Windows Explorer/Nautilus/Whatever you like-ish) and there's an API for it baked into anything with a filesystem IO layer (the point is that your OS normally does the magic to convert it into the needed NFS/SMB/Whatever protocols).

But now some operations that are stupidly cheap locally (in part due to incredibly aggressive caching) become really onerous for the NAS (directory listing, file locking¹). NFS clients end up trying to cache things to make up for this - but now you have a split brain problem that locally you do not in almost all circumstances (because the filesystem driver can mediate most such access and locks work very cheaply - and tied to process life)

So for a worked example readdir and readdirplus are the best concrete examples I know of where, if something is written to use the former then locally it tends to be fine. Perhaps a minor increase in kernel transitioons sometimes, but the local filesystem will likely agressively cache the needed info (even just having the relevant parts of the backing tree paged in, or in a CPU cache already helps without any actual attempt to code it in the underlying driver). As a result some libraries/apps don't update (because it's a mild faff and they see no difference in their tests because they don't care about some complex distributed NAS user experience). Then you use it where directory listing is expensive and complex and really wish someone had moved it to readdirplus for both the batching, and the additional info returned rather than being a separate fstat or whatever.

I think there's been multiple python libraries which had to be changed to make sure they used the right one to acvoid having awful IO perf (that impacts other users of the system)

1. a mess on NFS anyway, they are generally a nightmare conceptually and in the concrete implementations because you need the client side to respect timeouts.

koala · Mar 18, 2024

Yeah, what I'm saying is that for most use cases, likely increasing page size is the best option- and it actually "hides" even more remoteness/complexity.

Because likely if you need to get so many items from an API call that paginating is worth it... likely your API is not the correct API.

ImpossiblyStupid · Mar 18, 2024

Apteris said:
Abstractions leak, and network calls are quantitatively different enough from local calls that they become qualitatively different. Nothing on the local machine is going to hang for 30-60s before returning a response.

But at some level of abstraction, why should you or the code care? Which is to say, I have absolutely had machines start thrashing when they were low on memory and take 5m to do something that would normally take 5s. Instead of handwaving the issue out of existence, just define how you're going to handle it. I'm going to handle it by waiting (or having a timeout if it matters), and that is why I'm not sold on your argument.

Apteris said:
When I program locally, my expectation is that most individual calls will complete within a few dozen milliseconds, at most. When I call external resources, I have to think explicitly about latencies, be they also small (e.g. when calling a well-optimized database) or very large (e.g. REST calls). Putting those two regimes behind an abstraction that purports to treat them the same is... questionable.

Yes, questionable. Questionable and fun! The name of the game is fun new languages, after all, which kind of morphed to language features, and again to experimenting with those kinds of features in a language you already know. You're welcome to play around in different ways, but I personally just get a little sad when I pull up a process monitor and see 8 cores on my computer just sitting there unused by any process, local or remote.

ImpossiblyStupid · Mar 19, 2024

ShuggyCoUk said:
Designing your system to have those concepts is really quite hard (I've had such fun showing people how to achieve number 1).

I never said systems should be (re)designed to function that way! In fact, I said just the opposite. The challenge was to add concurrency + RPC without having to change the code you already have.

ShuggyCoUk said:
You don't add in those capabilities unless someone wants them and is willing to pay for them

Or unless you're looking to have fun!

ShuggyCoUk said:
Compute clusters are so much cheaper than they were but they still cost a lot of money and you don't opt into that at small scale unless you want to waste a bunch of time and effort and money because of all the frictions created.

Somebody isn't having . . . fun!

ShuggyCoUk said:
That's a foolish description of the problem. Anyone suggesting that as-is in business would be laughed out of a room.

People laughing and being foolish? Oh no!

ShuggyCoUk said:
You want to use the least amount of resources to accomplish a task in a acceptable amount of time (this is often an optimisation problem with some interesting surfaces but you get the idea).

There's something of a paradox in that, though, for modern "cloud" computing. Depending on what you're doing, the "least amount of resources" is actually a massive, distant server farm that you may only briefly rent.

ShuggyCoUk said:
So long as the library call makes it clear that it's doing that (I'll cover that on Koala's question).

I'd still argue that there may be a better place to "make it clear" than the library call itself. Exceptions exist. Observer patterns exist.

ShuggyCoUk · Mar 19, 2024

Retargetting to fun given the OP sure.

ImpossiblyStupid said:
Exceptions exist. Observer patterns exist.

Oh man, if you're having to lean on patterns I in a new language you're learning for fun that is sad making

Repetitive/boilerplate/obfuscatory use of patterns almost always indicates the language you are using is a poor fit for your task. That's fine in a business domain (And I've seen many people fall into the trap of just trying to shoe horn some new language/DSL in without considering the long term costs) but for fun pick a language which changes the game.

If you want to look at doing fun stuff with concurrency then look into Erlang (but I suggest using Elixir as the language is much less tedious - it has proper string types for a start!) or something else that truly changes the game. Green threads/fibers are another option but you'll fine them unpleasant outside of maybe Go for the widely used modern languages.
If you want to look at parallelism then things like the TPL exist in dotnet and other languages have similar libraries. Just whack .AsParallel() into your use of an IEnumerable and off you trot. But I suspect many things will happen, including your actual task not getting much faster unless it was already embarrassingly parallel.

Blending concurrency and parallelism is fundamentally hard (again, except for the embarrassingly parallel stuff).

ShuggyCoUk · Mar 19, 2024

ImpossiblyStupid said:
I never said systems should be (re)designed to function that way! In fact, I said just the opposite. The challenge was to add concurrency + RPC without having to change the code you already have.

I think it's pretty clear I fundamentally disagree with this conceptually and in practice but this isn't the thread for that.

If you really want to do this then consider looking into the field but it's very dry and pretty small.

Lt_Storm · Mar 24, 2024

koala said:
Yeah, what I'm saying is that for most use cases, likely increasing page size is the best option- and it actually "hides" even more remoteness/complexity.

Because likely if you need to get so many items from an API call that paginating is worth it... likely your API is not the correct API.

On one hand, I don't disagree that larger page sizes are often a good idea, in the other hand, you don't always know when you're query is only going to match ten items vs ten thousand until after you have made it, so pagination is often still handy, if nothing else, just to reduce the chances that your daft request isn't sniping boxes...

Lt_Storm · Mar 24, 2024

ShuggyCoUk said:
If you want to look at doing fun stuff with concurrency then look into Erlang (but I suggest using Elixir as the language is much less tedious - it has proper string types for a start!) or something else that truly changes the game. Green threads/fibers are another option but you'll fine them unpleasant outside of maybe Go for the widely used modern languages.

Ruby has pretty nice fiber support...

AdrianS · Apr 21, 2024

As someone who's written a lot of C & ASM code before moving to managed languages, I really enjoyed dipping into pixel shaders.

Basically GPU programming with register accesses etc. to manipulate images.

Mat8iou · May 9, 2024

Clojure is a possible choice that builds on some of what you already know, but with some new features added and can fairly easily translate across to real world applications.

TigerAway · Jun 8, 2024

I vote Python or Julia.

snotnose · Jun 9, 2024

I spent 40 years as an embedded engineer, writing mostly C code for device drivers and OS internals (my 30 year old fingerprints are in the Linux kernel). Along the way I used bash, awk, perl, Python, TCL/Tk, and a whole host of other tools to support my embedded stuff.

Retired several years back, and a few months ago decided to learn Java. Learning OO from the ground up is actually fun. I feel like the teenager who disassembled the ROM in my TRS-80 to figure out how things worked. Better, my programs are no longer C written with Java syntax.

I'm now learning Kotlin and Android and really enjoying myself. Not to mention I'm haven't fired up Vi nor a Makefile in a few months now, my IDE, while not perfect, is pretty danged good (IntelliJ Idea).

Will I ever actually use this knowledge? No. I can't see myself working again, I don't have the patience to teach, and any ideas for a useful program I can come up with has already been done better by 100 other people.

I'd like to put in a plug for the Big Nerd Ranch Android Programming book. I've read a lot of computer books, for learning a complex subject from the ground up this is the best book by far I've ever read.

Which new computer language for fun?

Ars Tribunus Angusticlavius

Ars Tribunus Angusticlavius

Ars Tribunus Angusticlavius

Ars Tribunus Angusticlavius

Ars Tribunus Angusticlavius

Smack-Fu Master, in training

Smack-Fu Master, in training

Ars Tribunus Angusticlavius

Ars Tribunus Angusticlavius

Ars Praefectus

Ars Praefectus

Ars Scholae Palatinae

Ars Praefectus

Ars Tribunus Militum

Ars Tribunus Militum