Which new computer language for fun?

ShuggyCoUk

Ars Tribunus Angusticlavius
9,975
Subscriptor++
OK, my definition of "hiding remote" is different from "the name makes it clear it's a remote call".
It's not a definition. The clincher parts are the important bits, but names always matter.
I did mention paging in my post, but really I think paging is typically an antipattern somewhere (either in your code, or in the API itself).
Nope, pagination is batching. It could be done really badly of course, but it's a remote call to some other system that has to commit resources to holding open some arbitrary-length reply to your request. There are costs to that, and pushing those to the client is normally a good idea for scale, because it's normally cheap for the client (they only have one or a few such requests, so trivial data structures just work), and if the client stops/forgets there's no need for the server to have any timeout implementation/reaping.
Batching in distributed systems is normally a big win in API design. The smallest unit of work to the domain may not match the most efficient unit of request/response.
Also, pagination shows you that your failure can occur on the boundary, not at some arbitrary point calling Next on an iterator that is hiding this. You want to wrap the result in some code that de-paginates? Cool, go ahead. But APIs that just hide that, so you have to deal with it anyway, are bad.
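That de-paginating wrapper can be a thin generator. A minimal sketch, assuming a hypothetical cursor-based endpoint (`fetch_page` is a stand-in, not any real API):

```python
def depaginate(fetch_page):
    """Flatten a cursor-based paged API into one iterator.

    fetch_page(cursor) -> (items, next_cursor); pass None for the
    first page, and next_cursor is None after the last page.
    """
    cursor = None
    while True:
        items, cursor = fetch_page(cursor)
        yield from items
        if cursor is None:
            return
```

Note failures still surface at page boundaries, inside `next()` — the wrapper only chooses where you see them, it doesn't make them go away.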
I have a similar opinion wrt. to error handling/retries, although in principle, I prefer non-API specific abstractions (e.g. in Python using something like https://pypi.org/project/retry/ ), instead of each API client having its own mechanism. However, I'm undecided on this, because I see some drawbacks to this approach and I haven't played enough in this space to have a clear opinion.
I would hope that the plugin model on GitHub's API allows for using retry with it. In languages with a sane error handling path (exceptions are one such, but consistent ResultOrError return types also work - which is why you want them baked into the standard library and APIs up front) the 'plugin' is right there and should be minimal/no effort for APIs to integrate with.
Designing things to be idempotent (either naturally or through techniques like one shot request identifiers) makes such designs simpler
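To make the combination concrete — a sketch of a generic retry helper plus a one-shot request identifier making a create call safe to retry. The `client.post` call and `Idempotency-Key` header are assumptions for illustration, not a specific API:

```python
import time
import uuid

def retry(call, attempts=3, backoff=0.5):
    """Generic retry with exponential backoff. Only safe when the
    underlying call is idempotent."""
    for i in range(attempts):
        try:
            return call()
        except Exception:
            if i == attempts - 1:
                raise
            time.sleep(backoff * 2 ** i)

def create_order(client, payload):
    # A one-shot request id makes the create idempotent: the server
    # deduplicates on the key, so retrying a timed-out request can't
    # create the order twice.
    request_id = str(uuid.uuid4())
    return retry(lambda: client.post("/orders", json=payload,
                                     headers={"Idempotency-Key": request_id}))
```

The point being that the retry logic stays API-agnostic; the API's only job is to honour the dedup key.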
 

koala

Ars Tribunus Angusticlavius
7,579
Pagination is EXTREMELY hard to get right, in the face of modifications. So if you grab page one of a query, someone makes a modification... that can invalidate what "page 2" is. In theory, if you want to return consistent results, as soon as someone makes a paginated query, you should get some kind of snapshot of the state, so you can return consistent results.

I guess most APIs just ostrich this, and make everything using paging unreliable (e.g. you might miss items).

And likely, making things consistent is just harder than sending out all results in one batch. At some scales, I think making 10 requests of size 50 is more expensive for everyone than making a single request of size 500 (which is not that much!).

...

My conflict with a generic retry function is, for instance, keeping sessions. If I implement a generic retry function, it's hard to make it integrate easily and properly with the underlying state of the function being retried (e.g. preserving a session across requests).
 

ShuggyCoUk

Ars Tribunus Angusticlavius
9,975
Subscriptor++
Pagination is EXTREMELY hard to get right, in the face of modifications. So if you grab page one of a query, someone makes a modification... that can invalidate what "page 2" is. In theory, if you want to return consistent results, as soon as someone makes a paginated query, you should get some kind of snapshot of the state, so you can return consistent results.
Anything is hard to get right in the face of modifications...
In dotnet for example if something modifies a list during iteration it throws on the next iteration.

Pretending that can't happen, or forcing the server to try to pretend it hasn't is a massive cost (potentially a long running database transaction in the worst case).

If you have snapshot isolation, or a bi-temporal structure of some other form, then pagination can become comparatively simple. The follow-up requests indicate which snapshot/point in time they are from and can be reliably sourced. Attempts to access a snapshot that's aged out in some way result in a clean failure telling you why.

I guess most APIs just ostrich this, and make everything using paging unreliable (e.g. you might miss items).
If you arrange the pagination on something with a decent ordering (say creation order in a single database) then you can avoid this; more complex systems can solve it under sharding too, but the request state increases in size and complexity. This then is designing the API to conform to a contract that can be effectively correct. It pushes any ordering of the results to the client if it wants to reorder them in some way; it's all a trade-off.
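This "decent ordering" approach is usually called keyset (or cursor) pagination. A minimal sketch against SQLite, assuming a hypothetical `items` table with a monotonically increasing `id`:

```python
import sqlite3

def keyset_page(conn, after_id, limit):
    """Keyset pagination: the page boundary is defined by a stable
    ordering (here the id), not by OFFSET, so concurrent inserts
    can't make rows repeat or vanish between pages -- new rows
    simply show up at the end of the sequence."""
    rows = conn.execute(
        "SELECT id, name FROM items WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, limit)).fetchall()
    next_cursor = rows[-1][0] if rows else None
    return rows, next_cursor
```

The cursor the client carries is just the last id it saw, so the server holds no per-client state at all.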
And likely, making things consistent is just harder than sending out all results in one batch. At some scales, I think making 10 requests of size 50 is more expensive for everyone, that making a single request of size 500 (which is not that much!).
Absolutely - someone choosing the batch size badly just makes it worse. This stuff is hard, which I think is making my point.
Go-slow clients shafting the server are a thing if you don't, though.
My conflict with a generic retry function is for instance keeping sessions. If I implement a generic retry function, then it's complex that this integrates easily and properly with the underlying state of the function being retried (e.g. preserving a session across requests).
Indeed, if there's session level state nothing generic will solve it. But session level state is hard and definitely shouldn't be hidden away without making the session the thing which is retried (which is not amenable to simple stack-bound based semantics).
This is where you start making decisions on whether the potential performance gains of a session level API (like NFS4) outweigh the simplicity of something like NFS3. It's a balance; anything hiding that away is likely to be either a source of problems, or a source of poor performance.

In distributed NAS land there's a bunch of people/groups who stick to NFS3 because it can scale in certain ways more pleasantly (since that session level state isn't something that needs to be distributed)
 
Reactions: MilleniX

ShuggyCoUk

Ars Tribunus Angusticlavius
9,975
Subscriptor++
I think file systems are a brilliant example of how trying to pretend something is like a local one is both hugely useful and hugely flawed.

It's hugely useful because things JustWork (good or bad) your UX is exactly the same (Windows Explorer/Nautilus/Whatever you like-ish) and there's an API for it baked into anything with a filesystem IO layer (the point is that your OS normally does the magic to convert it into the needed NFS/SMB/Whatever protocols).

But now some operations that are stupidly cheap locally (in part due to incredibly aggressive caching) become really onerous for the NAS (directory listing, file locking1). NFS clients end up trying to cache things to make up for this - but now you have a split-brain problem that you almost never have locally (because the filesystem driver can mediate most such access, and locks work very cheaply and are tied to process life).

So for a worked example, readdir and readdirplus are the best concrete examples I know of where, if something is written to use the former, then locally it tends to be fine. Perhaps a minor increase in kernel transitions sometimes, but the local filesystem will likely aggressively cache the needed info (even just having the relevant parts of the backing tree paged in, or in a CPU cache already, helps without any actual attempt to code for it in the underlying driver). As a result some libraries/apps don't update (because it's a mild faff and they see no difference in their tests, because they don't care about some complex distributed NAS user experience). Then you use it where directory listing is expensive and complex, and really wish someone had moved it to readdirplus, both for the batching and for the additional info returned rather than needing a separate fstat or whatever.

I think there have been multiple Python libraries which had to be changed to make sure they used the right one to avoid having awful IO perf (that impacts other users of the system).
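The Python-level analogue of that split is `os.listdir` plus a per-file `stat` versus `os.scandir`, whose `DirEntry` objects carry type/stat info gathered during the directory read itself (readdirplus-style). A sketch of the two shapes:

```python
import os

def sizes_slow(path):
    # readdir-style: one listing, then a separate stat per entry.
    # Locally the dentry/inode caches hide the cost; over NFS each
    # stat can become another round trip to the server.
    return {name: os.stat(os.path.join(path, name)).st_size
            for name in os.listdir(path)}

def sizes_fast(path):
    # readdirplus-style: scandir yields DirEntry objects whose stat
    # info is often already available from the directory read, so
    # e.stat() rarely needs an extra call.
    return {e.name: e.stat().st_size
            for e in os.scandir(path) if e.is_file()}
```

Both return the same answer; the difference only shows up as N extra round trips when the filesystem is remote.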

1. a mess on NFS anyway, they are generally a nightmare conceptually and in the concrete implementations because you need the client side to respect timeouts.
 

ImpossiblyStupid

Smack-Fu Master, in training
81
Subscriptor
Abstractions leak, and network calls are quantitatively different enough from local calls that they become qualitatively different. Nothing on the local machine is going to hang for 30-60s before returning a response.
But at some level of abstraction, why should you or the code care? Which is to say, I have absolutely had machines start thrashing when they were low on memory and take 5m to do something that would normally take 5s. Instead of handwaving the issue out of existence, just define how you're going to handle it. I'm going to handle it by waiting (or having a timeout if it matters), and that is why I'm not sold on your argument.
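"Define how you're going to handle it" can be as small as a bounding wrapper. A sketch using a worker thread (the helper name is made up, and note the caveat in the comment):

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError

_pool = ThreadPoolExecutor(max_workers=4)

def call_with_timeout(fn, timeout, *args, **kwargs):
    """Run fn in a worker thread and give up after `timeout` seconds.
    Caveat: the worker thread keeps running after the timeout; Python
    can't kill it, which is one reason timeouts belong in the design
    rather than being bolted on from outside."""
    future = _pool.submit(fn, *args, **kwargs)
    return future.result(timeout=timeout)
```

The same wrapper covers the thrashing local box and the slow remote call alike, which is rather the point being argued.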

When I program locally, my expectation is that most individual calls will complete within a few dozen milliseconds, at most. When I call external resources, I have to think explicitly about latencies, be they also small (e.g. when calling a well-optimized database) or very large (e.g. REST calls). Putting those two regimes behind an abstraction that purports to treat them the same is... questionable.
Yes, questionable. Questionable and fun! The name of the game is fun new languages, after all, which kind of morphed to language features, and again to experimenting with those kinds of features in a language you already know. You're welcome to play around in different ways, but I personally just get a little sad when I pull up a process monitor and see 8 cores on my computer just sitting there unused by any process, local or remote.
 

ImpossiblyStupid

Smack-Fu Master, in training
81
Subscriptor
Designing your system to have those concepts is really quite hard (I've had such fun showing people how to achieve number 1).
I never said systems should be (re)designed to function that way! In fact, I said just the opposite. The challenge was to add concurrency + RPC without having to change the code you already have.

You don't add in those capabilities unless someone wants them and is willing to pay for them

Or unless you're looking to have fun!

Compute clusters are so much cheaper than they were but they still cost a lot of money and you don't opt into that at small scale unless you want to waste a bunch of time and effort and money because of all the frictions created.

Somebody isn't having . . . fun! :ROFLMAO:

That's a foolish description of the problem. Anyone suggesting that as-is in business would be laughed out of a room.

People laughing and being foolish? Oh no!

You want to use the least amount of resources to accomplish a task in an acceptable amount of time (this is often an optimisation problem with some interesting surfaces, but you get the idea).

There's something of a paradox in that, though, for modern "cloud" computing. Depending on what you're doing, the "least amount of resources" is actually a massive, distant server farm that you may only briefly rent.

So long as the library call makes it clear that it's doing that (I'll cover that on Koala's question).

I'd still argue that there may be a better place to "make it clear" than the library call itself. Exceptions exist. Observer patterns exist.
 

ShuggyCoUk

Ars Tribunus Angusticlavius
9,975
Subscriptor++
Retargeting to fun given the OP, sure.
Exceptions exist. Observer patterns exist.
Oh man, if you're having to lean on patterns in a new language you're learning for fun, that is sad-making 😿

Repetitive/boilerplate/obfuscatory use of patterns almost always indicates the language you are using is a poor fit for your task. That's fine in a business domain (and I've seen many people fall into the trap of just trying to shoehorn some new language/DSL in without considering the long term costs), but for fun, pick a language which changes the game.

If you want to look at doing fun stuff with concurrency then look into Erlang (but I suggest using Elixir, as the language is much less tedious - it has proper string types for a start!) or something else that truly changes the game. Green threads/fibers are another option, but you'll find them unpleasant outside of maybe Go among the widely used modern languages.
If you want to look at parallelism then things like the TPL exist in dotnet and other languages have similar libraries. Just whack .AsParallel() into your use of an IEnumerable and off you trot. But I suspect many things will happen, including your actual task not getting much faster unless it was already embarrassingly parallel.

Blending concurrency and parallelism is fundamentally hard (again, except for the embarrassingly parallel stuff).
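A rough Python analogue of sprinkling `.AsParallel()` onto a query, for anyone wanting to poke at this (the helper is a sketch, not any library's API):

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_map(fn, items, workers=4):
    """Parallelise a map over items, .AsParallel()-style.
    Threads help when fn is I/O-bound; for CPU-bound work in
    CPython you'd reach for ProcessPoolExecutor instead to dodge
    the GIL. Order of results is preserved, like the sequential map."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, items))
```

And, as above, unless the work is embarrassingly parallel you'll mostly measure the overhead rather than a speedup.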
 


Lt_Storm

Ars Praefectus
16,294
Subscriptor++
Yeah, what I'm saying is that for most use cases, likely increasing page size is the best option- and it actually "hides" even more remoteness/complexity.

Because likely if you need to get so many items from an API call that paginating is worth it... likely your API is not the correct API.
On one hand, I don't disagree that larger page sizes are often a good idea; on the other hand, you don't always know whether your query is going to match ten items or ten thousand until after you have made it, so pagination is often still handy, if nothing else just to reduce the chances that your daft request snipes boxes...
 

Lt_Storm

Ars Praefectus
16,294
Subscriptor++
If you want to look at doing fun stuff with concurrency then look into Erlang (but I suggest using Elixir, as the language is much less tedious - it has proper string types for a start!) or something else that truly changes the game. Green threads/fibers are another option, but you'll find them unpleasant outside of maybe Go among the widely used modern languages.
Ruby has pretty nice fiber support...
 

snotnose

Ars Tribunus Militum
2,747
Subscriptor
I spent 40 years as an embedded engineer, writing mostly C code for device drivers and OS internals (my 30 year old fingerprints are in the Linux kernel). Along the way I used bash, awk, perl, Python, TCL/Tk, and a whole host of other tools to support my embedded stuff.

Retired several years back, and a few months ago decided to learn Java. Learning OO from the ground up is actually fun. I feel like the teenager who disassembled the ROM in my TRS-80 to figure out how things worked. Better, my programs are no longer C written with Java syntax.

I'm now learning Kotlin and Android and really enjoying myself. Not to mention I haven't fired up Vi or a Makefile in a few months now, and my IDE, while not perfect, is pretty danged good (IntelliJ IDEA).

Will I ever actually use this knowledge? No. I can't see myself working again, I don't have the patience to teach, and any ideas for a useful program I can come up with have already been done better by 100 other people.

I'd like to put in a plug for the Big Nerd Ranch Android Programming book. I've read a lot of computer books, for learning a complex subject from the ground up this is the best book by far I've ever read.