Struggling with architecture for service hosting and SSO in my home network - advice wanted!

kaworu1986

Ars Scholae Palatinae
654
My current home network is relatively complex: I share a home with my SO, and between the two of us we have some 30 client devices running all the major OSes (Windows, macOS, iOS, Android, Linux), as well as three NASes (two TrueNAS Core boxes and an Odroid HC4 running Armbian) and a LibreELEC box (minimal Linux running Kodi) for streaming video from the NASes to the TV over SMB.

The devices are divided into a few VLANs (my devices, his devices, IoT stuff, etc.), and core network services (routing, gateway to the internet, DHCP, Router Advertisements, DNS, RADIUS for the WiFi) are handled by an OPNsense box. My internet connection is IPv6 only (IPv4 via DS-Lite) with a stable /56 prefix delegation, and internally I run dual stack (IPv4 via DHCP, IPv6 ULAs and GUAs via SLAAC).

I recently picked up a couple of used SFF boxes for cheap (6-core 8th-gen Core i7, 32GB RAM, 1TB NVMe SSD, single gigabit Ethernet) with the intention of creating a small cluster to self-host a few services like:
  • Directory and SSO
  • Home Assistant
  • GitLab or Gitea
  • build agents running Windows/Linux/macOS (if I can get macOS to work on Proxmox) to be able to do CI for multiple platforms at the same time
  • ownCloud/Nextcloud
  • an internal SMTP relay
I am struggling with a few fundamental decisions and would like to ask for people's perspectives:
  1. With the exception of Home Assistant, I would be the only one using the hosted services: should I put the cluster in its own VLAN and subnet, or keep it together with my clients and existing servers? A dedicated VLAN would be easier to turn into an internet-accessible DMZ later, but since my switches are L2 only, all traffic between my clients and the cluster would then have to be routed by the gateway, increasing its load.
  2. Given that I own a domain and have some infrastructure off-prem (three websites on GitHub Pages, a VPS, email for the domain hosted on Google), would it be better to use a subdomain (home.mydomain.com) to name devices on my LAN, or should I implement split-brain DNS?
    1. Split brain would be easy to implement: just define a zone for mydomain.com in BIND (which I already use as my resolver) on the gateway, while the rest of the internet keeps using what I set up in Cloudflare (see the BIND sketch after this list)
    2. Split brain DNS means less typing when connecting to devices at home
    3. Split brain DNS also allows me to use mydomain.com for the directory, so I'd have principal names that map with my existing email addresses (user@mydomain.com vs user@home.mydomain.com)
    4. Using a subdomain means I could make an on-prem DNS server authoritative for the home subdomain, with proper delegation and DNSSEC for the rest of the internet
  3. For directory service and SSO, the idea is to have only the servers - most of which run Linux - joined to the domain, with clients logging in via a manual kinit or the browser (SAML/OAuth2). Is it better to go with FreeIPA plus something like Authelia/Authentik set up as a web IdP using it as an LDAP back end, Samba plus Authelia/Authentik in the same configuration, or Active Directory Domain Services and Federation Services on Server Core?
    1. I have been labbing FreeIPA and got my Windows client to authenticate via Kerberos to SSH and SMB shares on a Linux instance separate from the IPA controller.
    2. FreeIPA has an integrated CA, which is nice, but it is also a bloody hog (in an LXC container it uses 1GB of RAM with the CA, half that without)
    3. FreeIPA is nicer to set up on Linux (Fedora at least) and to enrol clients with, but I'm not sure it can work with TrueNAS
    4. Samba does not want to work in an LXC container, ever, and it's kind of a pain to set up SSH SSO via Kerberos with
    5. I don't need or want GPOs
    6. Can I skip the web IdP for SAML/OAuth2 support altogether and just make my web services authenticate using GSSAPI/Integrated Windows Authentication instead (see the Apache sketch after this list)? Windows goes out of its way to disable it by default (you need to add the URIs to the Local Intranet zone for it to work) - why?
  4. Having already set up Proxmox on the cluster members, would it be better to put each service in its own VM or LXC container where possible, or to set up a Kubernetes cluster (two VMs per node, a worker and a manager) and run my services inside it?
    1. Is having only a single Ethernet interface going to be a bottleneck for Kubernetes? I'm worried about intra-cluster traffic, especially for storage (see the bridge sketch after this list)
  5. Right now the two nodes are far apart (on different desks in different rooms, connected to different switches and power outlets): would it be better to place them together on the same switch, or is the increase in resiliency (if I need to unplug one for any reason, chances are the other would be left alone) worth keeping them apart?
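
For question 2, option 1, this is roughly the split-brain setup I have in mind. It's only a sketch; the file path, hostnames and addresses are placeholders. BIND on the OPNsense box would answer internal clients, while the public mydomain.com stays in Cloudflare:

```
// named.conf fragment on the gateway resolver (path and names are placeholders)
zone "mydomain.com" {
    type master;
    file "/usr/local/etc/namedb/master/mydomain.com.internal";
    allow-query { localnets; };   // only internal clients ever see this zone
};
```

```
; mydomain.com.internal - internal-only view of the domain (example records)
$ORIGIN mydomain.com.
$TTL 1h
@       IN SOA  gw.mydomain.com. hostmaster.mydomain.com. (
                2024051501 1h 15m 1w 1h )
        IN NS   gw.mydomain.com.
gw      IN AAAA fd12:3456:789a::1     ; ULA addresses, purely as examples
nas1    IN AAAA fd12:3456:789a::10
git     IN AAAA fd12:3456:789a::20
```

One caveat I'm aware of: internal clients would then only ever see this zone, so any public records I still need (MX, the GitHub Pages sites) have to be duplicated in it. The subdomain approach from item 4 would instead be a zone "home.mydomain.com" defined the same way, plus NS records for it in Cloudflare so the delegation is real.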
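
For question 3, item 6: for anything that sits behind Apache httpd, skipping the web IdP and using Kerberos directly should work. A minimal sketch with mod_auth_gssapi, assuming a HTTP/ service principal and keytab already exist (hostnames and paths are made up):

```
# Requires mod_auth_gssapi and a keytab holding HTTP/git.home.mydomain.com@HOME.MYDOMAIN.COM
<Location "/">
    AuthType GSSAPI
    AuthName "Kerberos SSO"
    GssapiCredStore keytab:/etc/httpd/http.keytab
    GssapiBasicAuth Off
    Require valid-user

    # Pass the authenticated principal on to the backend application
    RequestHeader set X-Remote-User "expr=%{REMOTE_USER}"
</Location>
```

As for the "why": browsers only attempt Negotiate (Kerberos/NTLM) silently against sites they consider trusted, so credentials aren't offered to arbitrary internet hosts - hence the Local Intranet zone on Windows and network.negotiate-auth.trusted-uris in Firefox.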
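
For question 4.1: a single gigabit NIC can't be made faster, but storage/cluster traffic can at least be kept on its own VLAN so it doesn't trample client traffic. A sketch of /etc/network/interfaces on a Proxmox node; interface names, VLAN IDs and addresses are pure examples:

```
# /etc/network/interfaces (Proxmox / ifupdown2 syntax; names and IDs are examples)
auto eno1
iface eno1 inet manual

auto vmbr0
iface vmbr0 inet manual
    bridge-ports eno1
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094

# Management / general traffic on VLAN 10
auto vmbr0.10
iface vmbr0.10 inet static
    address 192.168.10.11/24
    gateway 192.168.10.1

# Dedicated VLAN for Kubernetes and storage replication traffic
auto vmbr0.20
iface vmbr0.20 inet static
    address 192.168.20.11/24
```

VMs and containers then just attach to vmbr0 with the appropriate VLAN tag on their virtual NICs.
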
Thanks in advance
 

koala

Ars Tribunus Angusticlavius
7,579
You might want to look at my setup.

For DNS, I have location1.int.mydomain.com, location2.int.mydomain.com, etc. Every location has a local dnsmasq, and they each point to the others for their respective subdomains (roughly like the snippet below). Together with Tinc site-to-site VPNs, that lets me reference hosts by name across networks (even reverse DNS). It's all driven by a Puppet module.
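
Concretely, the per-location dnsmasq config is only a few lines; the snippet below is a rough illustration with made-up addresses, not my actual Puppet-managed config:

```
# /etc/dnsmasq.d/site.conf at location1 (addresses are examples)
domain=location1.int.mydomain.com
local=/location1.int.mydomain.com/   # answer locally, never forward upstream
expand-hosts                         # qualify plain hostnames from /etc/hosts and DHCP leases

# forward the other site's subdomain (and its reverse zone) over the Tinc tunnel
server=/location2.int.mydomain.com/10.2.0.1
server=/2.10.in-addr.arpa/10.2.0.1
```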

I love FreeIPA. It gives me some occasional trouble, but the mailing list usually helps. I like how you can set up sudo and host-access policies (see the ipa commands below). Yes, it's a hog (I run two instances: one on a 2GB VPS and one in a 3GB LXC container on Proxmox), but if you are happy with running RHEL clones, it's so easy... Also, all my Linux boxes use it for authentication. I used to run some Kerberized services, and it was VERY pleasant, but I did a large migration and have postponed setting that up again.
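
To give an idea of what those policies look like, they're plain ipa commands; the group and hostgroup names here are invented:

```
# HBAC: allow the 'admins' group to SSH into hosts in the 'infra' hostgroup
ipa hbacrule-add ssh_to_infra
ipa hbacrule-add-user ssh_to_infra --groups=admins
ipa hbacrule-add-host ssh_to_infra --hostgroups=infra
ipa hbacrule-add-service ssh_to_infra --hbacsvcs=sshd

# sudo: let the same group run any command on those hosts
ipa sudorule-add admins_all --cmdcat=all
ipa sudorule-add-user admins_all --groups=admins
ipa sudorule-add-host admins_all --hostgroups=infra
```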

I use Ipsilon for web auth. It's a bit obscure and infrequently maintained (basically, the Fedora project and some other RHEL clones use it for authentication, and they only maintain it for their use cases, I think), but it has two advantages:
  • It does Kerberos browser auth! So I easily get the SSO experience on the services I put behind it
  • It's an RPM and the install process is easy (when you know the small details)
The other alternative that does Kerberos is Keycloak, which is much nicer and better maintained, but the setup requires more effort (unless you deploy it in Kubernetes, which was my plan B if Ipsilon didn't work out).

I don't have everything on Ipsilon yet (e.g. Nextcloud is still using LDAP, yuck). More software supports OAuth than Kerberos browser-based auth, although Apache httpd supports the latter, and therefore so does anything that can take authentication from a reverse proxy. And you can get Kerberos browser-based auth through Ipsilon or Keycloak (an easy way to test it is below).
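
If you want to sanity-check the Kerberos browser-auth path without a browser, curl can do SPNEGO, assuming your curl is built with GSS-API support (the realm and hostname below are examples):

```
kinit user@INT.MYDOMAIN.COM
curl --negotiate -u : https://app.location1.int.mydomain.com/
klist   # should now also list a HTTP/app.location1.int.mydomain.com service ticket
```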

In the end, for file sharing I have NFS and Samba. Samba is not integrated into FreeIPA, and... I'm not sure what's happening with NFS. It doesn't get used a lot, and it works.

As for LXC/VMs vs. Kubernetes, it depends on the service, IMHO. If I can, I prefer RPMs in LXC: it's super lightweight, very easy to patch automatically, and easy to set up. However, not all software is provided as an RPM. It's really about "what's the first-class distribution mechanism for upstream?" For FreeIPA and Ipsilon, RPMs are first class; for Keycloak, Kubernetes is.
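
The "easy to patch automatically" part is mostly just dnf-automatic inside each container; a sketch, assuming an AlmaLinux/Rocky LXC:

```
dnf install -y dnf-automatic
# set apply_updates = yes in /etc/dnf/automatic.conf, then:
systemctl enable --now dnf-automatic.timer
```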

I set up a single-node Talos Kubernetes cluster (on Proxmox) and I really like how straightforward it is. But right now I'm running it without any storage, just to run software whose state lives elsewhere (mostly things backed by a PostgreSQL database running in LXC). There's a "Kubernetes is the preferred distribution method" project in my future (Takahe), but it requires object storage. I'm mulling SeaweedFS; my ideal plan would be to create an RPM for it (I discussed it with upstream, and they seemed open to the idea) and run it in an LXC container.
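
For reference, bringing up that single-node Talos cluster on a Proxmox VM is roughly the following; the IP is a placeholder, and exact flags may differ between Talos releases:

```
# Generate machine configs, push the control-plane config to the VM, bootstrap, fetch kubeconfig
talosctl gen config homelab https://192.168.10.50:6443
talosctl apply-config --insecure --nodes 192.168.10.50 --file controlplane.yaml
talosctl --talosconfig ./talosconfig bootstrap --nodes 192.168.10.50 --endpoints 192.168.10.50
talosctl --talosconfig ./talosconfig kubeconfig --nodes 192.168.10.50 --endpoints 192.168.10.50

# Single node: allow normal workloads on the control plane
# (or set allowSchedulingOnControlPlanes: true in the machine config before applying it)
kubectl taint nodes --all node-role.kubernetes.io/control-plane:NoSchedule-
```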

The two big benefits of Kubernetes for me are:
  • Easy node scaling. I don't think this makes sense for my personal infra, but it's awesome in larger-scale infra, especially if you run in a cloud.
  • It's a standardized OS for running services: it's very easy to build a SaaS platform on it, and it's very easy to deploy software that is distributed as container images.
Right now I just run two apps I developed myself, as a kind of personal SaaS (every time I make a change, GH Actions builds a new container image; at the moment I just use a bit of kubectl to roll it out, as below, but even at that level it's really nice). I think more and more services will eventually have container images as their only distribution mechanism, so this will only become more important.
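
The "bit of kubectl" really is just something like this (the deployment and image names are made up):

```
kubectl set image deployment/myapp myapp=ghcr.io/example/myapp:v1.2.3
kubectl rollout status deployment/myapp
```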

Build agents you could also run on Kubernetes. From a theoretical standpoint I find that ideal, but I haven't played much with it. I have a similar plan: I'm using GH Actions for builds, but they don't provide ARM runners yet, which I need occasionally. Some software I have managed to cross-compile, but there's one piece of software where I run builds under QEMU and it's SO slow. My plan is to host GitHub Actions runners on my K8s cluster and add an ARM64 node (a scheduling sketch is below), but it's still a bit far down my TODO list :(
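
The ARM64 part at least is easy: once the node exists, the runner pods just need a nodeSelector on the standard arch label. A sketch as a bare Deployment (in practice I'd probably use actions-runner-controller; the name and image below are placeholders):

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gha-runner-arm64            # placeholder name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gha-runner-arm64
  template:
    metadata:
      labels:
        app: gha-runner-arm64
    spec:
      nodeSelector:
        kubernetes.io/arch: arm64   # well-known node label set by the kubelet
      containers:
        - name: runner
          image: ghcr.io/example/actions-runner:latest   # placeholder image
```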