Production cluster - architecture

I think it’s a good time (and place) to continue the conversation started by @pierreozoux at indie.host last year.

In a nutshell, he was proposing to set up a shared production cluster and asking a few questions:

  1. Bare metal vs. Cloud?
  2. Load Balancer with Floating IPs on bare metal?
  3. Persistent Layer (Rook? Ceph?)
  4. FDE for deployed nodes?

Regarding 4. (FDE), I found this nice tutorial (in French) to set up full disk encryption on a cloud instance running Alpine Linux. The initramfs includes OpenSSH, which is configured to run /sbin/unlock_disk.sh over SSH. Users with the right SSH key can then enter the LUKS passphrase and the system can boot.
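
The unlock script in such a setup is essentially a thin wrapper around cryptsetup. A minimal sketch of what it might do (the device and mapper names are placeholders, this is not the actual script from the tutorial):

    #!/bin/sh
    # Hypothetical sketch of an initramfs unlock script invoked over SSH;
    # /dev/sda2 and "cryptroot" are placeholder names, not the tutorial's values.

    # Ask for the LUKS passphrase interactively and open the encrypted root.
    cryptsetup luksOpen /dev/sda2 cryptroot

    # How boot continues from here depends on the initramfs: typically the
    # init script polls for /dev/mapper/cryptroot and carries on once it exists.

On Debian-based systems the dropbear-initramfs package provides a similar remote-unlock mechanism out of the box.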

I’m also experimenting with Clevis/Tang, for auto-decrypting during an unattended reboot.

I fail to understand how Clevis/Tang would keep a server VM safe. It has no access to a TPM on the physical host, and keeping the key on the server to automatically decrypt the disk seems to defeat the purpose of encrypting the disk in the first place. Am I wrong?

Well, the key is actually on another server (the one that runs Tang), which “offers” the decryption key during boot of the encrypted machine. Their README file has a more detailed explanation of how this works.

It’s not bulletproof, of course. I can see threat models that this doesn’t fit, where it would make more sense to SSH to the machine and type a passphrase. But in some cases it might be useful.
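
For anyone who wants to try it, the setup roughly looks like this (the Tang URL and device are placeholders, and the package names are the Debian ones; details vary by distro):

    # On the Tang server (the machine that "offers" the key material):
    apt install tang                      # package names are the Debian ones
    systemctl enable --now tangd.socket   # tangd is socket-activated by systemd

    # On the machine with the encrypted disk (the Clevis client):
    apt install clevis clevis-luks clevis-initramfs
    clevis luks bind -d /dev/sda2 tang '{"url":"http://tang.example.org"}'

    # Rebuild the initramfs so the disk is unlocked automatically at boot while
    # the Tang server is reachable; otherwise you fall back to the passphrase.
    update-initramfs -u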

Any news about this?

I’ll come and reboot this thread! I want to have a shared cluster among chatons and librehosters (and some other nice people).
We need to have a staging cluster, and we can pay for it.
Here is the gist of it:

The idea is to build Kubernetes - the libre way.
So the idea of libre.sh v2 is to build a kube distribution, and as always, batteries included, but swappable.
Think of it as the Debian of Kubernetes distributions, one that is aimed at non-cloud environments.

I plan to divide the work into various layers:

  • L0 - infra - Libre.sh will support HetznerCloud by default (provision the 9 nodes of the reference implementation: 3 masters, 3 infra, 3 workers; a provisioning sketch follows this list)
  • L1 - ansible
    • wireguard (a ‘management’ VPN for Ceph and kube, and a ‘ceph-backend’ VPN only for the Ceph OSDs)
    • cri-o
    • FloatingIP (for master API and ingresses)
    • kubeadm
    • Operator Lifecycle Management
  • L2 - Libre.sh operator
    • rook-ceph
    • canal + wireguard
    • nginx ingress
    • cert-manager
    • ClusterMonitoringOperator
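
To make L0 concrete, provisioning the reference nodes could be as simple as a loop around the hcloud CLI (the server type, image and SSH key name below are example values, not libre.sh defaults):

    # Sketch: create the 9 reference nodes with the hcloud CLI.
    # "cx31", "debian-11" and "libre-admin" are example values, adjust to taste.
    for role in master infra worker; do
      for i in 1 2 3; do
        hcloud server create \
          --name "${role}-${i}" \
          --type cx31 \
          --image debian-11 \
          --ssh-key libre-admin
      done
    done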

Basically, after L1 we have access to the kube API and we provision the rest with operators.
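
The L1 step that actually brings the kube API up would be something along these lines (assuming CRI-O as the runtime and the floating IP as control-plane endpoint; the address is a placeholder):

    # On the first master: bring the control plane up behind the floating IP
    # (10.0.0.100 is a placeholder for the floating IP on the management VPN).
    kubeadm init \
      --control-plane-endpoint "10.0.0.100:6443" \
      --cri-socket unix:///var/run/crio/crio.sock \
      --upload-certs

    # On the two other masters: join as additional control-plane nodes with the
    # join command and certificate key that the init above prints out, e.g.
    # kubeadm join 10.0.0.100:6443 --token <...> --discovery-token-ca-cert-hash <...> \
    #   --control-plane --certificate-key <...>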

After L2, we are ready to provision applications.
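
As an illustration of what “provisioning the rest with operators” looks like in practice, cert-manager and rook-ceph are both installed by applying their upstream manifests (versions and file locations vary by release):

    # cert-manager: CRDs, controller and webhook come in a single manifest;
    # replace <version> with a current release.
    kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/<version>/cert-manager.yaml

    # rook-ceph: CRDs and operator first, then a CephCluster resource describing
    # the 3 storage nodes (these file names come from the Rook example manifests;
    # their location in the repo varies by release).
    kubectl apply -f crds.yaml -f common.yaml -f operator.yaml
    kubectl apply -f cluster.yaml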

Then on L3, we provide operators for popular FLOSS (a hypothetical example of what this could look like follows the list):

  • Nextcloud / Collabora
  • Discourse
  • CodiMD
  • RocketChat
  • GitLab
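
To give an idea of the intended user experience, provisioning a Nextcloud could eventually boil down to applying a single custom resource. The API group and fields below are purely illustrative; no such operator API exists yet:

    # Purely hypothetical: apps.libre.sh/v1alpha1 and its fields are
    # illustrative only and do not correspond to an existing operator API.
    # nextcloud.yaml:
    #   apiVersion: apps.libre.sh/v1alpha1
    #   kind: Nextcloud
    #   metadata:
    #     name: cloud-example
    #   spec:
    #     domain: cloud.example.org
    #     collabora: true
    #     storage: 50Gi
    kubectl apply -f nextcloud.yaml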

I just got my hands on a Dell PowerEdge R710: https://www.dell.com/downloads/global/products/pedge/en/server-poweredge-r710-tech-guidebook.pdf

5 × 300 GB SAS, 2 × 149 GB SAS SSDs and 1 × 600 GB SAS

I posted this without finishing it, forgot about it, and then didn’t want to lose it.

I’m struggling at the moment with the RAID controller (which I can’t disable nor put in JBOD mode), so I ended up with 2 RAID1 groups.

Not sure how to proceed architecture-wise; I think you always need a base OS for the hypervisor?

Hi Pierre - how is this experiment going? I’m very interested in the idea of a Debian for Kubernetes, especially as it regards a stable and resilient setup for indie/libre hosting. Having the base (L0/L1/L2 as you describe them) as a fairly simple-to-install starting point would be a great foundation to build on.

And how can we help to make this a reality!

I am doing a bit of side work with operationtulip, and there I have access to a “couple” of servers.
The total CPU core count is about 1000, and the total RAM is around 4 TB.

The only problem we face is room in our rack.
We will soon ship an S3 storage solution based on Ceph RADOSGW, if that is interesting for you.