Collaborate on application lifecycle management in k8s - let's have a meeting to discuss

k8s.Libre.sh

If you want to discuss these topics, answer this form to find a date.
Vote until the 8th of June at 3pm; I’ll announce the chosen date here on the 8th of June at 4pm.

Context

We, at IndieHosters, have been running on Kubernetes for more than 3 years now.

We went through different phases to manage our deployments.

From this experience, we can tell that managing the deployment objects is the easy part of our job.
And Helm is really good at that.

But our job as system administrators involves many more tasks than just deploying an app and running a migration script to update it.

Here are our migrations, where we keep track of these admin tasks.

Templating is fine for replacing values, but when you start to need a Turing-complete language, it becomes difficult to manage.

And because we plan to maintain these Kubernetes clusters for the next decade, we want to have solid foundations. That’s why we want to redevelop these packages as operators (Nextcloud operator, Matrix operator, Discourse operator…).

For all these reasons, we’d like to discuss the creation of a YunoHost-like system, but based on Kubernetes primitives.

And we’d like to collaborate with you on this endeavor.

Potential collaboration

There is a wide variety of subjects we could discuss, like:

  • Kubernetes distribution / infra
    • VM provider
    • OS
    • kubernetes binary
  • Curating backend/infra service providers
    • storage
      • objects
      • databases
      • cache
    • network
      • overlay
      • gateway/proxy
      • LoadBalancer
      • cert-manager
  • Identity Management
    • Identity provider
    • client provisioning
      • SAML
      • OIDC
    • user/group sync
  • UI Dashboard
    • manage users/groups
    • manage applications
  • Curating the collaboration platform
    • RocketChat vs Matrix
    • Nextcloud vs Seafile vs raw S3
  • Maintain Upstream apps
    • share the work of bug fixing Nextcloud

But we’ll not discuss those here.
Instead, we’d like to focus on:

Application operator

For each app, we aim to build a “package” spec (a CRD, sketched after the lists below) that would allow us to:

  • deploy
  • update
  • perform migrations (change bucket endpoint for instance)
  • backup/restore/clone

The goal is to be able to deploy this application in these environments:

  • a single-home environment, like a VM or an ARM computer
  • a high-availability setup
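
As a sketch only, such a package CRD could look like this (the apps.libre.sh group and all fields are assumptions for illustration, not a settled spec):

apiVersion: apps.libre.sh/v1alpha1    # hypothetical group/version
kind: Nextcloud
metadata:
  namespace: default
  name: my-cloud
spec:
  version: "24"              # the operator drives updates and migrations towards this target
  topology: single-node      # or: high-availability
  backup:
    schedule: "0 3 * * *"    # assumed field: periodic backup to the configured store

The operator would reconcile this object through deploys, updates, and backups, instead of us running imperative scripts.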

Overview

Goals

  • safe: not easy to break
  • auto-pilot: set it and forget it
    • auto-update
    • auto-scaling
  • two topologies for app deployments
    • single node: simple setup & small footprint
    • high availability: survives node failures & rolling updates
  • backup/restore: from one libre.sh to another
  • point-in-time recovery
  • modularity: you can swap the backing service implementation

Non-Goals

  • highly customizable: it’s not kustomize, you can’t change every field
  • support every scenario: we offer a curated experience; if your cluster isn’t compatible, deploy a dedicated one
  • use existing packages (helm or kustomize)
  • build a package manager: it won’t replace helm & kustomize

Technical problems

Abstract backing service management

We need standard APIs to manage and keep track of the backing services, and to extract most of the required logic from the app management code.
A CRD + controller for each service seems like the Kubernetes way; it avoids building an HTTP API and the need for external state storage (see the Bucket sketch after the list below).

Example backend services:

  • identity provider
  • database
  • object store
  • redis
  • email
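
As a sketch, each of these could expose the same claim shape as the PostgreSQL example below; here is a hypothetical Bucket claim for the object store (the kind and all fields are assumptions):

apiVersion: objectstore.libre.sh/v1alpha1   # hypothetical group/version
kind: Bucket
metadata:
  namespace: default
  name: my-files
spec:
  compositionRef:
    name: minio                             # assumed: which implementation provisions it
  writeConnectionSecretToRef:
    name: my-files-connection-details       # endpoint and credentials land here

The app code only reads the connection secret; it never needs to know which implementation served the claim.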

(We tried to discuss this concept upstream, but it is way too much work for our small coop (: )

Separate infrastructure-specific configuration (of backing services)

We need to choose and configure each backing service’s provisioner. This is the kind of configuration that is done once at installation time and is available cluster-wide.
The existing StorageClass concept is interesting. Using something similar allows us to decouple this configuration from any instance (app or backing service) CRD. It also avoids repetition, and it can be managed by a specific role (RBAC).
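
For reference, this is the StorageClass pattern we want to mimic: the admin declares the implementation once, and every PersistentVolumeClaim references it by name only:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast
provisioner: rancher.io/local-path    # example provisioner, chosen by the cluster admin
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer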

Potential solution based on Crossplane:

This object is deployed by the Nextcloud operator:

apiVersion: database.libre.sh/v1alpha1
kind: PostgreSQL
metadata:
  namespace: default
  name: my-db
spec:
  parameters:
    storage: 5G
  compositionRef:
    name: zalandopg
  writeConnectionSecretToRef:
    name: my-db-connection-details

Then, as cluster admin, you can decide to use Crossplane to implement this CRD (or develop your own controller implementation).
We configure our cluster to transform this object into a Zalando postgresql object with this Composition:

apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: zalandopg
  labels:
    crossplane.io/xrd: xpostgresqlinstances.database.libre.sh
    provider: kubernetes
spec:
  writeConnectionSecretsToNamespace: libresh-system
  compositeTypeRef:
    apiVersion: database.libre.sh/v1alpha1
    kind: XPostgreSQLInstance
  resources:
    - name: zalandoinstance
      base:
        apiVersion: kubernetes.crossplane.io/v1alpha1
        kind: Object
        spec:
          providerConfigRef:
            name: kubernetes-provider
          forProvider:
            manifest:
              apiVersion: acid.zalan.do/v1
              kind: postgresql
              spec:
                databases: {}
                numberOfInstances: 1
                postgresql:
                  version: "12"
                teamId: libresh
                users: {}
      patches:
        - type: FromCompositeFieldPath
          fromFieldPath: metadata.labels["crossplane.io/claim-namespace"]
          toFieldPath: spec.forProvider.manifest.metadata.namespace
        - type: FromCompositeFieldPath
          fromFieldPath: metadata.name
          toFieldPath: spec.forProvider.manifest.metadata.name
          transforms:
            - type: string
              string:
                fmt: "libresh-%s"
        # Assumed patch: point the Object's connection details at the credentials
        # secret Zalando creates for the cluster; the exact toFieldPath is an
        # assumption, only the secret name format comes from the original.
        - type: FromCompositeFieldPath
          fromFieldPath: metadata.name
          toFieldPath: spec.connectionDetails[0].name
          transforms:
            - type: string
              string:
                fmt: "postgres.libresh-%s.credentials.postgresql.acid.zalan.do"
        - type: FromCompositeFieldPath
          fromFieldPath: spec.parameters.storage
          toFieldPath: spec.forProvider.manifest.spec.volume.size
      readinessChecks:
        - type: MatchString
          fieldPath: status.atProvider.manifest.status.PostgresClusterStatus
          matchString: "Running"

You can find more YAML ideas here.

Referencing other applications

An app instance can claim another application and read its coordinates from a connection secret; for example (the apps.libre.sh group shown below is an assumption):

apiVersion: apps.libre.sh/v1alpha1    # hypothetical group/version
kind: CollaboraClaim
metadata:
  name: test-instance
spec:
  writeConnectionSecretToRef:
    name: collabora-connection-details
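
The consuming app can then read the other application’s coordinates from that secret, for example as environment variables (the url key is an assumption about what the Collabora operator would publish):

apiVersion: v1
kind: Pod
metadata:
  name: nextcloud-example
spec:
  containers:
    - name: nextcloud
      image: nextcloud:24          # example image
      env:
        - name: COLLABORA_URL
          valueFrom:
            secretKeyRef:
              name: collabora-connection-details
              key: url             # assumed key in the connection secret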

Using external backing services

See Crossplane.
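
For example, the same PostgreSQL claim shown above could be satisfied by an external, managed database simply by pointing it at a different Composition; only the compositionRef changes (the aws-rds Composition name is hypothetical):

apiVersion: database.libre.sh/v1alpha1
kind: PostgreSQL
metadata:
  namespace: default
  name: my-db
spec:
  parameters:
    storage: 5G
  compositionRef:
    name: aws-rds                        # hypothetical Composition backed by an external provider
  writeConnectionSecretToRef:
    name: my-db-connection-details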

Managing app configs for many instances

How do we configure many Nextcloud instances with shared defaults and per-instance specifics?
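
One possible shape, as a sketch only (both kinds and all fields are assumptions): a cluster-wide defaults object that the operator merges into every instance, with per-instance values taking precedence:

apiVersion: apps.libre.sh/v1alpha1      # hypothetical group/version
kind: NextcloudDefaults                 # hypothetical cluster-scoped defaults
metadata:
  name: cluster-defaults
spec:
  version: "24"
  config:
    default_phone_region: FR
---
apiVersion: apps.libre.sh/v1alpha1
kind: Nextcloud
metadata:
  namespace: customer-a
  name: cloud
spec:
  config:
    default_phone_region: DE            # per-instance override wins over the default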


This is awesome @pierreozoux!

I have some updates for you on this plan: we’re just out of the third meeting for setting up a cooperative datacenter nearby, and we agreed on using libre.sh. \o/ More later…

But I have bad news as well: I’ll be on the road from June 8 to June 20-something, so a meeting will be improbable. If there is a date and it matches my schedule, I’ll be happy to join, though. I’ll pass the meeting info on to my new colleagues 🙂

And I’m planning to go to Lyon on the way back @unteem… Let’s find a moment to meet up.

@how we’ll take notes and organize a monthly meeting anyway; I’ll announce the recurrence here.

It will be on Tuesday, June 14, 2022, at 16h.

See you there!


I missed Lyon but got to Grenoble instead. I’ll be there again in October for Rezine’s 10th anniversary. If we can meet in Lyon (or you come to Grenoble) we can catch up!