New generation status page


#1

We have discussed several times hosting a Cachet instance for each other.

I have a better idea. It is a mix of the following:

The Netlify page is for maintenance announcements (with RSS).

And Grafana is for the real-time status of services (with Prometheus and the blackbox exporter).

Now, let’s say we have Alice and Bob. Alice will host the status page for Bob.

Alice needs to discover the list of services to monitor.

The configuration looks like this.

Bob could set up an HTTP endpoint serving this file. Alice would scrape it regularly and restart her Prometheus whenever it changes.
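To make the "scrape it regularly and react when it changes" part concrete, here is a minimal sketch of what Alice could run. The URL and the function names are made up for illustration. The only real idea is to hash the file and do something only when the hash differs from the last one seen.

```python
import hashlib
import urllib.request
from typing import Callable, Optional

# Hypothetical URL: the thread does not name Bob's endpoint.
BOB_ENDPOINT = "https://bob.example/status-targets.json"


def fetch_http(url: str) -> bytes:
    """Download the raw file Bob publishes."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()


def poll_once(
    fetch: Callable[[], bytes],
    last_digest: Optional[str],
    on_change: Callable[[bytes], None],
) -> str:
    """Fetch the file once and call on_change only if its content differs."""
    body = fetch()
    digest = hashlib.sha256(body).hexdigest()
    if digest != last_digest:
        on_change(body)  # e.g. rewrite the config and restart Prometheus
    return digest
```

Alice would call `poll_once(lambda: fetch_http(BOB_ENDPOINT), digest, handler)` in a loop with a sleep in between, keeping the returned digest for the next round.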

This is just the basic idea, but if you are interested, we should do a hackathon :slight_smile: (even remote :wink: )


#3

Alice and Bob met on this forum, or in the matrix channel :slight_smile:

Actually, this could also be a service offered by the network.

Both ways are fine for me.

Regarding alerts, Alice can also configure Prometheus alerts so that if a service is down, it sends an email or a webhook to Bob's endpoint.
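On Bob's side, handling the webhook could be as small as the sketch below. It assumes Alice routes alerts through Alertmanager's webhook receiver, whose JSON body carries an `alerts` list where each entry has a `status` and `labels`. The function name and the choice of the `instance` label are my assumptions, not something we agreed on.

```python
from typing import List


def firing_services(payload: dict) -> List[str]:
    """Return the 'instance' label of every alert that is currently firing."""
    return [
        alert.get("labels", {}).get("instance", "unknown")
        for alert in payload.get("alerts", [])
        if alert.get("status") == "firing"
    ]
```

Bob's HTTP handler would parse the request body as JSON, pass it to `firing_services`, and then notify whoever is on call.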


#4

This can be helpful too :slight_smile:


#6

Of course this is containerized, so we just need to know the Docker commands and the cost of hosting it. :wink:

Maybe we should make an inventory of “our” resources (I think a lot of people here use Hetzner, so it might be interesting to see how much “cloud” could be turned into a colocation for example.)


#7

If we are to automate, we need to establish some “protocol” beyond the technology itself.

If we converge on Docker, we already have a common base for distributing the containers. But I see the challenge in a few other questions:

  1. Which addresses should these containers probe?
  2. Do they scrape some entries on the librehosters.json?
  3. Do they scrape all addresses published there, or a few of them (meaning there is some orchestration in saying e.g. an address is to be probed by 3 other containers, think of replicas)?
  4. Can all the information to be scraped be public, or do we need an internal chain of trust?
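For question 3, one way to get "each address is probed by 3 other containers" without a central orchestrator is rendezvous hashing: every prober computes the same ranking for an address and knows whether it is in the top 3. This is only a sketch of that idea, not something we have agreed on. The function name and the prober names are hypothetical.

```python
import hashlib
from typing import List


def assign_probers(address: str, probers: List[str], replicas: int = 3) -> List[str]:
    """Pick up to `replicas` probers for an address, deterministically."""

    def score(prober: str) -> str:
        # Same inputs give the same score on every host.
        return hashlib.sha256(f"{prober}|{address}".encode()).hexdigest()

    return sorted(probers, key=score)[:replicas]
```

Because the ranking depends only on the prober name and the address, adding or removing one prober only shifts the assignments that involved it.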

#8

My answer here is always discovery/registration.

So Bob wants Alice to host his status page.

I think that Bob has to register his list of endpoints, probably as a JSON array.
And Alice would have to watch that endpoint to re-render her Prometheus configuration and restart Prometheus.
Or, when Bob changes his endpoints, he could send a webhook to Alice (which is better for the environment).
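The re-render step could look roughly like this. I am assuming Bob's file is a plain JSON array of URLs and that Alice writes the result as a Prometheus file-based service discovery file (a list of `targets`/`labels` groups). The `owner` label and the function name are made up for the example.

```python
import json


def render_file_sd(endpoints_json: str, owner: str) -> str:
    """Turn Bob's JSON array of URLs into a file_sd target file."""
    endpoints = json.loads(endpoints_json)
    if not isinstance(endpoints, list) or not all(
        isinstance(e, str) for e in endpoints
    ):
        raise ValueError("expected a JSON array of URL strings")
    # One target group per owner; duplicates removed, order stable.
    groups = [{"targets": sorted(set(endpoints)), "labels": {"owner": owner}}]
    return json.dumps(groups, indent=2)
```

One nice side effect of this variant: Prometheus re-reads `file_sd_configs` files when they change, so Alice may not need a restart at all, only a rewrite of the file.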

In a k8s context, we could also imagine that Alice hosts prom in a k8s cluster. The list of services to monitor is just a CRD, and Bob can modify this CRD himself.