Here is my idea of an all-you-need, cloud-native, resilient, observable, microservice-oriented platform running on a Kubernetes cluster.
In my examples all microservices are written in Java, but the platform components are language-agnostic and should work with any microservice that implements the standard protocols and APIs mentioned below.
Here are the main chapters of this document:
When I started designing and implementing this project, I had clear goals in mind that the whole platform was supposed to achieve:
This project may represent the whole IT stack of a "small" company or a department of a big organization. The platform is meant to be observable, fault-tolerant, resilient and cloud-provider-agnostic.
Once it is running, you mainly need to write the business logic; everything else is supposed to be there already.
By following the standards provided here, your business services will be monitored, observable, resilient and fault-tolerant. A Kafka cluster and a relational database are also ready to use. Dashboards and metric charts will show you their status, workloads, network usage, service-to-service interactions, and CPU, memory and disk usage.
A tech library for (all) the business services
Code repetition is bad when it concerns technical features or configuration. For example, you never want to repeat the configuration of circuit breakers, retry patterns, service discovery, etc. Likewise, you don't want to configure observability, tracing, log streaming, etc. multiple times. That's why I designed a library to be included in each business service, providing the technical configuration that lets the applications work happily together in a microservice environment. All those features are configured once, and every business service gets them for free. Of course, there is always a way to apply some customization at service level.
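To make the "configure once, reuse everywhere" idea concrete, here is a minimal sketch of a retry pattern written once as a shared helper. It uses only the JDK and is not the project's actual library: the class name `RetrySketch`, the method `callWithRetry` and its parameters are hypothetical, chosen for illustration only.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper (not part of the project's library): retry with exponential backoff. */
public final class RetrySketch {

    private RetrySketch() {
    }

    /**
     * Runs the task up to maxAttempts times, doubling the wait after each failure.
     * Rethrows the last failure once every attempt has failed.
     */
    public static <T> T callWithRetry(Callable<T> task, int maxAttempts, long initialDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMillis;
        Exception lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                lastFailure = e;
                if (attempt < maxAttempts) {
                    // Wait before the next attempt, then double the delay.
                    Thread.sleep(delay);
                    delay *= 2;
                }
            }
        }
        throw lastFailure;
    }
}
```

A business service would then wrap a remote call as `RetrySketch.callWithRetry(() -> client.fetch(), 3, 100L)` (where `client.fetch()` stands for any call of your own) instead of re-implementing the loop in every service.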
A Helm chart for (all) the business services
All the business services are meant to be deployed and run on a Kubernetes cluster. Each of them needs a few Kubernetes resources, such as a Deployment, a Service, etc. Here again, code repetition would be bad. As you can imagine, such resource definitions would be (almost) the same for all the business services. They are also quite verbose to write, so you certainly don't want to copy and paste all of them for each business service.
To template and customize such Kubernetes resources I decided to use Helm. But this alone is not enough to avoid code repetition between the services, as you don't want to write the Helm charts multiple times either. That's why one of the main goals was to have a pipeline able to check out a single Helm chart definition to be used for all the business services, applying customizations only if the service being built requires them.
One of the core features provided is the pipeline. Its main goal is to turn each commit on a main branch into a rolling upgrade of the related namespace. In between, of course, there are many steps for compilation, checks, packaging, distribution, etc.
The twelve-factor app is a methodology for building software-as-a-service apps that:
- Use declarative formats for setup automation, to minimize time and cost for new developers joining the project;
- Have a clean contract with the underlying operating system, offering maximum portability between execution environments;
- Are suitable for deployment on modern cloud platforms, obviating the need for servers and systems administration;
- Minimize divergence between development and production, enabling continuous deployment for maximum agility;
- And can scale up without significant changes to tooling, architecture, or development practices.
Here is a small overview of a running instance of the whole project / platform. You can see 3 layers:
- Local machine
- Kubernetes cluster
- Kubernetes namespace
The k8s cluster is reachable through the Ingress component on fixed ports, as you can see below. Ingress is linked to the namespace services. From the browser (on the default HTTP port 80) you can reach the Gateway and, through it, all the platform tools' UIs, for example: http://k8s.local/nexus, http://k8s.local/zipkin/, http://k8s.local/grafana, etc. Additionally, you can reach the business services as defined in the gateway routes. Also note that such routes can be updated on the fly from the Consul UI.
By feature, I mean all the "topics / components" I focused on most.
Detailed descriptions can be found in the following docs:
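As a rough illustration of path-based gateway routes that can change at runtime, here is a minimal sketch of a route table with longest-prefix matching. It is not how the project's gateway is implemented: the class `RouteTableSketch`, its methods and the upstream URLs in the usage note are assumptions made for this example.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical route table: maps a path prefix to an upstream service URL. */
public final class RouteTableSketch {

    // A concurrent map lets routes be added or removed while requests are being resolved.
    private final Map<String, String> routes = new ConcurrentHashMap<>();

    public void put(String pathPrefix, String upstream) {
        routes.put(pathPrefix, upstream);
    }

    public void remove(String pathPrefix) {
        routes.remove(pathPrefix);
    }

    /** Returns the upstream of the longest prefix matching the request path, if any. */
    public Optional<String> resolve(String requestPath) {
        String bestPrefix = null;
        for (String prefix : routes.keySet()) {
            // Match on whole path segments so that "/graf" does not match "/grafana".
            boolean matches = requestPath.equals(prefix) || requestPath.startsWith(prefix + "/");
            if (matches && (bestPrefix == null || prefix.length() > bestPrefix.length())) {
                bestPrefix = prefix;
            }
        }
        return bestPrefix == null ? Optional.empty() : Optional.ofNullable(routes.get(bestPrefix));
    }
}
```

For instance, after `put("/grafana", "http://grafana:3000")` a request for `/grafana/dashboards` resolves to that upstream, and calling `remove("/grafana")` takes the route away without a restart, which mirrors the idea of updating routes on the fly.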
- Go Native! 🆕
- Observability
- Platform components
- Java Technical platform library(s)
- Business logic microservices - Use case
- Continuous integration / Continuous deployment (CI / CD)
ℹ️ Please note that, digging into this GitHub project, you will see a bunch of applications / repositories. They can be divided into 3 different sets:
- Platform components, i.e. monitoring tools, Kafka cluster, database, pipeline, etc. Basically all you need to run and monitor a distributed system.
- Java Technical platform libraries. Mainly the parent POM and a custom Spring Boot starter library providing common configuration for the business services' high availability, resiliency, rolling update, etc.
- Business logic microservices. Some Spring Boot based Java services meant to test the platform features, measure throughput, resiliency, etc.
🚧 TODO... 🚧
error: error upgrading connection: error dialing backend: tls: failed to verify certificate: x509
Why? Usually, it happens when the laptop is connected to a new local network with a different IP address range (for example, moving from 192.168.0.X to 192.168.100.X).
Solution:
Run the following commands:
- sudo microk8s.refresh-certs -e server.crt
- sudo microk8s.refresh-certs -e front-proxy-client.crt
- sudo microk8s.refresh-certs -e ca.crt
Possible solution:
sudo tee /etc/udev/rules.d/90-loopback.rules <<EOF
SUBSYSTEM=="block", DEVPATH=="/devices/virtual/block/loop*", ENV{UDISKS_PRESENTATION_HIDE}="1", ENV{UDISKS_IGNORE}="1"
EOF
Reference:
canonical/microk8s#500 (comment)
How can I navigate microk8s volumes?
Go to /var/snap/microk8s/common/default-storage
How can I navigate the database schemas?
- From local machine mysql client:
k port-forward svc/mariadb 3306
mysql -u root -P 3306 -h localhost --protocol=TCP --password=MARIADB_PASSWORD
- From the utility mysql-client-pod located in this project's /support folder:
k exec -it mysql-client-pod -- sh
mysql -h mariadb --protocol TCP --password=MARIADB_PASSWORD
How can I consume Kafka cluster messages?
From the utility kafka-pod located in this project's /support folder:
k exec -it kafka-pod -- bash
kafka-console-consumer.sh --bootstrap-server dan-kafka-cluster-kafka-bootstrap:9092 --topic my-topic-name
- Jenkinsfile, helm template, etc.
- Evaluate whether streaming the microservice logs to Kafka can be replaced by Fluent Bit reading them directly from the containers' standard output, or from the k8s nodes if they are stored there as well
  - Avoid Kafka overhead
  - Logs are not lost if the Kafka cluster is down
  - Remove the streaming logic from the services and maybe increase performance
  - If Fluent Bit is down, log entries may be lost (as there is no Kafka storage in between)
-
- Check this URL: http://k8s.local/pretrade/swagger-ui/index.html
- Wrongly generated url example: http://dan-pretrade-service.dan-ci-cd.svc.cluster.local
- Solutions to be tested: https://stackoverflow.com/questions/60625494/wrong-generated-server-url-in-springdoc-openapi-ui-swagger-ui-deployed-behin
- Fix the pipeline's Docker image push so that it chooses the snapshot or release Docker repository according to the POM version, basically as is already done for the Helm Maven plugin
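The snapshot-or-release decision for the Docker image push could be sketched as below. It relies on the Maven convention that snapshot versions end with `-SNAPSHOT`. The class name and the two repository names are hypothetical placeholders, not the project's real values.

```java
/** Hypothetical helper: picks the Docker repository from the POM version. */
public final class DockerRepositorySketch {

    // Placeholder repository names, assumed for this example only.
    static final String SNAPSHOT_REPOSITORY = "docker-snapshots";
    static final String RELEASE_REPOSITORY = "docker-releases";

    private DockerRepositorySketch() {
    }

    /** Maven convention: a snapshot version ends with "-SNAPSHOT". */
    public static boolean isSnapshot(String pomVersion) {
        return pomVersion != null && pomVersion.trim().endsWith("-SNAPSHOT");
    }

    /** Returns the snapshot repository for snapshot versions, the release one otherwise. */
    public static String repositoryFor(String pomVersion) {
        if (pomVersion == null || pomVersion.trim().isEmpty()) {
            throw new IllegalArgumentException("POM version must not be empty");
        }
        return isSnapshot(pomVersion) ? SNAPSHOT_REPOSITORY : RELEASE_REPOSITORY;
    }
}
```

In a pipeline step, the version read from the POM would be passed to `repositoryFor` and the result used as the target of the image push.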
- https://opentelemetry.io/
- https://strimzi.io/documentation/
- https://www.marcobehler.com/guides/graalvm-aot-jit
- https://www.baeldung.com/distributed-systems-observability
- https://spring.io/blog/2022/10/12/observability-with-spring-boot-3
- https://peter.bourgon.org/blog/2017/02/21/metrics-tracing-and-logging.html
- https://docs.spring.io/spring-boot/docs/current/reference/html/native-image.html
- https://www.cloudkarafka.com/blog/apache-kafka-retention-and-segment-size-mistake.html
- https://www.graalvm.org/22.3/reference-manual/native-image/guides/debug-native-image-process/
- Actually many, many more I forgot to report here :)