Kubernetes
Training & Certification Path
CKA (Certified Kubernetes Administrator): https://training.linuxfoundation.org/certification/certified-kubernetes-administrator-cka/
Linux K8S official certification: https://training.linuxfoundation.org/full-catalog/?_sft_product_type=certification&_sft_topic_area=cloud-containers
Kubernetes: https://kubernetes.io/docs/home/
Linux Introduction to K8S: https://training.linuxfoundation.org/training/introduction-to-kubernetes/
Introduction course to Kubernetes: https://www.edx.org/course/introduction-to-kubernetes
Linux Foundation Training on K8S: https://training.linuxfoundation.org/full-catalog/?_sft_product_type=training&_sft_technology=kubernetes
Good 4-hour introduction course on K8S: https://www.youtube.com/watch?v=X48VuDVv0do
KUBERNETES AND CLOUD NATIVE ESSENTIALS (LFS250)
This wiki page focuses on that training course. Below are the chapters covered in that course.
Cloud Native Architecture
Cloud native technologies empower organizations to build and run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds. Containers, service meshes, microservices, immutable infrastructure, and declarative APIs exemplify this approach.
The characteristics of cloud native architecture are:
High level of automation
(A reliable automated system also allows for much easier disaster recovery if you have to rebuild your whole system.)
Self healing
Scalable
Just like scaling up your application for high-traffic situations, scaling it down combined with the usage-based pricing models of cloud providers can save costs when traffic is low
(Cost-) Efficient
Easy to Maintain
Secure by default
Traditional security models assume a trusted perimeter around the infrastructure, which does not hold in shared cloud environments. Patterns like zero trust computing mitigate this by requiring authentication from every user and process.
A very good set of best practices when developing an application as a service: https://12factor.net
Auto-Scaling
Vertical scaling (adding more CPU/RAM to an existing VM) and horizontal scaling (adding more servers and distributing the load across them via load balancing)
Serverless
The idea of serverless computing is to relieve developers of these complicated tasks (preparing and configuring resources like networks, virtual machines, operating systems and load balancers just to run a simple web application). In a nutshell, you can just provide the application code, while the cloud provider chooses the right environment to run it.
Function as a Service (FaaS) is an emerging subclass of serverless -> https://www.youtube.com/watch?v=EOIja7yFScs . It allows for very fast deployment and makes for excellent testing and sandbox environments.
Systems like Knative that are built on top of Kubernetes allow extending existing platforms with serverless computing abilities -> https://knative.dev/docs/
Although there are many advantages to serverless technology, it initially struggled with standardization. Many cloud providers have proprietary offerings that make it difficult to switch between different platforms. To address these problems, the CloudEvents project was founded and provides a specification of how event data should be structured.
Open Standard
While Docker is often used synonymously with container technologies, the community has committed to the open industry standard of the Open Container Initiative (OCI).
Under the umbrella of the Linux Foundation, the Open Container Initiative provides two standards which define the way containers are built and run. The image-spec defines how to build and package container images, while the runtime-spec specifies the configuration, execution environment and lifecycle of containers. A more recent addition to the OCI project is the Distribution-Spec, which provides a standard for the distribution of content in general and container images in particular.
Other systems like Prometheus or OpenTelemetry evolved and thrived in this ecosystem and provide additional standards for monitoring and observability.
Cloud Native Roles & Site Reliability Engineering
New cloud native roles:
Cloud Architect
Responsible for adoption of cloud technologies, designing application landscape and infrastructure, with a focus on security, scalability and deployment mechanisms.
DevOps Engineer
DevOps engineers use tools and processes that balance out software development and operations, starting with approaches to writing, building, and testing software throughout the deployment lifecycle.
Security Engineer
DevSecOps Engineer
Bridges the two previous roles, combining the DevOps Engineer and Security Engineer responsibilities in one person.
Data Engineer
Data engineers face the challenge of collecting, storing, and analyzing the vast amounts of data that are being or can be collected in large systems.
Full Stack Developer
Site Reliability Engineer (SRE)
The overarching goal of SRE is to create and maintain software that is reliable and scalable. To achieve this, software engineering approaches are used to solve operational problems and automate operation tasks.
To measure performance and reliability, SREs use three main metrics:
Service Level Objectives (SLO): "Specify a target level for the reliability of your service." - A goal that is set, for example reaching a service latency of less than 100ms.
Service Level Indicators (SLI): “A carefully defined quantitative measure of some aspect of the level of service that is provided” - For example how long a request actually needs to be answered.
Service Level Agreements (SLA): “An explicit or implicit contract with your users that includes consequences of meeting (or missing) the SLOs they contain. The consequences are most easily recognized when they are financial – a rebate or a penalty – but they can take other forms.” - Answers the question what happens if SLOs are not met.
Community and Governance
Nice visual overview of the community actors: https://landscape.cncf.io
Additional Resources
Cloud Native Architecture
Adoption of Cloud-Native Architecture, Part 1: Architecture Evolution and Maturity, by Srini Penchikala, Marcio Esteves, and Richard Seroter (2019)
5 principles for cloud-native architecture-what it is and how to master it, by Tom Grey (2019)
Well-Architected Framework
Microservices
Microservices, by James Lewis and Martin Fowler
Serverless
The CNCF takes steps toward serverless computing, by Kristen Evans (2018)
Site Reliability Engineering
SRE Book, by Benjamin Treynor Sloss (2017)
DevOps, SRE, and Platform Engineering, by Ivan Velichko (2021)
Container Orchestration
Containers can be used to solve two problems: managing the dependencies of an application, and running much more efficiently than spinning up a lot of virtual machines.
One of the earliest ancestors of modern container technologies is the chroot command that was introduced in Version 7 Unix in 1979. The chroot command could be used to isolate a process from the root filesystem, basically "hiding" the files from the process and simulating a new root directory. The isolated environment is a so-called chroot jail, where files outside the jail can't be accessed by the process but are still present on the system. A chroot directory can be created at any place in the filesystem.
Container technologies that we have today still embody this very concept, but in a modernized version and with a lot of features on top.
To isolate a process even more than chroot can do, current Linux kernels provide features like namespaces and cgroups:
Namespaces are used to isolate various resources
cgroups are used to organize processes in hierarchical groups and assign them resources like memory and CPU. When you want to limit your application container to let’s say 4GB of memory, cgroups are used under the hood to ensure these limits.
The Linux Kernel 5.6 currently provides 8 namespaces:
pid - process ID provides a process with its own set of process IDs.
net - network allows the processes to have their own network stack, including the IP address.
mnt - mount abstracts the filesystem view and manages mount points.
ipc - inter-process communication provides separation of named shared memory segments.
user - provides process with their own set of user IDs and group IDs.
uts - Unix time sharing allows processes to have their own hostname and domain name.
cgroup - a newer namespace that allows a process to have its own set of cgroup root directories.
time - the newest namespace can be used to virtualize the clock of the system.
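As a rough illustration of namespaces in action, the util-linux unshare tool can start a shell inside its own UTS and PID namespaces (a minimal sketch; the hostname used here is made up and exact behavior may vary by distribution):

```bash
# Start a shell with its own hostname (UTS) and PID namespace
sudo unshare --uts --pid --fork --mount-proc bash
hostname isolated-host   # changes the hostname only inside this namespace
ps aux                   # shows only the processes of this PID namespace
exit                     # leave the namespaces; the host hostname is untouched
```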
Launched in 2013, Docker became synonymous with building and running containers. Although Docker did not invent the technologies that are used to run containers, they stitched together existing technologies in a smart way to make containers more user friendly and accessible.
While virtual machines emulate a complete machine, including the operating system and a kernel, containers share the kernel of the host machine and, as explained, are only isolated processes.
Virtual machines come with some overhead, be it boot time, size or resource usage to run the operating system. Containers on the other hand are literally processes, like the browser you can start on your machine, therefore they start a lot faster and have a smaller footprint.
Reminder:
An operating system is a system program that provides an interface so that users can easily operate the computer.
The kernel is also a system program; it controls all programs running on the computer and is basically the bridge between the software and the hardware of the system.
Running Containers
The OCI (Open Container Initiative) runtime-spec: https://github.com/opencontainers/runtime-spec. There is a reference implementation of it, runc (https://github.com/opencontainers/runc), which is used by the Docker product.
The runtime-spec goes hand in hand with the image-spec, which we will cover in the next chapter, since it describes how to unpack a container image and then manage the complete container lifecycle, from creating the container environment, to starting the process, stopping and deleting it.
For your local machine, there are plenty of alternatives to choose from, some of which are only for building images like buildah or kaniko, while others line up as full alternatives to Docker, like podman does.
Documentation on Docker: https://docs.docker.com
To search for a public docker image: https://hub.docker.com/search?q=
Building Container Images
Container images are what makes containers portable and easy to reuse on a variety of systems. Docker describes a container image as follows:
“A Docker container image is a lightweight, standalone, executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries and settings.”
In 2015, the image format made popular by Docker was donated to the newly founded Open Container Initiative and is known also as the OCI image-spec that can be found on GitHub. Images consist of a filesystem bundle and metadata.
From Code -> layer + image index + config
Build an image
Images can be built by reading the instructions from a buildfile called Dockerfile.
Example of a dockerfile here under
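A minimal illustrative Dockerfile (base image, file names and commands are assumptions for a small Node.js app, not the course's exact example):

```dockerfile
# Start from a small base image
FROM node:18-alpine
# Set the working directory inside the image
WORKDIR /app
# Copy the dependency manifests first to benefit from layer caching
COPY package*.json ./
RUN npm install --production
# Copy the application code
COPY . .
# Document the port the application listens on
EXPOSE 3000
# Command executed when a container is started from this image
CMD ["node", "server.js"]
```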
Other useful commands
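A hedged selection of typical Docker commands for building and distributing images (image and registry names are placeholders):

```bash
docker build -t my-app:1.0 .              # build an image from the Dockerfile in the current directory
docker image ls                           # list local images
docker run -d -p 8080:3000 my-app:1.0     # run a container, mapping host port 8080 to container port 3000
docker ps                                 # list running containers
docker logs <container-id>                # show the logs of a container
docker tag my-app:1.0 myregistry/my-app:1.0
docker push myregistry/my-app:1.0         # push the image to a registry
```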
Example exercise to build a new Docker image
Go to https://github.com/docker/getting-started
Security
Sysdig has a great blog article on how to avoid a lot of security issues and build secure container images.
The 4C's of cloud native security: https://kubernetes.io/docs/concepts/security/overview/
Container Orchestration Fundamentals
Problems to be solved can include:
Providing compute resources like virtual machines where containers can run on
Scheduling containers to servers in an efficient way
Allocating resources like CPU and memory to containers
Managing the availability of containers and replacing them if they fail
Scaling containers if load increases
Providing networking to connect containers together
Provisioning storage if containers need to persist data.
Container orchestration systems provide a way to build a cluster of multiple servers and host the containers on top. Most container orchestration systems consist of two parts: a control plane that is responsible for the management of the containers and worker nodes that actually host the containers.
Networking
Most modern implementations of container networking are based on the Container Network Interface (CNI). CNI is a standard that can be used to write or configure network plugins and makes it very easy to swap out different plugins in various container orchestration platforms.
Service Discovery & DNS
The solution to the problem again is automation. Instead of having a manually maintained list of servers (or in this case containers), all the information is put in a Service Registry. Finding other services in the network and requesting information about them is called Service Discovery.
Approaches to Service Discovery:
DNS: Modern DNS servers that have a service API can be used to register new services as they are created.
Key-Value: Using a strongly consistent datastore especially to store information about services. Many of these systems can operate in a highly available way with strong failover mechanisms. Popular choices, especially for clustering, are etcd, Consul or Apache ZooKeeper.
For info:
etcd: https://github.com/etcd-io/etcd, etcd is a distributed reliable key-value store for the most critical data of a distributed system, with a focus on being:
Simple: well-defined, user-facing API (gRPC)
Secure: automatic TLS with optional client cert authentication
Fast: benchmarked 10,000 writes/sec
Reliable: properly distributed using Raft
etcd is written in Go and uses the Raft consensus algorithm to manage a highly-available replicated log.
Consul from Hashicorp: https://www.consul.io, A modern service networking solution requires that we answer four specific questions: Where are my services running? How do I secure the communication between them? How do I automate routine networking tasks? How do I control access to my environments?
Apache ZooKeeper: https://zookeeper.apache.org. ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications.
Service Mesh
Because the networking is such a crucial part of microservices and containers, the networking can get very complex and opaque for developers and administrators. In addition to that, a lot of functionality like monitoring, access control or encryption of the networking traffic is desired when containers communicate with each other.
Instead of implementing all of this functionality into your application, you can just start a second container that has this functionality implemented. The software you can use to manage network traffic is called a proxy. This is a server application that sits between a client and server and can modify or filter network traffic before it reaches the server. Popular representatives are nginx, haproxy or envoy.
Taking this idea a step further, a service mesh adds a proxy server to every container that you have in your architecture.
When a service mesh is used, applications don’t talk to each other directly, but the traffic is routed through the proxies instead. The most popular service meshes at the moment are istio and linkerd. While they have differences in implementation, the architecture is the same.
The proxies in a service mesh form the data plane. This is where networking rules are implemented and shape the traffic flow.
These rules are managed centrally in the control plane of the service mesh. This is where you define how traffic flows from service A to service B and what configuration should be applied to the proxies.
Istio service mesh: https://istio.io/v1.10/docs/ops/deployment/architecture/
Istio uses an extended version of the Envoy proxy. Envoy is a high-performance proxy developed in C++ to mediate all inbound and outbound traffic for all services in the service mesh. Envoy proxies are the only Istio components that interact with data plane traffic. Envoy proxies are deployed as sidecars to services.
The Service Mesh Interface (SMI) project aims at defining a specification on how a service mesh from various providers can be implemented. With a strong focus on Kubernetes, their goal is to standardize the end user experience for service meshes, as well as a standard for the providers that want to integrate with Kubernetes. You can find the current specification on GitHub.
Envoy: https://www.envoyproxy.io
SMI: Service Mesh Interface specification: https://smi-spec.io, https://github.com/servicemeshinterface/smi-spec
Storage
In order to keep up with the unbroken growth of various storage implementations, again, the solution was to implement a standard. The Container Storage Interface (CSI) came up to offer a uniform interface which allows attaching different storage systems no matter if it’s cloud or on-premises storage.
Additional Resources
The History of Containers
A Brief History of Containers: From the 1970s Till Now, by Rani Osnat (2020)
It's Here: Docker 1.0, by Julien Barbier (2014)
Chroot
Container Performance
Container Performance Analysis at DockerCon 2017, by Brendan Gregg
Best Practices on How to Build Container Images
Top 20 Dockerfile Best Practices, by Álvaro Iradier (2021)
3 simple tricks for smaller Docker images, by Daniele Polencic (2019)
Alternatives to Classic Dockerfile Container Building
Buildpacks vs Jib vs Dockerfile: Comparing containerization methods, by James Ward (2020)
Service Discovery
Service Discovery in a Microservices Architecture, by Chris Richardson (2015)
Container Networking
Kubernetes Networking Part 1: Networking Essentials, By Simon Kurth (2021)
Life of a Packet (I), by Michael Rubin (2017)
Computer Networking Introduction - Ethernet and IP (Heavily Illustrated), by Ivan Velichko (2021)
Container Storage
Managing Persistence for Docker Containers, by Janakiram MSV (2016)
Container and Kubernetes Security
Secure containerized environments with updated threat matrix for Kubernetes, by Yossi Weizman (2021)
Docker Container Playground
Kubernetes Fundamental
Originally designed and developed by Google, Kubernetes got open-sourced in 2014, and along the release v1.0 Kubernetes was donated to the newly formed Cloud Native Computing Foundation as the very first project. A lot of cloud native technologies evolve around Kubernetes, be it low-level tools like container runtimes, monitoring or application delivery tools.
Kubernetes Architecture
Kubernetes is often used as a cluster, meaning that it spans multiple servers that work on different tasks and distribute the load of the system.
From a high-level perspective, Kubernetes clusters consist of two different server node types that make up a cluster:
Control plane node(s) These are the brains of the operation. Control plane nodes contain various components which manage the cluster and control various tasks like deployment, scheduling and self-healing of containerized workloads.
Worker nodes The worker nodes are where applications run in your cluster. This is the only job of worker nodes and they don't have any further logic implemented. Their behavior, such as whether they should start a container, is completely controlled by the control plane node.
Similar to a microservice architecture you would choose for your own application, Kubernetes incorporates multiple smaller services that need to be installed on the nodes.
Control plane nodes typically host the following services:
kube-apiserver
This is the centerpiece of Kubernetes. All other components interact with the api-server and this is where users would access the cluster.
etcd
A database which holds the state of the cluster. etcd is a standalone project and not an official part of Kubernetes.
kube-scheduler
When a new workload should be scheduled, the kube-scheduler chooses a worker node that could fit, based on different properties like CPU and memory.
kube-controller-manager
Contains different non-terminating control loops that manage the state of the cluster. For example, one of these control loops can make sure that a desired number of your application is available all the time.
cloud-controller-manager (optional)
Can be used to interact with the API of cloud providers, to create external resources like load balancers, storage or security groups.
Components of worker nodes:
container runtime
The container runtime is responsible for running the containers on the worker node. For a long time, Docker was the most popular choice, but is now replaced in favor of other runtimes like containerd.
kubelet
A small agent that runs on every worker node in the cluster. The kubelet talks to the api-server and the container runtime to handle the final stage of starting containers.
kube-proxy
A network proxy that handles inside and outside communication of your cluster. Instead of managing traffic flow on its own, the kube-proxy tries to rely on the networking capabilities of the underlying operating system if possible.
Kubernetes also has a concept of namespaces, which are not to be confused with kernel namespaces that are used to isolate containers. A Kubernetes namespace can be used to divide a cluster into multiple virtual clusters, which can be used for multi-tenancy when multiple teams share a cluster. Please note that Kubernetes namespaces are not suitable for strong isolation and should more be viewed like a directory on a computer where you can organize objects and manage which user has access to which folder.
containerd is an industry standard container runtime: https://containerd.io
Setup Kubernetes
Setting up a Kubernetes cluster can be achieved with a lot of different methods. Creating a test "cluster" can be very easy with the right tools:
If you want to set up a production-grade cluster on your own hardware or virtual machines, you can choose one of the various installers:
A few vendors started packaging Kubernetes into a distribution and even offer commercial support:
The distributions often choose an opinionated approach and offer additional tools while using Kubernetes as the central piece of their framework.
If you don’t want to install and manage it yourself, you can consume it from a cloud provider:
You can learn how to set up your own Kubernetes cluster with Minikube in this interactive tutorial.
Different useful kubectl commands
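A few commonly used kubectl commands (an illustrative, non-exhaustive selection; pod and file names are placeholders):

```bash
kubectl cluster-info                      # show cluster endpoint information
kubectl get nodes -o wide                 # list control plane and worker nodes
kubectl get pods -A                       # list pods in all namespaces
kubectl describe pod <pod-name>           # show detailed state and events of a pod
kubectl logs <pod-name>                   # print the logs of a pod's container
kubectl exec -it <pod-name> -- sh         # open a shell inside a running container
kubectl apply -f manifest.yaml            # create or update objects from a YAML file
kubectl delete -f manifest.yaml           # delete the objects described in the file
```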
To install a full production cluster on several nodes (i.e. build an AKS-like cluster yourself) -> https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/
You also need to install an overlay network for the cluster (Calico, for example): https://projectcalico.docs.tigera.io/about/about-calico
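A rough sketch of the kubeadm flow (the pod network CIDR and the Calico manifest URL are assumptions and change between releases; check the respective documentation):

```bash
# On the control plane node
sudo kubeadm init --pod-network-cidr=192.168.0.0/16

# Install a CNI plugin such as Calico (verify the current manifest URL in the Calico docs)
kubectl apply -f https://projectcalico.docs.tigera.io/manifests/calico.yaml

# On each worker node, join the cluster using the token printed by "kubeadm init"
sudo kubeadm join <control-plane-ip>:6443 --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash>
```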
Kubernetes API
The Kubernetes API is the most important component of a Kubernetes cluster. Without it, communication with the cluster is not possible, every user and every component of the cluster itself needs the api-server.
Access Control Overview, retrieved from the Kubernetes documentation
Before a request is processed by Kubernetes, it has to go through three stages:
Authentication The requester needs to present a means of identity to authenticate against the API. Commonly done with a digitally signed certificate (X.509) or with an external identity management system. Kubernetes users are always externally managed; Service Accounts can be used to authenticate technical users.
Authorization It is decided what the requester is allowed to do. In Kubernetes this can be done with Role Based Access Control (RBAC).
Admission Control In the last step, admission controllers can be used to modify or validate the request. For example, if a user tries to use a container image from an untrustworthy registry, an admission controller could block this request. Tools like the Open Policy Agent can be used to manage admission control externally.
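For the authorization step, a hedged sketch of what RBAC objects look like (role name, namespace and user are made up):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: development
rules:
  - apiGroups: [""]              # "" refers to the core API group
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: development
subjects:
  - kind: User
    name: jane                   # the externally managed user receiving this role
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```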
Like many other APIs, the Kubernetes API is implemented as a RESTful interface that is exposed over HTTPS. Through the API, a user or service can create, modify, delete or retrieve resources that reside in Kubernetes.
Running containers in K8S
In Kubernetes, instead of starting containers directly, you define Pods as the smallest compute unit and Kubernetes translates that into a running container. We will learn more about Pods later, for now imagine it as a wrapper around a container.
In an effort to allow using other container runtimes than Docker, Kubernetes introduced the Container Runtime Interface (CRI) in 2016.
Container runtime:
containerd
CRI-O
Docker: The standard for a long time, but it was never really made for container orchestration. The usage of Docker as the runtime for Kubernetes has been deprecated and was removed in Kubernetes 1.24. Kubernetes has a great blog article that answers all the questions on the matter.
Networking
Kubernetes distinguishes between four different networking problems that need to be solved:
Container-to-Container communications This can be solved by the Pod concept as we'll learn later.
Pod-to-Pod communications This can be solved with an overlay network.
Pod-to-Service communications It is implemented by the kube-proxy and packet filter on the node.
External-to-Service communications It is implemented by the kube-proxy and packet filter on the node.
There are different ways to implement networking in Kubernetes, but also three important requirements:
All pods can communicate with each other across nodes.
All nodes can communicate with all pods.
No Network Address Translation (NAT).
To implement networking, you can choose from a variety of network plugin vendors (Calico, mentioned above, is one example).
In Kubernetes, every Pod gets its own IP address, so there is no manual configuration involved. Moreover, most Kubernetes setups include a DNS server add-on called core-dns, which can provide service discovery and name resolution inside the cluster.
Scheduling
In its most basic form, scheduling is a sub-category of container orchestration and describes the process of automatically choosing the right (worker) node to run a containerized workload on. In a Kubernetes cluster, the kube-scheduler is the component that makes the scheduling decision, but is not responsible for actually starting the workload. The scheduling process in Kubernetes always starts when a new Pod object is created. Remember that Kubernetes is using a declarative approach, where the Pod is only described first, then the scheduler selects a node where the Pod actually will get started by the kubelet and the container runtime.
The scheduler will use that information to filter all nodes that fit these requirements. If multiple nodes fit the requirements equally, Kubernetes will schedule the Pod on the node with the least amount of Pods. This is also the default behavior if a user has not specified any further requirements.
Additional Resources
Kubernetes history and the Borg Heritage
From Google to the world: The Kubernetes origin story, by Craig McLuckie (2016)
Large-scale cluster management at Google with Borg, by Abhishek Verma, Luis Pedrosa, Madhukar R. Korupolu, David Oppenheimer, Eric Tune, John Wilkes (2015)
Kubernetes Architecture
RBAC
Demystifying RBAC in Kubernetes, by Kaitlyn Barnard
Container Runtime Interface
Kubernetes networking and CNI
Internals of Kubernetes Scheduling
A Deep Dive into Kubernetes Scheduling, by Ron Sobol (2020)
Kubernetes Security Tools
Kubernetes Playground
Working with K8S
One of the core concepts of Kubernetes is providing a lot of mostly abstract resources, also called objects, that you can use to describe how your workload should be handled. Some of them are used to handle problems of container orchestration, like scheduling and self-healing, others are there to solve some inherent problems of containers.
Kubernetes objects can be distinguished between workload-oriented objects that are used for handling container workloads and infrastructure-oriented objects, that for example handle configuration, networking and security. Some of these objects can be put into a namespace, while others are available across the whole cluster.
As a user, we can describe these objects in the popular data-serialization language YAML and send them to the api-server, where they get validated before they are created.
Other tools for interaction with Kubernetes:
There are also advanced tools that allow the creation of templates and the packaging of Kubernetes objects. Probably the most frequently used tool in connection with Kubernetes today is Helm.
Helm is a package manager for Kubernetes, which allows easier updates and interaction with objects. Helm packages Kubernetes objects in so-called Charts, which can be shared with others via a registry. To get started with Kubernetes, you can search the ArtifactHub to find your favorite software packages, ready to deploy.
POD
A pod describes a unit of one or more containers that share an isolation layer of namespaces and cgroups. It is the smallest deployable unit in Kubernetes, which also means that Kubernetes does not interact with containers directly. The pod concept was introduced to allow running a combination of multiple processes that are interdependent. All containers inside a pod share an IP address and can share data via the filesystem.
You could add as many containers to your main application as you want. But be careful since you lose the ability to scale them individually! Using a second container that supports your main application is called a sidecar container.
All containers defined are started at the same time with no ordering, but you also have the ability to use initContainers to start containers before your main application starts.
Some examples of important settings that can be set for every container in a Pod are:
resources: Set a resource request and a maximum limit for CPU and Memory.
livenessProbe: Configure a health check that periodically checks if your application is still alive. Containers can be restarted if the check fails.
securityContext: Set user & group settings, as well as kernel capabilities.
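A minimal Pod manifest showing these settings (image, probe path and values are only illustrative assumptions):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - name: my-app
      image: nginx:1.25
      resources:
        requests:              # what the scheduler reserves for the container
          memory: "128Mi"
          cpu: "250m"
        limits:                # hard upper bound, enforced via cgroups
          memory: "256Mi"
          cpu: "500m"
      livenessProbe:
        httpGet:
          path: /
          port: 80
        initialDelaySeconds: 5
        periodSeconds: 10      # restart the container if the check keeps failing
      securityContext:
        runAsUser: 1000        # run the process as a non-root user
        allowPrivilegeEscalation: false
```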
For more detailed info on POD: https://kubernetes.io/docs/concepts/workloads/pods/
Workload objects
Working just with Pods would not be flexible enough in a container orchestration platform. For example, if a Pod is lost because a node failed, it is gone forever. To make sure that a defined number of Pod copies runs all the time, we can use controller objects that manage the pod for us.
Kubernetes objects:
ReplicaSet: A controller object that ensures a desired number of pods is running at any given time. ReplicaSets can be used to scale out applications and improve their availability. They do this by starting multiple copies of a pod definition.
Deployment: The most feature-rich object in Kubernetes. A Deployment can be used to describe the complete application lifecycle; it does this by managing multiple ReplicaSets that get updated when the application is changed, for example by providing a new container image. Deployments are perfect to run stateless applications in Kubernetes.
StatefulSet: Considered a bad practice for a long time, StatefulSets can be used to run stateful applications like databases on Kubernetes. Stateful applications have special requirements that don't fit the ephemeral nature of pods and containers. In contrast to Deployments, StatefulSets try to retain IP addresses of pods and give them a stable name, persistent storage and more graceful handling of scaling and updates.
DaemonSet: Ensures that a copy of a Pod runs on all (or some) nodes of your cluster. DaemonSets are perfect to run infrastructure-related workload, for example monitoring or logging tools.
Job: Creates one or more Pods that execute a task and terminate afterwards. Job objects are perfect to run one-shot scripts like database migrations or administrative tasks.
CronJob: CronJobs add a time-based configuration to jobs. This allows running Jobs periodically, for example doing a backup job every night at 4am.
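To make the Deployment object concrete, a hedged minimal example (name, image and replica count are assumptions):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3                  # the managed ReplicaSet keeps 3 Pods running at all times
  selector:
    matchLabels:
      app: my-app
  template:                    # the Pod template used for every replica
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: nginx:1.25
          ports:
            - containerPort: 80
```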
Note: by default, Pods are visible from other pods and services within the same Kubernetes cluster, but not from outside that network. The kubectl command can create a proxy that forwards communications into the cluster-wide, private network.
Networking objects
Since a lot of Pods would require a lot of manually network configuration, we can use Service and Ingress objects to define and abstract networking.
Services can be used to expose a set of pods as a network service. Types of Services (a hedged example manifest follows this list):
ClusterIP: The most common service type. A ClusterIP is a virtual IP inside Kubernetes that can be used as a single endpoint for a set of pods. This service type can be used as a round-robin load balancer.
NodePort: The NodePort service type extends the ClusterIP by adding simple routing rules. It opens a port (default between 30000-32767) on every node in the cluster and maps it to the ClusterIP. This service type allows routing external traffic to the cluster.
LoadBalancer: The LoadBalancer service type extends the NodePort by deploying an external LoadBalancer instance. This will only work if you’re in an environment that has an API to configure a LoadBalancer instance, like GCP, AWS, Azure or even OpenStack.
ExternalName: A special service type that has no routing whatsoever. ExternalName uses the Kubernetes-internal DNS server to create a DNS alias. You can use this to create a simple alias to resolve a rather complicated hostname like my-cool-database-az1-uid123.cloud-provider-i-like.com. This is especially useful if you want to reach external resources from your Kubernetes cluster.
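A hedged Service example selecting the Pods of the Deployment above (omit the type and nodePort lines to get a plain ClusterIP service):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  type: NodePort               # ClusterIP is the default if this line is omitted
  selector:
    app: my-app                # matches the labels of the Pods created by the Deployment
  ports:
    - port: 80                 # port exposed on the ClusterIP
      targetPort: 80           # port of the container
      nodePort: 30080          # port opened on every node (must be in the 30000-32767 range)
```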
Ingress exposes HTTP and HTTPS routes from outside the cluster to services within the cluster. It does this through routing rules that a user defines on the Ingress resource and that are implemented by an ingress controller.
An Ingress does not expose arbitrary ports or protocols. Exposing services other than HTTP and HTTPS to the internet typically uses a service of type Service.Type=NodePort or Service.Type=LoadBalancer.
Ingress documentation: https://kubernetes.io/docs/concepts/services-networking/ingress/
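A hedged minimal Ingress routing a hostname to the Service above (the hostname and ingress class are assumptions and depend on the installed ingress controller):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
spec:
  ingressClassName: nginx           # which ingress controller implements these rules
  rules:
    - host: my-app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app        # the Service that receives the traffic
                port:
                  number: 80
```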
Standard features of ingress controllers may include:
LoadBalancing
TLS offloading/termination
Name-based virtual hosting
Path-based routing
A lot of ingress controllers even provide more features, like:
Redirects
Custom errors
Authentication
Session affinity
Monitoring
Logging
Weighted routing
Rate limiting.
Kubernetes also provides a cluster internal firewall with the NetworkPolicy concept. NetworkPolicies are a simple IP firewall (OSI Layer 3 or 4) that can control traffic based on rules. You can define rules for incoming (ingress) and outgoing traffic (egress). A typical use case for NetworkPolicies would be restricting the traffic between two different namespaces.
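A hedged NetworkPolicy example that only allows ingress traffic from Pods carrying a specific label (labels, namespace and port are made up):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-only
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: my-app                # the Pods this policy applies to
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              role: frontend     # only Pods with this label may connect
      ports:
        - protocol: TCP
          port: 80
```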
The set of Pods targeted by a Service is usually determined by a LabelSelector
Services match a set of Pods using labels and selectors, a grouping primitive that allows logical operation on objects in Kubernetes. Labels are key/value pairs attached to objects and can be used in any number of ways:
Designate objects for development, test, and production
Embed version tags
Classify an object using tags
Volume and storage object
Containers already had the concept of mounting volumes, but since we're not working with containers directly, Kubernetes made volumes part of a Pod, just like containers are.
Ceph storage solution (Rook): https://rook.io/docs/rook/v1.7/ceph-storage.html
Configuration object
In Kubernetes, this problem is solved by decoupling the configuration from the Pods with a ConfigMap. ConfigMaps can be used to store whole configuration files or variables as key-value pairs. There are two possible ways to use a ConfigMap:
Mount a ConfigMap as a volume in a Pod
Map variables from a ConfigMap to environment variables of a Pod.
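A hedged sketch showing both usage modes in one place (keys, values and paths are only examples):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-app-config
data:
  LOG_LEVEL: "info"                  # a simple key-value pair
  app.properties: |                  # a whole configuration file stored under one key
    greeting=hello
    retries=3
---
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - name: my-app
      image: nginx:1.25
      env:
        - name: LOG_LEVEL            # map a single key to an environment variable
          valueFrom:
            configMapKeyRef:
              name: my-app-config
              key: LOG_LEVEL
      volumeMounts:
        - name: config
          mountPath: /etc/config     # app.properties appears as a file in this directory
  volumes:
    - name: config
      configMap:
        name: my-app-config
```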
Right from the beginning Kubernetes also provided an object to store sensitive information like passwords, keys or other credentials. These objects are called Secrets. Secrets are very much related to ConfigMaps and basically their only difference is that secrets are base64 encoded.
There is an ongoing debate about the risk of using Secrets since, in contrast to their name, they are not considered secure. In cloud native environments, purpose-built secret management tools have emerged that integrate very well with Kubernetes. One example would be HashiCorp Vault.
AutoScaling Object
Auto scaling Mechanisms:
Horizontal Pod Autoscaler (HPA)
Horizontal Pod Autoscaler (HPA) is the most used autoscaler in Kubernetes. The HPA can watch Deployments or ReplicaSets and increase the number of replicas if a certain threshold is reached. Imagine your Pod can use 500MiB of memory and you configured a threshold of 80%: as soon as usage exceeds 400MiB, a second Pod will get scheduled.
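A hedged HPA manifest targeting the Deployment from earlier and scaling on memory utilization (the 80% value matches the example above; resource requests must be set on the Pods and the metrics-server add-on must be installed):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80   # add replicas when average usage exceeds 80% of the request
```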
Cluster Autoscaler
Of course, there is no point in starting more and more Replicas of Pods, if the Cluster capacity is fixed. The Cluster Autoscaler can add new worker nodes to the cluster if the demand increases. The Cluster Autoscaler works great in tandem with the Horizontal Autoscaler.
Vertical Pod Autoscaler
The Vertical Pod Autoscaler is relatively new and allows Pods to increase their resource requests and limits dynamically. As we discussed earlier, vertical scaling is limited by the node capacity.
Unfortunately, (horizontal) autoscaling in Kubernetes is NOT available out of the box and requires installing an add-on called metrics-server.
Other third party integration to manage scaling in Kubernetes with Pods metrics:
KEDA stands for Kubernetes-based Event Driven Autoscaler; it was started in 2019 as a partnership between Microsoft and Red Hat.
Additional Resources
Differences between Containers and Pods
What are Kubernetes Pods Anyway?, by Ian Lewis (2017)
Containers vs. Pods - Taking a Deeper Look, by Ivan Velichko (2021)
kubectl tips & tricks
Storage and CSI in Kubernetes
Container Storage Interface (CSI) for Kubernetes GA, by Saad Ali (2019)
Kubernetes Storage: Ephemeral Inline Volumes, Volume Cloning, Snapshots and more!, by Henning Eggers (2020)
Autoscaling in Kubernetes
Architecting Kubernetes clusters - choosing the best autoscaling strategy, by Daniele Polencic (2021)
Cloud Native Application Delivery
Application Delivery Fundamentals
In 2005, Linus Torvalds created Git, which is the standard version control system that almost everybody uses today. Git is a decentralized system that can be used to track changes in your source code. In essence, Git works with copies of the code, so-called branches or forks, where you can work before your changes get merged back into a main branch.
Git Doc: https://git-scm.com
If your target platform is Kubernetes, you can write a YAML file to deploy your application while your newly built container image can be pushed to a container registry, where Kubernetes will download it for you.
To make full use of cloud resources, the principle of Infrastructure as Code (IaC) became popular. Instead of installing infrastructure manually, you describe it in files and use the cloud vendors' API to set up your infrastructure. This allows developers to be more involved in the setup of the infrastructure.
CI-CD
Automation is the key to overcoming these barriers, and today we know and use the principles of Continuous Integration/Continuous Delivery (CI/CD), which describe the different steps in the deployment of an application, configuration or even infrastructure.
Continuous Integration is the first part of the process and describes the permanent building and testing of the written code. High automation and usage of version control allows multiple developers and teams to work on the same code base.
Continuous Delivery is the second part of the process and automates the deployment of the pre-built software. In cloud environments, you will often see that software is deployed to Development or Staging environments, before it gets released and delivered to a production system.
To automate the whole workflow, you can use a CI/CD pipeline, which is actually nothing more than the scripted form of all the steps involved, running on a server or even in a container. Pipelines should be integrated with a version control system that manages changes to the code base. Every time a new revision of your code is ready to be deployed, the pipeline starts to execute scripts that build your code, run tests, deploy them to servers and even perform security and compliance checks.
Popular CI/CD tools include:
Argo.
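As an illustration of what a scripted pipeline looks like, here is a minimal sketch using GitHub Actions syntax as one arbitrary example (the course does not prescribe a specific tool; job names and commands are placeholders):

```yaml
name: build-and-test
on: [push]                                 # every new revision triggers the pipeline
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4          # fetch the code from version control
      - name: Run tests
        run: make test                     # placeholder for the project's test command
      - name: Build container image
        run: docker build -t my-app:${{ github.sha }} .
```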
Very nice free introductory training on DevOps: https://training.linuxfoundation.org/training/introduction-to-devops-and-site-reliability-engineering-lfs162/
GitOps
Infrastructure as Code was a real revolution in increasing the quality and speed of providing infrastructure, and it works so well that today, configuration, network, policies, or security can be described as code, and often even live in the same repository as the software.
GitOps takes the idea of Git as the single source of truth a step further and integrates the provisioning and change process of infrastructure with version control operations.
There are two different approaches to how a CI/CD pipeline can implement the changes you want to make:
Push-based The pipeline is started and runs tools that make the changes in the platform. Changes can be triggered by a commit or merge request.
Pull-based An agent watches the git repository for changes and compares the definition in the repository with the actual running state. If changes are detected, the agent applies the changes to the infrastructure.
Two examples of popular GitOps frameworks that use the pull-based approach are Flux and ArgoCD. ArgoCD is implemented as a Kubernetes controller, while Flux is built with the GitOps Toolkit, a set of APIs and controllers that can be used to extend Flux, or even build a custom delivery platform.
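To give an idea of pull-based GitOps, a hedged Argo CD Application manifest (repository URL, path and namespaces are assumptions):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/my-app-config.git  # the Git repo acting as source of truth
    targetRevision: main
    path: deploy                                           # directory containing the manifests
  destination:
    server: https://kubernetes.default.svc                 # deploy into the same cluster
    namespace: my-app
  syncPolicy:
    automated:
      prune: true          # delete resources that were removed from the repository
      selfHeal: true       # revert manual changes that drift from the repository
```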
ArgoCD Architecture, retrieved from the ArgoCD documentation
Very nice training on the subject: To learn more about GitOps in action and the usage of ArgoCD and Flux, consider enrolling for the free course Introduction To GitOps (LFS169).
Additional Resources
10 Deploys Per Day - Start of the DevOps movement at Flickr
10+ Deploys Per Day: Dev and Ops Cooperation at Flickr, by John Allspaw and Paul Hammond
Learn git in a playful way
Infrastructure as Code
Beginners guide to CI/CD
Cloud Native Observability
Conventional monitoring for servers may include collecting basic metrics of the system like CPU and memory resource usage and logging of processes and the operating system. A new challenge for a microservice architecture is monitoring requests that move through a distributed system. That discipline is called tracing and is especially useful when a lot of services are involved in answering a request.
We will learn how container infrastructure is still relying on collecting metrics and logs, but changes the requirements quite a bit. There is a lot more focus on network problems like latency, throughput, retrying of requests or application start time, while the sheer volume of metrics, logs, and traces in distributed systems calls for a different approach to managing these systems.
Observability
The higher goal of observability is to allow analysis of the collected data. This helps to get a better understanding of the system and to react to error states. The term observability is closely related to control theory, which deals with the behavior of dynamic systems.
Telemetry
In container systems, each and every application should have tools built in that generate telemetry data, which is then collected and transferred to a centralized system. The data can be divided into three categories:
Logs
These are messages that are emitted from an application when errors, warnings or debug information should be presented. A simple log entry could be the start and end of a specific task that the application performed.
Metrics
Metrics are quantitative measurements taken over time. This could be the number of requests or an error rate.
Traces
They track the progression of a request while it’s passing through the system. Traces are used in a distributed system that can provide information about when a request was processed by a service and how long it took.
Logging
Application frameworks and programming languages come with extensive logging tools built-in, which makes it very easy to log to a file with different log levels based on the severity of the log message.
Unix and Linux programs provide three I/O streams from which two can be used to output logs from a container:
standard input (stdin): Input to a program e.g. via keyboard
standard output (stdout): The output a program writes on the screen
standard error (stderr): Errors that a program writes on the screen
If you want to learn more about I/O streams and how they originated, make sure to visit the stdin(3) - Linux manual page.
The documentation of the kubectl logs command provides some examples.
To ship the logs, different methods can be used:
Node-level logging The most efficient way to collect logs. An administrator configures a log shipping tool that collects logs and ships them to a central store.
Logging via sidecar container The application has a sidecar container that collects the logs and ships them to a central store.
Application-level logging The application pushes the logs directly to the central store. While this seems very convenient at first, it requires configuring the logging adapter in every application that runs in a cluster.
There are several tools to choose from to ship and store the logs. The first two methods can be done by tools like fluentd or filebeat.
Popular choices to store logs are OpenSearch or Grafana Loki. To find more datastores, you can visit the fluentd documentation on possible log targets.
To make logs easy to process and searchable, make sure you log in a structured format like JSON instead of plain text. The major cloud vendors provide good documentation on the importance of structured logging and how to implement it.
For more information on ElasticSearch -> https://www.elastic.co/what-is/elasticsearch
Prometheus
Prometheus is an open source monitoring system, originally developed at SoundCloud, which became the second CNCF hosted project in 2016. Over time, it became a very popular monitoring solution and is now a standard tool that integrates especially well in the Kubernetes and container ecosystem.
Prometheus can collect metrics that were emitted by applications and servers as time series data - these are very simple sets of data that include a timestamp, labels and the measurement itself. The Prometheus data model provides four core metric types:
Counter: A value that increases, like a request or error count
Gauge: Values that increase or decrease, like memory size
Histogram: A sample of observations, like request duration or response size
Summary: Similar to a histogram, but also provides the total count of observations.
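For illustration, metrics exposed by an application in the Prometheus text exposition format might look like this (metric names and values are made up, not taken from the official documentation):

```text
# HELP http_requests_total Total number of HTTP requests handled.
# TYPE http_requests_total counter
http_requests_total{method="get",code="200"} 1027
http_requests_total{method="get",code="500"} 3

# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 2.1e+07
```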
Monitoring only makes sense if you use the data collected. The most used companion for Prometheus is Grafana, which can be used to build dashboards from the collected metrics. You can use Grafana for many more data sources and not only Prometheus, although that is the most used one.
You can also use one of the many unofficial client libraries listed in the Prometheus documentation.
Here are some examples taken from the Prometheus documentation.
Another tool from the Prometheus ecosystem is the Alertmanager. The Prometheus server itself allows you to configure alerts when certain metrics reach or pass a threshold. When the alert is firing, Alertmanager can send a notification out to your favorite persistent chat tool, e-mail or specialized tools that are made for alerting and on-call management.
Tracing
Logging and Monitoring with the collection of metrics are not particularly new methods. The same thing cannot be said for (distributed) tracing. Metrics and logs are essential and can give a good overview of individual services, but to understand how a request is processed in a microservice architecture, traces can be of good use.
A trace describes the tracking of a request while it passes through the services. A trace consists of multiple units of work which represent the different events that occur while the request is passing the system. Each application can contribute a span to the trace, which can include information like start and finish time, name, tags or a log message.
These traces can be stored and analyzed in a tracing system like Jaeger.
While tracing was a new technology and method that was geared towards cloud native environments, there were again problems in the area of standardization. In 2019, the OpenTracing and OpenCensus projects merged to form the OpenTelemetry project, which is now also a CNCF project.
OpenTelemetry is a set of application programming interfaces (APIs), software development kits (SDKs) and tools that can be used to integrate telemetry such as metrics, logs and especially traces into applications and infrastructures. The OpenTelemetry clients can be used to export telemetry data in a standardized format to central platforms like Jaeger. Existing tools can be found in the OpenTelemetry documentation.
Cost Management
All these methods can be combined to be more cost-efficient. It is usually no problem to mix on-demand, reserved and spot instances:
Identify wasted and unused resources
With a good monitoring of your resource usage, it is very easy to find unused resources or servers that don’t have a lot of idle time. A lot of cloud vendors have cost explorers that can break down costs for individual services. Autoscaling helps to shut down instances that are not needed.
Right-Sizing
When you start out, it can be a good idea to choose servers and systems with a lot more power than actually needed. Again, good monitoring can give you indications over time of how much resources are actually needed for your application. This is an ongoing process where you should always adapt to the load you really need. Don’t buy powerful machines if you only need half of their capacity.
Reserved Instances
On-demand pricing models are great if you really need resources on-demand. Otherwise, you’re probably paying a lot for the "on-demand" service. A method to save a lot of money is to reserve resources and even pay for them upfront. This is a great pricing model if you have a good estimate about the resources you need, maybe even for years in advance.
Spot Instances
If you have a batch job or heavy load for a short amount of time, you can use spot instances to save money. The idea of spot instances is that you get unused resources that have been over-provisioned by the cloud vendor for very low prices. The "problem" is that these resources are not reserved for you, and can be terminated on short notice to be used by someone else paying "full price".
Additional Resources
Cloud Native Observability
Prometheus
Prometheus Cheat Sheet - Basics (Metrics, Labels, Time Series, Scraping), by Ivan Velichko (2021)
Prometheus at scale
Logging for Containers
Use the native logging mechanisms of containers (Google Cloud)
Right-Sizing and cost optimization
Right Sizing (Amazon AWS)
Cloud cost optimization: principles for lasting success (Google Cloud)