autoscaling Archives - Piotr's TechBlog https://piotrminkowski.com/tag/autoscaling/ Java, Spring, Kotlin, microservices, Kubernetes, containers Tue, 18 Jan 2022 14:59:00 +0000 en-US hourly 1 https://wordpress.org/?v=6.9.1 https://i0.wp.com/piotrminkowski.com/wp-content/uploads/2020/08/cropped-me-2-tr-x-1.png?fit=32%2C32&ssl=1 autoscaling Archives - Piotr's TechBlog https://piotrminkowski.com/tag/autoscaling/ 32 32 181738725 Autoscaling on Kubernetes with KEDA and Kafka https://piotrminkowski.com/2022/01/18/autoscaling-on-kubernetes-with-keda-and-kafka/ https://piotrminkowski.com/2022/01/18/autoscaling-on-kubernetes-with-keda-and-kafka/#comments Tue, 18 Jan 2022 14:58:54 +0000 https://piotrminkowski.com/?p=10475 In this article, you will learn how to autoscale your application that consumes messages from the Kafka topic with KEDA. The full name that stands behind that shortcut is Kubernetes Event Driven Autoscaling. In order to explain the idea behind it, I will create two simple services. The first of them is sending events to […]

The post Autoscaling on Kubernetes with KEDA and Kafka appeared first on Piotr's TechBlog.

]]>
In this article, you will learn how to autoscale your application that consumes messages from the Kafka topic with KEDA. The full name that stands behind that shortcut is Kubernetes Event Driven Autoscaling. In order to explain the idea behind it, I will create two simple services. The first of them is sending events to the Kafka topic, while the second is receiving them. We will run both these applications on Kubernetes. To simplify the exercise, we may use Spring Cloud Stream, which offers a smart integration with Kafka.

Architecture

Before we start, let’s take a moment to understand our scenario for today. We have a single Kafka topic used by both our applications to exchange events. This topic consists of 10 partitions. There is also a single instance of the producer that sends events at regular intervals. We are going to scale down and scale up the number of pods for the consumer service. All the instances of the consumer service are assigned to the same Kafka consumer group. It means that only a single instance with the group may receive the particular event.

Each consumer instance has only a single receiving thread. Therefore, we can easily simulate an event processing time. We will sleep the main thread for 1 second. On the other hand, the producer will send events with a variable speed. Also, it will split the messages across all available partitions. Such behavior may result in consumer lag on partitions because Spring Cloud Stream commits offset only after handling a message. In our case, the value of lag depends on producer speed and the number of running consumer instances. To clarify let’s take a look at the diagram below.

keda-kafka-arch1

Our goal is very simple. We need to adjust the number of consumer instances to the traffic rate generated by the producer service. The value of offset lag can’t exceed the desired threshold. If we increase the traffic rate on the producer side KEDA should scale up the number of consumer instances. Consequently, if we decrease the producer traffic rate it should scale down the number of consumer instances. Here’s the diagram with our scenario.

keda-kafka-arch2

Use Kafka with Spring Cloud Stream

In order to use Spring Cloud Stream for Kafka, we just need to include a single dependency in Maven pom.xml:

<dependency>
  <groupId>org.springframework.cloud</groupId>
  <artifactId>spring-cloud-starter-stream-kafka</artifactId>
</dependency>

After that, we can use a standard Spring Cloud Stream model. However, in the background, it integrates with Kafka through a particular binder implementation. I will not explain the details, but if you are interested please read the following article. It explains the basics at the example of RabbitMQ.

Both our applications are very simple. The producer just generates and sends events (by default in JSON format). The only thing we need to do in the code is to declare the Supplier bean. In the background, there is a single thread that generates and sends CallmeEvent every second. Each time it only increases the id field inside the message:

@SpringBootApplication
public class ProducerApp {

   private static int id = 0;

   public static void main(String[] args) {
      SpringApplication.run(ProducerApp.class, args);
   }

   @Bean
   public Supplier<CallmeEvent> eventSupplier() {
      return () -> new CallmeEvent(++id, "Hello" + id, "PING");
   }

}

We can change a default fixed delay between the Supplier ticks with the following property. Let’s say we want to send an event every 100 ms:

spring.cloud.stream.poller.fixedDelay = 100

We should also provide basic configuration like the Kafka address, topic name (if different than the name of the Supplier function), number of partitions, and a partition key. Spring Cloud Stream automatically creates topics on application startup.

spring.cloud.stream.bindings.eventSupplier-out-0.destination = test-topic
spring.cloud.stream.bindings.eventSupplier-out-0.producer.partitionKeyExpression = payload.id
spring.cloud.stream.bindings.eventSupplier-out-0.producer.partitionCount = 10
spring.kafka.bootstrap-servers = one-node-cluster.redpanda:9092

Now, the consumer application. It is also not very complicated. As I mentioned before, we will sleep the main thread inside the receiving method in order to simulate processing time.

public class ConsumerApp {

   private static final Logger LOG = LoggerFactory.getLogger(ConsumerAApp.class);

   public static void main(String[] args) {
      SpringApplication.run(ConsumerApp.class, args);
   }

   @Bean
   public Consumer<Message<CallmeEvent>> eventConsumer() {
      return event -> {
         LOG.info("Received: {}", event.getPayload());
         try {
            Thread.sleep(1000);
         } catch (InterruptedException e) { }
      };
   }

}

Finally, the configuration on the consumer side. It is important to set the consumer group and enable partitioning.

spring.cloud.stream.bindings.eventConsumer-in-0.destination = test-topic
spring.cloud.stream.bindings.eventConsumer-in-0.group = a
spring.cloud.stream.bindings.eventConsumer-in-0.consumer.partitioned = true
spring.kafka.bootstrap-servers = one-node-cluster.redpanda:9092

Now, we should deploy both applications on Kubernetes. But before we do that, let’s install Kafka and KEDA on Kubernetes.

Install Kafka on Kubernetes

To perform this part you need to install helm. Instead of Kafka directly, we can install Redpanda. It is a Kafka API compatible platform. However, the Redpanda operator requires cert-manager to create certificates for TLS communication. So, let’s install it first. We use the latest version of the cert-manager. It requires adding CRDs:

$ kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.6.1/cert-manager.crds.yaml

Then you need to add a new Helm repository:

$ helm repo add jetstack https://charts.jetstack.io

And finally, install it cert-manager:

$ helm install cert-manager \
   --namespace cert-manager \
   --version v1.6.1 \
   jetstack/cert-manager

Now, we can proceed to the Redpanda installation. The same as before, let’s add the Helm repository:

$ helm repo add redpanda https://charts.vectorized.io/
$ helm repo update

We can obtain the latest version of Redpanda:

$ export VERSION=$(curl -s https://api.github.com/repos/vectorizedio/redpanda/releases/latest | jq -r .tag_name)

Then, let’s add the CRDs:

$ kubectl apply -f https://github.com/vectorizedio/redpanda/src/go/k8s/config/crd

After that, we can finally install the Redpanda operator:

$ helm install \
   redpanda-operator \
   redpanda/redpanda-operator \
   --namespace redpanda-system \
   --create-namespace \
   --version $VERSION

We will install a single-node cluster in the redpanda namespace. To do that we need to apply the following manifest:

apiVersion: redpanda.vectorized.io/v1alpha1
kind: Cluster
metadata:
  name: one-node-cluster
spec:
  image: "vectorized/redpanda"
  version: "latest"
  replicas: 1
  resources:
    requests:
      cpu: 1
      memory: 1.2Gi
    limits:
      cpu: 1
      memory: 1.2Gi
  configuration:
    rpcServer:
      port: 33145
    kafkaApi:
    - port: 9092
    pandaproxyApi:
    - port: 8082
    adminApi:
    - port: 9644
    developerMode: true

Once you did that, you can verify a list of pods in the redpanda namespace:

$ kubectl get pod -n redpanda                      
NAME                 READY   STATUS    RESTARTS   AGE
one-node-cluster-0   1/1     Running   0          4s

If you noticed, I have already set a Kafka bootstrap server address in the application.properties. For me, it is one-node-cluster.redpanda:9092. You can verify it using the following command:

$ kubectl get svc -n redpanda
NAME               TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                      AGE
one-node-cluster   ClusterIP   None         <none>        9644/TCP,9092/TCP,8082/TCP   23h

Install KEDA and integrate it with Kafka

The same as before we will install KEDA on Kubernetes with Helm. Let’s add the following Helm repo:

$ helm repo add kedacore https://kedacore.github.io/charts

Don’t forget to update the repository. We will install the operator in the keda namespace. Let’s create the namespace first:

$ kubectl create namespace keda

Finally, we can install the operator:

$ helm install keda kedacore/keda --namespace keda

I will run both example applications in the default namespace, so I will create a KEDA object also in this namespace. The main object responsible for configuring autoscaling with KEDA is ScaledObject. Here’s the definition:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: consumer-scaled
spec:
  scaleTargetRef:
    name: consumer-deployment # (1)
  cooldownPeriod: 30 # (2)
  maxReplicaCount:  10 # (3)
  advanced:
    horizontalPodAutoscalerConfig: # (4)
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 30
          policies:
            - type: Percent
              value: 50
              periodSeconds: 30
  triggers: # (5)
    - type: kafka
      metadata:
        bootstrapServers: one-node-cluster.redpanda:9092
        consumerGroup: a
        topic: test-topic
        lagThreshold: '5'

Let’s analyze the configuration in the details:

(1) We are setting autoscaler for the consumer application, which is deployed under the consumer-deployment name (see the next section for the Deployment manifest)

(2) We decrease the default value of the cooldownPeriod parameter from 300 seconds to 30 in order to test the scale-to-zero mechanism

(3) The maximum number of running pods is 10 (the same as the number of partitions in the topic) instead of the default 100

(4) We can customize the behavior of the Kubernetes HPA. Let’s do that for the scale-down operation. We could as well configure that for the scale-up operation. We allow to scale down 50% of current running replicas.

(5) The last and the most important part – a trigger configuration. We should set the address of the Kafka cluster, the name of the topic, and the consumer group used by our application. The lag threshold is 10. It sets the average target value of offset lags to trigger scaling operations.

Before applying the manifest containing ScaledObject we need to deploy the consumer application. Let’s proceed to the next section.

Test Autoscaling with KEDA and Kafka

Let’s deploy the consumer application first. It is prepared to be deployed with Skaffold, so you can just run the command skaffold dev from the consumer directory. Anyway, here’s the Deployment manifest:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: consumer-deployment
spec:
  selector:
    matchLabels:
      app: consumer
  template:
    metadata:
      labels:
        app: consumer
    spec:
      containers:
      - name: consumer
        image: piomin/consumer
        ports:
        - containerPort: 8080

Once we created it, we can also apply the KEDA ScaledObject. After creating let’s display its status with the kubectl get so command.

keda-kafka-scaledobject

Ok, but… it is inactive. If you think about that’s logical since there are no incoming events in Kafka’s topic. Right? So, KEDA has performed a scale-to-zero operation as shown below:

Now, let’s deploy the producer application. For now, DO NOT override the default value of the spring.cloud.stream.poller.maxMessagesPerPoll parameter. The producer will send one event per second.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: producer
spec:
  selector:
    matchLabels:
      app: producer
  template:
    metadata:
      labels:
        app: producer
    spec:
      containers:
        - name: producer
          image: piomin/producer
          ports:
            - containerPort: 8080

After some time you can run the kubectl get so once again. Now the status in the ACTIVE column should be true. And there is a single instance of the consumer application (1 event per second sent by a producer and received by a consumer).

We can also verify offsets and lags for consumer groups on the topic partitions. Just run the command rpk group describe a inside the Redpanda container.

Now, we will change the traffic rate on the producer side. It will send 5 events per second instead of 1 event before. To do that we have to define the following property in the application.properties file.

spring.cloud.stream.poller.fixedDelay = 200

Before KEDA performs autoscaling we still have a single instance of the consumer application. Therefore, after some time the lag on partitions will exceed the desired threshold as shown below.

Once autoscaling occurs we can display a list of deployments. As you see, now there are 5 running pods of the consumer service.

keda-kafka-scaling

Once again let’s verify the status of our Kafka consumer group. There 5 consumers on the topic partitions.

Finally, just to check it out – let’s undeploy the producer application. What happened? The consumer-deployment has been scaled down to zero.

Final Thoughts

You can use KEDA not only with Kafka. There are a lot of other options available including databases, different message brokers, or even cron. Here’s a full list of the supported tools. In this article, I showed how to use Kafka consumer offset and lag as a criterium for autoscaling with KEDA. I tried to explain this process in the detail. Hope it helps you to understand how KEDA exactly works πŸ™‚

The post Autoscaling on Kubernetes with KEDA and Kafka appeared first on Piotr's TechBlog.

]]>
https://piotrminkowski.com/2022/01/18/autoscaling-on-kubernetes-with-keda-and-kafka/feed/ 20 10475
Spring Boot on Knative https://piotrminkowski.com/2021/03/01/spring-boot-on-knative/ https://piotrminkowski.com/2021/03/01/spring-boot-on-knative/#respond Mon, 01 Mar 2021 11:28:54 +0000 https://piotrminkowski.com/?p=9506 In this article, I’ll explain what is Knative and how to use it with Spring Boot. Although Knative is a serverless platform, we can run there any type of application (not just function). Therefore, we are going to run there a standard Spring Boot application that exposes REST API and connects to a database. Knative […]

The post Spring Boot on Knative appeared first on Piotr's TechBlog.

]]>
In this article, I’ll explain what is Knative and how to use it with Spring Boot. Although Knative is a serverless platform, we can run there any type of application (not just function). Therefore, we are going to run there a standard Spring Boot application that exposes REST API and connects to a database.

Knative introduces a new way of managing your applications on Kubernetes. It extends Kubernetes to add some new key features. One of the most significant of them is a “Scale to zero”. If Knative detects that a service is not used, it scales down the number of running instances to zero. Consequently, it provides a built-in autoscaling feature based on a concurrency or a number of requests per second. We may also take advantage of revision tracking, which is responsible for switching from one version of your application to another. With Knative you just have to focus on your core logic.

All the features I described above are provided by the component called “Knative Serving”. There are also two other components: Eventing” and “Build”. The Build component is deprecated and has been replaced by Tekton. The Eventing component requires attention. However, I’ll discuss it in more detail in the separated article.

Source Code

If you would like to try it by yourself, you may always take a look at my source code. In order to do that you need to clone my GitHub repository. Then you should just follow my instructions πŸ™‚

I used the same application as the example in some of my previous articles about Spring Boot and Kubernetes. I just wanted to focus that you don’t have to change anything in the source code to run it also on Knative. The only required change will be in the YAML manifest.

Since Knative provides built-in autoscaling you may want to compare it with the horizontal pod autoscaler (HPA) on Kubernetes. To do that you may read the article Spring Boot Autoscaling on Kubernetes. If you are interested in how to easily deploy applications on Kubernetes read the following article about the Okteto platform.

Install Knative on Kubernetes

Of course, before we start Spring Boot development we need to install Knative on Kubernetes. We can do it using the kubectl CLI or an operator. You can find the detailed installation instruction here. I decided to try it on OpenShift. It is obviously the fastest way. I could do it with one click using the OpenShift Serverless Operator. No matter which type of installation you choose, the further steps will apply everywhere.

Using Knative CLI

This step is optional. You can deploy and manage applications on Knative with CLI. To download CLI do to the site https://knative.dev/docs/install/install-kn/. Then you can deploy the application using the Docker image.

$ kn service create sample-spring-boot-on-kubernetes \
   --image piomin/sample-spring-boot-on-kubernetes:latest

We can also verify a list of running services with the following command.

$ kn service list

For more advanced deployments it will be more suitable to use the YAML manifest. We will start the build from the source code build with Skaffold and Jib. Firstly, let’s take a brief look at our Spring Boot application.

Spring Boot application for Knative

As I mentioned before, we are going to create a typical Spring Boot REST-based application that connects to a Mongo database. The database is deployed on Kubernetes. Our model class uses the person collection in MongoDB. Let’s take a look at it.

@Document(collection = "person")
@Getter
@Setter
@AllArgsConstructor
@NoArgsConstructor
public class Person {

   @Id
   private String id;
   private String firstName;
   private String lastName;
   private int age;
   private Gender gender;
}

We use Spring Data MongoDB to integrate our application with the database. In order to simplify this integration we take advantage of its “repositories” feature.

public interface PersonRepository extends CrudRepository<Person, String> {
   Set<Person> findByFirstNameAndLastName(String firstName, String lastName);
   Set<Person> findByAge(int age);
   Set<Person> findByAgeGreaterThan(int age);
}

Our application exposes several REST endpoints for adding, searching and updating data. Here’s the controller class implementation.

@RestController
@RequestMapping("/persons")
public class PersonController {

   private PersonRepository repository;
   private PersonService service;

   PersonController(PersonRepository repository, PersonService service) {
      this.repository = repository;
      this.service = service;
   }

   @PostMapping
   public Person add(@RequestBody Person person) {
      return repository.save(person);
   }

   @PutMapping
   public Person update(@RequestBody Person person) {
      return repository.save(person);
   }

   @DeleteMapping("/{id}")
   public void delete(@PathVariable("id") String id) {
      repository.deleteById(id);
   }

   @GetMapping
   public Iterable<Person> findAll() {
      return repository.findAll();
   }

   @GetMapping("/{id}")
   public Optional<Person> findById(@PathVariable("id") String id) {
      return repository.findById(id);
   }

   @GetMapping("/first-name/{firstName}/last-name/{lastName}")
   public Set<Person> findByFirstNameAndLastName(@PathVariable("firstName") String firstName,
			@PathVariable("lastName") String lastName) {
      return repository.findByFirstNameAndLastName(firstName, lastName);
   }

   @GetMapping("/age-greater-than/{age}")
   public Set<Person> findByAgeGreaterThan(@PathVariable("age") int age) {
      return repository.findByAgeGreaterThan(age);
   }

   @GetMapping("/age/{age}")
   public Set<Person> findByAge(@PathVariable("age") int age) {
      return repository.findByAge(age);
   }

}

We inject database connection settings and credentials using environment variables. Our application exposes endpoints for liveness and readiness health checks. The readiness endpoint verifies a connection with the Mongo database. Of course, we use the built-in feature from Spring Boot Actuator for that.

spring:
  application:
    name: sample-spring-boot-on-kubernetes
  data:
    mongodb:
      uri: mongodb://${MONGO_USERNAME}:${MONGO_PASSWORD}@mongodb/${MONGO_DATABASE}

management:
  endpoints:
    web:
      exposure:
        include: "*"
  endpoint.health:
      show-details: always
      group:
        readiness:
          include: mongo
      probes:
        enabled: true

Defining Knative Service in YAML

Firstly, we need to define a YAML manifest with a Knative service definition. It sets an autoscaling strategy using the Knative Pod Autoscaler (KPA). In order to do that we have to add annotation autoscaling.knative.dev/target with the number of simultaneous requests that can be processed by each instance of the application. By default, it is 100. We decrease that limit to 20 requests. Of course, we need to set liveness and readiness probes for the container. Also, we refer to the Secret and ConfigMap to inject MongoDB settings.

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: sample-spring-boot-on-kubernetes
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/target: "20"
    spec:
      containers:
        - image: piomin/sample-spring-boot-on-kubernetes
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
          env:
            - name: MONGO_DATABASE
              valueFrom:
                secretKeyRef:
                  name: mongodb
                  key: database-name
            - name: MONGO_USERNAME
              valueFrom:
                secretKeyRef:
                  name: mongodb
                  key: database-user
            - name: MONGO_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mongodb
                  key: database-password

Configure Skaffold and Jib for Knative deployment

We will use Skaffold to automate the deployment of our application on Knative. Skaffold is a command-line tool that allows running the application on Kubernetes using a single command. You may read more about it in the article Local Java Development on Kubernetes. It may be easily integrated with the Jib Maven plugin. We just need to set jib as a default option in the build section of the Skaffold configuration. We can also define a list of YAML scripts executed during the deploy phase. The skaffold.yaml file should be placed in the project root directory. Here’s a current Skaffold configuration. As you see, it runs the script with the Knative Service definition.

apiVersion: skaffold/v2beta5
kind: Config
metadata:
  name: sample-spring-boot-on-kubernetes
build:
  artifacts:
    - image: piomin/sample-spring-boot-on-kubernetes
      jib:
        args:
          - -Pjib
  tagPolicy:
    gitCommit: {}
deploy:
  kubectl:
    manifests:
      - k8s/mongodb-deployment.yaml
      - k8s/knative-service.yaml

Skaffold activates the jib profile during the build. Within this profile, we will place a jib-maven-plugin. Jib is useful for building images in dockerless mode.

<profile>
   <id>jib</id>
   <activation>
      <activeByDefault>false</activeByDefault>
   </activation>
   <build>
      <plugins>
         <plugin>
            <groupId>com.google.cloud.tools</groupId>
            <artifactId>jib-maven-plugin</artifactId>
            <version>2.8.0</version>
         </plugin>
      </plugins>
   </build>
</profile>

Finally, all we need to do is to run the following command. It builds our application, creates and pushes a Docker image, and run it on Knative using knative-service.yaml.

$ skaffold run

Verify Spring Boot deployment on Knative

Now, we can verify our deployment on Knative. To do that let’s execute the command kn service list as shown below. We have a single Knative Service with the name sample-spring-boot-on-kubernetes.

spring-boot-knative-services

Then, let’s imagine we deploy three versions (revisions) of our application. To do that let’s just provide some changes in the source and redeploy our service using skaffold run. It creates new revisions of our Knative Service. However, the whole traffic is forwarded to the latest revision (with -vlskg suffix).

spring-boot-knative-revisions

With Knative we can easily split traffic between multiple revisions of the single service. To do that we need to add a traffic section in the Knative Service YAML configuration. We define a percent of the whole traffic per a single revision as shown below.

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: sample-spring-boot-on-kubernetes
  spec:
    template:
      ...
    traffic:
      - latestRevision: true
        percent: 60
        revisionName: sample-spring-boot-on-kubernetes-vlskg
      - latestRevision: false
        percent: 20
        revisionName: sample-spring-boot-on-kubernetes-t9zrd
      - latestRevision: false
        percent: 20
        revisionName: sample-spring-boot-on-kubernetes-9xhbw

Let’s take a look at the graphical representation of our current architecture. 60% of traffic is forwarded to the latest revision, while both previous revisions receive 20% of traffic.

spring-boot-knative-openshift

Autoscaling and scale to zero

By default, Knative supports autoscaling. We may choose between two types of targets: concurrency and requests-per-second (RPS). The default target is concurrency. As you probably remember, I have overridden this default value to 20 with the autoscaling.knative.dev/target annotation. So, our goal now is to verify autoscaling. To do that we need to send many simultaneous requests to the application. Of course, the incoming traffic is distributed across three different revisions of Knative Service.

Fortunately, we may easily simulate a large traffic with the siege tool. We will call the GET /persons endpoint that returns all available persons. We are sending 150 concurrent requests with the command visible below.

$ siege http://sample-spring-boot-on-kubernetes-pminkows-serverless.apps.cluster-7260.7260.sandbox1734.opentlc.com/persons \
   -i -v -r 1000  -c 150 --no-parser

Under the hood, Knative still creates a Deployment and scales down or scales up the number of running pods. So, if you have three revisions of a single Service, there are three different deployments created. Finally, I have 10 running pods for the latest deployment that receives 60% of traffic. There are also 3 and 2 running pods for the previous revisions.

What will happen if there is no traffic coming to the service? Knative will scale down the number of running pods for all the deployments to zero.

Conclusion

In this article, you learned how to deploy the Spring Boot application as a Knative service using Skaffold and Jib. I explained with the examples how to create a new revision of the Service, and distribute traffic across those revisions. We also test the scenario with autoscaling based on concurrent requests and scale to zero in case of no incoming traffic. You may expect more articles about Knative soon! Not only with Spring Boot πŸ™‚

The post Spring Boot on Knative appeared first on Piotr's TechBlog.

]]>
https://piotrminkowski.com/2021/03/01/spring-boot-on-knative/feed/ 0 9506
Spring Boot Autoscaling on Kubernetes https://piotrminkowski.com/2020/11/05/spring-boot-autoscaling-on-kubernetes/ https://piotrminkowski.com/2020/11/05/spring-boot-autoscaling-on-kubernetes/#comments Thu, 05 Nov 2020 09:39:05 +0000 https://piotrminkowski.com/?p=9051 Autoscaling based on the custom metrics is one of the features that may convince you to run your Spring Boot application on Kubernetes. By default, horizontal pod autoscaling can scale the deployment based on the CPU and memory. However, such an approach is not enough for more advanced scenarios. In that case, you can use […]

The post Spring Boot Autoscaling on Kubernetes appeared first on Piotr's TechBlog.

]]>
Autoscaling based on the custom metrics is one of the features that may convince you to run your Spring Boot application on Kubernetes. By default, horizontal pod autoscaling can scale the deployment based on the CPU and memory. However, such an approach is not enough for more advanced scenarios. In that case, you can use Prometheus to collect metrics from your applications. Then, you may integrate Prometheus with the Kubernetes custom metrics mechanism.

Preface

In this article, you will learn how to run Prometheus on Kubernetes using the Helm package manager. You will use the chart that causes Prometheus to scrape a variety of Kubernetes resource types. Thanks to that you won’t have to configure it. In the next step, you will install the Prometheus Adapter. You can also do that using the Helm package manager. The adapter acts as a bridge between the Prometheus instance and the custom metrics API. Our Spring Boot application exposes metrics through the HTTP endpoint. You will learn how to configure autoscaling on Kubernetes based on the number of incoming requests.

Source code

If you would like to try it by yourself, you may always take a look at my source code. In order to do that, you need to clone my repository sample-spring-boot-on-kubernetes. Then you should go to the k8s directory. You can find there all the Kubernetes manifests and configuration files required for that exercise.

Our Spring Boot application is ready to be deployed on Kubernetes with Skaffold. You can find the skaffold.yaml file in the project root directory. Skaffold uses Jib Maven Plugin for building a Docker image. It deploys not only the Spring Boot application but also the Mongo database.

apiVersion: skaffold/v2beta5
kind: Config
metadata:
  name: sample-spring-boot-on-kubernetes
build:
  artifacts:
    - image: piomin/sample-spring-boot-on-kubernetes
      jib:
        args:
          - -Pjib
  tagPolicy:
    gitCommit: {}
deploy:
  kubectl:
    manifests:
      - k8s/mongodb-deployment.yaml
      - k8s/deployment.yaml

The only thing you need to do to build and deploy the application is to execute the following command. It also allows you to access HTTP API through the local port.

$ skaffold dev --port-forward

For more information about Skaffold, Jib and a local development of Java applications, you may refer to the article Local Java Development on Kubernetes

Kubernetes Autoscaling with Spring Boot – Architecture

The picture visible below shows the architecture of our sample system. The horizontal pod autoscaler (HPA) automatically scales the number of pods based on CPU, memory, or other custom metrics. It obtains the value of the metric by pulling the Custom Metrics API. In the beginning, we are running a single instance of our Spring Boot application on Kubernetes. Prometheus gathers and stores metrics from the application by calling HTTP endpoint /actuator/promentheus. Consequently, the HPA scales up the number of pods if the value of the metric exceeds the assumed value.

spring-boot-autoscaler-on-kubernetes-arch

Run Prometheus on Kubernetes

Let’s start by running the Prometheus instance on Kubernetes. In order to do that, you should use the official Prometheus Helm chart. Firstly, you need to add the required repository.

$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm repo add stable https://charts.helm.sh/stable
$ helm repo update

Then, you should execute the Helm install command and provide the name of your installation.

$ helm install prometheus prometheus-community/prometheus

In a moment, the Prometheus instance is ready to use. You can access it through the Kubernetes Service prometheus-server. It is available on port 443. By default, the type of service is ClusterIP. Therefore, you should execute the kubectl port-forward command to access it on the local port.

Deploy Spring Boot on Kubernetes

In order to enable Prometheus support in Spring Boot, you need to include Spring Boot Actuator and the Micrometer Prometheus library. The full list of required dependencies also contains the Spring Web and Spring Data MongoDB modules.

<dependency>
   <groupId>org.springframework.boot</groupId>
   <artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
   <groupId>org.springframework.boot</groupId>
   <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
   <groupId>io.micrometer</groupId>
   <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
<dependency>
   <groupId>org.springframework.boot</groupId>
   <artifactId>spring-boot-devtools</artifactId>
</dependency>
<dependency>
   <groupId>org.springframework.boot</groupId>
   <artifactId>spring-boot-starter-data-mongodb</artifactId>
</dependency>

After enabling Prometheus support, the application exposes metrics through the /actuator/prometheus endpoint.

Our example application is very simple. It exposes the REST API for CRUD operations and connects to the Mongo database. Just to clarify, here’s the REST controller implementation.

@RestController
@RequestMapping("/persons")
public class PersonController {

   private PersonRepository repository;
   private PersonService service;

   PersonController(PersonRepository repository, PersonService service) {
      this.repository = repository;
      this.service = service;
   }

   @PostMapping
   public Person add(@RequestBody Person person) {
      return repository.save(person);
   }

   @PutMapping
   public Person update(@RequestBody Person person) {
      return repository.save(person);
   }

   @DeleteMapping("/{id}")
   public void delete(@PathVariable("id") String id) {
      repository.deleteById(id);
   }

   @GetMapping
   public Iterable<Person> findAll() {
      return repository.findAll();
   }

   @GetMapping("/{id}")
   public Optional<Person> findById(@PathVariable("id") String id) {
      return repository.findById(id);
   }

}

You may add a new person, modify and delete it through the HTTP API. You can also find it by id or just find all available persons. The Spring Boot Actuator generates HTTP traffic statistics per each endpoint and exposes it in the form readable by Prometheus. The number of incoming requests is available in the http_server_requests_seconds_count metric. Consequently, we will use this metric for Spring Boot autoscaling on Kubernetes.

Prometheus collects a pretty large set of metrics from the whole cluster. However, by default, it is not gathering the logs from applications. In order to force Prometheus to scrape the particular pods, you must add annotations to the Deployment as shown below. The annotation prometheus.io/path indicates the context path with the metrics endpoint. Of course, you have to enable scraping for the application using the annotation prometheus.io/scrape. Finally, you need to set the number of HTTP port with prometheus.io/port.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-spring-boot-on-kubernetes-deployment
spec:
  selector:
    matchLabels:
      app: sample-spring-boot-on-kubernetes
  template:
    metadata:
      annotations:
        prometheus.io/path: /actuator/prometheus
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
      labels:
        app: sample-spring-boot-on-kubernetes
    spec:
      containers:
      - name: sample-spring-boot-on-kubernetes
        image: piomin/sample-spring-boot-on-kubernetes
        ports:
        - containerPort: 8080
        env:
          - name: MONGO_DATABASE
            valueFrom:
              configMapKeyRef:
                name: mongodb
                key: database-name
          - name: MONGO_USERNAME
            valueFrom:
              secretKeyRef:
                name: mongodb
                key: database-user
          - name: MONGO_PASSWORD
            valueFrom:
              secretKeyRef:
                name: mongodb
                key: database-password

Install Prometheus Adapter on Kubernetes

The Prometheus adapter pulls metrics from the Prometheus instance and exposes them as the Custom Metrics API. In this step, you will have to provide configuration for pulling a custom metric exposed by the Spring Boot Actuator. The http_server_requests_seconds_count metric contains a number of requests received by the particular HTTP endpoint. To clarify, let’s take a look at the list of http_server_requests_seconds_count metrics for the multiple /persons endpoints.

You need to override some configuration settings for the Prometheus adapter. Firstly, you should change the default address of the Prometheus instance. Since the name of the Prometheus Service is prometheus-server, you should change it to prometheus-server.default.svc. The number of HTTP port is 80. Then, you have to define a custom rule for pulling the required metric from Prometheus. It is important to override the name of the Kubernetes pod and namespace used as a metric tag by Prometheus. There are multiple entries for http_server_requests_seconds_count, so you must calculate the sum. The name of the custom Kubernetes metric is http_server_requests_seconds_count_sum.

prometheus:
  url: http://prometheus-server.default.svc
  port: 80
  path: ""

rules:
  default: true
  custom:
  - seriesQuery: '{__name__=~"^http_server_requests_seconds_.*"}'
    resources:
      overrides:
        kubernetes_namespace:
          resource: namespace
        kubernetes_pod_name:
          resource: pod
    name:
      matches: "^http_server_requests_seconds_count(.*)"
      as: "http_server_requests_seconds_count_sum"
    metricsQuery: sum(<<.Series>>{<<.LabelMatchers>>,uri=~"/persons.*"}) by (<<.GroupBy>>)

Now, you just need to execute the Helm install command with the location of the YAML configuration file.

$ helm install -f k8s\helm-config.yaml prometheus-adapter prometheus-community/prometheus-adapter

Finally, you can verify if metrics have been successfully pulled by executing the following command.

Create Kubernetes Horizontal Pod Autoscaler

In the last step of this tutorial, you will create a Kubernetes HorizontalPodAutoscaler. HorizontalPodAutoscaler automatically scales up the number of pods if the average value of the http_server_requests_seconds_count_sum metric exceeds 100. In other words, if your instance of application receives more than 100 requests, HPA automatically runs a new instance. Then, after sending another 100 requests, an average value of metric exceeds 100 once again. So, HPA runs a third instance of the application. Consequently, after sending 1k requests you should have 10 pods. The definition of our HorizontalPodAutoscaler is visible below.

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: sample-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sample-spring-boot-on-kubernetes-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_server_requests_seconds_count_sum
        target:
          type: AverageValue
          averageValue: 100

Testing Kubernetes autoscaling with Spring Boot

After deploying all the required components you may verify a list of running pods. As you see, there is a single instance of our Spring Boot application sample-spring-boot-on-kubernetes.

In order to check the current value of the http_server_requests_seconds_count_sum metric you can display the details about HorizontalPodAutoscaler. As you see I have already sent 15 requests to the different HTTP endpoints.

spring-boot-autoscaling-on-kubernetes-hpa

Here’s the sequence of requests you may send to the application to test autoscaling behavior.

$ curl http://localhost:8080/persons -d "{\"firstName\":\"Test\",\"lastName\":\"Test\",\"age\":20,\"gender\":\"MALE\"}" -H "Content-Type: application/
json"
{"id":"5fa334d149685f24841605a9","firstName":"Test","lastName":"Test","age":20,"gender":"MALE"}
$ curl http://localhost:8080/persons/5fa334d149685f24841605a9
{"id":"5fa334d149685f24841605a9","firstName":"Test","lastName":"Test","age":20,"gender":"MALE"}
$ curl http://localhost:8080/persons
[{"id":"5fa334d149685f24841605a9","firstName":"Test","lastName":"Test","age":20,"gender":"MALE"}]
$ curl -X DELETE http://localhost:8080/persons/5fa334d149685f24841605a9

After sending many HTTP requests to our application, you may verify the number of running pods. In that case, we have 5 instances.

spring-boot-autoscaling-on-kubernetes-hpa-finish

You can also display a list of running pods.

Conclusion

In this article, I showed you a simple scenario of Kubernetes autoscaling with Spring Boot based on the number of incoming requests. You may easily create more advanced scenarios just by modifying a metric query in the Prometheus Adapter configuration file. I run all my tests on Google Cloud with GKE. For more information about running JVM applications on GKE please refer to Running Kotlin Microservice on Goggle Kubernetes Engine.

The post Spring Boot Autoscaling on Kubernetes appeared first on Piotr's TechBlog.

]]>
https://piotrminkowski.com/2020/11/05/spring-boot-autoscaling-on-kubernetes/feed/ 4 9051