RestTemplate Archives - Piotr's TechBlog

Circuit breaker and retries on Kubernetes with Istio and Spring Boot

piotr.minkowski — Wed, 03 Jun 2020 07:29:41 +0000

An ability to handle communication failures in an inter-service communication is an absolute necessity for every single service mesh framework. It includes handling of timeouts and HTTP error codes. In this article I’m going to show how to configure retry and circuit breaker mechanisms using Istio. The same as for the previous article about Istio Service mesh on Kubernetes with Istio and Spring Boot we will analyze a communication between two simple Spring Boot applications deployed on Kubernetes. But instead of very basic example we are going to discuss more advanced topics.

Example

For demonstrating usage of Istio and Spring Boot I created a repository on GitHub with two sample applications: callme-service and caller-service. The address of this repository is https://github.com/piomin/sample-istio-services.git. The same repository has been for the first article about service mesh with Istio already mentioned in the preface.

Architecture

The architecture of our sample system is pretty similar to those in the previous article. However, there are some differences. We are not injecting a fault or delay using Istio components, but directly on the application inside the source code. Why? Now, we will be able to handle directly the rules created for callme-service, not on the client side as before. Also we are running two instances of version v2 of callme-service application to test how circuit breaker works for more than instances of the same service (or rather the same Deployment). The following picture illustrates the currently described architecture.

Spring Boot applications

We are starting from an implementation of the sample applications. The application callme-service is exposing two endpoints that return information about version and instance id. The endpoint GET /ping-with-random-error sets HTTP 504 error code as a response for ~50% of requests. The endpoint GET /ping-with-random-delay returns response with random delay between 0s and 3s. Here’s the implementation of @RestController on the callme-service side.

@RestController
@RequestMapping("/callme")
public class CallmeController {

    private static final Logger LOGGER = LoggerFactory.getLogger(CallmeController.class);
    private static final String INSTANCE_ID = UUID.randomUUID().toString();
    private Random random = new Random();

    @Autowired
    BuildProperties buildProperties;
    @Value("${VERSION}")
    private String version;

    @GetMapping("/ping-with-random-error")
    public ResponseEntity pingWithRandomError() {
        int r = random.nextInt(100);
        if (r % 2 == 0) {
            LOGGER.info("Ping with random error: name={}, version={}, random={}, httpCode={}",
                    buildProperties.getName(), version, r, HttpStatus.GATEWAY_TIMEOUT);
            return new ResponseEntity<>("Surprise " + INSTANCE_ID + " " + version, HttpStatus.GATEWAY_TIMEOUT);
        } else {
            LOGGER.info("Ping with random error: name={}, version={}, random={}, httpCode={}",
                    buildProperties.getName(), version, r, HttpStatus.OK);
            return new ResponseEntity<>("I'm callme-service" + INSTANCE_ID + " " + version, HttpStatus.OK);
        }
    }

    @GetMapping("/ping-with-random-delay")
    public String pingWithRandomDelay() throws InterruptedException {
        int r = new Random().nextInt(3000);
        LOGGER.info("Ping with random delay: name={}, version={}, delay={}", buildProperties.getName(), version, r);
        Thread.sleep(r);
        return "I'm callme-service " + version;
    }

}

The application caller-service is also exposing two GET endpoints. It is using RestTemplate to call the corresponding GET endpoints exposed by callme-service. It also returns the version of caller-service, but there is only a single Deployment of that application labeled with version=v1.

@RestController
@RequestMapping("/caller")
public class CallerController {

    private static final Logger LOGGER = LoggerFactory.getLogger(CallerController.class);

    @Autowired
    BuildProperties buildProperties;
    @Autowired
    RestTemplate restTemplate;
    @Value("${VERSION}")
    private String version;


    @GetMapping("/ping-with-random-error")
    public ResponseEntity pingWithRandomError() {
        LOGGER.info("Ping with random error: name={}, version={}", buildProperties.getName(), version);
        ResponseEntity responseEntity =
                restTemplate.getForEntity("http://callme-service:8080/callme/ping-with-random-error", String.class);
        LOGGER.info("Calling: responseCode={}, response={}", responseEntity.getStatusCode(), responseEntity.getBody());
        return new ResponseEntity<>("I'm caller-service " + version + ". Calling... " + responseEntity.getBody(), responseEntity.getStatusCode());
    }

    @GetMapping("/ping-with-random-delay")
    public String pingWithRandomDelay() {
        LOGGER.info("Ping with random delay: name={}, version={}", buildProperties.getName(), version);
        String response = restTemplate.getForObject("http://callme-service:8080/callme/ping-with-random-delay", String.class);
        LOGGER.info("Calling: response={}", response);
        return "I'm caller-service " + version + ". Calling... " + response;
    }

}

Handling retries in Istio

The definition of Istio DestinationRule is the same as before in my article Service mesh on Kubernetes with Istio and Spring Boot. There two subsets created for instances labeled with version=v1 and version=v2. Retries and timeout may be configured on VirtualService. We may set the number of retries and the conditions under which retry takes place (a list of enum strings). The following configuration is also setting 3s timeout for the whole request. Both these settings are available inside HTTPRoute object. We also need to set a timeout per single attempt. In that case I set 1s. How does it work in practice? We will analyze it in a simple example.

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: callme-service-destination
spec:
  host: callme-service
  subsets:
    - name: v1
      labels:
        version: v1
    - name: v2
      labels:
        version: v2
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: callme-service-route
spec:
  hosts:
    - callme-service
  http:
    - route:
      - destination:
          host: callme-service
          subset: v2
        weight: 80
      - destination:
          host: callme-service
          subset: v1
        weight: 20
      retries:
        attempts: 3
        perTryTimeout: 1s
        retryOn: 5xx
      timeout: 3s

Before deploying sample applications we should increase a level of logging. We may easily enable Istio access logging. Thanks to that Envoy proxies print access logs with all incoming requests and outgoing responses to their standard output. Analyze of logging entries will be especially usable for detecting retry attempts.

$ istioctl manifest apply --set profile=default --set meshConfig.accessLogFile="/dev/stdout"

Now, let’s send a test request to the HTTP endpoint GET /caller/ping-with-random-delay. It calls the randomly delayed callme-service endpoint GET /callme/ping-with-random-delay. Here’s the request and response for that operation.

Seemingly it’s a very clear situation. But let’s check out what is happening under the hood. I have highlighted the sequence of retries. As you see Istio has performed two retries, since the first two attempts were longer than perTryTimoeut which has been set to 1s. Both two attempts were timeout by Istio, which can be verified in its access logs. The third attempt was successful, since it took around 400ms.

Retrying on timeout is not the only available option of retrying in Istio. In fact, we may retry all 5XX or even 4XX codes. The VirtualService for testing just error codes is much simpler, since we don’t have to configure any timeouts.

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: callme-service-route
spec:
  hosts:
    - callme-service
  http:
    - route:
      - destination:
          host: callme-service
          subset: v2
        weight: 80
      - destination:
          host: callme-service
          subset: v1
        weight: 20
      retries:
        attempts: 3
        retryOn: gateway-error,connect-failure,refused-stream

We are going to call HTTP endpoint with GET /caller/ping-with-random-error, that is calling endpoint GET /callme/ping-with-random-error exposed by callme-service. It is returning HTTP 504 for around 50% of incoming requests. Here’s the request and successful response with 200 OK HTTP code.

Here are the logs, which illustrate what happened on the callme-service side. The requests have been retried 2 times, since the two first attempts result in HTTP error code.

Istio circuit breaker

A circuit breaker is configured on the DestinationRule object. We are using TrafficPolicy for that. First we will not set any retries used for the previous sample, so we need to remove it from VirtualService definition. We should also disable any retries on the connectionPool inside TrafficPolicy. And now the most important. For configuring a circuit breaker in Istio we are using OutlierDetection object. Istio circuit breaker implementation is based on consecutive errors returned by the downstream service. The number of subsequent errors may be configured using properties consecutive5xxErrors or consecutiveGatewayErrors. The only difference between them is in the HTTP errors they are able to handle. While consecutiveGatewayErrors is just for 502, 503 and 504, the consecutive5xxErrors is used for 5XX codes. In the following configuration of callme-service-destination I used set consecutive5xxErrors on 3. It means that after 3 errors in row an instance (pod) of application is removed from load balancing for 1 minute (baseEjectionTime=1m). Because we are running two pods of callme-service in version v2 we also need to override a default value of maxEjectionPercent to 100%. A default value of that property is 10%, and it indicates a maximum % of hosts in the load balancing pool that can be ejected.

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: callme-service-destination
spec:
  host: callme-service
  subsets:
    - name: v1
      labels:
        version: v1
    - name: v2
      labels:
        version: v2
  trafficPolicy:
    connectionPool:
      http:
        http1MaxPendingRequests: 1
        maxRequestsPerConnection: 1
        maxRetries: 0
    outlierDetection:
      consecutive5xxErrors: 3
      interval: 30s
      baseEjectionTime: 1m
      maxEjectionPercent: 100
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: callme-service-route
spec:
  hosts:
    - callme-service
  http:
    - route:
      - destination:
          host: callme-service
          subset: v2
        weight: 80
      - destination:
          host: callme-service
          subset: v1
        weight: 20

The fastest way of deploying both applications is with Jib and Skaffold. First you go to directory callme-service and execute skaffold dev command with optional --port-forward parameter.

$ cd callme-service
$ skaffold dev --port-forward

Then do the same for caller-service.

$ cd caller-service
$ skaffold dev --port-forward

Before sending some test requests let’s run the second instance of v2 version of callme-service, since Deployment sets parameter replicas to 1. To do that we need to run the following command.

$ kubectl scale --replicas=2 deployment/callme-service-v2

Now, let’s verify the status of deployment on Kubernetes. There are 3 deployments. The deployment callme-service-v2 has to running pods.

After that we are ready to send some test requests. We are calling endpoint GET /caller/ping-with-random-error exposed by caller-service, that is calling endpoint GET /callme/ping-with-random-error exposed by callme-service. Endpoint exposed by callme-service returns HTTP 504 for 50% of requests. I have already set port forwarding for callme-service on 8080, so the command used calling application is: curl http://localhost:8080/caller/ping-with-random-error.
Now, let’s analyze responses from caller-service. I have highlighted the responses with HTTP 504 error code from instance of callme-service with version v2 and generated id 98c068bb-8d02-4d2a-9999-23951bbed6ad. After 3 error responses in row from that instance, it is immediately removed from load balancing pool, what results in sending all other requests to the second instance of callme-service v2 having id 00653617-58e1-4d59-9e36-3f98f9d403b8. Of course there is still available a single instance of callme-service v1, that is receiving 20% of total requests send by caller-service.

Ok, let’s check what will happen if a single instance callme-service v1 returns 3 errors in row. I have also highlighted those error responses in the picture with logs visible below. Because there is only one instance of callme-service v1 in the pool, there is no chance to redirect an incoming traffic to other instances. That’s why Istio is returning HTTP 503 for the next request sent to callme-service v1. The same response is returned within 1 next minute since the circuit is open.

The post Circuit breaker and retries on Kubernetes with Istio and Spring Boot appeared first on Piotr's TechBlog.

A Deep Dive Into Spring Cloud Load Balancer

piotr.minkowski — Wed, 13 May 2020 08:07:35 +0000

Spring Cloud is currently on the verge of large changes. I have been writing about it in my previous article A New Era of Spring Cloud. While almost all of Spring Cloud Netflix components will be removed in the next release, it seems that the biggest change is a replacement of Ribbon client into Spring Cloud Load Balancer.
Currently, there are not many articles about Spring Cloud Load Balancer online. In fact, this component is still under active development, so we could expect some new features in the near future. Netflix Ribbon client is a stable solution, but unfortunately not developed anymore. However, it is still used as a default load balancer in all Spring Cloud projects, and has many interesting features like integration with circuit breaker or load balancing according to an average response time from service instances. Currently, such features are not available for Spring Cloud Load Balancer, but we can create some custom code to implement them. In this article I’m going to show you how to use spring-cloud-loadbalancer module with RestTemplate for communication between applications, how to implement custom load balancer basing on average response time, and finally how to provide static list of service addresses.

If you are interested in more detailed explanation of Spring Cloud components used for inter-service communication you should refer to the third part of my online course Microservices With Spring Boot And Spring Cloud: Part 3 – Inter-service communication.

Example

You can find a source code snippets related to this article in my GitHub repository https://github.com/piomin/course-spring-microservices.git. That repository is also used for my online course, so I decided to extend it with the new examples. All the required changes were performed in directory inter-communication/inter-caller-service inside that repository. The code is written in Kotlin.
There are three applications, which are a part of our sample system: discovery-server (Spring Cloud Netflix Eureka), inter-callme-service (Spring Boot application that expose REST API), and finally inter-caller-service (Spring Boot application that calls endpoints exposed by inter-callme-service).

How to start with Spring Cloud Load Balancer

To enable Spring Cloud Load Balancer for our application we first need to include the following starter to Maven dependencies. That module may be also included together with some other Spring Cloud starters.


	org.springframework.cloud
	spring-cloud-starter-loadbalancer

Because Ribbon is still used as a default client-side load balancer for REST-based communication between applications we need to disable it in application properties. Here’s a fragment of application.yml file.


spring:
  application:
    name: inter-caller-service
  cloud:
    loadbalancer:
      ribbon:
        enabled: false

For discovery integration we also need to include spring-cloud-starter-netflix-eureka-client. To use RestTemplate with a client-side load balancer we should define the bean visible below and annotate it with @LoadBalanced. As you on the code below I’m also setting interceptor on RestTemplate, but more about it in the next section.

@Bean
@LoadBalanced
fun template(): RestTemplate = RestTemplateBuilder()
		.interceptors(responseTimeInterceptor())
		.build()

Adapt traffic to average response time

Spring Cloud Load Balancer provides a simple round robin rule for load balancing between multiple instances of a single service. Our goal here is to implement a rule, which measures each application response time and gives a weight according to that time. The longer the response time, the less weight it will get. The rule should randomly pick an instance where the possibility is determined by its weight. To record response time of each call we need to set an already mentioned interceptor that implements ClientHttpRequestInterceptor. Interceptor is executed on every request (1). Since the implementation is very typical, one line requires explanation (2). I’m getting the address of the target application from a thread scoped variable existing in Slf4J MDC. Of course I could also implement a simple thread scoped context based on ThreadLocal, but MDC is used here just for simplification.

class ResponseTimeInterceptor(private val responseTimeHistory: ResponseTimeHistory) : ClientHttpRequestInterceptor {

    private val logger: Logger = LoggerFactory.getLogger(ResponseTimeInterceptor::class.java)

    override fun intercept(request: HttpRequest, array: ByteArray,
                           execution: ClientHttpRequestExecution): ClientHttpResponse {
        val startTime: Long = System.currentTimeMillis()
        val response: ClientHttpResponse = execution.execute(request, array) // 1
        val endTime: Long = System.currentTimeMillis()
        val responseTime: Long = endTime - startTime
        logger.info("Response time: instance->{}, time->{}", MDC.get("address"), responseTime)
        responseTimeHistory.addNewMeasure(MDC.get("address"), responseTime) // 2
        return response
    }
}

Of course, counting an average response time is just a part of our job. The most important is the implementation of our custom load balancer, which is visible below. It should implement interface ReactorServiceInstanceLoadBalancer. It need to inject ServiceInstanceListSupplier bean to fetch a list of available instances of a given service in overridden method choose. While choosing the right instance we are analyzing the average response time for each instance saved in ResponseTimeHistory by ResponseTimeInterceptor. In the beginning our load balancer acts like a simple round robin.

class WeightedTimeResponseLoadBalancer(
        private val serviceInstanceListSupplierProvider: ObjectProvider,
        private val serviceId: String,
        private val responseTimeHistory: ResponseTimeHistory) : ReactorServiceInstanceLoadBalancer {

    private val logger: Logger = LoggerFactory.getLogger(WeightedTimeResponseLoadBalancer::class.java)
    private val position: AtomicInteger = AtomicInteger()

    override fun choose(request: Request<*>?): Mono> {
        val supplier: ServiceInstanceListSupplier = serviceInstanceListSupplierProvider
                .getIfAvailable { NoopServiceInstanceListSupplier() }
        return supplier.get().next()
                .map { serviceInstances: List -> getInstanceResponse(serviceInstances) }
    }

    private fun getInstanceResponse(instances: List): Response {
        return if (instances.isEmpty()) {
            EmptyResponse()
        } else {
            val address: String? = responseTimeHistory.getAddress(instances.size)
            val pos: Int = position.incrementAndGet()
            var instance: ServiceInstance = instances[pos % instances.size]
            if (address != null) {
                val found: ServiceInstance? = instances.find { "${it.host}:${it.port}" == address }
                if (found != null)
                    instance = found
            }
            logger.info("Current instance: [address->{}:{}, stats->{}ms]", instance.host, instance.port,
                    responseTimeHistory.stats["${instance.host}:${instance.port}"])
            MDC.put("address", "${instance.host}:${instance.port}")
            DefaultResponse(instance)
        }
    }
}

Here’s the implementation of ResponseTimeHistory bean, which is responsible for storing measures and selecting the instance of service based on computed weight.

class ResponseTimeHistory(private val history: MutableMap> = mutableMapOf(),
                          val stats: MutableMap = mutableMapOf()) {

    private val logger: Logger = LoggerFactory.getLogger(ResponseTimeHistory::class.java)

    fun addNewMeasure(address: String, measure: Long) {
        var list: Queue? = history[address]
        if (list == null) {
            history[address] = LinkedList()
            list = history[address]
        }
        logger.info("Adding new measure for->{}, measure->{}", address, measure)
        if (measure == 0L)
            list!!.add(1L)
        else list!!.add(measure)
        if (list.size > 9)
            list.remove()
        stats[address] = countAvg(address)
        logger.info("Counting avg for->{}, stat->{}", address, stats[address])
    }

    private fun countAvg(address: String): Long {
        val list: Queue? = history[address]
        return list?.sum()?.div(list.size) ?: 0
    }

    fun getAddress(numberOfInstances: Int): String? {
        if (stats.size < numberOfInstances)
            return null
        var sum: Long = 0
        stats.forEach { sum += it.value }
        var r: Long = Random.nextLong(100)
        var current: Long = 0
        stats.forEach {
            val weight: Long = (sum - it.value)*100 / sum
            logger.info("Weight for->{}, value->{}, random->{}", it.key, weight, r)
            current += weight
            if (r <= current)
                return it.key
        }
        return null
    }

}

Customizing Spring Cloud Load Balancer

The implementation of our mechanism for weighted response time rule is ready, so the last step is to apply it to Spring Cloud Load Balancer. To do that we need to create a dedicated configuration class with ReactorLoadBalancer bean declaration as shown below.

class CustomCallmeClientLoadBalancerConfiguration(private val responseTimeHistory: ResponseTimeHistory) {

    @Bean
    fun loadBalancer(environment: Environment, loadBalancerClientFactory: LoadBalancerClientFactory):
            ReactorLoadBalancer {
        val name: String? = environment.getProperty("loadbalancer.client.name")
        return WeightedTimeResponseLoadBalancer(
                loadBalancerClientFactory.getLazyProvider(name, ServiceInstanceListSupplier::class.java),
                name!!, responseTimeHistory)
    }
}

The custom configuration may be passed to a load balancer using annotation @LoadBalancerClient. The name of client should be the same as registered in discovery. This part of code is currently commented out in the GitHub repository, so if you would like to enable it for testing just uncomment it.

@SpringBootApplication
@LoadBalancerClient(value = "inter-callme-service", configuration = [CustomCallmeClientLoadBalancerConfiguration::class])
class InterCallerServiceApplication {

    @Bean
    fun responseTimeHistory(): ResponseTimeHistory = ResponseTimeHistory()

    @Bean
    fun responseTimeInterceptor(): ResponseTimeInterceptor = ResponseTimeInterceptor(responseTimeHistory())

    // THE REST OF IMPLEMENTATION...
}

Customizing instance list supplier

Currently Spring Cloud Load Balancer does not support a static list of instances set in configuration properties (unlike Netflix Ribbon). We can easily add such a mechanism. The static list of instances for every service will be defined as shown below.

spring:
  application:
    name: inter-caller-service
  cloud:
    loadbalancer:
      ribbon:
        enabled: false
      instances:
        - name: inter-callme-service
          servers: localhost:59600, localhost:59800

As the first step, we should define a class that implements interface ServiceInstanceListSupplier and overrides two methods: getServiceId() and get(). The following implementation of ServiceInstanceListSupplier takes the list of service addresses from application properties through @ConfigurationProperties.

class StaticServiceInstanceListSupplier(private val properties: LoadBalancerConfigurationProperties,
                                        private val environment: Environment) : ServiceInstanceListSupplier {

    override fun getServiceId(): String = environment.getProperty("loadbalancer.client.name")!!

    override fun get(): Flux> {
        val serviceConfig: LoadBalancerConfigurationProperties.ServiceConfig? =
                properties.instances.find { it.name == serviceId }
        val list: MutableList =
                serviceConfig!!.servers.split(",", ignoreCase = false, limit = 0)
                        .map { StaticServiceInstance(serviceId, it) }.toMutableList()
        return Flux.just(list)
    }

}

Here’s the implementation of configuration class with properties.

@Configuration
@ConfigurationProperties("spring.cloud.loadbalancer")
class LoadBalancerConfigurationProperties {

    val instances: MutableList = mutableListOf()

    class ServiceConfig {
        var name: String = ""
        var servers: String = ""
    }

}

The same as for the previous sample we should also register our implementation of ServiceInstanceListSupplier as a bean inside custom configuration class.

class CustomCallmeClientLoadBalancerConfiguration) {

    @Bean
    fun discoveryClientServiceInstanceListSupplier(discoveryClient: ReactiveDiscoveryClient, environment: Environment,
        zoneConfig: LoadBalancerZoneConfig, context: ApplicationContext,
        properties: LoadBalancerConfigurationProperties): ServiceInstanceListSupplier {
        val delegate = StaticServiceInstanceListSupplier(properties, environment)
        val cacheManagerProvider = context.getBeanProvider(LoadBalancerCacheManager::class.java)
        return if (cacheManagerProvider.ifAvailable != null) {
            CachingServiceInstanceListSupplier(delegate, cacheManagerProvider.ifAvailable)
        } else delegate
    }
}

Testing Spring Cloud Load Balancer

To test the solution implemented for the purpose of this article you should:

Run the instance of discovery server (only if StaticServiceInstanceListSupplier is disabled)
Run two instances of inter-callme-service (for one selected instance activate random delay using VM parameter -Dspring.profiles.active=delay)
Run instance of inter-caller-service, which is available on port 8080
Send some test requests to inter-caller-service using command, for example curl -X POST http://localhost:8080/caller/random-send/12345

Our test scenario is visualized in the following picture.

Conclusion

Currently, Spring Cloud Load Balancer does not offer such many interesting features for inter-service communication as the Netflix Ribbon client. Of course, it is still being actively developed by the Spring Team. The good news is that we can easily customize Spring Cloud Load Balancer to add some custom features. In this article I demonstrated how to provide more advanced load balancing algorithms or create custom instances of list suppliers.

The post A Deep Dive Into Spring Cloud Load Balancer appeared first on Piotr's TechBlog.