spring-ai Archives - Piotr's TechBlog

Spring AI with External MCP Servers

piotr.minkowski — Fri, 06 Feb 2026 10:00:53 +0000

This article explains how to integrate Spring AI with external MCP servers that provide APIs for popular tools such as GitHub and SonarQube. Spring AI provides built-in support for MCP clients and servers. In this article, we will use only the Spring MCP client. If you are interested in more details on building MCP servers, please refer to the following post on my blog. MCP has recently become very popular, and you can easily find an MCP server implementation for almost any existing technology.

You can actually run MCP servers in many different ways. Ultimately, they are just ordinary applications whose task is to make a given tool available via an API compatible with the MCP protocol. The most popular AI IDE tools, such as Cloud Code, Codex, and Cursor, make it easy to run any MCP server. I will take a slightly unusual approach and use the support provided with Docker Desktop, namely the MCP Toolkit.

My idea for today is to build a simple Spring AI application that communicates with MCP servers for GitHub, SonarQube, and CircleCI to retrieve information about my repositories and projects hosted on those platforms. The Docker MCP Toolkit provides a single gateway that distributes incoming requests among running MCP servers. Let’s see how it works in practice!

Source Code

Feel free to use my source code if you’d like to try it out yourself. To do that, you must clone my sample GitHub repository. Then you should only follow my instructions. This repository contains several sample applications. The correct application for this article is in the spring-ai-mcp/external-mcp-sample-client directory.

Getting Started with Docker MCP Toolkit

First, run your Docker Desktop. You can find more than 300 popular MCP servers to run in the “Catalog” bookmark. Next, you should search for SonarQube, CircleCI, and GitHub Official servers (note that there are additional GitHub servers). To be honest, I encountered unexpected issues running the CircleCI server, so for now, I based the application on MCP communication with GitHub and SonarCloud.

Each MCP server usually requires configuration, such as your authorization token or service address. Therefore, before adding a server to Docker Toolkit, you must first configure it as described below. Only then should you click the “Add MCP server” button.

For the GitHub MCP server, in addition to entering the token itself, you must also authorize it via OAuth. Here, too, the MCP Toolkit provides graphical support. After entering the token, go to the “OAuth” tab to complete the process.

This is what your final result should look like before moving on to implementing the Spring Boot application. You have added two MCP servers, which together offer 65 tools.

To make both MCP servers available outside of Docker, you need to run the Docker MCP gateway. In the default stdio mode, the API is not exposed outside Docker. Therefore, you need to change the mode to streaming using the transport parameter, as shown below. The gateway is exposed on port 8811.

docker mcp gateway run --port 8811 --transport streaming

docker mcp gateway run --port 8811 --transport streaming

ShellSession

This is what it looks like after launch. Additionally, the Docker MCP gateway is secured by an API token. This will require appropriate settings on the MCP client side in the Spring AI application.

Integrate Spring AI with External MCP Clients

Prepare the MCP Client with Spring AI

Let’s move on to implementing our sample application. We need to include the Spring AI MCP client and the library that communicates with the LLM model. For me, it’s OpenAI, but you can use many other options available through Spring AI’s integration with popular chat models.

<dependencies>
  <dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
  </dependency>
  <dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-mcp-client-webflux</artifactId>
  </dependency>
  <dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-openai</artifactId>
  </dependency>
</dependencies>


  
    org.springframework.boot
    spring-boot-starter-web
  
  
    org.springframework.ai
    spring-ai-starter-mcp-client-webflux
  
  
    org.springframework.ai
    spring-ai-starter-model-openai

XML

Our MCP client must authenticate itself to the Docker MCP gateway using an API token. Therefore, we need to modify the Spring WebClient used by Spring AI to communicate with MCP servers. It is best to use the ExchangeFilterFunction interface to create an HTTP filter that adds the appropriate Authorization header with the bearer token to the outgoing request. The token will be injected from the application properties.

@Component
public class McpSyncClientExchangeFilterFunction implements ExchangeFilterFunction {

    @Value("${mcp.token}")
    private String token;

    @Override
    public Mono<ClientResponse> filter(ClientRequest request, 
                                       ExchangeFunction next) {

            var requestWithToken = ClientRequest.from(request)
                    .headers(headers -> headers.setBearerAuth(token))
                    .build();
            return next.exchange(requestWithToken);

    }

}

@Component
public class McpSyncClientExchangeFilterFunction implements ExchangeFilterFunction {

    @Value("${mcp.token}")
    private String token;

    @Override
    public Mono<ClientResponse> filter(ClientRequest request, 
                                       ExchangeFunction next) {

            var requestWithToken = ClientRequest.from(request)
                    .headers(headers -> headers.setBearerAuth(token))
                    .build();
            return next.exchange(requestWithToken);

    }

}

Java

Then, let’s set the previously implemented filter for the default WebClient builder.

@SpringBootApplication
public class ExternalMcpSampleClient {

    public static void main(String[] args) {
        SpringApplication.run(ExternalMcpSampleClient.class, args);
    }

    @Bean
    WebClient.Builder webClientBuilder(McpSyncClientExchangeFilterFunction filterFunction) {
        return WebClient.builder()
                .filter(filterFunction);
    }
}

@SpringBootApplication
public class ExternalMcpSampleClient {

    public static void main(String[] args) {
        SpringApplication.run(ExternalMcpSampleClient.class, args);
    }

    @Bean
    WebClient.Builder webClientBuilder(McpSyncClientExchangeFilterFunction filterFunction) {
        return WebClient.builder()
                .filter(filterFunction);
    }
}

Java

After that, we must configure the MCP gateway address and token in the application properties. To achieve that, we must use the spring.ai.mcp.client.streamable-http.connections property. The MCP gateway listens on port 8811. The token value will be read from the MCP_TOKEN environment variable.

spring.ai.mcp.client.streamable-http.connections:
  docker-mcp-gateway:
    url: http://localhost:8811

mcp.token: ${MCP_TOKEN}

spring.ai.mcp.client.streamable-http.connections:
  docker-mcp-gateway:
    url: http://localhost:8811

mcp.token: ${MCP_TOKEN}

YAML

Implement Application Logic with Spring AI and OpenAI Support

The concept behind the sample application is quite simple. It involves creating a @RestController per tool provided by each MCP server. For each, I will create a simple prompt to request the number of repositories or projects in my account on a given platform. Let’s start with SonCloud. Each implementation uses the Spring AI ToolCallbackProvider bean to enable the available MCP server to communicate with the LLM model.

@RestController
@RequestMapping("/sonarcloud")
public class SonarCloudController {

    private final static Logger LOG = LoggerFactory
        .getLogger(SonarCloudController.class);
    private final ChatClient chatClient;

    public SonarCloudController(ChatClient.Builder chatClientBuilder,
                                ToolCallbackProvider tools) {
        this.chatClient = chatClientBuilder
                .defaultToolCallbacks(tools)
                .build();
    }

    @GetMapping("/count")
    String countRepositories() {
        PromptTemplate pt = new PromptTemplate("""
                How many projects in Sonarcloud do I have ?
                """);
        Prompt p = pt.create();
        return this.chatClient.prompt(p)
                .call()
                .content();
    }

}

@RestController
@RequestMapping("/sonarcloud")
public class SonarCloudController {

    private final static Logger LOG = LoggerFactory
        .getLogger(SonarCloudController.class);
    private final ChatClient chatClient;

    public SonarCloudController(ChatClient.Builder chatClientBuilder,
                                ToolCallbackProvider tools) {
        this.chatClient = chatClientBuilder
                .defaultToolCallbacks(tools)
                .build();
    }

    @GetMapping("/count")
    String countRepositories() {
        PromptTemplate pt = new PromptTemplate("""
                How many projects in Sonarcloud do I have ?
                """);
        Prompt p = pt.create();
        return this.chatClient.prompt(p)
                .call()
                .content();
    }

}

Java

Below is a very similar implementation for GitHub MCP. This controller is exposed under the /github context path.

@RestController
@RequestMapping("/github")
public class GitHubController {

    private final static Logger LOG = LoggerFactory
        .getLogger(GitHubController.class);
    private final ChatClient chatClient;

    public GitHubController(ChatClient.Builder chatClientBuilder,
                            ToolCallbackProvider tools) {
        this.chatClient = chatClientBuilder
                .defaultToolCallbacks(tools)
                .build();
    }

    @GetMapping("/count")
    String countRepositories() {
        PromptTemplate pt = new PromptTemplate("""
                How many repositories in GitHub do I have ?
                """);
        Prompt p = pt.create();
        return this.chatClient.prompt(p)
                .call()
                .content();
    }

}

@RestController
@RequestMapping("/github")
public class GitHubController {

    private final static Logger LOG = LoggerFactory
        .getLogger(GitHubController.class);
    private final ChatClient chatClient;

    public GitHubController(ChatClient.Builder chatClientBuilder,
                            ToolCallbackProvider tools) {
        this.chatClient = chatClientBuilder
                .defaultToolCallbacks(tools)
                .build();
    }

    @GetMapping("/count")
    String countRepositories() {
        PromptTemplate pt = new PromptTemplate("""
                How many repositories in GitHub do I have ?
                """);
        Prompt p = pt.create();
        return this.chatClient.prompt(p)
                .call()
                .content();
    }

}

Java

Finally, there is the controller implementation for CircleCI MCP. It is available externally under the /circleci context path.

@RestController
@RequestMapping("/circleci")
public class CircleCIController {

    private final static Logger LOG = LoggerFactory
        .getLogger(CircleCIController.class);
    private final ChatClient chatClient;

    public CircleCIController(ChatClient.Builder chatClientBuilder,
                              ToolCallbackProvider tools) {
        this.chatClient = chatClientBuilder
                .defaultToolCallbacks(tools)
                .build();
    }

    @GetMapping("/count")
    String countRepositories() {
        PromptTemplate pt = new PromptTemplate("""
                How many projects in CircleCI do I have ?
                """);
        Prompt p = pt.create();
        return this.chatClient.prompt(p)
                .call()
                .content();
    }

}

@RestController
@RequestMapping("/circleci")
public class CircleCIController {

    private final static Logger LOG = LoggerFactory
        .getLogger(CircleCIController.class);
    private final ChatClient chatClient;

    public CircleCIController(ChatClient.Builder chatClientBuilder,
                              ToolCallbackProvider tools) {
        this.chatClient = chatClientBuilder
                .defaultToolCallbacks(tools)
                .build();
    }

    @GetMapping("/count")
    String countRepositories() {
        PromptTemplate pt = new PromptTemplate("""
                How many projects in CircleCI do I have ?
                """);
        Prompt p = pt.create();
        return this.chatClient.prompt(p)
                .call()
                .content();
    }

}

Java

The last controller implementation is a bit more complex. First, I need to instruct the LLM model to generate project names in SonarQube and specify my GitHub username. This will not be part of the main prompt. Rather, it will be the system role, which guides the AI’s behavior and response style. Therefore, I’ll create the SystemPromptTemplate first. The user role prompt accepts an input parameter specifying the name of my GitHub repository. The response should combine data on the last commit in a given repository with the status of the most recent SonarQube analysis. In this case, the LLM will need to communicate with two MCP servers running with Docker MCP simultaneously.

@RestController
@RequestMapping("/global")
public class GlobalController {

    private final static Logger LOG = LoggerFactory
        .getLogger(CircleCIController.class);
    private final ChatClient chatClient;

    public GlobalController(ChatClient.Builder chatClientBuilder,
                            ToolCallbackProvider tools) {
        this.chatClient = chatClientBuilder
                .defaultToolCallbacks(tools)
                .build();
    }

    @GetMapping("/status/{repo}")
    String repoStatus(@PathVariable String repo) {
        SystemPromptTemplate st = new SystemPromptTemplate("""
                My username in GitHub is piomin.
                Each my project key in SonarCloud contains the prefix with my organization name and _ char.
                """);
        var stMsg = st.createMessage();

        PromptTemplate pt = new PromptTemplate("""
                When was the last commit made in my GitHub repository {repo} ?
                What is the latest analyze status in SonarCloud for that repo ?
                """);
        var usMsg = pt.createMessage(Map.of("repo", repo));

        Prompt prompt = new Prompt(List.of(usMsg, stMsg));
        return this.chatClient.prompt(prompt)
                .call()
                .content();
    }
}

@RestController
@RequestMapping("/global")
public class GlobalController {

    private final static Logger LOG = LoggerFactory
        .getLogger(CircleCIController.class);
    private final ChatClient chatClient;

    public GlobalController(ChatClient.Builder chatClientBuilder,
                            ToolCallbackProvider tools) {
        this.chatClient = chatClientBuilder
                .defaultToolCallbacks(tools)
                .build();
    }

    @GetMapping("/status/{repo}")
    String repoStatus(@PathVariable String repo) {
        SystemPromptTemplate st = new SystemPromptTemplate("""
                My username in GitHub is piomin.
                Each my project key in SonarCloud contains the prefix with my organization name and _ char.
                """);
        var stMsg = st.createMessage();

        PromptTemplate pt = new PromptTemplate("""
                When was the last commit made in my GitHub repository {repo} ?
                What is the latest analyze status in SonarCloud for that repo ?
                """);
        var usMsg = pt.createMessage(Map.of("repo", repo));

        Prompt prompt = new Prompt(List.of(usMsg, stMsg));
        return this.chatClient.prompt(prompt)
                .call()
                .content();
    }
}

Java

Before running the app, we must set two required environment variables that contain the OpenAI and Docker MCP gateway tokens.

export MCP_TOKEN=by1culxc6sctmycxtyl9xh7499mb8pctbsdb3brha1hvmm4d8l
export SPRING_AI_OPENAI_API_KEY=<YOUR_OPEN_AI_TOKEN>

export MCP_TOKEN=by1culxc6sctmycxtyl9xh7499mb8pctbsdb3brha1hvmm4d8l
export SPRING_AI_OPENAI_API_KEY=

Plaintext

Finally, we can run our Spring Boot app with the following command.

mvn spring-boot:run

mvn spring-boot:run

ShellSession

Firstly, I’m going to ask about the number of my GitHub repositories.

curl http://localhost:8080/github/count

curl http://localhost:8080/github/count

ShellSession

Then, I can check the number of projects in my SonarCloud account.

curl http://localhost:8080/github/sonarcloud

curl http://localhost:8080/github/sonarcloud

ShellSession

Finally, I can choose a specific repository and verify the last commit and the current analysis status in SonarCloud.

curl http://localhost:8080/global/status/sample-spring-boot-kafka

curl http://localhost:8080/global/status/sample-spring-boot-kafka

ShellSession

Here’s the LLM answer for my sample-spring-boot-kafka repository. You can perform the same exercise for your repositories and projects.

Conclusion

Spring AI, combined with the MCP client, opens a powerful path toward building truly tool-aware AI applications. By using the Docker MCP Gateway, we can easily host and manage MCP servers such as GitHub or SonarQube consistently and reproducibly, without tightly coupling them to our application runtime. Docker provides a user-friendly interface for managing MCP servers, giving users access to everything through a single MCP gateway. This approach appears to have advantages, particularly during application development.

The post Spring AI with External MCP Servers appeared first on Piotr's TechBlog.

OpenShift AI with vLLM and Spring AI

piotr.minkowski — Mon, 12 May 2025 06:47:19 +0000

This article will teach you how to use OpenShift AI and vLLM to serve models used by the Spring AI application. To run the model on OpenShift AI, we will use a solution called KServe ModelCar. It can serve models directly from a container without using the S3 bucket. KServe is a standard, cloud-agnostic Model Inference Platform designed to serve predictive and generative AI models on Kubernetes. OpenShift AI includes a single model serving platform based on the KServe component. We can serve models on the single-model serving platform using model-serving runtimes. OpenShift AI includes several preinstalled runtimes. However, only the vLLM runtime is compatible with the OpenAI REST API. Therefore, we will use this one.

Previously, I published several articles about Spring AI with examples of using different AI models. Therefore, I will not focus on the introduction to Spring AI. For example, you can read about integration between Spring AI and Azure AI in the following post. Please refer to the following article for a quick intro to the Spring AI project.

Source Code

Feel free to use my source code if you’d like to try it out yourself. To do that, you must clone my sample GitHub repository. Then you should only follow my instructions.

Prerequisites

Create the OpenShift Cluster

For this exercise, you will need a relatively large OpenShift cluster. At least one of the cluster’s nodes must have a GPU. I created a cluster on AWS with one node on a g4dn.12xlarge machine. On OpenShift, you can achieve this by creating the MachineSet object that creates nodes using the appropriate virtual machine available on AWS.

Install Required Operators

Next, install and configure several operators on the cluster. Begin with the “Node Feature Discovery” operator. On OpenShift, this operator enables automatic discovery of cluster nodes with features such as GPUs. After installing the operator, create the NodeFeatureDiscovery object. The default values set by the OpenShift console during object creation are sufficient.

The operator’s task is to mark the node with the detected GPU using the appropriate label. The label is feature.node.kubernetes.io/pci-10de.present=true. After configuring the operator, verify that the correct GPU has been detected.

$ oc get node -l feature.node.kubernetes.io/pci-10de.present=true
NAME                                        STATUS   ROLES    AGE   VERSION
ip-10-0-45-120.us-east-2.compute.internal   Ready    worker   15d   v1.31.6

ShellSession

Next, install the NVIDIA GPU Operator. This operator automatically installs, configures, and manages NVIDIA drivers and tools on nodes with NVIDIA graphics cards. This allows OpenShift to recognize the GPU as a resource that can be declared in pods. This will enable OpenShift to work with the “Node Feature Discovery” operator to label nodes with GPUs. The NVIDIA GPU operator uses the feature.node.kubernetes.io/pci-10de.present=true label to determine where to install the drivers. For this to happen, the ClusterPolicy object must be created. As before, you can use the default values generated by the OpenShift Console when creating this object.

The OpenShift AI feature for serving AI models requires the installation of OpenShift Serverless and OpenShift Service Mesh operators. The key solution here is KServe. KServe uses Knative to scale models on demand and integrates with Istio to secure model routing and versioning.

The final step in this phase is to install the OpenShift AI Operator and create the DataScienceCluster object. If the previous installations were successful, everything will be configured automatically after creating the DataScienceCluster object. For instance, OpenShift AI will make the Istio control plane and the Knative Serving component.

OpenShift AI creates several namespaces within a cluster. The most important is the redhat-ods-applications namespace, where most components comprising the entire solution are run.

$ oc get pod -n redhat-ods-applications
NAME                                                              READY   STATUS    RESTARTS   AGE
authorino-767bd64465-fq8bl                                        1/1     Running   0          15d
codeflare-operator-manager-5c69778b87-wxcwp                       1/1     Running   0          15d
data-science-pipelines-operator-controller-manager-6686587wcmkr   1/1     Running   0          15d
etcd-549d769449-hqzwt                                             1/1     Running   0          15d
kserve-controller-manager-85f9b8d66f-qpxbf                        1/1     Running   0          15d
kuberay-operator-8d77dcf84-qgsq5                                  1/1     Running   0          15d
kueue-controller-manager-7c895bd669-467nk                         1/1     Running   0          6h8m
modelmesh-controller-7f9dd5f848-ljlxp                             1/1     Running   0          15d
modelmesh-controller-7f9dd5f848-qqsl8                             1/1     Running   0          24d
modelmesh-controller-7f9dd5f848-txlhd                             1/1     Running   0          24d
notebook-controller-deployment-86f5b87585-p6nz5                   1/1     Running   0          15d
odh-model-controller-574ff4657-q75gr                              1/1     Running   0          15d
odh-notebook-controller-manager-9d754d5f-2ptk9                    1/1     Running   0          15d
rhods-dashboard-5b96595667-79tx6                                  2/2     Running   0          15d
rhods-dashboard-5b96595667-8m52g                                  2/2     Running   0          15d
rhods-dashboard-5b96595667-kx7p4                                  2/2     Running   0          15d
rhods-dashboard-5b96595667-nn2cf                                  2/2     Running   0          15d
rhods-dashboard-5b96595667-ttcht                                  2/2     Running   0          15d
trustyai-service-operator-controller-manager-bd9fbdb6d-kcd57      1/1     Running   0          15d

ShellSession

Configure and Use OpenShift AI

After installing OpenShift AI on a cluster, you can use its graphical UI. To access it, select “Red Hat OpenShift AI” from the menu at the top of the page.

After selecting the indicated option, you will be redirected to the following page. This page allows you to configure and use OpenShift AI on a cluster. The first step is to select a namespace on the cluster for the AI project. In my case, the namespace is ai.

To run an AI model on a cluster, choose how to serve it first. You can choose between a single-model serving platform and a multi-model serving platform. With the former, each model is deployed on its model server. Multiple models can be deployed on a single shared server with multi-model platforms. This article will use the first option: a single-model serving platform.

The next step is to create an acceleration profile. This profile should be created automatically after installing and configuring the NVIDIA GPU Operator. If, for some reason, it was not, you can easily create it manually. When creating this object, enter the nvidia.com/gpu value in the identifier field.

You can either click on the profile from the UI or create it using the YAML manifest.

apiVersion: dashboard.opendatahub.io/v1
kind: AcceleratorProfile
metadata:
  name: nvidia
  namespace: redhat-ods-applications
spec:
  displayName: nvidia
  enabled: true
  identifier: nvidia.com/gpu

YAML

Serve Model on OpenShift AI with vLLM

Create ServingRuntime Resource

In the previous step, we configured OpenShift AI to deploy the model with a single-model serving platform and a GPU accelerator. We will use KServe’s ModelCar functionality to deploy the model, which allows us to serve models directly from a container. This functionality is described in an article published on the Red Hat Developer blog. The article demonstrates how to build an image containing a model downloaded from the Hugging Face Hub. In turn, we will use images that have already been built and are available in the quay.io/repository/redhat-ai-services/modelcar-catalog repository. You can find ready-made images for AI models such as Granite and Llama.

To run a model on OpenShift AI in single-model serving runtime mode, you must define two objects: ServingRuntime and InferenceService. According to the OpenShift AI documentation, the ServingRuntime CR creates a serving runtime, an environment for deploying and managing a model. Here’s the ServingRuntime object that creates a runtime for the Llama 3.2 AI model. The annotation opendatahub.io/recommended-accelerators sets the name of the recommended accelerator to use with the runtime. Its value should be identical to the identifier field in the AcceleratorProfile object (1). The openshift.io/display-name annotation keeps the name with which the serving runtime is displayed (2). The spec.containers.image field indicates the runtime container image used by the serving runtime (3). This image differs depending on the type of accelerator used. Finally, the ServingRuntime object specifies that the single-model serving is used (4) and the vLLM model is supported by the runtime (5).

apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  annotations:
    opendatahub.io/recommended-accelerators: '["nvidia.com/gpu"]' # (1)
    openshift.io/display-name: vLLM ServingRuntime for KServe # (2)
  labels:
    opendatahub.io/dashboard: "true"
  name: llama-32-3b-instruct
spec:
  annotations:
    prometheus.io/path: /metrics 
    prometheus.io/port: "8080" 
  containers :
    - args:
        - --port=8080
        - --model=/mnt/models 
        - --served-model-name={{.Name}} 
      command: 
        - python
        - '-m'
        - vllm.entrypoints.openai.api_server
      env:
        - name: HF_HOME
          value: /tmp/hf_home
      # (3)
      image:
quay.io/modh/vllm@sha256:0d55419f3d168fd80868a36ac89815dded9e063937a8409b7edf3529771383f3
    name: kserve-container
    ports:
      - containerPort: 8080
        protocol: TCP
  multiModel: false # (4)
  supportedModelFormats: # (5) 
    - autoSelect: true
      name: vLLM

YAML

Create InterferenceService Resource

The InferenceService CRD creates a server or inference service that processes inference queries, passes them to the model, and returns the inference output. Here’s the InferenceService object related to the previously created llama-32-3b-instruct runtime (1). It must define some vLLM parameters to successfully run the model on the existing infrastructure and enable tool calling support on the Llama 3.2 model (2). The InferenceService object specifies the image containing the Llama 3.2 model, published in the the quay.io/redhat-ai-services/modelcar-catalog:llama-3.2-3b-instruct repository (3). Alternatively, you can create your image, publish it in the custom registry, and run it on OpenShift using InferenceService CR.

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  annotations:
    openshift.io/display-name: llama-32-3b-instruct
    serving.knative.openshift.io/enablePassthrough: 'true'
    serving.kserve.io/deploymentMode: Serverless
    sidecar.istio.io/inject: 'true'
    sidecar.istio.io/rewriteAppHTTPProbers: 'true'
  name: llama-32-3b-instruct # (1)
  labels:
    opendatahub.io/dashboard: 'true'
spec:
  predictor:
    maxReplicas: 1
    minReplicas: 1
    model: # (2)
      args:
        - '--dtype=half'
        - '--max_model_len=8192'
        - '--gpu_memory_utilization=.95'
        - '--enable-auto-tool-choice'
        - '--tool_call_parser=llama3_json'
      modelFormat:
        name: vLLM
      name: ''
      resources:
        limits:
          cpu: '8'
          memory: 10Gi
          nvidia.com/gpu: '1'
        requests:
          cpu: '4'
          memory: 8Gi
          nvidia.com/gpu: '1'
      runtime: llama-32-3b-instruct
      storageUri: 'oci://quay.io/redhat-ai-services/modelcar-catalog:llama-3.2-3b-instruct' # (3)

YAML

Deploy with OpenShift AI

You can also create the same configuration using the OpenShift AI UI. The diagram below shows the settings you need for Granite 3.2.

The OpenShift AI UI lists all the models running in a given AI project. You can check the endpoint where a particular model is available. In this case, two models are running in the AI project: Llama 3.2 and Granite 3.2. Both models are available internally on the cluster and externally via the Knative Route object.

Both models are automatically exposed on the node with the GPU. You can check the GPU resource reservations on a node using the oc describe command:

A single-model serving platform runs AI models as the Knative Service. You can use the oc get ksvc command to display a list of Knative services running in the ai namespace.

$ oc get ksvc -n ai
NAME                               URL                                                                                 LATESTCREATED                            LATESTREADY                              READY   REASON
granite-32-2b-instruct-predictor   https://granite-32-2b-instruct-predictor-ai.apps.piomin.ewyw.p1.openshiftapps.com   granite-32-2b-instruct-predictor-00007   granite-32-2b-instruct-predictor-00007   True    
llama-32-3b-instruct-predictor     https://llama-32-3b-instruct-predictor-ai.apps.piomin.ewyw.p1.openshiftapps.com     llama-32-3b-instruct-predictor-00002     llama-32-3b-instruct-predictor-00002     True

ShellSession

Integrate Spring AI with vLLM

Dependencies and Properties

The vLLM runtime is compatible with the OpenAI REST API. To integrate our sample Spring Boot application with a model running on vLLM, we must use the standard Spring AI OpenAI starter. The app in the spring-ai-showcase repository has more functionality than what is tested in this article. In simplified terms, the list of dependencies needed for the app to communicate with the OpenAI API and the model running on OpenShift AI is below.


  org.springframework.boot
  spring-boot-starter-web


  org.springframework.ai
  spring-ai-starter-model-openai


  org.springframework.ai
  spring-ai-autoconfigure-model-openai

XML

Although the model itself served on OpenShift AI does not require authorization with an API key the spring.ai.openai.api-key Spring AI parameter must be set. The endpoint’s address provided through the vLLM runtime must be specified in the spring.ai.openai.chat.base-url parameter. The default name of the model used must also be overwritten with the name under which the model was run on OpenShift AI. This name is for Llama 3.2 llama-32-3b-instruct. Below is a list of all the Spring Boot settings required for vLLM integration, which is available in the application-vllm.properties file.

spring.ai.openai.api-key = ${OPENAI_API_KEY:dummy}
spring.ai.openai.chat.base-url = https://llama-32-3b-instruct-ai.apps.piomin.ewyw.p1.openshiftapps.com
spring.ai.openai.chat.options.model = llama-32-3b-instruct

Plaintext

Implementation with Spring AI

The code below demonstrates how @RestController implements communication between the application and the target AI model. The @RestController class injects an auto-configured ChatClient.Builder to create an instance of ChatClient. The PersonController class implements a method for returning a list of persons from the GET /persons endpoint. The main goal is to generate a list of 10 objects with the fields defined in the Person class. The id field should be auto-incremented. The PromptTemplate object defines a message that will be sent to the chat model AI API. It doesn’t have to specify the exact fields that should be returned. This part is handled automatically by the Spring AI library after we invoke the entity() method on the ChatClient instance. The ParameterizedTypeReference object inside the entity method tells Spring AI to generate a list of objects.

findAll() { PromptTemplate pt = new PromptTemplate(""" Return a current list of 10 persons if exists or generate a new list with random values. Each object should contain an auto-incremented id field. The age value should be a random number between 18 and 99. Do not include any explanations or additional text. Return data in RFC8259 compliant JSON format. """); return this.chatClient.prompt(pt.create()) .call() .entity(new ParameterizedTypeReference<>() {}); } @GetMapping("/{id}") Person findById(@PathVariable String id) { PromptTemplate pt = new PromptTemplate(""" Find and return the object with id {id} in a current list of persons. """); Prompt p = pt.create(Map.of("id", id)); return this.chatClient.prompt(p) .call() .entity(Person.class); } }" style="color:#d8dee9ff;display:none" aria-label="Copy" class="code-block-pro-copy-button">

@RestController
@RequestMapping("/persons")
public class PersonController {

    private final ChatClient chatClient;

    public PersonController(ChatClient.Builder chatClientBuilder,
                            ChatMemory chatMemory) {
        this.chatClient = chatClientBuilder
                .defaultAdvisors(
                        new PromptChatMemoryAdvisor(chatMemory),
                        new SimpleLoggerAdvisor())
                .build();
    }

    @GetMapping
    List<Person> findAll() {
        PromptTemplate pt = new PromptTemplate("""
                Return a current list of 10 persons if exists or generate a new list with random values.
                Each object should contain an auto-incremented id field.
                The age value should be a random number between 18 and 99.
                Do not include any explanations or additional text.
                Return data in RFC8259 compliant JSON format.
                """);

        return this.chatClient.prompt(pt.create())
                .call()
                .entity(new ParameterizedTypeReference<>() {});
    }

    @GetMapping("/{id}")
    Person findById(@PathVariable String id) {
        PromptTemplate pt = new PromptTemplate("""
                Find and return the object with id {id} in a current list of persons.
                """);
        Prompt p = pt.create(Map.of("id", id));
        return this.chatClient.prompt(p)
                .call()
                .entity(Person.class);
    }
}

Java

The llama-32-3b-instruct model uses a “tool-calling” approach for API calls. You can read more about it in one of my Spring AI articles, which are available at this link. For instance, the class below implements the @Tool annotation, connecting to the database and searching it for a list of shares for individual companies. The key to using this tool is its description in the description field, which is then appropriately interpreted by the LLM model.

getNumberOfShares() { return (List) walletRepository.findAll(); } }" style="color:#d8dee9ff;display:none" aria-label="Copy" class="code-block-pro-copy-button">

public class WalletTools {

    private WalletRepository walletRepository;

    public WalletTools(WalletRepository walletRepository) {
        this.walletRepository = walletRepository;
    }

    @Tool(description = "Number of shares for each company in my wallet")
    public List<Share> getNumberOfShares() {
        return (List<Share>) walletRepository.findAll();
    }
}

Java

Then, the @Tool reference is set to the chat client when it interacts with the AI model. The AI model can call the tool’s method as required based on the tool’s description and the input prompt’s content.

@RestController
@RequestMapping("/wallet")
public class WalletController {

    private final ChatClient chatClient;
    private final StockTools stockTools;
    private final WalletTools walletTools;

    public WalletController(ChatClient.Builder chatClientBuilder,
                            StockTools stockTools,
                            WalletTools walletTools) {
        this.chatClient = chatClientBuilder
                .defaultAdvisors(new SimpleLoggerAdvisor())
                .build();
        this.stockTools = stockTools;
        this.walletTools = walletTools;
    }
    
    @GetMapping("/with-tools")
    String calculateWalletValueWithTools() {
        PromptTemplate pt = new PromptTemplate("""
        What’s the current value in dollars of my wallet based on the latest stock daily prices ?
        """);

        return this.chatClient.prompt(pt.create())
                .tools(stockTools, walletTools)
                .call()
                .content();
    }
    
}

Java

Run Spring Boot Application

Activate the vllm profile when launching the Spring Boot application. This will cause the application to read the settings entered in the application-vllm.properties file.

mvn spring-boot:run -Dspring-boot.run.profiles=vllm

ShellSession

Once the application runs, you will call all three endpoints implemented in the previously discussed code snippets. These endpoints are:

GET /persons
GET /persons/{id}
GET /wallet/with-tools

Once launched, the application can be accessed locally on 8080 port.

$ curl http://localhost:8080/persons
$ curl http://localhost:8080/persons/1
$ curl http://localhost:8080/wallet/with-tools

ShellSession

Alternatively, you can deploy the Spring Boot application on OpenShift and expose it outside the cluster with the Route object. The simplest way to achieve that is through the odo CLI tool. You can find more details about odo in the following post. To deploy the app with odo run the following command:

odo dev

ShellSession

After that, the application should be deployed in the selected namespace and available for testing on the 20001 local port, thanks to the port-forwarding feature.

Here’s the example output:

Final Thoughts

This article demonstrates the simplest way to integrate a Java application with an AI model running on OpenShift via an OpenAI-compliant interface. Preparing and exposing such a model to OpenShift AI requires several steps, such as installing and configuring Kubernetes operators. However, KServe’s ModelCar approach standardizes the entire process, making AI models available as containers.

The post OpenShift AI with vLLM and Spring AI appeared first on Piotr's TechBlog.

Spring AI with Azure OpenAI

piotr.minkowski — Tue, 25 Mar 2025 16:02:02 +0000

This article will show you how to use Spring AI features like chat client memory, multimodality, tool calling, or embedding models with the Azure OpenAI service. Azure OpenAI is supported in almost all Spring AI use cases. Moreover, it goes beyond standard OpenAI capabilities, providing advanced AI-driven text generation and incorporating additional AI safety and responsible AI features. It also enables the integration of AI-focused resources, such as Vector Stores on Azure.

This is the eighth part of my series of articles about Spring Boot and AI. It is worth reading the following posts before proceeding with the current one. Here’s a list of articles about Spring AI on my blog with a short description:

https://piotrminkowski.com/2025/01/28/getting-started-with-spring-ai-and-chat-model: The first tutorial introduces the Spring AI project and its support for building applications based on chat models like OpenAI or Mistral AI.
https://piotrminkowski.com/2025/01/30/getting-started-with-spring-ai-function-calling: The second tutorial shows Spring AI support for Java function calling with the OpenAI chat model.
https://piotrminkowski.com/2025/02/24/using-rag-and-vector-store-with-spring-ai: The third tutorial shows Spring AI support for RAG (Retrieval Augmented Generation) and vector store.
https://piotrminkowski.com/2025/03/04/spring-ai-with-multimodality-and-images: The fourth tutorial shows Spring AI support for a multimodality feature and image generation
https://piotrminkowski.com/2025/03/10/using-ollama-with-spring-ai: The fifth tutorial shows Spring AI support for interactions with AI models run with Ollama
https://piotrminkowski.com/2025/03/13/tool-calling-with-spring-ai: The sixth tutorial shows Spring AI support for the Tool Calling feature.
https://piotrminkowski.com/2025/03/17/using-model-context-protocol-mcp-with-spring-ai: The seventh tutorial shows Spring AI support for MCP (Model Context Protocol) in Spring Boot server-side and client-side applications

Source Code

Feel free to use my source code if you’d like to try it out yourself. To do that, you must clone my sample GitHub repository. Then you should only follow my instructions.

Enable and Configure Azure OpenAI

You need to begin the exercise by creating an instance of the Azure OpenAI service. The most crucial element here is the service’s name since it is part of the exposed Open AI endpoint. My service’s name is piomin-azure-openai.

The Azure OpenAI service should be exposed without restrictions to allow easy access to the Spring AI app.

After creating the service, go to its main page in the Azure Portal. It provides information about API keys and an endpoint URL. Also, you have to deploy an Azure OpenAI model to start making API calls from your Spring AI app.

Copy the key and the endpoint URL and save them for later usage.

You must create a new deployment with an AI model in the Azure AI Foundry portal. There are several available options. The Spring AI Azure OpenAI starter by default uses the gpt-4o model. If you choose another AI model, you will have to set its name in the spring.ai.azure.openai.chat.options.deployment-name Spring AI property. After selecting the preferred model, click the “Confirm” button.

Finally, you can deploy the model on the Azure AI Foundry portal. Choose the most suitable deployment type for your needs.

Azure allows us to deploy multiple models. You can verify a list of model deployments here:

That’s all on the Azure Portal side. Now it’s time for the implementation part in the application source code.

Enable Azure OpenAI for Spring AI

Spring AI provides the Spring Boot starter for the Azure OpenAI Chat Client. You must add the following dependency to your Maven pom.xml file. Since the sample Spring Boot application is portable across various AI models, it includes the Azure OpenAI starter only if the azure-ai profile is active. Otherwise, it uses the spring-ai-openai-spring-boot-starter library.


  azure-ai
  
    
      org.springframework.ai
      spring-ai-azure-openai-spring-boot-starter

XML

It’s time to use the key you previously copied from the Azure OpenAI service page. Let’s export it as the AZURE_OPENAI_API_KEY environment variable.

export AZURE_OPENAI_API_KEY=

ShellSession

Here are the application properties dedicated to the azure-ai Spring Boot profile. The previously exported AZURE_OPENAI_API_KEY environment variable is set as the spring.ai.azure.openai.api-key property. You also must set the OpenAI service endpoint. This address depends on your Azure OpenAI service name.

spring.ai.azure.openai.api-key = ${AZURE_OPENAI_API_KEY}
spring.ai.azure.openai.endpoint = https://piomin-azure-openai.openai.azure.com/

application-azure-ai.properties

To run the application and connect to your instance of the Azure OpenAI service, you must activate the azure-ai Maven profile and the Spring Boot profile under the same name. Here’s the required command:

mvn spring-boot:run -Pazure-ai -Dspring-boot.run.profiles=azure-ai

ShellSession

Test Spring AI Features with Azure OpenAI

I described several Spring AI features in the previous articles from this series. In each section, I will briefly mention the tested feature with a fragment of the sample source code. Please refer to my previous posts for more details about each feature and its sample implementation.

Chat Client with Memory and Structured Output

Here’s the @RestController containing endpoints we will use in these tests.

@RestController
@RequestMapping("/persons")
public class PersonController {

    private final ChatClient chatClient;

    public PersonController(ChatClient.Builder chatClientBuilder,
                            ChatMemory chatMemory) {
        this.chatClient = chatClientBuilder
                .defaultAdvisors(
                        new PromptChatMemoryAdvisor(chatMemory),
                        new SimpleLoggerAdvisor())
                .build();
    }

    @GetMapping
    List<Person> findAll() {
        PromptTemplate pt = new PromptTemplate("""
                Return a current list of 10 persons if exists or generate a new list with random values.
                Each object should contain an auto-incremented id field.
                The age value should be a random number between 18 and 99.
                Do not include any explanations or additional text.
                Return data in RFC8259 compliant JSON format.
                """);

        return this.chatClient.prompt(pt.create())
                .call()
                .entity(new ParameterizedTypeReference<>() {});
    }

    @GetMapping("/{id}")
    Person findById(@PathVariable String id) {
        PromptTemplate pt = new PromptTemplate("""
                Find and return the object with id {id} in a current list of persons.
                """);
        Prompt p = pt.create(Map.of("id", id));
        return this.chatClient.prompt(p)
                .call()
                .entity(Person.class);
    }
}

Java

First, you must call the endpoint that generates a list of ten persons from different countries. Then choose one person by ID to pick it up from the chat memory. Here are the results.

The interesting part happens in the background. Here’s a fragment of advice context added to the prompt by Spring AI.

Tool Calling

Here’s the @RestController containing endpoints we will use in these tests. There are two tools injected into the chat client: StockTools and WalletTools. These tools interact with a local H2 database to get a sample stock wallet structure and with the stock online API to load the latest share prices.

@RestController
@RequestMapping("/wallet")
public class WalletController {

    private final ChatClient chatClient;
    private final StockTools stockTools;
    private final WalletTools walletTools;

    public WalletController(ChatClient.Builder chatClientBuilder,
                            StockTools stockTools,
                            WalletTools walletTools) {
        this.chatClient = chatClientBuilder
                .defaultAdvisors(new SimpleLoggerAdvisor())
                .build();
        this.stockTools = stockTools;
        this.walletTools = walletTools;
    }

    @GetMapping("/with-tools")
    String calculateWalletValueWithTools() {
        PromptTemplate pt = new PromptTemplate("""
        What’s the current value in dollars of my wallet based on the latest stock daily prices ?
        """);

        return this.chatClient.prompt(pt.create())
                .tools(stockTools, walletTools)
                .call()
                .content();
    }

    @GetMapping("/highest-day/{days}")
    String calculateHighestWalletValue(@PathVariable int days) {
        PromptTemplate pt = new PromptTemplate("""
        On which day during last {days} days my wallet had the highest value in dollars based on the historical daily stock prices ?
        """);

        return this.chatClient.prompt(pt.create(Map.of("days", days)))
                .tools(stockTools, walletTools)
                .call()
                .content();
    }
}

Java

You must have your API key for the Twelvedata service to run these tests. Don’t forget to export it as the STOCK_API_KEY environment variable before running the app.

export STOCK_API_KEY=<YOUR_STOCK_API_KEY>

Java

The GET /wallet/with-tools endpoint calculates the current stock wallet value in dollars.

The GET /wallet/highest-day/{days} computes the value of the stock wallet for a given period in days and identifies the day with the highest value.

Multimodality and Images

Here’s a part of the @RestController responsible for describing image content and generating a new image with a given item.

imageModel) { this.chatClient = chatClientBuilder .defaultAdvisors(new SimpleLoggerAdvisor()) .build(); imageModel.ifPresent(model -> this.imageModel = model); } @GetMapping("/describe/{image}") List describeImage(@PathVariable String image) { Media media = Media.builder() .id(image) .mimeType(MimeTypeUtils.IMAGE_PNG) .data(new ClassPathResource("images/" + image + ".png")) .build(); UserMessage um = new UserMessage(""" List all items you see on the image and define their category. Return items inside the JSON array in RFC8259 compliant JSON format. """, media); return this.chatClient.prompt(new Prompt(um)) .call() .entity(new ParameterizedTypeReference<>() {}); } @GetMapping(value = "/generate/{object}", produces = MediaType.IMAGE_PNG_VALUE) byte[] generate(@PathVariable String object) throws IOException, NotSupportedException { if (imageModel == null) throw new NotSupportedException("Image model is not supported"); ImageResponse ir = imageModel.call(new ImagePrompt("Generate an image with " + object, ImageOptionsBuilder.builder() .height(1024) .width(1024) .N(1) .responseFormat("url") .build())); String url = ir.getResult().getOutput().getUrl(); UrlResource resource = new UrlResource(url); LOG.info("Generated URL: {}", url); dynamicImages.add(Media.builder() .id(UUID.randomUUID().toString()) .mimeType(MimeTypeUtils.IMAGE_PNG) .data(url) .build()); return resource.getContentAsByteArray(); } }" style="color:#d8dee9ff;display:none" aria-label="Copy" class="code-block-pro-copy-button">

@RestController
@RequestMapping("/images")
public class ImageController {

    private final static Logger LOG = LoggerFactory.getLogger(ImageController.class);
    private final ObjectMapper mapper = new ObjectMapper();

    private final ChatClient chatClient;
    private ImageModel imageModel;

    public ImageController(ChatClient.Builder chatClientBuilder,
                           Optional<ImageModel> imageModel) {
        this.chatClient = chatClientBuilder
                .defaultAdvisors(new SimpleLoggerAdvisor())
                .build();
        imageModel.ifPresent(model -> this.imageModel = model);
    }
        
    @GetMapping("/describe/{image}")
    List<Item> describeImage(@PathVariable String image) {
        Media media = Media.builder()
                .id(image)
                .mimeType(MimeTypeUtils.IMAGE_PNG)
                .data(new ClassPathResource("images/" + image + ".png"))
                .build();
        UserMessage um = new UserMessage("""
        List all items you see on the image and define their category.
        Return items inside the JSON array in RFC8259 compliant JSON format.
        """, media);
        return this.chatClient.prompt(new Prompt(um))
                .call()
                .entity(new ParameterizedTypeReference<>() {});
    }
    
    @GetMapping(value = "/generate/{object}", produces = MediaType.IMAGE_PNG_VALUE)
    byte[] generate(@PathVariable String object) throws IOException, NotSupportedException {
        if (imageModel == null)
            throw new NotSupportedException("Image model is not supported");
        ImageResponse ir = imageModel.call(new ImagePrompt("Generate an image with " + object, ImageOptionsBuilder.builder()
                .height(1024)
                .width(1024)
                .N(1)
                .responseFormat("url")
                .build()));
        String url = ir.getResult().getOutput().getUrl();
        UrlResource resource = new UrlResource(url);
        LOG.info("Generated URL: {}", url);
        dynamicImages.add(Media.builder()
                .id(UUID.randomUUID().toString())
                .mimeType(MimeTypeUtils.IMAGE_PNG)
                .data(url)
                .build());
        return resource.getContentAsByteArray();
    }
    
}

Java

The GET /images/describe/{image} returns a structured list of items identified on a given image. It also categorizes each detected item. In this case, there are two available categories: fruits and vegetables.

By the way, here’s the image described above.

The image generation feature requires a dedicated model on Azure AI. The DALL-E 2 and DALL-E 3 models on Azure support a text-to-image feature.

The application must be aware of the model name. That’s why you must add a new property to your application properties with the following value.

spring.ai.azure.openai.image.options.deployment-name = dall-e-3

Plaintext

Then you must restart the application. After that, you can generate an image by calling the GET /images/generate/{object} endpoint. Here’s the result for the pineapple.

Enable Azure CosmosDB Vector Store

Dependency

By default, the sample Spring Boot application uses Pinecone vector store. However, SpringAI supports two services available on Azure: Azure AI Search and CosmosDB. Let’s choose CosmosDB as the vector store. You must add the following dependency to your Maven pom.xml file:


    org.springframework.ai
    spring-ai-azure-cosmos-db-store-spring-boot-starter

XML

Configuration on Azure

Then, you must create an instance of CosmosDB in your Azure account. The name of my instance is piomin-ai-cosmos.

Once it is created, you will obtain its address and API key. To do that, go to the “Settings -> Keys” menu and save both values visible below.

Then, you have to create a dedicated database and container for your application. To do that, go to the “Data Explorer” tab and provide names for the database and container ID. You must also set the partition key.

All previously provided values must be set in the application properties. Export your CosmosDB API key as the AZURE_VECTORSTORE_API_KEY environment variable.

spring.ai.vectorstore.cosmosdb.endpoint = https://piomin-ai-cosmos.documents.azure.com:443/
spring.ai.vectorstore.cosmosdb.key = ${AZURE_VECTORSTORE_API_KEY}
spring.ai.vectorstore.cosmosdb.databaseName = spring-ai
spring.ai.vectorstore.cosmosdb.containerName = spring-ai
spring.ai.vectorstore.cosmosdb.partitionKeyPath = /id

application-azure-ai.properties

Unfortunately, there are still some issues with the Azure CosmosDB support in the Spring AI M6 milestone version. I see that they were fixed in the SNAPSHOT version. So, if you want to test it by yourself, you will have to switch from milestones to snapshots.


  21
  1.0.0-SNAPSHOT

  

  
    Central Portal Snapshots
    central-portal-snapshots
    https://central.sonatype.com/repository/maven-snapshots/
    
      false
    
    
      true
    
  
  
    spring-snapshots
    Spring Snapshots
    https://repo.spring.io/snapshot
    
      false
    
    
      true

XML

Run and Test the Application

After those changes, you can start the application with the following command:

mvn spring-boot:run -Pazure-ai -Dspring-boot.run.profiles=azure-ai

XML

Once the application is running, you can test the following @RestController that offers RAG functionality. The GET /stocks/load-data endpoint obtains stock prices of given companies and puts them in the vector store. The GET /stocks/v2/most-growth-trend uses the RetrievalAugmentationAdvisor instance to retrieve the most suitable data and include it in the user query.

companies = List.of("AAPL", "MSFT", "GOOG", "AMZN", "META", "NVDA"); for (String company : companies) { StockData data = restTemplate.getForObject("https://api.twelvedata.com/time_series?symbol={0}&interval=1day&outputsize=10&apikey={1}", StockData.class, company, apiKey); if (data != null && data.getValues() != null) { var list = data.getValues().stream().map(DailyStockData::getClose).toList(); var doc = Document.builder() .id(company) .text(mapper.writeValueAsString(new Stock(company, list))) .build(); store.add(List.of(doc)); LOG.info("Document added: {}", company); } } } @RequestMapping("/v2/most-growth-trend") String getBestTrendV2() { PromptTemplate pt = new PromptTemplate(""" {query}. Which {target} is the most % growth? The 0 element in the prices table is the latest price, while the last element is the oldest price. """); Prompt p = pt.create(Map.of("query", "Find the most growth trends", "target", "share")); Advisor retrievalAugmentationAdvisor = RetrievalAugmentationAdvisor.builder() .documentRetriever(VectorStoreDocumentRetriever.builder() .similarityThreshold(0.7) .topK(3) .vectorStore(store) .build()) .queryTransformers(rqtBuilder.promptTemplate(pt).build()) .build(); return this.chatClient.prompt(p) .advisors(retrievalAugmentationAdvisor) .call() .content(); } }" style="color:#d8dee9ff;display:none" aria-label="Copy" class="code-block-pro-copy-button">

@RestController
@RequestMapping("/stocks")
public class StockController {

    private final ObjectMapper mapper = new ObjectMapper();
    private final static Logger LOG = LoggerFactory.getLogger(StockController.class);
    private final ChatClient chatClient;
    private final RewriteQueryTransformer.Builder rqtBuilder;
    private final RestTemplate restTemplate;
    private final VectorStore store;

    @Value("${STOCK_API_KEY:none}")
    private String apiKey;

    public StockController(ChatClient.Builder chatClientBuilder,
                           VectorStore store,
                           RestTemplate restTemplate) {
        this.chatClient = chatClientBuilder
                .defaultAdvisors(new SimpleLoggerAdvisor())
                .build();
        this.rqtBuilder = RewriteQueryTransformer.builder()
                .chatClientBuilder(chatClientBuilder);
        this.store = store;
        this.restTemplate = restTemplate;
    }

    @GetMapping("/load-data")
    void load() throws JsonProcessingException {
        final List<String> companies = List.of("AAPL", "MSFT", "GOOG", "AMZN", "META", "NVDA");
        for (String company : companies) {
            StockData data = restTemplate.getForObject("https://api.twelvedata.com/time_series?symbol={0}&interval=1day&outputsize=10&apikey={1}",
                    StockData.class,
                    company,
                    apiKey);
            if (data != null && data.getValues() != null) {
                var list = data.getValues().stream().map(DailyStockData::getClose).toList();
                var doc = Document.builder()
                        .id(company)
                        .text(mapper.writeValueAsString(new Stock(company, list)))
                        .build();
                store.add(List.of(doc));
                LOG.info("Document added: {}", company);
            }
        }
    }

    @RequestMapping("/v2/most-growth-trend")
    String getBestTrendV2() {
        PromptTemplate pt = new PromptTemplate("""
                {query}.
                Which {target} is the most % growth?
                The 0 element in the prices table is the latest price, while the last element is the oldest price.
                """);

        Prompt p = pt.create(Map.of("query", "Find the most growth trends", "target", "share"));

        Advisor retrievalAugmentationAdvisor = RetrievalAugmentationAdvisor.builder()
                .documentRetriever(VectorStoreDocumentRetriever.builder()
                        .similarityThreshold(0.7)
                        .topK(3)
                        .vectorStore(store)
                        .build())
                .queryTransformers(rqtBuilder.promptTemplate(pt).build())
                .build();

        return this.chatClient.prompt(p)
                .advisors(retrievalAugmentationAdvisor)
                .call()
                .content();
    }

}

Java

Finally, you can call the following two endpoints.

$ curl http://localhost:8080/stocks/load-data
$ curl http://localhost:8080/stocks/v2/most-growth-trend

ShellSession

Final Thoughts

This exercise shows how to modify an existing Spring Boot AI application to integrate it with the Azure OpenAI service. It also gives a recipe on how to include Azure CosmosDB as a vector store for RAG scenarios and similarity searches.

The post Spring AI with Azure OpenAI appeared first on Piotr's TechBlog.

Using Model Context Protocol (MCP) with Spring AI

piotr.minkowski — Mon, 17 Mar 2025 16:17:32 +0000

This article will show how to use Spring AI support for MCP (Model Context Protocol) in Spring Boot server-side and client-side applications. You will learn how to serve tools and prompts on the server side and discover them on the client-side Spring AI application. The Model Context Protocol is a standard for managing contextual interactions with AI models. It provides a standardized way to connect AI models to external data sources and tools. It can help with building complex workflows on top of LLMs. Spring AI MCP extends the MCP Java SDK and provides client and server Spring Boot starters. The MCP Client is responsible for establishing and managing connections with MCP servers.

This is the seventh part of my series of articles about Spring Boot and AI. It is worth reading the following posts before proceeding with the current one. Please pay special attention to the last article from the list about the tool calling feature since we will implement it in our sample client and server apps using MCP.

https://piotrminkowski.com/2025/01/28/getting-started-with-spring-ai-and-chat-model: The first tutorial introduces the Spring AI project and its support for building applications based on chat models like OpenAI or Mistral AI.
https://piotrminkowski.com/2025/01/30/getting-started-with-spring-ai-function-calling: The second tutorial shows Spring AI support for Java function calling with the OpenAI chat model.
https://piotrminkowski.com/2025/02/24/using-rag-and-vector-store-with-spring-ai: The third tutorial shows Spring AI support for RAG (Retrieval Augmented Generation) and vector store.
https://piotrminkowski.com/2025/03/04/spring-ai-with-multimodality-and-images: The fourth tutorial shows Spring AI support for a multimodality feature and image generation
https://piotrminkowski.com/2025/03/10/using-ollama-with-spring-ai: The fifth tutorial shows Spring AI support for interactions with AI models run with Ollama
https://piotrminkowski.com/2025/03/13/tool-calling-with-spring-ai: The sixth tutorial show Spring AI for the Tool Calling feature.

Source Code

Feel free to use my source code if you’d like to try it out yourself. To do that, you must clone my sample GitHub repository. Then you should only follow my instructions.

Motivation for MCP with Spring AI

MCP introduces an interesting concept for applications interacting with AI models. With MCP the application can provide specific tools/functions for several other services, which need to use data exposed by that application. Additionally, it can expose prompt templates and resources. Thanks to that, we don’t need to implement AI tools/functions inside every client service but integrate them with the application that exposes tools over MCP.

The best way to analyze the MCP concept is through an example. Let’s consider an application that connects to a database and exposes data through REST endpoints. If we want to use that data in our AI application we should implement and register AI tools that retrieve data by connecting such the REST endpoints. So, each client-side application that needs data from the source service would have to implement its own set of AI tools locally. Here comes the MCP concept. The source service defines and exposes AI tools/functions in the standardized form. All other apps that need to provide data to AI models can load and use a predefined set of tools.

The following diagram illustrates our scenario. Two Spring Boot applications act as MCP servers. They connect to the database and use Spring AI MCP Server support to expose @Tool methods to the MCP client-side app. The client-side app communicates with the OpenAI model. It includes the tools exposed by the server-side apps in the user query to the AI model. The person-mcp-service app provides @Tool methods for searching persons in the database table. The account-mcp-service is doing the same for the persons’ accounts.

Build MCP Server App with Spring AI

Let’s begin with the implementation of applications that act as MCP servers. They both run and use an in-memory H2 database. To interact with a database we include the Spring Data JPA module. Spring AI allows us to switch between three transport types: STDIO, Spring MVC, and Spring WebFlux. MCP Server with Spring WebFlux supports Server-Sent Events (SSE) and an optional STDIO transport. Here’s a list of required Maven dependencies:


  
    org.springframework.ai
    spring-ai-mcp-server-webflux-spring-boot-starter
  
  
    org.springframework.boot
    spring-boot-starter-data-jpa
  
  
    com.h2database
    h2
    runtime

XML

Create the Person MCP Server

Here’s an @Entity class for interacting with the person table:

@Entity
public class Person {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;
    private String firstName;
    private String lastName;
    private int age;
    private String nationality;
    @Enumerated(EnumType.STRING)
    private Gender gender;
    
    // ... getters and setters
    
}

Java

The Spring Data Repository interface contains a single method for searching persons by their nationality:

public interface PersonRepository extends CrudRepository<Person, Long> {
    List<Person> findByNationality(String nationality);
}

Java

The PersonTools @Service bean contains two Spring AI @Tool methods. It injects the PersonRepository bean to interact with the H2 database. The getPersonById method returns a single person with a specific ID field, while the getPersonsByNationality returns a list of all persons with a given nationality.

getPersonsByNationality( @ToolParam(description = "Nationality") String nationality) { return personRepository.findByNationality(nationality); } }" style="color:#d8dee9ff;display:none" aria-label="Copy" class="code-block-pro-copy-button">

@Service
public class PersonTools {

    private PersonRepository personRepository;

    public PersonTools(PersonRepository personRepository) {
        this.personRepository = personRepository;
    }

    @Tool(description = "Find person by ID")
    public Person getPersonById(
            @ToolParam(description = "Person ID") Long id) {
        return personRepository.findById(id).orElse(null);
    }

    @Tool(description = "Find all persons by nationality")
    public List<Person> getPersonsByNationality(
            @ToolParam(description = "Nationality") String nationality) {
        return personRepository.findByNationality(nationality);
    }
    
}

Java

Once we define @Tool methods, we must register them within the Spring AI MCP server. We can use the ToolCallbackProvider bean for that. More specifically, the MethodToolCallbackProvider class provides a builder that creates an instance of the ToolCallbackProvider class with a list of references to objects with @Tool methods.

@SpringBootApplication
public class PersonMCPServer {

    public static void main(String[] args) {
        SpringApplication.run(PersonMCPServer.class, args);
    }

    @Bean
    public ToolCallbackProvider tools(PersonTools personTools) {
        return MethodToolCallbackProvider.builder()
                .toolObjects(personTools)
                .build();
    }

}

Java

Finally, we must provide configuration properties. The person-mcp-server app will listen on the 8060 port. We should also set the name and version of the MCP server embedded in our application.

spring:
  ai:
    mcp:
      server:
        name: person-mcp-server
        version: 1.0.0
  jpa:
    database-platform: H2
    generate-ddl: true
    hibernate:
      ddl-auto: create-drop

logging.level.org.springframework.ai: DEBUG

server.port: 8060

YAML

That’s all. We can start the application.

$ cd spring-ai-mcp/person-mcp-service
$ mvn spring-boot:run

ShellSession

Create the Account MCP Server

Then, we will do very similar things in the second application that acts as an MCP server. Here’s the @Entity class for interacting with the account table:

@Entity
public class Account {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;
    private String number;
    private int balance;
    private Long personId;
    
    // ... getters and setters
    
}

Java

The Spring Data Repository interface contains a single method for searching accounts belonging to a given person:

public interface AccountRepository extends CrudRepository<Account, Long> {
    List<Account> findByPersonId(Long personId);
}

Java

The AccountTools @Service bean contains a single Spring AI @Tool method. It injects the AccountRepository bean to interact with the H2 database. The getAccountsByPersonId method returns a list of accounts owned by the person with a specified ID field value.

getAccountsByPersonId( @ToolParam(description = "Person ID") Long personId) { return accountRepository.findByPersonId(personId); } }" style="color:#d8dee9ff;display:none" aria-label="Copy" class="code-block-pro-copy-button">

@Service
public class AccountTools {

    private AccountRepository accountRepository;

    public AccountTools(AccountRepository accountRepository) {
        this.accountRepository = accountRepository;
    }

    @Tool(description = "Find all accounts by person ID")
    public List<Account> getAccountsByPersonId(
            @ToolParam(description = "Person ID") Long personId) {
        return accountRepository.findByPersonId(personId);
    }
}

Java

Of course, the account-mcp-server application will use ToolCallbackProvider to register @Tool methods defined inside the AccountTools class.

@SpringBootApplication
public class AccountMCPService {

    public static void main(String[] args) {
        SpringApplication.run(AccountMCPService.class, args);
    }

    @Bean
    public ToolCallbackProvider tools(AccountTools accountTools) {
        return MethodToolCallbackProvider.builder()
                .toolObjects(accountTools)
                .build();
    }
    
}

Java

Here are the application configuration properties. The account-mcp-server app will listen on the 8040 port.

spring:
  ai:
    mcp:
      server:
        name: account-mcp-server
        version: 1.0.0
  jpa:
    database-platform: H2
    generate-ddl: true
    hibernate:
      ddl-auto: create-drop

logging.level.org.springframework.ai: DEBUG

server.port: 8040

YAML

Let’s run the second server-side app:

$ cd spring-ai-mcp/account-mcp-service
$ mvn spring-boot:run

ShellSession

Once we start the application, we should see the log indicating how many tools were registered in the MCP server.

Build MCP Client App with Spring AI

Implementation

We will create a single client-side application. However, we can imagine an architecture where many applications consume tools exposed by one MCP server. Our application interacts with the OpenAI chat model, so we must include the Spring AI OpenAI starter. For the MCP Client starter, we can choose between two dependencies: Standard MCP client and Spring WebFlux client. Spring team recommends using the WebFlux-based SSE connection with the spring-ai-mcp-client-webflux-spring-boot-starter. Finally, we include the Spring Web starter to expose the REST endpoint. However, you can use Spring WebFlux starter to expose them reactively.


  
    org.springframework.boot
    spring-boot-starter-web
  
  
    org.springframework.ai
    spring-ai-mcp-client-webflux-spring-boot-starter
  
  
    org.springframework.ai
    spring-ai-openai-spring-boot-starter

XML

Our MCP client connects with two MCP servers. We must provide the following connection settings in the application.yml file.

spring.ai.mcp.client.sse.connections:
  person-mcp-server:
    url: http://localhost:8060
  account-mcp-server:
    url: http://localhost:8040

ShellSession

Our sample Spring Boot application contains to @RestControllers, which expose HTTP endpoints. The PersonController class defines two endpoints for searching and counting persons by nationality. The MCP Client Boot Starter automatically configures tool callbacks that integrate with Spring AI’s tool execution framework. Thanks to that we can use the ToolCallbackProvider instance to provide default tools to the ChatClient bean. Then, we can perform the standard steps to interact with the AI model with Spring AI ChatClient. However, the client will use tools exposed by both sample MCP servers.

@RestController
@RequestMapping("/persons")
public class PersonController {

    private final static Logger LOG = LoggerFactory
        .getLogger(PersonController.class);
    private final ChatClient chatClient;

    public PersonController(ChatClient.Builder chatClientBuilder,
                            ToolCallbackProvider tools) {
        this.chatClient = chatClientBuilder
                .defaultTools(tools)
                .build();
    }

    @GetMapping("/nationality/{nationality}")
    String findByNationality(@PathVariable String nationality) {

        PromptTemplate pt = new PromptTemplate("""
                Find persons with {nationality} nationality.
                """);
        Prompt p = pt.create(Map.of("nationality", nationality));
        return this.chatClient.prompt(p)
                .call()
                .content();
    }

    @GetMapping("/count-by-nationality/{nationality}")
    String countByNationality(@PathVariable String nationality) {
        PromptTemplate pt = new PromptTemplate("""
                How many persons come from {nationality} ?
                """);
        Prompt p = pt.create(Map.of("nationality", nationality));
        return this.chatClient.prompt(p)
                .call()
                .content();
    }
}

Java

Let’s switch to the second @RestController. The AccountController class defines two endpoints for searching accounts by person ID. The GET /accounts/count-by-person-id/{personId} returns the number of accounts belonging to a given person. The GET /accounts/balance-by-person-id/{personId} is slightly more complex. It counts the total balance in all person’s accounts. However, it must also return the person’s name and nationality, which means that it must call the getPersonById tool method exposed by the person-mcp-server app after calling the tool for searching accounts by person ID.

@RestController
@RequestMapping("/accounts")
public class AccountController {

    private final static Logger LOG = LoggerFactory.getLogger(PersonController.class);
    private final ChatClient chatClient;

    public AccountController(ChatClient.Builder chatClientBuilder,
                            ToolCallbackProvider tools) {
        this.chatClient = chatClientBuilder
                .defaultTools(tools)
                .build();
    }

    @GetMapping("/count-by-person-id/{personId}")
    String countByPersonId(@PathVariable String personId) {
        PromptTemplate pt = new PromptTemplate("""
                How many accounts has person with {personId} ID ?
                """);
        Prompt p = pt.create(Map.of("personId", personId));
        return this.chatClient.prompt(p)
                .call()
                .content();
    }

    @GetMapping("/balance-by-person-id/{personId}")
    String balanceByPersonId(@PathVariable String personId) {
        PromptTemplate pt = new PromptTemplate("""
                How many accounts has person with {personId} ID ?
                Return person name, nationality and a total balance on his/her accounts.
                """);
        Prompt p = pt.create(Map.of("personId", personId));
        return this.chatClient.prompt(p)
                .call()
                .content();
    }

}

Java

Running the Application

Before starting the client-side app we must export the OpenAI token as the SPRING_AI_OPENAI_API_KEY environment variable.

export SPRING_AI_OPENAI_API_KEY=

ShellSession

Then go to the sample-client directory and run the app with the following command:

$ cd spring-ai-mcp/sample-client
$ mvn spring-boot:run

ShellSession

Once we start the application, we can switch to the logs. As you see, the sample-client app receives responses with tools from both person-mcp-server and account-mcp-server apps.

Testing MCP with Spring Boot

Both server-side applications load data from the import.sql scripts on startup. Spring Data JPA automatically imports data from such scripts. Our MCP client application listens on the 8080 port. Let’s call the first endpoint to get a list of persons from Germany:

curl http://localhost:8080/persons/nationality/Germany

ShellSession

Here’s the response from the OpenAI model:

We can also call the endpoint that counts the number with a given nationality.

curl http://localhost:8080/persons/count-by-nationality/Germany

ShellSession

As the final test, we can call the GET /accounts/balance-by-person-id/{personId} endpoint that interacts with tools exposed by both MCP server-side apps. It requires an AI model to combine data from person and account sources.

Exposing Prompts with MCP

We can also expose prompts and resources with the Spring AI MCP server support. To register and expose prompts we need to define the list of SyncPromptRegistration objects. It contains the name of the prompt, a list of input arguments, and a text content.

{ String argument = (String) getPromptRequest.arguments().get("nationality"); var userMessage = new McpSchema.PromptMessage(McpSchema.Role.USER, new McpSchema.TextContent("How many persons come from " + argument + " ?")); return new McpSchema.GetPromptResult("Count persons by nationality", List.of(userMessage)); }); return List.of(promptRegistration); } }" style="color:#d8dee9ff;display:none" aria-label="Copy" class="code-block-pro-copy-button">

@SpringBootApplication
public class PersonMCPServer {

    public static void main(String[] args) {
        SpringApplication.run(PersonMCPServer.class, args);
    }

    @Bean
    public ToolCallbackProvider tools(PersonTools personTools) {
        return MethodToolCallbackProvider.builder()
                .toolObjects(personTools)
                .build();
    }

    @Bean
    public List prompts() {
        var prompt = new McpSchema.Prompt("persons-by-nationality", "Get persons by nationality",
                List.of(new McpSchema.PromptArgument("nationality", "Person nationality", true)));

        var promptRegistration = new McpServerFeatures.SyncPromptRegistration(prompt, getPromptRequest -> {
            String argument = (String) getPromptRequest.arguments().get("nationality");
            var userMessage = new McpSchema.PromptMessage(McpSchema.Role.USER,
                    new McpSchema.TextContent("How many persons come from " + argument + " ?"));
            return new McpSchema.GetPromptResult("Count persons by nationality", List.of(userMessage));
        });

        return List.of(promptRegistration);
    }
}

ShellSession

After startup, the application prints information about a list of registered prompts in the logs.

There is no built-in Spring AI support for loading prompts using the MCP client. However, Spring AI MCP support is under active development so we may expect some new features soon. For now, Spring AI provides the auto-configured instance of McpSyncClient. We can use it to search the prompt in the list of prompts received from the server. Then, we can prepare the PromptTemplate instance using the registered content and create the Prompt by filling the template with the input parameters.

mcpSyncClients; public PersonController(ChatClient.Builder chatClientBuilder, ToolCallbackProvider tools, List mcpSyncClients) { this.chatClient = chatClientBuilder .defaultTools(tools) .build(); this.mcpSyncClients = mcpSyncClients; } // ... other endpoints @GetMapping("/count-by-nationality-from-client/{nationality}") String countByNationalityFromClient(@PathVariable String nationality) { return this.chatClient .prompt(loadPromptByName("persons-by-nationality", nationality)) .call() .content(); } Prompt loadPromptByName(String name, String nationality) { McpSchema.GetPromptRequest r = new McpSchema .GetPromptRequest(name, Map.of("nationality", nationality)); var client = mcpSyncClients.stream() .filter(c -> c.getServerInfo().name().equals("person-mcp-server")) .findFirst(); if (client.isPresent()) { var content = (McpSchema.TextContent) client.get() .getPrompt(r) .messages() .getFirst() .content(); PromptTemplate pt = new PromptTemplate(content.text()); Prompt p = pt.create(Map.of("nationality", nationality)); LOG.info("Prompt: {}", p); return p; } else return null; } }" style="color:#d8dee9ff;display:none" aria-label="Copy" class="code-block-pro-copy-button">

@RestController
@RequestMapping("/persons")
public class PersonController {

    private final static Logger LOG = LoggerFactory
        .getLogger(PersonController.class);
    private final ChatClient chatClient;
    private final List<McpSyncClient> mcpSyncClients;

    public PersonController(ChatClient.Builder chatClientBuilder,
                            ToolCallbackProvider tools,
                            List<McpSyncClient> mcpSyncClients) {
        this.chatClient = chatClientBuilder
                .defaultTools(tools)
                .build();
        this.mcpSyncClients = mcpSyncClients;
    }

    // ... other endpoints
    
    @GetMapping("/count-by-nationality-from-client/{nationality}")
    String countByNationalityFromClient(@PathVariable String nationality) {
        return this.chatClient
                .prompt(loadPromptByName("persons-by-nationality", nationality))
                .call()
                .content();
    }

    Prompt loadPromptByName(String name, String nationality) {
        McpSchema.GetPromptRequest r = new McpSchema
            .GetPromptRequest(name, Map.of("nationality", nationality));
        var client = mcpSyncClients.stream()
                .filter(c -> c.getServerInfo().name().equals("person-mcp-server"))
                .findFirst();
        if (client.isPresent()) {
            var content = (McpSchema.TextContent) client.get() 
                .getPrompt(r)
                .messages()
                .getFirst()
                .content();
            PromptTemplate pt = new PromptTemplate(content.text());
            Prompt p = pt.create(Map.of("nationality", nationality));
            LOG.info("Prompt: {}", p);
            return p;
        } else return null;
    }
}

Java

Final Thoughts

Model Context Protocol is an important initiative in the AI world. It allows us to avoid reinventing the wheel for each new data source. A unified protocol streamlines integration, minimizing development time and complexity. As businesses expand their AI toolsets, MCP enables seamless connectivity across multiple systems without the burden of excessive custom code. Spring AI introduced the initial version of MCP support recently. It seems promising. With Spring AI Client and Server starters, we may implement a distributed architecture, where several different apps use the AI tools exposed by a single service.

The post Using Model Context Protocol (MCP) with Spring AI appeared first on Piotr's TechBlog.

Tool Calling with Spring AI

piotr.minkowski — Thu, 13 Mar 2025 15:55:40 +0000

This article will show you how to use Spring AI support with the most popular AI models for the tool calling feature. Tool calling (or function calling), is a common pattern in AI applications that enables a model to interact with APIs or tools, extending its capabilities. The most popular AI models are trained to know when to call a function. Spring AI formerly supported it through the Function Calling API, which has been deprecated and marked for removal in the next release. My previous article described that feature based on interactions with an internal database and an external market stock API. Today, we will consider the same use case. This time, however, we will replace the deprecated Function Calling API with a new Tool calling feature.

This is the sixth part of my series of articles about Spring Boot and AI. It is worth reading the following posts before proceeding with the current one. Please pay special attention to the second article. I will refer to it often in this article.

https://piotrminkowski.com/2025/01/28/getting-started-with-spring-ai-and-chat-model: The first tutorial introduces the Spring AI project and its support for building applications based on chat models like OpenAI or Mistral AI.
https://piotrminkowski.com/2025/01/30/getting-started-with-spring-ai-function-calling: The second tutorial shows Spring AI support for Java function calling with the OpenAI chat model.
https://piotrminkowski.com/2025/02/24/using-rag-and-vector-store-with-spring-ai: The third tutorial shows Spring AI support for RAG (Retrieval Augmented Generation) and vector store.
https://piotrminkowski.com/2025/03/04/spring-ai-with-multimodality-and-images: The fourth tutorial shows Spring AI support for a multimodality feature and image generation
https://piotrminkowski.com/2025/03/10/using-ollama-with-spring-ai: The fifth tutorial shows Spring AI supports for interactions with AI models run with Ollama

Source Code

Feel free to use my source code if you’d like to try it out yourself. To do that, you must clone my sample GitHub repository. Then you should only follow my instructions.

Motivation for Tool Calling in Spring AI

The tool calling feature helps us solve a common AI model challenge related to internal or live data sources. If we want to augment a model with such data our applications must allow it to interact with a set of APIs or tools. In our case, the internal database (H2) contains information about the structure of our stock wallet. The sample Spring Boot application asks an AI model about the total value of the wallet based on daily stock prices or the highest value for the last few days. The model must retrieve the structure of our stock wallet and the latest stock prices. We will do the same exercise as for a function calling feature. It will be enhanced with additional scenarios I’ll describe later.

Use the Calling Tools Feature in Spring AI

Create WalletTools

Let’s begin with the WalletTools implementation, which is responsible for interaction with a database. We can compare it to the previous implementation based on Spring functions available in the pl.piomin.services.functions.stock.WalletService class. It defines a single method annotated with @Tool. The important element is the right description that must inform the model what that method does. The method returns the number of shares for each company in our portfolio retrieved from the database through the Spring Data @Repository.

getNumberOfShares() { return (List) walletRepository.findAll(); } }" style="color:#d8dee9ff;display:none" aria-label="Copy" class="code-block-pro-copy-button">

public class WalletTools {

    private WalletRepository walletRepository;

    public WalletTools(WalletRepository walletRepository) {
        this.walletRepository = walletRepository;
    }

    @Tool(description = "Number of shares for each company in my wallet")
    public List<Share> getNumberOfShares() {
        return (List<Share>) walletRepository.findAll();
    }
}

Java

We can register the WalletTools class as a Spring @Bean in the application main class.

@Bean
public WalletTools walletTools(WalletRepository walletRepository) {
   return new WalletTools(walletRepository);
}

Java

The Spring Boot application launches an embedded, in-memory database and inserts test data into the stock table. Our wallet contains the most popular companies on the U.S. stock market, including Amazon, Meta, and Microsoft.

insert into share(id, company, quantity) values (1, 'AAPL', 100);
insert into share(id, company, quantity) values (2, 'AMZN', 300);
insert into share(id, company, quantity) values (3, 'META', 300);
insert into share(id, company, quantity) values (4, 'MSFT', 400);
insert into share(id, company, quantity) values (5, 'NVDA', 200);

SQL

Create StockTools

The StockTools class is responsible for interaction with TwelveData stock API. It defines two methods. The getLatestStockPrices method returns only the latest close price for a specified company. It is a tool calling version of the method provided within the pl.piomin.services.functions.stock.StockService function. The second method is more complicated. It must return a historical daily close prices for a defined number of days. Each price must be correlated with a quotation date.

{}", company, latestData.getClose()); return new StockResponse(Float.parseFloat(latestData.getClose())); } @Tool(description = "Historical daily stock prices") public List getHistoricalStockPrices(@ToolParam(description = "Search period in days") int days, @ToolParam(description = "Name of company") String company) { StockData data = restTemplate.getForObject("https://api.twelvedata.com/time_series?symbol={0}&interval=1day&outputsize={1}&apikey={2}", StockData.class, company, days, apiKey); return data.getValues().stream() .map(d -> new DailyShareQuote(company, Float.parseFloat(d.getClose()), d.getDatetime())) .toList(); } }" style="color:#d8dee9ff;display:none" aria-label="Copy" class="code-block-pro-copy-button">

public class StockTools {

    private static final Logger LOG = LoggerFactory.getLogger(StockTools.class);

    private RestTemplate restTemplate;
    @Value("${STOCK_API_KEY:none}")
    String apiKey;

    public StockTools(RestTemplate restTemplate) {
        this.restTemplate = restTemplate;
    }

    @Tool(description = "Latest stock prices")
    public StockResponse getLatestStockPrices(@ToolParam(description = "Name of company") String company) {
        StockData data = restTemplate.getForObject("https://api.twelvedata.com/time_series?symbol={0}&interval=1min&outputsize=1&apikey={1}",
                StockData.class,
                company,
                apiKey);
        DailyStockData latestData = data.getValues().get(0);
        LOG.info("Get stock prices: {} -> {}", company, latestData.getClose());
        return new StockResponse(Float.parseFloat(latestData.getClose()));
    }

    @Tool(description = "Historical daily stock prices")
    public List<DailyShareQuote> getHistoricalStockPrices(@ToolParam(description = "Search period in days") int days,
                                                          @ToolParam(description = "Name of company") String company) {
        StockData data = restTemplate.getForObject("https://api.twelvedata.com/time_series?symbol={0}&interval=1day&outputsize={1}&apikey={2}",
                StockData.class,
                company,
                days,
                apiKey);
        return data.getValues().stream()
                .map(d -> new DailyShareQuote(company, Float.parseFloat(d.getClose()), d.getDatetime()))
                .toList();
    }
}

Java

Here’s the DailyShareQuote Java record returned in the response list.

public record DailyShareQuote(String company, float price, String datetime) {
}

Java

Then, let’s register the StockUtils class as a Spring @Bean.

@Bean
public StockTools stockTools() {
   return new StockTools(restTemplate());
}

Java

Spring AI Tool Calling Flow

Here’s a fragment of the WalletController code, which is responsible for defining interactions with LLM and HTTP endpoints implementation. It injects both StockTools and WalletTools beans.

@RestController
@RequestMapping("/wallet")
public class WalletController {

    private final ChatClient chatClient;
    private final StockTools stockTools;
    private final WalletTools walletTools;

    public WalletController(ChatClient.Builder chatClientBuilder,
                            StockTools stockTools,
                            WalletTools walletTools) {
        this.chatClient = chatClientBuilder
                .defaultAdvisors(new SimpleLoggerAdvisor())
                .build();
        this.stockTools = stockTools;
        this.walletTools = walletTools;
    }
    
    // HTTP endpoints implementation
}

Java

The GET /wallet/with-tools endpoint calculates the value of our stock wallet in dollars. It uses the latest daily stock prices for each company’s shares from the wallet. There are a few ways to register tools for a chat model call. We use the tools method provided by the ChatClient interface. It allows us to pass the tool object references directly to the chat client. In this case, we are registering the StockTools bean which contains two @Tool methods. The AI model must choose the right method to call in StockTools based on the description and input argument. It should call the getLatestStockPrices method.

@GetMapping("/with-tools")
String calculateWalletValueWithTools() {
   PromptTemplate pt = new PromptTemplate("""
   What’s the current value in dollars of my wallet based on the latest stock daily prices ?
   """);

   return this.chatClient.prompt(pt.create())
           .tools(stockTools, walletTools)
           .call()
           .content();
}

Java

The GET /wallet/highest-day/{days} endpoint calculates the value of our stock wallet in dollars for each day in the specified period determined by the days variable. Then it must return the day with the highest stock wallet value. Same as before we use the tools method from ChatClient to register our tool calling methods. It should call the getHistoricalStockPrices method.

@GetMapping("/highest-day/{days}")
String calculateHighestWalletValue(@PathVariable int days) {
   PromptTemplate pt = new PromptTemplate("""
   On which day during last {days} days my wallet had the highest value in dollars based on the historical daily stock prices ?
   """);

   return this.chatClient.prompt(pt.create(Map.of("days", days)))
            .tools(stockTools, walletTools)
            .call()
            .content();
}

Java

The following diagram illustrates a flow for the second use case that returns the day with the highest stock wallet value. First, it must connect with the database and retrieve the stock wallet structure containing a number of each company shares. Then, it must call the stock API for every company found in the wallet. So, finally, the method calculateHighestWalletValue should be called five times with different values of the company @ToolParam and a value of the days determined by the HTTP endpoint path variable. Once all the data is collected AI model calculates the highest wallet value and returns it together with the quotation date.

Run Application and Verify Tool Calling

Before starting the application we must set environment variables with the AI model and stock API tokens.

export OPEN_AI_TOKEN=<YOUR_OPEN_AI_TOKEN>
export STOCK_API_KEY=<YOUR_STOCK_API_KEY>

Java

Then run the following Maven command:

mvn spring-boot:run

Java

Once the application is started, we can call the first endpoint. The GET /wallet/with-tools calculates the total least value of the stock wallet structure stored in the database.

curl http://localhost:8080/wallet/with-tools

ShellSession

Here’s the fragment of logs generated by the Spring AI @Tool methods. The model behaves as expected. First, it calls the getNumberOfShares tool to retrieve a wallet structure. Then it calls the getLatestStockPrices tool per share to obtain its current price.

Here’s a final response with a wallet value with a detailed explanation.

Then we can call the GET /wallet/highest-day/{days} endpoint to return the day with the highest wallet value. Let’s calculate it for the last 20 days.

curl http://localhost:8080/wallet/highest-day/20

ShellSession

The response is very detailed. Here’s the final part of the content returned by the OpenAI chat model. It returns 26.02.2025 as the day with the highest wallet value. Frankly, sometimes it returns different answers…

However, the AI flow works fine. First, it calls the getNumberOfShares tool to retrieve a wallet structure. Then it calls the getHistoricalStockPrices tool per share to obtain its prices for the last 20 days.

We can switch to another AI model to compare their responses. You can connect my sample Spring Boot application e.g. with Mistral AI by activating the mistral-ai Maven profile.

mvn spring-boot:run -Pmistral-ai

ShellSession

Before running the app we must export the Mistral API token.

export MISTRAL_AI_TOKEN=

ShellSession

To get the best results I changed the Mistral model to mistral-large-latest.

spring.ai.mistralai.chat.options.model = mistral-large-latest

ShellSession

The response from Mistral AI was pretty quick and short:

Final Thoughts

In this article, we analyzed the Spring AI support for tool calling support, which replaces Function Calling API. Tool calling is a powerful feature that enhances how AI models interact with external tools, APIs, and structured data. It makes AI more interactive and practical for real-world applications. Spring AI provides a flexible way to register and invoke such tools. However, it still requires attention from developers, who need to define clear function schemas and handle edge cases.

The post Tool Calling with Spring AI appeared first on Piotr's TechBlog.

Using Ollama with Spring AI

piotr.minkowski — Mon, 10 Mar 2025 09:46:35 +0000

This article will teach you how to create a Spring Boot application that implements several AI scenarios using Spring AI and the Ollama tool. Ollama is an open-source tool that aims to run open LLMs on our local machine. It acts like a bridge between LLM and a workstation, providing an API layer on top of them for other applications or services. With Ollama we can run almost every model we want only by pulling it from a huge library.

This is the fifth part of my series of articles about Spring Boot and AI. I mentioned Ollama in the first part of the series to show how to switch between different AI models with Spring AI. However, it was only a brief introduction. Today, we try to run all AI use cases described in the previous tutorials with the Ollama tool. Those tutorials integrated mostly with OpenAI. In this article, we will test them against different AI models.

https://piotrminkowski.com/2025/01/28/getting-started-with-spring-ai-and-chat-model: The first tutorial introduces the Spring AI project and its support for building applications based on chat models like OpenAI or Mistral AI.
https://piotrminkowski.com/2025/01/30/getting-started-with-spring-ai-function-calling: The second tutorial shows Spring AI support for Java function calling with the OpenAI chat model.
https://piotrminkowski.com/2025/02/24/using-rag-and-vector-store-with-spring-ai: The third tutorial shows Spring AI support for RAG (Retrieval Augmented Generation) and vector store.
https://piotrminkowski.com/2025/03/04/spring-ai-with-multimodality-and-images: The fourth tutorial shows Spring AI support for a multimodality feature and image generation

Fortunately, our application can easily switch between different AI tools or models. To achieve this, we must activate the right Maven profile.

Source Code

Feel free to use my source code if you’d like to try it out yourself. To do that, you must clone my sample GitHub repository. Then you should only follow my instructions.

Prepare a Local Environment for Ollama

A few options exist for accessing Ollama on the local machine with Spring AI. I downloaded Ollama from the following link and installed it on my laptop. Alternatively, we can run it e.g. with Docker Compose or Testcontainers.

Once we install Ollama on our workstation we can run the AI model from its library with the ollama run command. The full list of available models can be found here. At the beginning, we will choose the Llava model. It is one of the most popular models which supports both a vision encoder and language understanding.

ollama run llava

ShellSession

Ollama must pull the model manifest and image. Here’s the ollama run command output. Once we see that, we can interact with the model.

The sample application source code already defines the ollama-ai Maven profile with the spring-ai-ollama-spring-boot-starter Spring Boot starter.


  ollama-ai
  
    
      org.springframework.ai
      spring-ai-ollama-spring-boot-starter

XML

The profile is disabled by default. We might enable it during development as shown below (for IntelliJ IDEA). However, the application doesn’t use any vendor-specific components but only generic Spring AI classes and interfaces.

We must activate the ollama-ai profile when running the same application. Assuming we are in the project root directory, we need to run the following Maven command:

mvn spring-boot:run -Pollama-ai

ShellSession

Portability across AI Models

We should avoid using specific model library components to make our application portable between different models. For example, when registering functions in the chat model client we should use FunctionCallingOptions instead of model-specific components like OpenAIChatOptions or OllamaOptions.

@GetMapping
String calculateWalletValue() {
   PromptTemplate pt = new PromptTemplate("""
   What’s the current value in dollars of my wallet based on the latest stock daily prices ?
   """);

   return this.chatClient.prompt(pt.create(
        FunctionCallingOptions.builder()
                    .function("numberOfShares")
                    .function("latestStockPrices")
                    .build()))
            .call()
            .content();
}

Java

Not all models support all the AI capabilities used in our sample application. For models like Ollama or Mistral AI, Spring AI doesn’t provide image generation implementation since those tools don’t support it right now. Therefore we should inject the ImageModel optionally, in case it is not provided by the model-specific library.

imageModel, VectorStore store) { this.chatClient = chatClientBuilder .defaultAdvisors(new SimpleLoggerAdvisor()) .build(); imageModel.ifPresent(model -> this.imageModel = model); // other initializations } }" style="color:#d8dee9ff;display:none" aria-label="Copy" class="code-block-pro-copy-button">

@RestController
@RequestMapping("/images")
public class ImageController {

    private final static Logger LOG = LoggerFactory.getLogger(ImageController.class);
    private final ObjectMapper mapper = new ObjectMapper();

    private final ChatClient chatClient;
    private ImageModel imageModel;

    public ImageController(ChatClient.Builder chatClientBuilder,
                           Optional<ImageModel> imageModel,
                           VectorStore store) {
        this.chatClient = chatClientBuilder
                .defaultAdvisors(new SimpleLoggerAdvisor())
                .build();
        imageModel.ifPresent(model -> this.imageModel = model);
        
        // other initializations 
    }
}

Java

Then, if a method requires the ImageModel bean, we can throw an exception informing it is not by the AI model (1). On the other hand, Spring AI does not provide a dedicated interface for multimodality, which enables AI models to process information from multiple sources. We can use the UserMessage class and the Media class to combine e.g. text with image(s) in the user prompt. The GET /images/describe/{image} endpoint lists items detected in the source image from the classpath (2).

describeImage(@PathVariable String image) { Media media = Media.builder() .id(image) .mimeType(MimeTypeUtils.IMAGE_PNG) .data(new ClassPathResource("images/" + image + ".png")) .build(); UserMessage um = new UserMessage(""" List all items you see on the image and define their category. Return items inside the JSON array in RFC8259 compliant JSON format. """, media); return this.chatClient.prompt(new Prompt(um)) .call() .entity(new ParameterizedTypeReference<>() {}); }" style="color:#d8dee9ff;display:none" aria-label="Copy" class="code-block-pro-copy-button">

@GetMapping(value = "/generate/{object}", produces = MediaType.IMAGE_PNG_VALUE)
byte[] generate(@PathVariable String object) throws IOException, NotSupportedException {
   if (imageModel == null)
      throw new NotSupportedException("Image model is not supported by the AI model"); // (1)
   ImageResponse ir = imageModel.call(new ImagePrompt("Generate an image with " + object, ImageOptionsBuilder.builder()
           .height(1024)
           .width(1024)
           .N(1)
           .responseFormat("url")
           .build()));
   String url = ir.getResult().getOutput().getUrl();
   UrlResource resource = new UrlResource(url);
   LOG.info("Generated URL: {}", url);
   dynamicImages.add(Media.builder()
           .id(UUID.randomUUID().toString())
           .mimeType(MimeTypeUtils.IMAGE_PNG)
           .data(url)
           .build());
   return resource.getContentAsByteArray();
}
    
@GetMapping("/describe/{image}") // (2)
List<Item> describeImage(@PathVariable String image) {
   Media media = Media.builder()
           .id(image)
           .mimeType(MimeTypeUtils.IMAGE_PNG)
           .data(new ClassPathResource("images/" + image + ".png"))
           .build();
   UserMessage um = new UserMessage("""
   List all items you see on the image and define their category. 
   Return items inside the JSON array in RFC8259 compliant JSON format.
   """, media);
   return this.chatClient.prompt(new Prompt(um))
           .call()
           .entity(new ParameterizedTypeReference<>() {});
}

Java

Let’s try to avoid similar declarations described in Spring AI. Although they are perfectly correct, they will cause problems when switching between different Spring Boot starters for different AI vendors.

ChatResponse response = chatModel.call(
    new Prompt(
        "Generate the names of 5 famous pirates.",
        OllamaOptions.builder()
            .model(OllamaModel.LLAMA3_1)
            .temperature(0.4)
            .build()
    ));

Java

In this case, we can set the global property in the application.properties file which sets the default model used in the scenario with Ollama.

spring.ai.ollama.chat.options.model = llava

Java

Testing Multiple Models with Spring AI and Ollama

By default, Ollama doesn’t require any API token to establish communication with AI models. The Ollama Spring Boot starter provides auto-configuration that connects the chat client to the Ollama API server running on the localhost:11434 address. So, before running our sample application we must export tokens used to authorize against stock market API and a vector store.

export STOCK_API_KEY=<YOUR_STOCK_API_KEY>
export PINECONE_TOKEN=<YOUR_PINECONE_TOKEN>

Java

Llava on Ollama

Let’s begin with the Llava model. We can call the first endpoint that asks the model to generate a list of persons (GET /persons) and then search for the person with a particular in the list stored in the chat memory (GET /persons/{id}).

Then we can the endpoint that displays all the items visible on the particular image from the classpath (GET /images/describe/{image}).

By the way, here is the analyzed image stored in the src/main/resources/images/fruits-3.png file.

The endpoint for describing all the input images from the classpath doesn’t work fine. I tried to tweak it by adding the RFC8259 JSON format sentence or changing a query. However, the AI model always returned a description of a single instead of a whole Media list. The OpenAI model could print descriptions for all images in the String[] format.

@GetMapping("/describe")
String[] describe() {
   UserMessage um = new UserMessage("""
            Explain what do you see on each image from the input list.
            Return data in RFC8259 compliant JSON format.
            """, List.copyOf(Stream.concat(images.stream(), dynamicImages.stream()).toList()));
   return this.chatClient.prompt(new Prompt(um))
            .call()
            .entity(String[].class);
}

Java

Here’s the response. Of course, we can train a model to receive better results or try to prepare a better prompt.

After calling the GET /wallet endpoint exposed by the WalletController, I received the [400] Bad Request - {"error":"registry.ollama.ai/library/llava:latest does not support tools"} response. It seems Llava doesn’t support the Function/Tool calling feature. We will also always receive the NotSupportedExcpetion for GET /images/generate/{object} endpoint, since the Spring AI Ollama library doesn’t provide ImageModel bean. You can perform other tests e.g. for RAG and vector store features implemented in the StockController @RestController.

Granite on Ollama

Let’s switch to another interesting model – Granite. Particularly we will test the granite3.2-vision model dedicated to automated content extraction from tables, charts, infographics, plots, and diagrams. First, we set the current model name in the Ollama Spring AI configuration properties.

spring.ai.ollama.chat.options.model = granite3.2-vision

Plaintext

Let’s stop the Llava model and then run granite3.2-vision on Ollama:

ollama run granite3.2-vision

Java

After the application restarts, we can perform some test calls. The endpoint for describing a single image returns a more detailed response than the Llava model. The response for the query with multiple images still looks the same as before.

The Granite Vision model supports a “function calling” feature, but it couldn’t call functions properly using my prompt. Please refer to my article for more details about the Spring AI function calling with OpenAI.

Deepseek on Ollama

The last model we will run within this exercise is Deepseek. DeepSeek-R1 achieves performance comparable to OpenAI-o1 on reasoning tasks. First, we must set the current model name in the Ollama Spring AI configuration properties.

spring.ai.ollama.chat.options.model = deepseek-r1

Plaintext

Then let’s stop the Granite model and then run deepseek-r1 on Ollama:

ollama run deepseek-r1

ShellSession

We need to restart the app:

mvn spring-boot:run -Pollama-ai

ShellSession

As usual, we can call the first endpoint that asks the model to generate a list of persons (GET /persons) and then search for the person with a particular in the list stored in the chat memory (GET /persons/{id}). The response was pretty large, but not in the required JSON format. Here’s the fragment of the response:

The deepseek-r1 model doesn’t support a tool/function calling feature. Also, it didn’t analyze my input image properly and it didn’t return a JSON response according to the Spring AI structured output feature.

Final Thoughts

This article shows how to easily switch between multiple AI models with Spring AI and Ollama. We tested several AI use cases implemented in the sample Spring Boot application across models such as Llava, Granite, or Deepseek. The app provides several endpoints for showing such features as multimodality, chat memory, RAG, vector store, or a function calling. It aims not to compare the AI models, but to give a simple recipe for integration with different AI models and allow playing with them using Spring AI.

The post Using Ollama with Spring AI appeared first on Piotr's TechBlog.

Spring AI with Multimodality and Images

piotr.minkowski — Tue, 04 Mar 2025 08:56:24 +0000

This article will teach you how to create a Spring Boot application that handles images and text using the Spring AI multimodality feature. Multimodality is the ability to understand and process information from different sources simultaneously. It covers text, images, audio, and other data formats. We will perform simple experiments with multimodality and images. This is the fourth part of my series of articles about Spring Boot and AI. It is worth reading the following posts before proceeding with the current one:

https://piotrminkowski.com/2025/01/28/getting-started-with-spring-ai-and-chat-model: The first tutorial introduces the Spring AI project and its support for building applications based on chat models like OpenAI or Mistral AI.
https://piotrminkowski.com/2025/01/30/getting-started-with-spring-ai-function-calling: The second tutorial shows Spring AI support for Java function calling with the OpenAI chat model.
https://piotrminkowski.com/2025/02/24/using-rag-and-vector-store-with-spring-ai: The third tutorial shows Spring AI support for RAG (Retrieval Augmented Generation) and vector store.

Source Code

Feel free to use my source code if you’d like to try it out yourself. To do that, you must clone my sample GitHub repository. Then you should only follow my instructions.

Motivation for Multimodality with Spring AI

The multimodal large language model (LLM) capabilities allow it to process and generate text alongside other modalities, including images, audio, and video. This feature covers a use case when we want LLM to detect something specific inside an image or describe its content. Let’s assume we have a list of input images. We want to find the image in that list that matches our description. For example, this description can ask a model to find the image that contains a specified item. The Spring AI Message API provides all the necessary elements to support multimodal LLMs. Here’s a diagram that illustrates our scenario.

Use Multimodality with Spring AI

We don’t need to include any specific library other than the Spring AI starter for a particular AI model. The default option is spring-ai-openai-spring-boot-starter. Our application uses images stored in the src/main/resources/images directory. Spring AI multimodality support requires the image to be passed inside the Media object. We load all the pictures from the classpath inside the constructor.

Recognize Items in the Image

The GET /images/find/{object} tries to find the image that contains the item determined by the object path variable. AI model must return a position on the image in the input list. To achieve that, we create an UserMessage object that contains a user query and a list of the Media objects. Once the model returns the position, the endpoint reads the image from the list and returns its content in the image/png format.

images; private List dynamicImages = new ArrayList<>(); public ImageController(ChatClient.Builder chatClientBuilder) { this.chatClient = chatClientBuilder .defaultAdvisors(new SimpleLoggerAdvisor()) .build(); this.images = List.of( Media.builder().id("fruits").mimeType(MimeTypeUtils.IMAGE_PNG).data(new ClassPathResource("images/fruits.png")).build(), Media.builder().id("fruits-2").mimeType(MimeTypeUtils.IMAGE_PNG).data(new ClassPathResource("images/fruits-2.png")).build(), Media.builder().id("fruits-3").mimeType(MimeTypeUtils.IMAGE_PNG).data(new ClassPathResource("images/fruits-3.png")).build(), Media.builder().id("fruits-4").mimeType(MimeTypeUtils.IMAGE_PNG).data(new ClassPathResource("images/fruits-4.png")).build(), Media.builder().id("fruits-5").mimeType(MimeTypeUtils.IMAGE_PNG).data(new ClassPathResource("images/fruits-5.png")).build(), Media.builder().id("animals").mimeType(MimeTypeUtils.IMAGE_PNG).data(new ClassPathResource("images/animals.png")).build(), Media.builder().id("animals-2").mimeType(MimeTypeUtils.IMAGE_PNG).data(new ClassPathResource("images/animals-2.png")).build(), Media.builder().id("animals-3").mimeType(MimeTypeUtils.IMAGE_PNG).data(new ClassPathResource("images/animals-3.png")).build(), Media.builder().id("animals-4").mimeType(MimeTypeUtils.IMAGE_PNG).data(new ClassPathResource("images/animals-4.png")).build(), Media.builder().id("animals-5").mimeType(MimeTypeUtils.IMAGE_PNG).data(new ClassPathResource("images/animals-5.png")).build() ); } @GetMapping(value = "/find/{object}", produces = MediaType.IMAGE_PNG_VALUE) @ResponseBody byte[] analyze(@PathVariable String object) { String msg = """ Which picture contains %s. Return only a single picture. Return only the number that indicates its position in the media list. """.formatted(object); LOG.info(msg); UserMessage um = new UserMessage(msg, images); String content = this.chatClient.prompt(new Prompt(um)) .call() .content(); assert content != null; return images.get(Integer.parseInt(content)-1).getDataAsByteArray(); } }" style="color:#d8dee9ff;display:none" aria-label="Copy" class="code-block-pro-copy-button">

@RestController
@RequestMapping("/images")
public class ImageController {

    private final static Logger LOG = LoggerFactory
        .getLogger(ImageController.class);

    private final ChatClient chatClient;
    private List<Media> images;
    private List<Media> dynamicImages = new ArrayList<>();

    public ImageController(ChatClient.Builder chatClientBuilder) {
        this.chatClient = chatClientBuilder
                .defaultAdvisors(new SimpleLoggerAdvisor())
                .build();
        this.images = List.of(
                Media.builder().id("fruits").mimeType(MimeTypeUtils.IMAGE_PNG).data(new ClassPathResource("images/fruits.png")).build(),
                Media.builder().id("fruits-2").mimeType(MimeTypeUtils.IMAGE_PNG).data(new ClassPathResource("images/fruits-2.png")).build(),
                Media.builder().id("fruits-3").mimeType(MimeTypeUtils.IMAGE_PNG).data(new ClassPathResource("images/fruits-3.png")).build(),
                Media.builder().id("fruits-4").mimeType(MimeTypeUtils.IMAGE_PNG).data(new ClassPathResource("images/fruits-4.png")).build(),
                Media.builder().id("fruits-5").mimeType(MimeTypeUtils.IMAGE_PNG).data(new ClassPathResource("images/fruits-5.png")).build(),
                Media.builder().id("animals").mimeType(MimeTypeUtils.IMAGE_PNG).data(new ClassPathResource("images/animals.png")).build(),
                Media.builder().id("animals-2").mimeType(MimeTypeUtils.IMAGE_PNG).data(new ClassPathResource("images/animals-2.png")).build(),
                Media.builder().id("animals-3").mimeType(MimeTypeUtils.IMAGE_PNG).data(new ClassPathResource("images/animals-3.png")).build(),
                Media.builder().id("animals-4").mimeType(MimeTypeUtils.IMAGE_PNG).data(new ClassPathResource("images/animals-4.png")).build(),
                Media.builder().id("animals-5").mimeType(MimeTypeUtils.IMAGE_PNG).data(new ClassPathResource("images/animals-5.png")).build()
        );
    }

    @GetMapping(value = "/find/{object}", produces = MediaType.IMAGE_PNG_VALUE)
    @ResponseBody byte[] analyze(@PathVariable String object) {
        String msg = """
        Which picture contains %s.
        Return only a single picture.
        Return only the number that indicates its position in the media list.
        """.formatted(object);
        LOG.info(msg);

        UserMessage um = new UserMessage(msg, images);

        String content = this.chatClient.prompt(new Prompt(um))
                .call()
                .content();

        assert content != null;
        return images.get(Integer.parseInt(content)-1).getDataAsByteArray();
    }

}

Java

Let’s make a test call. We will look for the picture containing a banana. Here’s the AI model response after calling the http://localhost:8080/images/find/banana. You can try to make other test calls and find an image with e.g. an orange or a tomato.

Describe Image Contents

On the other hand, we can ask the AI model to generate a short description of all images included as the Media content. The GET /images/describe endpoint merges two lists of images.

@GetMapping("/describe")
String[] describe() {
   UserMessage um = new UserMessage("Explain what do you see on each image.",
            List.copyOf(Stream.concat(images.stream(), dynamicImages.stream()).toList()));
      return this.chatClient.prompt(new Prompt(um))
              .call()
              .entity(String[].class);
}

Java

Once we call the http://localhost:8080/images/describe URL we will receive a compact description of all input images. The two highlighted descriptions have been generated for images from the dynamicImages List. These images were generated by the AI image model. We will discuss this in the next section.

Generate Images with AI Model

To generate an image using AI API we must inject the ImageModel bean. It provides a single call method that allows us to communicate with AI Models dedicated to image generation. This method takes the ImagePrompt object as an argument. Typically, we use the ImagePrompt constructor that takes instructions for image generation and options that customize the height, width, and number of images. We will generate a single (N=1) image with 1024 pixels in height and width. The AI model returns the image URL (responseFormat). Once the image is generated, we create an UrlResource object, create the Media object, and put it into the dynamicImages List. The GET /images/generate/{object} endpoint returns a byte array representation of the image object.

images; private List dynamicImages = new ArrayList<>(); public ImageController(ChatClient.Builder chatClientBuilder, ImageModel imageModel) { this.chatClient = chatClientBuilder .defaultAdvisors(new SimpleLoggerAdvisor()) .build(); this.imageModel = imageModel; // other initializations } @GetMapping(value = "/generate/{object}", produces = MediaType.IMAGE_PNG_VALUE) byte[] generate(@PathVariable String object) throws IOException { ImageResponse ir = imageModel.call(new ImagePrompt("Generate an image with " + object, ImageOptionsBuilder.builder() .height(1024) .width(1024) .N(1) .responseFormat("url") .build())); UrlResource url = new UrlResource(ir.getResult().getOutput().getUrl()); LOG.info("Generated URL: {}", ir.getResult().getOutput().getUrl()); dynamicImages.add(Media.builder() .id(UUID.randomUUID().toString()) .mimeType(MimeTypeUtils.IMAGE_PNG) .data(url) .build()); return url.getContentAsByteArray(); } }" style="color:#d8dee9ff;display:none" aria-label="Copy" class="code-block-pro-copy-button">

@RestController
@RequestMapping("/images")
public class ImageController {

    private final ChatClient chatClient;
    private final ImageModel imageModel;
    private List<Media> images;
    private List<Media> dynamicImages = new ArrayList<>();
    
    public ImageController(ChatClient.Builder chatClientBuilder,
                           ImageModel imageModel) {
        this.chatClient = chatClientBuilder
                .defaultAdvisors(new SimpleLoggerAdvisor())
                .build();
        this.imageModel = imageModel;
        // other initializations
    }
    
    @GetMapping(value = "/generate/{object}", produces = MediaType.IMAGE_PNG_VALUE)
    byte[] generate(@PathVariable String object) throws IOException {
        ImageResponse ir = imageModel.call(new ImagePrompt("Generate an image with " + object, ImageOptionsBuilder.builder()
                .height(1024)
                .width(1024)
                .N(1)
                .responseFormat("url")
                .build()));
        UrlResource url = new UrlResource(ir.getResult().getOutput().getUrl());
        LOG.info("Generated URL: {}", ir.getResult().getOutput().getUrl());
        dynamicImages.add(Media.builder()
                .id(UUID.randomUUID().toString())
                .mimeType(MimeTypeUtils.IMAGE_PNG)
                .data(url)
                .build());
        return url.getContentAsByteArray();
    }
    
}

Java

Do you remember the description of that image returned by the GET /images/describe endpoint? Here’s our image with strawberry generated by the AI model after calling the http://localhost:8080/images/generate/strawberry URL.

Here’s a similar test for the banana input parameter.

Use Vector Store with Spring AI Multimodality

Let’s consider how we can leverage vector store in our scenario. We cannot insert image representation directly to a vector store since most popular vendors like OpenAI or Mistral AI do not provide image embedding models. We could integrate directly with a model like clip-vit-base-patch32 to generate image embeddings, but this article won’t cover such a scenario. Instead, a vector store may contain an image description and its location (or name). The GET /images/load endpoint provides a method for loading image descriptions into a vector store. It uses Spring AI multimodality support to generate a compact description of each image in the input list and then puts it into the store.

    @GetMapping("/load")
    void load() throws JsonProcessingException {
        String msg = """
        Explain what do you see on the image.
        Generate a compact description that explains only what is visible.
        """;
        for (Media image : images) {
            UserMessage um = new UserMessage(msg, image);
            String content = this.chatClient.prompt(new Prompt(um))
                    .call()
                    .content();

            var doc = Document.builder()
                    .id(image.getId())
                    .text(mapper.writeValueAsString(new ImageDescription(image.getId(), content)))
                    .build();
            store.add(List.of(doc));
            LOG.info("Document added: {}", image.getId());
        }
    }

Java

Finally, we can implement another endpoint that generates a new image and asks the AI model to generate an image description. Then, it performs a similarity search in a vector store to find the most similar image based on its text description.

generateAndMatch(@PathVariable String object) throws IOException { ImageResponse ir = imageModel.call(new ImagePrompt("Generate an image with " + object, ImageOptionsBuilder.builder() .height(1024) .width(1024) .N(1) .responseFormat("url") .build())); UrlResource url = new UrlResource(ir.getResult().getOutput().getUrl()); LOG.info("URL: {}", ir.getResult().getOutput().getUrl()); String msg = """ Explain what do you see on the image. Generate a compact description that explains only what is visible. """; UserMessage um = new UserMessage(msg, new Media(MimeTypeUtils.IMAGE_PNG, url)); String content = this.chatClient.prompt(new Prompt(um)) .call() .content(); SearchRequest searchRequest = SearchRequest.builder() .query("Find the most similar description to this: " + content) .topK(2) .build(); return store.similaritySearch(searchRequest); }" style="color:#d8dee9ff;display:none" aria-label="Copy" class="code-block-pro-copy-button">

    @GetMapping("/generate-and-match/{object}")
    List<Document> generateAndMatch(@PathVariable String object) throws IOException {
        ImageResponse ir = imageModel.call(new ImagePrompt("Generate an image with " + object, ImageOptionsBuilder.builder()
                .height(1024)
                .width(1024)
                .N(1)
                .responseFormat("url")
                .build()));
        UrlResource url = new UrlResource(ir.getResult().getOutput().getUrl());
        LOG.info("URL: {}", ir.getResult().getOutput().getUrl());

        String msg = """
        Explain what do you see on the image.
        Generate a compact description that explains only what is visible.
        """;

        UserMessage um = new UserMessage(msg, new Media(MimeTypeUtils.IMAGE_PNG, url));
        String content = this.chatClient.prompt(new Prompt(um))
                .call()
                .content();

        SearchRequest searchRequest = SearchRequest.builder()
                .query("Find the most similar description to this: " + content)
                .topK(2)
                .build();

        return store.similaritySearch(searchRequest);
    }

Java

Let’s test the GET /images/generate-and-match/{object} endpoint using the pineapple parameter. It returns the description of the fruits.png image from the classpath.

By the way, here’s the fruits.png image located in the /src/main/resources/images directory.

Final Thoughts

Spring AI provides multimodality and image generation support. All the features presented in this article work fine with OpenAI. It supports both the image model and multimodality. To read more about the support offered by other models, refer to the Spring AI chat and image model docs.

This article shows how we can use Spring AI and AI models to interact with images in various ways.

The post Spring AI with Multimodality and Images appeared first on Piotr's TechBlog.

Using RAG and Vector Store with Spring AI

piotr.minkowski — Mon, 24 Feb 2025 14:29:03 +0000

This article will teach you how to create a Spring Boot application that uses RAG (Retrieval Augmented Generation) and vector store with Spring AI. We will continue experiments with stock data, which were initiated in my previous article about Spring AI. This is the third part of my series of articles about Spring Boot and AI. It is worth reading the following posts before proceeding with the current one:

https://piotrminkowski.com/2025/01/28/getting-started-with-spring-ai-and-chat-model: The first tutorial introduces the Spring AI project and its support for building applications based on chat models like OpenAI or Mistral AI.
https://piotrminkowski.com/2025/01/30/getting-started-with-spring-ai-function-calling: The second tutorial shows Spring AI support for Java function calling with the OpenAI chat model.

This article will show how to include one of the vector stores supported by Spring AI and advisors dedicated to RAG support in our sample application codebase used by two previous articles. It will connect to Open AI API, but you can easily switch to other models using Mistral AI or Ollama support in Spring AI. For more details, please refer to my first article.

Source Code

If you would like to try it by yourself, you may always take a look at my source code. To do that, you must clone my sample GitHub repository. Then you should only follow my instructions.

Motivation for RAG with Spring AI

The problem to solve is similar to the one described in my previous article about the Spring AI function calling feature. Since the OpenAI model is trained on a static dataset it does not have direct access to the online services or APIs. We want it to analyze stock growth trends for the biggest companies in the US stock market. Therefore, we must obtain share prices from a public API that returns live stock market data. Then, we can store this data in our local database and integrate it with the sample Spring Boot AI application. Instead of a typical relational database, we will use a vector store. In vector databases, queries work differently from those in traditional relational databases. Instead of looking for exact matches, they conduct similarity searches. It retrieves the most similar vectors to a given input vector.

After loading all required data into a vector database, we must integrate it with the AI model. Spring AI provides a comfortable mechanism for that based on the Advisors API. We have already used some built-in advisors in the previous examples, e.g. to print detailed AI communication logs or enable chat memory. This time they will allow us to implement a Retrieval Augmented Generation (RAG) technique for our app. Thanks to that, the Spring Boot app will retrieve similar documents that best match a user query before sending a request to the AI model. These documents provide context for the query and are sent to the AI model alongside the user’s question.

Here’s a simplified visualization of our process.

Vector Store with Spring AI

Set up Pinecone Database

In this section, we will prepare a vector store, integrate it with our Spring Boot application, and load some data there. Spring AI supports various vector databases. It provides the VectorStore interface to directly interact with a vector store from our Spring Boot app. The full list of supported databases can be found in the Spring AI docs here.

We will proceed with the Pinecone database. It is a popular cloud-based vector database, that allows us to store and search vectors efficiently. Instead of a cloud-based database, we can set up a local instance of another popular vector store – ChromaDB. In that case, you can use the docker-compose.yml file in the repository root directory, to run that database with the docker compose up command. With Pinecone we need to sign up to create an account on their portal. Then we should create an index. There are several customizations available, but the most important thing is to choose the right embedding model. Since text-embedding-ada-002 is a default embedding model for OpenAI we should choose that option. The name of our index is spring-ai. We can read an environment and project name from the generated host URL.

After creating an index, we should generate an API key.

Then, we will copy the generated key and export it as the PINECONE_TOKEN environment variable.

export PINECONE_TOKEN=

ShellSession

Integrate Spring Boot app with Pinecode using Spring AI

Our Spring Boot application must include the spring-ai-pinecone-store-spring-boot-starter dependency to smoothly integrate with the Pinecone vector store.


  org.springframework.ai
  spring-ai-pinecone-store-spring-boot-starter

XML

Then, we must provide connection settings and credentials to the Pinecone database in the Spring Boot application.properties file. It must be at least the Pinecode API key, environment name, project name, and index name.

spring.ai.vectorstore.pinecone.apiKey = ${PINECONE_TOKEN}
spring.ai.vectorstore.pinecone.environment = aped-4627-b74a
spring.ai.vectorstore.pinecone.projectId = fsbak04
spring.ai.vectorstore.pinecone.index-name = spring-ai

SQL

After providing all the required configuration settings we can inject and use the VectorStore bean e.g. in our application REST controller. In the following code fragment, we load input data into a vector store and perform a simple similarity search to find the most growth stock trend. We individually query the Twelvedata API for each company from a list and deserialize the response to the StockData object. Then we create a Spring AI Document object, which contains the name of a company and share close prices for the last 10 days. Data is written in the JSON format.

@RestController
@RequestMapping("/stocks")
public class StockController {

    private final ObjectMapper mapper = new ObjectMapper();
    private final static Logger LOG = LoggerFactory.getLogger(StockController.class);
    private final RestTemplate restTemplate;
    private final VectorStore store;

    @Value("${STOCK_API_KEY}")
    private String apiKey;

    public StockController(VectorStore store,
                           RestTemplate restTemplate) {
        this.store = store;
        this.restTemplate = restTemplate;
    }

    @PostMapping("/load-data")
    void load() throws JsonProcessingException {
        final List<String> companies = List.of("AAPL", "MSFT", "GOOG", "AMZN", "META", "NVDA");
        for (String company : companies) {
            StockData data = restTemplate.getForObject("https://api.twelvedata.com/time_series?symbol={0}&interval=1day&outputsize=10&apikey={1}",
                    StockData.class,
                    company,
                    apiKey);
            if (data != null && data.getValues() != null) {
                var list = data.getValues().stream().map(DailyStockData::getClose).toList();
                var doc = Document.builder()
                        .id(company)
                        .text(mapper.writeValueAsString(new Stock(company, list)))
                        .build();
                store.add(List.of(doc));
                LOG.info("Document added: {}", company);
            }
        }
    }
    
    @GetMapping("/docs")
    List<Document> query() {
        SearchRequest searchRequest = SearchRequest.builder()
                .query("Find the most growth trends")
                .topK(2)
                .build();
        List<Document> docs = store.similaritySearch(searchRequest);
        return docs;
    }
    
}

Java

Once we start our application and call the POST /stocks/load-data endpoint, we should see 6 records loaded into the target store. You can verify the content of the database in the Pinocone index browser.

Then we can interact directly with a vector store by calling the GET /stocks/docs endpoint.

curl http://localhost:8080/stocks/docs

ShellSession

Implement RAG with Spring AI

Use QuestionAnswerAdvisor

Previously we loaded data into a target vector store and performed a simple search to find the most growth trend. Our main goal in this section is to incorporate relevant data into an AI model prompt. We can implement RAG with Spring AI in two ways with different advisors. Let’s begin with QuestionAnswerAdvisor. To perform RAG we must provide an instance of QuestionAnswerAdvisor to the ChatClient bean. The QuestionAnswerAdvisor constructor takes the VectorStore instance as an input argument.

@RequestMapping("/v1/most-growth-trend")
String getBestTrend() {
   PromptTemplate pt = new PromptTemplate("""
            {query}.
            Which {target} is the most % growth?
            The 0 element in the prices table is the latest price, while the last element is the oldest price.
            """);

   Prompt p = pt.create(
            Map.of("query", "Find the most growth trends",
                   "target", "share")
   );

   return this.chatClient.prompt(p)
            .advisors(new QuestionAnswerAdvisor(store))
            .call()
            .content();
}

Java

Then, we can call the endpoint GET /stocks/v1/most-growth-trend to see the AI model response. By the way, the result is not accurate.

Let’s work a little bit on our previous code. We will publish a new version of the AI prompt under the GET /stocks/v1-1/most-growth-trend endpoint. The changed lines have been highlighted. We build the SearchRequest objects that return the top 3 records with the 0.7 similarity threshold. The newly created SearchRequest object must be passed as an argument in the QuestionAnswerAdvisor constructor.

@RequestMapping("/v1-1/most-growth-trend")
String getBestTrendV11() {
   PromptTemplate pt = new PromptTemplate("""
            Which share is the most % growth?
            The 0 element in the prices table is the latest price, while the last element is the oldest price.
            Return a full name of company instead of a market shortcut. 
            """);

   SearchRequest searchRequest = SearchRequest.builder()
            .query("""
            Find the most growth trends.
            The 0 element in the prices table is the latest price, while the last element is the oldest price.
            """)
            .topK(3)
            .similarityThreshold(0.7)
            .build();

   return this.chatClient.prompt(pt.create())
            .advisors(new QuestionAnswerAdvisor(store, searchRequest))
            .call()
            .content();
}

Java

Now, the results are more accurate. The model also returns the full name of companies instead of a market shortcut.

Use RetrievalAugmentationAdvisor

Instead of the QuestionAnswerAdvisor class, we can also use the experimental RetrievalAugmentationAdvisor. It provides an out-of-the-box implementation for the most common RAG flows, based on a modular architecture. There are several built-in modules we can use with RetrievalAugmentationAdvisor. We will include the RewriteQueryTransformer module that uses LLM to rewrite a user query to provide better results when querying a target vector database. It requires the query and target placeholders to be present in the prompt template. Thanks to that transformer we can retrieve the optimal set of records for a percentage growth calculation.

@RestController
@RequestMapping("/stocks")
public class StockController {

    private final ObjectMapper mapper = new ObjectMapper();
    private final static Logger LOG = LoggerFactory.getLogger(StockController.class);
    private final ChatClient chatClient;
    private final RewriteQueryTransformer.Builder rqtBuilder;
    private final RestTemplate restTemplate;
    private final VectorStore store;

    @Value("${STOCK_API_KEY}")
    private String apiKey;

    public StockController(ChatClient.Builder chatClientBuilder,
                           VectorStore store,
                           RestTemplate restTemplate) {
        this.chatClient = chatClientBuilder
                .defaultAdvisors(new SimpleLoggerAdvisor())
                .build();
        this.rqtBuilder = RewriteQueryTransformer.builder()
                .chatClientBuilder(chatClientBuilder);
        this.store = store;
        this.restTemplate = restTemplate;
    }
    
    // other methods ...
    
    @RequestMapping("/v2/most-growth-trend")
    String getBestTrendV2() {
        PromptTemplate pt = new PromptTemplate("""
                {query}.
                Which {target} is the most % growth?
                The 0 element in the prices table is the latest price, while the last element is the oldest price.
                """);

        Prompt p = pt.create(Map.of("query", "Find the most growth trends", "target", "share"));

        Advisor retrievalAugmentationAdvisor = RetrievalAugmentationAdvisor.builder()
                .documentRetriever(VectorStoreDocumentRetriever.builder()
                        .similarityThreshold(0.7)
                        .topK(3)
                        .vectorStore(store)
                        .build())
                .queryTransformers(rqtBuilder.promptTemplate(pt).build())
                .build();

        return this.chatClient.prompt(p)
                .advisors(retrievalAugmentationAdvisor)
                .call()
                .content();
    }

}

Java

Once again, we can verify the AI model response by calling the GET /stocks/v2/most-growth-trend endpoint. The response is similar to those generated by the GET /stocks/v1-1/most-growth-trend endpoint.

Run the Application

Only to remind you. Before running the application, we should provide OpenAI and TwelveData API tokens.

$ export OPEN_AI_TOKEN=<YOUR_OPEN_AI_TOKEN>
$ export STOCK_API_KEY=<YOUR_STOCK_API_KEY>
$ mvn spring-boot:run

ShellSession

Final Thoughts

In this article you learned how to use an important AI technique called Retrieval Augmented Generation (RAG) with Spring AI. Spring AI simplifies RAG by providing built-in support for vector stores and easy data incorporation into the chat model through the Advisor API. However, since RetrievalAugmentationAdvisor is an experimental feature we cannot rule out some changes in future releases.

The post Using RAG and Vector Store with Spring AI appeared first on Piotr's TechBlog.

Getting Started with Spring AI Function Calling

piotr.minkowski — Thu, 30 Jan 2025 16:40:30 +0000

This article will show you how to use Spring AI support for Java function calling with the OpenAI chat model. The Spring AI function calling feature lets us connect the LLM capabilities with external APIs or systems. OpenAI’s models are trained to know when to call a function. We will work on implementing a Java function that takes the call arguments from the AI model and sends the result back. Our main goal is to connect to the third-party APIs to provide these results. Then the AI model uses the provided results to complete the conversation.

This article is the second part of a series describing some of the AI project’s most notable features. Before reading on, I recommend checking out my introduction to Spring AI, which is available here. The first part describes such features as prompts, structured output, chat memory, and built-in advisors. Additionally, it demonstrates the capability to switch between the most popular AI chat model API providers.

Source Code

If you would like to try it by yourself, you may always take a look at my source code. To do that, you must clone my sample GitHub repository. Then you should only follow my instructions.

Problem

Whenever I create a new article or example related to AI, I like to define the problem I’m trying to solve. The problem we will solve in this exercise is visible in the following prompt template. I’m asking the AI model about the value of my stock wallet. However, the model doesn’t know how many shares I have, and can’t get the latest stock prices. Since the OpenAI model is trained on a static dataset it does not have direct access to the online services or APIs.

So, in this case, we should provide private data with our wallet structure and “connect” our model with a public API that returns live stock market data. Let’s see how we tackle this challenge with Spring AI function calling.

Create Spring Functions

WalletService Supplier

We will begin with a source code. Then we will visualize the whole process on the diagram. Spring AI supports different ways of registering a function to call. You can read more about it in the Spring AI docs here. We will choose the way based on plain Java functions defined as beans in the Spring application context. This approach allows us to use interfaces from the java.util.function package such as Function, Supplier, or Consumer. Our first function takes no input, so it implements the Supplier interface. It just returns a list of shares that we currently have. It obtains such information from the database through the Spring Data WalletRepository bean.

public class WalletService implements Supplier<WalletResponse> {

    private WalletRepository walletRepository;

    public WalletService(WalletRepository walletRepository) {
        this.walletRepository = walletRepository;
    }

    @Override
    public WalletResponse get() {
        return new WalletResponse((List<Share>) walletRepository.findAll());
    }
}

Java

Information about the number of owned shares is stored in the share table. Each row contains a company name and the quantity of that company shares.

@Entity
public class Share {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;
    private String company;
    private int quantity;
    
    // ... GETTERS/SETTERS
}

Java

insert into share(id, company, quantity) values (1, 'AAPL', 100);
insert into share(id, company, quantity) values (2, 'AMZN', 300);
insert into share(id, company, quantity) values (3, 'META', 300);
insert into share(id, company, quantity) values (4, 'MSFT', 400);
insert into share(id, company, quantity) values (5, 'NVDA', 200);

SQL

StockService Function

Our second function takes an input argument and returns an output. Therefore, it implements the Function interface. It must interact with live stock market API to get the current price of a given company share. We use the api.twelvedata.com service to access stock exchange quotes. The function returns a current price wrapped by the StockResponse object.

{}", stockRequest.company(), latestData.getClose()); return new StockResponse(Float.parseFloat(latestData.getClose())); } }" style="color:#d8dee9ff;display:none" aria-label="Copy" class="code-block-pro-copy-button">

public class StockService implements Function<StockRequest, StockResponse> {

    private static final Logger LOG = LoggerFactory.getLogger(StockService.class);

    @Autowired
    RestTemplate restTemplate;
    @Value("${STOCK_API_KEY}")
    String apiKey;

    @Override
    public StockResponse apply(StockRequest stockRequest) {
        StockData data = restTemplate.getForObject("https://api.twelvedata.com/time_series?symbol={0}&interval=1min&outputsize=1&apikey={1}",
                StockData.class,
                stockRequest.company(),
                apiKey);
        DailyStockData latestData = data.getValues().get(0);
        LOG.info("Get stock prices: {} -> {}", stockRequest.company(), latestData.getClose());
        return new StockResponse(Float.parseFloat(latestData.getClose()));
    }
}

Java

Here are the Java records for request and response objects.

public record StockRequest(String company) { }

public record StockResponse(Float price) { }

Java

To summarize, the first function accesses the database to get owned shares quantity, while the second function communicates public API to get the current price of a company share.

Spring AI Function Calling Flow

Architecture

Here’s the diagram that visualizes the flow of our application. The Spring AI Prompt object must contain references to our function beans. This allows the OpenAI model to recognize when a function should be called. However, the model does not call the function directly but only generates JSON used to call the function on the application side. Each function must provide a name, description, and signature (as JSON schema) to let the model know what arguments it expects. We have two functions. The StockService function returns a list of owned company shares, while the second function takes a single company name as the argument. This is where the magic happens. The chat model should call the StockService function for each object in the list returned by the WalletService function. The final response combines results received from both our functions.

Implementation

To implement the flow visualized above we must register our functions as Spring beans. The method name determines the name of the bean in the Spring context. Each bean declaration should also contain a description, which helps the model to understand when to call the function. The WalletResponse function is registered under the numberOfShares name, while the StockService function under the latestStockPrices name. The WalletService doesn’t take any input arguments, but injects the WalletRepository bean to interact with the database.

numberOfShares(WalletRepository walletRepository) { return new WalletService(walletRepository); } @Bean @Description("Latest stock prices") public Function latestStockPrices() { return new StockService(); }" style="color:#d8dee9ff;display:none" aria-label="Copy" class="code-block-pro-copy-button">

@Bean
@Description("Number of shares for each company in my portfolio")
public Supplier<WalletResponse> numberOfShares(WalletRepository walletRepository) {
    return new WalletService(walletRepository);
}

@Bean
@Description("Latest stock prices")
public Function<StockRequest, StockResponse> latestStockPrices() {
    return new StockService();
}

Java

Finally, let’s take a look at the REST controller implementation. It exposes the GET /wallet endpoint that communicates with the OpenAI chat model. When creating a prompt we should register both our functions using the OpenAiChatOptions class and its function method. The reference contains only the function @Bean name.

@RestController
@RequestMapping("/wallet")
public class WalletController {

    private final ChatClient chatClient;

    public WalletController(ChatClient.Builder chatClientBuilder) {
        this.chatClient = chatClientBuilder
                .defaultAdvisors(new SimpleLoggerAdvisor())
                .build();
    }

    @GetMapping
    String calculateWalletValue() {
        PromptTemplate pt = new PromptTemplate("""
        What’s the current value in dollars of my wallet based on the latest stock daily prices ?
        """);

        return this.chatClient.prompt(pt.create(
                OpenAiChatOptions.builder()
                        .function("numberOfShares")
                        .function("latestStockPrices")
                        .build()))
                .call()
                .content();
    }
}

Java

Run and Test the Spring AI Application

Before running the app, we must export OpenAI and Twelvedata API keys as environment variables.

export STOCK_API_KEY=
export OPEN_AI_TOKEN=

ShellSession

We must create an account on the Twelvedata platform to obtain its API key. The Twelvedata platform provides API to get the latest stock prices.

Of course, we must have an API key on the OpenAI platform. Once you create an account there you should go to that page. Then choose the name for your token and copy it after creation.

Then, we run our Spring AI app using the following Maven command:

mvn spring-boot:run

ShellSession

After running the app, we can call the /wallet endpoint to calculate our stock portfolio.

curl http://localhost:8080/wallet

ShellSession

Here’s the response returned by OpenAI for the provided test data and the current stock market prices.

Then, let’s switch to the application logs. We can see that the StockService function was called five times – once for every company in the wallet. After we added the SimpleLoggerAdvisor advisor to the and set the property logging.level.org.springframework.ai to DEBUG, we can observe detailed logs with requests and responses from the OpenAI chat model.

Final Thoughts

In this article, we analyzed the Spring AI integration with function support in AI models. OpenAI’s function calling is a powerful feature that enhances how AI models interact with external tools, APIs, and structured data. It makes AI more interactive and practical for real-world applications. Spring AI provides a flexible way to register and invoke such functions. However, it still requires attention from developers, who need to define clear function schemas and handle edge cases.

The post Getting Started with Spring AI Function Calling appeared first on Piotr's TechBlog.

Getting Started with Spring AI and Chat Model

piotr.minkowski — Tue, 28 Jan 2025 10:02:24 +0000

This article will teach you how to use the Spring AI project to build applications based on different chat models. The Spring AI Chat Model is a simple and portable interface that allows us to interact with these models. Our sample Spring Boot application will switch between three popular chat models provided by OpenAI, Mistral AI, and Ollama. This article is the first in a series explaining AI concepts with Spring Boot. Look for more on my blog in this area soon.

If you are interested in Spring Boot, read my article about tips, tricks, and techniques for this framework here.

Source Code

If you would like to try it by yourself, you may always take a look at my source code. To do that, you must clone my sample GitHub repository. Then you should only follow my instructions.

Problem

Whenever I create a new article or example related to AI, I like to define the problem I’m trying to solve. The problem this example solves is very trivial. I publish a lot of small demo apps to explain technology concepts. These apps usually need data to show a demo output. Usually, I add demo data by myself or use a library like Datafaker to do it for me. This time, we can leverage AI Chat Models API for that. Let’s begin!

Dependencies

The Spring AI project is still under active development. Currently, we are waiting for the 1.0 GA release. Until then, we will switch to the milestone releases of the project. The current milestone is 1.0.0-M5. So let’s add the Spring Milestones repository to our Maven pom.xml file.

    
        
            central
            Central
            https://repo1.maven.org/maven2/
        
        
            spring-milestones
            Spring Milestones
            https://repo.spring.io/milestone
            
                false

XML

Then we should include the Maven BOM with a specified version of the Spring AI project.

    
        21
        1.0.0-M5
    
    
    
        
            
                org.springframework.ai
                spring-ai-bom
                ${spring-ai.version}
                pom
                import

XML

Since our sample application exposes some REST endpoints, we should include the Spring Boot Web Starter. We can include the Spring Boot Test Starter to create some JUnit tests. The Spring AI modules are included in the Maven profiles section. There are three different profiles for each chat model provider. By default, our application uses Open AI, and thus it activates the open-ai profile, which includes the spring-ai-openai-spring-boot-starter library. We should activate the mistral-ai profile to switch to Mistral AI. The third option is the ollama-ai profile including the spring-ai-ollama-spring-boot-starter dependency. Here’s a full list of dependencies. That’ll make it a breeze to switch between different chat model AI providers — we’ll only need to set the profile parameter in the Maven running command.

    
        
            org.springframework.boot
            spring-boot-starter-web
        
        
            org.springframework.boot
            spring-boot-starter-test
            test
        
    

    
        
            open-ai
            
                true
            
            
                
                    org.springframework.ai
                    spring-ai-openai-spring-boot-starter
                
            
        
        
            mistral-ai
            
                
                    org.springframework.ai
                    spring-ai-mistral-ai-spring-boot-starter
                
            
        
        
            ollama-ai
            
                
                    org.springframework.ai
                    spring-ai-ollama-spring-boot-starter

XML

Connect to AI Chat Model Providers

Configure OpenAI

Before we proceed with a source code, we should prepare chat model AI tools. Let’s begin with OpenAI. We must have an account on the OpenAI Platform portal. After signing in we should access the API Keys page to generate an API token. Once we set its name, we can click the “Create secret key” button. Don’t forget to copy the key after creation.

The value of the generated token should be saved as an environment variable. Our sample Spring Boot application read its value from the OPEN_AI_TOKEN variable.

export OPEN_AI_TOKEN=

ShellSession

Configure Mistral AI

Then, we should repeat a very similar action for Mistral AI. We must have an account on the Mistral AI Platform portal. After signing in we should access the API Keys page to generate an API token. Both the name and expiration date fields are optional. Once we generate a token by clicking the “Create key” button, we should copy it.

The value of the generated token should be saved as an environment variable. Our sample Spring Boot application read its value for Mistral AI from the MISTRAL_AI_TOKEN variable.

export MISTRAL_AI_TOKEN=

ShellSession

Run and Configure Ollama

Opposite to OpenAI or Mistral AI, Ollama is built to allow to run large language models (LLMs) directly on our workstations. This means we don’t have any connection to the remote API to access it. First, we must download the Ollama binary dedicated to our OS from the following page. After installation, we can interact with it using the ollama CLI. First, we should choose the model to run. The full list of available models can be found here. By default, Spring AI expects the mistral model for the Ollama. Let’s choose llama3.2.

ollama run llama3.2

ShellSession

After running Ollama locally we can interact with it using the CLI terminal.

Configure Spring Boot Properties

Ollama exposes port over localhost and does not require an API token. Fortunately, all necessary URLs for our APIs come with the Spring AI auto-configuration. After choosing the llama3.2 model, we should provide the change in Spring Boot application properties respectively. We can also set the gpt-4o-mini model for OpenAI to decrease API costs.

spring.ai.openai.api-key = ${OPEN_AI_TOKEN}
spring.ai.openai.chat.options.model = gpt-4o-mini
spring.ai.mistralai.api-key = ${MISTRAL_AI_TOKEN}
spring.ai.ollama.chat.options.model = llama3.2

Plaintext

Spring AI Chat Model API

Prompting and Structured Output

Here is our model class. It contains the id field and several other fields that best describe each person.

public class Person {

    private Integer id;
    private String firstName;
    private String lastName;
    private int age;
    private Gender gender;
    private String nationality;
    
    //... GETTERS/SETTERS
}

public enum Gender {
    MALE, FEMALE;
}

Java

The @RestController class injects auto-configured ChatClient.Builder to create an instance of ChatClient. PersonController implements a method for returning a list of persons from the GET /persons endpoint. The main goal is to generate a list of 10 objects with the fields defined in the Person class. The id field should be auto-incremented. The PromptTemplate object defines a message, that will be sent to the chat model AI API. It doesn’t have to specify the exact fields that should be returned. This part is handled automatically by the Spring AI library after we invoke the entity() method on the ChatClient instance. The ParameterizedTypeReference object inside the entity method tells Spring AI to generate a list of objects.

findAll() { PromptTemplate pt = new PromptTemplate(""" Return a current list of 10 persons if exists or generate a new list with random values. Each object should contain an auto-incremented id field. Do not include any explanations or additional text. """); return this.chatClient.prompt(pt.create()) .call() .entity(new ParameterizedTypeReference<>() {}); } } " style="color:#d8dee9ff;display:none" aria-label="Copy" class="code-block-pro-copy-button">

@RestController
@RequestMapping("/persons")
public class PersonController {

    private final ChatClient chatClient;

    public PersonController(ChatClient.Builder chatClientBuilder) {
        this.chatClient = chatClientBuilder.build();
    }

    @GetMapping
    List<Person> findAll() {
        PromptTemplate pt = new PromptTemplate("""
                Return a current list of 10 persons if exists or generate a new list with random values.
                Each object should contain an auto-incremented id field.
                Do not include any explanations or additional text.
                """);

        return this.chatClient.prompt(pt.create())
                .call()
                .entity(new ParameterizedTypeReference<>() {});
    }

}

Java

Assuming you exported the OpenAI token to the OPEN_AI_TOKEN environment variable, you can run the application using the following command:

mvn spring-boot:run

ShellSession

Then, let’s call the http://localhost:8080/persons endpoint. It returns a list of 10 people with different nationalities. It

Now, we can change the PromptTemplate content and add the word “famous” before persons. Just for fun.

The results are not surprising at all – “Elon Musk” enters the list However, the list will be slightly different the second time you call the same endpoint. According to our prompt, a chat client should “return a current list of 10 persons”. So, I expected to get the same list as before. In this case, the problem is that the chat client doesn’t remember a previous conversation.

Advisors and Chat Memory

Let’s try to change it. First, we should define the implementation of the ChatMemory interface. InMemoryChatMemory is good enough for our tests.

@SpringBootApplication
public class SpringAIShowcase {

    public static void main(String[] args) {
        SpringApplication.run(SpringAIShowcase.class, args);
    }

    @Bean
    InMemoryChatMemory chatMemory() {
        return new InMemoryChatMemory();
    }
}

Java

To enable conversation history for a chat client we should define an advisor. The Spring AI Advisors API lets us intercept, modify, and enhance AI-driven interactions handled by Spring applications. Spring AI offers API to create custom advisors, but we can also leverage several built-in advisors. It can be e.g. PromptChatMemoryAdvisor that enables chat memory and adds it to the prompt’s system text or SimpleLoggerAdvisor which enables request/response logging. Let’s take a look at the latest implementation of the PersonController class. I highlighted the added lines of code. Besides advisors, it contains a new GET /persons/{id} endpoint implementation. This endpoint takes a previously returned list of persons and seeks the object with a specified id. The PromptTemplate object specifies the id parameter filled with the value read from the context path.

findAll() { PromptTemplate pt = new PromptTemplate(""" Return a current list of 10 persons if exists or generate a new list with random values. Each object should contain an auto-incremented id field. Do not include any explanations or additional text. """); return this.chatClient.prompt(pt.create()) .call() .entity(new ParameterizedTypeReference<>() {}); } @GetMapping("/{id}") Person findById(@PathVariable String id) { PromptTemplate pt = new PromptTemplate(""" Find and return the object with id {id} in a current list of persons. """); Prompt p = pt.create(Map.of("id", id)); return this.chatClient.prompt(p) .call() .entity(Person.class); } }" style="color:#d8dee9ff;display:none" aria-label="Copy" class="code-block-pro-copy-button">

@RestController
@RequestMapping("/persons")
public class PersonController {

    private final ChatClient chatClient;

    public PersonController(ChatClient.Builder chatClientBuilder, 
                            ChatMemory chatMemory) {
        this.chatClient = chatClientBuilder
                .defaultAdvisors(
                        new PromptChatMemoryAdvisor(chatMemory),
                        new SimpleLoggerAdvisor())
                .build();
    }

    @GetMapping
    List<Person> findAll() {
        PromptTemplate pt = new PromptTemplate("""
                Return a current list of 10 persons if exists or generate a new list with random values.
                Each object should contain an auto-incremented id field.
                Do not include any explanations or additional text.
                """);

        return this.chatClient.prompt(pt.create())
                .call()
                .entity(new ParameterizedTypeReference<>() {});
    }

    @GetMapping("/{id}")
    Person findById(@PathVariable String id) {
        PromptTemplate pt = new PromptTemplate("""
                Find and return the object with id {id} in a current list of persons.
                """);
        Prompt p = pt.create(Map.of("id", id));
        return this.chatClient.prompt(p)
                .call()
                .entity(Person.class);
    }
}

Java

Now, let’s make a final test. After the application restarts, we can call the endpoint that generates a list of persons. Then, we will call the GET /persons/{id} endpoint to display only a single person by ID. Spring application reads the value from the list of persons stored in the chat memory. Finally, we can repeat the call to the GET /persons endpoint to verify if it returns the same list.

Different Chat AI Models

Assuming you exported the Mistral AI token to the MISTRAL_AI_TOKEN environment variable, you can run the application using the following command. It activates the mistral-ai Maven profile and includes the starter with the Mistral AI support.

mvn spring-boot:run -Pmistral-ai

ShellSession

It returns responses similar to OpenAI’s, but some small differences exist. It always returns 0 in the age field and a 3-letter shortcut as a country name.

Let’s tweak our template to get Mistrai AI to generate an accurate age number. Here’s the fixed prompt template:

Now, it looks quite better. Even so, the names don’t match up with the countries they’re from, so there’s room for improvement.

The last test is for Ollama. Let’s run our application once again. This time we should activate the ollama-ai Maven profile.

mvn spring-boot:run -Pollama-ai

ShellSession

Then, we can repeat the same requests to check out the responses from Ollama AI. You can check out the responses by yourself.

$ curl http://localhost:8080/persons
$ curl http://localhost:8080/persons/2

ShellSession

Final Thoughts

This example doesn’t do anything unusual but only shows some basic features offered by Spring AI Chat Models API. We quickly reviewed features like prompts, structured output, chat memory, and built-in advisors. We also switched between some popular AI Chat Models API providers. You can expect more articles in this area soon. If you want to continue with the next part of the AI series on my blog, go here.

The post Getting Started with Spring AI and Chat Model appeared first on Piotr's TechBlog.