AI Archives - Piotr's TechBlog
https://piotrminkowski.com/category/ai/
Java, Spring, Kotlin, microservices, Kubernetes, containers

Create Apps with Claude Code on Ollama
https://piotrminkowski.com/2026/02/17/create-apps-with-claude-code-on-ollama/
Tue, 17 Feb 2026 16:25:18 +0000

The post Create Apps with Claude Code on Ollama appeared first on Piotr's TechBlog.
This article explains how to run Claude Code on Ollama and use local or cloud models served by Ollama to create Java apps. Read it if you are experimenting with AI code generation and currently use paid APIs for this purpose. Ollama recently added built-in integration with developer tools such as Codex and Claude Code, which is a really useful feature. Using the integration with Claude Code and several models running both locally and in the cloud as an example, you will see how it works.

You can find other articles about AI and Java on my blog. For example, if you are interested in how to use Ollama to serve models for Spring AI applications, you can read the following article.

Source Code

Feel free to use my source code if you’d like to try it out yourself. To do that, clone my sample GitHub repository and follow my instructions. This repository contains several branches, each with an application generated from the same prompt using different models. Currently, the branch with the fewest comments in the code review has been merged into master. This is the version of the code generated using the glm-5 model. However, this may change in the future, and the master branch may be modified. Therefore, it is best to simply refer to the individual branches or pull requests shown below.

Below is the current list of branches. The dev branch contains the initial version of the repository with the CLAUDE.md file, which specifies the basic requirements for the generated code.

$ git branch
    dev
    glm-5
    gpt-oss
  * master
    minimax
    qwen3-coder
ShellSession

Here are instructions for AI from the CLAUDE.md file. They include a description of the technologies I plan to use in my application and a few practices I intend to apply. For example, I don’t want to use Lombok, a popular Java library that automates the generation of code parts such as getters, setters, and constructors. It seems that in the age of AI, this approach doesn’t make sense, but for some reason, AI models really like this library 🙂 Also, each time I make a code change, I want the LLM model to increment the version number and update the README.md file, etc.

# Project Instructions

- Always use the latest versions of dependencies.
- Always write Java code as the Spring Boot application.
- Always use Maven for dependency management.
- Always create test cases for the generated code both positive and negative.
- Always generate the CircleCI pipeline in the .circleci directory to verify the code.
- Minimize the amount of code generated.
- The Maven artifact name must be the same as the parent directory name.
- Use semantic versioning for the Maven project. Each time you generate a new version, bump the PATCH section of the version number.
- Use `pl.piomin.services` as the group ID for the Maven project and base Java package.
- Do not use the Lombok library.
- Generate the Docker Compose file to run all components used by the application.
- Update README.md each time you generate a new version.
Markdown
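
To make the versioning rule concrete, here is a minimal sketch of what "bump the PATCH section" means for a MAJOR.MINOR.PATCH version string. The SemVer class and the bumpPatch method are hypothetical names used only for illustration. They are not part of the sample repository, and the model is free to perform this step however it likes.

```java
// Hypothetical helper, shown only to illustrate the PATCH bump rule from CLAUDE.md.
final class SemVer {

    // Returns the same version with its PATCH number increased by one.
    static String bumpPatch(String version) {
        String[] parts = version.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected MAJOR.MINOR.PATCH but got: " + version);
        }
        int patch = Integer.parseInt(parts[2]) + 1;
        return parts[0] + "." + parts[1] + "." + patch;
    }

    public static void main(String[] args) {
        System.out.println(bumpPatch("1.0.3")); // prints 1.0.4
    }
}
```

Note that the PATCH number is treated as an integer, so 0.0.9 becomes 0.0.10 rather than 0.1.0.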

Run Claude on Ollama

First, install Ollama on your computer. You can download the installer for your OS here. If you have used Ollama before, please update to the latest version.

$ ollama --version
  ollama version is 0.16.1
ShellSession

Next, install Claude Code.

curl -fsSL https://claude.ai/install.sh | bash
ShellSession

Before you start, it is worth increasing the maximum context window value allowed by Ollama. By default, it is set to 4k, and on the Ollama website itself, you will find a recommendation of 64k for Claude Code. I set the maximum value to 256k for testing different models.

For example, the gpt-oss model supports a 128k context window size.

Let’s pull and run the gpt-oss model with Ollama:

ollama run gpt-oss
ShellSession

After the model is downloaded and launched, you can verify its parameters with the ollama ps command. The output should report 100% GPU and a context window size of about 131k.
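
The sizes mentioned above are easier to compare as absolute token counts. The short sketch below assumes that "k" stands for 1024 tokens, which is consistent with a 128k window showing up as roughly 131k. The ContextWindow class is a hypothetical name used only for this calculation.

```java
// Hypothetical helper that converts context sizes given in "k" to token counts.
final class ContextWindow {

    // Assumes 1k = 1024 tokens.
    static int tokens(int k) {
        return k * 1024;
    }

    public static void main(String[] args) {
        System.out.println(tokens(4));   // prints 4096 (the default mentioned above)
        System.out.println(tokens(64));  // prints 65536 (the recommendation for Claude Code)
        System.out.println(tokens(128)); // prints 131072 (gpt-oss)
        System.out.println(tokens(256)); // prints 262144 (the maximum used for testing)
    }
}
```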

Ensure you are in the root repository directory, then run Claude Code with the command ollama launch claude. Next, choose the gpt-oss model visible in the list under “More”.

ollama-claude-code-gpt-oss

That’s it! Finally, we can start playing with AI.

ollama-claude-code-run

Generate a Java App with Claude Code

My application will be very simple. I just need something to quickly test the solution. Of course, all guidelines defined in the CLAUDE.md file should be followed. So, here is my prompt. Nothing more, nothing less 🙂

Generate an application that exposes REST API and connects to a PostgreSQL database.
The application should have a Person entity with id, and typical fields related to each person.
All REST endpoints should be protected with JWT and OAuth2.
The codebase should use Skaffold to deploy on Kubernetes.
Plaintext

After a few minutes, I have the entire code generated. Below is a summary from the AI of what has been done. If you want to check it out for yourself, take a look at this branch in my repository.

ollama-claude-code-generated

For the sake of formality, let’s take a look at the generated code. There is nothing spectacular here, because it is just a regular Spring Boot application that exposes a few REST endpoints for CRUD operations. However, it doesn’t look bad. Here’s the Spring Boot @Service implementation that uses PersonRepository to interact with the database.

@Service
public class PersonService {
    private final PersonRepository repository;

    public PersonService(PersonRepository repository) {
        this.repository = repository;
    }

    public List<Person> findAll() {
        return repository.findAll();
    }

    public Optional<Person> findById(Long id) {
        return repository.findById(id);
    }

    @Transactional
    public Person create(Person person) {
        return repository.save(person);
    }

    @Transactional
    public Optional<Person> update(Long id, Person person) {
        return repository.findById(id).map(existing -> {
            existing.setFirstName(person.getFirstName());
            existing.setLastName(person.getLastName());
            existing.setEmail(person.getEmail());
            existing.setAge(person.getAge());
            return repository.save(existing);
        });
    }

    @Transactional
    public void delete(Long id) {
        repository.deleteById(id);
    }
}
Java
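
The update method above relies on Optional.map, so the field copy runs only when the ID exists, and a missing ID simply yields an empty Optional. The sketch below reproduces that pattern without Spring, using a HashMap as a hypothetical stand-in for PersonRepository and a Person class reduced to a single field. All names in it are illustrative assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Framework-free illustration of the Optional.map update pattern.
final class UpdateSketch {

    // Reduced stand-in for the Person entity.
    static final class Person {
        String firstName;

        Person(String firstName) {
            this.firstName = firstName;
        }
    }

    // In-memory replacement for the repository.
    private final Map<Long, Person> store = new HashMap<>();

    void put(Long id, Person person) {
        store.put(id, person);
    }

    Optional<Person> findById(Long id) {
        return Optional.ofNullable(store.get(id));
    }

    // Copies the changes onto the stored instance only if the ID is known.
    Optional<Person> update(Long id, Person changes) {
        return findById(id).map(existing -> {
            existing.firstName = changes.firstName;
            return existing;
        });
    }

    public static void main(String[] args) {
        UpdateSketch sketch = new UpdateSketch();
        sketch.put(1L, new Person("John"));
        System.out.println(sketch.update(1L, new Person("Jane")).isPresent()); // prints true
        System.out.println(sketch.update(2L, new Person("Jane")).isPresent()); // prints false
    }
}
```

The empty result for an unknown ID is what lets a caller translate the outcome into a "not found" response instead of throwing an exception.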

Here’s the generated @RestController with the REST endpoint implementations:

@RestController
@RequestMapping("/api/people")
public class PersonController {
    private final PersonService service;

    public PersonController(PersonService service) {
        this.service = service;
    }

    @GetMapping
    public List<Person> getAll() {
        return service.findAll();
    }

    @GetMapping("/{id}")
    public ResponseEntity<Person> getById(@PathVariable Long id) {
        Optional<Person> person = service.findById(id);
        return person.map(ResponseEntity::ok).orElseGet(() -> ResponseEntity.notFound().build());
    }

    @PostMapping
    public ResponseEntity<Person> create(@RequestBody Person person) {
        Person saved = service.create(person);
        return ResponseEntity.status(201).body(saved);
    }

    @PutMapping("/{id}")
    public ResponseEntity<Person> update(@PathVariable Long id, @RequestBody Person person) {
        Optional<Person> updated = service.update(id, person);
        return updated.map(ResponseEntity::ok).orElseGet(() -> ResponseEntity.notFound().build());
    }

    @DeleteMapping("/{id}")
    public ResponseEntity<Void> delete(@PathVariable Long id) {
        service.delete(id);
        return ResponseEntity.noContent().build();
    }
}
Java
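
Since the prompt asks for JWT protection, a client would need to send a bearer token with each call. The sketch below only builds such a request with the JDK HTTP client and does not send it. The base URL, the token value, and the PeopleRequest class are assumptions made for illustration, and the security configuration actually generated by the model is not shown in this article.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical client-side helper that prepares an authenticated GET /api/people request.
final class PeopleRequest {

    // Builds the request; the caller decides when and how to send it.
    static HttpRequest listPeople(String baseUrl, String jwt) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/api/people"))
                .header("Authorization", "Bearer " + jwt)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Placeholder values, replace them with your own address and token.
        HttpRequest request = listPeople("http://localhost:8080", "<YOUR_JWT>");
        System.out.println(request.method() + " " + request.uri());
    }
}
```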

Below is a summary in a pull request with the generated code.

ollama-claude-code-pr

Using Ollama Cloud Models

Recently, Ollama has made it possible to run models not only locally, but also in the cloud. By default, all models tagged with cloud are run this way. Cloud models are automatically offloaded to Ollama’s cloud service while offering the same capabilities as local models. This is most useful for larger models that wouldn’t fit on a personal computer. For example, you can try the qwen3-coder model locally. Unfortunately, it didn’t perform very well on my laptop.

Instead, I can run the same or even a larger model in the cloud and automatically connect Claude Code to it using the following command:

ollama launch claude --model qwen3-coder:480b-cloud
ShellSession

Now you can repeat exactly the same exercise as before or take a look at my branch containing the code generated using this model.

You can also try some other cloud models like minimax-m2.5 or glm-5.

Conclusion

If you’re developing locally and don’t want to burn money on APIs, use Claude Code with Ollama, and e.g., the gpt-oss or glm-5 models. It’s a pretty powerful and free option. If you have a powerful personal computer, the locally launched model should be able to generate the code efficiently. Otherwise, you can use the option of launching the model in the cloud offered by Ollama free of charge up to a certain usage limit (it is difficult to say exactly what that limit is). The gpt-oss model worked really well on my laptop (MacBook Pro M3), and it took about 7-8 minutes to generate the application. You can also look for a model that suits you better.

Spring AI with External MCP Servers
https://piotrminkowski.com/2026/02/06/spring-ai-with-external-mcp-servers/
Fri, 06 Feb 2026 10:00:53 +0000

The post Spring AI with External MCP Servers appeared first on Piotr's TechBlog.
This article explains how to integrate Spring AI with external MCP servers that provide APIs for popular tools such as GitHub and SonarQube. Spring AI provides built-in support for MCP clients and servers. In this article, we will use only the Spring MCP client. If you are interested in more details on building MCP servers, please refer to the following post on my blog. MCP has recently become very popular, and you can easily find an MCP server implementation for almost any existing technology.

You can run MCP servers in many different ways. Ultimately, they are just ordinary applications whose task is to make a given tool available via an API compatible with the MCP protocol. The most popular AI IDE tools, such as Claude Code, Codex, and Cursor, make it easy to run any MCP server. I will take a slightly unusual approach and use the support provided with Docker Desktop, namely the MCP Toolkit.

My idea for today is to build a simple Spring AI application that communicates with MCP servers for GitHub, SonarQube, and CircleCI to retrieve information about my repositories and projects hosted on those platforms. The Docker MCP Toolkit provides a single gateway that distributes incoming requests among running MCP servers. Let’s see how it works in practice!

Source Code

Feel free to use my source code if you’d like to try it out yourself. To do that, clone my sample GitHub repository and follow my instructions. This repository contains several sample applications. The correct application for this article is in the spring-ai-mcp/external-mcp-sample-client directory.

Getting Started with Docker MCP Toolkit

First, start Docker Desktop. You can find more than 300 popular MCP servers to run in the “Catalog” tab. Next, search for the SonarQube, CircleCI, and GitHub Official servers (note that there are additional GitHub servers). To be honest, I encountered unexpected issues running the CircleCI server, so for now, I based the application on MCP communication with GitHub and SonarCloud.

spring-ai-mcp-docker-toolkit

Each MCP server usually requires configuration, such as your authorization token or service address. Therefore, before adding a server to Docker Toolkit, you must first configure it as described below. Only then should you click the “Add MCP server” button.

spring-ai-mcp-sonarqube-server

For the GitHub MCP server, in addition to entering the token itself, you must also authorize it via OAuth. Here, too, the MCP Toolkit provides graphical support. After entering the token, go to the “OAuth” tab to complete the process.

This is what your final result should look like before moving on to implementing the Spring Boot application. You have added two MCP servers, which together offer 65 tools.

To make both MCP servers available outside of Docker, you need to run the Docker MCP gateway. In the default stdio mode, the API is not exposed outside Docker. Therefore, you need to change the mode to streaming using the transport parameter, as shown below. The gateway is exposed on port 8811.

docker mcp gateway run --port 8811 --transport streaming
ShellSession

This is what it looks like after launch. Additionally, the Docker MCP gateway is secured by an API token. This will require appropriate settings on the MCP client side in the Spring AI application.

spring-ai-mcp-docker-gateway-start

Integrate Spring AI with External MCP Servers

Prepare the MCP Client with Spring AI

Let’s move on to implementing our sample application. We need to include the Spring AI MCP client and the library that communicates with the LLM model. For me, it’s OpenAI, but you can use many other options available through Spring AI’s integration with popular chat models.

<dependencies>
  <dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
  </dependency>
  <dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-mcp-client-webflux</artifactId>
  </dependency>
  <dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-openai</artifactId>
  </dependency>
</dependencies>
XML

Our MCP client must authenticate itself to the Docker MCP gateway using an API token. Therefore, we need to modify the Spring WebClient used by Spring AI to communicate with MCP servers. It is best to use the ExchangeFilterFunction interface to create an HTTP filter that adds the appropriate Authorization header with the bearer token to the outgoing request. The token will be injected from the application properties.

@Component
public class McpSyncClientExchangeFilterFunction implements ExchangeFilterFunction {

    @Value("${mcp.token}")
    private String token;

    @Override
    public Mono<ClientResponse> filter(ClientRequest request, 
                                       ExchangeFunction next) {

            var requestWithToken = ClientRequest.from(request)
                    .headers(headers -> headers.setBearerAuth(token))
                    .build();
            return next.exchange(requestWithToken);

    }

}
Java
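
The idea behind the filter can be shown without Spring: a filter wraps the next handler and adds a header to the request before passing it on. The sketch below is a simplified model of that idea. The Request class and the withBearer method are hypothetical and are not part of the Spring WebClient API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Simplified, framework-free model of an exchange filter that adds a bearer token.
final class BearerFilterSketch {

    // Minimal immutable request: a URL plus a header map.
    static final class Request {
        final String url;
        final Map<String, String> headers;

        Request(String url, Map<String, String> headers) {
            this.url = url;
            this.headers = headers;
        }
    }

    // Wraps the next handler so every outgoing request carries the Authorization header.
    static Function<Request, String> withBearer(String token, Function<Request, String> next) {
        return request -> {
            Map<String, String> headers = new HashMap<>(request.headers);
            headers.put("Authorization", "Bearer " + token);
            return next.apply(new Request(request.url, headers));
        };
    }

    public static void main(String[] args) {
        // The final handler just echoes the header it received.
        Function<Request, String> echo = r -> r.headers.get("Authorization");
        Function<Request, String> client = withBearer("secret", echo);
        System.out.println(client.apply(new Request("http://localhost:8811", Map.of())));
        // prints Bearer secret
    }
}
```

The original request is copied rather than modified, which mirrors how the real filter builds a new ClientRequest from the incoming one.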

Then, let’s set the previously implemented filter for the default WebClient builder.

@SpringBootApplication
public class ExternalMcpSampleClient {

    public static void main(String[] args) {
        SpringApplication.run(ExternalMcpSampleClient.class, args);
    }

    @Bean
    WebClient.Builder webClientBuilder(McpSyncClientExchangeFilterFunction filterFunction) {
        return WebClient.builder()
                .filter(filterFunction);
    }
}
Java

After that, we must configure the MCP gateway address and token in the application properties. To achieve that, we must use the spring.ai.mcp.client.streamable-http.connections property. The MCP gateway listens on port 8811. The token value will be read from the MCP_TOKEN environment variable.

spring.ai.mcp.client.streamable-http.connections:
  docker-mcp-gateway:
    url: http://localhost:8811

mcp.token: ${MCP_TOKEN}
YAML

Implement Application Logic with Spring AI and OpenAI Support

The concept behind the sample application is quite simple. It involves creating one @RestController per MCP server. Each one sends a simple prompt asking for the number of repositories or projects in my account on a given platform. Let’s start with SonarCloud. Each implementation uses the Spring AI ToolCallbackProvider bean to make the tools exposed by the available MCP servers usable by the LLM.

@RestController
@RequestMapping("/sonarcloud")
public class SonarCloudController {

    private final static Logger LOG = LoggerFactory
        .getLogger(SonarCloudController.class);
    private final ChatClient chatClient;

    public SonarCloudController(ChatClient.Builder chatClientBuilder,
                                ToolCallbackProvider tools) {
        this.chatClient = chatClientBuilder
                .defaultToolCallbacks(tools)
                .build();
    }

    @GetMapping("/count")
    String countRepositories() {
        PromptTemplate pt = new PromptTemplate("""
                How many projects in Sonarcloud do I have ?
                """);
        Prompt p = pt.create();
        return this.chatClient.prompt(p)
                .call()
                .content();
    }

}
Java

Below is a very similar implementation for GitHub MCP. This controller is exposed under the /github context path.

@RestController
@RequestMapping("/github")
public class GitHubController {

    private final static Logger LOG = LoggerFactory
        .getLogger(GitHubController.class);
    private final ChatClient chatClient;

    public GitHubController(ChatClient.Builder chatClientBuilder,
                            ToolCallbackProvider tools) {
        this.chatClient = chatClientBuilder
                .defaultToolCallbacks(tools)
                .build();
    }

    @GetMapping("/count")
    String countRepositories() {
        PromptTemplate pt = new PromptTemplate("""
                How many repositories in GitHub do I have ?
                """);
        Prompt p = pt.create();
        return this.chatClient.prompt(p)
                .call()
                .content();
    }

}
Java

Finally, there is the controller implementation for CircleCI MCP. It is available externally under the /circleci context path.

@RestController
@RequestMapping("/circleci")
public class CircleCIController {

    private final static Logger LOG = LoggerFactory
        .getLogger(CircleCIController.class);
    private final ChatClient chatClient;

    public CircleCIController(ChatClient.Builder chatClientBuilder,
                              ToolCallbackProvider tools) {
        this.chatClient = chatClientBuilder
                .defaultToolCallbacks(tools)
                .build();
    }

    @GetMapping("/count")
    String countRepositories() {
        PromptTemplate pt = new PromptTemplate("""
                How many projects in CircleCI do I have ?
                """);
        Prompt p = pt.create();
        return this.chatClient.prompt(p)
                .call()
                .content();
    }

}
Java

The last controller implementation is a bit more complex. First, I need to instruct the LLM model to generate project names in SonarQube and specify my GitHub username. This will not be part of the main prompt. Rather, it will be the system role, which guides the AI’s behavior and response style. Therefore, I’ll create the SystemPromptTemplate first. The user role prompt accepts an input parameter specifying the name of my GitHub repository. The response should combine data on the last commit in a given repository with the status of the most recent SonarQube analysis. In this case, the LLM will need to communicate with two MCP servers running with Docker MCP simultaneously.

@RestController
@RequestMapping("/global")
public class GlobalController {

    private final static Logger LOG = LoggerFactory
        .getLogger(GlobalController.class);
    private final ChatClient chatClient;

    public GlobalController(ChatClient.Builder chatClientBuilder,
                            ToolCallbackProvider tools) {
        this.chatClient = chatClientBuilder
                .defaultToolCallbacks(tools)
                .build();
    }

    @GetMapping("/status/{repo}")
    String repoStatus(@PathVariable String repo) {
        SystemPromptTemplate st = new SystemPromptTemplate("""
                My username in GitHub is piomin.
                Each my project key in SonarCloud contains the prefix with my organization name and _ char.
                """);
        var stMsg = st.createMessage();

        PromptTemplate pt = new PromptTemplate("""
                When was the last commit made in my GitHub repository {repo} ?
                What is the latest analyze status in SonarCloud for that repo ?
                """);
        var usMsg = pt.createMessage(Map.of("repo", repo));

        // Put the system message first so it precedes the user question.
        Prompt prompt = new Prompt(List.of(stMsg, usMsg));
        return this.chatClient.prompt(prompt)
                .call()
                .content();
    }
}
Java
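
The {repo} placeholder in the user prompt is filled in from the map passed to createMessage. The sketch below shows the substitution idea with plain string replacement. It is a simplified illustration under that assumption, not the template engine Spring AI uses internally, and the TemplateSketch class is a hypothetical name.

```java
import java.util.Map;

// Simplified illustration of placeholder substitution in a prompt template.
final class TemplateSketch {

    // Replaces every {name} placeholder with the matching value from the map.
    static String render(String template, Map<String, String> values) {
        String result = template;
        for (Map.Entry<String, String> entry : values.entrySet()) {
            result = result.replace("{" + entry.getKey() + "}", entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        String template = "When was the last commit made in my GitHub repository {repo} ?";
        System.out.println(render(template, Map.of("repo", "sample-spring-boot-kafka")));
        // prints When was the last commit made in my GitHub repository sample-spring-boot-kafka ?
    }
}
```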

Before running the app, we must set two required environment variables that contain the OpenAI and Docker MCP gateway tokens.

export MCP_TOKEN=by1culxc6sctmycxtyl9xh7499mb8pctbsdb3brha1hvmm4d8l
export SPRING_AI_OPENAI_API_KEY=<YOUR_OPEN_AI_TOKEN>
Plaintext
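
If either variable is missing, the application fails in a less obvious way later on, so a quick check before startup can save some time. The sketch below is one possible way to do it. The EnvCheck class and its missing method are hypothetical and are not part of the sample application.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical pre-flight check for the environment variables used in this article.
final class EnvCheck {

    // Returns the names of required variables that are absent or blank.
    static List<String> missing(Map<String, String> env, List<String> required) {
        List<String> result = new ArrayList<>();
        for (String name : required) {
            String value = env.get(name);
            if (value == null || value.isBlank()) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> required = List.of("MCP_TOKEN", "SPRING_AI_OPENAI_API_KEY");
        // Prints an empty list when both variables are set.
        System.out.println("Missing variables: " + missing(System.getenv(), required));
    }
}
```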

Finally, we can run our Spring Boot app with the following command.

mvn spring-boot:run
ShellSession

Firstly, I’m going to ask about the number of my GitHub repositories.

curl http://localhost:8080/github/count
ShellSession

Then, I can check the number of projects in my SonarCloud account.

curl http://localhost:8080/sonarcloud/count
ShellSession

Finally, I can choose a specific repository and verify the last commit and the current analysis status in SonarCloud.

curl http://localhost:8080/global/status/sample-spring-boot-kafka
ShellSession

Here’s the LLM answer for my sample-spring-boot-kafka repository. You can perform the same exercise for your repositories and projects.

Conclusion

Spring AI, combined with the MCP client, opens a powerful path toward building truly tool-aware AI applications. By using the Docker MCP Gateway, we can easily host and manage MCP servers such as GitHub or SonarQube consistently and reproducibly, without tightly coupling them to our application runtime. Docker provides a user-friendly interface for managing MCP servers, giving users access to everything through a single MCP gateway. This approach appears to have advantages, particularly during application development.

MCP with Quarkus LangChain4j
https://piotrminkowski.com/2025/11/24/mcp-with-quarkus-langchain4j/
Mon, 24 Nov 2025 07:45:15 +0000

The post MCP with Quarkus LangChain4j appeared first on Piotr's TechBlog.
This article shows how to use Quarkus LangChain4j support for MCP (Model Context Protocol) on both the server and client sides. You will learn how to serve tools and prompts on the server side and discover them in the Quarkus MCP client-side application. The Model Context Protocol is a standard for managing contextual interactions with AI models. It provides a standardized way to connect AI models to external data sources and tools. It can help with building complex workflows on top of LLMs.

This article is the second part of a series describing some of the Quarkus AI project’s most notable features.  Before reading this article, I recommend checking out two previous parts of the tutorial:

You can also compare Quarkus’ support for MCP with similar support on the Spring AI side. You can find the article I mentioned on my blog here.

Source Code

Feel free to use my source code if you’d like to try it out yourself. To do that, clone my sample GitHub repository and follow my instructions.

Architecture for the Quarkus MCP Scenario

Let’s start with a diagram of our application architecture. Two Quarkus applications act as MCP servers. They connect to the in-memory database and use Quarkus LangChain4j MCP Server support to expose @Tool methods to the MCP client-side app. The client-side app communicates with the OpenAI model. It includes the tools exposed by the server-side apps in the user query to the AI model. The person-mcp-server app provides @Tool methods for searching persons in the database table. The account-mcp-server is doing the same for the persons’ accounts.

quarkus-mcp-arch

Build an MCP Server with Quarkus

Both MCP server applications are similar. They connect to the H2 database via the Quarkus Panache ORM extension. Both provide MCP API via Server-Sent Events (SSE) transport. Here’s a list of required Maven dependencies:

<dependencies>
  <dependency>
    <groupId>io.quarkiverse.mcp</groupId>
    <artifactId>quarkus-mcp-server-sse</artifactId>
  </dependency>
  <dependency>
    <groupId>io.quarkus</groupId>
    <artifactId>quarkus-hibernate-orm-panache</artifactId>
  </dependency>
  <dependency>
    <groupId>io.quarkus</groupId>
    <artifactId>quarkus-jdbc-h2</artifactId>
  </dependency>
  <dependency>
    <groupId>io.quarkus</groupId>
    <artifactId>quarkus-junit5</artifactId>
    <scope>test</scope>
  </dependency>
</dependencies>
XML

Let’s start with the person-mcp-server application. Here’s the @Entity class for interacting with the person table. It uses Panache support to avoid the need for getter and setter declarations.

@Entity
public class Person extends PanacheEntityBase {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    public Long id;
    public String firstName;
    public String lastName;
    public int age;
    public String nationality;
    @Enumerated(EnumType.STRING)
    public Gender gender;

}
Java

The PersonRepository class contains a single method for searching persons by their nationality:

@ApplicationScoped
public class PersonRepository implements PanacheRepository<Person> {

    public List<Person> findByNationality(String nationality) {
        return find("nationality", nationality).list();
    }

}
Java

Next, prepare the “tools service” that searches for a single person by ID or a list of people of a given nationality in the database. Each method must be annotated with @Tool and include a description in the description field. Quarkus LangChain4j does not allow a Java List to be returned, so we need to wrap it using a dedicated Persons object.

@ApplicationScoped
public class PersonTools {

    PersonRepository personRepository;

    public PersonTools(PersonRepository personRepository) {
        this.personRepository = personRepository;
    }

    @Tool(description = "Find person by ID")
    public Person getPersonById(
            @ToolArg(description = "Person ID") Long id) {
        return personRepository.findById(id);
    }

    @Tool(description = "Find all persons by nationality")
    public Persons getPersonsByNationality(
            @ToolArg(description = "Nationality") String nationality) {
        return new Persons(personRepository.findByNationality(nationality));
    }
}
Java

Here’s our List<Person> wrapper:

public class Persons {

    private List<Person> persons;

    public Persons(List<Person> persons) {
        this.persons = persons;
    }

    public List<Person> getPersons() {
        return persons;
    }

    public void setPersons(List<Person> persons) {
        this.persons = persons;
    }
}
Java

The implementation of the account-mcp-server application is very similar. Here’s the @Entity class for interacting with the account table. It uses Panache support to avoid the need for getter and setter declarations.

@Entity
public class Account extends PanacheEntityBase {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    public Long id;
    public String number;
    public int balance;
    public Long personId;

}
Java

The AccountRepository class contains a single method for searching accounts by person ID:

@ApplicationScoped
public class AccountRepository implements PanacheRepository<Account> {

    public List<Account> findByPersonId(Long personId) {
        return find("personId", personId).list();
    }

}
Java

Once again, the list inside the “tools service” must be wrapped by the dedicated object. The single method annotated with @Tool returns a list of accounts assigned to a given person.

@ApplicationScoped
public class AccountTools {

    AccountRepository accountRepository;

    public AccountTools(AccountRepository accountRepository) {
        this.accountRepository = accountRepository;
    }

    @Tool(description = "Find all accounts by person ID")
    public Accounts getAccountsByPersonId(
            @ToolArg(description = "Person ID") Long personId) {
        return new Accounts(accountRepository.findByPersonId(personId));
    }

}
Java
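
The Accounts wrapper returned by the tool is not listed in this article. Assuming it mirrors the Persons class shown earlier, it could look like the sketch below. The Account class here is a plain stand-in for the entity, without the JPA and Panache parts, so that the example compiles on its own.

```java
import java.util.List;

// Plain stand-in for the Account entity, reduced to its public fields.
class Account {
    public Long id;
    public String number;
    public int balance;
    public Long personId;
}

// Assumed shape of the List<Account> wrapper, modeled on the Persons class.
class Accounts {

    private List<Account> accounts;

    public Accounts(List<Account> accounts) {
        this.accounts = accounts;
    }

    public List<Account> getAccounts() {
        return accounts;
    }

    public void setAccounts(List<Account> accounts) {
        this.accounts = accounts;
    }

    public static void main(String[] args) {
        Account account = new Account();
        account.number = "1234567890";
        Accounts wrapper = new Accounts(List.of(account));
        System.out.println(wrapper.getAccounts().size()); // prints 1
    }
}
```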

The person-mcp-server starts on port 8082, while the account-mcp-server listens on port 8081. To change the default HTTP port, use the quarkus.http.port property in your application.properties file.

Build an MCP Client with Quarkus

Our application interacts with the OpenAI chat model, so we must include the Quarkus LangChain4j OpenAI extension. In turn, to integrate the client-side application with MCP-compliant servers, we need to include the quarkus-langchain4j-mcp extension.

<dependencies>
  <dependency>
    <groupId>io.quarkus</groupId>
    <artifactId>quarkus-rest-jackson</artifactId>
  </dependency>
  <dependency>
    <groupId>io.quarkiverse.langchain4j</groupId>
    <artifactId>quarkus-langchain4j-openai</artifactId>
    <version>${quarkus.langchain4j.version}</version>
  </dependency>
  <dependency>
    <groupId>io.quarkiverse.langchain4j</groupId>
    <artifactId>quarkus-langchain4j-mcp</artifactId>
    <version>${quarkus.langchain4j.version}</version>
  </dependency>
</dependencies>
XML

The sample-client Quarkus app interacts with both the person-mcp-server app and the account-mcp-server app. Therefore, it defines two AI services. As with a standard AI application in Quarkus, those services must be annotated with @RegisterAiService. Then we define methods and prompt templates, also with annotations @UserMessage or @SystemMessage. If a given method is to use one of the MCP servers, it must be annotated with @McpToolBox. The name inside the annotation corresponds to the MCP server name set in the configuration properties. The PersonService AI service visible below uses the person-service MCP server.

@ApplicationScoped
@RegisterAiService
public interface PersonService {

    @SystemMessage("""
        You are a helpful assistant that generates realistic person data.
        Always respond with valid JSON format.
        """)
    @UserMessage("""
        Find persons with {nationality} nationality.
        Output **only valid JSON**, no explanations, no markdown, no ```json blocks.
        """)
    @McpToolBox("person-service")
    Persons findByNationality(String nationality);

    @SystemMessage("""
        You are a helpful assistant that generates realistic person data.
        Always respond with valid JSON format.
        """)
    @UserMessage("How many persons come from {nationality} ?")
    @McpToolBox("person-service")
    int countByNationality(String nationality);

}
Java

The AI service shown above corresponds to this configuration. You need to specify the MCP server’s name and address. As you remember, person-mcp-server listens on port 8082. The client application uses the name person-service here, and the standard MCP SSE endpoint in Quarkus is /mcp/sse. To explore the solution itself, it is also worth enabling logging of MCP requests and responses.

quarkus.langchain4j.mcp.person-service.transport-type = http
quarkus.langchain4j.mcp.person-service.url = http://localhost:8082/mcp/sse
quarkus.langchain4j.mcp.person-service.log-requests = true
quarkus.langchain4j.mcp.person-service.log-responses = true
Plaintext

Here is a similar implementation for the AccountService AI service. It interacts with the MCP server configured under the account-service name.

@ApplicationScoped
@RegisterAiService
public interface AccountService {

    @SystemMessage("""
        You are a helpful assistant that generates realistic data.
        Return a single number.
        """)
    @UserMessage("How many accounts has person with {personId} ID ?")
    @McpToolBox("account-service")
    int countByPersonId(int personId);

    @UserMessage("""
        How many accounts has person with {personId} ID ?
        Return person name, nationality and a total balance on his/her accounts.
        """)
    @McpToolBox("account-service")
    String balanceByPersonId(int personId);

}
Java

Here’s the corresponding configuration for that service. No surprises.

quarkus.langchain4j.mcp.account-service.transport-type = http
quarkus.langchain4j.mcp.account-service.url = http://localhost:8081/mcp/sse
quarkus.langchain4j.mcp.account-service.log-requests = true
quarkus.langchain4j.mcp.account-service.log-responses = true
Plaintext

Finally, we must provide some configuration to integrate our Quarkus application with the OpenAI chat model. It assumes that the OpenAI token is available as the OPEN_AI_TOKEN environment variable.

quarkus.langchain4j.chat-model.provider = openai
quarkus.langchain4j.log-requests = true
quarkus.langchain4j.log-responses = true
quarkus.langchain4j.openai.api-key = ${OPEN_AI_TOKEN}
quarkus.langchain4j.openai.timeout = 20s
Plaintext

We can test individual AI services by calling endpoints provided by the client-side application. There are two endpoints, POST /accounts/count-by-person-id/{personId} and POST /accounts/balance-by-person-id/{personId}, that use LLM prompts to calculate the number of accounts and the total balance of all accounts belonging to a given person.

@Path("/accounts")
public class AccountResource {

    private final AccountService accountService;

    public AccountResource(AccountService accountService) {
        this.accountService = accountService;
    }

    @POST
    @Path("/count-by-person-id/{personId}")
    public int countByPersonId(int personId) {
        return accountService.countByPersonId(personId);
    }

    @POST
    @Path("/balance-by-person-id/{personId}")
    public String balanceByPersonId(int personId) {
        return accountService.balanceByPersonId(personId);
    }

}
Java

MCP for Prompts

MCP servers can also provide other functionalities beyond just tools. Let’s go back to the person-mcp-server app for a moment. To share a prompt message, you can create a class that defines methods returning the PromptMessage object. Then, we must annotate such methods with @Prompt, and their arguments with @PromptArg.

@ApplicationScoped
public class PersonPrompts {

    final String findByNationalityPrompt = """
        Find persons with {nationality} nationality.
        Output **only valid JSON**, no explanations, no markdown, no ```json blocks.
        """;

    @Prompt(description = "Find by nationality.")
    PromptMessage findByNationalityPrompt(@PromptArg(description = "The nationality") String nationality) {
        return PromptMessage.withUserRole(new TextContent(findByNationalityPrompt));
    }

}
Java

Once we start the application, we can use Quarkus Dev UI to verify a list of provided tools and prompts.

Client-side integration with MCP prompts is a bit more complex than with tools. We must inject the McpClient to a resource controller to load a given prompt programmatically using its name.

@Path("/persons")
public class PersonResource {

    @McpClientName("person-service")
    McpClient mcpClient;
    
    // OTHER METHODS...
    
    @POST
    @Path("/nationality-with-prompt/{nationality}")
    public List<Person> findByNationalityWithPrompt(String nationality) {
        Persons p = personService.findByNationalityWithPrompt(loadPrompt(nationality), nationality);
        return p.getPersons();
    }
    
    private String loadPrompt(String nationality) {
        McpGetPromptResult prompt = mcpClient.getPrompt("findByNationalityPrompt", Map.of("nationality", nationality));
        return ((TextContent) prompt.messages().getFirst().content().toContent()).text();
    }
}
Java

In this case, the Quarkus AI service should not define the @UserMessage on the entire method, but just as the method argument. Then a prompt message is loaded from the MCP server and filled with the nationality parameter value before sending to the AI model.

@ApplicationScoped
@RegisterAiService
public interface PersonService {

    // OTHER METHODS...
    
    @SystemMessage("""
        You are a helpful assistant that generates realistic person data.
        Always respond with valid JSON format.
        """)
    @McpToolBox("person-service")
    Persons findByNationalityWithPrompt(@UserMessage String userMessage, String nationality);

}
Java
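
To make the substitution step concrete, here is a minimal plain-Java sketch of what filling the {nationality} placeholder amounts to. It is only an illustration: the PromptTemplateSketch class and its fill method are hypothetical names, and Quarkus LangChain4j performs its own templating internally, which may differ in details.

```java
import java.util.Map;

// Illustration only: not part of the sample application or of Quarkus LangChain4j.
class PromptTemplateSketch {

    // Hypothetical helper: replaces every {name} placeholder with its value.
    static String fill(String template, Map<String, String> variables) {
        String result = template;
        for (Map.Entry<String, String> variable : variables.entrySet()) {
            result = result.replace("{" + variable.getKey() + "}", variable.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // The template text matches the prompt shared by person-mcp-server.
        String template = "Find persons with {nationality} nationality.";
        System.out.println(fill(template, Map.of("nationality", "Denmark")));
    }
}
```

The prompt text stays on the MCP server, while the parameter value comes from the client-side call, so the same prompt can be reused by several clients.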

Testing MCP Tools with Quarkus

Quarkus provides a dedicated module for testing MCP tools. We can use it after including the following dependency in the Maven pom.xml:

<dependency>
  <groupId>io.quarkiverse.mcp</groupId>
  <artifactId>quarkus-mcp-server-test</artifactId>
  <scope>test</scope>
</dependency>
XML

The following test verifies the MCP tool methods provided by the person-mcp-server application. The McpAssured class allows us to use SSE, streamable, and WebSocket test clients. To create a new client for SSE, invoke the newConnectedSseClient() static method. After that, we can use one of several available variants of the toolsCall(...) method to verify the response returned by a given @Tool.

@QuarkusTest
public class PersonToolsTest {

    ObjectMapper mapper = new ObjectMapper();

    @Test
    public void testGetPersonsByNationality() {
        McpAssured.McpSseTestClient client = McpAssured.newConnectedSseClient();
        client.when()
                .toolsCall("getPersonsByNationality", Map.of("nationality", "Denmark"),
                        r -> {
                            try {
                                Persons p = mapper.readValue(r.content().getFirst().asText().text(), Persons.class);
                                assertFalse(p.getPersons().isEmpty());
                            } catch (JsonProcessingException e) {
                                throw new RuntimeException(e);
                            }
                        })
                .thenAssertResults();
    }

    @Test
    public void testGetPersonById() {
        McpAssured.McpSseTestClient client = McpAssured.newConnectedSseClient();
        client.when()
                .toolsCall("getPersonById", Map.of("id", 10),
                        r -> {
                            try {
                                Person p = mapper.readValue(r.content().getFirst().asText().text(), Person.class);
                                assertNotNull(p);
                                assertNotNull(p.id);
                            } catch (JsonProcessingException e) {
                                throw new RuntimeException(e);
                            }
                        })
                .thenAssertResults();
    }
}
Java

Running Quarkus Applications

Finally, let’s run all our Quarkus applications. Go to the mcp/account-mcp-server directory and run the application in development mode:

$ cd mcp/account-mcp-server
$ mvn quarkus:dev
ShellSession

Then do the same for the person-mcp-server application.

$ cd mcp/person-mcp-server
$ mvn quarkus:dev
ShellSession

Before running the last sample-client application, export the OpenAI API token as the OPEN_AI_TOKEN environment variable.

$ cd mcp/sample-client
$ export OPEN_AI_TOKEN=<YOUR_OPENAI_TOKEN>
$ mvn quarkus:dev
ShellSession

We can verify a list of tools or prompts exposed by each MCP server application by visiting its Quarkus Dev UI console. It provides a dedicated “MCP Server” tile.

quarkus-mcp-devui

Here’s a list of tools provided by the person-mcp-server app via Quarkus Dev UI.

Then, we can switch to the sample-client Dev UI console. We can verify and test all interactions with the MCP servers from our client-side app.

quarkus-mcp-client-ui

Once all the sample applications are running, we can test the MCP communication by calling the HTTP endpoints exposed by the sample-client app. Both person-mcp-server and account-mcp-server load some test data on startup using the import.sql file. Here are the test API calls for all the REST endpoints.

$ curl -X POST http://localhost:8080/persons/nationality/Denmark
$ curl -X POST http://localhost:8080/persons/count-by-nationality/Denmark
$ curl -X POST http://localhost:8080/persons/nationality-with-prompt/Denmark
$ curl -X POST http://localhost:8080/accounts/count-by-person-id/2
$ curl -X POST http://localhost:8080/accounts/balance-by-person-id/2
ShellSession

Conclusion

With Quarkus, creating applications that use MCP is not difficult. If you understand the idea of tool calling in AI, the MCP-based approach will be easy to follow. This article shows you how to connect your application to several MCP servers, implement tests that verify the elements an application shares through MCP, and inspect them in the Quarkus Dev UI.

The post MCP with Quarkus LangChain4j appeared first on Piotr's TechBlog.

]]>
https://piotrminkowski.com/2025/11/24/mcp-with-quarkus-langchain4j/feed/ 0 15845
AI Tool Calling with Quarkus LangChain4j https://piotrminkowski.com/2025/06/23/ai-tool-calling-with-quarkus-langchain4j/ https://piotrminkowski.com/2025/06/23/ai-tool-calling-with-quarkus-langchain4j/#comments Mon, 23 Jun 2025 05:19:14 +0000 https://piotrminkowski.com/?p=15757 This article will show you how to use Quarkus LangChain4j AI support with the most popular chat models for the “tool calling” feature. Tool calling (sometimes referred to as function calling) is a typical pattern in AI applications that enables a model to interact with APIs or tools, extending its capabilities. The most popular AI […]

The post AI Tool Calling with Quarkus LangChain4j appeared first on Piotr's TechBlog.

]]>
This article will show you how to use Quarkus LangChain4j AI support with the most popular chat models for the “tool calling” feature. Tool calling (sometimes referred to as function calling) is a typical pattern in AI applications that enables a model to interact with APIs or tools, extending its capabilities. The most popular AI models are trained to know when to call a function. The Quarkus LangChain4j extension offers built-in support for tool calling. In this article, you will learn how to define tool methods to get data from the third-party APIs and the internal database.

This article is the second part of a series describing some of the Quarkus AI project’s most notable features. Before reading on, I recommend checking out my introduction to Quarkus LangChain4j, which is available here. The first part describes such features as prompts, structured output, and chat memory. There is also a similar tutorial series about Spring AI. You can compare Quarkus support for tool calling described here with a similar Spring AI support described in the following post.

Source Code

Feel free to use my source code if you’d like to try it out yourself. To do that, you must clone my sample GitHub repository. Then you should only follow my instructions.

Tool Calling Motivation

For ease of comparison, this article will implement an identical scenario to an analogous application written in Spring AI. You can find a GitHub sample repository with the Spring AI app here. As you know, the “tool calling” feature helps us solve a common AI model challenge related to internal or live data sources. If we want to augment a model with such data, our applications must allow it to interact with a set of APIs or tools. In our case, the internal database (H2) contains information about the structure of our stock wallet. The sample Quarkus application asks an AI model about the total value of the wallet based on daily stock prices or the highest value for the last few days. The model must retrieve the structure of our stock wallet and the latest stock prices.

Use Tool Calling with Quarkus LangChain4j

Create ShareTools

Let’s begin with the ShareTools implementation, which is responsible for getting a list of the wallet’s shares from a database. It defines a single method annotated with @Tool. The most crucial element here is to provide a clear description of the method within the @Tool annotation. It allows the AI model to understand the function’s responsibilities. The method returns the number of shares for each company in our portfolio. It is retrieved from the database through the Quarkus Panache ORM repository.

@ApplicationScoped
public class ShareTools {

    private ShareRepository shareRepository;

    public ShareTools(ShareRepository shareRepository) {
        this.shareRepository = shareRepository;
    }

    @Tool("Return number of shares for each company in my wallet")
    public List<Share> getNumberOfShares() {
        return shareRepository.findAll().list();
    }
}
Java

The sample application launches an embedded, in-memory database and inserts test data into the share table. Our wallet contains the most popular companies on the U.S. stock market, including Amazon, Meta, and Microsoft. Here’s a dataset inserted on application startup.

insert into share(id, company, quantity) values (1, 'AAPL', 100);
insert into share(id, company, quantity) values (2, 'AMZN', 300);
insert into share(id, company, quantity) values (3, 'META', 300);
insert into share(id, company, quantity) values (4, 'MSFT', 400);
SQL

Create StockTools

The StockTools class is responsible for interaction with the TwelveData stock API. It defines two methods. The getLatestStockPrices method returns only the latest close price for a specified company. It is a tool calling version of the method provided within the pl.piomin.services.functions.stock.StockService function. The second method is more complicated. It must return historical daily close prices for a defined number of days. Each price must be correlated with a quotation date.

@ApplicationScoped
public class StockTools {

    private Logger log;
    private StockDataClient stockDataClient;

    public StockTools(@RestClient StockDataClient stockDataClient, Logger log) {
        this.stockDataClient = stockDataClient;
        this.log = log;
    }
    
    @ConfigProperty(name = "STOCK_API_KEY", defaultValue = "none")
    String apiKey;

    @Tool("Return latest stock prices for a given company")
    public StockResponse getLatestStockPrices(String company) {
        log.infof("Get stock prices for: %s", company);
        StockData data = stockDataClient.getStockData(company, apiKey, "1min", 1);
        DailyStockData latestData = data.getValues().get(0);
        log.infof("Get stock prices (%s) -> %s", company, latestData.getClose());
        return new StockResponse(Float.parseFloat(latestData.getClose()));
    }

    @Tool("Return historical daily stock prices for a given company")
    public List<DailyShareQuote> getHistoricalStockPrices(String company, int days) {
        log.infof("Get historical stock prices: %s for %d days", company, days);
        // Daily interval, so that outputsize corresponds to the number of days
        StockData data = stockDataClient.getStockData(company, apiKey, "1day", days);
        return data.getValues().stream()
                .map(d -> new DailyShareQuote(company, Float.parseFloat(d.getClose()), d.getDatetime()))
                .toList();
    }

}
Java

Here’s the DailyShareQuote Java record returned in the response list.

public record DailyShareQuote(String company, float price, String datetime) {
}
Java

Here’s a @RestClient responsible for calling the TwelveData stock API.

@RegisterRestClient(configKey = "stock-api")
public interface StockDataClient {

    @GET
    @Path("/time_series")
    StockData getStockData(@RestQuery String symbol,
                           @RestQuery String apikey,
                           @RestQuery String interval,
                           @RestQuery int outputsize);
}
Java

For the demo, you can easily enable complete logging of both communication with the AI model through LangChain4j and with the stock API via @RestClient.

quarkus.langchain4j.log-requests = true
quarkus.langchain4j.log-responses = true
quarkus.rest-client.stock-api.url = https://api.twelvedata.com
quarkus.rest-client.logging.scope = request-response
quarkus.rest-client.stock-api.scope = all
%dev.quarkus.log.category."org.jboss.resteasy.reactive.client.logging".level = DEBUG
Plaintext

Quarkus LangChain4j Tool Calling Flow

You can easily register @Tools on your Quarkus AI service with the tools argument inside the @RegisterAiService annotation. The calculateWalletValueWithTools() method calculates the value of our stock wallet in dollars. It uses the latest daily stock prices for each company’s shares from the wallet. Since this method directly returns the response received from the AI model, it is essential to perform additional validation of the content received. For this purpose, a so-called guardrail should be implemented and set in place. We can easily achieve it with the @OutputGuardrails annotation. The calculateHighestWalletValue method calculates the value of our stock wallet in dollars for each day in the specified period determined by the days variable. Then it must return the day with the highest stock wallet value.

@RegisterAiService(tools = {StockTools.class, ShareTools.class})
public interface WalletAiService {

    @UserMessage("""
    What’s the current value in dollars of my wallet based on the latest stock daily prices ?
    
    Return subtotal value in dollars for each company in my wallet.
    In the end, return the total value in dollars wrapped by ***.
    """)
    @OutputGuardrails(WalletGuardrail.class)
    String calculateWalletValueWithTools();

    @UserMessage("""
    On which day during last {days} days my wallet had the highest value in dollars based on the historical daily stock prices ?
    """)
    String calculateHighestWalletValue(int days);
}
Java

Here’s the implementation of the guardrail that validates the response returned by the calculateWalletValueWithTools method. It verifies if the total value in dollars is wrapped by *** and starts with the $ sign.

@ApplicationScoped
public class WalletGuardrail implements OutputGuardrail {

    Pattern pattern = Pattern.compile("\\*\\*\\*(.*?)\\*\\*\\*");

    private Logger log;
    
    public WalletGuardrail(Logger log) {
        this.log = log;
    }

    @Override
    public OutputGuardrailResult validate(AiMessage responseFromLLM) {
        try {
            Matcher matcher = pattern.matcher(responseFromLLM.text());
            if (matcher.find()) {
                String amount = matcher.group(1);
                log.infof("Extracted amount: %s", amount);
                if (amount.startsWith("$")) {
                    return success();
                }
            }
        } catch (Exception e) {
            return reprompt("Invalid text format", e, "Make sure you return a valid requested text");
        }
        return failure("Total amount not found");
    }
}
Java
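
The check itself does not depend on Quarkus, so it can be tried in isolation. The sketch below reuses the same regular expression; the WalletAmountCheck class and the extractTotal helper are hypothetical names introduced only for this illustration, and the sample responses are made up.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java sketch of the validation rule used by WalletGuardrail (no Quarkus types).
class WalletAmountCheck {

    // Same expression as in the guardrail: lazily capture the text between two *** markers.
    private static final Pattern TOTAL = Pattern.compile("\\*\\*\\*(.*?)\\*\\*\\*");

    // Hypothetical helper: returns the wrapped amount only if it starts with the $ sign.
    static Optional<String> extractTotal(String llmResponse) {
        Matcher matcher = TOTAL.matcher(llmResponse);
        if (matcher.find()) {
            String amount = matcher.group(1);
            if (amount.startsWith("$")) {
                return Optional.of(amount);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extractTotal("Total: ***$246,400.00***"));    // accepted
        System.out.println(extractTotal("Total: ***246,400.00 USD***")); // rejected: no leading $
        System.out.println(extractTotal("Total: $246,400.00"));          // rejected: not wrapped by ***
    }
}
```

Note that in the guardrail both rejected cases end in failure("Total amount not found"), while reprompt(...) is used only when an exception is thrown during matching.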

Here’s the REST endpoints implementation. It uses the WalletAiService bean to interact with the AI model. It exposes two endpoints: GET /wallet/with-tools and GET /wallet/highest-day/{days}.

@Path("/wallet")
@Produces(MediaType.TEXT_PLAIN)
public class WalletController {

    private final WalletAiService walletAiService;

    public WalletController(WalletAiService walletAiService) {
        this.walletAiService = walletAiService;
    }

    @GET
    @Path("/with-tools")
    public String calculateWalletValueWithTools() {
        return walletAiService.calculateWalletValueWithTools();
    }

    @GET
    @Path("/highest-day/{days}")
    public String calculateHighestWalletValue(int days) {
        return walletAiService.calculateHighestWalletValue(days);
    }

}
Java

The following diagram illustrates the flow for the second use case, which returns the day with the highest stock wallet value. First, the model must call the getNumberOfShares tool to retrieve the stock wallet structure from the database, which contains the number of shares for each company. Then, it must call the stock API for every company found in the wallet. So, finally, the getHistoricalStockPrices tool method should be called four times, each time with a different value of the company parameter and the days value taken from the HTTP endpoint path variable. Once all the data is collected, the AI model calculates the highest wallet value and returns it together with the quotation date.

quarkus-tool-calling-arch
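
For reference, the arithmetic we expect the model to perform can be sketched in plain Java. This is not part of the sample application: the HighestWalletDay class, the highestDay helper, and the prices used in main are assumptions made for illustration only.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustration of the calculation delegated to the AI model in this scenario.
class HighestWalletDay {

    // Mirrors the DailyShareQuote record returned by the getHistoricalStockPrices tool.
    record DailyShareQuote(String company, float price, String datetime) {}

    // Hypothetical helper: sums quantity * price per date and returns the day with the highest value.
    static Map.Entry<String, Double> highestDay(Map<String, Integer> shares,
                                                List<DailyShareQuote> quotes) {
        Map<String, Double> valueByDay = new TreeMap<>();
        for (DailyShareQuote quote : quotes) {
            int quantity = shares.getOrDefault(quote.company(), 0);
            valueByDay.merge(quote.datetime(), (double) quantity * quote.price(), Double::sum);
        }
        return valueByDay.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .orElseThrow();
    }

    public static void main(String[] args) {
        // Made-up quantities and prices for two companies and two days.
        Map<String, Integer> shares = Map.of("AAPL", 100, "AMZN", 300);
        List<DailyShareQuote> quotes = List.of(
                new DailyShareQuote("AAPL", 150.0f, "2025-06-19"),
                new DailyShareQuote("AMZN", 120.0f, "2025-06-19"),
                new DailyShareQuote("AAPL", 152.0f, "2025-06-20"),
                new DailyShareQuote("AMZN", 121.0f, "2025-06-20"));
        System.out.println(highestDay(shares, quotes));
    }
}
```

In the real flow, the shares map comes from the getNumberOfShares tool and the quotes from the four getHistoricalStockPrices calls; the model does this aggregation itself, so its answer can be compared against a calculation like this one.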

Automated Testing

Most of my repositories are automatically updated to the latest versions of libraries. After updating the library version, automated tests are run to verify that everything works as expected. To verify the correctness of today’s scenario, we will mock stock API calls while integrating with the actual OpenAI service. To mock API calls, you can use the quarkus-junit5-mockito extension.

<dependency>
  <groupId>io.quarkus</groupId>
  <artifactId>quarkus-junit5</artifactId>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>io.quarkus</groupId>
  <artifactId>quarkus-junit5-mockito</artifactId>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>io.rest-assured</groupId>
  <artifactId>rest-assured</artifactId>
  <scope>test</scope>
</dependency>
XML

The following JUnit test verifies two endpoints exposed by WalletController. As you may remember, there is also an output guardrail set on the AI service called by the GET /wallet/with-tools endpoint.

@QuarkusTest
@TestMethodOrder(MethodOrderer.OrderAnnotation.class)
class WalletControllerTest {

    @InjectMock
    @RestClient
    StockDataClient stockDataClient;

    @BeforeEach
    void setUp() {
        // Mock the stock data responses
        StockData aaplStockData = createMockStockData("AAPL", "150.25");
        StockData amznStockData = createMockStockData("AMZN", "120.50");
        StockData metaStockData = createMockStockData("META", "250.75");
        StockData msftStockData = createMockStockData("MSFT", "300.00");

        // Mock the stock data client responses
        when(stockDataClient.getStockData(eq("AAPL"), anyString(), anyString(), anyInt()))
            .thenReturn(aaplStockData);
        when(stockDataClient.getStockData(eq("AMZN"), anyString(), anyString(), anyInt()))
            .thenReturn(amznStockData);
        when(stockDataClient.getStockData(eq("META"), anyString(), anyString(), anyInt()))
            .thenReturn(metaStockData);
        when(stockDataClient.getStockData(eq("MSFT"), anyString(), anyString(), anyInt()))
            .thenReturn(msftStockData);
    }

    private StockData createMockStockData(String symbol, String price) {
        DailyStockData dailyData = new DailyStockData();
        dailyData.setDatetime("2023-01-01");
        dailyData.setOpen(price);
        dailyData.setHigh(price);
        dailyData.setLow(price);
        dailyData.setClose(price);
        dailyData.setVolume("1000");

        StockData stockData = new StockData();
        stockData.setValues(List.of(dailyData));
        return stockData;
    }

    @Test
    @Order(1)
    void testCalculateWalletValueWithTools() {
        given()
          .when().get("/wallet/with-tools")
          .then().statusCode(200)
                 .contentType(ContentType.TEXT)
                 .body(notNullValue())
                 .body(not(emptyString()));
    }

    @Test
    @Order(2)
    void testCalculateHighestWalletValue() {
        given()
          .pathParam("days", 7)
          .when().get("/wallet/highest-day/{days}")
          .then().statusCode(200)
                 .contentType(ContentType.TEXT)
                 .body(notNullValue())
                 .body(not(emptyString()));
    }
}
Java
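
Because the stock API is mocked, the wallet value the model should arrive at is known in advance. The sketch below computes it from the share quantities inserted at startup and the mocked close prices. MockedWalletTotal and total are hypothetical names, and the test above does not assert this number because the LLM output is free text.

```java
import java.math.BigDecimal;
import java.util.Map;

// Illustration only: computes the wallet total implied by the test fixtures.
class MockedWalletTotal {

    // Hypothetical helper: sums quantity * close price over all positions.
    static BigDecimal total(Map<String, Integer> quantities, Map<String, String> closePrices) {
        BigDecimal sum = BigDecimal.ZERO;
        for (Map.Entry<String, Integer> position : quantities.entrySet()) {
            BigDecimal price = new BigDecimal(closePrices.get(position.getKey()));
            sum = sum.add(price.multiply(BigDecimal.valueOf(position.getValue())));
        }
        return sum;
    }

    public static void main(String[] args) {
        // Quantities from the startup dataset, prices from the mocks in setUp().
        Map<String, Integer> quantities = Map.of("AAPL", 100, "AMZN", 300, "META", 300, "MSFT", 400);
        Map<String, String> closePrices = Map.of(
                "AAPL", "150.25", "AMZN", "120.50", "META", "250.75", "MSFT", "300.00");
        System.out.println(total(quantities, closePrices)); // prints 246400.00
    }
}
```

A stricter test could extract the amount wrapped by *** from the response and compare it with this value, at the cost of becoming sensitive to how the model formats numbers.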

Tests can be automatically run, for example, by the CircleCI pipeline on each dependency update via the pull request.

Run the Application to Verify Tool Calling

Before starting the application, we must set environment variables with the AI model and stock API tokens.

$ export OPEN_AI_TOKEN=<YOUR_OPEN_AI_TOKEN>
$ export STOCK_API_KEY=<YOUR_STOCK_API_KEY>
ShellSession

Then, run the application in development mode with the following command:

mvn quarkus:dev
ShellSession

Once the application is started, you can call the first endpoint. The GET /wallet/with-tools endpoint calculates the latest total value of the stock wallet stored in the database.

curl http://localhost:8080/wallet/with-tools
ShellSession

You can see either the response from the chat AI model or the exception thrown after an unsuccessful validation using a guardrail. If LLM response validation fails, the REST endpoint returns the HTTP 500 code.

quarkus-tool-calling-guardrail

Here’s the successfully validated LLM response.

quarkus-tool-calling-success

The sample Quarkus application logs the whole communication with the AI model. Here, you can see a first request containing a list of registered functions (tools) along with their descriptions.

quarkus-tool-calling-logs

Then we can call the GET /wallet/highest-day/{days} endpoint to return the day with the highest wallet value. Let’s calculate it for the last 7 days.

curl http://localhost:8080/wallet/highest-day/7
ShellSession

Here’s the response.

Finally, you can perform a similar test as before, but for the Mistral AI model. Before running the application, set your API token for Mistral AI and change the default model provider to mistralai.

$ export MISTRAL_AI_TOKEN=<YOUR_MISTRAL_AI_TOKEN>
$ export AI_MODEL_PROVIDER=mistralai
ShellSession

Then, run the sample Quarkus application with the following command and repeat the same “tool calling” tests as before.

mvn quarkus:dev -Pmistral-ai
ShellSession

Final Thoughts

Quarkus LangChain4j provides a seamless way to run tools in AI-powered conversations. You can register a tool by adding it as a part of the @RegisterAiService annotation. Also, you can easily add a guardrail on the selected AI service method. Tools are a vital part of agentic AI and the MCP concepts. It is therefore essential to understand them properly. You can expect more articles on Quarkus LangChain4j soon, including on MCP.

The post AI Tool Calling with Quarkus LangChain4j appeared first on Piotr's TechBlog.

]]>
https://piotrminkowski.com/2025/06/23/ai-tool-calling-with-quarkus-langchain4j/feed/ 2 15757
Getting Started with Quarkus LangChain4j and Chat Model https://piotrminkowski.com/2025/06/18/getting-started-with-quarkus-langchain4j-and-chat-model/ https://piotrminkowski.com/2025/06/18/getting-started-with-quarkus-langchain4j-and-chat-model/#respond Wed, 18 Jun 2025 16:36:08 +0000 https://piotrminkowski.com/?p=15736 This article will teach you how to use the Quarkus LangChain4j project to build applications based on different chat models. The Quarkus AI Chat Model offers a portable and straightforward interface, enabling seamless interaction with these models. Our sample Quarkus application will switch between three popular chat models provided by OpenAI, Mistral AI, and Ollama. […]

The post Getting Started with Quarkus LangChain4j and Chat Model appeared first on Piotr's TechBlog.

]]>
This article will teach you how to use the Quarkus LangChain4j project to build applications based on different chat models. The Quarkus AI Chat Model offers a portable and straightforward interface, enabling seamless interaction with these models. Our sample Quarkus application will switch between three popular chat models provided by OpenAI, Mistral AI, and Ollama. This article is the first in a series explaining AI concepts with Quarkus LangChain4j. Look for more on my blog in this area soon. The idea of this tutorial is very similar to the series on Spring AI. Therefore, you will be able to easily compare the two approaches, as the sample application will do the same thing as an analogous Spring Boot application.

If you like Quarkus, then you can find quite a few articles about it on my blog. Just go to the Quarkus category and find the topic you are interested in.

Source Code

Feel free to use my source code if you’d like to try it out yourself. To do that, you must clone my sample GitHub repository. Then you should only follow my instructions.

Motivation

Whenever I create a new article or example related to AI, I like to define the problem I’m trying to solve. The problem this example solves is very trivial. I publish numerous small demo apps to explain complex technology concepts. These apps typically require data to display a demo output. Usually, I add demo data by myself or use a library like Datafaker to do it for me. This time, we can leverage the AI Chat Models API for that. Let’s begin!

I explained the same topic earlier for Spring Boot. To compare the features both frameworks offer for simple interaction with an AI chat model, you can read this article on Spring AI.

Dependencies

The sample application uses the latest version of the Quarkus framework.

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>io.quarkus.platform</groupId>
      <artifactId>quarkus-bom</artifactId>
      <version>${quarkus.platform.version}</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>
XML

You can easily switch between multiple AI model implementations by activating a dedicated Maven profile. By default, the open-ai profile is active. It includes the quarkus-langchain4j-openai module in the Maven dependencies. You can also activate the mistral-ai or ollama profile. In that case, the quarkus-langchain4j-mistral-ai or quarkus-langchain4j-ollama module will be included instead of the LangChain4j OpenAI extension.

<profiles>
  <profile>
    <id>open-ai</id>
    <activation>
      <activeByDefault>true</activeByDefault>
    </activation>
    <dependencies>
      <dependency>
        <groupId>io.quarkiverse.langchain4j</groupId>
        <artifactId>quarkus-langchain4j-openai</artifactId>
        <version>${quarkus-langchain4j.version}</version>
      </dependency>
    </dependencies>
  </profile>
  <profile>
    <id>mistral-ai</id>
    <dependencies>
      <dependency>
        <groupId>io.quarkiverse.langchain4j</groupId>
        <artifactId>quarkus-langchain4j-mistral-ai</artifactId>
        <version>${quarkus-langchain4j.version}</version>
      </dependency>
    </dependencies>
  </profile>
  <profile>
    <id>ollama</id>
    <dependencies>
      <dependency>
        <groupId>io.quarkiverse.langchain4j</groupId>
        <artifactId>quarkus-langchain4j-ollama</artifactId>
        <version>${quarkus-langchain4j.version}</version>
      </dependency>
    </dependencies>
  </profile>
</profiles>
XML

The sample Quarkus application is simple. It exposes some REST endpoints and communicates with a selected AI model to return an AI-generated response via each endpoint. So, you need to include only core Quarkus modules like quarkus-rest-jackson or quarkus-arc. To implement JUnit tests for the REST API, it also includes the quarkus-junit5 and rest-assured modules in the test scope.

<dependencies>
  <!-- Core Quarkus dependencies -->
  <dependency>
    <groupId>io.quarkus</groupId>
    <artifactId>quarkus-rest-jackson</artifactId>
  </dependency>
  <dependency>
    <groupId>io.quarkus</groupId>
    <artifactId>quarkus-arc</artifactId>
  </dependency>

  <!-- Test dependencies -->
  <dependency>
    <groupId>io.quarkus</groupId>
    <artifactId>quarkus-junit5</artifactId>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>io.rest-assured</groupId>
    <artifactId>rest-assured</artifactId>
    <scope>test</scope>
  </dependency>
</dependencies>
XML

Quarkus LangChain4j Chat Models Integration

Quarkus provides an innovative approach to interacting with AI chat models. First, you annotate an interface that declares AI-oriented methods with the @RegisterAiService annotation. Then you add a proper description and input prompt inside the @SystemMessage and @UserMessage annotations on each method. Here is the sample PersonAiService interface, which defines two methods. The generatePersonList method asks the AI model to generate a list of 10 unique persons in a form consistent with the structure of the PersonResponse return type. The getPersonById method must read the previously generated list from chat memory and return the data of the person with the specified id field.

@RegisterAiService
@ApplicationScoped
public interface PersonAiService {

    @SystemMessage("""
        You are a helpful assistant that generates realistic person data.
        Always respond with valid JSON format.
        """)
    @UserMessage("""
        Generate exactly 10 unique persons

        Requirements:
        - Each person must have a unique integer ID (like 1, 2, 3, etc.)
        - Use realistic first and last names per each nationality
        - Ages should be between 18 and 80
        - Return ONLY the JSON array, no additional text
        """)
    PersonResponse generatePersonList(@MemoryId int userId);

    @SystemMessage("""
        You are a helpful assistant that can recall generated person data from chat memory.
        """)
    @UserMessage("""
        In the previously generated list of persons for user {userId}, find and return the person with id {id}.
        
        Return ONLY the JSON object, no additional text.
        """)
    Person getPersonById(@MemoryId int userId, int id);

}
Java

There are a few more things to add regarding the code snippet above. The beans created by @RegisterAiService are @RequestScoped by default. According to the Quarkus LangChain4j documentation, the reason is that this scope enables removing objects from the chat memory. In the case seen above, the list of people is generated per user ID, which acts as the key used to look up the chat memory. To guarantee that the getPersonById method finds the list of persons generated for a given @MemoryId, the PersonAiService interface must be annotated with @ApplicationScoped. The InMemoryChatMemoryStore implementation is enabled by default, so you don’t need to declare any additional beans to use it.
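
The idea of a memory keyed by an ID can be sketched in plain Java. The class below is only an illustration of the concept, not the Quarkus LangChain4j implementation: ChatMemorySketch and its methods are hypothetical names, and the stored "messages" are plain strings. It shows that data saved under one ID is visible only when the same ID is used again, which is why the same store instance has to survive between the two REST calls.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: one list of messages per memory id,
// similar in spirit to keeping one chat memory per @MemoryId value.
public class ChatMemorySketch {

    private final Map<Integer, List<String>> messagesById = new HashMap<>();

    // Appends a message to the conversation that belongs to the given id.
    public void add(int memoryId, String message) {
        messagesById.computeIfAbsent(memoryId, id -> new ArrayList<>()).add(message);
    }

    // Returns a copy of the messages stored for the id, or an empty list if there are none.
    public List<String> messages(int memoryId) {
        return List.copyOf(messagesById.getOrDefault(memoryId, List.of()));
    }

    // Drops the conversation stored for the given id.
    public void clear(int memoryId) {
        messagesById.remove(memoryId);
    }

    public static void main(String[] args) {
        ChatMemorySketch memory = new ChatMemorySketch();
        memory.add(1, "list generated for user 1");
        memory.add(2, "list generated for user 2");
        System.out.println(memory.messages(1)); // only the entry stored under id 1
        System.out.println(memory.messages(3)); // nothing was stored under id 3
    }
}
```

If a new ChatMemorySketch were created for every request, the second call would always see an empty list. That is the plain-Java analogue of the scope issue described above.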

Quarkus LangChain4j can automatically map the LLM’s JSON response to the output POJO. However, until now, it has not been possible to map it directly to the output collection. Therefore, you must wrap the output list with the additional class, as shown below.

public class PersonResponse {

    private List<Person> persons;

    public List<Person> getPersons() {
        return persons;
    }

    public void setPersons(List<Person> persons) {
        this.persons = persons;
    }
}
Java

Here’s the Person class:

public class Person {

    private Integer id;
    private String firstName;
    private String lastName;
    private int age;
    private String nationality;
    private Gender gender;
    
    // GETTERS and SETTERS

}
Java

Finally, the last part of our implementation is REST endpoints. Here’s the REST controller that injects and uses PersonAiService to interact with the AI chat model. It exposes two endpoints: GET /api/{userId}/persons and GET /api/{userId}/persons/{id}. You can generate several lists of persons by specifying the userId path parameter.

@Path("/api")
@Produces(MediaType.APPLICATION_JSON)
@Consumes(MediaType.APPLICATION_JSON)
public class PersonController {

    private static final Logger LOG = Logger.getLogger(PersonController.class);

    PersonAiService personAiService;

    public PersonController(PersonAiService personAiService) {
        this.personAiService = personAiService;
    }

    @GET
    @Path("/{userId}/persons")
    public PersonResponse generatePersons(@PathParam("userId") int userId) {
        return personAiService.generatePersonList(userId);
    }

    @GET
    @Path("/{userId}/persons/{id}")
    public Person getPersonById(@PathParam("userId") int userId, @PathParam("id") int id) {
        return personAiService.getPersonById(userId, id);
    }

}
Java

Use Different AI Models with Quarkus LangChain4j

Configuration Properties

Here is a configuration defined within the application.properties file. Before proceeding, you must generate the OpenAI and Mistral AI API tokens and export them as environment variables. Additionally, you can enable logging of requests and responses in AI model communication. It is also worth increasing the default timeout for a single request from 10 seconds to a higher value, such as 20 seconds.

quarkus.langchain4j.chat-model.provider = ${AI_MODEL_PROVIDER:openai}
quarkus.langchain4j.log-requests = true
quarkus.langchain4j.log-responses = true

# OpenAI Configuration
quarkus.langchain4j.openai.api-key = ${OPEN_AI_TOKEN}
quarkus.langchain4j.openai.timeout = 20s

# Mistral AI Configuration
quarkus.langchain4j.mistralai.api-key = ${MISTRAL_AI_TOKEN}
quarkus.langchain4j.mistralai.timeout = 20s

# Ollama Configuration
quarkus.langchain4j.ollama.base-url = ${OLLAMA_BASE_URL:http://localhost:11434}
Plaintext
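
The ${NAME:default} expressions in this file fall back to the value after the colon when the variable is not set. The plain-Java sketch below models that rule so you can see what each property resolves to. It is a simplified illustration under my own assumptions, not the configuration library used by Quarkus: PlaceholderSketch is a hypothetical name, and raising an error for a missing variable without a default is a choice made for this sketch.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of resolving ${NAME} and ${NAME:default} expressions.
public class PlaceholderSketch {

    // Group 1 is the variable name, group 2 is the optional default after the first colon.
    private static final Pattern EXPR =
            Pattern.compile("\\$\\{([A-Za-z0-9_]+)(?::([^}]*))?\\}");

    public static String resolve(String value, Map<String, String> env) {
        Matcher matcher = EXPR.matcher(value);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            String name = matcher.group(1);
            String fallback = matcher.group(2); // null when no default was given
            String resolved = env.getOrDefault(name, fallback);
            if (resolved == null) {
                throw new IllegalStateException("No value and no default for " + name);
            }
            matcher.appendReplacement(out, Matcher.quoteReplacement(resolved));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Resolves against the real environment of the current process.
        System.out.println(resolve("${AI_MODEL_PROVIDER:openai}", System.getenv()));
        System.out.println(resolve("${OLLAMA_BASE_URL:http://localhost:11434}", System.getenv()));
    }
}
```

With this rule, the provider stays openai until you export AI_MODEL_PROVIDER, while a token property such as ${OPEN_AI_TOKEN} has no default to fall back on.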

To run a sample Quarkus application and connect it with OpenAI, you must set the OPEN_AI_TOKEN environment variable. Since the open-ai Maven profile is activated by default, you don’t need to set anything else while running an app.

$ export OPEN_AI_TOKEN=<your_openai_token>
$ mvn quarkus:dev
ShellSession

Then, you can call the GET /api/{userId}/persons endpoint with different userId path variable values. Here are sample API requests and responses.

quarkus-langchain4j-calls

After that, you can call the GET /api/{userId}/persons/{id} endpoint to return a specified person found in the chat memory.

Switch Between AI Models

Then, you can repeat the same exercise with the Mistral AI model. You must set the AI_MODEL_PROVIDER to mistralai, export its API token as the MISTRAL_AI_TOKEN environment variable, and enable the mistral-ai profile while running the app.

$ export AI_MODEL_PROVIDER=mistralai
$ export MISTRAL_AI_TOKEN=<your_mistralai_token>
$ mvn quarkus:dev -Pmistral-ai
ShellSession

The app should start successfully.

quarkus-langchain4j-logs

Once it has started, you can repeat the same sequence of requests as before for OpenAI.

$ curl http://localhost:8080/api/1/persons
$ curl http://localhost:8080/api/2/persons
$ curl http://localhost:8080/api/1/persons/1
$ curl http://localhost:8080/api/2/persons/1
ShellSession
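
The same sequence can also be sent from Java with the JDK's built-in java.net.http.HttpClient (Java 11 or newer). This is a minimal sketch that assumes the application is running locally on port 8080, as in the curl commands above. PersonApiSketch and its helper methods are hypothetical names introduced for this example.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical client for the two endpoints exposed by the sample application.
public class PersonApiSketch {

    // Builds the URI of GET /api/{userId}/persons.
    public static URI personsUri(String baseUrl, int userId) {
        return URI.create(baseUrl + "/api/" + userId + "/persons");
    }

    // Builds the URI of GET /api/{userId}/persons/{id}.
    public static URI personUri(String baseUrl, int userId, int id) {
        return URI.create(baseUrl + "/api/" + userId + "/persons/" + id);
    }

    // Sends a GET request and returns the response body as text.
    public static String get(HttpClient client, URI uri) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri)
                .header("Accept", "application/json")
                .GET()
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        String baseUrl = "http://localhost:8080"; // assumption: app started with mvn quarkus:dev
        HttpClient client = HttpClient.newHttpClient();
        // Generate the list first, then look up a person in the same user's chat memory.
        System.out.println(get(client, personsUri(baseUrl, 1)));
        System.out.println(get(client, personUri(baseUrl, 1, 1)));
    }
}
```

The order of the calls matters: the lookup only has something to find after the list for the same userId has been generated.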

You can check both the request sent to the AI model and the response it returned in the application logs.

Finally, you can run a test with ollama. By default, the LangChain4j extension for Ollama uses the llama3.2 model. You can change it by setting the quarkus.langchain4j.ollama.chat-model.model-id property in the application.properties file. Assuming that you use the llama3.3 model, here’s your configuration:

quarkus.langchain4j.ollama.base-url = ${OLLAMA_BASE_URL:http://localhost:11434}
quarkus.langchain4j.ollama.chat-model.model-id = llama3.3
quarkus.langchain4j.ollama.timeout = 60s
Plaintext

Before proceeding, you must run the llama3.3 model on your laptop. Of course, you can choose another, smaller model, because llama3.3 is 42 GB.

ollama run llama3.3
ShellSession

Downloading the model can take a long time, but eventually it is ready to use.

Once a model is running, you can set the AI_MODEL_PROVIDER environment variable to ollama and activate the ollama profile for the app:

$ export AI_MODEL_PROVIDER=ollama
$ mvn quarkus:dev -Pollama
ShellSession

This time, our application is connected to the llama3.3 model started with ollama:

quarkus-langchain4j-ollama

With the Quarkus LangChain4j Ollama extension, you can take advantage of dev services support. It means that you don’t need to install and run Ollama on your laptop or run a model with ollama CLI. Quarkus will run Ollama as a Docker container and automatically run a selected AI model on it. In that case, you don’t need to set the quarkus.langchain4j.ollama.base-url property. Before switching to that option, let’s use a smaller AI model by setting the quarkus.langchain4j.ollama.chat-model.model-id = mistral property. Then start the app in the same way as before.

Final Thoughts

I must admit that the Quarkus LangChain4j extension is enjoyable to use. With a few simple annotations, you can configure your application to talk to the AI model of your choice. In this article, I presented a straightforward example of integrating Quarkus with an AI chat model. Along the way, we quickly reviewed features such as prompts, structured output, and chat memory. You can expect more articles in the Quarkus series with AI soon.

The post Getting Started with Quarkus LangChain4j and Chat Model appeared first on Piotr's TechBlog.

]]>
https://piotrminkowski.com/2025/06/18/getting-started-with-quarkus-langchain4j-and-chat-model/feed/ 0 15736
OpenShift AI with vLLM and Spring AI https://piotrminkowski.com/2025/05/12/openshift-ai-with-vllm-and-spring-ai/ https://piotrminkowski.com/2025/05/12/openshift-ai-with-vllm-and-spring-ai/#comments Mon, 12 May 2025 06:47:19 +0000 https://piotrminkowski.com/?p=15684 This article will teach you how to use OpenShift AI and vLLM to serve models used by the Spring AI application. To run the model on OpenShift AI, we will use a solution called KServe ModelCar. It can serve models directly from a container without using the S3 bucket. KServe is a standard, cloud-agnostic Model Inference […]

The post OpenShift AI with vLLM and Spring AI appeared first on Piotr's TechBlog.

]]>
This article will teach you how to use OpenShift AI and vLLM to serve models used by the Spring AI application. To run the model on OpenShift AI, we will use a solution called KServe ModelCar. It can serve models directly from a container without using the S3 bucket. KServe is a standard, cloud-agnostic Model Inference Platform designed to serve predictive and generative AI models on Kubernetes. OpenShift AI includes a single model serving platform based on the KServe component. We can serve models on the single-model serving platform using model-serving runtimes. OpenShift AI includes several preinstalled runtimes. However, only the vLLM runtime is compatible with the OpenAI REST API. Therefore, we will use this one.

Previously, I published several articles about Spring AI with examples of using different AI models. Therefore, I will not focus on the introduction to Spring AI. For example, you can read about integration between Spring AI and Azure AI in the following post. Please refer to the following article for a quick intro to the Spring AI project.

Source Code

Feel free to use my source code if you’d like to try it out yourself. To do that, you must clone my sample GitHub repository. Then you should only follow my instructions.

Prerequisites

Create the OpenShift Cluster

For this exercise, you will need a relatively large OpenShift cluster. At least one of the cluster’s nodes must have a GPU. I created a cluster on AWS with one node on a g4dn.12xlarge machine. On OpenShift, you can achieve this by creating the MachineSet object that creates nodes using the appropriate virtual machine available on AWS.

openshift-ai-nodes

Install Required Operators

Next, install and configure several operators on the cluster. Begin with the “Node Feature Discovery” operator. On OpenShift, this operator enables automatic discovery of cluster nodes with features such as GPUs. After installing the operator, create the NodeFeatureDiscovery object. The default values set by the OpenShift console during object creation are sufficient.

The operator’s task is to mark the node with the detected GPU using the appropriate label. The label is feature.node.kubernetes.io/pci-10de.present=true. After configuring the operator, verify that the correct GPU has been detected.

$ oc get node -l feature.node.kubernetes.io/pci-10de.present=true
NAME                                        STATUS   ROLES    AGE   VERSION
ip-10-0-45-120.us-east-2.compute.internal   Ready    worker   15d   v1.31.6
ShellSession

Next, install the NVIDIA GPU Operator. This operator automatically installs, configures, and manages NVIDIA drivers and tools on nodes with NVIDIA graphics cards. This allows OpenShift to recognize the GPU as a resource that can be declared in pods. The operator works together with the “Node Feature Discovery” operator, which labels the nodes with GPUs: the NVIDIA GPU Operator uses the feature.node.kubernetes.io/pci-10de.present=true label to determine where to install the drivers. For this to happen, the ClusterPolicy object must be created. As before, you can use the default values generated by the OpenShift Console when creating this object.

The OpenShift AI feature for serving AI models requires the installation of OpenShift Serverless and OpenShift Service Mesh operators. The key solution here is KServe. KServe uses Knative to scale models on demand and integrates with Istio to secure model routing and versioning.

The final step in this phase is to install the OpenShift AI Operator and create the DataScienceCluster object. If the previous installations were successful, everything will be configured automatically after creating the DataScienceCluster object. For instance, OpenShift AI will create the Istio control plane and the Knative Serving component.

openshift-ai-crd

OpenShift AI creates several namespaces within a cluster. The most important is the redhat-ods-applications namespace, where most components comprising the entire solution are run.

$ oc get pod -n redhat-ods-applications
NAME                                                              READY   STATUS    RESTARTS   AGE
authorino-767bd64465-fq8bl                                        1/1     Running   0          15d
codeflare-operator-manager-5c69778b87-wxcwp                       1/1     Running   0          15d
data-science-pipelines-operator-controller-manager-6686587wcmkr   1/1     Running   0          15d
etcd-549d769449-hqzwt                                             1/1     Running   0          15d
kserve-controller-manager-85f9b8d66f-qpxbf                        1/1     Running   0          15d
kuberay-operator-8d77dcf84-qgsq5                                  1/1     Running   0          15d
kueue-controller-manager-7c895bd669-467nk                         1/1     Running   0          6h8m
modelmesh-controller-7f9dd5f848-ljlxp                             1/1     Running   0          15d
modelmesh-controller-7f9dd5f848-qqsl8                             1/1     Running   0          24d
modelmesh-controller-7f9dd5f848-txlhd                             1/1     Running   0          24d
notebook-controller-deployment-86f5b87585-p6nz5                   1/1     Running   0          15d
odh-model-controller-574ff4657-q75gr                              1/1     Running   0          15d
odh-notebook-controller-manager-9d754d5f-2ptk9                    1/1     Running   0          15d
rhods-dashboard-5b96595667-79tx6                                  2/2     Running   0          15d
rhods-dashboard-5b96595667-8m52g                                  2/2     Running   0          15d
rhods-dashboard-5b96595667-kx7p4                                  2/2     Running   0          15d
rhods-dashboard-5b96595667-nn2cf                                  2/2     Running   0          15d
rhods-dashboard-5b96595667-ttcht                                  2/2     Running   0          15d
trustyai-service-operator-controller-manager-bd9fbdb6d-kcd57      1/1     Running   0          15d
ShellSession

Configure and Use OpenShift AI

After installing OpenShift AI on a cluster, you can use its graphical UI. To access it, select “Red Hat OpenShift AI” from the menu at the top of the page.

After selecting the indicated option, you will be redirected to the following page. This page allows you to configure and use OpenShift AI on a cluster. The first step is to select a namespace on the cluster for the AI project. In my case, the namespace is ai.

openshift-ai-ui

To run an AI model on a cluster, first choose how to serve it. You can choose between a single-model serving platform and a multi-model serving platform. With the former, each model is deployed on its own model server. With the latter, multiple models can be deployed on a single shared server. This article will use the first option: a single-model serving platform.

openshift-ai-runtime

The next step is to create an accelerator profile. This profile should be created automatically after installing and configuring the NVIDIA GPU Operator. If, for some reason, it was not, you can easily create it manually. When creating this object, enter the nvidia.com/gpu value in the identifier field.

You can either create the profile in the UI or apply the YAML manifest below.

apiVersion: dashboard.opendatahub.io/v1
kind: AcceleratorProfile
metadata:
  name: nvidia
  namespace: redhat-ods-applications
spec:
  displayName: nvidia
  enabled: true
  identifier: nvidia.com/gpu
YAML

Serve Model on OpenShift AI with vLLM

Create ServingRuntime Resource

In the previous step, we configured OpenShift AI to deploy the model with a single-model serving platform and a GPU accelerator. We will use KServe’s ModelCar functionality to deploy the model, which allows us to serve models directly from a container. This functionality is described in an article published on the Red Hat Developer blog. The article demonstrates how to build an image containing a model downloaded from the Hugging Face Hub. In turn, we will use images that have already been built and are available in the quay.io/repository/redhat-ai-services/modelcar-catalog repository. You can find ready-made images for AI models such as Granite and Llama.

To run a model on OpenShift AI in single-model serving runtime mode, you must define two objects: ServingRuntime and InferenceService. According to the OpenShift AI documentation, the ServingRuntime CR creates a serving runtime, an environment for deploying and managing a model. Here’s the ServingRuntime object that creates a runtime for the Llama 3.2 AI model. The annotation opendatahub.io/recommended-accelerators sets the name of the recommended accelerator to use with the runtime. Its value should be identical to the identifier field in the AcceleratorProfile object (1). The openshift.io/display-name annotation holds the name under which the serving runtime is displayed (2). The spec.containers.image field indicates the runtime container image used by the serving runtime (3). This image differs depending on the type of accelerator used. Finally, the ServingRuntime object specifies that single-model serving is used (4) and that the vLLM model format is supported by the runtime (5).

apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  annotations:
    opendatahub.io/recommended-accelerators: '["nvidia.com/gpu"]' # (1)
    openshift.io/display-name: vLLM ServingRuntime for KServe # (2)
  labels:
    opendatahub.io/dashboard: "true"
  name: llama-32-3b-instruct
spec:
  annotations:
    prometheus.io/path: /metrics 
    prometheus.io/port: "8080" 
  containers:
    - args:
        - --port=8080
        - --model=/mnt/models
        - --served-model-name={{.Name}}
      command:
        - python
        - '-m'
        - vllm.entrypoints.openai.api_server
      env:
        - name: HF_HOME
          value: /tmp/hf_home
      # (3)
      image: quay.io/modh/vllm@sha256:0d55419f3d168fd80868a36ac89815dded9e063937a8409b7edf3529771383f3
      name: kserve-container
      ports:
        - containerPort: 8080
          protocol: TCP
  multiModel: false # (4)
  supportedModelFormats: # (5) 
    - autoSelect: true
      name: vLLM
YAML

Create InferenceService Resource

The InferenceService CR creates a server or inference service that processes inference queries, passes them to the model, and returns the inference output. Here’s the InferenceService object related to the previously created llama-32-3b-instruct runtime (1). It must define some vLLM parameters to successfully run the model on the existing infrastructure and enable tool calling support on the Llama 3.2 model (2). The InferenceService object specifies the image containing the Llama 3.2 model, published as quay.io/redhat-ai-services/modelcar-catalog:llama-3.2-3b-instruct (3). Alternatively, you can create your own image, publish it in a custom registry, and run it on OpenShift using the InferenceService CR.

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  annotations:
    openshift.io/display-name: llama-32-3b-instruct
    serving.knative.openshift.io/enablePassthrough: 'true'
    serving.kserve.io/deploymentMode: Serverless
    sidecar.istio.io/inject: 'true'
    sidecar.istio.io/rewriteAppHTTPProbers: 'true'
  name: llama-32-3b-instruct # (1)
  labels:
    opendatahub.io/dashboard: 'true'
spec:
  predictor:
    maxReplicas: 1
    minReplicas: 1
    model: # (2)
      args:
        - '--dtype=half'
        - '--max_model_len=8192'
        - '--gpu_memory_utilization=.95'
        - '--enable-auto-tool-choice'
        - '--tool_call_parser=llama3_json'
      modelFormat:
        name: vLLM
      name: ''
      resources:
        limits:
          cpu: '8'
          memory: 10Gi
          nvidia.com/gpu: '1'
        requests:
          cpu: '4'
          memory: 8Gi
          nvidia.com/gpu: '1'
      runtime: llama-32-3b-instruct
      storageUri: 'oci://quay.io/redhat-ai-services/modelcar-catalog:llama-3.2-3b-instruct' # (3)
YAML

Deploy with OpenShift AI

You can also create the same configuration using the OpenShift AI UI. The diagram below shows the settings you need for Granite 3.2.

openshift-ai-model-serving

The OpenShift AI UI lists all the models running in a given AI project. You can check the endpoint where a particular model is available. In this case, two models are running in the AI project: Llama 3.2 and Granite 3.2. Both models are available internally on the cluster and externally via the Knative Route object.

Both models are automatically scheduled on the node with the GPU. You can check the GPU resource reservations on a node using the oc describe node command.

A single-model serving platform runs AI models as the Knative Service. You can use the oc get ksvc command to display a list of Knative services running in the ai namespace.

$ oc get ksvc -n ai
NAME                               URL                                                                                 LATESTCREATED                            LATESTREADY                              READY   REASON
granite-32-2b-instruct-predictor   https://granite-32-2b-instruct-predictor-ai.apps.piomin.ewyw.p1.openshiftapps.com   granite-32-2b-instruct-predictor-00007   granite-32-2b-instruct-predictor-00007   True    
llama-32-3b-instruct-predictor     https://llama-32-3b-instruct-predictor-ai.apps.piomin.ewyw.p1.openshiftapps.com     llama-32-3b-instruct-predictor-00002     llama-32-3b-instruct-predictor-00002     True 
ShellSession

Integrate Spring AI with vLLM

Dependencies and Properties

The vLLM runtime is compatible with the OpenAI REST API. To integrate our sample Spring Boot application with a model running on vLLM, we must use the standard Spring AI OpenAI starter. The app in the spring-ai-showcase repository has more functionality than what is tested in this article. In simplified terms, the list of dependencies needed for the app to communicate with the OpenAI API and the model running on OpenShift AI is below.

<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
  <groupId>org.springframework.ai</groupId>
  <artifactId>spring-ai-starter-model-openai</artifactId>
</dependency>
<dependency>
  <groupId>org.springframework.ai</groupId>
  <artifactId>spring-ai-autoconfigure-model-openai</artifactId>
</dependency>
XML

Although the model served on OpenShift AI does not require authorization with an API key, the spring.ai.openai.api-key Spring AI parameter must still be set. The endpoint’s address provided through the vLLM runtime must be specified in the spring.ai.openai.chat.base-url parameter. The default model name must also be overridden with the name under which the model was run on OpenShift AI. For Llama 3.2, this name is llama-32-3b-instruct. Below is a list of all the Spring Boot settings required for vLLM integration, which is available in the application-vllm.properties file.

spring.ai.openai.api-key = ${OPENAI_API_KEY:dummy}
spring.ai.openai.chat.base-url = https://llama-32-3b-instruct-ai.apps.piomin.ewyw.p1.openshiftapps.com
spring.ai.openai.chat.options.model = llama-32-3b-instruct
Plaintext
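
Because vLLM exposes an OpenAI-compatible REST API, you can also check the model without Spring AI by posting a chat completion request yourself. The sketch below builds such a request with the JDK HttpClient and the /v1/chat/completions path of the OpenAI API. VllmChatSketch, its methods, and the VLLM_BASE_URL environment variable are hypothetical names used only in this example. The hand-written JSON escaping covers just enough for a short demo message.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical sketch of a raw OpenAI-style chat completion call against vLLM.
public class VllmChatSketch {

    // Escapes backslashes, double quotes and newlines; not a full JSON encoder.
    static String escape(String text) {
        return text.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    // Builds a minimal chat completion body with a single user message.
    public static String chatBody(String model, String userMessage) {
        return "{\"model\":\"" + escape(model) + "\","
                + "\"messages\":[{\"role\":\"user\",\"content\":\"" + escape(userMessage) + "\"}]}";
    }

    // Builds the POST request; the API key is sent as a bearer token.
    public static HttpRequest chatRequest(String baseUrl, String apiKey, String model, String userMessage) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/v1/chat/completions"))
                .header("Content-Type", "application/json")
                .header("Authorization", "Bearer " + apiKey)
                .POST(HttpRequest.BodyPublishers.ofString(chatBody(model, userMessage)))
                .build();
    }

    public static void main(String[] args) throws Exception {
        String model = "llama-32-3b-instruct";
        String message = "Say hello in one sentence.";
        String baseUrl = System.getenv("VLLM_BASE_URL"); // assumption: set to your model's route URL
        if (baseUrl == null) {
            // Without a target URL, only print the request body that would be sent.
            System.out.println(chatBody(model, message));
            return;
        }
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(chatRequest(baseUrl, "dummy", model, message), HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

The model field in the body carries the same name as the spring.ai.openai.chat.options.model property.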

Implementation with Spring AI

The code below demonstrates how @RestController implements communication between the application and the target AI model. The @RestController class injects an auto-configured ChatClient.Builder to create an instance of ChatClient. The PersonController class implements a method for returning a list of persons from the GET /persons endpoint. The main goal is to generate a list of 10 objects with the fields defined in the Person class. The id field should be auto-incremented. The PromptTemplate object defines a message that will be sent to the chat model AI API. It doesn’t have to specify the exact fields that should be returned. This part is handled automatically by the Spring AI library after we invoke the entity() method on the ChatClient instance. The ParameterizedTypeReference object inside the entity method tells Spring AI to generate a list of objects.

@RestController
@RequestMapping("/persons")
public class PersonController {

    private final ChatClient chatClient;

    public PersonController(ChatClient.Builder chatClientBuilder,
                            ChatMemory chatMemory) {
        this.chatClient = chatClientBuilder
                .defaultAdvisors(
                        new PromptChatMemoryAdvisor(chatMemory),
                        new SimpleLoggerAdvisor())
                .build();
    }

    @GetMapping
    List<Person> findAll() {
        PromptTemplate pt = new PromptTemplate("""
                Return a current list of 10 persons if exists or generate a new list with random values.
                Each object should contain an auto-incremented id field.
                The age value should be a random number between 18 and 99.
                Do not include any explanations or additional text.
                Return data in RFC8259 compliant JSON format.
                """);

        return this.chatClient.prompt(pt.create())
                .call()
                .entity(new ParameterizedTypeReference<>() {});
    }

    @GetMapping("/{id}")
    Person findById(@PathVariable String id) {
        PromptTemplate pt = new PromptTemplate("""
                Find and return the object with id {id} in a current list of persons.
                """);
        Prompt p = pt.create(Map.of("id", id));
        return this.chatClient.prompt(p)
                .call()
                .entity(Person.class);
    }
}
Java
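
The empty braces in new ParameterizedTypeReference<>() {} inside findAll() are easy to overlook, yet they are what preserves the List<Person> type at runtime. They create an anonymous subclass, and a subclass keeps its generic supertype in the class metadata. The plain-Java sketch below reproduces this "type token" technique with a hypothetical TypeRef class. It illustrates the general pattern and is not Spring's actual source code.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;

public class TypeTokenSketch {

    // Hypothetical stand-in for a type reference class; it must be subclassed to be used.
    public abstract static class TypeRef<T> {

        private final Type type;

        protected TypeRef() {
            // The anonymous subclass records TypeRef<...> as its generic superclass.
            ParameterizedType superType = (ParameterizedType) getClass().getGenericSuperclass();
            this.type = superType.getActualTypeArguments()[0];
        }

        public Type getType() {
            return type;
        }
    }

    public static void main(String[] args) {
        // The trailing {} creates the anonymous subclass that captures List<String>.
        TypeRef<List<String>> ref = new TypeRef<>() {};
        System.out.println(ref.getType().getTypeName());
    }
}
```

A plain Class object such as List.class cannot carry the element type, which is why findById can pass Person.class while findAll needs the type reference.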

The llama-32-3b-instruct model supports the “tool-calling” approach for API calls. You can read more about it in one of my Spring AI articles, which are available at this link. For instance, the class below uses the @Tool annotation on a method that connects to the database and returns the list of shares held for individual companies. The key to using this tool is the text in its description field, which is then interpreted by the LLM.

public class WalletTools {

    private WalletRepository walletRepository;

    public WalletTools(WalletRepository walletRepository) {
        this.walletRepository = walletRepository;
    }

    @Tool(description = "Number of shares for each company in my wallet")
    public List<Share> getNumberOfShares() {
        return (List<Share>) walletRepository.findAll();
    }
}
Java

Then, the objects containing the @Tool methods are passed to the chat client when it interacts with the AI model. The AI model can call a tool’s method as required, based on the tool’s description and the content of the input prompt.

@RestController
@RequestMapping("/wallet")
public class WalletController {

    private final ChatClient chatClient;
    private final StockTools stockTools;
    private final WalletTools walletTools;

    public WalletController(ChatClient.Builder chatClientBuilder,
                            StockTools stockTools,
                            WalletTools walletTools) {
        this.chatClient = chatClientBuilder
                .defaultAdvisors(new SimpleLoggerAdvisor())
                .build();
        this.stockTools = stockTools;
        this.walletTools = walletTools;
    }
    
    @GetMapping("/with-tools")
    String calculateWalletValueWithTools() {
        PromptTemplate pt = new PromptTemplate("""
        What’s the current value in dollars of my wallet based on the latest stock daily prices ?
        """);

        return this.chatClient.prompt(pt.create())
                .tools(stockTools, walletTools)
                .call()
                .content();
    }
    
}
Java

Run Spring Boot Application

Activate the vllm profile when launching the Spring Boot application. This will cause the application to read the settings entered in the application-vllm.properties file.

mvn spring-boot:run -Dspring-boot.run.profiles=vllm
ShellSession

Once the application runs, you can call all three endpoints implemented in the previously discussed code snippets. These endpoints are:

  • GET /persons
  • GET /persons/{id}
  • GET /wallet/with-tools

Once launched, the application can be accessed locally on port 8080.

$ curl http://localhost:8080/persons
$ curl http://localhost:8080/persons/1
$ curl http://localhost:8080/wallet/with-tools
ShellSession

Alternatively, you can deploy the Spring Boot application on OpenShift and expose it outside the cluster with the Route object. The simplest way to achieve that is through the odo CLI tool. You can find more details about odo in the following post. To deploy the app with odo, run the following command:

odo dev
ShellSession

After that, the application should be deployed in the selected namespace and available for testing on local port 20001, thanks to the port-forwarding feature.

Final Thoughts

This article demonstrates the simplest way to integrate a Java application with an AI model running on OpenShift via an OpenAI-compliant interface. Preparing and exposing such a model on OpenShift AI requires several steps, such as installing and configuring Kubernetes operators. However, KServe’s ModelCar approach standardizes the entire process, making AI models available as containers.

]]>
https://piotrminkowski.com/2025/05/12/openshift-ai-with-vllm-and-spring-ai/feed/ 2 15684
Spring AI with Azure OpenAI https://piotrminkowski.com/2025/03/25/spring-ai-with-azure-openai/ https://piotrminkowski.com/2025/03/25/spring-ai-with-azure-openai/#comments Tue, 25 Mar 2025 16:02:02 +0000 https://piotrminkowski.com/?p=15651

The post Spring AI with Azure OpenAI appeared first on Piotr's TechBlog.

]]>
This article will show you how to use Spring AI features like chat client memory, multimodality, tool calling, or embedding models with the Azure OpenAI service. Azure OpenAI is supported in almost all Spring AI use cases. Moreover, it goes beyond standard OpenAI capabilities, providing advanced AI-driven text generation and incorporating additional AI safety and responsible AI features. It also enables the integration of AI-focused resources, such as Vector Stores on Azure.

This is the eighth part of my series of articles about Spring Boot and AI. It is worth reading the following posts before proceeding with the current one. Here’s a list of articles about Spring AI on my blog with a short description:

Source Code

Feel free to use my source code if you’d like to try it out yourself. To do that, you must clone my sample GitHub repository. Then you should only follow my instructions.

Enable and Configure Azure OpenAI

Begin the exercise by creating an instance of the Azure OpenAI service. The most important setting here is the service’s name, since it becomes part of the exposed OpenAI endpoint URL. My service’s name is piomin-azure-openai.

spring-ai-azure-openai-create

The Azure OpenAI service should be exposed without restrictions to allow easy access to the Spring AI app.

After creating the service, go to its main page in the Azure Portal. It provides information about API keys and an endpoint URL. Also, you have to deploy an Azure OpenAI model to start making API calls from your Spring AI app.

Copy the key and the endpoint URL and save them for later usage.

spring-ai-azure-openai-api-key

You must create a new deployment with an AI model in the Azure AI Foundry portal. There are several available options. The Spring AI Azure OpenAI starter by default uses the gpt-4o model. If you choose another AI model, you will have to set its name in the spring.ai.azure.openai.chat.options.deployment-name Spring AI property. After selecting the preferred model, click the “Confirm” button.

spring-ai-azure-openai-deploy-model

Finally, you can deploy the model on the Azure AI Foundry portal. Choose the most suitable deployment type for your needs.

Azure allows us to deploy multiple models. You can verify a list of model deployments here:

That’s all on the Azure Portal side. Now it’s time for the implementation part in the application source code.

Enable Azure OpenAI for Spring AI

Spring AI provides the Spring Boot starter for the Azure OpenAI Chat Client. You must add the following dependency to your Maven pom.xml file. Since the sample Spring Boot application is portable across various AI models, it includes the Azure OpenAI starter only if the azure-ai profile is active. Otherwise, it uses the spring-ai-openai-spring-boot-starter library.

<profile>
  <id>azure-ai</id>
  <dependencies>
    <dependency>
      <groupId>org.springframework.ai</groupId>
      <artifactId>spring-ai-azure-openai-spring-boot-starter</artifactId>
    </dependency>
  </dependencies>
</profile>
XML

It’s time to use the key you previously copied from the Azure OpenAI service page. Let’s export it as the AZURE_OPENAI_API_KEY environment variable.

export AZURE_OPENAI_API_KEY=<YOUR_AZURE_OPENAI_API_KEY>
ShellSession

Here are the application properties dedicated to the azure-ai Spring Boot profile. The previously exported AZURE_OPENAI_API_KEY environment variable is set as the spring.ai.azure.openai.api-key property. You also must set the OpenAI service endpoint. This address depends on your Azure OpenAI service name.

spring.ai.azure.openai.api-key = ${AZURE_OPENAI_API_KEY}
spring.ai.azure.openai.endpoint = https://piomin-azure-openai.openai.azure.com/
application-azure-ai.properties
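To make the relation between the service name and the endpoint explicit, here is a small sketch that derives the URL from the name. It assumes the default domain pattern visible in the properties above (https://<service-name>.openai.azure.com/). The class and method names are hypothetical helpers, not part of Spring AI.

```java
import java.net.URI;

class AzureOpenAiEndpoint {

    // Builds the endpoint in the form used above: https://<service-name>.openai.azure.com/
    static URI forService(String serviceName) {
        if (serviceName == null || serviceName.isBlank()) {
            throw new IllegalArgumentException("serviceName must not be blank");
        }
        return URI.create("https://" + serviceName + ".openai.azure.com/");
    }

    public static void main(String[] args) {
        // Prints the endpoint for the service name used in this article.
        System.out.println(forService("piomin-azure-openai"));
    }
}
```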

To run the application and connect to your instance of the Azure OpenAI service, you must activate both the azure-ai Maven profile and the Spring Boot profile with the same name. Here’s the required command:

mvn spring-boot:run -Pazure-ai -Dspring-boot.run.profiles=azure-ai
ShellSession

Test Spring AI Features with Azure OpenAI

I described several Spring AI features in the previous articles from this series. In each section, I will briefly mention the tested feature with a fragment of the sample source code. Please refer to my previous posts for more details about each feature and its sample implementation.

Chat Client with Memory and Structured Output

Here’s the @RestController containing endpoints we will use in these tests.

@RestController
@RequestMapping("/persons")
public class PersonController {

    private final ChatClient chatClient;

    public PersonController(ChatClient.Builder chatClientBuilder,
                            ChatMemory chatMemory) {
        this.chatClient = chatClientBuilder
                .defaultAdvisors(
                        new PromptChatMemoryAdvisor(chatMemory),
                        new SimpleLoggerAdvisor())
                .build();
    }

    @GetMapping
    List<Person> findAll() {
        PromptTemplate pt = new PromptTemplate("""
                Return a current list of 10 persons if exists or generate a new list with random values.
                Each object should contain an auto-incremented id field.
                The age value should be a random number between 18 and 99.
                Do not include any explanations or additional text.
                Return data in RFC8259 compliant JSON format.
                """);

        return this.chatClient.prompt(pt.create())
                .call()
                .entity(new ParameterizedTypeReference<>() {});
    }

    @GetMapping("/{id}")
    Person findById(@PathVariable String id) {
        PromptTemplate pt = new PromptTemplate("""
                Find and return the object with id {id} in a current list of persons.
                """);
        Prompt p = pt.create(Map.of("id", id));
        return this.chatClient.prompt(p)
                .call()
                .entity(Person.class);
    }
}
Java

First, you must call the endpoint that generates a list of ten persons from different countries. Then choose one person by ID to pick it up from the chat memory. Here are the results.

spring-ai-azure-openai-test-chat-model
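As a rough mental model of why the second call can find a person from the first one, consider the simplified sketch below. It is not Spring AI's implementation. It only illustrates the idea of keeping earlier exchanges and placing them in front of the next user message. The class PromptMemorySketch and its text format are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

class PromptMemorySketch {

    // Conversation history kept in process memory (hypothetical stand-in for a chat memory store).
    private final List<String> history = new ArrayList<>();

    // Builds the text sent to the model: earlier exchanges first, then the new user message.
    String buildPrompt(String userMessage) {
        StringBuilder sb = new StringBuilder();
        if (!history.isEmpty()) {
            sb.append("Conversation so far:\n");
            for (String entry : history) {
                sb.append(entry).append('\n');
            }
        }
        sb.append("USER: ").append(userMessage);
        return sb.toString();
    }

    // Records one completed exchange so that later prompts can refer to it.
    void remember(String userMessage, String assistantReply) {
        history.add("USER: " + userMessage);
        history.add("ASSISTANT: " + assistantReply);
    }

    public static void main(String[] args) {
        PromptMemorySketch memory = new PromptMemorySketch();
        memory.remember("Return a list of persons", "[{\"id\":1,\"firstName\":\"Anna\"}]");
        System.out.println(memory.buildPrompt("Find and return the object with id 1"));
    }
}
```

Because the previously generated list travels with the new prompt, the model can answer the by-ID question without any database behind the application.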

The interesting part happens in the background. Here’s a fragment of advice context added to the prompt by Spring AI.

Tool Calling

Here’s the @RestController containing endpoints we will use in these tests. There are two tools injected into the chat client: StockTools and WalletTools. These tools read a sample stock wallet structure from a local H2 database and load the latest share prices from an online stock API.

@RestController
@RequestMapping("/wallet")
public class WalletController {

    private final ChatClient chatClient;
    private final StockTools stockTools;
    private final WalletTools walletTools;

    public WalletController(ChatClient.Builder chatClientBuilder,
                            StockTools stockTools,
                            WalletTools walletTools) {
        this.chatClient = chatClientBuilder
                .defaultAdvisors(new SimpleLoggerAdvisor())
                .build();
        this.stockTools = stockTools;
        this.walletTools = walletTools;
    }

    @GetMapping("/with-tools")
    String calculateWalletValueWithTools() {
        PromptTemplate pt = new PromptTemplate("""
        What’s the current value in dollars of my wallet based on the latest stock daily prices ?
        """);

        return this.chatClient.prompt(pt.create())
                .tools(stockTools, walletTools)
                .call()
                .content();
    }

    @GetMapping("/highest-day/{days}")
    String calculateHighestWalletValue(@PathVariable int days) {
        PromptTemplate pt = new PromptTemplate("""
        On which day during last {days} days my wallet had the highest value in dollars based on the historical daily stock prices ?
        """);

        return this.chatClient.prompt(pt.create(Map.of("days", days)))
                .tools(stockTools, walletTools)
                .call()
                .content();
    }
}
Java

You must have your API key for the Twelvedata service to run these tests. Don’t forget to export it as the STOCK_API_KEY environment variable before running the app.

export STOCK_API_KEY=<YOUR_STOCK_API_KEY>
ShellSession

The GET /wallet/with-tools endpoint calculates the current stock wallet value in dollars.

spring-ai-azure-openai-test-tool-calling

The GET /wallet/highest-day/{days} endpoint computes the value of the stock wallet for each day in a given period and identifies the day with the highest value.
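The arithmetic the model is expected to perform with the tool results can be sketched in plain Java. The example below assumes a wallet expressed as symbol-to-quantity pairs and daily closing prices per symbol. All names and sample figures are made up for illustration and do not come from the sample repository.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class WalletValueSketch {

    // Wallet value for one day: sum of quantity * closing price over all held symbols.
    // Assumes closingPrices contains a price for every symbol in the wallet.
    static double valueOn(Map<String, Integer> quantities, Map<String, Double> closingPrices) {
        double total = 0.0;
        for (Map.Entry<String, Integer> position : quantities.entrySet()) {
            total += position.getValue() * closingPrices.get(position.getKey());
        }
        return total;
    }

    // Returns the date whose closing prices give the highest wallet value.
    static String highestDay(Map<String, Integer> quantities,
                             Map<String, Map<String, Double>> pricesByDate) {
        String bestDate = null;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Map<String, Double>> day : pricesByDate.entrySet()) {
            double value = valueOn(quantities, day.getValue());
            if (value > bestValue) {
                bestValue = value;
                bestDate = day.getKey();
            }
        }
        return bestDate;
    }

    public static void main(String[] args) {
        // Hypothetical wallet and prices, used only to exercise the two methods.
        Map<String, Integer> wallet = Map.of("AAPL", 10, "MSFT", 5);
        Map<String, Map<String, Double>> prices = new LinkedHashMap<>();
        prices.put("2025-03-20", Map.of("AAPL", 200.0, "MSFT", 400.0));
        prices.put("2025-03-21", Map.of("AAPL", 210.0, "MSFT", 390.0));
        System.out.println("Best day: " + highestDay(wallet, prices));
    }
}
```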

Multimodality and Images

Here’s a part of the @RestController responsible for describing image content and generating a new image with a given item.

@RestController
@RequestMapping("/images")
public class ImageController {

    private final static Logger LOG = LoggerFactory.getLogger(ImageController.class);
    private final ObjectMapper mapper = new ObjectMapper();

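    private final ChatClient chatClient;
    private ImageModel imageModel;
    // Keeps references to images generated at runtime; generate() below appends to it
    // (requires java.util.List and java.util.ArrayList)
    private final List<Media> dynamicImages = new ArrayList<>();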

    public ImageController(ChatClient.Builder chatClientBuilder,
                           Optional<ImageModel> imageModel) {
        this.chatClient = chatClientBuilder
                .defaultAdvisors(new SimpleLoggerAdvisor())
                .build();
        imageModel.ifPresent(model -> this.imageModel = model);
    }
        
    @GetMapping("/describe/{image}")
    List<Item> describeImage(@PathVariable String image) {
        Media media = Media.builder()
                .id(image)
                .mimeType(MimeTypeUtils.IMAGE_PNG)
                .data(new ClassPathResource("images/" + image + ".png"))
                .build();
        UserMessage um = new UserMessage("""
        List all items you see on the image and define their category.
        Return items inside the JSON array in RFC8259 compliant JSON format.
        """, media);
        return this.chatClient.prompt(new Prompt(um))
                .call()
                .entity(new ParameterizedTypeReference<>() {});
    }
    
    @GetMapping(value = "/generate/{object}", produces = MediaType.IMAGE_PNG_VALUE)
    byte[] generate(@PathVariable String object) throws IOException, NotSupportedException {
        if (imageModel == null)
            throw new NotSupportedException("Image model is not supported");
        ImageResponse ir = imageModel.call(new ImagePrompt("Generate an image with " + object, ImageOptionsBuilder.builder()
                .height(1024)
                .width(1024)
                .N(1)
                .responseFormat("url")
                .build()));
        String url = ir.getResult().getOutput().getUrl();
        UrlResource resource = new UrlResource(url);
        LOG.info("Generated URL: {}", url);
        dynamicImages.add(Media.builder()
                .id(UUID.randomUUID().toString())
                .mimeType(MimeTypeUtils.IMAGE_PNG)
                .data(url)
                .build());
        return resource.getContentAsByteArray();
    }
    
}
Java

The GET /images/describe/{image} endpoint returns a structured list of items identified in a given image. It also categorizes each detected item. In this case, there are two available categories: fruits and vegetables.

spring-ai-azure-openai-test-multimodality

By the way, here’s the image described above.

The image generation feature requires a dedicated model on Azure AI. The DALL-E 2 and DALL-E 3 models on Azure support a text-to-image feature.

spring-ai-azure-openai-dalle3

The application must be aware of the model name. That’s why you must add a new property to your application properties with the following value.

spring.ai.azure.openai.image.options.deployment-name = dall-e-3
Plaintext

Then you must restart the application. After that, you can generate an image by calling the GET /images/generate/{object} endpoint. Here’s the result for the pineapple.

Enable Azure CosmosDB Vector Store

Dependency

By default, the sample Spring Boot application uses the Pinecone vector store. However, Spring AI supports two vector store services available on Azure: Azure AI Search and CosmosDB. Let’s choose CosmosDB as the vector store. You must add the following dependency to your Maven pom.xml file:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-azure-cosmos-db-store-spring-boot-starter</artifactId>
</dependency>
XML

Configuration on Azure

Then, you must create an instance of CosmosDB in your Azure account. The name of my instance is piomin-ai-cosmos.

Once it is created, you will obtain its address and API key. To do that, go to the “Settings -> Keys” menu and save both values visible below.

spring-ai-azure-openai-cosmosdb

Then, you have to create a dedicated database and container for your application. To do that, go to the “Data Explorer” tab and provide names for the database and container ID. You must also set the partition key.

All previously provided values must be set in the application properties. Export your CosmosDB API key as the AZURE_VECTORSTORE_API_KEY environment variable.

spring.ai.vectorstore.cosmosdb.endpoint = https://piomin-ai-cosmos.documents.azure.com:443/
spring.ai.vectorstore.cosmosdb.key = ${AZURE_VECTORSTORE_API_KEY}
spring.ai.vectorstore.cosmosdb.databaseName = spring-ai
spring.ai.vectorstore.cosmosdb.containerName = spring-ai
spring.ai.vectorstore.cosmosdb.partitionKeyPath = /id
application-azure-ai.properties

Unfortunately, there are still some issues with the Azure CosmosDB support in the Spring AI M6 milestone version. They appear to be fixed in the SNAPSHOT version. So, if you want to test it yourself, you will have to switch from milestones to snapshots.

<properties>
  <java.version>21</java.version>
  <spring-ai.version>1.0.0-SNAPSHOT</spring-ai.version>
</properties>
  
<repositories>
  <repository>
    <name>Central Portal Snapshots</name>
    <id>central-portal-snapshots</id>
    <url>https://central.sonatype.com/repository/maven-snapshots/</url>
    <releases>
      <enabled>false</enabled>
    </releases>
    <snapshots>
      <enabled>true</enabled>
    </snapshots>
  </repository>
  <repository>
    <id>spring-snapshots</id>
    <name>Spring Snapshots</name>
    <url>https://repo.spring.io/snapshot</url>
    <releases>
      <enabled>false</enabled>
    </releases>
    <snapshots>
      <enabled>true</enabled>
    </snapshots>
  </repository>
</repositories>
XML

Run and Test the Application

After those changes, you can start the application with the following command:

mvn spring-boot:run -Pazure-ai -Dspring-boot.run.profiles=azure-ai
ShellSession

Once the application is running, you can test the following @RestController that offers RAG functionality. The GET /stocks/load-data endpoint obtains stock prices of given companies and puts them in the vector store. The GET /stocks/v2/most-growth-trend uses the RetrievalAugmentationAdvisor instance to retrieve the most suitable data and include it in the user query.

@RestController
@RequestMapping("/stocks")
public class StockController {

    private final ObjectMapper mapper = new ObjectMapper();
    private final static Logger LOG = LoggerFactory.getLogger(StockController.class);
    private final ChatClient chatClient;
    private final RewriteQueryTransformer.Builder rqtBuilder;
    private final RestTemplate restTemplate;
    private final VectorStore store;

    @Value("${STOCK_API_KEY:none}")
    private String apiKey;

    public StockController(ChatClient.Builder chatClientBuilder,
                           VectorStore store,
                           RestTemplate restTemplate) {
        this.chatClient = chatClientBuilder
                .defaultAdvisors(new SimpleLoggerAdvisor())
                .build();
        this.rqtBuilder = RewriteQueryTransformer.builder()
                .chatClientBuilder(chatClientBuilder);
        this.store = store;
        this.restTemplate = restTemplate;
    }

    @GetMapping("/load-data")
    void load() throws JsonProcessingException {
        final List<String> companies = List.of("AAPL", "MSFT", "GOOG", "AMZN", "META", "NVDA");
        for (String company : companies) {
            StockData data = restTemplate.getForObject("https://api.twelvedata.com/time_series?symbol={0}&interval=1day&outputsize=10&apikey={1}",
                    StockData.class,
                    company,
                    apiKey);
            if (data != null && data.getValues() != null) {
                var list = data.getValues().stream().map(DailyStockData::getClose).toList();
                var doc = Document.builder()
                        .id(company)
                        .text(mapper.writeValueAsString(new Stock(company, list)))
                        .build();
                store.add(List.of(doc));
                LOG.info("Document added: {}", company);
            }
        }
    }

    @GetMapping("/v2/most-growth-trend")
    String getBestTrendV2() {
        PromptTemplate pt = new PromptTemplate("""
                {query}.
                Which {target} is the most % growth?
                The 0 element in the prices table is the latest price, while the last element is the oldest price.
                """);

        Prompt p = pt.create(Map.of("query", "Find the most growth trends", "target", "share"));

        Advisor retrievalAugmentationAdvisor = RetrievalAugmentationAdvisor.builder()
                .documentRetriever(VectorStoreDocumentRetriever.builder()
                        .similarityThreshold(0.7)
                        .topK(3)
                        .vectorStore(store)
                        .build())
                .queryTransformers(rqtBuilder.promptTemplate(pt).build())
                .build();

        return this.chatClient.prompt(p)
                .advisors(retrievalAugmentationAdvisor)
                .call()
                .content();
    }

}
Java
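The similarityThreshold(0.7) and topK(3) settings are easier to reason about with a toy example. The sketch below assumes cosine similarity and tiny two-dimensional vectors. A real vector store works on model-generated embeddings and may score them differently. The class SimilaritySearchSketch is hypothetical, and the code needs Java 16 or newer because of the local record.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class SimilaritySearchSketch {

    // Cosine similarity of two vectors of equal length.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Keeps documents scoring at least `threshold`, then returns the ids of the `topK` best ones.
    static List<String> retrieve(double[] query, Map<String, double[]> docs, double threshold, int topK) {
        record Scored(String id, double score) {}
        List<Scored> scored = new ArrayList<>();
        for (Map.Entry<String, double[]> doc : docs.entrySet()) {
            double score = cosine(query, doc.getValue());
            if (score >= threshold) {
                scored.add(new Scored(doc.getKey(), score));
            }
        }
        scored.sort(Comparator.comparingDouble(Scored::score).reversed());
        return scored.stream().limit(topK).map(Scored::id).toList();
    }

    public static void main(String[] args) {
        // Hypothetical document vectors; only A and B are close enough to the query.
        Map<String, double[]> docs = Map.of(
                "A", new double[]{1.0, 0.0},
                "B", new double[]{0.8, 0.6},
                "C", new double[]{0.0, 1.0},
                "D", new double[]{0.6, 0.8});
        System.out.println(retrieve(new double[]{1.0, 0.0}, docs, 0.7, 3));
    }
}
```

A higher threshold returns fewer but more relevant documents, while topK caps how much context is added to the prompt.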

Finally, you can call the following two endpoints.

$ curl http://localhost:8080/stocks/load-data
$ curl http://localhost:8080/stocks/v2/most-growth-trend
ShellSession
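To sanity-check the model's answer, you can compute the growth yourself. Here is a minimal sketch that follows the convention from the prompt: element 0 is the latest price and the last element is the oldest. The class name and the sample prices are assumptions for illustration only.

```java
import java.util.List;
import java.util.Map;

class GrowthSketch {

    // Percentage growth between the oldest price (last element) and the latest price (element 0).
    static double growthPercent(List<Double> prices) {
        double latest = prices.get(0);
        double oldest = prices.get(prices.size() - 1);
        return (latest - oldest) * 100.0 / oldest;
    }

    // Returns the symbol with the highest percentage growth.
    static String mostGrowth(Map<String, List<Double>> pricesBySymbol) {
        String best = null;
        double bestGrowth = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, List<Double>> entry : pricesBySymbol.entrySet()) {
            double growth = growthPercent(entry.getValue());
            if (growth > bestGrowth) {
                bestGrowth = growth;
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical closing prices, latest first.
        Map<String, List<Double>> prices = Map.of(
                "AAPL", List.of(110.0, 105.0, 100.0),
                "MSFT", List.of(60.0, 55.0, 50.0));
        System.out.println("Most growth: " + mostGrowth(prices));
    }
}
```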

Final Thoughts

This exercise shows how to modify an existing Spring Boot AI application to integrate it with the Azure OpenAI service. It also gives a recipe on how to include Azure CosmosDB as a vector store for RAG scenarios and similarity searches.

]]>
https://piotrminkowski.com/2025/03/25/spring-ai-with-azure-openai/feed/ 4 15651
Using Model Context Protocol (MCP) with Spring AI https://piotrminkowski.com/2025/03/17/using-model-context-protocol-mcp-with-spring-ai/ https://piotrminkowski.com/2025/03/17/using-model-context-protocol-mcp-with-spring-ai/#comments Mon, 17 Mar 2025 16:17:32 +0000 https://piotrminkowski.com/?p=15608

The post Using Model Context Protocol (MCP) with Spring AI appeared first on Piotr's TechBlog.

]]>
This article will show how to use Spring AI support for MCP (Model Context Protocol) in Spring Boot server-side and client-side applications. You will learn how to serve tools and prompts on the server side and discover them on the client-side Spring AI application. The Model Context Protocol is a standard for managing contextual interactions with AI models. It provides a standardized way to connect AI models to external data sources and tools. It can help with building complex workflows on top of LLMs. Spring AI MCP extends the MCP Java SDK and provides client and server Spring Boot starters. The MCP Client is responsible for establishing and managing connections with MCP servers.

This is the seventh part of my series of articles about Spring Boot and AI. It is worth reading the following posts before proceeding with the current one. Please pay special attention to the last article from the list about the tool calling feature since we will implement it in our sample client and server apps using MCP.

  1. https://piotrminkowski.com/2025/01/28/getting-started-with-spring-ai-and-chat-model: The first tutorial introduces the Spring AI project and its support for building applications based on chat models like OpenAI or Mistral AI.
  2. https://piotrminkowski.com/2025/01/30/getting-started-with-spring-ai-function-calling: The second tutorial shows Spring AI support for Java function calling with the OpenAI chat model.
  3. https://piotrminkowski.com/2025/02/24/using-rag-and-vector-store-with-spring-ai: The third tutorial shows Spring AI support for RAG (Retrieval Augmented Generation) and vector store.
  4. https://piotrminkowski.com/2025/03/04/spring-ai-with-multimodality-and-images: The fourth tutorial shows Spring AI support for the multimodality feature and image generation.
  5. https://piotrminkowski.com/2025/03/10/using-ollama-with-spring-ai: The fifth tutorial shows Spring AI support for interactions with AI models run with Ollama.
  6. https://piotrminkowski.com/2025/03/13/tool-calling-with-spring-ai: The sixth tutorial shows Spring AI support for the Tool Calling feature.

Source Code

Feel free to use my source code if you’d like to try it out yourself. To do that, you must clone my sample GitHub repository. Then you should only follow my instructions.

Motivation for MCP with Spring AI

MCP introduces an interesting concept for applications interacting with AI models. With MCP, an application can provide specific tools/functions to other services that need the data it exposes. Additionally, it can expose prompt templates and resources. Thanks to that, we don’t need to implement AI tools/functions inside every client service. Instead, each client integrates with the application that exposes the tools over MCP.

The best way to analyze the MCP concept is through an example. Let’s consider an application that connects to a database and exposes data through REST endpoints. If we want to use that data in our AI application, we have to implement and register AI tools that retrieve the data by calling those REST endpoints. So, each client-side application that needs data from the source service would have to implement its own set of AI tools locally. This is where MCP comes in. The source service defines and exposes AI tools/functions in a standardized form. All other apps that need to provide data to AI models can load and use that predefined set of tools.

The following diagram illustrates our scenario. Two Spring Boot applications act as MCP servers. They connect to the database and use Spring AI MCP Server support to expose @Tool methods to the MCP client-side app. The client-side app communicates with the OpenAI model. It includes the tools exposed by the server-side apps in the user query to the AI model. The person-mcp-service app provides @Tool methods for searching persons in the database table. The account-mcp-service app does the same for the persons’ accounts.

spring-ai-mcp-arch

Build MCP Server App with Spring AI

Let’s begin with the implementation of applications that act as MCP servers. They both run and use an in-memory H2 database. To interact with a database we include the Spring Data JPA module. Spring AI allows us to switch between three transport types: STDIO, Spring MVC, and Spring WebFlux. MCP Server with Spring WebFlux supports Server-Sent Events (SSE) and an optional STDIO transport. Here’s a list of required Maven dependencies:

<dependencies>
  <dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-mcp-server-webflux-spring-boot-starter</artifactId>
  </dependency>
  <dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-jpa</artifactId>
  </dependency>
  <dependency>
    <groupId>com.h2database</groupId>
    <artifactId>h2</artifactId>
    <scope>runtime</scope>
  </dependency>
</dependencies>
XML

Create the Person MCP Server

Here’s an @Entity class for interacting with the person table:

@Entity
public class Person {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;
    private String firstName;
    private String lastName;
    private int age;
    private String nationality;
    @Enumerated(EnumType.STRING)
    private Gender gender;
    
    // ... getters and setters
    
}
Java

The Spring Data Repository interface contains a single method for searching persons by their nationality:

public interface PersonRepository extends CrudRepository<Person, Long> {
    List<Person> findByNationality(String nationality);
}
Java

The PersonTools @Service bean contains two Spring AI @Tool methods. It injects the PersonRepository bean to interact with the H2 database. The getPersonById method returns a single person with a specific ID field, while the getPersonsByNationality returns a list of all persons with a given nationality.

@Service
public class PersonTools {

    private final PersonRepository personRepository;

    public PersonTools(PersonRepository personRepository) {
        this.personRepository = personRepository;
    }

    @Tool(description = "Find person by ID")
    public Person getPersonById(
            @ToolParam(description = "Person ID") Long id) {
        return personRepository.findById(id).orElse(null);
    }

    @Tool(description = "Find all persons by nationality")
    public List<Person> getPersonsByNationality(
            @ToolParam(description = "Nationality") String nationality) {
        return personRepository.findByNationality(nationality);
    }
    
}
Java

Once we define @Tool methods, we must register them within the Spring AI MCP server. We can use the ToolCallbackProvider bean for that. More specifically, the MethodToolCallbackProvider class provides a builder that creates an instance of the ToolCallbackProvider class with a list of references to objects with @Tool methods.

@SpringBootApplication
public class PersonMCPServer {

    public static void main(String[] args) {
        SpringApplication.run(PersonMCPServer.class, args);
    }

    @Bean
    public ToolCallbackProvider tools(PersonTools personTools) {
        return MethodToolCallbackProvider.builder()
                .toolObjects(personTools)
                .build();
    }

}
Java
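To get an intuition for what registering tool objects involves, here is a simplified reflection-based sketch. It is not how MethodToolCallbackProvider is implemented. It only shows the general idea of finding annotated methods on an object. The DemoTool annotation and the DemoPersonTools class are hypothetical stand-ins for @Tool and PersonTools.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

class ToolScanSketch {

    // Hypothetical stand-in for Spring AI's @Tool annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface DemoTool {
        String description();
    }

    // Hypothetical tool holder, loosely modeled on PersonTools but without a database.
    static class DemoPersonTools {
        @DemoTool(description = "Find person by ID")
        public String getPersonById(Long id) {
            return "person-" + id;
        }

        @DemoTool(description = "Find all persons by nationality")
        public String getPersonsByNationality(String nationality) {
            return "persons-from-" + nationality;
        }

        // Not annotated, so the scan below skips it.
        public String notATool() {
            return "ignored";
        }
    }

    // Collects tool name -> description for every annotated method of the given object.
    static Map<String, String> scan(Object toolObject) {
        Map<String, String> found = new TreeMap<>();
        for (Method method : toolObject.getClass().getDeclaredMethods()) {
            DemoTool tool = method.getAnnotation(DemoTool.class);
            if (tool != null) {
                found.put(method.getName(), tool.description());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        scan(new DemoPersonTools())
                .forEach((name, description) -> System.out.println(name + ": " + description));
    }
}
```

The name and description pairs collected this way are the kind of metadata a model needs to decide which tool to call.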

Finally, we must provide configuration properties. The person-mcp-server app will listen on port 8060. We should also set the name and version of the MCP server embedded in our application.

spring:
  ai:
    mcp:
      server:
        name: person-mcp-server
        version: 1.0.0
  jpa:
    database-platform: H2
    generate-ddl: true
    hibernate:
      ddl-auto: create-drop

logging.level.org.springframework.ai: DEBUG

server.port: 8060
YAML

That’s all. We can start the application.

$ cd spring-ai-mcp/person-mcp-service
$ mvn spring-boot:run
ShellSession

Create the Account MCP Server

Then, we will do very similar things in the second application that acts as an MCP server. Here’s the @Entity class for interacting with the account table:

@Entity
public class Account {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;
    private String number;
    private int balance;
    private Long personId;
    
    // ... getters and setters
    
}
Java

The Spring Data Repository interface contains a single method for searching accounts belonging to a given person:

public interface AccountRepository extends CrudRepository<Account, Long> {
    List<Account> findByPersonId(Long personId);
}
Java

The AccountTools @Service bean contains a single Spring AI @Tool method. It injects the AccountRepository bean to interact with the H2 database. The getAccountsByPersonId method returns a list of accounts owned by the person with a specified ID field value.

@Service
public class AccountTools {

    private final AccountRepository accountRepository;

    public AccountTools(AccountRepository accountRepository) {
        this.accountRepository = accountRepository;
    }

    @Tool(description = "Find all accounts by person ID")
    public List<Account> getAccountsByPersonId(
            @ToolParam(description = "Person ID") Long personId) {
        return accountRepository.findByPersonId(personId);
    }
}
Java

Of course, the account-mcp-server application will use ToolCallbackProvider to register @Tool methods defined inside the AccountTools class.

@SpringBootApplication
public class AccountMCPService {

    public static void main(String[] args) {
        SpringApplication.run(AccountMCPService.class, args);
    }

    @Bean
    public ToolCallbackProvider tools(AccountTools accountTools) {
        return MethodToolCallbackProvider.builder()
                .toolObjects(accountTools)
                .build();
    }
    
}
Java

Here are the application configuration properties. The account-mcp-server app will listen on port 8040.

spring:
  ai:
    mcp:
      server:
        name: account-mcp-server
        version: 1.0.0
  jpa:
    database-platform: H2
    generate-ddl: true
    hibernate:
      ddl-auto: create-drop

logging.level.org.springframework.ai: DEBUG

server.port: 8040
YAML

Let’s run the second server-side app:

$ cd spring-ai-mcp/account-mcp-service
$ mvn spring-boot:run
ShellSession

Once we start the application, we should see the log indicating how many tools were registered in the MCP server.

spring-ai-mcp-app

Build MCP Client App with Spring AI

Implementation

We will create a single client-side application. However, we can imagine an architecture where many applications consume tools exposed by one MCP server. Our application interacts with the OpenAI chat model, so we must include the Spring AI OpenAI starter. For the MCP Client starter, we can choose between two dependencies: the standard MCP client and the Spring WebFlux client. The Spring team recommends using the WebFlux-based SSE connection with the spring-ai-mcp-client-webflux-spring-boot-starter. Finally, we include the Spring Web starter to expose the REST endpoints. Alternatively, you can use the Spring WebFlux starter to expose them reactively.

<dependencies>
  <dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
  </dependency>
  <dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-mcp-client-webflux-spring-boot-starter</artifactId>
  </dependency>
  <dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
  </dependency>
</dependencies>
XML

Our MCP client connects with two MCP servers. We must provide the following connection settings in the application.yml file.

spring.ai.mcp.client.sse.connections:
  person-mcp-server:
    url: http://localhost:8060
  account-mcp-server:
    url: http://localhost:8040
YAML

Our sample Spring Boot application contains two @RestControllers, which expose HTTP endpoints. The PersonController class defines two endpoints for searching and counting persons by nationality. The MCP Client Boot Starter automatically configures tool callbacks that integrate with Spring AI’s tool execution framework. Thanks to that, we can use the ToolCallbackProvider instance to provide default tools to the ChatClient bean. Then, we can perform the standard steps to interact with the AI model through the Spring AI ChatClient. This time, however, the client will use the tools exposed by both sample MCP servers.

@RestController
@RequestMapping("/persons")
public class PersonController {

    private final static Logger LOG = LoggerFactory
        .getLogger(PersonController.class);
    private final ChatClient chatClient;

    public PersonController(ChatClient.Builder chatClientBuilder,
                            ToolCallbackProvider tools) {
        this.chatClient = chatClientBuilder
                .defaultTools(tools)
                .build();
    }

    @GetMapping("/nationality/{nationality}")
    String findByNationality(@PathVariable String nationality) {

        PromptTemplate pt = new PromptTemplate("""
                Find persons with {nationality} nationality.
                """);
        Prompt p = pt.create(Map.of("nationality", nationality));
        return this.chatClient.prompt(p)
                .call()
                .content();
    }

    @GetMapping("/count-by-nationality/{nationality}")
    String countByNationality(@PathVariable String nationality) {
        PromptTemplate pt = new PromptTemplate("""
                How many persons come from {nationality} ?
                """);
        Prompt p = pt.create(Map.of("nationality", nationality));
        return this.chatClient.prompt(p)
                .call()
                .content();
    }
}
Java
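To make the template step more concrete, here is a minimal plain-Java sketch of the placeholder substitution that happens before the prompt text reaches the model. It is not Spring AI's PromptTemplate implementation; the PromptTemplateSketch class and its render method are hypothetical names used only for illustration.

```java
import java.util.Map;

class PromptTemplateSketch {

    // Replaces every {name} placeholder with the matching value from the map.
    // Assumption: a simplified stand-in for a prompt template, not the Spring AI code.
    static String render(String template, Map<String, String> variables) {
        String result = template;
        for (Map.Entry<String, String> variable : variables.entrySet()) {
            result = result.replace("{" + variable.getKey() + "}", variable.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = render("Find persons with {nationality} nationality.",
                Map.of("nationality", "Germany"));
        System.out.println(text); // prints "Find persons with Germany nationality."
    }
}
```

The rendered sentence is all the model sees from our side. Which tool it calls to answer the question depends on the tool descriptions published by the MCP servers.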

Let’s switch to the second @RestController. The AccountController class defines two endpoints for searching accounts by person ID. The GET /accounts/count-by-person-id/{personId} endpoint returns the number of accounts belonging to a given person. The GET /accounts/balance-by-person-id/{personId} endpoint is slightly more complex. It calculates the total balance across all of the person’s accounts. However, it must also return the person’s name and nationality, which means that it must call the getPersonById tool method exposed by the person-mcp-server app after calling the tool for searching accounts by person ID.

@RestController
@RequestMapping("/accounts")
public class AccountController {

    private final static Logger LOG = LoggerFactory.getLogger(AccountController.class);
    private final ChatClient chatClient;

    public AccountController(ChatClient.Builder chatClientBuilder,
                            ToolCallbackProvider tools) {
        this.chatClient = chatClientBuilder
                .defaultTools(tools)
                .build();
    }

    @GetMapping("/count-by-person-id/{personId}")
    String countByPersonId(@PathVariable String personId) {
        PromptTemplate pt = new PromptTemplate("""
                How many accounts has person with {personId} ID ?
                """);
        Prompt p = pt.create(Map.of("personId", personId));
        return this.chatClient.prompt(p)
                .call()
                .content();
    }

    @GetMapping("/balance-by-person-id/{personId}")
    String balanceByPersonId(@PathVariable String personId) {
        PromptTemplate pt = new PromptTemplate("""
                How many accounts has person with {personId} ID ?
                Return person name, nationality and a total balance on his/her accounts.
                """);
        Prompt p = pt.create(Map.of("personId", personId));
        return this.chatClient.prompt(p)
                .call()
                .content();
    }

}
Java

Running the Application

Before starting the client-side app we must export the OpenAI token as the SPRING_AI_OPENAI_API_KEY environment variable.

export SPRING_AI_OPENAI_API_KEY=<YOUR_OPENAI_API_KEY>
ShellSession

Then go to the sample-client directory and run the app with the following command:

$ cd spring-ai-mcp/sample-client
$ mvn spring-boot:run
ShellSession

Once we start the application, we can switch to the logs. As you see, the sample-client app receives responses with tools from both person-mcp-server and account-mcp-server apps.

Testing MCP with Spring Boot

Both server-side applications load data from the import.sql scripts on startup. Spring Data JPA automatically imports data from such scripts. Our MCP client application listens on the 8080 port. Let’s call the first endpoint to get a list of persons from Germany:

curl http://localhost:8080/persons/nationality/Germany
ShellSession

Here’s the response from the OpenAI model:

spring-ai-mcp-result

We can also call the endpoint that counts the number of persons with a given nationality.

curl http://localhost:8080/persons/count-by-nationality/Germany
ShellSession

As the final test, we can call the GET /accounts/balance-by-person-id/{personId} endpoint that interacts with tools exposed by both MCP server-side apps. It requires an AI model to combine data from person and account sources.
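The article does not show the request for this final test. As a sketch, it could be issued from plain Java with the JDK HTTP client. The person ID 1 is an assumption (any ID present in the import.sql data should work), and the BalanceEndpointCall class name is made up for this example.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class BalanceEndpointCall {

    // Builds GET /accounts/balance-by-person-id/{personId} against the client app.
    static HttpRequest balanceRequest(String baseUrl, long personId) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/accounts/balance-by-person-id/" + personId))
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Assumption: person ID 1 exists in the sample data.
        HttpRequest request = balanceRequest("http://localhost:8080", 1L);
        System.out.println("Request URI: " + request.uri());

        // Sending requires the client app and both MCP servers to be running,
        // so the call is made only when "send" is passed as the first argument.
        if (args.length > 0 && "send".equals(args[0])) {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.body());
        }
    }
}
```

The response body is free text produced by the model, so its exact wording may differ between runs.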

Exposing Prompts with MCP

We can also expose prompts and resources with the Spring AI MCP server support. To register and expose prompts we need to define the list of SyncPromptRegistration objects. Each registration contains the name of the prompt, a list of input arguments, and the text content.

@SpringBootApplication
public class PersonMCPServer {

    public static void main(String[] args) {
        SpringApplication.run(PersonMCPServer.class, args);
    }

    @Bean
    public ToolCallbackProvider tools(PersonTools personTools) {
        return MethodToolCallbackProvider.builder()
                .toolObjects(personTools)
                .build();
    }

    @Bean
    public List<McpServerFeatures.SyncPromptRegistration> prompts() {
        var prompt = new McpSchema.Prompt("persons-by-nationality", "Get persons by nationality",
                List.of(new McpSchema.PromptArgument("nationality", "Person nationality", true)));

        var promptRegistration = new McpServerFeatures.SyncPromptRegistration(prompt, getPromptRequest -> {
            String argument = (String) getPromptRequest.arguments().get("nationality");
            var userMessage = new McpSchema.PromptMessage(McpSchema.Role.USER,
                    new McpSchema.TextContent("How many persons come from " + argument + " ?"));
            return new McpSchema.GetPromptResult("Count persons by nationality", List.of(userMessage));
        });

        return List.of(promptRegistration);
    }
}
Java

After startup, the application prints information about a list of registered prompts in the logs.

There is no built-in Spring AI support for loading prompts using the MCP client. However, Spring AI MCP support is under active development so we may expect some new features soon. For now, Spring AI provides the auto-configured instance of McpSyncClient. We can use it to search the prompt in the list of prompts received from the server. Then, we can prepare the PromptTemplate instance using the registered content and create the Prompt by filling the template with the input parameters.

@RestController
@RequestMapping("/persons")
public class PersonController {

    private final static Logger LOG = LoggerFactory
        .getLogger(PersonController.class);
    private final ChatClient chatClient;
    private final List<McpSyncClient> mcpSyncClients;

    public PersonController(ChatClient.Builder chatClientBuilder,
                            ToolCallbackProvider tools,
                            List<McpSyncClient> mcpSyncClients) {
        this.chatClient = chatClientBuilder
                .defaultTools(tools)
                .build();
        this.mcpSyncClients = mcpSyncClients;
    }

    // ... other endpoints
    
    @GetMapping("/count-by-nationality-from-client/{nationality}")
    String countByNationalityFromClient(@PathVariable String nationality) {
        return this.chatClient
                .prompt(loadPromptByName("persons-by-nationality", nationality))
                .call()
                .content();
    }

    Prompt loadPromptByName(String name, String nationality) {
        McpSchema.GetPromptRequest r = new McpSchema
            .GetPromptRequest(name, Map.of("nationality", nationality));
        var client = mcpSyncClients.stream()
                .filter(c -> c.getServerInfo().name().equals("person-mcp-server"))
                .findFirst();
        if (client.isPresent()) {
            var content = (McpSchema.TextContent) client.get() 
                .getPrompt(r)
                .messages()
                .getFirst()
                .content();
            PromptTemplate pt = new PromptTemplate(content.text());
            Prompt p = pt.create(Map.of("nationality", nationality));
            LOG.info("Prompt: {}", p);
            return p;
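        } else {
            // Fail fast instead of passing a null prompt to the chat client
            throw new IllegalStateException("MCP client for person-mcp-server not found");
        }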
    }
}
Java

Final Thoughts

Model Context Protocol is an important initiative in the AI world. It allows us to avoid reinventing the wheel for each new data source. A unified protocol streamlines integration, minimizing development time and complexity. As businesses expand their AI toolsets, MCP enables seamless connectivity across multiple systems without the burden of excessive custom code. Spring AI introduced the initial version of MCP support recently. It seems promising. With Spring AI Client and Server starters, we may implement a distributed architecture, where several different apps use the AI tools exposed by a single service.

The post Using Model Context Protocol (MCP) with Spring AI appeared first on Piotr's TechBlog.

]]>
https://piotrminkowski.com/2025/03/17/using-model-context-protocol-mcp-with-spring-ai/feed/ 18 15608
Tool Calling with Spring AI https://piotrminkowski.com/2025/03/13/tool-calling-with-spring-ai/ https://piotrminkowski.com/2025/03/13/tool-calling-with-spring-ai/#comments Thu, 13 Mar 2025 15:55:40 +0000 https://piotrminkowski.com/?p=15596 This article will show you how to use Spring AI support with the most popular AI models for the tool calling feature. Tool calling (or function calling), is a common pattern in AI applications that enables a model to interact with APIs or tools, extending its capabilities. The most popular AI models are trained to […]

The post Tool Calling with Spring AI appeared first on Piotr's TechBlog.

]]>
This article will show you how to use Spring AI support with the most popular AI models for the tool calling feature. Tool calling (or function calling) is a common pattern in AI applications that enables a model to interact with APIs or tools, extending its capabilities. The most popular AI models are trained to know when to call a function. Spring AI formerly supported it through the Function Calling API, which has been deprecated and marked for removal in the next release. My previous article described that feature based on interactions with an internal database and an external market stock API. Today, we will consider the same use case. This time, however, we will replace the deprecated Function Calling API with the new tool calling feature.

This is the sixth part of my series of articles about Spring Boot and AI. It is worth reading the following posts before proceeding with the current one. Please pay special attention to the second article. I will refer to it often in this article.

  1. https://piotrminkowski.com/2025/01/28/getting-started-with-spring-ai-and-chat-model: The first tutorial introduces the Spring AI project and its support for building applications based on chat models like OpenAI or Mistral AI.
  2. https://piotrminkowski.com/2025/01/30/getting-started-with-spring-ai-function-calling: The second tutorial shows Spring AI support for Java function calling with the OpenAI chat model.
  3. https://piotrminkowski.com/2025/02/24/using-rag-and-vector-store-with-spring-ai: The third tutorial shows Spring AI support for RAG (Retrieval Augmented Generation) and vector store.
  4. https://piotrminkowski.com/2025/03/04/spring-ai-with-multimodality-and-images: The fourth tutorial shows Spring AI support for a multimodality feature and image generation.
  5. https://piotrminkowski.com/2025/03/10/using-ollama-with-spring-ai: The fifth tutorial shows Spring AI support for interactions with AI models run with Ollama.

Source Code

Feel free to use my source code if you’d like to try it out yourself. To do that, you must clone my sample GitHub repository. Then you should only follow my instructions.

Motivation for Tool Calling in Spring AI

The tool calling feature helps us solve a common AI model challenge related to internal or live data sources. If we want to augment a model with such data our applications must allow it to interact with a set of APIs or tools. In our case, the internal database (H2) contains information about the structure of our stock wallet. The sample Spring Boot application asks an AI model about the total value of the wallet based on daily stock prices or the highest value for the last few days. The model must retrieve the structure of our stock wallet and the latest stock prices. We will do the same exercise as for a function calling feature. It will be enhanced with additional scenarios I’ll describe later.

Use the Calling Tools Feature in Spring AI

Create WalletTools

Let’s begin with the WalletTools implementation, which is responsible for interaction with a database. We can compare it to the previous implementation based on Spring functions available in the pl.piomin.services.functions.stock.WalletService class. It defines a single method annotated with @Tool. The important element is the right description that must inform the model what that method does. The method returns the number of shares for each company in our portfolio retrieved from the database through the Spring Data @Repository.

public class WalletTools {

    private WalletRepository walletRepository;

    public WalletTools(WalletRepository walletRepository) {
        this.walletRepository = walletRepository;
    }

    @Tool(description = "Number of shares for each company in my wallet")
    public List<Share> getNumberOfShares() {
        return (List<Share>) walletRepository.findAll();
    }
}
Java
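To illustrate why the description matters, the following plain-Java sketch shows how annotated methods can be discovered through reflection and turned into a name-to-description listing, which is roughly the information a model receives about each tool. This is not how Spring AI is implemented internally. The DescribedTool annotation, the WalletToolsStub class, and the describeTools method are hypothetical stand-ins so that the example runs without Spring.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

class ToolDescriptionSketch {

    // Hypothetical stand-in for Spring AI's @Tool annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface DescribedTool {
        String description();
    }

    // Simplified tool class; the real WalletTools reads shares from a repository.
    static class WalletToolsStub {
        @DescribedTool(description = "Number of shares for each company in my wallet")
        public Map<String, Integer> getNumberOfShares() {
            return Map.of("AAPL", 100, "AMZN", 300);
        }

        // Not annotated, so it is not advertised as a tool.
        public String helper() {
            return "ignored";
        }
    }

    // Collects method name -> description for every annotated method of the type.
    static Map<String, String> describeTools(Class<?> type) {
        Map<String, String> result = new TreeMap<>();
        for (Method method : type.getDeclaredMethods()) {
            DescribedTool tool = method.getAnnotation(DescribedTool.class);
            if (tool != null) {
                result.put(method.getName(), tool.description());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        describeTools(WalletToolsStub.class)
                .forEach((name, description) -> System.out.println(name + ": " + description));
    }
}
```

Only the annotated method shows up in the listing, and the description is the only hint the model gets about what the method is for. A vague description makes it harder for the model to pick the right tool.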

We can register the WalletTools class as a Spring @Bean in the application main class.

@Bean
public WalletTools walletTools(WalletRepository walletRepository) {
   return new WalletTools(walletRepository);
}
Java

The Spring Boot application launches an embedded, in-memory database and inserts test data into the share table. Our wallet contains the most popular companies on the U.S. stock market, including Amazon, Meta, and Microsoft.

insert into share(id, company, quantity) values (1, 'AAPL', 100);
insert into share(id, company, quantity) values (2, 'AMZN', 300);
insert into share(id, company, quantity) values (3, 'META', 300);
insert into share(id, company, quantity) values (4, 'MSFT', 400);
insert into share(id, company, quantity) values (5, 'NVDA', 200);
SQL

Create StockTools

The StockTools class is responsible for interaction with the TwelveData stock API. It defines two methods. The getLatestStockPrices method returns only the latest close price for a specified company. It is a tool calling version of the method provided within the pl.piomin.services.functions.stock.StockService function. The second method is more complicated. It must return historical daily close prices for a defined number of days. Each price must be correlated with a quotation date.

public class StockTools {

    private static final Logger LOG = LoggerFactory.getLogger(StockTools.class);

    private RestTemplate restTemplate;
    @Value("${STOCK_API_KEY:none}")
    String apiKey;

    public StockTools(RestTemplate restTemplate) {
        this.restTemplate = restTemplate;
    }

    @Tool(description = "Latest stock prices")
    public StockResponse getLatestStockPrices(@ToolParam(description = "Name of company") String company) {
        StockData data = restTemplate.getForObject("https://api.twelvedata.com/time_series?symbol={0}&interval=1min&outputsize=1&apikey={1}",
                StockData.class,
                company,
                apiKey);
        DailyStockData latestData = data.getValues().get(0);
        LOG.info("Get stock prices: {} -> {}", company, latestData.getClose());
        return new StockResponse(Float.parseFloat(latestData.getClose()));
    }

    @Tool(description = "Historical daily stock prices")
    public List<DailyShareQuote> getHistoricalStockPrices(@ToolParam(description = "Search period in days") int days,
                                                          @ToolParam(description = "Name of company") String company) {
        StockData data = restTemplate.getForObject("https://api.twelvedata.com/time_series?symbol={0}&interval=1day&outputsize={1}&apikey={2}",
                StockData.class,
                company,
                days,
                apiKey);
        return data.getValues().stream()
                .map(d -> new DailyShareQuote(company, Float.parseFloat(d.getClose()), d.getDatetime()))
                .toList();
    }
}
Java

Here’s the DailyShareQuote Java record returned in the response list.

public record DailyShareQuote(String company, float price, String datetime) {
}
Java

Then, let’s register the StockTools class as a Spring @Bean.

@Bean
public StockTools stockTools() {
   return new StockTools(restTemplate());
}
Java

Spring AI Tool Calling Flow

Here’s a fragment of the WalletController code, which is responsible for defining interactions with LLM and HTTP endpoints implementation. It injects both StockTools and WalletTools beans.

@RestController
@RequestMapping("/wallet")
public class WalletController {

    private final ChatClient chatClient;
    private final StockTools stockTools;
    private final WalletTools walletTools;

    public WalletController(ChatClient.Builder chatClientBuilder,
                            StockTools stockTools,
                            WalletTools walletTools) {
        this.chatClient = chatClientBuilder
                .defaultAdvisors(new SimpleLoggerAdvisor())
                .build();
        this.stockTools = stockTools;
        this.walletTools = walletTools;
    }
    
    // HTTP endpoints implementation
}
Java

The GET /wallet/with-tools endpoint calculates the value of our stock wallet in dollars. It uses the latest daily stock prices for each company’s shares from the wallet. There are a few ways to register tools for a chat model call. We use the tools method provided by the ChatClient interface. It allows us to pass the tool object references directly to the chat client. In this case, we are registering the StockTools bean, which contains two @Tool methods, together with the WalletTools bean. The AI model must choose the right method to call in StockTools based on the description and input argument. It should call the getLatestStockPrices method.

@GetMapping("/with-tools")
String calculateWalletValueWithTools() {
   PromptTemplate pt = new PromptTemplate("""
   What’s the current value in dollars of my wallet based on the latest stock daily prices ?
   """);

   return this.chatClient.prompt(pt.create())
           .tools(stockTools, walletTools)
           .call()
           .content();
}
Java
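Once both tools have answered, the arithmetic the model is expected to perform is simple. The sketch below expresses it in plain Java as a point of comparison for the model's answer. The share quantities come from the import script shown earlier, while the prices are made-up values and the WalletValueSketch class is a hypothetical helper, not part of the sample repository.

```java
import java.util.Map;

class WalletValueSketch {

    // Multiplies each position by its latest close price and sums the results.
    static double walletValue(Map<String, Integer> shares, Map<String, Double> latestPrices) {
        double total = 0.0;
        for (Map.Entry<String, Integer> position : shares.entrySet()) {
            Double price = latestPrices.get(position.getKey());
            if (price == null) {
                throw new IllegalArgumentException("No price for " + position.getKey());
            }
            total += position.getValue() * price;
        }
        return total;
    }

    public static void main(String[] args) {
        // Quantities as inserted into the share table.
        Map<String, Integer> shares = Map.of(
                "AAPL", 100, "AMZN", 300, "META", 300, "MSFT", 400, "NVDA", 200);
        // Assumption: illustrative prices, not real market data.
        Map<String, Double> prices = Map.of(
                "AAPL", 200.0, "AMZN", 200.0, "META", 600.0, "MSFT", 400.0, "NVDA", 100.0);
        System.out.println(walletValue(shares, prices)); // prints 440000.0
    }
}
```

A deterministic calculation like this is handy for checking whether the model's free-text answer is in the right range.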

The GET /wallet/highest-day/{days} endpoint calculates the value of our stock wallet in dollars for each day in the specified period determined by the days variable. Then it must return the day with the highest stock wallet value. As before, we use the tools method from ChatClient to register our tool calling methods. This time, the model should call the getHistoricalStockPrices method.

@GetMapping("/highest-day/{days}")
String calculateHighestWalletValue(@PathVariable int days) {
   PromptTemplate pt = new PromptTemplate("""
   On which day during last {days} days my wallet had the highest value in dollars based on the historical daily stock prices ?
   """);

   return this.chatClient.prompt(pt.create(Map.of("days", days)))
            .tools(stockTools, walletTools)
            .call()
            .content();
}
Java

The following diagram illustrates the flow for the second use case, which returns the day with the highest stock wallet value. First, the model must retrieve the stock wallet structure from the database, which contains the number of shares of each company. Then, it must call the stock API for every company found in the wallet. So, finally, the getHistoricalStockPrices tool should be called five times with different values of the company @ToolParam and the value of days determined by the HTTP endpoint path variable. Once all the data is collected, the AI model calculates the highest wallet value and returns it together with the quotation date.

spring-ai-tool-calling-arch
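The last step of that flow, which the model performs on its own, can be sketched in plain Java as follows. The DailyShareQuote record mirrors the one from the article. The HighestDaySketch class, the quantities, and the quotes are hypothetical values chosen only to keep the arithmetic easy to follow.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class HighestDaySketch {

    // Same shape as the record returned by getHistoricalStockPrices.
    record DailyShareQuote(String company, float price, String datetime) {}

    // Sums quantity * price per date and returns the date with the largest total.
    static Map.Entry<String, Double> highestDay(Map<String, Integer> shares,
                                                List<DailyShareQuote> quotes) {
        Map<String, Double> valueByDay = new TreeMap<>();
        for (DailyShareQuote quote : quotes) {
            Integer quantity = shares.get(quote.company());
            if (quantity == null) {
                continue; // quote for a company that is not in the wallet
            }
            valueByDay.merge(quote.datetime(), quantity * (double) quote.price(), Double::sum);
        }
        return valueByDay.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .orElseThrow(() -> new IllegalArgumentException("No quotes supplied"));
    }

    public static void main(String[] args) {
        // Assumption: made-up quantities and prices, not real market data.
        Map<String, Integer> shares = Map.of("AAPL", 10, "MSFT", 5);
        List<DailyShareQuote> quotes = List.of(
                new DailyShareQuote("AAPL", 2.0f, "2025-02-25"),
                new DailyShareQuote("AAPL", 3.0f, "2025-02-26"),
                new DailyShareQuote("MSFT", 4.0f, "2025-02-25"),
                new DailyShareQuote("MSFT", 1.0f, "2025-02-26"));
        Map.Entry<String, Double> best = highestDay(shares, quotes);
        System.out.println(best.getKey() + " -> " + best.getValue()); // prints 2025-02-25 -> 40.0
    }
}
```

In the sample application this aggregation is left to the model, which is one possible reason why the answer can vary between runs.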

Run Application and Verify Tool Calling

Before starting the application we must set environment variables with the AI model and stock API tokens.

export OPEN_AI_TOKEN=<YOUR_OPEN_AI_TOKEN>
export STOCK_API_KEY=<YOUR_STOCK_API_KEY>
ShellSession

Then run the following Maven command:

mvn spring-boot:run
ShellSession

Once the application is started, we can call the first endpoint. The GET /wallet/with-tools endpoint calculates the total current value of the stock wallet stored in the database.

curl http://localhost:8080/wallet/with-tools
ShellSession

Here’s the fragment of logs generated by the Spring AI @Tool methods. The model behaves as expected. First, it calls the getNumberOfShares tool to retrieve a wallet structure. Then it calls the getLatestStockPrices tool per share to obtain its current price.

spring-ai-tool-calling-logs

Here’s the final response, which contains the wallet value and a detailed explanation.

Then we can call the GET /wallet/highest-day/{days} endpoint to return the day with the highest wallet value. Let’s calculate it for the last 20 days.

curl http://localhost:8080/wallet/highest-day/20
ShellSession

The response is very detailed. Here’s the final part of the content returned by the OpenAI chat model. It returns 26.02.2025 as the day with the highest wallet value. Frankly, sometimes it returns different answers…

spring-ai-tool-calling-chat-response

However, the AI flow works fine. First, it calls the getNumberOfShares tool to retrieve a wallet structure. Then it calls the getHistoricalStockPrices tool per share to obtain its prices for the last 20 days.

We can switch to another AI model to compare their responses. You can connect my sample Spring Boot application e.g. with Mistral AI by activating the mistral-ai Maven profile.

mvn spring-boot:run -Pmistral-ai
ShellSession

Before running the app we must export the Mistral API token.

export MISTRAL_AI_TOKEN=<YOUR_MISTRAL_AI_TOKEN>
ShellSession

To get the best results I changed the Mistral model to mistral-large-latest.

spring.ai.mistralai.chat.options.model = mistral-large-latest
ShellSession

The response from Mistral AI was pretty quick and short:

Final Thoughts

In this article, we analyzed the Spring AI support for tool calling, which replaces the Function Calling API. Tool calling is a powerful feature that enhances how AI models interact with external tools, APIs, and structured data. It makes AI more interactive and practical for real-world applications. Spring AI provides a flexible way to register and invoke such tools. However, it still requires attention from developers, who need to define clear function schemas and handle edge cases.


]]>
https://piotrminkowski.com/2025/03/13/tool-calling-with-spring-ai/feed/ 2 15596
Using Ollama with Spring AI https://piotrminkowski.com/2025/03/10/using-ollama-with-spring-ai/ https://piotrminkowski.com/2025/03/10/using-ollama-with-spring-ai/#respond Mon, 10 Mar 2025 09:46:35 +0000 https://piotrminkowski.com/?p=15575 This article will teach you how to create a Spring Boot application that implements several AI scenarios using Spring AI and the Ollama tool. Ollama is an open-source tool that aims to run open LLMs on our local machine. It acts like a bridge between LLM and a workstation, providing an API layer on top […]

The post Using Ollama with Spring AI appeared first on Piotr's TechBlog.

]]>
This article will teach you how to create a Spring Boot application that implements several AI scenarios using Spring AI and the Ollama tool. Ollama is an open-source tool that aims to run open LLMs on our local machine. It acts like a bridge between LLM and a workstation, providing an API layer on top of them for other applications or services. With Ollama we can run almost every model we want only by pulling it from a huge library.

This is the fifth part of my series of articles about Spring Boot and AI. I mentioned Ollama in the first part of the series to show how to switch between different AI models with Spring AI. However, it was only a brief introduction. Today, we try to run all AI use cases described in the previous tutorials with the Ollama tool. Those tutorials integrated mostly with OpenAI. In this article, we will test them against different AI models.

  1. https://piotrminkowski.com/2025/01/28/getting-started-with-spring-ai-and-chat-model: The first tutorial introduces the Spring AI project and its support for building applications based on chat models like OpenAI or Mistral AI.
  2. https://piotrminkowski.com/2025/01/30/getting-started-with-spring-ai-function-calling: The second tutorial shows Spring AI support for Java function calling with the OpenAI chat model.
  3. https://piotrminkowski.com/2025/02/24/using-rag-and-vector-store-with-spring-ai: The third tutorial shows Spring AI support for RAG (Retrieval Augmented Generation) and vector store.
  4. https://piotrminkowski.com/2025/03/04/spring-ai-with-multimodality-and-images: The fourth tutorial shows Spring AI support for a multimodality feature and image generation

Fortunately, our application can easily switch between different AI tools or models. To achieve this, we must activate the right Maven profile.

Source Code

Feel free to use my source code if you’d like to try it out yourself. To do that, you must clone my sample GitHub repository. Then you should only follow my instructions.

Prepare a Local Environment for Ollama

A few options exist for accessing Ollama on the local machine with Spring AI. I downloaded Ollama from the following link and installed it on my laptop. Alternatively, we can run it e.g. with Docker Compose or Testcontainers.

Once we install Ollama on our workstation we can run the AI model from its library with the ollama run command. The full list of available models can be found here. At the beginning, we will choose the Llava model. It is one of the most popular models which supports both a vision encoder and language understanding.

ollama run llava
ShellSession

Ollama must pull the model manifest and image. Here’s the ollama run command output. Once we see that, we can interact with the model.

spring-ai-ollama-run-llava-model

The sample application source code already defines the ollama-ai Maven profile with the spring-ai-ollama-spring-boot-starter Spring Boot starter.

<profile>
  <id>ollama-ai</id>
  <dependencies>
    <dependency>
      <groupId>org.springframework.ai</groupId>
      <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
    </dependency>
  </dependencies>
</profile>
XML

The profile is disabled by default. We might enable it during development as shown below (for IntelliJ IDEA). However, the application doesn’t use any vendor-specific components but only generic Spring AI classes and interfaces.

We must activate the ollama-ai profile when running the same application. Assuming we are in the project root directory, we need to run the following Maven command:

mvn spring-boot:run -Pollama-ai
ShellSession

Portability across AI Models

We should avoid using specific model library components to make our application portable between different models. For example, when registering functions in the chat model client we should use FunctionCallingOptions instead of model-specific components like OpenAIChatOptions or OllamaOptions.

@GetMapping
String calculateWalletValue() {
   PromptTemplate pt = new PromptTemplate("""
   What’s the current value in dollars of my wallet based on the latest stock daily prices ?
   """);

   return this.chatClient.prompt(pt.create(
        FunctionCallingOptions.builder()
                    .function("numberOfShares")
                    .function("latestStockPrices")
                    .build()))
            .call()
            .content();
}
Java

Not all models support all the AI capabilities used in our sample application. For models like Ollama or Mistral AI, Spring AI doesn’t provide image generation implementation since those tools don’t support it right now. Therefore we should inject the ImageModel optionally, in case it is not provided by the model-specific library.

@RestController
@RequestMapping("/images")
public class ImageController {

    private final static Logger LOG = LoggerFactory.getLogger(ImageController.class);
    private final ObjectMapper mapper = new ObjectMapper();

    private final ChatClient chatClient;
    private ImageModel imageModel;

    public ImageController(ChatClient.Builder chatClientBuilder,
                           Optional<ImageModel> imageModel,
                           VectorStore store) {
        this.chatClient = chatClientBuilder
                .defaultAdvisors(new SimpleLoggerAdvisor())
                .build();
        imageModel.ifPresent(model -> this.imageModel = model);
        
        // other initializations 
    }
}
Java

Then, if a method requires the ImageModel bean, we can throw an exception informing the caller that image generation is not supported by the AI model (1). On the other hand, Spring AI does not provide a dedicated interface for multimodality, which enables AI models to process information from multiple sources. We can use the UserMessage class and the Media class to combine e.g. text with image(s) in the user prompt. The GET /images/describe/{image} endpoint lists items detected in the source image from the classpath (2).

@GetMapping(value = "/generate/{object}", produces = MediaType.IMAGE_PNG_VALUE)
byte[] generate(@PathVariable String object) throws IOException, NotSupportedException {
   if (imageModel == null)
      throw new NotSupportedException("Image model is not supported by the AI model"); // (1)
   ImageResponse ir = imageModel.call(new ImagePrompt("Generate an image with " + object, ImageOptionsBuilder.builder()
           .height(1024)
           .width(1024)
           .N(1)
           .responseFormat("url")
           .build()));
   String url = ir.getResult().getOutput().getUrl();
   UrlResource resource = new UrlResource(url);
   LOG.info("Generated URL: {}", url);
   dynamicImages.add(Media.builder()
           .id(UUID.randomUUID().toString())
           .mimeType(MimeTypeUtils.IMAGE_PNG)
           .data(url)
           .build());
   return resource.getContentAsByteArray();
}
    
@GetMapping("/describe/{image}") // (2)
List<Item> describeImage(@PathVariable String image) {
   Media media = Media.builder()
           .id(image)
           .mimeType(MimeTypeUtils.IMAGE_PNG)
           .data(new ClassPathResource("images/" + image + ".png"))
           .build();
   UserMessage um = new UserMessage("""
   List all items you see on the image and define their category. 
   Return items inside the JSON array in RFC8259 compliant JSON format.
   """, media);
   return this.chatClient.prompt(new Prompt(um))
           .call()
           .entity(new ParameterizedTypeReference<>() {});
}
Java

Let’s try to avoid declarations similar to the following one, described in the Spring AI documentation. Although they are perfectly correct, they will cause problems when switching between different Spring Boot starters for different AI vendors.

ChatResponse response = chatModel.call(
    new Prompt(
        "Generate the names of 5 famous pirates.",
        OllamaOptions.builder()
            .model(OllamaModel.LLAMA3_1)
            .temperature(0.4)
            .build()
    ));
Java

In this case, we can set the global property in the application.properties file which sets the default model used in the scenario with Ollama.

spring.ai.ollama.chat.options.model = llava
Java

Testing Multiple Models with Spring AI and Ollama

By default, Ollama doesn’t require any API token to establish communication with AI models. The Ollama Spring Boot starter provides auto-configuration that connects the chat client to the Ollama API server running on the localhost:11434 address. So, before running our sample application, we only need to export the tokens used to authorize against the stock market API and the vector store.

export STOCK_API_KEY=<YOUR_STOCK_API_KEY>
export PINECONE_TOKEN=<YOUR_PINECONE_TOKEN>
ShellSession
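If one of these variables is missing, the problem usually shows up only later as a failed request. A minimal sketch of a fail-fast check is shown below. The RequiredEnv class is a hypothetical helper that is not part of the sample repository; only the two variable names come from the commands above.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the sample repository.
final class RequiredEnv {

    // Variable names taken from the export commands above.
    static final List<String> REQUIRED = List.of("STOCK_API_KEY", "PINECONE_TOKEN");

    // Returns the names of required variables that are absent or blank in the given environment.
    static List<String> missing(Map<String, String> env) {
        return REQUIRED.stream()
                .filter(name -> {
                    String value = env.get(name);
                    return value == null || value.isBlank();
                })
                .toList();
    }

    public static void main(String[] args) {
        List<String> missing = missing(System.getenv());
        if (missing.isEmpty()) {
            System.out.println("All required variables are set.");
        } else {
            System.out.println("Missing environment variables: " + missing);
        }
    }
}
```

Passing the environment as a Map keeps the check easy to test without touching the real process environment.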

Llava on Ollama

Let’s begin with the Llava model. We can call the first endpoint that asks the model to generate a list of persons (GET /persons) and then search for the person with a particular id in the list stored in the chat memory (GET /persons/{id}).

spring-ai-ollama-get-persons

Then we can call the endpoint that lists all the items visible in a particular image from the classpath (GET /images/describe/{image}).

spring-ai-ollama-describe-image

By the way, here is the analyzed image stored in the src/main/resources/images/fruits-3.png file.

The endpoint for describing all the input images from the classpath doesn’t work well. I tried to tweak it by adding the RFC8259 JSON format sentence or changing the query. However, the AI model always returned a description of a single image instead of the whole Media list. By contrast, the OpenAI model could print descriptions for all images in the String[] format.

@GetMapping("/describe")
String[] describe() {
   UserMessage um = new UserMessage("""
            Explain what do you see on each image from the input list.
            Return data in RFC8259 compliant JSON format.
            """, List.copyOf(Stream.concat(images.stream(), dynamicImages.stream()).toList()));
   return this.chatClient.prompt(new Prompt(um))
            .call()
            .entity(String[].class);
}
Java

Here’s the response. Of course, we can train a model to receive better results or try to prepare a better prompt.

spring-ai-ollama-describe-all-images
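One possible workaround, which I have not verified against the sample application, is to send each image in a separate request and collect the answers on the application side. The sketch below shows only the collecting logic using plain JDK types. PerImageDescriber and the describeOne function are hypothetical names; in the real app the function would wrap a single-image chat call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper, not part of the sample repository.
final class PerImageDescriber {

    // Calls describeOne once per image and collects the answers in input order.
    static <T> String[] describeEach(List<T> images, Function<T, String> describeOne) {
        List<String> descriptions = new ArrayList<>();
        for (T image : images) {
            descriptions.add(describeOne.apply(image));
        }
        return descriptions.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Stand-in for a real model call; the image names are placeholders.
        Function<String, String> fakeModel = name -> "description of " + name;
        String[] result = describeEach(List.of("fruits-3", "placeholder-image"), fakeModel);
        System.out.println(String.join("; ", result));
    }
}
```

The trade-off is one model call per image instead of a single call, so the total response time grows with the number of images.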

After calling the GET /wallet endpoint exposed by the WalletController, I received the [400] Bad Request - {"error":"registry.ollama.ai/library/llava:latest does not support tools"} response. It seems Llava doesn’t support the function/tool calling feature. We will also always receive the NotSupportedException for the GET /images/generate/{object} endpoint, since the Spring AI Ollama library doesn’t provide an ImageModel bean. You can perform other tests, e.g. for the RAG and vector store features implemented in the StockController @RestController.
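When switching models frequently, it may help to recognize this particular error and report it more clearly. Here is a minimal sketch, assuming the error message keeps the same wording as the response quoted above. ToolSupportError is a hypothetical helper, not a Spring AI or Ollama class.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Spring AI or the sample repository.
final class ToolSupportError {

    // Assumes the wording of the error quoted above: {"error":"<model> does not support tools"}
    private static final Pattern UNSUPPORTED =
            Pattern.compile("\"error\"\\s*:\\s*\"([^\"]+) does not support tools\"");

    // Returns the model name if the message reports missing tool support, otherwise empty.
    static Optional<String> unsupportedModel(String message) {
        if (message == null) {
            return Optional.empty();
        }
        Matcher matcher = UNSUPPORTED.matcher(message);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String message = "[400] Bad Request - "
                + "{\"error\":\"registry.ollama.ai/library/llava:latest does not support tools\"}";
        unsupportedModel(message)
                .ifPresent(model -> System.out.println("Tool calling is not available for " + model));
    }
}
```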

Granite on Ollama

Let’s switch to another interesting model: Granite. In particular, we will test the granite3.2-vision model dedicated to automated content extraction from tables, charts, infographics, plots, and diagrams. First, we set the current model name in the Ollama Spring AI configuration properties.

spring.ai.ollama.chat.options.model = granite3.2-vision
Plaintext

Let’s stop the Llava model and then run granite3.2-vision on Ollama:

ollama run granite3.2-vision
ShellSession

After the application restarts, we can perform some test calls. The endpoint for describing a single image returns a more detailed response than the Llava model. The response for the query with multiple images still looks the same as before.

The Granite Vision model supports a “function calling” feature, but it couldn’t call functions properly using my prompt. Please refer to my article for more details about the Spring AI function calling with OpenAI.

Deepseek on Ollama

The last model we will run within this exercise is Deepseek. DeepSeek-R1 achieves performance comparable to OpenAI-o1 on reasoning tasks. First, we must set the current model name in the Ollama Spring AI configuration properties.

spring.ai.ollama.chat.options.model = deepseek-r1
Plaintext

Then let’s stop the Granite model and run deepseek-r1 on Ollama:

ollama run deepseek-r1
ShellSession

We need to restart the app:

mvn spring-boot:run -Pollama-ai
ShellSession

As usual, we can call the first endpoint that asks the model to generate a list of persons (GET /persons) and then search for the person with a particular id in the list stored in the chat memory (GET /persons/{id}). The response was pretty large, but not in the required JSON format. Here’s a fragment of the response:

The deepseek-r1 model doesn’t support a tool/function calling feature. Also, it didn’t analyze my input image properly and it didn’t return a JSON response according to the Spring AI structured output feature.
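Reasoning models such as deepseek-r1 typically print their reasoning before the final answer, which is one likely reason the output doesn’t parse as JSON. The sketch below assumes the reasoning is wrapped in <think>...</think> tags and that the answer contains a JSON array; neither assumption was verified in this article. ReasoningOutputCleaner is a hypothetical helper, not part of Spring AI.

```java
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Spring AI or the sample repository.
final class ReasoningOutputCleaner {

    // Assumption: the model wraps its reasoning in <think>...</think> tags.
    private static final Pattern THINK_BLOCK = Pattern.compile("(?s)<think>.*?</think>");

    // Removes every <think> block and trims surrounding whitespace.
    static String stripThinking(String raw) {
        return THINK_BLOCK.matcher(raw).replaceAll("").strip();
    }

    // Returns the text between the first '[' and the last ']' after removing the reasoning.
    static Optional<String> extractJsonArray(String raw) {
        String cleaned = stripThinking(raw);
        int start = cleaned.indexOf('[');
        int end = cleaned.lastIndexOf(']');
        if (start < 0 || end < start) {
            return Optional.empty();
        }
        return Optional.of(cleaned.substring(start, end + 1));
    }

    public static void main(String[] args) {
        String raw = "<think>\nThe user wants a list.\n</think>\nHere it is: [{\"id\":1}]";
        System.out.println(extractJsonArray(raw).orElse("no JSON array found"));
    }
}
```

The extracted string still has to be validated and mapped to the target type, so this is only a pre-processing step and not a replacement for the structured output feature.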

Final Thoughts

This article shows how to easily switch between multiple AI models with Spring AI and Ollama. We tested several AI use cases implemented in the sample Spring Boot application across models such as Llava, Granite, and Deepseek. The app provides several endpoints that demonstrate features such as multimodality, chat memory, RAG, vector store, and function calling. The goal is not to compare the AI models, but to give a simple recipe for integrating with different models and experimenting with them using Spring AI.

The post Using Ollama with Spring AI appeared first on Piotr's TechBlog.

]]>
https://piotrminkowski.com/2025/03/10/using-ollama-with-spring-ai/feed/ 0 15575