RAG Archives - Piotr's TechBlog (https://piotrminkowski.com/tag/rag/) - Java, Spring, Kotlin, microservices, Kubernetes, containers

Spring AI with Azure OpenAI (https://piotrminkowski.com/2025/03/25/spring-ai-with-azure-openai/), published Tue, 25 Mar 2025

This article will show you how to use Spring AI features like chat client memory, multimodality, tool calling, or embedding models with the Azure OpenAI service. Azure OpenAI is supported in almost all Spring AI use cases. Moreover, it goes beyond standard OpenAI capabilities, providing advanced AI-driven text generation and incorporating additional AI safety and responsible AI features. It also enables the integration of AI-focused resources, such as Vector Stores on Azure.

This is the eighth part of my series of articles about Spring Boot and AI. It is worth reading the following posts before proceeding with the current one. Here’s a list of articles about Spring AI on my blog with a short description:

Source Code

Feel free to use my source code if you’d like to try it out yourself. To do that, you must clone my sample GitHub repository. Then just follow my instructions.

Enable and Configure Azure OpenAI

You need to begin the exercise by creating an instance of the Azure OpenAI service. The most crucial element here is the service’s name, since it is part of the exposed OpenAI endpoint. My service’s name is piomin-azure-openai.

spring-ai-azure-openai-create

The Azure OpenAI service should be exposed without restrictions to allow easy access to the Spring AI app.

After creating the service, go to its main page in the Azure Portal. It provides information about API keys and an endpoint URL. Also, you have to deploy an Azure OpenAI model to start making API calls from your Spring AI app.

Copy the key and the endpoint URL and save them for later usage.

spring-ai-azure-openai-api-key

You must create a new deployment with an AI model in the Azure AI Foundry portal. There are several available options. The Spring AI Azure OpenAI starter by default uses the gpt-4o model. If you choose another AI model, you will have to set its name in the spring.ai.azure.openai.chat.options.deployment-name Spring AI property. After selecting the preferred model, click the “Confirm” button.
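For instance, if you chose a different model than the default, the property could look as follows (the gpt-4o-mini deployment name below is purely illustrative):

```properties
# Assumed example: only needed if your deployment is not the default gpt-4o
spring.ai.azure.openai.chat.options.deployment-name = gpt-4o-mini
```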

spring-ai-azure-openai-deploy-model

Finally, you can deploy the model on the Azure AI Foundry portal. Choose the most suitable deployment type for your needs.

Azure allows us to deploy multiple models. You can verify the list of model deployments in the Azure AI Foundry portal.

That’s all on the Azure Portal side. Now it’s time for the implementation part in the application source code.

Enable Azure OpenAI for Spring AI

Spring AI provides the Spring Boot starter for the Azure OpenAI Chat Client. You must add the following dependency to your Maven pom.xml file. Since the sample Spring Boot application is portable across various AI models, it includes the Azure OpenAI starter only if the azure-ai profile is active. Otherwise, it uses the spring-ai-openai-spring-boot-starter library.

<profile>
  <id>azure-ai</id>
  <dependencies>
    <dependency>
      <groupId>org.springframework.ai</groupId>
      <artifactId>spring-ai-azure-openai-spring-boot-starter</artifactId>
    </dependency>
  </dependencies>
</profile>
XML

It’s time to use the key you previously copied from the Azure OpenAI service page. Let’s export it as the AZURE_OPENAI_API_KEY environment variable.

export AZURE_OPENAI_API_KEY=<YOUR_AZURE_OPENAI_API_KEY>
ShellSession

Here are the application properties dedicated to the azure-ai Spring Boot profile. The previously exported AZURE_OPENAI_API_KEY environment variable is set as the spring.ai.azure.openai.api-key property. You also must set the OpenAI service endpoint. This address depends on your Azure OpenAI service name.

spring.ai.azure.openai.api-key = ${AZURE_OPENAI_API_KEY}
spring.ai.azure.openai.endpoint = https://piomin-azure-openai.openai.azure.com/
application-azure-ai.properties

To run the application and connect to your instance of the Azure OpenAI service, you must activate the azure-ai Maven profile and the Spring Boot profile under the same name. Here’s the required command:

mvn spring-boot:run -Pazure-ai -Dspring-boot.run.profiles=azure-ai
ShellSession

Test Spring AI Features with Azure OpenAI

I described several Spring AI features in the previous articles from this series. In each section, I will briefly mention the tested feature with a fragment of the sample source code. Please refer to my previous posts for more details about each feature and its sample implementation.

Chat Client with Memory and Structured Output

Here’s the @RestController containing endpoints we will use in these tests.

@RestController
@RequestMapping("/persons")
public class PersonController {

    private final ChatClient chatClient;

    public PersonController(ChatClient.Builder chatClientBuilder,
                            ChatMemory chatMemory) {
        this.chatClient = chatClientBuilder
                .defaultAdvisors(
                        new PromptChatMemoryAdvisor(chatMemory),
                        new SimpleLoggerAdvisor())
                .build();
    }

    @GetMapping
    List<Person> findAll() {
        PromptTemplate pt = new PromptTemplate("""
                Return a current list of 10 persons if exists or generate a new list with random values.
                Each object should contain an auto-incremented id field.
                The age value should be a random number between 18 and 99.
                Do not include any explanations or additional text.
                Return data in RFC8259 compliant JSON format.
                """);

        return this.chatClient.prompt(pt.create())
                .call()
                .entity(new ParameterizedTypeReference<>() {});
    }

    @GetMapping("/{id}")
    Person findById(@PathVariable String id) {
        PromptTemplate pt = new PromptTemplate("""
                Find and return the object with id {id} in a current list of persons.
                """);
        Prompt p = pt.create(Map.of("id", id));
        return this.chatClient.prompt(p)
                .call()
                .entity(Person.class);
    }
}
Java
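The Person class bound by the .entity(...) call is defined in the repository; a minimal sketch of what it might look like is shown below. The field names are assumptions inferred from the prompt (auto-incremented id, age between 18 and 99), not copied from the source code.

```java
// Hypothetical Person record for the structured output of GET /persons.
// Field names are assumptions, not taken from the blog's repository.
public record Person(Long id, String firstName, String lastName, String nationality, int age) {}
```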

First, you must call the endpoint that generates a list of ten persons from different countries. Then choose one person by ID to pick it up from the chat memory. Here are the results.

spring-ai-azure-openai-test-chat-model

The interesting part happens in the background. Here’s a fragment of advice context added to the prompt by Spring AI.

Tool Calling

Here’s the @RestController containing endpoints we will use in these tests. There are two tools injected into the chat client: StockTools and WalletTools. These tools interact with a local H2 database to get a sample stock wallet structure and with the stock online API to load the latest share prices.

@RestController
@RequestMapping("/wallet")
public class WalletController {

    private final ChatClient chatClient;
    private final StockTools stockTools;
    private final WalletTools walletTools;

    public WalletController(ChatClient.Builder chatClientBuilder,
                            StockTools stockTools,
                            WalletTools walletTools) {
        this.chatClient = chatClientBuilder
                .defaultAdvisors(new SimpleLoggerAdvisor())
                .build();
        this.stockTools = stockTools;
        this.walletTools = walletTools;
    }

    @GetMapping("/with-tools")
    String calculateWalletValueWithTools() {
        PromptTemplate pt = new PromptTemplate("""
        What’s the current value in dollars of my wallet based on the latest stock daily prices ?
        """);

        return this.chatClient.prompt(pt.create())
                .tools(stockTools, walletTools)
                .call()
                .content();
    }

    @GetMapping("/highest-day/{days}")
    String calculateHighestWalletValue(@PathVariable int days) {
        PromptTemplate pt = new PromptTemplate("""
        On which day during last {days} days my wallet had the highest value in dollars based on the historical daily stock prices ?
        """);

        return this.chatClient.prompt(pt.create(Map.of("days", days)))
                .tools(stockTools, walletTools)
                .call()
                .content();
    }
}
Java

You must have your API key for the Twelvedata service to run these tests. Don’t forget to export it as the STOCK_API_KEY environment variable before running the app.

export STOCK_API_KEY=<YOUR_STOCK_API_KEY>
ShellSession

The GET /wallet/with-tools endpoint calculates the current stock wallet value in dollars.

spring-ai-azure-openai-test-tool-calling

The GET /wallet/highest-day/{days} computes the value of the stock wallet for a given period in days and identifies the day with the highest value.

Multimodality and Images

Here’s a part of the @RestController responsible for describing image content and generating a new image with a given item.

@RestController
@RequestMapping("/images")
public class ImageController {

    private final static Logger LOG = LoggerFactory.getLogger(ImageController.class);
    private final ObjectMapper mapper = new ObjectMapper();
    private final List<Media> dynamicImages = new ArrayList<>();

    private final ChatClient chatClient;
    private ImageModel imageModel;

    public ImageController(ChatClient.Builder chatClientBuilder,
                           Optional<ImageModel> imageModel) {
        this.chatClient = chatClientBuilder
                .defaultAdvisors(new SimpleLoggerAdvisor())
                .build();
        imageModel.ifPresent(model -> this.imageModel = model);
    }
        
    @GetMapping("/describe/{image}")
    List<Item> describeImage(@PathVariable String image) {
        Media media = Media.builder()
                .id(image)
                .mimeType(MimeTypeUtils.IMAGE_PNG)
                .data(new ClassPathResource("images/" + image + ".png"))
                .build();
        UserMessage um = new UserMessage("""
        List all items you see on the image and define their category.
        Return items inside the JSON array in RFC8259 compliant JSON format.
        """, media);
        return this.chatClient.prompt(new Prompt(um))
                .call()
                .entity(new ParameterizedTypeReference<>() {});
    }
    
    @GetMapping(value = "/generate/{object}", produces = MediaType.IMAGE_PNG_VALUE)
    byte[] generate(@PathVariable String object) throws IOException, NotSupportedException {
        if (imageModel == null)
            throw new NotSupportedException("Image model is not supported");
        ImageResponse ir = imageModel.call(new ImagePrompt("Generate an image with " + object, ImageOptionsBuilder.builder()
                .height(1024)
                .width(1024)
                .N(1)
                .responseFormat("url")
                .build()));
        String url = ir.getResult().getOutput().getUrl();
        UrlResource resource = new UrlResource(url);
        LOG.info("Generated URL: {}", url);
        dynamicImages.add(Media.builder()
                .id(UUID.randomUUID().toString())
                .mimeType(MimeTypeUtils.IMAGE_PNG)
                .data(url)
                .build());
        return resource.getContentAsByteArray();
    }
    
}
Java

The GET /images/describe/{image} returns a structured list of items identified on a given image. It also categorizes each detected item. In this case, there are two available categories: fruits and vegetables.
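The Item type returned by this endpoint is not shown in the article; a plausible minimal shape (the field names are assumptions) could be:

```java
// Hypothetical Item record for the structured output of GET /images/describe/{image}.
// Field names are assumptions; category would hold values like "fruit" or "vegetable".
public record Item(String name, String category) {}
```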

spring-ai-azure-openai-test-multimodality

By the way, here’s the image described above.

The image generation feature requires a dedicated model on Azure AI. The DALL-E 2 and DALL-E 3 models on Azure support a text-to-image feature.

spring-ai-azure-openai-dalle3

The application must be aware of the model name. That’s why you must add a new property to your application properties with the following value.

spring.ai.azure.openai.image.options.deployment-name = dall-e-3
Plaintext

Then you must restart the application. After that, you can generate an image by calling the GET /images/generate/{object} endpoint. Here’s the result for the pineapple.

Enable Azure CosmosDB Vector Store

Dependency

By default, the sample Spring Boot application uses the Pinecone vector store. However, Spring AI supports two vector store services available on Azure: Azure AI Search and CosmosDB. Let’s choose CosmosDB as the vector store. You must add the following dependency to your Maven pom.xml file:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-azure-cosmos-db-store-spring-boot-starter</artifactId>
</dependency>
XML

Configuration on Azure

Then, you must create an instance of CosmosDB in your Azure account. The name of my instance is piomin-ai-cosmos.

Once it is created, you will obtain its address and API key. To do that, go to the “Settings -> Keys” menu and save both values visible below.

spring-ai-azure-openai-cosmosdb

Then, you have to create a dedicated database and container for your application. To do that, go to the “Data Explorer” tab and provide names for the database and container ID. You must also set the partition key.

All previously provided values must be set in the application properties. Export your CosmosDB API key as the AZURE_VECTORSTORE_API_KEY environment variable.

spring.ai.vectorstore.cosmosdb.endpoint = https://piomin-ai-cosmos.documents.azure.com:443/
spring.ai.vectorstore.cosmosdb.key = ${AZURE_VECTORSTORE_API_KEY}
spring.ai.vectorstore.cosmosdb.databaseName = spring-ai
spring.ai.vectorstore.cosmosdb.containerName = spring-ai
spring.ai.vectorstore.cosmosdb.partitionKeyPath = /id
application-azure-ai.properties

Unfortunately, there are still some issues with the Azure CosmosDB support in the Spring AI M6 milestone version. I see that they were fixed in the SNAPSHOT version. So, if you want to test it yourself, you will have to switch from milestones to snapshots.

<properties>
  <java.version>21</java.version>
  <spring-ai.version>1.0.0-SNAPSHOT</spring-ai.version>
</properties>
  
<repositories>
  <repository>
    <name>Central Portal Snapshots</name>
    <id>central-portal-snapshots</id>
    <url>https://central.sonatype.com/repository/maven-snapshots/</url>
    <releases>
      <enabled>false</enabled>
    </releases>
    <snapshots>
      <enabled>true</enabled>
    </snapshots>
  </repository>
  <repository>
    <id>spring-snapshots</id>
    <name>Spring Snapshots</name>
    <url>https://repo.spring.io/snapshot</url>
    <releases>
      <enabled>false</enabled>
    </releases>
    <snapshots>
      <enabled>true</enabled>
    </snapshots>
  </repository>
</repositories>
XML

Run and Test the Application

After those changes, you can start the application with the following command:

mvn spring-boot:run -Pazure-ai -Dspring-boot.run.profiles=azure-ai
ShellSession

Once the application is running, you can test the following @RestController that offers RAG functionality. The GET /stocks/load-data endpoint obtains stock prices of given companies and puts them in the vector store. The GET /stocks/v2/most-growth-trend uses the RetrievalAugmentationAdvisor instance to retrieve the most suitable data and include it in the user query.

@RestController
@RequestMapping("/stocks")
public class StockController {

    private final ObjectMapper mapper = new ObjectMapper();
    private final static Logger LOG = LoggerFactory.getLogger(StockController.class);
    private final ChatClient chatClient;
    private final RewriteQueryTransformer.Builder rqtBuilder;
    private final RestTemplate restTemplate;
    private final VectorStore store;

    @Value("${STOCK_API_KEY:none}")
    private String apiKey;

    public StockController(ChatClient.Builder chatClientBuilder,
                           VectorStore store,
                           RestTemplate restTemplate) {
        this.chatClient = chatClientBuilder
                .defaultAdvisors(new SimpleLoggerAdvisor())
                .build();
        this.rqtBuilder = RewriteQueryTransformer.builder()
                .chatClientBuilder(chatClientBuilder);
        this.store = store;
        this.restTemplate = restTemplate;
    }

    @GetMapping("/load-data")
    void load() throws JsonProcessingException {
        final List<String> companies = List.of("AAPL", "MSFT", "GOOG", "AMZN", "META", "NVDA");
        for (String company : companies) {
            StockData data = restTemplate.getForObject("https://api.twelvedata.com/time_series?symbol={0}&interval=1day&outputsize=10&apikey={1}",
                    StockData.class,
                    company,
                    apiKey);
            if (data != null && data.getValues() != null) {
                var list = data.getValues().stream().map(DailyStockData::getClose).toList();
                var doc = Document.builder()
                        .id(company)
                        .text(mapper.writeValueAsString(new Stock(company, list)))
                        .build();
                store.add(List.of(doc));
                LOG.info("Document added: {}", company);
            }
        }
    }

    @RequestMapping("/v2/most-growth-trend")
    String getBestTrendV2() {
        PromptTemplate pt = new PromptTemplate("""
                {query}.
                Which {target} is the most % growth?
                The 0 element in the prices table is the latest price, while the last element is the oldest price.
                """);

        Prompt p = pt.create(Map.of("query", "Find the most growth trends", "target", "share"));

        Advisor retrievalAugmentationAdvisor = RetrievalAugmentationAdvisor.builder()
                .documentRetriever(VectorStoreDocumentRetriever.builder()
                        .similarityThreshold(0.7)
                        .topK(3)
                        .vectorStore(store)
                        .build())
                .queryTransformers(rqtBuilder.promptTemplate(pt).build())
                .build();

        return this.chatClient.prompt(p)
                .advisors(retrievalAugmentationAdvisor)
                .call()
                .content();
    }

}
Java
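The StockData, DailyStockData, and Stock types referenced above are defined in the repository. Here is a hedged sketch of their shapes, with field names inferred from the controller code (the blog's actual classes use getter-style accessors such as getClose(), while records are used here for brevity):

```java
import java.util.List;

// Hypothetical shapes of the data types used by StockController.
// Field names are assumptions inferred from the controller code above.
record DailyStockData(String datetime, String open, String close) {} // one row of the Twelvedata time series
record StockData(List<DailyStockData> values) {}                     // Twelvedata API response wrapper
record Stock(String company, List<String> prices) {}                 // document stored in the vector store
```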

Finally, you can call the following two endpoints.

$ curl http://localhost:8080/stocks/load-data
$ curl http://localhost:8080/stocks/v2/most-growth-trend
ShellSession

Final Thoughts

This exercise shows how to modify an existing Spring Boot AI application to integrate it with the Azure OpenAI service. It also gives a recipe on how to include Azure CosmosDB as a vector store for RAG scenarios and similarity searches.

Using Ollama with Spring AI (https://piotrminkowski.com/2025/03/10/using-ollama-with-spring-ai/), published Mon, 10 Mar 2025

This article will teach you how to create a Spring Boot application that implements several AI scenarios using Spring AI and the Ollama tool. Ollama is an open-source tool that aims to run open LLMs on our local machine. It acts like a bridge between LLM and a workstation, providing an API layer on top of them for other applications or services. With Ollama we can run almost every model we want only by pulling it from a huge library.

This is the fifth part of my series of articles about Spring Boot and AI. I mentioned Ollama in the first part of the series to show how to switch between different AI models with Spring AI. However, it was only a brief introduction. Today, we try to run all AI use cases described in the previous tutorials with the Ollama tool. Those tutorials integrated mostly with OpenAI. In this article, we will test them against different AI models.

  1. https://piotrminkowski.com/2025/01/28/getting-started-with-spring-ai-and-chat-model: The first tutorial introduces the Spring AI project and its support for building applications based on chat models like OpenAI or Mistral AI.
  2. https://piotrminkowski.com/2025/01/30/getting-started-with-spring-ai-function-calling: The second tutorial shows Spring AI support for Java function calling with the OpenAI chat model.
  3. https://piotrminkowski.com/2025/02/24/using-rag-and-vector-store-with-spring-ai: The third tutorial shows Spring AI support for RAG (Retrieval Augmented Generation) and vector store.
  4. https://piotrminkowski.com/2025/03/04/spring-ai-with-multimodality-and-images: The fourth tutorial shows Spring AI support for a multimodality feature and image generation

Fortunately, our application can easily switch between different AI tools or models. To achieve this, we must activate the right Maven profile.

Source Code

Feel free to use my source code if you’d like to try it out yourself. To do that, you must clone my sample GitHub repository. Then just follow my instructions.

Prepare a Local Environment for Ollama

A few options exist for accessing Ollama on the local machine with Spring AI. I downloaded Ollama from the following link and installed it on my laptop. Alternatively, we can run it e.g. with Docker Compose or Testcontainers.
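For the Docker Compose option, a minimal compose file might look like the sketch below. The ollama/ollama image is the official one; the volume name is arbitrary and only serves to persist pulled models:

```yaml
services:
  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"                  # default Ollama API port
    volumes:
      - ollama-models:/root/.ollama    # persist pulled models between restarts
volumes:
  ollama-models:
```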

Once we install Ollama on our workstation, we can run an AI model from its library with the ollama run command. The full list of available models can be found here. To begin, we will choose the Llava model. It is one of the most popular models that support both a vision encoder and language understanding.

ollama run llava
ShellSession

Ollama must pull the model manifest and image. Here’s the ollama run command output. Once we see that, we can interact with the model.

spring-ai-ollama-run-llava-model

The sample application source code already defines the ollama-ai Maven profile with the spring-ai-ollama-spring-boot-starter Spring Boot starter.

<profile>
  <id>ollama-ai</id>
  <dependencies>
    <dependency>
      <groupId>org.springframework.ai</groupId>
      <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
    </dependency>
  </dependencies>
</profile>
XML

The profile is disabled by default. We might enable it during development as shown below (for IntelliJ IDEA). However, the application doesn’t use any vendor-specific components but only generic Spring AI classes and interfaces.

We must activate the ollama-ai profile when running the same application. Assuming we are in the project root directory, we need to run the following Maven command:

mvn spring-boot:run -Pollama-ai
ShellSession

Portability across AI Models

We should avoid using specific model library components to make our application portable between different models. For example, when registering functions in the chat model client we should use FunctionCallingOptions instead of model-specific components like OpenAIChatOptions or OllamaOptions.

@GetMapping
String calculateWalletValue() {
   PromptTemplate pt = new PromptTemplate("""
   What’s the current value in dollars of my wallet based on the latest stock daily prices ?
   """);

   return this.chatClient.prompt(pt.create(
        FunctionCallingOptions.builder()
                    .function("numberOfShares")
                    .function("latestStockPrices")
                    .build()))
            .call()
            .content();
}
Java

Not all models support all the AI capabilities used in our sample application. For tools like Ollama or providers like Mistral AI, Spring AI doesn’t provide an image generation implementation, since those tools don’t support it right now. Therefore, we should inject the ImageModel optionally, in case it is not provided by the model-specific library.

@RestController
@RequestMapping("/images")
public class ImageController {

    private final static Logger LOG = LoggerFactory.getLogger(ImageController.class);
    private final ObjectMapper mapper = new ObjectMapper();

    private final ChatClient chatClient;
    private ImageModel imageModel;

    public ImageController(ChatClient.Builder chatClientBuilder,
                           Optional<ImageModel> imageModel,
                           VectorStore store) {
        this.chatClient = chatClientBuilder
                .defaultAdvisors(new SimpleLoggerAdvisor())
                .build();
        imageModel.ifPresent(model -> this.imageModel = model);
        
        // other initializations 
    }
}
Java

Then, if a method requires the ImageModel bean, we can throw an exception informing that the feature is not supported by the AI model (1). On the other hand, Spring AI does not provide a dedicated interface for multimodality, which enables AI models to process information from multiple sources. We can use the UserMessage class and the Media class to combine e.g. text with image(s) in the user prompt. The GET /images/describe/{image} endpoint lists items detected in the source image from the classpath (2).

@GetMapping(value = "/generate/{object}", produces = MediaType.IMAGE_PNG_VALUE)
byte[] generate(@PathVariable String object) throws IOException, NotSupportedException {
   if (imageModel == null)
      throw new NotSupportedException("Image model is not supported by the AI model"); // (1)
   ImageResponse ir = imageModel.call(new ImagePrompt("Generate an image with " + object, ImageOptionsBuilder.builder()
           .height(1024)
           .width(1024)
           .N(1)
           .responseFormat("url")
           .build()));
   String url = ir.getResult().getOutput().getUrl();
   UrlResource resource = new UrlResource(url);
   LOG.info("Generated URL: {}", url);
   dynamicImages.add(Media.builder()
           .id(UUID.randomUUID().toString())
           .mimeType(MimeTypeUtils.IMAGE_PNG)
           .data(url)
           .build());
   return resource.getContentAsByteArray();
}
    
@GetMapping("/describe/{image}") // (2)
List<Item> describeImage(@PathVariable String image) {
   Media media = Media.builder()
           .id(image)
           .mimeType(MimeTypeUtils.IMAGE_PNG)
           .data(new ClassPathResource("images/" + image + ".png"))
           .build();
   UserMessage um = new UserMessage("""
   List all items you see on the image and define their category. 
   Return items inside the JSON array in RFC8259 compliant JSON format.
   """, media);
   return this.chatClient.prompt(new Prompt(um))
           .call()
           .entity(new ParameterizedTypeReference<>() {});
}
Java

Let’s try to avoid declarations like the one below, described in the Spring AI documentation. Although they are perfectly correct, they will cause problems when switching between Spring Boot starters for different AI vendors.

ChatResponse response = chatModel.call(
    new Prompt(
        "Generate the names of 5 famous pirates.",
        OllamaOptions.builder()
            .model(OllamaModel.LLAMA3_1)
            .temperature(0.4)
            .build()
    ));
Java

In this case, we can set the global property in the application.properties file which sets the default model used in the scenario with Ollama.

spring.ai.ollama.chat.options.model = llava
Plaintext

Testing Multiple Models with Spring AI and Ollama

By default, Ollama doesn’t require any API token to establish communication with AI models. The Ollama Spring Boot starter provides auto-configuration that connects the chat client to the Ollama API server running on the localhost:11434 address. So, before running our sample application we must export tokens used to authorize against stock market API and a vector store.
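The default connection target corresponds to the following Spring AI property, which can be overridden if Ollama listens on a different host or port (the value shown is the default):

```properties
spring.ai.ollama.base-url = http://localhost:11434
```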

export STOCK_API_KEY=<YOUR_STOCK_API_KEY>
export PINECONE_TOKEN=<YOUR_PINECONE_TOKEN>
ShellSession

Llava on Ollama

Let’s begin with the Llava model. We can call the first endpoint that asks the model to generate a list of persons (GET /persons) and then search for the person with a particular ID in the list stored in the chat memory (GET /persons/{id}).

spring-ai-ollama-get-persons

Then we can call the endpoint that displays all the items visible on a particular image from the classpath (GET /images/describe/{image}).

spring-ai-ollama-describe-image

By the way, here is the analyzed image stored in the src/main/resources/images/fruits-3.png file.

The endpoint for describing all the input images from the classpath doesn’t work well. I tried to tweak it by adding the RFC8259 JSON format sentence or changing the query. However, the AI model always returned a description of a single image instead of the whole Media list. The OpenAI model could print descriptions for all images in the String[] format.

@GetMapping("/describe")
String[] describe() {
   UserMessage um = new UserMessage("""
            Explain what do you see on each image from the input list.
            Return data in RFC8259 compliant JSON format.
            """, List.copyOf(Stream.concat(images.stream(), dynamicImages.stream()).toList()));
   return this.chatClient.prompt(new Prompt(um))
            .call()
            .entity(String[].class);
}
Java

Here’s the response. Of course, we can train a model to receive better results or try to prepare a better prompt.

spring-ai-ollama-describe-all-images

After calling the GET /wallet endpoint exposed by the WalletController, I received the [400] Bad Request - {"error":"registry.ollama.ai/library/llava:latest does not support tools"} response. It seems Llava doesn’t support the function/tool calling feature. We will also always receive the NotSupportedException for the GET /images/generate/{object} endpoint, since the Spring AI Ollama library doesn’t provide an ImageModel bean. You can perform other tests, e.g. for the RAG and vector store features implemented in the StockController @RestController.

Granite on Ollama

Let’s switch to another interesting model – Granite. Particularly we will test the granite3.2-vision model dedicated to automated content extraction from tables, charts, infographics, plots, and diagrams. First, we set the current model name in the Ollama Spring AI configuration properties.

spring.ai.ollama.chat.options.model = granite3.2-vision
Plaintext

Let’s stop the Llava model and then run granite3.2-vision on Ollama:

ollama run granite3.2-vision
ShellSession

After the application restarts, we can perform some test calls. The endpoint for describing a single image returns a more detailed response than the Llava model. The response for the query with multiple images still looks the same as before.

The Granite Vision model supports a “function calling” feature, but it couldn’t call functions properly using my prompt. Please refer to my article for more details about the Spring AI function calling with OpenAI.

Deepseek on Ollama

The last model we will run within this exercise is Deepseek. DeepSeek-R1 achieves performance comparable to OpenAI-o1 on reasoning tasks. First, we must set the current model name in the Ollama Spring AI configuration properties.

spring.ai.ollama.chat.options.model = deepseek-r1
Plaintext

Let’s stop the Granite model and run deepseek-r1 on Ollama:

ollama run deepseek-r1
ShellSession

We need to restart the app:

mvn spring-boot:run -Pollama-ai
ShellSession

As usual, we can call the first endpoint that asks the model to generate a list of persons (GET /persons) and then search for a person with a particular id in the list stored in the chat memory (GET /persons/{id}). The response was pretty large, but not in the required JSON format. Here’s a fragment of the response:

The deepseek-r1 model doesn’t support the tool/function calling feature. It also didn’t analyze my input image properly and didn’t return a JSON response conforming to the Spring AI structured output feature.
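One practical wrinkle worth noting: deepseek-r1 typically wraps its chain-of-thought in <think>...</think> tags, which by itself breaks JSON parsing of the answer. Here’s a small post-processing sketch (my own helper, not part of Spring AI) that strips the reasoning block before handing the text to a structured output converter:

```java
import java.util.regex.Pattern;

public class ThinkTagStripper {

    // deepseek-r1 emits its reasoning inside <think>...</think> tags;
    // removing them leaves only the final answer for downstream parsing.
    private static final Pattern THINK = Pattern.compile("(?s)<think>.*?</think>\\s*");

    public static String strip(String response) {
        return THINK.matcher(response).replaceAll("").trim();
    }

    public static void main(String[] args) {
        String raw = "<think>The user wants JSON...</think>{\"name\":\"John Smith\",\"id\":1}";
        System.out.println(strip(raw)); // prints only the JSON payload
    }
}
```

A helper like this could be applied to the raw content() result before asking Spring AI’s entity(...) conversion to parse it.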

Final Thoughts

This article shows how to easily switch between multiple AI models with Spring AI and Ollama. We tested several AI use cases implemented in the sample Spring Boot application across models such as Llava, Granite, and Deepseek. The app provides several endpoints showing features such as multimodality, chat memory, RAG, vector store, and function calling. It does not aim to compare the AI models, but to give a simple recipe for integrating with different AI models and to let you play with them using Spring AI.

The post Using Ollama with Spring AI appeared first on Piotr's TechBlog.

Using RAG and Vector Store with Spring AI https://piotrminkowski.com/2025/02/24/using-rag-and-vector-store-with-spring-ai/ https://piotrminkowski.com/2025/02/24/using-rag-and-vector-store-with-spring-ai/#comments Mon, 24 Feb 2025 14:29:03 +0000 https://piotrminkowski.com/?p=15538 This article will teach you how to create a Spring Boot application that uses RAG (Retrieval Augmented Generation) and vector store with Spring AI. We will continue experiments with stock data, which were initiated in my previous article about Spring AI. This is the third part of my series of articles about Spring Boot and […]

This article will teach you how to create a Spring Boot application that uses RAG (Retrieval Augmented Generation) and vector store with Spring AI. We will continue experiments with stock data, which were initiated in my previous article about Spring AI. This is the third part of my series of articles about Spring Boot and AI. It is worth reading the following posts before proceeding with the current one:

  1. https://piotrminkowski.com/2025/01/28/getting-started-with-spring-ai-and-chat-model: The first tutorial introduces the Spring AI project and its support for building applications based on chat models like OpenAI or Mistral AI.
  2. https://piotrminkowski.com/2025/01/30/getting-started-with-spring-ai-function-calling: The second tutorial shows Spring AI support for Java function calling with the OpenAI chat model.

This article will show how to include one of the vector stores supported by Spring AI, together with the advisors dedicated to RAG support, in the sample application codebase used in the two previous articles. It will connect to the OpenAI API, but you can easily switch to other models using Mistral AI or Ollama support in Spring AI. For more details, please refer to my first article.

Source Code

If you would like to try it yourself, you can always take a look at my source code. To do that, clone my sample GitHub repository and follow my instructions.

Motivation for RAG with Spring AI

The problem to solve is similar to the one described in my previous article about the Spring AI function calling feature. Since the OpenAI model is trained on a static dataset, it does not have direct access to online services or APIs. We want it to analyze stock growth trends for the biggest companies in the US stock market. Therefore, we must obtain share prices from a public API that returns live stock market data. Then, we can store this data in our local database and integrate it with the sample Spring Boot AI application. Instead of a typical relational database, we will use a vector store. Queries in vector databases work differently from those in traditional relational databases. Instead of looking for exact matches, they perform similarity searches, retrieving the vectors most similar to a given input vector.
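To make the idea concrete, here is a tiny, self-contained sketch of a similarity search over raw vectors. The two-dimensional vectors are made up for illustration; a real store ranks high-dimensional embeddings produced by a model such as text-embedding-ada-002.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class CosineSearch {

    // Cosine similarity: dot(a, b) / (|a| * |b|)
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Return the ids of the topK vectors most similar to the query.
    static List<String> topK(Map<String, double[]> store, double[] query, int k) {
        return store.entrySet().stream()
                .sorted(Comparator.comparingDouble(
                        (Map.Entry<String, double[]> e) -> -cosine(e.getValue(), query)))
                .limit(k)
                .map(Map.Entry::getKey)
                .toList();
    }

    public static void main(String[] args) {
        Map<String, double[]> store = Map.of(
                "AAPL", new double[]{0.9, 0.1},
                "MSFT", new double[]{0.7, 0.3},
                "NVDA", new double[]{0.1, 0.9});
        System.out.println(topK(store, new double[]{1.0, 0.0}, 2)); // [AAPL, MSFT]
    }
}
```

This is exactly what a vector store does behind the topK parameter we will use later, just at a much larger scale and with approximate-nearest-neighbor indexes.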

After loading all the required data into a vector database, we must integrate it with the AI model. Spring AI provides a convenient mechanism for that based on the Advisors API. We have already used some built-in advisors in the previous examples, e.g. to print detailed AI communication logs or to enable chat memory. This time they will allow us to implement the Retrieval Augmented Generation (RAG) technique in our app. Thanks to that, the Spring Boot app will retrieve the documents that best match a user query before sending a request to the AI model. These documents provide context for the query and are sent to the AI model alongside the user’s question.
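Conceptually, such an advisor simply places the retrieved documents into the prompt ahead of the user’s question. A stripped-down sketch of that augmentation step (the template wording is my own illustration, not Spring AI’s internal prompt):

```java
import java.util.List;

public class RagPromptBuilder {

    // Combine retrieved documents with the user question into a single
    // augmented prompt, the way a RAG advisor does before calling the model.
    static String augment(String question, List<String> retrievedDocs) {
        return """
                Answer the question using only the context below.

                Context:
                %s

                Question: %s
                """.formatted(String.join("\n", retrievedDocs), question);
    }

    public static void main(String[] args) {
        String prompt = augment("Which share grew the most?",
                List.of("{\"name\":\"AAPL\",\"prices\":[...]}",
                        "{\"name\":\"NVDA\",\"prices\":[...]}"));
        System.out.println(prompt);
    }
}
```

The advisors we use later automate exactly this step: run the similarity search, inject the results as context, then forward the combined prompt to the chat model.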

Here’s a simplified visualization of our process.

spring-ai-rag-arch

Vector Store with Spring AI

Set up Pinecone Database

In this section, we will prepare a vector store, integrate it with our Spring Boot application, and load some data there. Spring AI supports various vector databases. It provides the VectorStore interface to directly interact with a vector store from our Spring Boot app. The full list of supported databases can be found in the Spring AI docs here.

We will proceed with the Pinecone database. It is a popular cloud-based vector database that allows us to store and search vectors efficiently. Instead of a cloud-based database, we can set up a local instance of another popular vector store – ChromaDB. In that case, you can use the docker-compose.yml file in the repository root directory to run that database with the docker compose up command. With Pinecone, we need to sign up for an account on their portal. Then we should create an index. Several customizations are available, but the most important thing is to choose the right embedding model. Since text-embedding-ada-002 is the default embedding model for OpenAI, we should choose that option. The name of our index is spring-ai. We can read the environment and project name from the generated host URL.
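For the local ChromaDB alternative, a minimal compose file might look like the following. The image name and port are assumptions based on the official chromadb/chroma image and are not necessarily identical to the file shipped in the repository:

```yaml
services:
  chroma:
    image: chromadb/chroma
    ports:
      - "8000:8000"   # ChromaDB REST API
```

With that file in place, docker compose up starts the store locally, and the application would then use the corresponding Spring AI Chroma starter instead of the Pinecone one.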

spring-ai-rag-pinecone

After creating an index, we should generate an API key.

Then, we will copy the generated key and export it as the PINECONE_TOKEN environment variable.

export PINECONE_TOKEN=<YOUR_PINECONE_APIKEY>
ShellSession

Integrate Spring Boot app with Pinecone using Spring AI

Our Spring Boot application must include the spring-ai-pinecone-store-spring-boot-starter dependency to smoothly integrate with the Pinecone vector store.

<dependency>
  <groupId>org.springframework.ai</groupId>
  <artifactId>spring-ai-pinecone-store-spring-boot-starter</artifactId>
</dependency>
XML

Then, we must provide the connection settings and credentials for the Pinecone database in the Spring Boot application.properties file. We must provide at least the Pinecone API key, environment name, project name, and index name.

spring.ai.vectorstore.pinecone.apiKey = ${PINECONE_TOKEN}
spring.ai.vectorstore.pinecone.environment = aped-4627-b74a
spring.ai.vectorstore.pinecone.projectId = fsbak04
spring.ai.vectorstore.pinecone.index-name = spring-ai
Plaintext

After providing all the required configuration settings, we can inject and use the VectorStore bean, e.g. in our application REST controller. In the following code fragment, we load input data into a vector store and perform a simple similarity search to find the stock with the strongest growth trend. We query the Twelvedata API individually for each company on the list and deserialize the response into a StockData object. Then we create a Spring AI Document object, which contains the company name and the share close prices for the last 10 days. The data is written in JSON format.

@RestController
@RequestMapping("/stocks")
public class StockController {

    private final ObjectMapper mapper = new ObjectMapper();
    private final static Logger LOG = LoggerFactory.getLogger(StockController.class);
    private final RestTemplate restTemplate;
    private final VectorStore store;

    @Value("${STOCK_API_KEY}")
    private String apiKey;

    public StockController(VectorStore store,
                           RestTemplate restTemplate) {
        this.store = store;
        this.restTemplate = restTemplate;
    }

    @PostMapping("/load-data")
    void load() throws JsonProcessingException {
        final List<String> companies = List.of("AAPL", "MSFT", "GOOG", "AMZN", "META", "NVDA");
        for (String company : companies) {
            StockData data = restTemplate.getForObject("https://api.twelvedata.com/time_series?symbol={0}&interval=1day&outputsize=10&apikey={1}",
                    StockData.class,
                    company,
                    apiKey);
            if (data != null && data.getValues() != null) {
                var list = data.getValues().stream().map(DailyStockData::getClose).toList();
                var doc = Document.builder()
                        .id(company)
                        .text(mapper.writeValueAsString(new Stock(company, list)))
                        .build();
                store.add(List.of(doc));
                LOG.info("Document added: {}", company);
            }
        }
    }
    
    @GetMapping("/docs")
    List<Document> query() {
        SearchRequest searchRequest = SearchRequest.builder()
                .query("Find the most growth trends")
                .topK(2)
                .build();
        List<Document> docs = store.similaritySearch(searchRequest);
        return docs;
    }
    
}
Java

Once we start our application and call the POST /stocks/load-data endpoint, we should see 6 records loaded into the target store. You can verify the content of the database in the Pinecone index browser.

spring-ai-rag-vector-store-data

Then we can interact directly with a vector store by calling the GET /stocks/docs endpoint.

curl http://localhost:8080/stocks/docs
ShellSession

Implement RAG with Spring AI

Use QuestionAnswerAdvisor

Previously, we loaded data into the target vector store and performed a simple search for the strongest growth trend. Our main goal in this section is to incorporate relevant data into the AI model prompt. We can implement RAG with Spring AI in two ways, using different advisors. Let’s begin with QuestionAnswerAdvisor. To perform RAG, we must provide an instance of QuestionAnswerAdvisor to the ChatClient bean. The QuestionAnswerAdvisor constructor takes the VectorStore instance as an input argument.

@RequestMapping("/v1/most-growth-trend")
String getBestTrend() {
   PromptTemplate pt = new PromptTemplate("""
            {query}.
            Which {target} is the most % growth?
            The 0 element in the prices table is the latest price, while the last element is the oldest price.
            """);

   Prompt p = pt.create(
            Map.of("query", "Find the most growth trends",
                   "target", "share")
   );

   return this.chatClient.prompt(p)
            .advisors(new QuestionAnswerAdvisor(store))
            .call()
            .content();
}
Java

Then, we can call the endpoint GET /stocks/v1/most-growth-trend to see the AI model response. However, the result is not accurate.

Let’s improve our previous code a bit. We will publish a new version of the AI prompt under the GET /stocks/v1-1/most-growth-trend endpoint. This time, we build a SearchRequest object that returns the top 3 records with a 0.7 similarity threshold. The newly created SearchRequest object must be passed as an argument to the QuestionAnswerAdvisor constructor.

@RequestMapping("/v1-1/most-growth-trend")
String getBestTrendV11() {
   PromptTemplate pt = new PromptTemplate("""
            Which share is the most % growth?
            The 0 element in the prices table is the latest price, while the last element is the oldest price.
            Return a full name of company instead of a market shortcut. 
            """);

   SearchRequest searchRequest = SearchRequest.builder()
            .query("""
            Find the most growth trends.
            The 0 element in the prices table is the latest price, while the last element is the oldest price.
            """)
            .topK(3)
            .similarityThreshold(0.7)
            .build();

   return this.chatClient.prompt(pt.create())
            .advisors(new QuestionAnswerAdvisor(store, searchRequest))
            .call()
            .content();
}
Java

Now, the results are more accurate. The model also returns the full names of the companies instead of market shortcuts.

spring-ai-rag-model-response
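As a quick sanity check on the model’s answer, the growth calculation itself is plain arithmetic: with the latest close price at index 0 and the oldest at the end, growth equals (latest - oldest) / oldest * 100. A small sketch over made-up close prices:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class GrowthCheck {

    // Index 0 holds the latest close price, the last element the oldest one,
    // matching the document layout described in the prompt.
    static double growthPercent(List<Double> closes) {
        double latest = closes.get(0);
        double oldest = closes.get(closes.size() - 1);
        return (latest - oldest) / oldest * 100.0;
    }

    static String mostGrowth(Map<String, List<Double>> stocks) {
        return stocks.entrySet().stream()
                .max(Comparator.comparingDouble(e -> growthPercent(e.getValue())))
                .orElseThrow()
                .getKey();
    }

    public static void main(String[] args) {
        Map<String, List<Double>> stocks = Map.of(
                "AAPL", List.of(110.0, 105.0, 100.0),   // +10%
                "NVDA", List.of(130.0, 115.0, 100.0));  // +30%
        System.out.println(mostGrowth(stocks)); // NVDA
    }
}
```

Comparing such a directly computed ranking with the model’s answer is a cheap way to spot when the prompt (or the retrieved context) leads the model astray.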

Use RetrievalAugmentationAdvisor

Instead of the QuestionAnswerAdvisor class, we can also use the experimental RetrievalAugmentationAdvisor. It provides an out-of-the-box implementation of the most common RAG flows, based on a modular architecture. There are several built-in modules we can use with RetrievalAugmentationAdvisor. We will include the RewriteQueryTransformer module, which uses an LLM to rewrite the user query to provide better results when querying the target vector database. It requires the query and target placeholders to be present in the prompt template. Thanks to that transformer, we can retrieve the optimal set of records for a percentage growth calculation.

@RestController
@RequestMapping("/stocks")
public class StockController {

    private final ObjectMapper mapper = new ObjectMapper();
    private final static Logger LOG = LoggerFactory.getLogger(StockController.class);
    private final ChatClient chatClient;
    private final RewriteQueryTransformer.Builder rqtBuilder;
    private final RestTemplate restTemplate;
    private final VectorStore store;

    @Value("${STOCK_API_KEY}")
    private String apiKey;

    public StockController(ChatClient.Builder chatClientBuilder,
                           VectorStore store,
                           RestTemplate restTemplate) {
        this.chatClient = chatClientBuilder
                .defaultAdvisors(new SimpleLoggerAdvisor())
                .build();
        this.rqtBuilder = RewriteQueryTransformer.builder()
                .chatClientBuilder(chatClientBuilder);
        this.store = store;
        this.restTemplate = restTemplate;
    }
    
    // other methods ...
    
    @RequestMapping("/v2/most-growth-trend")
    String getBestTrendV2() {
        PromptTemplate pt = new PromptTemplate("""
                {query}.
                Which {target} is the most % growth?
                The 0 element in the prices table is the latest price, while the last element is the oldest price.
                """);

        Prompt p = pt.create(Map.of("query", "Find the most growth trends", "target", "share"));

        Advisor retrievalAugmentationAdvisor = RetrievalAugmentationAdvisor.builder()
                .documentRetriever(VectorStoreDocumentRetriever.builder()
                        .similarityThreshold(0.7)
                        .topK(3)
                        .vectorStore(store)
                        .build())
                .queryTransformers(rqtBuilder.promptTemplate(pt).build())
                .build();

        return this.chatClient.prompt(p)
                .advisors(retrievalAugmentationAdvisor)
                .call()
                .content();
    }

}
Java

Once again, we can verify the AI model response by calling the GET /stocks/v2/most-growth-trend endpoint. The response is similar to the one generated by the GET /stocks/v1-1/most-growth-trend endpoint.

Run the Application

Just a reminder: before running the application, we must provide the OpenAI and Twelvedata API tokens.

$ export OPEN_AI_TOKEN=<YOUR_OPEN_AI_TOKEN>
$ export STOCK_API_KEY=<YOUR_STOCK_API_KEY>
$ mvn spring-boot:run
ShellSession

Final Thoughts

In this article, you learned how to use an important AI technique called Retrieval Augmented Generation (RAG) with Spring AI. Spring AI simplifies RAG by providing built-in support for vector stores and easy incorporation of data into the chat model through the Advisor API. However, since RetrievalAugmentationAdvisor is an experimental feature, we cannot rule out changes in future releases.

The post Using RAG and Vector Store with Spring AI appeared first on Piotr's TechBlog.
