Making RAG Easy with Spring AI + Elasticsearch


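Author: Laura Trotta, Elastic

Customize your AI chatbot experience with private data.

Spring AI recently added Elasticsearch as a vector store, with optimizations contributed by the Elastic team. We're excited to show how simple and intuitive it is to build a complete RAG application in Java using Spring AI and the Elasticsearch vector database. Developers can now combine Spring's modular capabilities with Elasticsearch's advanced retrieval and AI tooling to quickly build Spring Boot applications for enterprise use cases.

In this blog, we'll build a retrieval-augmented generation (RAG) Java application with Spring AI, using the new Elasticsearch vector store integration for document storage and retrieval. You'll learn how to configure the Maven project, set up all the necessary dependencies, and integrate Elasticsearch as the vector store. We'll also walk through reading and tokenizing a PDF document, sending it to Elasticsearch, and querying it with an AI model to get accurate, context-relevant information. Let's get started!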

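Disclaimer

The spring-ai-elasticsearch artifact is still in technical preview and is available only in the Spring Milestones repository. Until it is officially released, we do not recommend using the code provided here in any production environment.

Prerequisites

Elasticsearch version >= 8.14.0
Java version >= 17
Any LLM supported by Spring AI (full list)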

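Use case: Runewars

Runewars is a miniatures game with a fairly complex set of rules explained in a 40-page manual. Playing it a few years after your last game means you will have forgotten most of them. Let's try asking ChatGPT (GPT-4o) for a refresher:

The answer is not only generic but also wrong: reward cards must be kept hidden from the other players. The model clearly doesn't know the rules of this game, so let's augment it with the rules!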

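Demo goal

Have an AI chat model that can answer questions about the Runewars rules and return, along with each response, the manual page where the information was found. The code for all of this is available on GitHub.

Project configuration

We'll create a new Java project with Apache Maven as the build tool, so let's set up the POM accordingly, starting by adding the Spring Milestones and Snapshots repositories as described in the Spring AI getting started guide: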

<repositories>
  <repository>
    <id>spring-milestones</id>
    <name>Spring Milestones</name>
    <url>https://repo.spring.io/milestone</url>
    <snapshots>
      <enabled>false</enabled>
    </snapshots>
  </repository>
  <repository>
    <id>spring-snapshots</id>
    <name>Spring Snapshots</name>
    <url>https://repo.spring.io/snapshot</url>
    <releases>
      <enabled>false</enabled>
    </releases>
  </repository>
</repositories>

We also need to import the Spring AI BOM:

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.springframework.ai</groupId>
      <artifactId>spring-ai-bom</artifactId>
      <version>1.0.0-M3</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>

We'll rely on Spring Boot auto-configuration to set up the beans we need:

<dependency>
  <groupId>org.springframework.ai</groupId>
  <artifactId>spring-ai-spring-boot-autoconfigure</artifactId>
  <version>1.0.0-SNAPSHOT</version>
</dependency>

Next come the specific modules for Elasticsearch and for the embedding model (OpenAI in this example):

<dependency>
  <groupId>org.springframework.ai</groupId>
  <artifactId>spring-ai-elasticsearch-store</artifactId>
  <version>1.0.0-SNAPSHOT</version>
</dependency>
<dependency>
  <groupId>org.springframework.ai</groupId>
  <artifactId>spring-ai-openai</artifactId>
  <version>1.0.0-SNAPSHOT</version>
</dependency>

Finally, Spring also provides a PDF reader, which we'll use to ingest the game manual:

<dependency>
  <groupId>org.springframework.ai</groupId>
  <artifactId>spring-ai-pdf-document-reader</artifactId>
  <version>1.0.0-SNAPSHOT</version>
</dependency>

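The complete POM is available in the GitHub repository.

Beans

All the Spring beans needed to run the application can be autowired, because in this case we don't need any specific configuration that would require us to create the beans ourselves. The only thing we have to do is provide the necessary information in the application.properties file: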

spring.ai.openai.api-key=${OPENAI_API_KEY}
spring.ai.chat.client.enabled=true
spring.elasticsearch.uris=${ES_SERVER_URL}
spring.elasticsearch.username=${ES_USERNAME}
spring.elasticsearch.password=${ES_PASSWORD}
spring.ai.vectorstore.elasticsearch.initialize-schema=true

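If these properties are set correctly, the Spring framework automatically selects the right implementations of the vector store and the embedding/chat model classes. If you use a different LLM, make sure to configure the appropriate vector dimensions with the spring.ai.vectorstore.elasticsearch.dimensions property. OpenAI's vector dimension, for example, is 1536, which is the default, so we don't need to set the property here.

For more information on all the available configuration parameters, see the official Elasticsearch Vector Store documentation.

Service

First, create a new Service class in which the vector store and chat client beans are autowired: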

@Service
public class RagService {

    private final ElasticsearchVectorStore vectorStore;
    private final ChatClient chatClient;

    // Both beans are injected by Spring; the ChatClient is built from its auto-configured builder
    public RagService(ElasticsearchVectorStore vectorStore, ChatClient.Builder clientBuilder) {
        this.vectorStore = vectorStore;
        this.chatClient = clientBuilder.build();
    }
}

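It will have two methods:

One reads a PDF file from a given path, converts it into the Spring AI document format, and sends it to Elasticsearch.
The other queries Elasticsearch for documents related to the question, then hands those documents to the LLM so that it can give an accurate answer.

Ingestion

Let's start with the first one: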

public void ingestPDF(String path) {
    // Spring AI utility class to read a PDF file page by page
    PagePdfDocumentReader pdfReader = new PagePdfDocumentReader(path);
    List<Document> docbatch = pdfReader.read();

    // Sending batch of documents to vector store
    // applying tokenizer
    docbatch = new TokenTextSplitter().apply(docbatch);
    vectorStore.doAdd(docbatch);
}

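Notice how the batch of documents goes through a splitting process before being sent to the vector store: this is called tokenization, meaning the text is divided into smaller pieces that the LLM can classify and manage more efficiently. Spring AI provides the TokenTextSplitter, which can be customized to tune the chunk size and the desired number of chunks. In this case the default configuration is enough, so our pages will be split into chunks of roughly 800 tokens each.

This seems too simple: are we just sending strings to a database? As with anything Spring-related, a lot happens under the hood, hidden behind high-level abstractions. The documents are sent to the embedding model to be embedded, that is, converted into a numerical representation of their content called a vector. The documents and their corresponding embeddings are then indexed into the Elasticsearch vector database, which is optimized to handle this kind of data at both ingestion and query time.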
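To make the splitting step more concrete, here is a minimal sketch of fixed-size chunking in plain Java. It is not how TokenTextSplitter is implemented: it treats whitespace-separated words as a stand-in for model tokens, and the names ChunkingSketch and splitByWords are hypothetical, introduced only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: a real splitter counts model tokens, not words.
class ChunkingSketch {

    // Splits text into consecutive chunks of at most maxWords whitespace-separated words.
    static List<String> splitByWords(String text, int maxWords) {
        if (maxWords <= 0) {
            throw new IllegalArgumentException("maxWords must be positive");
        }
        List<String> chunks = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return chunks; // nothing to split
        }
        String[] words = trimmed.split("\\s+");
        for (int start = 0; start < words.length; start += maxWords) {
            int end = Math.min(start + maxWords, words.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Each printed line is one chunk that would be embedded and indexed separately.
        splitByWords("Place the reward card facedown under the hero card", 4)
                .forEach(System.out::println);
    }
}
```

Smaller chunks give the retrieval step more precise matches, at the cost of more embeddings to compute and store.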

Querying

The second method implements the user's interaction with the chat client:

public String queryLLM(String question) {
    // Querying the vector store for documents related to the question
    List<Document> vectorStoreResult =
        vectorStore.doSimilaritySearch(SearchRequest.query(question).withTopK(5).withSimilarityThreshold(0.0));

    // Merging the documents into a single string
    String documents = vectorStoreResult.stream()
        .map(Document::getContent)
        .collect(Collectors.joining(System.lineSeparator()));

    // Setting the prompt with the context
    String prompt = """
        You're assisting with providing the rules of the tabletop game Runewars.
        Use the information from the DOCUMENTS section to provide accurate answers to the
        question in the QUESTION section.
        If unsure, simply state that you don't know.

        DOCUMENTS:
        """ + documents
        + """

        QUESTION:
        """ + question;

    // Calling the chat model with the question
    String response = chatClient.prompt()
        .user(prompt)
        .call()
        .content();

    return response +
        System.lineSeparator() +
        "Found at page: " +
        // Retrieving the first ranked page number from the document metadata
        vectorStoreResult.get(0).getMetadata().get(PagePdfDocumentReader.METADATA_START_PAGE_NUMBER) +
        " of the manual";
}

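The question is first sent to the Elasticsearch vector store, so that it can return the documents it considers most relevant to the query. How does it do that? As the method name suggests, by performing a similarity search, or more precisely a kNN search: in short, the embeddings of the documents are compared with the question (which is embedded as well), and the ones deemed closest are returned.

In this case we want accurate answers, which means we don't want hallucinations. Note that a withSimilarityThreshold of 0.0 accepts every match regardless of its score, so here the result set is limited only by withTopK; raising the threshold would discard weakly related chunks. The prompt does the rest by telling the model to answer from the retrieved documents and to say so when it doesn't know. In addition, given the nature of the data (a manual), we know there won't be much repetition, so we expect to find what we need in no more than 5 different pages, hence withTopK is set to 5.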
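As a rough illustration of what a similarity search does with these two parameters, the sketch below ranks a handful of vectors by cosine similarity, drops those below a threshold, and keeps the top K. It is a brute-force stand-in written for this explanation, assuming cosine similarity and non-zero vectors; Elasticsearch uses its own, far more efficient kNN machinery, and the names KnnSketch, cosine, and topK are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical brute-force illustration; not how Elasticsearch executes a kNN search.
class KnnSketch {

    // Cosine similarity of two equal-length, non-zero vectors.
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same dimensions");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the ids of the k most similar entries whose score is at least threshold.
    static List<String> topK(double[] query, Map<String, double[]> index, int k, double threshold) {
        return index.entrySet().stream()
                .map(e -> Map.entry(e.getKey(), cosine(query, e.getValue())))
                .filter(e -> e.getValue() >= threshold)
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, double[]> index = Map.of(
                "page-27", new double[] {1, 0},
                "page-3", new double[] {0, 1},
                "page-12", new double[] {1, 1});
        // A threshold of 0.0 filters nothing here, so only k limits the result.
        System.out.println(topK(new double[] {1, 0}, index, 2, 0.0));
    }
}
```

With a threshold of 0.0 every candidate with a non-negative score passes the filter, which is why the number of returned chunks is then governed by K alone.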

Controller

The easiest way to test a Spring service is to build a basic RestController that calls it:

@RestController
@RequestMapping("rag")
public class RagController {

    private final RagService ragService;

    @Autowired
    public RagController(RagService ragService) {
        this.ragService = ragService;
    }

    // @RequestBody binds the raw text/plain request body to the parameter
    @PostMapping("/ingestPdf")
    public ResponseEntity<String> ingestPDF(@RequestBody String path) {
        try {
            ragService.ingestPDF(path);
            return ResponseEntity.ok().body("Done!");
        } catch (Exception e) {
            System.out.println(e.getMessage());
            return ResponseEntity.internalServerError().build();
        }
    }

    @PostMapping("/query")
    public ResponseEntity<String> query(@RequestBody String question) {
        try {
            String response = ragService.queryLLM(question);
            return ResponseEntity.ok().body(response);
        } catch (Exception e) {
            System.out.println(e.getMessage());
            return ResponseEntity.internalServerError().build();
        }
    }
}

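Running Elasticsearch

Connecting to an Elasticsearch Cloud instance is the fastest way to test the application, but if you don't have access to one, no problem! You can get started with a local Elasticsearch instance using start-local, a script that uses Docker to quickly configure and run the server together with a Kibana instance.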

curl -fsSL https://elastic.co/start-local | sh

Running the application

The code is done! Let's start the application on the familiar port 8080 and call it with curl (laziness is the key theme here):

curl -XPOST "http://localhost:8080/rag/ingestPdf" --header "Content-Type: text/plain" --data "where-you-downloaded-the-pdf"

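Keep in mind that embedding is an expensive operation, and using a less powerful LLM means this call may take a while to complete.

Finally, the question we asked at the beginning: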

curl -XPOST "http://localhost:8080/rag/query" --header "Content-Type: text/plain" --data "where do you place the reward card after obtaining it?"

In Runewars, after a hero receives a Reward card, the controlling player draws the top card from the Reward deck, looks at it, and places it facedown under the Hero card of the hero who received it. 
The player does not flip the Reward card faceup until they wish to use its ability. Found at page 27 of the manual.
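If you would rather call the endpoint from Java than from curl, the sketch below builds the same POST request with the JDK's built-in java.net.http client. The class and method names (RagClientSketch, buildQueryRequest) are hypothetical, and the main method assumes the application from this post is running on localhost:8080.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical client sketch mirroring the curl call to /rag/query.
class RagClientSketch {

    // Builds a POST request with a text/plain body carrying the question.
    static HttpRequest buildQueryRequest(String baseUrl, String question) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/rag/query"))
                .header("Content-Type", "text/plain")
                .POST(HttpRequest.BodyPublishers.ofString(question))
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = buildQueryRequest("http://localhost:8080",
                "where do you place the reward card after obtaining it?");
        // Sending only works while the Spring Boot application is up.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```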

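Pretty cool, isn't it? The chatbot now knows the intricate rules of Runewars and is ready to answer all our questions.

Bonus: Ollama

Thanks to Spring AI's abstractions, we can easily switch to another language model by changing a few lines of configuration. Let's start with the POM dependencies, replacing OpenAI with a local Ollama instance: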

<dependency>
  <groupId>org.springframework.ai</groupId>
  <artifactId>spring-ai-ollama</artifactId>
  <version>1.0.0-SNAPSHOT</version>
</dependency>

Then the properties in application.properties:

spring.ai.ollama.base-url=http://localhost:11434
spring.ai.ollama.init.pull-model-strategy=always
spring.ai.chat.client.enabled=true
spring.elasticsearch.uris=${ES_SERVER_URL}
spring.elasticsearch.username=${ES_USERNAME}
spring.elasticsearch.password=${ES_PASSWORD}
spring.ai.vectorstore.elasticsearch.initialize-schema=true
spring.ai.vectorstore.elasticsearch.dimensions=1024

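The pull-model-strategy property conveniently pulls the default models for you, so once everything is fully set up, make sure to set it to never to disable it. Also remember to check the correct vector dimensions: for mxbai-embed-large (the default Ollama embedding model), for example, it is 1024.

That's it, everything else stays the same! Of course, changing the embedding model means the Elasticsearch index has to change as well, since the old embeddings are not compatible with the new ones.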
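One visible symptom of that incompatibility is the vector length: an index created for 1536-dimensional embeddings cannot store 1024-dimensional ones. The tiny sketch below only illustrates that check; the names DimensionCheckSketch and fitsIndex are hypothetical, and the arrays are empty placeholders rather than real embeddings.

```java
// Hypothetical illustration of why embeddings from a different model need a new index.
class DimensionCheckSketch {

    // True when the embedding length matches the dimensions the index was created with.
    static boolean fitsIndex(float[] embedding, int indexDimensions) {
        return embedding.length == indexDimensions;
    }

    public static void main(String[] args) {
        float[] openAiSized = new float[1536]; // placeholder with the OpenAI default size
        float[] mxbaiSized = new float[1024];  // placeholder with the mxbai-embed-large size
        System.out.println(fitsIndex(openAiSized, 1536));
        System.out.println(fitsIndex(mxbaiSized, 1536));
    }
}
```

Even when two models happen to share the same dimensions, their vectors live in different spaces, so documents should still be re-embedded into a fresh index.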