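Author: Tomás Murúa, Elastic
Using Alibaba Cloud AI Service features with Elastic.
For further reading, see "Elasticsearch: Vector search with the Alibaba inference API and semantic text".
In this article, we'll cover how to integrate Alibaba Cloud AI features with Elasticsearch to improve relevance in semantic search.
Alibaba Cloud AI Search is a solution that combines advanced AI features with Elasticsearch tools, using the Qwen LLM/DeepSeek-R1 series to provide advanced reasoning and classification models. In this article, we'll use descriptions of novels and plays written by the same author to test Alibaba's reranking and sparse embedding endpoints.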
Steps
- Configure Alibaba Cloud AI
- Create the Elasticsearch mappings
- Index data into Elasticsearch
- Query data
- Bonus: Answering questions with completion
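Configure Alibaba Cloud AI
Alibaba Cloud AI reranking and embeddings
Open inference Alibaba Cloud offers different services. In this example, we'll use descriptions of popular books and plays by Agatha Christie to test Alibaba Cloud's embedding and reranking endpoints in semantic search.
The Alibaba Cloud AI rerank endpoint is a semantic reranking functionality. This type of reranking uses a machine learning model to reorder search results based on their semantic similarity to the query. This lets you use out-of-the-box semantic search capabilities on existing full-text search indices.
The sparse embedding endpoint produces sparse embeddings: vectors in which most values are zero, so that the relevant information stands out.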
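To make the idea of a sparse embedding concrete, the sketch below represents each vector as a token-to-weight mapping and scores a document against a query with a dot product over the tokens they share. The tokens and weights are invented for illustration and are not output from ops-text-sparse-embedding-001; the dot product is one common way to compare sparse vectors, not a description of how the Alibaba service or Elasticsearch scores them internally.

```python
# Illustrative only: tokens and weights are made up, not produced by the model.

def sparse_dot(query_vec: dict, doc_vec: dict) -> float:
    """Dot product over the tokens that both sparse vectors contain."""
    # Iterate over the smaller mapping and look tokens up in the larger one.
    small, large = sorted((query_vec, doc_vec), key=len)
    return sum(weight * large[token] for token, weight in small.items() if token in large)

# Every token that is absent from a mapping has an implicit weight of zero.
query = {"novel": 1.2, "agatha": 0.9, "christie": 0.9}
novel_doc = {"novel": 1.0, "nile": 1.4, "christie": 0.8, "mystery": 0.6}
play_doc = {"play": 1.1, "mousetrap": 1.5, "christie": 0.8}

novel_score = sparse_dot(query, novel_doc)  # shares "novel" and "christie"
play_score = sparse_dot(query, play_doc)    # shares only "christie"
print(novel_score > play_score)
```

Because only non-zero entries are stored, the few tokens that carry weight dominate the score, which is what makes the relevant information stand out.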
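Get an Alibaba Cloud API key
We need a valid API key to integrate Alibaba with Elasticsearch. To get one, follow these steps:
- Access the Alibaba Cloud portal from the Service Plaza section.
- In the left-hand menu, go to API Keys.
- Generate a new API key.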
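Configure the Alibaba endpoints
We'll first configure the sparse embedding endpoint, which converts the text descriptions into semantic vectors:
Embeddings endpoint: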
PUT _inference/sparse_embedding/alibabacloud_ai_search_sparse
{
  "service": "alibabacloud-ai-search",
  "service_settings": {
    "api_key": "<api_key>",
    "service_id": "ops-text-sparse-embedding-001",
    "host": "default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
    "workspace": "default"
  }
}
Then we'll configure the rerank endpoint to reorder the results.
Rerank endpoint:
PUT _inference/rerank/alibabacloud_ai_search_rerank
{
  "service": "alibabacloud-ai-search",
  "service_settings": {
    "api_key": "<api_key>",
    "service_id": "ops-bge-reranker-larger",
    "host": "default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
    "workspace": "default"
  }
}
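The two endpoint requests differ only in the task type, the inference ID, and the service_id. As a minimal sketch, the requests can be generated from one shared settings block so the host and workspace cannot drift apart. The helper name endpoint_request and the ALIBABA_API_KEY environment variable are assumptions made for this example; the sketch only builds the path and body and does not send anything to Elasticsearch.

```python
import json
import os

# Settings shared by every Alibaba Cloud AI Search endpoint in this article.
SHARED_SETTINGS = {
    "host": "default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
    "workspace": "default",
}

def endpoint_request(task_type, inference_id, service_id, api_key):
    """Return the (path, body) pair for a PUT _inference request (hypothetical helper)."""
    path = f"_inference/{task_type}/{inference_id}"
    body = {
        "service": "alibabacloud-ai-search",
        "service_settings": {"api_key": api_key, "service_id": service_id, **SHARED_SETTINGS},
    }
    return path, body

# Assumption: the key is read from an environment variable instead of being pasted inline.
api_key = os.environ.get("ALIBABA_API_KEY", "<api_key>")

sparse_path, sparse_body = endpoint_request(
    "sparse_embedding", "alibabacloud_ai_search_sparse", "ops-text-sparse-embedding-001", api_key
)
rerank_path, rerank_body = endpoint_request(
    "rerank", "alibabacloud_ai_search_rerank", "ops-bge-reranker-larger", api_key
)

print("PUT", sparse_path)
print(json.dumps(sparse_body, indent=2))
```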
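Now that the endpoints are configured, we can prepare the Elasticsearch index.
Create the Elasticsearch mappings
Let's configure the mappings. We need a field for the description text and a field for the vectors generated by the model.
We'll use the following properties:
- semantic_description: stores the embeddings generated by the model and is used to run semantic search.
- description: a "text" field that stores the descriptions of the novels and plays and is used for full-text search.
We'll include the copy_to parameter so that both the text and the semantic fields are available for hybrid search: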
PUT arts
{
  "mappings": {
    "properties": {
      "semantic_description": {
        "type": "semantic_text",
        "inference_id": "alibabacloud_ai_search_sparse"
      },
      "description": {
        "type": "text",
        "copy_to": "semantic_description"
      }
    }
  }
}
With the mappings ready, we can now index the data.
Index data into Elasticsearch
Here is the dataset of descriptions we'll use in this example. We'll index it with the Elasticsearch Bulk API.
POST arts/_bulk
{ "index": {} }
{ "description": " Black Coffee is a play by the British crime-fiction author Agatha Christie. In the play, a scientist discovers that someone in his household has stolen the formula for an explosive." }
{ "index": {} }
{ "description": "The Mousetrap is a murder mystery play by Agatha Christie. The play opened in London's West End in 1952 and ran continuously until 16 March 2020." }
{ "index": {} }
{ "description": "The Body in the Murder is a Miss Marple mystery novel published by Agatha Christie in 1942. The case involves the murder of two teenage girls who are similar in appearance." }
{ "index": {} }
{ "description": " Agatha Christie's last published novel before she passed, Curtain: Poirot's Last Case is also her indelible detective's last appearance. Poirot and Hastings return to the very same house from The Mysterious Affairs at Styles over 30 years later." }
{ "index": {} }
{ "description": " Death on the Nile is Agatha Christie's most daring travel mystery novel. The tranquillity of a cruise along the Nile is shattered by the discovery that Linnet Ridgeway has been shot through the head." }
{ "index": {} }
{ "description": " The Murder of Roger Ackroyd was Agatha Christie’s first book to be published by William Collins in the spring of 1926. William Collins became part of HarperCollins and are still Christie’s publishers today." }
Note that the first two documents, Black Coffee and The Mousetrap, are plays, while the others are novels.
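The bulk body above was written by hand. For a longer list of descriptions, a small helper can generate the same newline-delimited JSON: one action line followed by one document line per description. This is a sketch rather than part of the original walkthrough; the function name to_bulk_ndjson is hypothetical, and it also trims the stray leading spaces that some of the descriptions carry.

```python
import json

def to_bulk_ndjson(descriptions):
    """Build a Bulk API body: an {"index": {}} action line before each document."""
    lines = []
    for text in descriptions:
        lines.append(json.dumps({"index": {}}))
        # strip() removes accidental leading or trailing whitespace in the source text.
        lines.append(json.dumps({"description": text.strip()}, ensure_ascii=False))
    # The Bulk API requires the body to end with a newline character.
    return "\n".join(lines) + "\n"

payload = to_bulk_ndjson([
    " Black Coffee is a play by the British crime-fiction author Agatha Christie.",
    "The Mousetrap is a murder mystery play by Agatha Christie.",
])
print(payload)
```

The resulting string can be sent as the body of POST arts/_bulk.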
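Query data
To compare the results of different kinds of queries, we'll run them in sequence: first a semantic query, then reranking, and finally a combination of both. We'll use the same question each time: "Which novel was written by Agatha Christie?" We expect the three documents that explicitly mention "novel" to rank highest, followed by the one that says "book", with the two plays ranked last.
Semantic search
We'll start by querying the semantic_text field to ask: "Which novel was written by Agatha Christie?" Let's see what happens: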
GET /arts/_search
{
  "_source": {
    "includes": ["description"]
  },
  "query": {
    "semantic": {
      "field": "semantic_description",
      "query": "Which novel was written by Agatha Christie?"
    }
  }
}
The response is:
{
  "took": 1246,
  "timed_out": false,
  "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 6, "relation": "eq" },
    "max_score": 0.1759066,
    "hits": [
      {
        "_index": "arts",
        "_id": "rdJ4-ZMB36zj9EVTnMgJ",
        "_score": 0.1759066,
        "_source": {
          "description": " Death on the Nile is Agatha Christie's most daring travel mystery novel. The tranquillity of a cruise along the Nile is shattered by the discovery that Linnet Ridgeway has been shot through the head."
        }
      },
      {
        "_index": "arts",
        "_id": "rNJ4-ZMB36zj9EVTnMgJ",
        "_score": 0.17499167,
        "_source": {
          "description": " Agatha Christie's last published novel before she passed, Curtain: Poirot's Last Case is also her indelible detective's last appearance. Poirot and Hastings return to the very same house from The Mysterious Affairs at Styles over 30 years later."
        }
      },
      {
        "_index": "arts",
        "_id": "q9J4-ZMB36zj9EVTnMgJ",
        "_score": 0.16319725,
        "_source": {
          "description": "The Body in the Murder is a Miss Marple mystery novel published by Agatha Christie in 1942. The case involves the murder of two teenage girls who are similar in appearance."
        }
      },
      {
        "_index": "arts",
        "_id": "qtJ4-ZMB36zj9EVTnMgJ",
        "_score": 0.15506727,
        "_source": {
          "description": "The Mousetrap is a murder mystery play by Agatha Christie. The play opened in London's West End in 1952 and ran continuously until 16 March 2020."
        }
      },
      {
        "_index": "arts",
        "_id": "qdJ4-ZMB36zj9EVTnMgJ",
        "_score": 0.14572844,
        "_source": {
          "description": " Black Coffee is a play by the British crime-fiction author Agatha Christie. In the play, a scientist discovers that someone in his household has stolen the formula for an explosive."
        }
      },
      {
        "_index": "arts",
        "_id": "rtJ4-ZMB36zj9EVTnMgJ",
        "_score": 0.13951442,
        "_source": {
          "description": " The Murder of Roger Ackroyd was Agatha Christie’s first book to be published by William Collins in the spring of 1926. William Collins became part of HarperCollins and are still Christie’s publishers today."
        }
      }
    ]
  }
}
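In this case, the response prioritized most of the novels, but the document that says "book" came last, below both plays. We can still refine the results further with reranking.
Refining results with reranking
Here, we'll use an _inference/rerank request to score the documents we got from the first query and improve their ranking in the results.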
POST _inference/rerank/alibabacloud_ai_search_rerank
{
  "query": "Which novel was written by Agatha Christie?",
  "input": [
    "Black Coffee is a play by the British crime-fiction author Agatha Christie. In the play, a scientist discovers that someone in his household has stolen the formula for an explosive.",
    "The Mousetrap is a murder mystery play by Agatha Christie. The play opened in London's West End in 1952 and ran continuously until 16 March 2020.",
    " The Body in the Murder is a Miss Marple mystery novel published by Agatha Christie in 1942. The case involves the murder of two teenage girls who are similar in appearance.",
    " Agatha Christie's last published novel before she passed, Curtain: Poirot's Last Case is also her indelible detective's last appearance. Poirot and Hastings return to the very same house from The Mysterious Affairs at Styles over 30 years later.",
    " Death on the Nile is Agatha Christie's most daring travel mystery novel. The tranquillity of a cruise along the Nile is shattered by the discovery that Linnet Ridgeway has been shot through the head.",
    " The Murder of Roger Ackroyd was Agatha Christie’s first book to be published by William Collins in the spring of 1926. William Collins became part of HarperCollins and are still Christie’s publishers today."
  ]
}
The response is:
{
  "rerank": [
    { "index": 3, "relevance_score": 0.91086304 },
    { "index": 4, "relevance_score": 0.8409133 },
    { "index": 2, "relevance_score": 0.76838577 },
    { "index": 5, "relevance_score": 0.2295352 },
    { "index": 0, "relevance_score": 0.13846178 },
    { "index": 1, "relevance_score": 0.06620602 }
  ]
}
The response shows that both plays are now at the bottom of the results.
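The rerank response contains only positions and scores: each index refers to a position in the input array of the request. A minimal sketch for mapping those indices back to the texts is shown below. The helper name apply_rerank is hypothetical, and the input strings are shortened to titles for readability; the scores are the ones returned above.

```python
def apply_rerank(inputs, rerank_response):
    """Pair each relevance score with the input text it refers to, best first."""
    ranked = sorted(
        rerank_response["rerank"],
        key=lambda item: item["relevance_score"],
        reverse=True,
    )
    return [(item["relevance_score"], inputs[item["index"]]) for item in ranked]

# Same order as the "input" array in the request, shortened to titles.
inputs = [
    "Black Coffee (play)",
    "The Mousetrap (play)",
    "The Body in the Murder (novel)",
    "Curtain: Poirot's Last Case (novel)",
    "Death on the Nile (novel)",
    "The Murder of Roger Ackroyd (book)",
]
response = {
    "rerank": [
        {"index": 3, "relevance_score": 0.91086304},
        {"index": 4, "relevance_score": 0.8409133},
        {"index": 2, "relevance_score": 0.76838577},
        {"index": 5, "relevance_score": 0.2295352},
        {"index": 0, "relevance_score": 0.13846178},
        {"index": 1, "relevance_score": 0.06620602},
    ]
}

ranked = apply_rerank(inputs, response)
for score, title in ranked:
    print(f"{score:.3f}  {title}")
```

Read this way, the three documents that say "novel" come first, the "book" document is fourth, and the two plays are last.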
Combining semantic search and the rerank endpoint
Using a retriever, we can combine the semantic query and reranking into a single step:
POST /arts/_search
{
  "_source": {
    "includes": ["description"]
  },
  "retriever": {
    "text_similarity_reranker": {
      "retriever": {
        "standard": {
          "query": {
            "semantic": {
              "field": "semantic_description",
              "query": "Which novel was written by Agatha Christie?"
            }
          }
        }
      },
      "field": "description",
      "rank_window_size": 10,
      "inference_id": "alibabacloud_ai_search_rerank",
      "inference_text": "Which novel was written by Agatha Christie?"
    }
  }
}
The response is:
{
  "took": 1568,
  "timed_out": false,
  "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 6, "relation": "eq" },
    "max_score": 0.91086304,
    "hits": [
      {
        "_index": "arts",
        "_id": "rNJ4-ZMB36zj9EVTnMgJ",
        "_score": 0.91086304,
        "_source": {
          "description": " Agatha Christie's last published novel before she passed, Curtain: Poirot's Last Case is also her indelible detective's last appearance. Poirot and Hastings return to the very same house from The Mysterious Affairs at Styles over 30 years later."
        }
      },
      {
        "_index": "arts",
        "_id": "rdJ4-ZMB36zj9EVTnMgJ",
        "_score": 0.8409133,
        "_source": {
          "description": " Death on the Nile is Agatha Christie's most daring travel mystery novel. The tranquillity of a cruise along the Nile is shattered by the discovery that Linnet Ridgeway has been shot through the head."
        }
      },
      {
        "_index": "arts",
        "_id": "q9J4-ZMB36zj9EVTnMgJ",
        "_score": 0.76838577,
        "_source": {
          "description": "The Body in the Murder is a Miss Marple mystery novel published by Agatha Christie in 1942. The case involves the murder of two teenage girls who are similar in appearance."
        }
      },
      {
        "_index": "arts",
        "_id": "rtJ4-ZMB36zj9EVTnMgJ",
        "_score": 0.2295352,
        "_source": {
          "description": " The Murder of Roger Ackroyd was Agatha Christie’s first book to be published by William Collins in the spring of 1926. William Collins became part of HarperCollins and are still Christie’s publishers today."
        }
      },
      {
        "_index": "arts",
        "_id": "qdJ4-ZMB36zj9EVTnMgJ",
        "_score": 0.13846178,
        "_source": {
          "description": " Black Coffee is a play by the British crime-fiction author Agatha Christie. In the play, a scientist discovers that someone in his household has stolen the formula for an explosive."
        }
      },
      {
        "_index": "arts",
        "_id": "qtJ4-ZMB36zj9EVTnMgJ",
        "_score": 0.06620602,
        "_source": {
          "description": "The Mousetrap is a murder mystery play by Agatha Christie. The play opened in London's West End in 1952 and ran continuously until 16 March 2020."
        }
      }
    ]
  }
}
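These results differ from those of the semantic query. The document that says "book" (The Murder of Roger Ackroyd) contains no exact match for "novel", yet it now ranks higher than it did in the first semantic search. The two plays still rank last, just as they did in the standalone rerank request.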
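In the retriever request, the question appears twice: once as the semantic query and once as inference_text for the reranker. A small builder keeps the two in sync when the question changes. This is a sketch under the assumption that the field names and inference ID match the ones used in this article; the function name rerank_search_body is hypothetical, and it only builds the request body.

```python
import json

def rerank_search_body(question, rank_window_size=10):
    """Build the _search body that runs a semantic query and reranks its results."""
    return {
        "_source": {"includes": ["description"]},
        "retriever": {
            "text_similarity_reranker": {
                # First stage: semantic query against the semantic_text field.
                "retriever": {
                    "standard": {
                        "query": {
                            "semantic": {
                                "field": "semantic_description",
                                "query": question,
                            }
                        }
                    }
                },
                # Second stage: rerank the top hits on the plain text field.
                "field": "description",
                "rank_window_size": rank_window_size,
                "inference_id": "alibabacloud_ai_search_rerank",
                "inference_text": question,
            }
        },
    }

body = rerank_search_body("Which novel was written by Agatha Christie?")
print(json.dumps(body, indent=2))
```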
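Bonus: Answering questions with completion
With embeddings and reranking we can satisfy a search query, but the user still sees a list of search results rather than an actual answer.
With the examples provided, we are one step away from a RAG implementation: we can give the top results plus the question to an LLM to get the right answer.
Fortunately, the Alibaba Cloud AI service also offers a completion endpoint that we can use for this purpose.
Let's create the endpoint
Create the completion endpoint with Alibaba Qwen: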
PUT _inference/completion/alibabacloud_ai_search_completion
{
  "service": "alibabacloud-ai-search",
  "service_settings": {
    "host": "default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
    "api_key": "<api_key>",
    "service_id": "ops-qwen-turbo",
    "workspace": "default"
  }
}
We can also create one with deepseek-r1:
PUT _inference/completion/alibabacloud_ai_search_completion_deepseek_r1
{
  "service": "alibabacloud-ai-search",
  "service_settings": {
    "host": "default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
    "api_key": "<api_key>",
    "service_id": "deepseek-r1",
    "workspace": "default"
  }
}
Now, send the results of the previous query along with the question:
Query using Alibaba Qwen
POST _inference/completion/alibabacloud_ai_search_completion
{
  "input": """Answer the following question using the context provided:

QUESTION: Which novel was written by Agatha Christie?

CONTEXT:
DOCUMENT1
Black Coffee is a play by the British crime-fiction author Agatha Christie. In the play, a scientist discovers that someone in his household has stolen the formula for an explosive.
DOCUMENT2
The Mousetrap is a murder mystery play by Agatha Christie. The play opened in London's West End in 1952 and ran continuously until 16 March 2020.
DOCUMENT3
The Body in the Murder is a Miss Marple mystery novel published by Agatha Christie in 1942. The case involves the murder of two teenage girls who are similar in appearance.
DOCUMENT4
Agatha Christie's last published novel before she passed, Curtain: Poirot's Last Case is also her indelible detective's last appearance. Poirot and Hastings return to the very same house from The Mysterious Affairs at Styles over 30 years later.
DOCUMENT5
Death on the Nile is Agatha Christie's most daring travel mystery novel. The tranquillity of a cruise along the Nile is shattered by the discovery that Linnet Ridgeway has been shot through the head.
DOCUMENT6
The Murder of Roger Ackroyd was Agatha Christie’s first book to be published by William Collins in the spring of 1926. William Collins became part of HarperCollins and are still Christie’s publishers today.

ANSWER:"""
}
The response is:
{"completion": [
{"result": "Agatha Christie wrote several novels, including \"The Body in the Murder,\" \"Curtain: Poirot's Last Case,\" \"Death on the Nile,\" and \"The Murder of Roger Ackroyd.\""}]
}
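The prompt sent to the completion endpoint was assembled by hand from all six documents. In a RAG flow it would usually be built from the top hits of the reranked search instead. The sketch below shows one way to do that, assuming hits shaped like the entries under hits.hits in the search responses above; the function name build_prompt and the top_k default are choices made for this example.

```python
def build_prompt(question, hits, top_k=3):
    """Assemble a QUESTION / CONTEXT / ANSWER prompt from the best search hits."""
    docs = [hit["_source"]["description"].strip() for hit in hits[:top_k]]
    context = "\n".join(
        f"DOCUMENT{number}\n{text}" for number, text in enumerate(docs, start=1)
    )
    return (
        "Answer the following question using the context provided:\n\n"
        f"QUESTION: {question}\n\n"
        f"CONTEXT:\n{context}\n\n"
        "ANSWER:"
    )

# Two hits in the shape returned by the reranked search (descriptions shortened).
hits = [
    {"_source": {"description": " Agatha Christie's last published novel before she passed, Curtain: Poirot's Last Case."}},
    {"_source": {"description": " Death on the Nile is Agatha Christie's most daring travel mystery novel."}},
]

prompt = build_prompt("Which novel was written by Agatha Christie?", hits, top_k=2)
print(prompt)
```

The resulting string is what would go in the "input" field of the POST _inference/completion request.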
Query using Alibaba deepseek-r1
POST _inference/completion/alibabacloud_ai_search_completion_deepseek_r1?timeout=180s
{"input": "<|system|>你是一个机器人助手.</s><|user|>CONTEXT:Black Coffee is a play by the British crime-fiction author Agatha Christie. In the play, a scientist discovers that someone in his household has stolen the formula for an explosive;The Mousetrap is a murder mystery play by Agatha Christie. The play opened in London's West End in 1952 and ran continuously until 16 March 2020;The Body in the Murder is a Miss Marple mystery novel published by Agatha Christie in 1942. The case involves the murder of two teenage girls who are similar in appearance;Agatha Christie's last published novel before she passed, Curtain: Poirot's Last Case is also her indelible detective's last appearance. Poirot and Hastings return to the very same house from The Mysterious Affairs at Styles over 30 years later;Death on the Nile is Agatha Christie's most daring travel mystery novel. The tranquillity of a cruise along the Nile is shattered by the discovery that Linnet Ridgeway has been shot through the head;The Murder of Roger Ackroyd was Agatha Christie’s first book to be published by William Collins in the spring of 1926. William Collins became part of HarperCollins and are still Christie’s publishers today;QUESTION: Which novela were written by Agatha Christie?</s><|assistant|>"
}
Note: DeepSeek's reasoning takes a relatively long time, so we set the timeout parameter to 180s.
The inference result is as follows:
{"completion": [{"result": """<think>
Okay, let's see. The user is asking which novels were written by Agatha Christie based on the given context. First, I need to go through each item in the context and determine if it's a novel. The user mentioned "novela," which I think is Spanish for "novel," so they're asking about novels, not plays or other works.Looking at the context entries one by one:1. **Black Coffee** is described as a play by Christie. So that's a play, not a novel. Exclude.2. **The Mousetrap** is a murder mystery play, opened in London's West End. Definitely a play, not a novel. Exclude.3. **The Body in the Murder** is listed as a Miss Marple mystery novel published in 1942. Wait, the title here might be a bit off. Agatha Christie wrote a novel called "The Body in the Library," which is a Miss Marple story from 1942. Maybe the user made a typo. Assuming it's "The Body in the Library," then yes, that's a novel. But the title given is "The Body in the Murder," which I don't recall. Need to check if that's a real title or a mistake. However, since the context says it's a Miss Marple novel published in 1942, I'll proceed with that, even if the title is slightly wrong. So include as a novel.4. **Curtain: Poirot's Last Case** is mentioned as her last published novel before she passed. So that's a novel. Include.5. **Death on the Nile** is described as a travel mystery novel. That's a novel. Include.6. **The Murder of Roger Ackroyd** was her first book published by William Collins. That's a novel. Include.So the novels listed here are: The Body in the Murder (assuming typo), Curtain, Death on the Nile, and The Murder of Roger Ackroyd. However, "The Body in the Murder" might actually be "The Body in the Library," which is the correct title. But since the user provided that exact title, I should list it as given, even if there's an error. Alternatively, note the possible typo.Also, check if there are other works mentioned. The other entries are plays. 
So the answer should list the four novels mentioned in the context, being careful with the title accuracy.
</think>The novels written by Agatha Christie mentioned in the context are: 1. **The Body in the Murder** (likely a typo for *The Body in the Library*, a Miss Marple novel published in 1942).
2. **Curtain: Poirot's Last Case** (her final published novel featuring Hercule Poirot).
3. **Death on the Nile** (a travel mystery novel set on a Nile cruise).
4. **The Murder of Roger Ackroyd** (her breakthrough novel published in 1926). *Note*:
- *Black Coffee* and *The Mousetrap* are plays, not novels.
- If "The Body in the Murder" is intended to refer to *The Body in the Library*, the latter is the correct title of Christie's 1942 Miss Marple novel."""}]
}
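The deepseek-r1 result carries the model's reasoning inside a <think>...</think> block, followed by the answer itself. If only the answer should be shown to the user, the reasoning can be separated out before display. The sketch below assumes the result contains at most one such block, as in the response above; the function name split_reasoning is hypothetical.

```python
import re

# Non-greedy match so only the text between the tags is captured;
# DOTALL lets "." span the newlines inside the reasoning.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(result):
    """Return (reasoning, answer) from a completion result string."""
    match = THINK_BLOCK.search(result)
    if match is None:
        # No reasoning block, for example a response from ops-qwen-turbo.
        return "", result.strip()
    reasoning = match.group(1).strip()
    answer = (result[:match.start()] + result[match.end():]).strip()
    return reasoning, answer

sample = "<think>\nCheck which entries are novels.\n</think>The novels are: Curtain, Death on the Nile."
reasoning, answer = split_reasoning(sample)
print(answer)
```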
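Conclusion
Integrating Alibaba Cloud AI Search with Elasticsearch gives us easy access to completion, embedding, and reranking models that we can incorporate into our search pipeline.
We can use the rerank and embedding endpoints on their own or together, with the help of a retriever.
We can also bring in the completion endpoint to finish an end-to-end RAG implementation.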
Original article: Embeddings and reranking with Alibaba Cloud AI Service - Elasticsearch Labs