This topic describes how to run a grouped similarity search in a Collection through the HTTP API.
Prerequisites
- A Cluster has been created. For details, see Create a Cluster.
- An API-KEY has been obtained. For details, see API-KEY management.
Method and URL
HTTP
POST https://{Endpoint}/v1/collections/{CollectionName}/query_group_by
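As a rough illustration of the request line above, the following Python sketch assembles the same POST request with the standard library only. The helper name `build_group_by_request` is hypothetical and not part of any SDK, and the request is only constructed, not sent.

```python
import json
import urllib.request


def build_group_by_request(endpoint: str, collection: str, api_key: str,
                           body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request to the query_group_by path."""
    url = f"https://{endpoint}/v1/collections/{collection}/query_group_by"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "dashvector-auth-token": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder endpoint and key, exactly as in the curl examples of this topic.
req = build_group_by_request(
    "YOUR_CLUSTER_ENDPOINT",
    "group_by_demo",
    "YOUR_API_KEY",
    {"vector": [0.1, 0.2, 0.3, 0.4], "group_by_field": "document_id"},
)
print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it, provided the placeholders are replaced with a real endpoint and API key.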
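Examples
Note
- For the code to run, replace YOUR_API_KEY in the examples with your api-key and YOUR_CLUSTER_ENDPOINT with your Cluster Endpoint.
- The examples require a Collection named group_by_demo that has been created in advance, as described in the grouped vector search topic, and that already contains some data.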
Grouped similarity search by vector
Shell
curl -XPOST \
  -H 'dashvector-auth-token: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "vector": [0.1, 0.2, 0.3, 0.4],
    "group_by_field": "document_id",
    "group_topk": 1,
    "group_count": 3,
    "include_vector": true
  }' https://YOUR_CLUSTER_ENDPOINT/v1/collections/group_by_demo/query_group_by
Example output
{
  "code": 0,
  "request_id": "d6df634a-683d-445e-abe0-d547091d6b3a",
  "message": "Success",
  "output": [
    {
      "docs": [
        {
          "id": "4",
          "vector": [0.621783971786499, 0.5220040082931519, 0.8403469920158386, 0.995602011680603],
          "fields": {"document_id": "paper-02", "content": "xxxD", "chunk_id": 2},
          "score": 0.028402328
        }
      ],
      "group_id": "paper-02"
    },
    {
      "docs": [
        {
          "id": "1",
          "vector": [0.26870301365852356, 0.8718249797821045, 0.6066280007362366, 0.6342290043830872],
          "fields": {"document_id": "paper-01", "content": "xxxA", "chunk_id": 1},
          "score": 0.08141637
        }
      ],
      "group_id": "paper-01"
    },
    {
      "docs": [
        {
          "id": "6",
          "vector": [0.661965012550354, 0.730430006980896, 0.6105219721794128, 0.22164000570774078],
          "fields": {"document_id": "paper-03", "content": "xxxF", "chunk_id": 1},
          "score": 0.2513085
        }
      ],
      "group_id": "paper-03"
    }
  ]
}
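To make the shape of this response concrete, here is a minimal Python sketch that walks the `output` list and collects each group's id together with the ids and scores of its documents. The function name `summarize_groups` is hypothetical, and the sample data is a trimmed copy of the example output above with vectors and fields left out.

```python
import json

# Trimmed copy of the example output above (vectors and fields omitted).
SAMPLE = json.loads("""
{
  "code": 0,
  "message": "Success",
  "output": [
    {"docs": [{"id": "4", "score": 0.028402328}], "group_id": "paper-02"},
    {"docs": [{"id": "1", "score": 0.08141637}], "group_id": "paper-01"},
    {"docs": [{"id": "6", "score": 0.2513085}], "group_id": "paper-03"}
  ]
}
""")


def summarize_groups(response: dict) -> list:
    """Return (group_id, [(doc_id, score), ...]) for each group in the response."""
    summary = []
    for group in response.get("output", []):
        docs = [(doc["id"], doc["score"]) for doc in group.get("docs", [])]
        # .get() is used because a group may arrive without a group_id key.
        summary.append((group.get("group_id"), docs))
    return summary


for group_id, docs in summarize_groups(SAMPLE):
    print(group_id, docs)
```

With `group_topk` set to 1, as in the request above, each group carries a single document.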
Grouped similarity search by primary key (using the vector stored for that key)
Shell
curl -XPOST \
  -H 'dashvector-auth-token: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "id": "1",
    "group_by_field": "document_id",
    "group_topk": 1,
    "group_count": 3,
    "include_vector": true
  }' https://YOUR_CLUSTER_ENDPOINT/v1/collections/group_by_demo/query_group_by
Grouped similarity search with a filter condition
Shell
curl -XPOST \
  -H 'dashvector-auth-token: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "vector": [0.1, 0.2, 0.3, 0.4],
    "filter": "chunk_id > 1",
    "group_by_field": "document_id",
    "group_topk": 1,
    "group_count": 3,
    "include_vector": true
  }' https://YOUR_CLUSTER_ENDPOINT/v1/collections/group_by_demo/query_group_by
Grouped vector search with a Sparse Vector
Shell
curl -XPOST \
  -H 'dashvector-auth-token: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "vector": [0.1, 0.2, 0.3, 0.4],
    "sparse_vector": {"1": 0.4, "10000": 0.6, "222222": 0.8},
    "group_by_field": "document_id",
    "group_topk": 1,
    "group_count": 3,
    "include_vector": true
  }' https://YOUR_CLUSTER_ENDPOINT/v1/collections/group_by_demo/query_group_by
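JSON object keys are always strings, which is why the sparse vector indices in the request above are quoted. If the sparse vector is held in code as an integer-keyed mapping, a small conversion like the following sketch produces the request form. The helper name `sparse_to_json_field` is hypothetical.

```python
import json


def sparse_to_json_field(sparse: dict) -> dict:
    """Convert {int index: float weight} into the string-keyed form used in the body."""
    return {str(index): weight for index, weight in sparse.items()}


body = {
    "vector": [0.1, 0.2, 0.3, 0.4],
    "sparse_vector": sparse_to_json_field({1: 0.4, 10000: 0.6, 222222: 0.8}),
    "group_by_field": "document_id",
    "group_topk": 1,
    "group_count": 3,
}
print(json.dumps(body))
```

`json.dumps` would also stringify integer keys on its own; converting explicitly just keeps the in-memory body identical to what is sent.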
Grouped search using one vector of a multi-vector Collection
Shell
curl -XPOST \
  -H 'dashvector-auth-token: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "vector": [0.1, 0.2, 0.3, 0.4],
    "group_by_field": "author",
    "group_topk": 1,
    "group_count": 3,
    "include_vector": true,
    "vector_field": "title"
  }' https://YOUR_CLUSTER_ENDPOINT/v1/collections/multi_vector_demo/query_group_by
Example output
{
  "code": 0,
  "request_id": "b6f4997e-97e0-4d9b-9d3f-0659f4499305",
  "message": "Success",
  "output": [
    {
      "docs": [
        {
          "id": "2",
          "vectors": {
            "title": [0.10000000149011612, 0.20000000298023224, 0.30000001192092896, 0.4000000059604645]
          },
          "fields": {"author": "zhangsan"},
          "score": 0.0
        }
      ],
      "group_id": "zhangsan"
    },
    {
      "docs": [
        {
          "id": "1",
          "vectors": {
            "title": [0.30000001192092896, 0.4000000059604645, 0.5, 0.6000000238418579],
            "content": [0.30000001192092896, 0.4000000059604645, 0.5, 0.6000000238418579, 0.699999988079071, 0.800000011920929]
          },
          "fields": {"author": null},
          "score": 0.16000001
        }
      ]
    }
  ]
}
Request parameters
Note
The vector and id parameters are alternatives: use one or the other, and make sure the one you use is not empty.
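A client can check this rule before sending a request. The sketch below reads the note as "exactly one of `vector` and `id` must be set and non-empty". That reading, and the helper name `check_query_target`, are assumptions; how the server responds to a violation is not described here.

```python
def check_query_target(body: dict) -> None:
    """Raise ValueError unless exactly one of 'vector' and 'id' is set and non-empty."""
    has_vector = bool(body.get("vector"))
    has_id = bool(body.get("id"))
    # Both set or both missing: the two flags are equal, which breaks the rule.
    if has_vector == has_id:
        raise ValueError("provide exactly one of 'vector' or 'id'")


# Accepted: only 'vector' is present.
check_query_target({"vector": [0.1, 0.2, 0.3, 0.4], "group_by_field": "document_id"})

# Rejected: both 'vector' and 'id' are present.
try:
    check_query_target({"vector": [0.1], "id": "1", "group_by_field": "document_id"})
except ValueError as err:
    print("rejected:", err)
```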
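Parameter | Location | Type | Required | Description |
{Endpoint} | path | str | Yes | Endpoint of the Cluster, shown on the Cluster details page in the console |
{CollectionName} | path | str | Yes | Name of the Collection |
dashvector-auth-token | header | str | Yes | api-key |
group_by_field | body | str | Yes | Field whose values are used to group the search results. Schema-free fields are not currently supported. |
group_count | body | int | No | Maximum number of groups to return. This is a best-effort parameter: group_count groups are usually returned, but this is not guaranteed. |
group_topk | body | int | No | Number of similarity results to return per group. This is a best-effort parameter with lower priority than group_count. |
vector | body | array | No | Vector data |
sparse_vector | body | dict | No | Sparse vector |
id | body | str | No | Primary key. The search uses the vector stored for this primary key. |
filter | body | str | No | Filter condition. It must follow SQL WHERE clause syntax. |
include_vector | body | bool | No | Whether to return vector data. Default: false |
output_fields | body | array | No | List of field names to return. Default: all fields |
vector_field | body | str | No | For a multi-vector Collection, the name of the single vector to use for the grouped search |
partition | body | str | No | Partition name |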
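One way to keep the body parameters in this table organized in client code is a small data class that emits only the parameters that were actually set. This is a sketch, not an official client: the class name `GroupByQuery` and its `to_body` method are hypothetical, and the field names mirror the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GroupByQuery:
    """Body parameters of a query_group_by request; only group_by_field is required."""
    group_by_field: str
    vector: Optional[list] = None
    id: Optional[str] = None
    sparse_vector: Optional[dict] = None
    group_count: Optional[int] = None
    group_topk: Optional[int] = None
    filter: Optional[str] = None
    include_vector: Optional[bool] = None
    output_fields: Optional[list] = None
    vector_field: Optional[str] = None
    partition: Optional[str] = None

    def to_body(self) -> dict:
        # Omit unset optional parameters so the service applies its own defaults.
        return {key: value for key, value in vars(self).items() if value is not None}


query = GroupByQuery(group_by_field="document_id", id="1", group_topk=1, group_count=3)
print(query.to_body())
```

Leaving optional values as `None` avoids hard-coding defaults that this table does not state, such as those for `group_count` and `group_topk`.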
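Response parameters
Field | Type | Description | Example |
code | int | Return code. See the return status code reference. | 0 |
message | str | Return message | Success |
request_id | str | Unique ID of the request | 19215409-ea66-4db9-8764-26ce2eb5bb99 |
output | array | Grouped similarity search results, as a list of Groups |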
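A caller would typically check `code` before reading `output`. The following sketch assumes that 0 means success, as in the example responses in this topic, and treats any other value as a failure. The helper name `unwrap_output` and the failing response used for illustration are hypothetical.

```python
def unwrap_output(response: dict) -> list:
    """Return the list of groups, or raise RuntimeError when code is not 0."""
    code = response.get("code")
    if code != 0:
        # Assumption: any non-zero code signals an error.
        raise RuntimeError(
            f"request {response.get('request_id')} failed: "
            f"code={code}, message={response.get('message')}"
        )
    return response.get("output", [])


ok = {"code": 0, "message": "Success", "request_id": "r-1",
      "output": [{"docs": [], "group_id": "paper-01"}]}
print(len(unwrap_output(ok)))

# Hypothetical failing response; the code and message are made up for illustration.
failed = {"code": -1, "message": "example failure", "request_id": "r-2"}
try:
    unwrap_output(failed)
except RuntimeError as err:
    print(err)
```

Including `request_id` in the error message makes a failed call easier to trace.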