How to Run a Grouped Doc Search via the HTTP API

This topic describes how to run a grouped similarity search in a Collection through the HTTP API.

Prerequisites

A Cluster has been created. See Create a Cluster.
An API-KEY has been obtained. See API-KEY Management.

Method and URL

HTTP

POST https://{Endpoint}/v1/collections/{CollectionName}/query_group_by
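As a rough illustration of the method and URL above, the following Python sketch assembles the request with the standard library only and does not send it. The function name build_query_group_by_request is a hypothetical helper introduced here, not part of any SDK, and URL-quoting the Collection name is an assumption made for safety.

```python
import json
import urllib.parse
import urllib.request


def build_query_group_by_request(endpoint, collection_name, api_key, body):
    """Build (but do not send) the POST request for query_group_by.

    Hypothetical helper for illustration only.
    """
    url = "https://{}/v1/collections/{}/query_group_by".format(
        endpoint,
        # Assumption: quote the Collection name so it is safe inside a path.
        urllib.parse.quote(collection_name, safe=""),
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "dashvector-auth-token": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_query_group_by_request(
    "YOUR_CLUSTER_ENDPOINT",
    "group_by_demo",
    "YOUR_API_KEY",
    {"vector": [0.1, 0.2, 0.3, 0.4], "group_by_field": "document_id"},
)
print(request.get_method(), request.full_url)
# Sending is left to the caller, for example urllib.request.urlopen(request).
```

Keeping construction separate from sending makes it easy to inspect the URL, headers, and body before any network call is made.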

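Examples

Note

Replace YOUR_API_KEY in the examples with your api-key and YOUR_CLUSTER_ENDPOINT with your Cluster Endpoint; the code will not run otherwise.
The examples require a Collection named group_by_demo that has been created in advance and populated with some data, as described in Grouped Vector Search.

Grouped similarity search by vector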

Shell

curl -XPOST \
  -H 'dashvector-auth-token: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "vector": [0.1, 0.2, 0.3, 0.4],
    "group_by_field": "document_id",
    "group_topk": 1,
    "group_count": 3,
    "include_vector": true
  }' https://YOUR_CLUSTER_ENDPOINT/v1/collections/group_by_demo/query_group_by

Example output

{
  "code": 0,
  "request_id": "d6df634a-683d-445e-abe0-d547091d6b3a",
  "message": "Success",
  "output": [
    {
      "docs": [
        {
          "id": "4",
          "vector": [0.621783971786499, 0.5220040082931519, 0.8403469920158386, 0.995602011680603],
          "fields": {"document_id": "paper-02", "content": "xxxD", "chunk_id": 2},
          "score": 0.028402328
        }
      ],
      "group_id": "paper-02"
    },
    {
      "docs": [
        {
          "id": "1",
          "vector": [0.26870301365852356, 0.8718249797821045, 0.6066280007362366, 0.6342290043830872],
          "fields": {"document_id": "paper-01", "content": "xxxA", "chunk_id": 1},
          "score": 0.08141637
        }
      ],
      "group_id": "paper-01"
    },
    {
      "docs": [
        {
          "id": "6",
          "vector": [0.661965012550354, 0.730430006980896, 0.6105219721794128, 0.22164000570774078],
          "fields": {"document_id": "paper-03", "content": "xxxF", "chunk_id": 1},
          "score": 0.2513085
        }
      ],
      "group_id": "paper-03"
    }
  ]
}
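The example output above nests Docs inside groups. A minimal sketch of how a client might flatten that structure into one row per Doc is shown below; the sample text is a trimmed copy of the output above with the vectors omitted, and flatten_groups is a hypothetical helper name.

```python
import json

# Trimmed copy of the example output (vectors omitted for brevity).
response_text = """
{
  "code": 0,
  "message": "Success",
  "output": [
    {"docs": [{"id": "4", "fields": {"document_id": "paper-02", "chunk_id": 2},
               "score": 0.028402328}], "group_id": "paper-02"},
    {"docs": [{"id": "1", "fields": {"document_id": "paper-01", "chunk_id": 1},
               "score": 0.08141637}], "group_id": "paper-01"},
    {"docs": [{"id": "6", "fields": {"document_id": "paper-03", "chunk_id": 1},
               "score": 0.2513085}], "group_id": "paper-03"}
  ]
}
"""


def flatten_groups(response):
    """Return one (group_id, doc_id, score) tuple per Doc, in response order."""
    rows = []
    for group in response["output"]:
        for doc in group["docs"]:
            rows.append((group["group_id"], doc["id"], doc["score"]))
    return rows


rows = flatten_groups(json.loads(response_text))
for group_id, doc_id, score in rows:
    print(group_id, doc_id, score)
```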

Grouped similarity search by primary key (using the vector that corresponds to it)

Shell

curl -XPOST \
  -H 'dashvector-auth-token: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "id": "1",
    "group_by_field": "document_id",
    "group_topk": 1,
    "group_count": 3,
    "include_vector": true
  }' https://YOUR_CLUSTER_ENDPOINT/v1/collections/group_by_demo/query_group_by

Grouped similarity search with a filter

Shell

curl -XPOST \
  -H 'dashvector-auth-token: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "vector": [0.1, 0.2, 0.3, 0.4],
    "filter": "chunk_id > 1",
    "group_by_field": "document_id",
    "group_topk": 1,
    "group_count": 3,
    "include_vector": true
  }' https://YOUR_CLUSTER_ENDPOINT/v1/collections/group_by_demo/query_group_by

Grouped vector search with a Sparse Vector

Shell

curl -XPOST \
  -H 'dashvector-auth-token: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "vector": [0.1, 0.2, 0.3, 0.4],
    "sparse_vector": {"1": 0.4, "10000": 0.6, "222222": 0.8},
    "group_by_field": "document_id",
    "group_topk": 1,
    "group_count": 3,
    "include_vector": true
  }' https://YOUR_CLUSTER_ENDPOINT/v1/collections/group_by_demo/query_group_by
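In the request body the sparse vector is a JSON object whose keys are indices written as strings. Application code often holds such data as an integer-keyed mapping, so a small conversion step may help. The sketch below assumes indices are non-negative integers, which the text above does not state explicitly; sparse_vector_to_json is a hypothetical helper name.

```python
import json


def sparse_vector_to_json(sparse):
    """Convert {int index: weight} into the string-keyed object used in the body.

    Assumption: indices are non-negative integers.
    """
    converted = {}
    for index, weight in sparse.items():
        if isinstance(index, bool) or not isinstance(index, int) or index < 0:
            raise ValueError("sparse vector indices must be non-negative integers")
        converted[str(index)] = float(weight)
    return converted


body = {
    "vector": [0.1, 0.2, 0.3, 0.4],
    "sparse_vector": sparse_vector_to_json({1: 0.4, 10000: 0.6, 222222: 0.8}),
    "group_by_field": "document_id",
    "group_topk": 1,
    "group_count": 3,
}
print(json.dumps(body))
```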

Grouped search using one vector of a multi-vector Collection

Shell

curl -XPOST \
  -H 'dashvector-auth-token: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "vector": [0.1, 0.2, 0.3, 0.4],
    "group_by_field": "author",
    "group_topk": 1,
    "group_count": 3,
    "include_vector": true,
    "vector_field": "title"
  }' https://YOUR_CLUSTER_ENDPOINT/v1/collections/multi_vector_demo/query_group_by

Example output

{
    "code": 0,
    "request_id": "b6f4997e-97e0-4d9b-9d3f-0659f4499305",
    "message": "Success",
    "output": [
        {
            "docs": [
                {
                    "id": "2",
                    "vectors": {
                        "title": [
                            0.10000000149011612,
                            0.20000000298023224,
                            0.30000001192092896,
                            0.4000000059604645
                        ]
                    },
                    "fields": {
                        "author": "zhangsan"
                    },
                    "score": 0.0
                }
            ],
            "group_id": "zhangsan"
        },
        {
            "docs": [
                {
                    "id": "1",
                    "vectors": {
                        "title": [
                            0.30000001192092896,
                            0.4000000059604645,
                            0.5,
                            0.6000000238418579
                        ],
                        "content": [
                            0.30000001192092896,
                            0.4000000059604645,
                            0.5,
                            0.6000000238418579,
                            0.699999988079071,
                            0.800000011920929
                        ]
                    },
                    "fields": {
                        "author": null
                    },
                    "score": 0.16000001
                }
            ]
        }
    ]
}
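In the multi-vector output above, each Doc carries a vectors object keyed by vector field name, and the second group (whose author is null) has no group_id key. A defensive reader might therefore avoid indexing those keys directly. The following is only a sketch with rounded sample values; summarize_groups is a hypothetical helper name.

```python
def summarize_groups(output, vector_field):
    """Return (group_id, doc_id, vector dimension) per Doc.

    group_id is None when the group has no group_id key, and the dimension
    is None when the Doc has no vector under vector_field.
    """
    summary = []
    for group in output:
        group_id = group.get("group_id")  # absent in the second sample group
        for doc in group.get("docs", []):
            vector = doc.get("vectors", {}).get(vector_field)
            dimension = len(vector) if vector is not None else None
            summary.append((group_id, doc["id"], dimension))
    return summary


# Sample shaped like the output above, with values rounded for brevity.
sample_output = [
    {
        "docs": [{"id": "2", "vectors": {"title": [0.1, 0.2, 0.3, 0.4]},
                  "fields": {"author": "zhangsan"}, "score": 0.0}],
        "group_id": "zhangsan",
    },
    {
        "docs": [{"id": "1",
                  "vectors": {"title": [0.3, 0.4, 0.5, 0.6],
                              "content": [0.3, 0.4, 0.5, 0.6, 0.7, 0.8]},
                  "fields": {"author": None}, "score": 0.16000001}],
    },
]

summary = summarize_groups(sample_output, "title")
print(summary)
```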

Request parameters

Note

Specify exactly one of vector and id; one of the two must be non-empty.

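Parameter	Location	Type	Required	Description
{Endpoint}	path	str	Yes	Endpoint of the Cluster; available on the Cluster details page in the console
{CollectionName}	path	str	Yes	Collection name
dashvector-auth-token	header	str	Yes	api-key
group_by_field	body	str	Yes	Field whose values are used to group the search results; schema-free fields are not currently supported
group_count	body	int	No	Maximum number of groups to return. This is a best-effort parameter; group_count groups are returned in most cases.
group_topk	body	int	No	Number of similarity results to return per group. This is a best-effort parameter with lower priority than group_count.
vector	body	array	No	Vector data
sparse_vector	body	dict	No	Sparse vector
id	body	str	No	Primary key; the search uses the vector that corresponds to this primary key
filter	body	str	No	Filter condition; must conform to SQL WHERE clause syntax
include_vector	body	bool	No	Whether to return vector data; defaults to false
output_fields	body	array	No	List of field names to return; defaults to all Fields
vector_field	body	str	No	Name of the vector field to search with when running a grouped search against one vector of a multi-vector Collection
partition	body	str	No	Partition name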
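The body parameters in the table, together with the rule that exactly one of vector and id is supplied, can be captured in a small builder. This is a sketch under stated assumptions: build_group_by_body is a hypothetical helper, it treats an empty vector or empty id as missing, and it simply omits optional parameters that are left as None.

```python
OPTIONAL_BODY_PARAMS = {
    "group_count", "group_topk", "sparse_vector", "filter",
    "include_vector", "output_fields", "vector_field", "partition",
}


def build_group_by_body(group_by_field, vector=None, doc_id=None, **optional):
    """Assemble the JSON body, enforcing exactly one of vector and id."""
    has_vector = vector is not None and len(vector) > 0
    has_id = bool(doc_id)
    if has_vector == has_id:
        raise ValueError("specify exactly one of vector and id")

    unknown = set(optional) - OPTIONAL_BODY_PARAMS
    if unknown:
        raise ValueError("unknown parameters: " + ", ".join(sorted(unknown)))

    body = {"group_by_field": group_by_field}
    if has_vector:
        body["vector"] = list(vector)
    else:
        body["id"] = doc_id
    # Leave out optional parameters that were not set.
    body.update({key: value for key, value in optional.items() if value is not None})
    return body


body = build_group_by_body(
    "document_id",
    vector=[0.1, 0.2, 0.3, 0.4],
    filter="chunk_id > 1",
    group_topk=1,
    group_count=3,
    partition=None,
)
print(body)
```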

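Response fields

Field	Type	Description	Example
code	int	Return code; see the return status code reference	0
message	str	Return message	Success
request_id	str	Unique request ID	19215409-ea66-4db9-8764-26ce2eb5bb99
output	array	Grouped similarity search results, as a list of Groups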
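A client will usually want to check code before reading output. The sketch below assumes that a code of 0 means success, as in the examples, and that any other value is an error. QueryGroupByError and unwrap_output are hypothetical names, and the failing response is invented purely to exercise the error path.

```python
class QueryGroupByError(RuntimeError):
    """Raised when the response code is not 0 (assumed to mean failure)."""


def unwrap_output(response):
    """Return the list of Groups, or raise if the request did not succeed."""
    code = response.get("code")
    if code != 0:
        raise QueryGroupByError(
            "request {} failed with code {}: {}".format(
                response.get("request_id"), code, response.get("message")
            )
        )
    return response.get("output", [])


ok = {
    "code": 0,
    "message": "Success",
    "request_id": "19215409-ea66-4db9-8764-26ce2eb5bb99",
    "output": [{"docs": [], "group_id": "paper-01"}],
}
groups = unwrap_output(ok)
print(len(groups))

# Hypothetical failing response; the code and message are made up.
failed = {"code": -1, "message": "hypothetical failure", "request_id": "example"}
try:
    unwrap_output(failed)
except QueryGroupByError as error:
    print(error)
```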