Chroma Retrieval Operations: Query and Get

Original: Query and Get Retrieval

In one sentence

Chroma provides the Query API for vector similarity search and the Get API for exact retrieval by ID and/or filter.

When to open this page

Whenever you need to retrieve data from a Chroma collection, whether by similarity search or by exact record lookup.

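Core concepts

Query API: runs a nearest-neighbor similarity search, comparing vectors by their embeddings
Get API: retrieves records by ID and/or filter, with no similarity ranking
Column-major results: results come back as one array per field, rather than one object per record
Batch queries: the Query API is a batch API, and results are grouped by input query
Include control: the include parameter controls which kinds of data are returned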
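To make the difference in result shape concrete, here is a toy in-memory model written for this page. It is not Chroma code: toy_get, toy_query and the sample data are hypothetical, and plain Euclidean distance is an assumption chosen for readability, not a statement about the metric a real collection uses.

```python
import math

# Toy "collection" stored column-major: parallel lists, one entry per record.
IDS = ["a", "b", "c"]
EMBEDDINGS = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
DOCUMENTS = ["origin", "right", "up"]


def toy_get(ids):
    """Exact lookup by ID: flat result in storage order, no ranking."""
    rows = [i for i, record_id in enumerate(IDS) if record_id in ids]
    return {
        "ids": [IDS[i] for i in rows],
        "documents": [DOCUMENTS[i] for i in rows],
    }


def toy_query(query_embeddings, n_results=2):
    """Brute-force nearest neighbors: one inner list per input query."""
    out = {"ids": [], "documents": [], "distances": []}
    for query in query_embeddings:
        # Rank every record by its distance to this query vector.
        ranked = sorted(
            range(len(IDS)),
            key=lambda i: math.dist(query, EMBEDDINGS[i]),
        )[:n_results]
        out["ids"].append([IDS[i] for i in ranked])
        out["documents"].append([DOCUMENTS[i] for i in ranked])
        out["distances"].append([math.dist(query, EMBEDDINGS[i]) for i in ranked])
    return out


flat = toy_get(["c", "a"])
grouped = toy_query([[0.9, 0.1], [0.0, 1.9]], n_results=2)
```

Here flat["ids"] is a one-dimensional list, while grouped["ids"] holds one inner list per query vector, which is the same flat versus grouped distinction described above.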

How to

Query operations

Use the .query method to run a similarity search:

# Query with text (converted to embeddings automatically)
collection.query(
    query_texts=["thus spake zarathustra", "the oracle speaks"]
)

# Query with embeddings directly
collection.query(
    query_embeddings=[[11.1, 12.1, 13.1], [1.1, 2.3, 3.2]]
)

# Limit the number of results per query
collection.query(
    query_embeddings=[[11.1, 12.1, 13.1], [1.1, 2.3, 3.2]],
    n_results=100
)

# Restrict the search to a set of IDs
collection.query(
    query_embeddings=[[11.1, 12.1, 13.1], [1.1, 2.3, 3.2]],
    n_results=100,
    ids=["id1", "id2"]
)

# Filter by metadata and document content
collection.query(
    query_embeddings=[[11.1, 12.1, 13.1], [1.1, 2.3, 3.2]],
    n_results=100,
    where={"page": 10},  # metadata field 'page' equals 10
    where_document={"$contains": "search string"}  # document contains the search string
)
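As a rough mental model of the two filters used above, the following sketch checks a single record against a where equality filter and a where_document $contains filter. The matches function is hypothetical and deliberately simplified: it covers only these two forms, whereas Chroma's filter language has more operators.

```python
def matches(metadata, document, where=None, where_document=None):
    """Simplified filter check: equality in `where`, `$contains` in `where_document`."""
    if where is not None:
        # Every key in `where` must equal the record's metadata value.
        for key, expected in where.items():
            if metadata.get(key) != expected:
                return False
    if where_document is not None:
        # `$contains` is modeled here as a plain substring test.
        needle = where_document.get("$contains")
        if needle is not None and needle not in document:
            return False
    return True


record_meta = {"page": 10, "chapter": "intro"}
record_doc = "this page has the search string in it"

hit = matches(record_meta, record_doc,
              where={"page": 10},
              where_document={"$contains": "search string"})
miss = matches(record_meta, record_doc, where={"page": 11})
```

Both filters must pass for a record to be a candidate; similarity ranking is then applied only to the candidates.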

Get operations

Use the .get method to retrieve records by ID and/or filter:

# Retrieve by ID
collection.get(ids=["id1", "id2"])

# Paginated retrieval
collection.get(limit=100, offset=0)
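The limit and offset parameters can be combined into a loop that walks a whole collection page by page. The sketch below is one way to do that under stated assumptions: iter_get_pages is a hypothetical helper, it expects a callable that accepts limit= and offset= and returns a dict with "ids" and "documents" lists (the shape shown on this page), and fake_get is a stand-in so the example runs without a database.

```python
def iter_get_pages(get_page, page_size=100):
    """Yield (id, document) pairs page by page until an empty page comes back."""
    offset = 0
    while True:
        page = get_page(limit=page_size, offset=offset)
        ids = page["ids"]
        if not ids:
            break
        yield from zip(ids, page["documents"])
        # Advance by the number of records actually returned.
        offset += len(ids)


def fake_get(limit, offset):
    """Stand-in for collection.get over five in-memory records."""
    data = [f"id{i}" for i in range(5)]
    chunk = data[offset:offset + limit]
    return {"ids": chunk, "documents": [s.upper() for s in chunk]}


pairs = list(iter_get_pages(fake_get, page_size=2))
```

With a real collection, passing collection.get as get_page should work the same way as long as documents are included in the response.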

Handling results

Query results are grouped by input query, so iterating over them takes two nested loops:

result = collection.query(query_texts=["first query", "second query"])
# Outer loop: one group per input query; inner loop: the matches for that query
for ids, documents, metadatas in zip(result["ids"], result["documents"], result["metadatas"]):
    for record_id, document, metadata in zip(ids, documents, metadatas):
        print(record_id, document, metadata)
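If record-shaped data is easier to work with than parallel arrays, the nested structure can be transposed once. The helper below is a hypothetical sketch: query_result_to_records is not part of Chroma, and sample_result is hand-written data in the grouped shape described above, assuming "distances" was returned alongside the other fields.

```python
def query_result_to_records(result):
    """Return one list of record dicts per input query."""
    grouped = []
    for ids, docs, metas, dists in zip(
        result["ids"], result["documents"], result["metadatas"], result["distances"]
    ):
        grouped.append([
            {"id": i, "document": d, "metadata": m, "distance": s}
            for i, d, m, s in zip(ids, docs, metas, dists)
        ])
    return grouped


# Hand-written sample: two queries, two matches each.
sample_result = {
    "ids": [["id1", "id2"], ["id3", "id1"]],
    "documents": [["doc one", "doc two"], ["doc three", "doc one"]],
    "metadatas": [[{"page": 1}, {"page": 2}], [{"page": 3}, {"page": 1}]],
    "distances": [[0.1, 0.4], [0.2, 0.9]],
}

records_per_query = query_result_to_records(sample_result)
best_for_second_query = records_per_query[1][0]
```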

Get results are flat lists, so a single loop is enough:

result = collection.get(include=["documents", "metadatas"])
for record_id, document, metadata in zip(result["ids"], result["documents"], result["metadatas"]):
    print(record_id, document, metadata)

Controlling returned data

Use the include parameter to control which kinds of data are returned:

# Specify which fields to return
collection.query(
    query_texts=["my query"],
    include=["documents", "metadatas", "embeddings"]
)

collection.get(include=["documents"])
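Because include decides which fields come back, code that consumes a result should not assume every field is present. The sketch below assumes that a field left out of include appears as None (or is missing) in the result dict; that is an assumption worth verifying against your Chroma version. get_result_to_records and sample are hypothetical names used only for illustration.

```python
def get_result_to_records(result, fields=("documents", "metadatas", "embeddings")):
    """Build one dict per record, skipping fields that were not returned."""
    # Keep only the fields that actually carry data in this result.
    present = [f for f in fields if result.get(f) is not None]
    records = []
    for idx, record_id in enumerate(result["ids"]):
        record = {"id": record_id}
        for field in present:
            record[field] = result[field][idx]
        records.append(record)
    return records


# Hand-written sample shaped like a get(include=["documents"]) response.
sample = {
    "ids": ["id1", "id2"],
    "documents": ["first doc", "second doc"],
    "metadatas": None,
    "embeddings": None,
}

records = get_result_to_records(sample)
```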

Command / API quick reference

Python

# Query method
collection.query(
    query_texts=["text query"],  # or query_embeddings=[[0.1, 0.2, 0.3]]
    n_results=10,  # 10 is the default
    ids=["id1", "id2"],  # optional, restricts the search to these IDs
    where={"metadata_field": "value"},  # metadata filter
    where_document={"$contains": "text"},  # document content filter
    include=["documents", "metadatas", "embeddings"]  # which fields to return
)

# Get method
collection.get(
    ids=["id1", "id2"],  # optional, retrieve by ID
    where={"metadata_field": "value"},  # metadata filter
    limit=100,  # maximum number of records to return
    offset=0,  # pagination offset
    include=["documents", "metadatas"]  # which fields to return
)

TypeScript

// Query method
await collection.query({
  queryTexts: ["text query"],  // or queryEmbeddings: [[0.1, 0.2, 0.3]]
  nResults: 10,  // 10 is the default
  ids: ["id1", "id2"],  // optional, restricts the search to these IDs
  where: { metadata_field: "value" },  // metadata filter
  whereDocument: { $contains: "text" },  // document content filter
  include: ["documents", "metadatas", "embeddings"]  // which fields to return
});

// Get method
await collection.get({
  ids: ["id1", "id2"],  // optional, retrieve by ID
  where: { metadata_field: "value" },  // metadata filter
  limit: 100,  // maximum number of records to return
  offset: 0,  // pagination offset
  include: ["documents", "metadatas"]  // which fields to return
});

Rust

// Query method
let results = collection
    .query(
        vec![vec![0.1, 0.2, 0.3]],  // query_embeddings
        Some(10),  // n_results
        None,  // where
        None,  // ids
        None,  // include
    )
    .await?;

// Get method
let response = collection
    .get(
        Some(vec!["id1".to_string(), "id2".to_string()]),  // ids
        None,  // where
        Some(10),  // limit
        Some(0),  // offset
        Some(IncludeList::default_get()),  // include
    )
    .await?;

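Connection to Hello-Agents / LangGraph / this blog's handbook index

In the memory and retrieval chapter of Hello-Agents, we learned how to use Chroma as a vector store for storing and retrieving embeddings. This page builds on that by detailing Chroma's two retrieval modes: Query (similarity search) and Get (exact retrieval). When building a RAG system, the Query API is typically used to retrieve the documents most relevant to a user query, while the Get API fetches documents by specific ID, for example to pull the records of a particular session when managing conversation history. This blog's handbook already uses Chroma as its index, and these retrieval methods are the foundation of a RAG system's core functionality.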

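Common beginner mistakes

Misreading the result format: Query and Get return different shapes. Query returns a two-dimensional structure grouped by input query; Get returns a flat one-dimensional structure
Confusing similarity search with exact retrieval: Query runs a similarity search and returns distance scores; Get returns only the records that exactly match the given IDs or filters
Ignoring embedding dimensions: when passing query_embeddings directly, their dimension must match the dimension of the embeddings stored in the collection
Filter syntax errors: where and where_document must follow Chroma's filter syntax exactly, including operator keys such as $contains
Mishandling batch results: the Query API returns batch results, which need two nested loops to process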
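The dimension mismatch in particular is cheap to catch before a query is sent. The guard below is a minimal sketch: check_query_dimensions is a hypothetical helper, and expected_dim is assumed to be known to the caller (for example, from the embedding model that populated the collection).

```python
def check_query_dimensions(query_embeddings, expected_dim):
    """Raise ValueError if any query vector has the wrong length."""
    for position, vector in enumerate(query_embeddings):
        if len(vector) != expected_dim:
            raise ValueError(
                f"query {position} has dimension {len(vector)}, expected {expected_dim}"
            )
    return query_embeddings


# Passes: both vectors have three components.
ok = check_query_dimensions([[11.1, 12.1, 13.1], [1.1, 2.3, 3.2]], expected_dim=3)

# Fails: the second vector has only two components.
try:
    check_query_dimensions([[11.1, 12.1, 13.1], [1.1, 2.3]], expected_dim=3)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

Calling the guard right before collection.query(query_embeddings=...) turns a mismatch into a clear local error naming the offending vector.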

Semantic retrieval