`langchain-chroma`¶

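Note

This package reference has not yet been fully migrated to v1.

langchain_chroma ¶

LangChain integration for the Chroma vector database.

Chroma ¶

Bases: VectorStore

Chroma vector store integration.

Setup

Install the `chromadb` and `langchain-chroma` packages: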

pip install -qU chromadb langchain-chroma

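Key init args (indexing params):

- `collection_name`: Name of the collection.
- `embedding_function`: Embedding function to use.

Key init args (client params):

- `client`: Chroma client to use.
- `client_settings`: Chroma client settings.
- `persist_directory`: Directory to persist the collection.
- `host`: Hostname of a deployed Chroma server.
- `port`: Connection port for a deployed Chroma server. Defaults to 8000.
- `ssl`: Whether to establish an SSL connection with a deployed Chroma server. Defaults to `False`.
- `headers`: HTTP headers to send to a deployed Chroma server.
- `chroma_cloud_api_key`: Chroma Cloud API key.
- `tenant`: Tenant ID. Required for Chroma Cloud connections. Defaults to 'default_tenant' for local Chroma servers.
- `database`: Database name. Required for Chroma Cloud connections. Defaults to 'default_database'.

Instantiate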

from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

vector_store = Chroma(
    collection_name="foo",
    embedding_function=OpenAIEmbeddings(),
    # other params...
)

Add Documents

from langchain_core.documents import Document

document_1 = Document(page_content="foo", metadata={"baz": "bar"})
document_2 = Document(page_content="thud", metadata={"bar": "baz"})
document_3 = Document(page_content="i will be deleted :(")

documents = [document_1, document_2, document_3]
ids = ["1", "2", "3"]
vector_store.add_documents(documents=documents, ids=ids)

Update Documents

updated_document = Document(
    page_content="qux",
    metadata={"bar": "baz"},
)

vector_store.update_documents(ids=["1"], documents=[updated_document])

Delete Documents

vector_store.delete(ids=["3"])

Search

results = vector_store.similarity_search(query="thud", k=1)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")

* thud [{'bar': 'baz'}]

Search with filter

results = vector_store.similarity_search(
    query="thud", k=1, filter={"baz": "bar"}
)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")

* foo [{'baz': 'bar'}]

Search with score

results = vector_store.similarity_search_with_score(query="qux", k=1)
for doc, score in results:
    print(f"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]")

* [SIM=0.000000] qux [{'bar': 'baz', 'baz': 'bar'}]

Async

# add documents
# await vector_store.aadd_documents(documents=documents, ids=ids)

# delete documents
# await vector_store.adelete(ids=["3"])

# search
# results = await vector_store.asimilarity_search(query="thud", k=1)

# search with score
results = await vector_store.asimilarity_search_with_score(query="qux", k=1)
for doc, score in results:
    print(f"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]")

* [SIM=0.335463] foo [{'baz': 'bar'}]

Use as Retriever

retriever = vector_store.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 1, "fetch_k": 2, "lambda_mult": 0.5},
)
retriever.invoke("thud")

[Document(metadata={'bar': 'baz'}, page_content='thud')]

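Method	Description
`aget_by_ids`	Asynchronously get documents by their IDs.
`adelete`	Asynchronously delete by vector ID or other criteria.
`aadd_texts`	Asynchronously run more texts through the embeddings and add them to the `VectorStore`.
`add_documents`	Add or update documents in the `VectorStore`.
`aadd_documents`	Asynchronously run more documents through the embeddings and add them to the `VectorStore`.
`search`	Return the documents most similar to the query, using the specified search type.
`asearch`	Asynchronously return the documents most similar to the query, using the specified search type.
`asimilarity_search_with_score`	Asynchronously run a similarity search with distance.
`similarity_search_with_relevance_scores`	Return documents and relevance scores in the range `[0, 1]`.
`asimilarity_search_with_relevance_scores`	Asynchronously return documents and relevance scores in the range `[0, 1]`.
`asimilarity_search`	Asynchronously return the documents most similar to the query.
`asimilarity_search_by_vector`	Asynchronously return the documents most similar to the embedding vector.
`amax_marginal_relevance_search`	Asynchronously return documents selected using maximal marginal relevance.
`amax_marginal_relevance_search_by_vector`	Asynchronously return documents selected using maximal marginal relevance.
`afrom_documents`	Asynchronously return a `VectorStore` initialized from documents and embeddings.
`afrom_texts`	Asynchronously return a `VectorStore` initialized from texts and embeddings.
`as_retriever`	Return a `VectorStoreRetriever` initialized from this `VectorStore`.
`__init__`	Initialize with a Chroma client.
`encode_image`	Get a base64 string from an image URI.
`fork`	Fork this vector store.
`add_images`	Run more images through the embeddings and add them to the `VectorStore`.
`add_texts`	Run more texts through the embeddings and add them to the `VectorStore`.
`similarity_search`	Run a similarity search with Chroma.
`similarity_search_by_vector`	Return the documents most similar to the embedding vector.
`similarity_search_by_vector_with_relevance_scores`	Return the documents most similar to the embedding vector, with similarity scores.
`similarity_search_with_score`	Run a similarity search with Chroma, with distance.
`similarity_search_with_vectors`	Run a similarity search with Chroma, with vectors.
`similarity_search_by_image`	Search for similar images based on the given image URI.
`similarity_search_by_image_with_relevance_score`	Search for similar images based on the given image URI.
`max_marginal_relevance_search_by_vector`	Return documents selected using maximal marginal relevance.
`max_marginal_relevance_search`	Return documents selected using maximal marginal relevance.
`delete_collection`	Delete the collection.
`reset_collection`	Reset the collection.
`get`	Get items from the collection.
`get_by_ids`	Get documents by their IDs.
`update_document`	Update a document in the collection.
`update_documents`	Update multiple documents in the collection.
`from_texts`	Create a Chroma vector store from a list of raw texts.
`from_documents`	Create a Chroma vector store from a list of documents.
`delete`	Delete by vector IDs.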

embeddings `property` ¶

embeddings: Embeddings | None

Access the query embedding object.

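aget_by_ids `async` ¶

aget_by_ids(ids: Sequence[str]) -> list[Document]

Asynchronously get documents by their IDs.

The returned documents are expected to have the ID field set to the ID of the document in the vector store.

Fewer documents may be returned than requested if some IDs are not found or if there are duplicated IDs.

Users should not assume that the order of the returned documents matches the order of the input IDs. Instead, users should rely on the ID field of the returned documents.

This method should **NOT** raise exceptions if no documents are found for some IDs.

Parameter	Description
`ids`	List of IDs to retrieve. TYPE: `Sequence[str]`

Returns	Description
`list[Document]`	List of `Document` objects.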
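
Because the result may be shorter than the request and may arrive in a different order, callers usually index it by ID before using it. The following is a minimal sketch of that pattern; the `Doc` dataclass and the `index_by_id` helper are hypothetical stand-ins used for illustration, not part of the library, and the returned list is simulated rather than fetched from a live store.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass
class Doc:
    """Stand-in for a LangChain Document; only the fields used here."""

    id: Optional[str]
    page_content: str


def index_by_id(requested: Sequence[str], returned: Sequence[Doc]) -> Dict[str, Optional[Doc]]:
    """Map each requested ID to its document, or None if it was not returned."""
    found = {doc.id: doc for doc in returned}
    return {doc_id: found.get(doc_id) for doc_id in requested}


# Simulated result: ID "2" is missing and the order differs from the request.
returned_docs = [Doc(id="3", page_content="thud"), Doc(id="1", page_content="foo")]
lookup = index_by_id(["1", "2", "3"], returned_docs)

# Collect the IDs for which nothing came back instead of assuming positions.
missing = [doc_id for doc_id, doc in lookup.items() if doc is None]
print(missing)
```

Relying on the `id` field rather than on list position keeps the caller correct even when some IDs are absent or duplicated.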

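adelete `async` ¶

adelete(ids: list[str] | None = None, **kwargs: Any) -> bool | None

Asynchronously delete by vector ID or other criteria.

Parameter	Description
`ids`	List of IDs to delete. If `None`, delete all. TYPE: `list[str] \| None` DEFAULT: `None`
`**kwargs`	Other keyword arguments that subclasses might use. TYPE: `Any` DEFAULT: `{}`

Returns	Description
`bool \| None`	`True` if deletion is successful, `False` otherwise, `None` if not implemented.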

aadd_texts `async` ¶

aadd_texts(
    texts: Iterable[str],
    metadatas: list[dict] | None = None,
    *,
    ids: list[str] | None = None,
    **kwargs: Any,
) -> list[str]

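Asynchronously run more texts through the embeddings and add them to the `VectorStore`.

Parameter	Description
`texts`	Iterable of strings to add to the `VectorStore`. TYPE: `Iterable[str]`
`metadatas`	Optional list of metadatas associated with the texts. TYPE: `list[dict] \| None` DEFAULT: `None`
`ids`	Optional list of IDs associated with the texts. TYPE: `list[str] \| None` DEFAULT: `None`
`**kwargs`	`VectorStore`-specific parameters. TYPE: `Any` DEFAULT: `{}`

Returns	Description
`list[str]`	List of IDs from adding the texts to the `VectorStore`.

Raises	Description
`ValueError`	If the number of metadatas does not match the number of texts.
`ValueError`	If the number of IDs does not match the number of texts.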

add_documents ¶

add_documents(documents: list[Document], **kwargs: Any) -> list[str]

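Add or update documents in the `VectorStore`.

Parameter	Description
`documents`	Documents to add to the `VectorStore`. TYPE: `list[Document]`
`**kwargs`	Additional keyword arguments. If kwargs contains IDs and the documents also contain IDs, the IDs in kwargs take precedence. TYPE: `Any` DEFAULT: `{}`

Returns	Description
`list[str]`	List of IDs of the added texts.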
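
The precedence rule for IDs (explicit `ids` in kwargs win over IDs stored on the documents) can be sketched as follows. `Doc` and `resolve_ids` are hypothetical names introduced only for this illustration; the sketch models which ID is chosen and does not reproduce the library's code.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Doc:
    """Stand-in for a LangChain Document with an optional ID."""

    page_content: str
    id: Optional[str] = None


def resolve_ids(documents: Sequence[Doc], ids: Optional[Sequence[str]] = None) -> List[Optional[str]]:
    """Return the IDs to store: explicit `ids` first, else each document's own ID."""
    if ids is not None:
        return list(ids)
    return [doc.id for doc in documents]


docs = [Doc("foo", id="a"), Doc("thud", id="b")]

explicit = resolve_ids(docs, ids=["1", "2"])  # kwargs-style IDs override "a"/"b"
fallback = resolve_ids(docs)                  # no explicit IDs: use the documents' own
print(explicit, fallback)
```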

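aadd_documents `async` ¶

aadd_documents(documents: list[Document], **kwargs: Any) -> list[str]

Asynchronously run more documents through the embeddings and add them to the `VectorStore`.

Parameter	Description
`documents`	Documents to add to the `VectorStore`. TYPE: `list[Document]`
`**kwargs`	Additional keyword arguments. TYPE: `Any` DEFAULT: `{}`

Returns	Description
`list[str]`	List of IDs of the added texts.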

search ¶

search(query: str, search_type: str, **kwargs: Any) -> list[Document]

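Return the documents most similar to the query, using the specified search type.

Parameter	Description
`query`	Input text. TYPE: `str`
`search_type`	Type of search to perform. Can be `'similarity'`, `'mmr'`, or `'similarity_score_threshold'`. TYPE: `str`
`**kwargs`	Arguments to pass to the search method. TYPE: `Any` DEFAULT: `{}`

Returns	Description
`list[Document]`	List of `Document` objects most similar to the query.

Raises	Description
`ValueError`	If `search_type` is not one of `'similarity'`, `'mmr'`, or `'similarity_score_threshold'`.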
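
The dispatch on `search_type`, including the `ValueError` for unknown values, might look roughly like the sketch below. `dispatch_search` and the placeholder handlers are hypothetical; they only echo the query so the control flow is visible, and they stand in for the real search methods.

```python
from typing import Callable, Dict, List

# Placeholder handlers keyed by the three documented search types.
HANDLERS: Dict[str, Callable[[str], List[str]]] = {
    "similarity": lambda query: [f"similarity:{query}"],
    "mmr": lambda query: [f"mmr:{query}"],
    "similarity_score_threshold": lambda query: [f"threshold:{query}"],
}


def dispatch_search(query: str, search_type: str) -> List[str]:
    """Route the query to the handler for `search_type` or reject unknown types."""
    if search_type not in HANDLERS:
        allowed = ", ".join(repr(name) for name in HANDLERS)
        raise ValueError(f"search_type must be one of {allowed}; got {search_type!r}")
    return HANDLERS[search_type](query)


print(dispatch_search("thud", "mmr"))
```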

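asearch `async` ¶

asearch(query: str, search_type: str, **kwargs: Any) -> list[Document]

Asynchronously return the documents most similar to the query, using the specified search type.

Parameter	Description
`query`	Input text. TYPE: `str`
`search_type`	Type of search to perform. Can be `'similarity'`, `'mmr'`, or `'similarity_score_threshold'`. TYPE: `str`
`**kwargs`	Arguments to pass to the search method. TYPE: `Any` DEFAULT: `{}`

Returns	Description
`list[Document]`	List of `Document` objects most similar to the query.

Raises	Description
`ValueError`	If `search_type` is not one of `'similarity'`, `'mmr'`, or `'similarity_score_threshold'`.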

asimilarity_search_with_score `async` ¶

asimilarity_search_with_score(
    *args: Any, **kwargs: Any
) -> list[tuple[Document, float]]

Asynchronously run a similarity search with distance.

Parameter	Description
`*args`	Arguments to pass to the search method. TYPE: `Any` DEFAULT: `()`
`**kwargs`	Arguments to pass to the search method. TYPE: `Any` DEFAULT: `{}`

Returns	Description
`list[tuple[Document, float]]`	List of tuples of `(doc, similarity_score)`.

similarity_search_with_relevance_scores ¶

similarity_search_with_relevance_scores(
    query: str, k: int = 4, **kwargs: Any
) -> list[tuple[Document, float]]

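Return documents and relevance scores in the range `[0, 1]`.

`0` is dissimilar, `1` is most similar.

Parameter	Description
`query`	Input text. TYPE: `str`
`k`	Number of `Document` objects to return. TYPE: `int` DEFAULT: `4`
`**kwargs`	Keyword arguments to pass to the similarity search. May include `score_threshold`, an optional float between `0` and `1` used to filter the retrieved documents. TYPE: `Any` DEFAULT: `{}`

Returns	Description
`list[tuple[Document, float]]`	List of tuples of `(doc, similarity_score)`.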
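
To make the relationship between raw distances, relevance scores, and `score_threshold` concrete, here is a small sketch. The mapping `1 - distance / sqrt(2)` is one possible `relevance_score_fn`, assumed here for Euclidean distance between unit-length vectors; the mapping actually applied depends on the collection's distance metric. `to_relevance` and the sample distances are hypothetical.

```python
import math
from typing import Callable, List, Optional, Tuple


def euclidean_relevance(distance: float) -> float:
    """Assumed mapping: distance 0 -> score 1.0, distance sqrt(2) -> score 0.0."""
    return 1.0 - distance / math.sqrt(2)


def to_relevance(
    results: List[Tuple[str, float]],
    score_fn: Callable[[float], float],
    score_threshold: Optional[float] = None,
) -> List[Tuple[str, float]]:
    """Convert (doc, distance) pairs to (doc, relevance) and apply the threshold."""
    scored = [(doc, score_fn(distance)) for doc, distance in results]
    if score_threshold is not None:
        scored = [(doc, score) for doc, score in scored if score >= score_threshold]
    return scored


# Made-up distances for three documents; lower distance means more similar.
raw = [("qux", 0.0), ("foo", 0.5), ("thud", 1.2)]
kept = to_relevance(raw, euclidean_relevance, score_threshold=0.5)
print([doc for doc, _ in kept])
```

With these inputs, only the documents whose converted score reaches `0.5` survive the filter, which is the behavior `score_threshold` is meant to provide.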

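asimilarity_search_with_relevance_scores `async` ¶

asimilarity_search_with_relevance_scores(
    query: str, k: int = 4, **kwargs: Any
) -> list[tuple[Document, float]]

Asynchronously return documents and relevance scores in the range `[0, 1]`.

`0` is dissimilar, `1` is most similar.

Parameter	Description
`query`	Input text. TYPE: `str`
`k`	Number of `Document` objects to return. TYPE: `int` DEFAULT: `4`
`**kwargs`	Keyword arguments to pass to the similarity search. May include `score_threshold`, an optional float between `0` and `1` used to filter the retrieved documents. TYPE: `Any` DEFAULT: `{}`

Returns	Description
`list[tuple[Document, float]]`	List of tuples of `(doc, similarity_score)`.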

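asimilarity_search `async` ¶

asimilarity_search(query: str, k: int = 4, **kwargs: Any) -> list[Document]

Asynchronously return the documents most similar to the query.

Parameter	Description
`query`	Input text. TYPE: `str`
`k`	Number of `Document` objects to return. TYPE: `int` DEFAULT: `4`
`**kwargs`	Arguments to pass to the search method. TYPE: `Any` DEFAULT: `{}`

Returns	Description
`list[Document]`	List of `Document` objects most similar to the query.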

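asimilarity_search_by_vector `async` ¶

asimilarity_search_by_vector(
    embedding: list[float], k: int = 4, **kwargs: Any
) -> list[Document]

Asynchronously return the documents most similar to the embedding vector.

Parameter	Description
`embedding`	Embedding to look up documents similar to. TYPE: `list[float]`
`k`	Number of `Document` objects to return. TYPE: `int` DEFAULT: `4`
`**kwargs`	Arguments to pass to the search method. TYPE: `Any` DEFAULT: `{}`

Returns	Description
`list[Document]`	List of `Document` objects most similar to the query vector.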

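amax_marginal_relevance_search `async` ¶

amax_marginal_relevance_search(
    query: str, k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, **kwargs: Any
) -> list[Document]

Asynchronously return documents selected using maximal marginal relevance.

Maximal marginal relevance optimizes for similarity to the query and diversity among the selected documents.

Parameter	Description
`query`	Text to look up documents similar to. TYPE: `str`
`k`	Number of `Document` objects to return. TYPE: `int` DEFAULT: `4`
`fetch_k`	Number of `Document` objects to fetch and pass to the MMR algorithm. TYPE: `int` DEFAULT: `20`
`lambda_mult`	Number between `0` and `1` that determines the degree of diversity among the results, with `0` corresponding to maximum diversity and `1` to minimum diversity. TYPE: `float` DEFAULT: `0.5`
`**kwargs`	Arguments to pass to the search method. TYPE: `Any` DEFAULT: `{}`

Returns	Description
`list[Document]`	List of `Document` objects selected by maximal marginal relevance.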
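
The effect of `lambda_mult` is easiest to see in a small, self-contained version of the selection loop. The sketch below is a generic greedy MMR over plain Python lists using cosine similarity; it is an illustration under those assumptions, not the library's implementation. The `candidates` list plays the role of the `fetch_k` nearest documents, and `mmr` returns the indices of the `k` chosen ones.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(
    query: Sequence[float],
    candidates: Sequence[Sequence[float]],
    k: int = 4,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Greedily pick up to k candidate indices, trading relevance against redundancy."""
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:

        def score(i: int) -> float:
            relevance = cosine(query, candidates[i])
            # Similarity to the closest already-selected document (0.0 if none yet).
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1.0 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query_vec = [1.0, 0.0]
# Candidates 0 and 1 are near-duplicates; candidate 2 points somewhere else.
candidate_vecs = [[1.0, 0.1], [1.0, 0.12], [0.6, -0.8]]

relevance_only = mmr(query_vec, candidate_vecs, k=2, lambda_mult=1.0)
balanced = mmr(query_vec, candidate_vecs, k=2, lambda_mult=0.5)
print(relevance_only, balanced)
```

With `lambda_mult=1.0` the loop ignores redundancy and keeps both near-duplicates; with `0.5` the second pick shifts to the less similar but more diverse candidate.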

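amax_marginal_relevance_search_by_vector `async` ¶

amax_marginal_relevance_search_by_vector(
    embedding: list[float],
    k: int = 4,
    fetch_k: int = 20,
    lambda_mult: float = 0.5,
    **kwargs: Any,
) -> list[Document]

Asynchronously return documents selected using maximal marginal relevance.

Maximal marginal relevance optimizes for similarity to the query and diversity among the selected documents.

Parameter	Description
`embedding`	Embedding to look up documents similar to. TYPE: `list[float]`
`k`	Number of `Document` objects to return. TYPE: `int` DEFAULT: `4`
`fetch_k`	Number of `Document` objects to fetch and pass to the MMR algorithm. TYPE: `int` DEFAULT: `20`
`lambda_mult`	Number between `0` and `1` that determines the degree of diversity among the results, with `0` corresponding to maximum diversity and `1` to minimum diversity. TYPE: `float` DEFAULT: `0.5`
`**kwargs`	Arguments to pass to the search method. TYPE: `Any` DEFAULT: `{}`

Returns	Description
`list[Document]`	List of `Document` objects selected by maximal marginal relevance.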

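afrom_documents `async` `classmethod` ¶

afrom_documents(
    documents: list[Document], embedding: Embeddings, **kwargs: Any
) -> Self

Asynchronously return a `VectorStore` initialized from documents and embeddings.

Parameter	Description
`documents`	List of `Document` objects to add to the `VectorStore`. TYPE: `list[Document]`
`embedding`	Embedding function to use. TYPE: `Embeddings`
`**kwargs`	Additional keyword arguments. TYPE: `Any` DEFAULT: `{}`

Returns	Description
`Self`	`VectorStore` initialized from documents and embeddings.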

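afrom_texts `async` `classmethod` ¶

afrom_texts(
    texts: list[str],
    embedding: Embeddings,
    metadatas: list[dict] | None = None,
    *,
    ids: list[str] | None = None,
    **kwargs: Any,
) -> Self

Asynchronously return a `VectorStore` initialized from texts and embeddings.

Parameter	Description
`texts`	Texts to add to the `VectorStore`. TYPE: `list[str]`
`embedding`	Embedding function to use. TYPE: `Embeddings`
`metadatas`	Optional list of metadatas associated with the texts. TYPE: `list[dict] \| None` DEFAULT: `None`
`ids`	Optional list of IDs associated with the texts. TYPE: `list[str] \| None` DEFAULT: `None`
`**kwargs`	Additional keyword arguments. TYPE: `Any` DEFAULT: `{}`

Returns	Description
`Self`	`VectorStore` initialized from texts and embeddings.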

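as_retriever ¶

as_retriever(**kwargs: Any) -> VectorStoreRetriever

Return a `VectorStoreRetriever` initialized from this `VectorStore`.

Parameter	Description
`**kwargs`	Keyword arguments to pass to the search function. TYPE: `Any` DEFAULT: `{}`

`**kwargs` can include:

- `search_type`: Defines the type of search the retriever should perform. Can be `'similarity'` (default), `'mmr'`, or `'similarity_score_threshold'`.
- `search_kwargs`: Keyword arguments to pass to the search function. Can include things like:
  - `k`: Number of documents to return (default: `4`)
  - `score_threshold`: Minimum relevance threshold for `similarity_score_threshold`
  - `fetch_k`: Number of documents to pass to the MMR algorithm (default: `20`)
  - `lambda_mult`: Diversity of the results returned by MMR; `1` for minimum diversity and `0` for maximum (default: `0.5`)
  - `filter`: Filter by document metadata

Returns	Description
`VectorStoreRetriever`	Retriever class for the `VectorStore`.

Examples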

# Retrieve more documents with higher diversity
# Useful if your dataset has many similar documents
docsearch.as_retriever(
    search_type="mmr", search_kwargs={"k": 6, "lambda_mult": 0.25}
)

# Fetch more documents for the MMR algorithm to consider
# But only return the top 5
docsearch.as_retriever(search_type="mmr", search_kwargs={"k": 5, "fetch_k": 50})

# Only retrieve documents that have a relevance score
# Above a certain threshold
docsearch.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"score_threshold": 0.8},
)

# Only get the single most similar document from the dataset
docsearch.as_retriever(search_kwargs={"k": 1})

# Use a filter to only retrieve documents from a specific paper
docsearch.as_retriever(
    search_kwargs={"filter": {"paper_title": "GPT-4 Technical Report"}}
)

__init__ ¶

__init__(
    collection_name: str = _LANGCHAIN_DEFAULT_COLLECTION_NAME,
    embedding_function: Embeddings | None = None,
    persist_directory: str | None = None,
    host: str | None = None,
    port: int | None = None,
    headers: dict[str, str] | None = None,
    chroma_cloud_api_key: str | None = None,
    tenant: str | None = None,
    database: str | None = None,
    client_settings: Settings | None = None,
    collection_metadata: dict | None = None,
    collection_configuration: CreateCollectionConfiguration | None = None,
    client: ClientAPI | None = None,
    relevance_score_fn: Callable[[float], float] | None = None,
    create_collection_if_not_exists: bool | None = True,
    *,
    ssl: bool = False,
) -> None

Initialize with a Chroma client.

Parameter	Description
`collection_name`	Name of the collection to create. TYPE: `str` DEFAULT: `_LANGCHAIN_DEFAULT_COLLECTION_NAME`
`embedding_function`	Embedding class object. Used to embed texts. TYPE: `Embeddings \| None` DEFAULT: `None`
`persist_directory`	Directory to persist the collection. TYPE: `str \| None` DEFAULT: `None`
`host`	Hostname of a deployed Chroma server. TYPE: `str \| None` DEFAULT: `None`
`port`	Connection port for a deployed Chroma server. Defaults to 8000. TYPE: `int \| None` DEFAULT: `None`
`ssl`	Whether to establish an SSL connection with a deployed Chroma server. TYPE: `bool` DEFAULT: `False`
`headers`	HTTP headers to send to a deployed Chroma server. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`chroma_cloud_api_key`	Chroma Cloud API key. TYPE: `str \| None` DEFAULT: `None`
`tenant`	Tenant ID. Required for Chroma Cloud connections. Defaults to 'default_tenant' for local Chroma servers. TYPE: `str \| None` DEFAULT: `None`
`database`	Database name. Required for Chroma Cloud connections. Defaults to 'default_database'. TYPE: `str \| None` DEFAULT: `None`
`client_settings`	Chroma client settings. TYPE: `Settings \| None` DEFAULT: `None`
`collection_metadata`	Collection configurations. TYPE: `dict \| None` DEFAULT: `None`
`collection_configuration`	Index configuration for the collection. TYPE: `CreateCollectionConfiguration \| None` DEFAULT: `None`
`client`	Chroma client. Documentation: https://docs.trychroma.com/reference/python/client TYPE: `ClientAPI \| None` DEFAULT: `None`
`relevance_score_fn`	Function to calculate a relevance score from a distance. Used only in `similarity_search_with_relevance_scores`. TYPE: `Callable[[float], float] \| None` DEFAULT: `None`
`create_collection_if_not_exists`	Whether to create the collection if it does not exist. TYPE: `bool \| None` DEFAULT: `True`
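
The client-related parameters fall into a few groups (Chroma Cloud, deployed server, local persistence, in-memory). The sketch below summarizes how they relate, using a hypothetical `describe_connection` helper. The order in which the groups are checked is an assumption made for illustration; only the documented defaults (port 8000, `ssl=False`) and the requirement that Chroma Cloud connections supply `tenant` and `database` come from the parameter table.

```python
from typing import Optional


def describe_connection(
    persist_directory: Optional[str] = None,
    host: Optional[str] = None,
    port: Optional[int] = None,
    ssl: bool = False,
    chroma_cloud_api_key: Optional[str] = None,
    tenant: Optional[str] = None,
    database: Optional[str] = None,
) -> str:
    """Summarize which kind of Chroma connection a set of arguments describes."""
    if chroma_cloud_api_key is not None:
        # Documented: tenant and database are required for Chroma Cloud.
        if not tenant or not database:
            raise ValueError("tenant and database are required for Chroma Cloud")
        return f"cloud: tenant={tenant}, database={database}"
    if host is not None:
        scheme = "https" if ssl else "http"
        return f"server: {scheme}://{host}:{port or 8000}"  # documented default port
    if persist_directory is not None:
        return f"persistent: {persist_directory}"
    return "in-memory"


print(describe_connection(host="localhost"))
print(describe_connection(persist_directory="./chroma_db"))
print(describe_connection())
```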

__ensure_collection ¶

__ensure_collection() -> None

Ensure that the collection exists, or create it.

__query_collection ¶

__query_collection(
    query_texts: list[str] | None = None,
    query_embeddings: list[list[float]] | None = None,
    n_results: int = 4,
    where: dict[str, str] | None = None,
    where_document: dict[str, str] | None = None,
    **kwargs: Any,
) -> list[Document] | QueryResult

Query the Chroma collection.

Parameter	Description
`query_texts`	List of query texts. TYPE: `list[str] \| None` DEFAULT: `None`
`query_embeddings`	List of query embeddings. TYPE: `list[list[float]] \| None` DEFAULT: `None`
`n_results`	Number of results to return. TYPE: `int` DEFAULT: `4`
`where`	Dict used to filter results by metadata, e.g. `{"color": "red"}`. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`where_document`	Dict used to filter by document contents, e.g. `{"$contains": "hello"}`. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`kwargs`	Additional keyword arguments to pass to the Chroma collection query. TYPE: `Any` DEFAULT: `{}`

Returns	Description
`list[Document] \| QueryResult`	List of the `n_results` nearest-neighbor embeddings for the provided `query_embeddings` or `query_texts`.

encode_image `staticmethod` ¶

encode_image(uri: str) -> str

Get a base64 string from an image URI.
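
As a rough illustration of what "base64 string from an image URI" means, the sketch below reads a file and base64-encodes its bytes. It assumes the URI is a local file path; `encode_image_sketch` is a hypothetical stand-in, and a few placeholder bytes are written to a temporary file instead of using a real image.

```python
import base64
import tempfile
from pathlib import Path


def encode_image_sketch(uri: str) -> str:
    """Read the file at `uri` and return its bytes as a base64 string."""
    with Path(uri).open("rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


# Write placeholder bytes to a temporary file and encode them.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "pixel.bin"
    path.write_bytes(b"\x89PNG\r\n")
    encoded = encode_image_sketch(str(path))

# Decoding the string recovers the original bytes.
decoded = base64.b64decode(encoded)
print(encoded, decoded == b"\x89PNG\r\n")
```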

fork ¶

fork(new_name: str) -> Chroma

Fork this vector store.

Parameter	Description
`new_name`	New name for the forked store. TYPE: `str`

Returns	Description
`Chroma`	A new Chroma store forked from this vector store.

add_images ¶

add_images(
    uris: list[str], metadatas: list[dict] | None = None, ids: list[str] | None = None
) -> list[str]

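Run more images through the embeddings and add them to the `VectorStore`.

Parameter	Description
`uris`	File paths of the images. TYPE: `list[str]`
`metadatas`	Optional list of metadatas. When querying, you can filter on this metadata. TYPE: `list[dict] \| None` DEFAULT: `None`
`ids`	Optional list of IDs. Items without an ID are assigned a UUID. TYPE: `list[str] \| None` DEFAULT: `None`

Returns	Description
`list[str]`	List of IDs of the added images.

Raises	Description
`ValueError`	When the metadata is incorrect.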

add_texts ¶

add_texts(
    texts: Iterable[str],
    metadatas: list[dict] | None = None,
    ids: list[str] | None = None,
    **kwargs: Any,
) -> list[str]

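Run more texts through the embeddings and add them to the `VectorStore`.

Parameter	Description
`texts`	Texts to add to the `VectorStore`. TYPE: `Iterable[str]`
`metadatas`	Optional list of metadatas. When querying, you can filter on this metadata. TYPE: `list[dict] \| None` DEFAULT: `None`
`ids`	Optional list of IDs. Items without an ID are assigned a UUID. TYPE: `list[str] \| None` DEFAULT: `None`
`kwargs`	Additional keyword arguments. TYPE: `Any` DEFAULT: `{}`

Returns	Description
`list[str]`	List of IDs of the added texts.

Raises	Description
`ValueError`	When the metadata is incorrect.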
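
The rule that items without an ID receive a UUID can be sketched with the standard `uuid` module. `fill_missing_ids` is a hypothetical helper written for this illustration; it models only the ID assignment, not embedding or storage.

```python
import uuid
from typing import List, Optional, Sequence


def fill_missing_ids(texts: Sequence[str], ids: Optional[Sequence[Optional[str]]] = None) -> List[str]:
    """Keep caller-supplied IDs and generate a UUID wherever one is missing."""
    if ids is None:
        return [str(uuid.uuid4()) for _ in texts]
    return [id_ if id_ is not None else str(uuid.uuid4()) for id_ in ids]


texts = ["foo", "thud", "qux"]
partial = fill_missing_ids(texts, ids=["1", None, "3"])  # only the middle ID is generated
generated = fill_missing_ids(texts)                      # every ID is generated
print(partial[0], partial[2], len(generated))
```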

similarity_search ¶

similarity_search(
    query: str, k: int = DEFAULT_K, filter: dict[str, str] | None = None, **kwargs: Any
) -> list[Document]

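Run a similarity search with Chroma.

Parameter	Description
`query`	Query text to search for. TYPE: `str`
`k`	Number of results to return. TYPE: `int` DEFAULT: `DEFAULT_K`
`filter`	Filter by metadata. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`kwargs`	Additional keyword arguments to pass to the Chroma collection query. TYPE: `Any` DEFAULT: `{}`

Returns	Description
`list[Document]`	List of documents most similar to the query text.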

similarity_search_by_vector ¶

similarity_search_by_vector(
    embedding: list[float],
    k: int = DEFAULT_K,
    filter: dict[str, str] | None = None,
    where_document: dict[str, str] | None = None,
    **kwargs: Any,
) -> list[Document]

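Return the documents most similar to the embedding vector.

Parameter	Description
`embedding`	Embedding to look up documents similar to. TYPE: `list[float]`
`k`	Number of documents to return. TYPE: `int` DEFAULT: `DEFAULT_K`
`filter`	Filter by metadata. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`where_document`	Dict used to filter by document contents, e.g. `{"$contains": "hello"}`. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`kwargs`	Additional keyword arguments to pass to the Chroma collection query. TYPE: `Any` DEFAULT: `{}`

Returns	Description
`list[Document]`	List of `Document` objects most similar to the query vector.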

similarity_search_by_vector_with_relevance_scores ¶

similarity_search_by_vector_with_relevance_scores(
    embedding: list[float],
    k: int = DEFAULT_K,
    filter: dict[str, str] | None = None,
    where_document: dict[str, str] | None = None,
    **kwargs: Any,
) -> list[tuple[Document, float]]

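Return the documents most similar to the embedding vector, with similarity scores.

Parameter	Description
`embedding`	Embedding to look up documents similar to. TYPE: `list[float]`
`k`	Number of documents to return. TYPE: `int` DEFAULT: `DEFAULT_K`
`filter`	Filter by metadata. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`where_document`	Dict used to filter by document contents, e.g. `{"$contains": "hello"}`. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`kwargs`	Additional keyword arguments to pass to the Chroma collection query. TYPE: `Any` DEFAULT: `{}`

Returns	Description
`list[tuple[Document, float]]`	List of documents most similar to the query vector, each with its relevance score (a float). A lower score represents more similarity.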

similarity_search_with_score ¶

similarity_search_with_score(
    query: str,
    k: int = DEFAULT_K,
    filter: dict[str, str] | None = None,
    where_document: dict[str, str] | None = None,
    **kwargs: Any,
) -> list[tuple[Document, float]]

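Run a similarity search with Chroma, with distance.

Parameter	Description
`query`	Query text to search for. TYPE: `str`
`k`	Number of results to return. TYPE: `int` DEFAULT: `DEFAULT_K`
`filter`	Filter by metadata. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`where_document`	Dict used to filter by document contents, e.g. `{"$contains": "hello"}`. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`kwargs`	Additional keyword arguments to pass to the Chroma collection query. TYPE: `Any` DEFAULT: `{}`

Returns	Description
`list[tuple[Document, float]]`	List of documents most similar to the query text, each with its distance (a float). A lower score represents more similarity.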

similarity_search_with_vectors ¶

similarity_search_with_vectors(
    query: str,
    k: int = DEFAULT_K,
    filter: dict[str, str] | None = None,
    where_document: dict[str, str] | None = None,
    **kwargs: Any,
) -> list[tuple[Document, ndarray]]

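Run a similarity search with Chroma, with vectors.

Parameter	Description
`query`	Query text to search for. TYPE: `str`
`k`	Number of results to return. TYPE: `int` DEFAULT: `DEFAULT_K`
`filter`	Filter by metadata. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`where_document`	Dict used to filter by document contents, e.g. `{"$contains": "hello"}`. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`kwargs`	Additional keyword arguments to pass to the Chroma collection query. TYPE: `Any` DEFAULT: `{}`

Returns	Description
`list[tuple[Document, ndarray]]`	List of documents most similar to the query text, each with its embedding vector.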

similarity_search_by_image ¶

similarity_search_by_image(
    uri: str, k: int = DEFAULT_K, filter: dict[str, str] | None = None, **kwargs: Any
) -> list[Document]

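Search for similar images based on the given image URI.

Parameter	Description
`uri`	URI of the image to search for. TYPE: `str`
`k`	Number of results to return. TYPE: `int` DEFAULT: `DEFAULT_K`
`filter`	Filter by metadata. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`**kwargs`	Additional arguments to pass to the function. TYPE: `Any` DEFAULT: `{}`

Returns	Description
`list[Document]`	List of images most similar to the provided image. Each element is a LangChain `Document` whose page content is the base64-encoded image and whose metadata is either the default or user-defined.

Raises	Description
`ValueError`	If the embedding function does not support image embeddings.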

similarity_search_by_image_with_relevance_score ¶

similarity_search_by_image_with_relevance_score(
    uri: str, k: int = DEFAULT_K, filter: dict[str, str] | None = None, **kwargs: Any
) -> list[tuple[Document, float]]

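Search for similar images based on the given image URI.

Parameter	Description
`uri`	URI of the image to search for. TYPE: `str`
`k`	Number of results to return. TYPE: `int` DEFAULT: `DEFAULT_K`
`filter`	Filter by metadata. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`**kwargs`	Additional arguments to pass to the function. TYPE: `Any` DEFAULT: `{}`

Returns	Description
`list[tuple[Document, float]]`	List of tuples containing documents similar to the query image and their similarity scores. The 0th element of each tuple is a LangChain `Document` whose page content is the base64-encoded image and whose metadata is either the default or user-defined.

Raises	Description
`ValueError`	If the embedding function does not support image embeddings.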

max_marginal_relevance_search_by_vector ¶

max_marginal_relevance_search_by_vector(
    embedding: list[float],
    k: int = DEFAULT_K,
    fetch_k: int = 20,
    lambda_mult: float = 0.5,
    filter: dict[str, str] | None = None,
    where_document: dict[str, str] | None = None,
    **kwargs: Any,
) -> list[Document]

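Return documents selected using maximal marginal relevance.

Maximal marginal relevance optimizes for similarity to the query and diversity among the selected documents.

Parameter	Description
`embedding`	Embedding to look up documents similar to. TYPE: `list[float]`
`k`	Number of `Document` objects to return. TYPE: `int` DEFAULT: `DEFAULT_K`
`fetch_k`	Number of `Document` objects to fetch and pass to the MMR algorithm. TYPE: `int` DEFAULT: `20`
`lambda_mult`	Number between `0` and `1` that determines the degree of diversity among the results, with `0` corresponding to maximum diversity and `1` to minimum diversity. TYPE: `float` DEFAULT: `0.5`
`filter`	Filter by metadata. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`where_document`	Dict used to filter by document contents, e.g. `{"$contains": "hello"}`. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`kwargs`	Additional keyword arguments to pass to the Chroma collection query. TYPE: `Any` DEFAULT: `{}`

Returns	Description
`list[Document]`	List of `Document` objects selected by maximal marginal relevance.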

max_marginal_relevance_search ¶

max_marginal_relevance_search(
    query: str,
    k: int = DEFAULT_K,
    fetch_k: int = 20,
    lambda_mult: float = 0.5,
    filter: dict[str, str] | None = None,
    where_document: dict[str, str] | None = None,
    **kwargs: Any,
) -> list[Document]

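Return documents selected using maximal marginal relevance.

Maximal marginal relevance optimizes for similarity to the query and diversity among the selected documents.

Parameter	Description
`query`	Text to look up documents similar to. TYPE: `str`
`k`	Number of documents to return. TYPE: `int` DEFAULT: `DEFAULT_K`
`fetch_k`	Number of documents to fetch and pass to the MMR algorithm. TYPE: `int` DEFAULT: `20`
`lambda_mult`	Number between `0` and `1` that determines the degree of diversity among the results, with `0` corresponding to maximum diversity and `1` to minimum diversity. TYPE: `float` DEFAULT: `0.5`
`filter`	Filter by metadata. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`where_document`	Dict used to filter by document contents, e.g. `{"$contains": "hello"}`. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`kwargs`	Additional keyword arguments to pass to the Chroma collection query. TYPE: `Any` DEFAULT: `{}`

Returns	Description
`list[Document]`	List of `Document` objects selected by maximal marginal relevance.

Raises	Description
`ValueError`	If the embedding function is not provided.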

delete_collection ¶

delete_collection() -> None

Delete the collection.

reset_collection ¶

reset_collection() -> None

Reset the collection.

Resets the collection by deleting it and recreating an empty one.

get ¶

get(
    ids: str | list[str] | None = None,
    where: Where | None = None,
    limit: int | None = None,
    offset: int | None = None,
    where_document: WhereDocument | None = None,
    include: list[str] | None = None,
) -> dict[str, Any]

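Get items from the collection.

Parameter	Description
`ids`	The IDs of the embeddings to get. Optional. TYPE: `str \| list[str] \| None` DEFAULT: `None`
`where`	A `Where` type dict used to filter results, e.g. `{"$and": [{"color": "red"}, {"price": 4.20}]}`. Optional. TYPE: `Where \| None` DEFAULT: `None`
`limit`	The number of documents to return. Optional. TYPE: `int \| None` DEFAULT: `None`
`offset`	The offset to start returning results from. Useful for paging results together with `limit`. Optional. TYPE: `int \| None` DEFAULT: `None`
`where_document`	A `WhereDocument` type dict used to filter by document contents, e.g. `{"$contains": "hello"}`. Optional. TYPE: `WhereDocument \| None` DEFAULT: `None`
`include`	A list of what to include in the results. Can contain `"embeddings"`, `"metadatas"`, `"documents"`. IDs are always included. Defaults to `["metadatas", "documents"]`. Optional. TYPE: `list[str] \| None` DEFAULT: `None`

Returns	Description
`dict[str, Any]`	A dict with the keys `"ids"`, `"embeddings"`, `"metadatas"`, `"documents"`.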
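
To show how `where`, `limit`, and `offset` interact, here is a deliberately simplified in-memory model. `matches` and `get_sketch` are hypothetical helpers that understand only plain equality and `$and`; Chroma's real filter language supports more operators, and this sketch does not call Chroma at all.

```python
from typing import Any, Dict, List, Optional


def matches(metadata: Dict[str, Any], where: Dict[str, Any]) -> bool:
    """Evaluate a simplified `where` filter: plain equality plus `$and`."""
    for key, expected in where.items():
        if key == "$and":
            if not all(matches(metadata, clause) for clause in expected):
                return False
        elif metadata.get(key) != expected:
            return False
    return True


def get_sketch(
    records: List[Dict[str, Any]],
    where: Optional[Dict[str, Any]] = None,
    limit: Optional[int] = None,
    offset: Optional[int] = None,
) -> Dict[str, Any]:
    """Filter records, then apply offset and limit, returning a dict of parallel lists."""
    hits = [r for r in records if where is None or matches(r["metadata"], where)]
    start = offset or 0
    end = None if limit is None else start + limit
    page = hits[start:end]
    return {
        "ids": [r["id"] for r in page],
        "metadatas": [r["metadata"] for r in page],
        "documents": [r["document"] for r in page],
    }


records = [
    {"id": "1", "document": "hello red", "metadata": {"color": "red", "price": 4.20}},
    {"id": "2", "document": "pricey red", "metadata": {"color": "red", "price": 9.99}},
    {"id": "3", "document": "hello blue", "metadata": {"color": "blue", "price": 4.20}},
    {"id": "4", "document": "another red", "metadata": {"color": "red", "price": 4.20}},
]
where = {"$and": [{"color": "red"}, {"price": 4.20}]}

all_hits = get_sketch(records, where=where)
second_page = get_sketch(records, where=where, limit=1, offset=1)
print(all_hits["ids"], second_page["ids"])
```

Filtering happens before paging in this model, so `offset` counts matching records rather than positions in the full collection.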

get_by_ids ¶

get_by_ids(ids: Sequence[str]) -> list[Document]

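Get documents by their IDs.

The returned documents are expected to have the ID field set to the ID of the document in the vector store.

Fewer documents may be returned than requested if some IDs are not found or if there are duplicated IDs.

Users should not assume that the order of the returned documents matches the order of the input IDs. Instead, users should rely on the ID field of the returned documents.

This method should **NOT** raise exceptions if no documents are found for some IDs.

Parameter	Description
`ids`	List of IDs to retrieve. TYPE: `Sequence[str]`

Returns	Description
`list[Document]`	List of `Document` objects.

Added in version 0.2.1.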

update_document ¶

update_document(document_id: str, document: Document) -> None

Update a document in the collection.

Parameter	Description
`document_id`	ID of the document to update. TYPE: `str`
`document`	Document to update. TYPE: `Document`

update_documents ¶

update_documents(ids: list[str], documents: list[Document]) -> None

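Update multiple documents in the collection.

Parameter	Description
`ids`	List of IDs of the documents to update. TYPE: `list[str]`
`documents`	List of documents to update. TYPE: `list[Document]`

Raises	Description
`ValueError`	If the embedding function is not provided.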

from_texts `classmethod` ¶

from_texts(
    texts: list[str],
    embedding: Embeddings | None = None,
    metadatas: list[dict] | None = None,
    ids: list[str] | None = None,
    collection_name: str = _LANGCHAIN_DEFAULT_COLLECTION_NAME,
    persist_directory: str | None = None,
    host: str | None = None,
    port: int | None = None,
    headers: dict[str, str] | None = None,
    chroma_cloud_api_key: str | None = None,
    tenant: str | None = None,
    database: str | None = None,
    client_settings: Settings | None = None,
    client: ClientAPI | None = None,
    collection_metadata: dict | None = None,
    collection_configuration: CreateCollectionConfiguration | None = None,
    *,
    ssl: bool = False,
    **kwargs: Any,
) -> Chroma

Create a Chroma vector store from a list of raw texts.

If a `persist_directory` is specified, the collection is persisted there. Otherwise, the data is ephemeral and held in memory.

Parameter	Description
`texts`	List of texts to add to the collection. TYPE: `list[str]`
`collection_name`	Name of the collection to create. TYPE: `str` DEFAULT: `_LANGCHAIN_DEFAULT_COLLECTION_NAME`
`persist_directory`	Directory to persist the collection. TYPE: `str \| None` DEFAULT: `None`
`host`	Hostname of a deployed Chroma server. TYPE: `str \| None` DEFAULT: `None`
`port`	Connection port for a deployed Chroma server. Defaults to 8000. TYPE: `int \| None` DEFAULT: `None`
`ssl`	Whether to establish an SSL connection with a deployed Chroma server. TYPE: `bool` DEFAULT: `False`
`headers`	HTTP headers to send to a deployed Chroma server. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`chroma_cloud_api_key`	Chroma Cloud API key. TYPE: `str \| None` DEFAULT: `None`
`tenant`	Tenant ID. Required for Chroma Cloud connections. Defaults to 'default_tenant' for local Chroma servers. TYPE: `str \| None` DEFAULT: `None`
`database`	Database name. Required for Chroma Cloud connections. Defaults to 'default_database'. TYPE: `str \| None` DEFAULT: `None`
`embedding`	Embedding function. TYPE: `Embeddings \| None` DEFAULT: `None`
`metadatas`	List of metadatas. TYPE: `list[dict] \| None` DEFAULT: `None`
`ids`	List of document IDs. TYPE: `list[str] \| None` DEFAULT: `None`
`client_settings`	Chroma client settings. TYPE: `Settings \| None` DEFAULT: `None`
`client`	Chroma client. Documentation: https://docs.trychroma.com/reference/python/client TYPE: `ClientAPI \| None` DEFAULT: `None`
`collection_metadata`	Collection configurations. TYPE: `dict \| None` DEFAULT: `None`
`collection_configuration`	Index configuration for the collection. TYPE: `CreateCollectionConfiguration \| None` DEFAULT: `None`
`kwargs`	Additional keyword arguments used to initialize a Chroma client. TYPE: `Any` DEFAULT: `{}`

Returns	Description
`Chroma`	Chroma vector store. TYPE: `Chroma`

from_documents `classmethod` ¶

from_documents(
    documents: list[Document],
    embedding: Embeddings | None = None,
    ids: list[str] | None = None,
    collection_name: str = _LANGCHAIN_DEFAULT_COLLECTION_NAME,
    persist_directory: str | None = None,
    host: str | None = None,
    port: int | None = None,
    headers: dict[str, str] | None = None,
    chroma_cloud_api_key: str | None = None,
    tenant: str | None = None,
    database: str | None = None,
    client_settings: Settings | None = None,
    client: ClientAPI | None = None,
    collection_metadata: dict | None = None,
    collection_configuration: CreateCollectionConfiguration | None = None,
    *,
    ssl: bool = False,
    **kwargs: Any,
) -> Chroma

Create a Chroma vector store from a list of documents.

If a `persist_directory` is specified, the collection is persisted there. Otherwise, the data is ephemeral and held in memory.

Parameter	Description
`collection_name`	Name of the collection to create. TYPE: `str` DEFAULT: `_LANGCHAIN_DEFAULT_COLLECTION_NAME`
`persist_directory`	Directory to persist the collection. TYPE: `str \| None` DEFAULT: `None`
`host`	Hostname of a deployed Chroma server. TYPE: `str \| None` DEFAULT: `None`
`port`	Connection port for a deployed Chroma server. Defaults to 8000. TYPE: `int \| None` DEFAULT: `None`
`ssl`	Whether to establish an SSL connection with a deployed Chroma server. TYPE: `bool` DEFAULT: `False`
`headers`	HTTP headers to send to a deployed Chroma server. TYPE: `dict[str, str] \| None` DEFAULT: `None`
`chroma_cloud_api_key`	Chroma Cloud API key. TYPE: `str \| None` DEFAULT: `None`
`tenant`	Tenant ID. Required for Chroma Cloud connections. Defaults to 'default_tenant' for local Chroma servers. TYPE: `str \| None` DEFAULT: `None`
`database`	Database name. Required for Chroma Cloud connections. Defaults to 'default_database'. TYPE: `str \| None` DEFAULT: `None`
`ids`	List of document IDs. TYPE: `list[str] \| None` DEFAULT: `None`
`documents`	List of documents to add to the `VectorStore`. TYPE: `list[Document]`
`embedding`	Embedding function. TYPE: `Embeddings \| None` DEFAULT: `None`
`client_settings`	Chroma client settings. TYPE: `Settings \| None` DEFAULT: `None`
`client`	Chroma client. Documentation: https://docs.trychroma.com/reference/python/client TYPE: `ClientAPI \| None` DEFAULT: `None`
`collection_metadata`	Collection configurations. TYPE: `dict \| None` DEFAULT: `None`
`collection_configuration`	Index configuration for the collection. TYPE: `CreateCollectionConfiguration \| None` DEFAULT: `None`
`kwargs`	Additional keyword arguments used to initialize a Chroma client. TYPE: `Any` DEFAULT: `{}`

Returns	Description
`Chroma`	Chroma vector store. TYPE: `Chroma`

delete ¶

delete(ids: list[str] | None = None, **kwargs: Any) -> None

Delete by vector IDs.

Parameter	Description
`ids`	List of IDs to delete. TYPE: `list[str] \| None` DEFAULT: `None`
`kwargs`	Additional keyword arguments. TYPE: `Any` DEFAULT: `{}`
