[RAG] [vector_stores038] Firestore Vector Store Example

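Case Objectives

This case shows how to use Google Firestore as a vector database and integrate it with LlamaIndex for document storage and similarity search. Firestore is a serverless document database from Google Cloud that scales automatically with demand.

In this example, you will learn:

How to configure a Google Cloud project and a Firestore database
How to store and retrieve vector data with FirestoreVectorStore
How to run vector similarity search queries
How to apply metadata filters to narrow search results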

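Technology Stack and Core Dependencies

Core Dependencies

Package	Purpose
llama-index-vector-stores-firestore	LlamaIndex integration package for Firestore
llama-index-embeddings-huggingface	HuggingFace embedding model integration
llama-index	LlamaIndex core framework
google-cloud-firestore	Google Cloud Firestore client library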

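Technology Stack

Google Firestore (GCP)

Serverless document database with automatic scaling and strong consistency

HuggingFace Embeddings

Uses the BAAI/bge-small-en-v1.5 model to generate vector representations of text

Document Processing

LlamaIndex's SimpleDirectoryReader loads the documents

Vector Index

LlamaIndex's VectorStoreIndex builds the index and serves queries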

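Environment Setup

Google Cloud Project Setup

Complete the following steps before running the example:

Create a Google Cloud project: create a new project in the Google Cloud console
Enable the Firestore API: enable the Firestore API in the API Library
Create a Firestore database: follow the Firestore documentation to create a database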

Install Dependencies

%pip install --quiet llama-index
%pip install --quiet llama-index-vector-stores-firestore llama-index-embeddings-huggingface
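After installation, it can be useful to confirm that the packages are actually present before running the rest of the notebook. The following is a minimal sketch using only the standard library's importlib.metadata; the helper name missing_packages is hypothetical and is not part of LlamaIndex. The package names are the ones listed in the dependency table.

```python
from importlib import metadata

# Distribution names taken from the dependency table in this article.
REQUIRED = [
    "llama-index",
    "llama-index-vector-stores-firestore",
    "llama-index-embeddings-huggingface",
    "google-cloud-firestore",
]


def missing_packages(names):
    """Return the distribution names that are not installed in this environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    not_installed = missing_packages(REQUIRED)
    if not_installed:
        print("Missing packages:", ", ".join(not_installed))
    else:
        print("All required packages are installed.")
```

An empty result means every listed distribution was found; anything else names the packages to reinstall.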

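Set the Google Cloud Project ID

# Set your Google Cloud project ID
PROJECT_ID = "YOUR_PROJECT_ID"  # Replace with your project ID

# Set the active project with the gcloud command-line tool
!gcloud config set project {PROJECT_ID}

Tip: if you don't know your project ID, run one of the following commands:

gcloud config list - show the current configuration
gcloud projects list - list all projects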
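If the placeholder is left unchanged, later calls fail in ways that are hard to trace back to the project ID. A small guard can catch this early. The sketch below is an illustration only: the helper resolve_project_id is hypothetical, and it assumes the conventional GOOGLE_CLOUD_PROJECT environment variable is how your environment exposes the project ID.

```python
import os

PLACEHOLDER = "YOUR_PROJECT_ID"


def resolve_project_id(explicit=None, environ=None):
    """Pick a project ID from an explicit value or the GOOGLE_CLOUD_PROJECT variable."""
    environ = os.environ if environ is None else environ
    # An explicit value wins, unless it is still the unedited placeholder.
    if explicit and explicit != PLACEHOLDER:
        return explicit
    from_env = environ.get("GOOGLE_CLOUD_PROJECT", "").strip()
    if from_env:
        return from_env
    raise ValueError(
        "No project ID found: set PROJECT_ID or the GOOGLE_CLOUD_PROJECT environment variable."
    )
```

For example, resolve_project_id("my-project") returns "my-project", while resolve_project_id("YOUR_PROJECT_ID") falls back to the environment variable and raises ValueError if it is unset.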

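Authentication

In a Colab environment, authenticate with the following code:

from google.colab import auth

# Authenticate the current Colab user
auth.authenticate_user()

Note: if you are running this notebook in Vertex AI Workbench, follow the setup instructions for authentication instead.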
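Because google.colab is only importable inside Colab, the authentication cell fails elsewhere. One way to keep a single notebook usable in both places is to guard the call. This is a sketch under the assumption that, outside Colab, credentials have already been configured by other means; the helper name running_in_colab is hypothetical.

```python
import importlib.util


def running_in_colab():
    """Return True when the google.colab module can be found in this environment."""
    try:
        return importlib.util.find_spec("google.colab") is not None
    except (ImportError, ValueError):
        # Raised when the parent "google" package is absent or has no usable spec.
        return False


if running_in_colab():
    from google.colab import auth

    auth.authenticate_user()
else:
    print("Not running in Colab; relying on existing Google Cloud credentials.")
```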

Case Implementation

1. Import the Required Libraries

# ServiceContext has been deprecated since llama-index 0.10 in favor of Settings.
from llama_index.core import (
    ServiceContext,
    Settings,
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
)
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.vector_stores.firestore import FirestoreVectorStore

2. Load the Document Data

# Load the Paul Graham essay data
documents = SimpleDirectoryReader(
    "../../examples/data/paul_graham"
).load_data()

3. Configure the Embedding Model

# Set up the HuggingFace embedding model
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

4. Create the Firestore Vector Store

# Name of the Firestore collection that will hold the vectors
COLLECTION_NAME = "test_collection"

# Create the Firestore vector store
store = FirestoreVectorStore(collection_name=COLLECTION_NAME)

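5. Build the Vector Index

# Create the storage context backed by the Firestore vector store
storage_context = StorageContext.from_defaults(vector_store=store)

# Disable the LLM so that only the embedding model is used.
# Settings is used here because ServiceContext has been deprecated since llama-index 0.10.
Settings.llm = None
Settings.embed_model = embed_model

# Build the vector index from the documents
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)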

6. Run a Query

# Create the query engine
query_engine = index.as_query_engine()

# Run the query
res = query_engine.query("What did the author do growing up?")

# Print the most relevant document chunk
print(str(res.source_nodes[0].text))
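Conceptually, the query above embeds the question and returns the stored chunks whose vectors lie closest to it. The sketch below illustrates that idea in plain Python with cosine similarity. It is not how FirestoreVectorStore is implemented, and cosine similarity is an assumption made for illustration, since the article does not state which distance measure is used. The texts and 3-dimensional vectors are toy values.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, stored, k=2):
    """Return the k (text, score) pairs most similar to query_vec.

    `stored` is a list of (text, vector) pairs standing in for the vector store.
    """
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in stored]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy data: a real embedding model produces much longer vectors.
stored = [
    ("chunk about writing and programming", [0.9, 0.1, 0.0]),
    ("chunk about painting", [0.0, 0.8, 0.6]),
    ("chunk about startups", [0.1, 0.2, 0.9]),
]
results = top_k([1.0, 0.0, 0.0], stored, k=2)
print(results[0][0])
```

The first entry of results plays the same role as res.source_nodes[0] in the query above: the stored item ranked most similar to the query.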

7. Apply Metadata Filtering

from llama_index.core.vector_stores.types import (
    MetadataFilters,
    MetadataFilter,
)

# Create the metadata filter (assumes the indexed nodes carry an "author" metadata field)
filters = MetadataFilters(
    filters=[MetadataFilter(key="author", value="Paul Graham")]
)

# Create a query engine that applies the filter
query_engine = index.as_query_engine(filters=filters)

# Run the filtered query
res = query_engine.query("What did the author do growing up?")
print(str(res.source_nodes[0].text))
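An exact-match metadata filter, as generally understood, keeps a node only when the named key is present with the requested value. The pure-Python sketch below illustrates that behavior; it is not the FirestoreVectorStore implementation, whose internals the article does not show. The helper names and the metadata values are made up for illustration.

```python
def matches_all(metadata, required):
    """True when every key in `required` is present in `metadata` with an equal value."""
    return all(
        key in metadata and metadata[key] == value
        for key, value in required.items()
    )


def filter_nodes(nodes, required):
    """Keep only the nodes whose metadata satisfies every exact-match condition."""
    return [node for node in nodes if matches_all(node["metadata"], required)]


# Plain dictionaries stand in for indexed nodes.
nodes = [
    {"text": "chunk A", "metadata": {"author": "Paul Graham", "file_name": "essay.txt"}},
    {"text": "chunk B", "metadata": {"file_name": "notes.txt"}},  # no "author" key
]

kept = filter_nodes(nodes, {"author": "Paul Graham"})
print([node["text"] for node in kept])
```

Only chunk A is kept, because chunk B has no author field. In LlamaIndex, each Document exposes a metadata dictionary, so if the loaded documents do not already carry an author field, it would need to be set there before indexing for a filter on it to match anything.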

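Firebase/Firestore features:

Automatic scaling: no servers to manage; capacity scales with demand
Real-time sync: data changes are synchronized to all clients in real time
Offline support: supports offline data access and synchronization
Security rules: fine-grained control over data access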

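Case Results

This example demonstrates the complete Firestore vector store workflow, including:

Configuring a Google Cloud project and a Firestore database
Converting documents to vectors with a HuggingFace embedding model
Indexing the Paul Graham essay data into the Firestore vector store
Running vector similarity search queries to retrieve relevant document chunks
Applying metadata filters to narrow search results

Expected Output Example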

What I Worked On February 2021 Before college the two main things I worked on, outside of school, were writing and programming. I didn't write essays. I wrote what beginning writers were supposed to write then, and probably still are: short stories. My stories were awful. They had hardly any plot, just characters with strong feelings, which I imagined made them deep. The first programs I tried writing were on the IBM 1401 that our school district used for what was then called "data processing." This was in 9th grade, so I was 13 or 14. The school district's 1401 happened to be in the basement of our junior high school, and my friend Rich Draves and I got permission to use it. It was like a mini Bond villain's lair down there, with all these alien-looking machines — CPU, disk drives, printer, card reader — sitting up on a raised floor under bright fluorescent lights.

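Implementation Approach

Core Implementation Steps

Environment preparation: create a Google Cloud project, enable the Firestore API, and create a database
Authentication: configure Google Cloud authentication so the code can access Firestore
Dependency installation: install the required Python packages, including the Firestore integration and the embedding model
Data preparation: load the sample documents with SimpleDirectoryReader
Embedding model configuration: set up the HuggingFace embedding model that converts text to vectors
Vector store initialization: create a FirestoreVectorStore instance and specify the collection name
Index construction: create a VectorStoreIndex from the loaded documents and the configured context
Query execution: run vector similarity search through the query engine
Filter application: narrow search results with metadata filters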

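Key Technical Points

Firestore Integration

The FirestoreVectorStore class uses Firestore as the vector database for document storage and retrieval

HuggingFace Embeddings

The BAAI/bge-small-en-v1.5 model generates the vector representations of the text

Vector Index

VectorStoreIndex converts documents to vectors and stores them in Firestore

Metadata Filtering

MetadataFilters restricts search to documents whose attributes match exactly

Architecture Flow

Firebase Firestore vector store architecture flow diagram (illustration)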

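Extension Suggestions

Feature Extensions

Add batch operation support to process large datasets more efficiently
Add advanced filters, such as range filters and fuzzy matching
Integrate hybrid search that combines vector search with full-text search
Implement real-time data synchronization to support multi-client collaboration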
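As an illustration of the batch-operation idea, the sketch below groups records into fixed-size batches before handing each group to a writer. Everything here is hypothetical: write_batch stands for whatever bulk-write call your storage layer offers, and the batch size of 100 is an arbitrary choice, not a Firestore or LlamaIndex setting.

```python
from itertools import islice


def batched(items, batch_size):
    """Yield successive lists of at most `batch_size` items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def write_in_batches(records, write_batch, batch_size=100):
    """Send records to `write_batch` one group at a time and return the number of calls made."""
    calls = 0
    for batch in batched(records, batch_size):
        write_batch(batch)
        calls += 1
    return calls


# Usage with a stand-in writer that only records what it receives.
received = []
calls = write_in_batches(range(250), received.append, batch_size=100)
print(calls, [len(batch) for batch in received])
```

With 250 records and a batch size of 100, the writer is called three times, with 100, 100, and 50 records.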

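Performance Optimization

Tune the indexing strategy to improve query response time
Apply vector compression to reduce storage cost
Add a caching layer to avoid the cost of repeated queries
Use asynchronous operations to improve concurrency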
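The caching suggestion can be sketched with the standard library's functools.lru_cache, so that a repeated query string is embedded only once. The embed_text function below is a stand-in that returns a fake vector and counts its calls; in a real application it would wrap the embedding model, which is an assumption of this sketch rather than something the article implements.

```python
from functools import lru_cache

call_count = 0


def embed_text(text):
    """Stand-in for an expensive embedding call; returns a tiny fake vector."""
    global call_count
    call_count += 1
    return (float(len(text)), float(sum(map(ord, text)) % 97))


@lru_cache(maxsize=1024)
def cached_embedding(text):
    """Return the embedding for `text`, computing it only on the first request."""
    return embed_text(text)


first = cached_embedding("What did the author do growing up?")
second = cached_embedding("What did the author do growing up?")
print(first == second, call_count)
```

The second call is served from the cache, so embed_text runs only once. The vector is returned as a tuple so that cached values cannot be modified by callers.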

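Application Scenarios

Intelligent Search

Build a search engine that understands semantics and returns more precise results

Content Recommendation

Deliver personalized content recommendations based on user preferences and content similarity

Knowledge Q&A

Build a question-answering system that answers user questions from document content

Integration Suggestions

Integrate with Google Cloud services such as Cloud Functions and Cloud Run to build serverless applications
Integrate with Firebase Authentication for user authentication and authorization
Integrate with Google Cloud Storage for large-scale file storage and processing
Integrate with Vertex AI to enhance the application with Google's advanced AI models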

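Summary

This example showed how to use Google Firestore as a vector database and integrate it with LlamaIndex for document storage and similarity search. As a serverless document database from Google Cloud, Firestore offers automatic scaling, real-time sync, and offline support, which makes it a good fit for building modern AI applications.

In this example, we learned:

How to configure a Google Cloud project and a Firestore database
How to store and retrieve vector data with FirestoreVectorStore
How to run vector similarity search queries
How to apply metadata filters to narrow search results

The main advantages of a Firebase Firestore vector store include:

Serverless architecture: no servers to manage; capacity scales automatically with demand
Real-time sync: data changes are synchronized to all clients in real time
Offline support: supports offline data access and synchronization
Security rules: fine-grained control over data access
Global distribution: data can be replicated across multiple regions worldwide to improve access speed

This example provides a foundation for building vector search applications on Google Cloud and can be extended as needed to support more complex AI application scenarios.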
