Monstache in Practice: Syncing MongoDB to Elasticsearch

Posted 2022-02-08, updated 2024-01-13

Monstache in practice

Background

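We have already stood up a search engine quickly by following Enterprise Search in Practice,
and our Evaluation of MongoDB-to-Elasticsearch Sync Options showed that monstache is the mainstream sync solution in the community and the industry.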

Following Enterprise Search in Practice, we first create the Engine Schema so that the mapping fields are set up in advance.

Screenshot: setting the fields
Screenshot: viewing the field data

Now let's put monstache into practice.

Monstache configuration

Assuming MongoDB and Elasticsearch are already running, we configure the sync as follows:

# enable debug logging
verbose = true

mongo-url = "mongodb://root:<password>@10.8.99.44:27011/?authSource=admin" 
elasticsearch-urls = ["https://<host>:9200"]

# index GridFS files inserted into the following collections
file-namespaces = ["biocitydb.materials"]
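# Copy the listed collections directly from MongoDB to Elasticsearch. Monstache can filter what actually gets indexed, so you do not necessarily have to copy a whole collection. Here we sync the materials collection in the biocitydb database.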
direct-read-namespaces = ["biocitydb.materials"]
# Receive real-time change notifications so that every write to the listed collections, including updates and deletes, reaches Elasticsearch.
change-stream-namespaces = ["biocitydb.materials"] 

namespace-regex = '^biocitydb\.materials$'

# gzip-compress requests sent to Elasticsearch
gzip = true

# generate indexing statistics
stats = true

# index statistics into Elasticsearch
index-stats = true

elasticsearch-user = "elastic"
elasticsearch-password = "<password>"

# maximum number of concurrent connections monstache uses to sync to Elasticsearch (default 4)
elasticsearch-max-conns = 2 

# certificate file
elasticsearch-pem-file = "/monstache/client.crt.pem"
elasticsearch-validate-pem-file = false

# whether dropping a collection or database in MongoDB also deletes the corresponding index in Elasticsearch
dropped-collections = true
dropped-databases = false

# update existing documents in Elasticsearch instead of overwriting them
index-as-update = true

replay = false

# record the sync position so that the next run resumes from it
resume = true

# do not validate that progress timestamps have been saved
resume-write-unsafe = false

# requires the Elasticsearch ingest-attachment plugin
index-files = false

# turn on search result highlighting of GridFS content
file-highlighting = true

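# In high-availability mode a cluster name is required; processes with the same cluster name automatically join one cluster. This is the monstache cluster, not the Elasticsearch cluster.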
cluster-name = '<name>'

# do not exit after full-sync, rather continue tailing the oplog
exit-after-direct-reads = false

# in production, write logs to files; by default they go to the console
# [logs]
# info = "./logs/info.log"
# warn = "./logs/warn.log"
# error = "./logs/error.log"
# trace = "./logs/trace.log"

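# [[mapping]] defines the Elasticsearch index name and type for the MongoDB data; namespace is database.collection
# Note: it is best to create the index structure you want in Elasticsearch beforehand and to disable Elasticsearch's automatic index creation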
[[mapping]]
namespace = "biocitydb.materials" 
index = "materials"
 
[[script]]
namespace = "biocitydb.materials"
path = "./scripts/materials.js"
routing = true

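[logs]: records error messages.
[[mapping]]: overrides the default index name. In the configuration above, the index name is materials.
[[script]]: middleware that can transform or drop documents, or define indexing metadata. It can be written in JavaScript, or in Golang as a plugin.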
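
The namespace-regex value in the configuration is tested against the database.collection namespace of each event. As a rough illustration of why the dot is escaped and the pattern is anchored, here is a minimal sketch; the helper name isSynced and the two non-matching namespaces are made up for the example. Monstache itself evaluates the pattern with Go's regexp package, and the assumption here is that a simple anchored pattern like this one behaves the same way in JavaScript.

```javascript
// Hypothetical helper: checks which namespaces the namespace-regex value
// from the configuration would let through.
var namespaceRegex = new RegExp('^biocitydb\\.materials$');

function isSynced(namespace) {
    return namespaceRegex.test(namespace);
}

console.log(isSynced('biocitydb.materials'));     // true
console.log(isSynced('biocitydb.materials_old')); // false: "$" rejects the suffix
console.log(isSynced('biocitydbXmaterials'));     // false: "\." matches a literal dot only
```

Without the escape and the anchors, the pattern would also match unrelated namespaces that merely contain a similar string.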

An example script for transforming documents:

module.exports = function (doc) {
    // remove the MongoDB _id field from the document body before indexing
    delete doc._id;
    // TODO
    return doc;
}
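
A slightly fuller version of ./scripts/materials.js might look like the sketch below. It is not the script used in this post: the fields deleted, internalNotes and category are hypothetical and stand in for fields of your own collection. It relies on two behaviours of monstache middleware as I understand them from its documentation: returning false drops the document, and the _meta_monstache property overrides indexing metadata such as routing. It is written in ES5 style because monstache runs scripts in an embedded interpreter rather than in Node.

```javascript
// Sketch of ./scripts/materials.js; the field names are hypothetical.
function transform(doc) {
    // Returning false tells monstache to drop the document instead of indexing it.
    if (doc.deleted === true) {
        return false;
    }
    // Remove a field that should not end up in the search index.
    delete doc.internalNotes;
    // Override indexing metadata. Setting routing here is the reason the
    // [[script]] section is configured with routing = true.
    doc._meta_monstache = { routing: String(doc.category) };
    return doc;
}

module.exports = transform;
```

Keeping the logic in a named function makes it easy to exercise the script locally with sample documents before pointing monstache at it.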

Once the sync has finished, let's look at the synced data.

Screenshot: all data was synced correctly
Screenshot: search works as expected
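
Besides checking the UI, one way to confirm that everything arrived is to compare the number of documents in the MongoDB collection with the result of Elasticsearch's count API (GET /materials/_count). The sketch below only covers the pure parts of such a check; the host name, the helper names and the count of 1200 are invented for illustration, and fetching the two numbers is left to whatever client you already use.

```javascript
// Hypothetical helpers for a post-sync sanity check.

// Build the URL of the Elasticsearch count API for an index.
function countUrl(baseUrl, index) {
    return baseUrl.replace(/\/+$/, '') + '/' + encodeURIComponent(index) + '/_count';
}

// Compare a MongoDB document count with the parsed body of a _count response,
// which carries the number of matching documents in its "count" field.
function isFullySynced(mongoCount, esCountResponse) {
    return esCountResponse.count === mongoCount;
}

console.log(countUrl('https://es.example.internal:9200/', 'materials'));
// https://es.example.internal:9200/materials/_count
console.log(isFullySynced(1200, { count: 1200 })); // true
```

The comparison is only meaningful if the script does not drop documents and no writes are in flight, so treat a mismatch as a prompt to investigate rather than as proof of a failed sync.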

We also evaluated syncing with Flink CDC; see
Flink CDC in Practice: MongoDB to ES
