Execute the following commands one at a time:
1. Copy the analyzer (i.e., the tokenizer settings; the target needs the analyzer plugin installed, so this step may fail, but a failure here does not affect the rest of the copy)
./elasticdump --input=http://[source-IP]:9200/[source-index] --output=http://[target-IP]:9200/[target-index] --type=analyzer
2. Copy the mapping
./elasticdump --input=http://[source-IP]:9200/[source-index] --output=http://[target-IP]:9200/[target-index] --type=mapping
3. Copy the data
./elasticdump --input=http://[source-IP]:9200/[source-index] --output=http://[target-IP]:9200/[target-index] --type=data
The restore order is: 1. analyzer first; 2. mapping; 3. data.
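For example, a minimal wrapper that runs the three steps in the required order (the hosts and index name below are placeholders, not values from this guide):
SRC=http://source-host:9200/my_index
DST=http://target-host:9200/my_index
# analyzer first, then mapping, then data; a failed analyzer step does not stop the loop
for t in analyzer mapping data; do
  ./elasticdump --input=$SRC --output=$DST --type=$t
done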
Note: if you are not migrating the analyzer, strip the analyzer references from the mapping (replace them with nothing).
Example: strip the analyzer
sed -i 's/,"analyzer":"ik_analyzer"//g' geometry_955_mapping.json
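As a quick sanity check before importing the cleaned mapping (assuming the same file name as above), the following should print nothing once every analyzer reference has been removed:
grep -o '"analyzer":"[^"]*"' geometry_955_mapping.json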
elasticdump \
--input=http://192.168.1.140:9200/source_index \
--output=http://192.168.1.141:9200/target_index \
--type=data \
--limit=2000 # objects per operation, default 100; increase it for large data volumes to speed up the migration
For example, copying geometry_833 between clusters:
elasticdump --input=http://172.30.40.125:9200/geometry_833 --output=http://47.105.38.107:9200/geometry_833
Export the index structure (mapping)
elasticdump \
--input=http://production.es.com:9200/my_index \
--output=/data/my_index_mapping.json \
--type=mapping
Export the index data
elasticdump \
--input=http://production.es.com:9200/my_index \
--output=/data/my_index.json \
--type=data
elasticdump --input=http://172.30.40.125:9200/geometry_833 --output=/mysqldata/es20240516/geometry_833_mapping.json --type=mapping
elasticdump --input=http://172.30.40.125:9200/geometry_833 --output=/mysqldata/es20240516/geometry_833.json --type=data
Import the mapping and data
elasticdump \
--input=/data/source_index_mapping.json \
--output=http://192.168.1.141:9200/source_index \
--type=mapping
elasticdump \
--input=/data/source_index.json \
--output=http://192.168.1.141:9200/source_index \
--type=data \
--limit=2000
elasticdump: all indices
elasticdump --input=./indices.json --output=http://localhost:9201 --all=true
elasticdump: all data
elasticdump --input=http://localhost:9200/ --output=all_data.json --all=true
The parameters here are explained as follows (a per-index batch variant is sketched below):
--input: the address of the Elasticsearch instance.
--output: the name of the exported file.
--all=true: tells elasticdump to export all of the data.
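The sketch below extends this to one mapping file and one data file per index, looping over _cat/indices the same way as the esm batch script later in this guide; the host and output directory are placeholders:
#!/bin/sh
ES=http://localhost:9200
OUT=/data/es_backup
mkdir -p "$OUT"
# column 3 of _cat/indices is the index name
for idx in `curl -s "$ES/_cat/indices" | awk '{print $3}'`
do
  elasticdump --input="$ES/$idx" --output="$OUT/${idx}_mapping.json" --type=mapping
  elasticdump --input="$ES/$idx" --output="$OUT/${idx}.json" --type=data
done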
elasticdump --input=/mysqldata/es20240516/geometry_833_mapping.json --output=http://47.105.38.107:9200/geometry_833 --type=mapping
elasticdump --input=/mysqldata/es20240516/geometry_833.json --output=http://47.105.38.107:9200/geometry_833 --type=data
# If ES requires authentication, use URLs of the form user:password@host
elasticdump \
--input=http://username:password@production.es.com:9200/my_index \
--output=http://username:password@staging.es.com:9200/my_index \
--type=data
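To keep the password out of shell history, one option (a sketch assuming bash, not a feature of elasticdump itself) is to prompt for it at run time; note that special characters in the password may need URL-encoding:
# read the password without echoing it, then pass it via the URL
read -s -p "ES password: " ES_PASS; echo
elasticdump \
--input=http://username:$ES_PASS@production.es.com:9200/my_index \
--output=http://username:$ES_PASS@staging.es.com:9200/my_index \
--type=data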
Strip the analyzer
sed -i 's/,"analyzer":"ik_analyzer"//g' geometry_955_mapping.json
sed -i 's/,"analyzer":"ik_analyzer"//g' 1.json
Large-volume data migration
# ESM migration tool (suited to large data volumes)
Background: esm works on the same principle as elasticsearch-dump, but it is written in Go and claims to migrate ten million records per minute.
Download: wget https://github.com/medcl/esm/releases/download/v0.7.0/esm-linux-amd64
1. Data migration
Make it executable: chmod +x esm-linux-amd64
./esm-linux-amd64 -s http://192.168.1.x:9200 -m user:password -x "index_name" -d http://192.168.2.x:9200 -n "user:password" -c 10000 -w 10 -b=20 -f --refresh
Option reference:
-s: the SOURCE Elasticsearch instance to read from
-d: the DESTINATION Elasticsearch instance to write to
-x: the name of the index to copy
-q: a query to filter the source documents
-m: basic-auth user:password for the source instance
-n: basic-auth user:password for the destination instance
-w: number of concurrent workers, default 1
-b: bulk size in MB, default 5
-f: delete any existing index with the same name before copying
--refresh: refresh the index after the migration finishes
4. Download and install elasticsearch-migration
Source: https://github.com/medcl/esm-abandoned
Prebuilt binaries: https://github.com/medcl/esm-abandoned/releases
Download a prebuilt binary into /usr/local/hadoop; after unpacking it can be run directly. elasticsearch-migration supports Linux, Windows, and other platforms. Usage example:
./bin/linux64/esm -s http://10.62.124.x:9200 -d http://10.67.151.y:9200 -x index_name -w=5 -b=10 -c 10000
-w: number of worker threads
-b: size of one bulk request in MB, default 5
-c: number of documents per scroll request
5. Data migration
Because there are several hundred indices to migrate, the following batch migration script was written to improve efficiency:
#!/bin/sh
dir="/usr/local/hadoop"
cd "$dir"
# column 3 of _cat/indices is the index name; keep only the mobile_lte_ indices
esIndex=`curl -s 'http://10.62.124.x:9200/_cat/indices' | grep -e mobile_lte_ | awk '{print $3}'`
for indexName in $esIndex
do
    echo "Start migration $indexName"
    ./bin/linux64/esm -s http://10.62.124.x:9200 -d http://10.67.151.y:9200 -x $indexName -y $indexName -w=5 -b=10 -c 10000 --copy_settings --copy_mappings --force --refresh
done
Place the script in /usr/local/hadoop and run it.
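After the batch run, a quick way to verify the result (using the same hosts as the script above) is to compare document counts on both clusters; _cat/indices?v includes a docs.count column:
curl -s 'http://10.62.124.x:9200/_cat/indices?v' | grep mobile_lte_
curl -s 'http://10.67.151.y:9200/_cat/indices?v' | grep mobile_lte_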
2. Usage examples
Copy index index_name from 192.168.1.x to 192.168.1.y:9200:
./bin/esm -s http://192.168.1.x:9200 -d http://192.168.1.y:9200 -x index_name -w=5 -b=10 -c 10000
Copy index src_index from one cluster to another, saving it as dest_index:
./bin/esm -s http://localhost:9200 -d http://localhost:9200 -x src_index -y dest_index -w=5 -b=100
Basic-auth support:
./bin/esm -s http://localhost:9200 -x "src_index" -y "dest_index" -d http://localhost:9201 -n admin:111111
Copy settings and override the shard count:
./bin/esm -s http://localhost:9200 -x "src_index" -y "dest_index" -d http://localhost:9201 -m admin:111111 -c 10000 --shards=50 --copy_settings
Copy settings and mappings, recreate the target index, apply a query to the source fetch, and refresh after migration:
./bin/esm -s http://localhost:9200 -x "src_index" -q=query:phone -y "dest_index" -d http://localhost:9201 -c 10000 --shards=5 --copy_settings --copy_mappings --force --refresh
Dump Elasticsearch documents into a local file:
./bin/esm -s http://localhost:9200 -x "src_index" -m admin:111111 -c 5000 -q=query:mixer --refresh -o=dump.bin
Load data from a dump file and bulk-insert it into another ES instance:
./bin/esm -d http://localhost:9200 -y "dest_index" -n admin:111111 -c 5000 -b 5 --refresh -i=dump.bin
Proxy support:
./bin/esm -d http://123345.ap-northeast-1.aws.found.io:9200 -y "dest_index" -n admin:111111 -c 5000 -b 1 --refresh -i dump.bin --dest_proxy=http://127.0.0.1:9743
Use sliced scroll (only available in Elasticsearch v5) to speed up scrolling, and update the shard number:
./bin/esm -s=http://192.168.3.206:9200 -d=http://localhost:9200 -n=elastic:changeme -f --copy_settings --copy_mappings -x=bestbuykaggle --sliced_scroll_size=5 --shards=50 --refresh
Migrate 5.x to 6.x and unify all types to doc:
./esm -s http://source_es:9200 -x "source_index*" -u "doc" -w 10 -b 10 -t "10m" -d https://target_es:9200 -m elastic:passwd -n elastic:passwd -c 5000
To migrate to version 7.x you may need to rename _type to _doc:
./esm -s http://localhost:9201 -x "source" -y "target" -d https://localhost:9200 --rename="_type:type,age:myage" -u "_doc"
Filter the migration with a range query:
./esm -s https://192.168.3.98:9200 -m elastic:password -o json.out -x kibana_sample_data_ecommerce -q "order_date:[2020-02-01T21:59:02+00:00 TO 2020-03-01T21:59:02+00:00]"
Range query on a keyword field, with escaping:
./esm -s https://192.168.3.98:9200 -m test:123 -o 1.txt -x test1 -q "@timestamp.keyword:[\"2021-01-17 03:41:20\" TO \"2021-03-17 03:41:20\"]"
Generate test data: if input.json contains 10 documents, the following command ingests 100 documents, which is handy for testing:
./bin/esm -i input.json -d http://localhost:9201 -y target-index1 --regenerate_id --repeat_times=10
Select source fields:
./bin/esm -s http://localhost:9201 -x my_index -o dump.json --fields=author,title
Rename fields while bulk indexing:
./bin/esm -i dump.json -d http://localhost:9201 -y target-index41 --rename=title:newtitle
Use buffer_count to control the memory used by ESM, and gzip to compress network traffic:
./esm -s https://localhost:8000 -d https://localhost:8000 -x logs1kw -y logs122 -m elastic:medcl123 -n elastic:medcl123 --regenerate_id -w 20 --sliced_scroll_size=60 -b 5 --buffer_count=1000000 --compress false
Download
Releases: https://github.com/medcl/esm/releases
Compile
If the downloaded build does not fit your environment, you can compile it yourself (requires Go >= 1.7):
make build
Options
Usage:
esm [OPTIONS]
Application Options:
-s, --source= source elasticsearch instance, ie: http://localhost:9200
-q, --query= query against source elasticsearch instance, filter data before migrate, ie: name:medcl
-d, --dest= destination elasticsearch instance, ie: http://localhost:9201
-m, --source_auth= basic auth of source elasticsearch instance, ie: user:pass
-n, --dest_auth= basic auth of target elasticsearch instance, ie: user:pass
-c, --count= number of documents at a time, ie "size" in the scroll request (10000)
--buffer_count= number of buffered documents in memory (100000)
-w, --workers= concurrency number for bulk workers (1)
-b, --bulk_size= bulk size in MB (5)
-t, --time= scroll time (1m)
--sliced_scroll_size= size of sliced scroll; to take effect, the size should be > 1 (1)
-f, --force delete destination index before copying
-a, --all copy indexes starting with . and _
--copy_settings copy index settings from source
--copy_mappings copy index mappings from source
--shards= set a number of shards on newly created indexes
-x, --src_indexes= index names to copy, supports regex and comma-separated lists (_all)
-y, --dest_index= index name to save to, only one name allowed; the original index name is used if not specified
-u, --type_override= override type name
--green wait for both hosts' cluster status to be green before dumping, otherwise yellow is okay
-v, --log= set log level, options: trace, debug, info, warn, error (INFO)
-o, --output_file= output documents of source index into a local file
-i, --input_file= index from a local dump file
--input_file_type= the data type of the input file, options: dump, json_line, json_array, log_line (dump)
--source_proxy= set proxy for source http connections, ie: http://127.0.0.1:8080
--dest_proxy= set proxy for target http connections, ie: http://127.0.0.1:8080
--refresh refresh after migration finished
--fields= filter source fields, comma separated, ie: col1,col2,col3,...
--rename= rename source fields, comma separated, ie: _type:type,name:myname
-l, --logstash_endpoint= target logstash tcp endpoint, ie: 127.0.0.1:5055
--secured_logstash_endpoint target logstash tcp endpoint secured by TLS
--repeat_times= repeat the source data N times in the destination output; use together with regenerate_id to amplify the data size
-r, --regenerate_id regenerate ids for documents, overriding the existing document ids in the data source
--compress use gzip to compress traffic
-p, --sleep= sleep N seconds after finishing a bulk request (-1)
Help Options:
-h, --help Show this help message
FAQ
- Scroll ID too long? Update elasticsearch.yml on the source cluster:
http.max_header_size: 16k
http.max_initial_line_length: 8k