【Elasticsearch03】Enterprise Log Analysis System ELK: Elasticsearch Access and Optimization

Elasticsearch Access

Shell Commands

Checking ES Cluster Status

Accessing ES

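#List the supported _cat endpoints
curl http://127.0.0.1:9200/_cat
#Check ES cluster status; this fails when fewer than half of the master-eligible nodes are alive (no master can be elected)
curl http://127.0.0.1:9200/_cat/health
curl 'http://127.0.0.1:9200/_cat/health?v'
#Check cluster health; the response is JSON, so it can be analyzed with Python or other tools
#Note: only status green means the cluster is in the normal state
curl 'http://127.0.0.1:9200/_cluster/health?pretty=true'
#Show information about all nodes
curl 'http://127.0.0.1:9200/_cat/nodes?v'
#List all indices and information about each index
curl 'http://127.0.0.1:9200/_cat/indices?v'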

Example:

[root@es-node3 ~]#curl http://127.0.0.1:9200/_cat/master
Jt8JAK8PQP-WfMufUdDphg 10.0.0.181 10.0.0.181 node-1
[root@es-node1 ~]#curl 'http://127.0.0.1:9200/_cat/nodes'
10.0.0.183 58 96 5 1.30 1.10 0.69 cdfhilmrstw - node-3
10.0.0.181 25 91 7 1.21 1.20 0.81 cdfhilmrstw * node-1
10.0.0.182 22 90 0 0.00 0.04 0.07 cdfhilmrstw - node-2
[root@es-node1 ~]#curl 'http://127.0.0.1:9200/_cat/nodes?v'
ip         heap.percent ram.percent cpu load_1m load_5m load_15m node.role   master name
10.0.0.183           59          77  59    1.25    1.10     0.70 cdfhilmrstw -      node-3
10.0.0.181           26          88  54    1.18    1.19     0.81 cdfhilmrstw *      node-1
10.0.0.182           22          90   0    0.00    0.03     0.06 cdfhilmrstw -      node-2
#Check ES cluster status
[root@es-node1 ~]#curl 'http://127.0.0.1:9200/_cat/health'
[root@es-node1 ~]#curl -sXGET 'http://127.0.0.1:9200/_cluster/health?pretty=true'
{
  "cluster_name" : "my-application",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
#List all indices and information about each index
[root@es-node1 ~]#curl '127.0.0.1:9200/_cat/indices?v'
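The `?v` output of the `_cat` APIs is a whitespace-separated table with a header row, so it is easy to post-process in a script. The following is a minimal Python sketch, assuming the column layout shown above and that no column value contains spaces. It turns the `_cat/nodes?v` text into dictionaries and picks out the elected master, which is the row marked `*` in the `master` column. The helper names `parse_cat_table` and `elected_master` are illustrative only.

```python
def parse_cat_table(text):
    """Parse `_cat/...?v` output (header row + whitespace-separated rows) into dicts."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def elected_master(nodes):
    """Return the name of the node whose `master` column is '*', or None."""
    for node in nodes:
        if node.get("master") == "*":
            return node.get("name")
    return None


# Output of `curl 'http://127.0.0.1:9200/_cat/nodes?v'` from the example above
SAMPLE = """\
ip         heap.percent ram.percent cpu load_1m load_5m load_15m node.role   master name
10.0.0.183           59          77  59    1.25    1.10     0.70 cdfhilmrstw -      node-3
10.0.0.181           26          88  54    1.18    1.19     0.81 cdfhilmrstw *      node-1
10.0.0.182           22          90   0    0.00    0.03     0.06 cdfhilmrstw -      node-2
"""

nodes = parse_cat_table(SAMPLE)
print(elected_master(nodes))  # node-1
```

For anything beyond quick checks, the `_cat` APIs also accept `format=json`, which avoids parsing text altogether.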

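Creating and Viewing Indices

https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html

Example:

#Since Elasticsearch 7.x a new index gets 1 primary shard and 1 replica by default; earlier versions defaulted to 5 shards and 1 replica
#Create index index1 (compact output)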
[root@es-node1 ~]#curl -XPUT '127.0.0.1:9200/index1'
{"acknowledged":true,"shards_acknowledged":true,"index":"index1"}
#Create index index2 (pretty-printed output)
curl -XPUT '127.0.0.1:9200/index2?pretty'
{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "index2"
}
#List all indices
[root@es-node1 ~]#curl 'http://127.0.0.1:9200/_cat/indices?v'
health status index  uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   index1 AFthPFyMTySooXOcGbrbFw   1   1          0            0       450b           225b
green  open   index2 WCuS-wlfS8WHyNo8UBgVNw   1   1          0            0       450b           225b
[root@es-node1 ~]#curl  '127.0.0.1:9200/index2?pretty'
{
  "index2" : {
    "aliases" : { },
    "mappings" : { },
    "settings" : {
      "index" : {
        "routing" : {
          "allocation" : {
            "include" : {
              "_tier_preference" : "data_content"
            }
          }
        },
        "number_of_shards" : "1",
        "provided_name" : "index2",
        "creation_date" : "1734361963326",
        "number_of_replicas" : "1",
        "uuid" : "WCuS-wlfS8WHyNo8UBgVNw",
        "version" : {
          "created" : "8080299"
        }
      }
    }
  }
}
#Create an index with 3 shards and 2 replicas
curl -XPUT '127.0.0.1:9200/index3' -H 'Content-Type: application/json' -d '
{"settings": {"index": {"number_of_shards": 3,  
"number_of_replicas": 2 }}
}'
{"acknowledged":true,"shards_acknowledged":true,"index":"index3"}
#Change the number of replicas to 1 (the number of primary shards cannot be changed this way)
curl -XPUT '127.0.0.1:9200/index3/_settings' -H 'Content-Type: application/json' -d '
{"settings": { "number_of_replicas": 1 }
}'
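{"acknowledged":true}
#In early versions such as ES 1.x and 2.x, index names were visible directly in the data directory below; since 5.x only the following is shown
#Note: a directory name such as TPUrqHNdRT2lCNVPMPF2eg is an index UUID
#Layout: /var/lib/elasticsearch/indices/<index UUID>/<shard ID>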
[root@es-node1 ~]#ll /var/lib/elasticsearch/indices/
total 20
drwxr-sr-x 5 elasticsearch elasticsearch 4096 Dec 16 23:14 ./
drwxr-s--- 5 elasticsearch elasticsearch 4096 Dec 16 23:17 ../
drwxr-sr-x 4 elasticsearch elasticsearch 4096 Dec 16 23:10 AFthPFyMTySooXOcGbrbFw/
drwxr-sr-x 5 elasticsearch elasticsearch 4096 Dec 16 23:16 Q-vb41mVSIeuGp1Nq-O2xA/
drwxr-sr-x 4 elasticsearch elasticsearch 4096 Dec 16 23:12 WCuS-wlfS8WHyNo8UBgVNw/
[root@es-node1 ~]#tree /var/lib/elasticsearch/indices/* -L 1
/var/lib/elasticsearch/indices/AFthPFyMTySooXOcGbrbFw
├── 0
└── _state
/var/lib/elasticsearch/indices/Q-vb41mVSIeuGp1Nq-O2xA
├── 1
├── 2
└── _state
/var/lib/elasticsearch/indices/WCuS-wlfS8WHyNo8UBgVNw
├── 0
└── _state
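The shard and replica settings above determine how many shard copies the cluster has to place. Below is a small Python sketch of that arithmetic. It assumes the Elasticsearch rule that two copies of the same shard are never allocated to the same node, and it ignores allocation filters and disk watermarks. The function names are hypothetical helpers, not an Elasticsearch API.

```python
def total_shard_copies(number_of_shards, number_of_replicas):
    """Each primary shard has `number_of_replicas` replica copies."""
    return number_of_shards * (1 + number_of_replicas)


def can_be_green(number_of_replicas, data_nodes):
    """Every copy of a shard (primary + replicas) needs its own data node,
    so all replicas can only be assigned if there are enough nodes."""
    return data_nodes >= 1 + number_of_replicas


# index3 as created above: 3 primaries, 2 replicas, on a 3-node cluster
print(total_shard_copies(3, 2), can_be_green(2, 3))  # 9 True
# after lowering number_of_replicas to 1
print(total_shard_copies(3, 1), can_be_green(1, 3))  # 6 True
```

With 6 copies of index3 spread over 3 nodes, each node holds roughly two of them. This is consistent with the two shard directories (`1` and `2`) of `Q-vb41mVSIeuGp1Nq-O2xA` on es-node1 above.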

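Inserting Documents

#If no _id is given when creating a document, one is generated automatically
#Types were removed in 8.x, so requests of the form {index}/{type}/ must be changed to {index}/_doc/
#index1 is the index (comparable to a database); book was the type in older versions
#8.x and later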
[root@es-node1 ~]#curl -XPOST http://127.0.0.1:9200/index1/_doc/ -H 'Content-Type: application/json' -d '{"name":"linux", "author": "wangxiaochun", "version": "1.0"}'
{"_index":"index1","_id":"OLAP0JMBTTXdWK2BDiOF","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":0,"_primary_term":1}
[root@node1 ~]#curl -XPOST 'http://127.0.0.1:9200/index1/_doc?pretty' -H 'Content-Type:application/json' -d '{"name":"python", "author": "xuwei", "version": "1.0"}'
{
  "_index" : "index1",
  "_type" : "book",
  "_id" : "i8Q5TXsB1gLtFVg7vodl",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "_seq_no" : 1,
  "_primary_term" : 1
}
#Before 7.x
[root@node1 ~]#curl -XPOST http://127.0.0.1:9200/index1/book/ -H 'Content-Type: application/json' -d '{"name":"linux", "author": "wangxiaochun", "version": "1.0"}'
{"_index":"index1","_type":"book","_id":"isQ3TXsB1gLtFVg76of5","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":0,"_primary_term":1}
#Create a document with _id 3
#index1 is the index, book is the type (before 7.x), 3 is the document ID
#8.x
[root@node1 ~]#curl -XPOST 'http://127.0.0.1:9200/index1/_doc/3?pretty' -H 'Content-Type:application/json' -d '{"name":"golang", "author": "zhang", "version": "1.0"}'
#Before 7.x
[root@node1 ~]#curl -XPOST 'http://127.0.0.1:9200/index1/book/3?pretty' -H 'Content-Type:application/json' -d '{"name":"golang", "author": "zhang", "version": "1.0"}'
{
  "_index" : "index1",
  "_type" : "book",
  "_id" : "3",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "_seq_no" : 2,
  "_primary_term" : 1
}
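The same 8.x index requests can be issued from Python's standard library. Below is a minimal sketch that only *builds* the request, so it can be inspected without a running cluster. It assumes the 8.x paths shown above: `POST /<index>/_doc` for an auto-generated `_id`, and `PUT /<index>/_doc/<id>` for an explicit one. The article uses `POST` with an explicit id, which ES also accepts. `build_index_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_index_request(base_url, index, doc, doc_id=None):
    """Build (but do not send) an 8.x-style index request.

    Without doc_id: POST /<index>/_doc      -> ES generates the _id
    With doc_id:    PUT  /<index>/_doc/<id> -> the given _id is used
    """
    if doc_id is None:
        path, method = f"/{index}/_doc", "POST"
    else:
        path, method = f"/{index}/_doc/{doc_id}", "PUT"
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(doc).encode("utf-8"),        # JSON document body
        headers={"Content-Type": "application/json"},
        method=method,
    )


req = build_index_request("http://127.0.0.1:9200", "index1",
                          {"name": "golang", "author": "zhang", "version": "1.0"}, doc_id=3)
print(req.get_method(), req.full_url)  # PUT http://127.0.0.1:9200/index1/_doc/3
```

Passing the request to `urllib.request.urlopen(req)` would actually send it and return the JSON response shown above.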

Querying Documents

Example:

#Query all documents in an index
[root@node1 ~]#curl 'http://127.0.0.1:9200/index1/_search?pretty'
#Query by ID
#Before 7.x: curl -XGET 'http://127.0.0.1:9200/{index}/{type}/{id}'
#8.x: curl -XGET 'http://127.0.0.1:9200/{index}/_doc/{id}'
#New versions
[root@node1 ~]#curl 'http://127.0.0.1:9200/index1/_doc/3?pretty'
#Old versions
[root@node1 ~]#curl 'http://127.0.0.1:9200/index1/book/3?pretty'
#Query by condition, two ways
#New versions
[root@node1 ~]#curl 'http://127.0.0.1:9200/index1/_search?q=name:linux&pretty'
[root@node1 ~]#curl -s 'http://127.0.0.1:9200/index1/_search?pretty' -H 'Content-Type: application/json' -d '{"query":{"term":{"name":"linux"}}}'
#Old versions
[root@node1 ~]#curl 'http://127.0.0.1:9200/index1/book/_search?q=name:linux&pretty'
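The two conditional searches above can be sketched in Python. One form sends a `term` query as a JSON body, and the other passes `q=field:value` in the URL. The sketch also shows how to pull the original documents out of the `hits.hits[]._source` part of a `_search` response. The sample response below is a hypothetical, trimmed illustration; real responses carry more fields (`took`, `_score`, and so on).

```python
import json
from urllib.parse import urlencode


def term_query_body(field, value):
    """JSON body for the term-query form of the search above."""
    return json.dumps({"query": {"term": {field: value}}})


def uri_search_url(base_url, index, field, value):
    """URL for the `?q=field:value` (URI search) form of the same query."""
    return f"{base_url}/{index}/_search?" + urlencode({"q": f"{field}:{value}"})


def source_docs(response):
    """Extract the original documents (_source) from a parsed _search response."""
    return [hit["_source"] for hit in response.get("hits", {}).get("hits", [])]


# Hypothetical, trimmed response for illustration only
SAMPLE_RESPONSE = {
    "hits": {
        "total": {"value": 1, "relation": "eq"},
        "hits": [
            {"_index": "index1", "_id": "OLAP0JMBTTXdWK2BDiOF",
             "_source": {"name": "linux", "author": "wangxiaochun", "version": "1.0"}}
        ],
    }
}

print(term_query_body("name", "linux"))  # {"query": {"term": {"name": "linux"}}}
print(uri_search_url("http://127.0.0.1:9200", "index1", "name", "linux"))
print(source_docs(SAMPLE_RESPONSE))
```

A `term` query is not analyzed. It matches here because the indexed token is already lowercase `linux`. For free-text input, a `match` query is usually the safer choice.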

Updating Documents

#8.x (POSTing a full document to an existing ID replaces the whole document and increments _version)
[root@node1 ~]#curl -XPOST 'http://127.0.0.1:9200/index1/_doc/3' -H 'Content-Type: application/json' -d '{"version": "2.0","name":"golang","author": "zhang"}'
#Before 7.x
[root@node1 ~]#curl -XPOST 'http://127.0.0.1:9200/index1/book/3' -H 'Content-Type: application/json' -d '{"version": "2.0","name":"golang","author": "zhang"}'
{"_index":"index1","_type":"book","_id":"3","_version":3,"result":"updated","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":5,"_primary_term":1}
#Verify the update
[root@node1 ~]#curl 'http://127.0.0.1:9200/index1/_doc/3?pretty'
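Re-posting to `_doc/3` as above replaces the whole document, so every field has to be sent again. When only some fields change, Elasticsearch also provides the `_update` API (`POST /<index>/_update/<id>` in 7.x/8.x). Its `doc` body is merged into the existing source. The Python sketch below builds that path and body. It also mimics the merge locally for flat documents to illustrate the semantics. Elasticsearch merges nested objects recursively, which this simplified `apply_flat_update` helper deliberately does not.

```python
import json


def update_path(index, doc_id):
    """Path of the partial-update endpoint (7.x/8.x)."""
    return f"/{index}/_update/{doc_id}"


def partial_update_body(changes):
    """Body for _update: only the changed fields, wrapped in "doc"."""
    return json.dumps({"doc": changes})


def apply_flat_update(source, changes):
    """Local illustration of a partial update on a flat document:
    unchanged fields are kept, changed fields are overwritten."""
    merged = dict(source)   # copy so the original dict is not modified
    merged.update(changes)
    return merged


old = {"name": "golang", "author": "zhang", "version": "1.0"}
print(update_path("index1", 3))                  # /index1/_update/3
print(partial_update_body({"version": "2.0"}))   # {"doc": {"version": "2.0"}}
print(apply_flat_update(old, {"version": "2.0"}))
# {'name': 'golang', 'author': 'zhang', 'version': '2.0'}
```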

Deleting Documents

#8.x
curl -XDELETE http://<ES server>:9200/<index name>/_doc/<document id>
[root@es-node1 ~]#curl -XDELETE 'http://127.0.0.1:9200/index1/_doc/<document id>'
#Before 7.x
curl -XDELETE http://<ES server>:9200/<index name>/<type>/<document id>

Example: delete a specific document

#8.x
[root@node1 ~]#curl -XDELETE 'http://127.0.0.1:9200/index1/_doc/3'
{"_index":"index1","_id":"3","_version":2,"result":"deleted","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":4,"_primary_term":1}
#Confirm that it has been deleted
[root@node1 ~]#curl 'http://127.0.0.1:9200/index1/_doc/3?pretty'
{
  "_index" : "index1",
  "_id" : "3",
  "found" : false
}
#Before 7.x
[root@node1 ~]#curl -XDELETE 'http://127.0.0.1:9200/index1/book/i8Q5TXsB1gLtFVg7vodl'
{"_index":"index1","_type":"book","_id":"i8Q5TXsB1gLtFVg7vodl","_version":2,"result":"deleted","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":3,"_primary_term":1}
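When deletions are scripted, checking the `result` field is more reliable than checking that the command ran. A successful delete reports `"deleted"`, while deleting a missing document returns HTTP 404 with `"result": "not_found"`. The sketch below builds the 8.x delete URL and interprets a response body. It percent-encodes the id in case a user-chosen id contains characters such as `/`. The helper names are hypothetical.

```python
import json
from urllib.parse import quote


def delete_doc_url(base_url, index, doc_id):
    """8.x delete URL; the id is percent-encoded so ids containing '/' stay one path segment."""
    return f"{base_url}/{index}/_doc/{quote(str(doc_id), safe='')}"


def was_deleted(response_body):
    """True if a DELETE response reports the document as removed
    (a missing document yields result "not_found" instead)."""
    return json.loads(response_body).get("result") == "deleted"


# Response of the 8.x delete in the example above
RESP = ('{"_index":"index1","_id":"3","_version":2,"result":"deleted",'
        '"_shards":{"total":2,"successful":1,"failed":0},"_seq_no":4,"_primary_term":1}')

print(delete_doc_url("http://127.0.0.1:9200", "index1", 3))  # http://127.0.0.1:9200/index1/_doc/3
print(was_deleted(RESP))                                     # True
```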

Deleting Indices

[root@es-node1 ~]#curl -XDELETE http://127.0.0.1:9200/index2
{"acknowledged":true}
#Check whether the index has been deleted
[root@es-node1 ~]#curl 'http://127.0.0.1:9200/_cat/indices?v'
#Delete several named indices
curl -XDELETE 'http://127.0.0.1:9200/index_one,index_two'
#Delete multiple indices with a wildcard; requires action.destructive_requires_name: false (the default is true in 8.x)
curl -XDELETE 'http://127.0.0.1:9200/index_*'

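Example: delete all indices

#The following require action.destructive_requires_name: false
[root@es-node1 ~]#curl -X DELETE "http://127.0.0.1:9200/*"
[root@es-node1 ~]#curl -X DELETE "127.0.0.1:9200/_all"
#The following needs no configuration change because it deletes indices one by one by name
#(h=index prints only the index names, without the header row that ?v would add)
[root@es-node1 ~]#for i in $(curl -s 'http://127.0.0.1:9200/_cat/indices?h=index'); do curl -XDELETE "http://127.0.0.1:9200/$i"; done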
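Before deleting by pattern, a dry run that lists what *would* be deleted is cheap insurance. Below is a Python sketch that reads `_cat/indices?h=index` output (one name per line) and selects names matching a shell-style pattern with `fnmatch`. This only approximates Elasticsearch's own wildcard expansion. In particular, it knows nothing about hidden or closed indices. `indices_to_delete` is a hypothetical helper, and the sample names are illustrative.

```python
from fnmatch import fnmatchcase


def indices_to_delete(cat_output, pattern):
    """Select index names from `_cat/indices?h=index` output that match a shell-style pattern."""
    names = [line.strip() for line in cat_output.splitlines() if line.strip()]
    return sorted(name for name in names if fnmatchcase(name, pattern))


# Illustrative output of: curl -s 'http://127.0.0.1:9200/_cat/indices?h=index'
SAMPLE = "index1\nindex_one\nindex_two\n"
print(indices_to_delete(SAMPLE, "index_*"))  # ['index_one', 'index_two']
```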

Python Script: Cluster Health Check

[root@es-node1 ~]# apt -y install python3
[root@es-node1 ~]# cat els-cluster-monitor.py 
#!/usr/bin/python3
# coding:utf-8
import json
import subprocess

# Address of the ES node to check (the script runs on es-node1, so localhost is used)
URL = "http://127.0.0.1:9200/_cluster/health"

# Run curl in a child process and read the result from its stdout
# (empty bytes if the node is unreachable, because of -s)
obj = subprocess.run(["curl", "-s", URL], stdout=subprocess.PIPE)
data = obj.stdout

try:
    es_dict = json.loads(data) if data else {}  # parse the JSON response into a dict
except ValueError:
    es_dict = {}  # not JSON (e.g. an error page): treat as unhealthy

status = es_dict.get("status")  # "green", "yellow" or "red"
if status == "green":
    print("OK")
else:
    print("Not OK")
[root@es-node1 ~]# chmod +x els-cluster-monitor.py
[root@es-node1 ~]#./els-cluster-monitor.py
OK
#Stop the service on the second node
[root@es-node2 ~]# systemctl stop elasticsearch
#Check the status again
[root@es-node1 ~]# python3 els-cluster-monitor.py
Not OK
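The script above only distinguishes green from everything else. The health API actually reports three levels. `yellow` means all primary shards are allocated but at least one replica is not. `red` means at least one primary shard is unallocated. If the check is wired into a monitoring system, one option is to map the levels to exit codes. The sketch below assumes the common Nagios-style plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). `health_exit_code` is a hypothetical helper name.

```python
import json

# Nagios-style plugin exit codes: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN
EXIT_CODES = {"green": 0, "yellow": 1, "red": 2}


def health_exit_code(body):
    """Map a _cluster/health JSON body to an exit code; unparsable input -> UNKNOWN (3)."""
    try:
        status = json.loads(body).get("status")
    except (ValueError, AttributeError):   # empty/invalid JSON, or JSON that is not an object
        return 3
    return EXIT_CODES.get(status, 3)


print(health_exit_code('{"cluster_name": "my-application", "status": "yellow"}'))  # 1
```

A wrapper script could then end with `sys.exit(health_exit_code(data))`, using the `data` read from curl as in the script above.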

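Optimizing ELK Resource Configuration

Enabling the bootstrap.memory_lock Optimization

Enabling bootstrap.memory_lock: true locks the Elasticsearch process memory in RAM so it cannot be swapped out, which improves performance. However, it can make Elasticsearch fail to start; the fix is shown below.

Note: enabling bootstrap.memory_lock: true requires enough memory, and 4 GB or more is recommended; with too little memory, startup will be very slow.

Official documentation: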

https://www.elastic.co/guide/en/elasticsearch/reference/current/setup-configuration-memory.html#bootstrap-memory_lock
https://www.elastic.co/guide/en/elasticsearch/reference/current/setting-system-settings.html#systemd
[root@node1 ~]#vim /etc/elasticsearch/elasticsearch.yml 
#Enabling this setting makes the service fail to start
bootstrap.memory_lock: true
[root@node1 ~]#systemctl restart elasticsearch.service 
Job for elasticsearch.service failed because the control process exited with error code.
See "systemctl status elasticsearch.service" and "journalctl -xe" for details.
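[root@node1 ~]#tail /data/es-logs/es-cluster.log
#The log typically reports a failed bootstrap check: memory locking requested for elasticsearch process but memory is not locked
#Method 1: edit elasticsearch.service directly (package upgrades may overwrite this file)
[root@node1 ~]#vim /lib/systemd/system/elasticsearch.service 
[Service]
#Add the following line
LimitMEMLOCK=infinity
#Method 2: create an override file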
[root@node1 ~]#systemctl edit elasticsearch
### Anything between here and the comment below will become the new contents of the file
#Add the following two lines, making sure they go between the two comment lines
[Service]
LimitMEMLOCK=infinity
### Lines below this comment will be discarded
[root@node1 ~]#cat /etc/systemd/system/elasticsearch.service.d/override.conf 
[Service]
LimitMEMLOCK=infinity
[root@node1 ~]#systemctl daemon-reload 
[root@node1 ~]#systemctl restart elasticsearch.service 
[root@node1 ~]#systemctl is-active elasticsearch.service 
active
#Test whether access works
[root@node1 ~]#curl http://node1.wang.com:9200
[root@node1 ~]#curl http://node2.wang.com:9200
[root@node1 ~]#curl http://node3.wang.com:9200
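A successful restart does not by itself prove that the memory lock took effect. Elastic's documentation suggests checking `GET _nodes?filter_path=**.mlockall`, for example with `curl 'http://127.0.0.1:9200/_nodes?filter_path=**.mlockall&pretty'`. That call reports `mlockall` per node. The Python sketch below assumes the response shape `nodes -> <node id> -> process -> mlockall` and lists the nodes where the lock is not active. The second node id in the sample is hypothetical.

```python
import json


def nodes_without_mlock(body):
    """Return the ids of nodes whose process.mlockall is not true."""
    nodes = json.loads(body).get("nodes", {})
    return sorted(node_id for node_id, info in nodes.items()
                  if not info.get("process", {}).get("mlockall", False))


# Illustrative response; "hypothetical-node-id" is a made-up node id
SAMPLE = ('{"nodes": {"Jt8JAK8PQP-WfMufUdDphg": {"process": {"mlockall": true}},'
          ' "hypothetical-node-id": {"process": {"mlockall": false}}}}')
print(nodes_without_mlock(SAMPLE))  # ['hypothetical-node-id']
```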

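Memory Optimization

Official documentation

https://www.elastic.co/guide/en/elasticsearch/reference/current/important-settings.html#heap-size-settings

The recommendation is to give the ES heap half of the host's physical memory, but no more than 30 GB; 26 GB is a comparatively safe value.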

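Official documentation:

The heap size should be based on the available RAM:
Set Xms and Xmx to no more than 50% of your total memory. Elasticsearch requires memory for purposes other than the JVM heap. For example, Elasticsearch uses off-heap buffers for efficient network communication and relies on the operating system's filesystem cache for efficient access to files. The JVM itself also requires some memory. It is normal for Elasticsearch to use more memory than the limit configured with the Xmx setting.
When running in a container such as Docker, total memory is defined as the amount of memory visible to the container, not the total system memory on the host.
Set Xms and Xmx to no more than the threshold for compressed ordinary object pointers (oops). The exact threshold varies, but 26 GB is safe on most systems and can be as large as 30 GB on some systems. To verify that you are under the threshold, check the Elasticsearch log for an entry like this:
heap size [...], compressed ordinary object pointers [true]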
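The log check described in the quote can be automated. The Python sketch below assumes the message format quoted in Elastic's documentation, `heap size [<size>], compressed ordinary object pointers [true|false]`. It extracts the heap size and the oops flag from the last matching line. The sample log text and the function name are illustrative, not taken from a real node.

```python
import re

# Assumed message format, based on the entry quoted in the Elastic docs
OOPS_RE = re.compile(
    r"heap size \[([^\]]+)\], compressed ordinary object pointers \[(true|false)\]"
)


def compressed_oops_status(log_text):
    """Return (heap_size, enabled) from the last matching log line, or None if absent."""
    matches = OOPS_RE.findall(log_text)
    if not matches:
        return None
    heap, flag = matches[-1]
    return heap, flag == "true"


# Hypothetical log excerpt
LOG = "... [node-1] heap size [1gb], compressed ordinary object pointers [true]\n"
print(compressed_oops_status(LOG))  # ('1gb', True)
```

If the function returns `False` for the flag, the configured heap is above the threshold and should be lowered.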

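About Compressed Oops

Managed pointers in the Java heap point to objects that are aligned on 8-byte address boundaries. Compressed oops represent managed pointers (in many, but not all, places in the JVM) as 32-bit object offsets from the 64-bit Java heap base address.
Because they are object offsets rather than byte offsets, they can address up to about four billion objects (not bytes), which corresponds to a heap of up to about 32 GB.
To use them, the JVM scales them by a factor of 8 and adds them to the Java heap base address to find the object they refer to. Object sizes with compressed oops are comparable to those in ILP32 mode.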
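The "about 32 GB" limit follows directly from the two numbers in the explanation: a 32-bit offset and 8-byte object alignment. The short Python calculation below makes this explicit and shows the decoding step (base + offset × 8) as a function. `decode_oop` is an illustrative name, not a JVM API.

```python
OBJECT_ALIGNMENT = 8   # bytes: objects start on 8-byte boundaries
OFFSET_BITS = 32       # a compressed oop is a 32-bit offset

max_offsets = 2 ** OFFSET_BITS                    # about 4.29 billion distinct objects
max_heap_bytes = max_offsets * OBJECT_ALIGNMENT   # each offset counts 8-byte units
print(max_offsets, max_heap_bytes / 2 ** 30)      # 4294967296 32.0


def decode_oop(heap_base, compressed):
    """Address of the object a compressed oop refers to: heap base + offset * 8."""
    return heap_base + compressed * OBJECT_ALIGNMENT
```

The result is exactly 32 GiB. The JVM also needs some of that range for its own bookkeeping, which is why the practical threshold lies somewhat below 32 GB.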

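About Heap Size

Although the JVM can handle very large heaps, setting the heap too large can cause the following problems:
Less efficient heap management. Java programs keep all of their objects on the heap, and the larger the heap, the more memory the JVM has to manage, which can show up as application performance problems.
Operating-system memory-management limits. The operating system manages memory in pages, so a larger Java heap needs more pages to manage, which can cause performance problems at the OS level.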
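Garbage Collection (GC): part of the JVM's memory holds objects that may no longer be needed over time. These objects are considered "garbage" and must be cleared by the garbage collector to free memory. However, at least some GC phases pause all application threads, which is known as "Stop-the-World", and these pauses can hurt application performance and response times. In general, with a very large heap, the number of objects the GC must examine and clean up becomes very large, so GC operations can take much longer.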
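Object pointer size: in some JVM implementations (for example Oracle's HotSpot), once the heap exceeds about 32 GB, object references switch from 32-bit compressed oops (ordinary object pointers) to 64-bit uncompressed pointers, which increases memory usage. A heap set close to or slightly above 32 GB can therefore actually waste memory for this reason. Below the threshold the JVM uses 32-bit compressed pointers, so unless there is a clear need for more, the heap is usually kept at around 30 GB to avoid switching to 64-bit pointers.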
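Therefore, set the Java heap to a size that balances GC overhead against application performance. As a rule of thumb, the heap is set to one half or one third of the physical memory. The exact figure varies with system configuration and workload, but a heap of around 32 GB already exceeds the needs of most Java applications, so a larger heap is usually unnecessary.
This 30 GB rule is not absolute; it depends on the application scenario and requirements, as well as on the JVM version and garbage collector in use. Low-latency collectors such as ZGC and Shenandoah, for example, can handle heaps larger than 30 GB while still keeping pause times short.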

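Memory optimization recommendations:

To ensure performance, estimate the JVM heap of each ES node from the amount of data that node will store, following these guidelines:

Heap-to-data ratio: for typical log data, 1 GB of heap can support 48 GB to 96 GB of data
Keep the JVM heap at no more than 30 GB
Keep each shard between 30 GB and 50 GB: larger shards make queries slower and lengthen index recovery and updates; shards that are too small fragment the index more and also degrade performance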

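Example:

#Assume 1 TB of total data, 3 nodes and 1 replica; the data actually stored is then 2 TB
Each node must store 2 TB / 3 ≈ 667 GB, rounded up here to 700 GB. Each node should also keep 20% of its space free, so each node must be sized for about 700 * 100 / 80 = 875 GB of data. Applying the heap-to-data ratio per node: 875 GB / 48 GB ≈ 18, i.e. about 18 GB of JVM heap, which is below 30 GB.
To keep shards at about 30 GB: 875 GB / 30 GB ≈ 29, i.e. roughly 30 shards per node
#Question: what if the total data is 2 TB, with 3 nodes and 1 replica?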
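The sizing steps in the example can be written down as a small calculator. The Python sketch below uses the article's guidelines as defaults: a 20% free-space reserve, 48 GB of data per GB of heap (the conservative end of the range), and 30 GB shards. It assumes 1 TB = 1000 GB, and it does not round 667 GB up to 700 GB as the article does, so its results come out slightly lower. `size_cluster` is a hypothetical helper name.

```python
import math


def size_cluster(total_data_gb, nodes, replicas, disk_reserve=0.20,
                 gb_data_per_gb_heap=48, max_shard_gb=30):
    """Rough per-node sizing following the guidelines above."""
    stored_gb = total_data_gb * (1 + replicas)        # primaries + replica copies
    per_node_gb = stored_gb / nodes                   # data each node must hold
    per_node_sized = per_node_gb / (1 - disk_reserve) # keep 20% of the disk free
    heap_gb = per_node_sized / gb_data_per_gb_heap    # heap-to-data ratio
    shards_per_node = math.ceil(per_node_sized / max_shard_gb)  # keep shards <= 30 GB
    return per_node_sized, heap_gb, shards_per_node


per_node, heap, shards = size_cluster(1000, 3, 1)
print(f"{per_node:.0f} GB per node, {heap:.1f} GB heap, {shards} shards per node")
# 833 GB per node, 17.4 GB heap, 28 shards per node
```

Calling `size_cluster(2000, 3, 1)` lets you work through the 2 TB question above. Compare the resulting heap with the 30 GB ceiling.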

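Example: set the minimum and maximum heap size

#Set the heap to half of physical memory, with the minimum and maximum equal, but no more than 30 GB
#(since 7.7, Elastic recommends putting custom heap settings in a file under /etc/elasticsearch/jvm.options.d/ rather than editing jvm.options itself)
[root@es-node1 ~]# vim /etc/elasticsearch/jvm.options 
-Xms30g
-Xmx30g
#Suggested hardware for about 1 TB of new data per day (old data must also be cleaned up from disk regularly):
16 CPU cores, 64 GB RAM and 6 TB of disk per server, 3 servers in total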
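The rule used above is: half of physical RAM, Xms equal to Xmx, and capped at 30 GB. It can be expressed as a tiny helper that produces the two `jvm.options` lines. The following is a Python sketch under those assumptions. It rounds down to whole gigabytes, and `heap_flags` is a hypothetical name.

```python
def heap_flags(physical_ram_gb, cap_gb=30):
    """Half of physical RAM, capped at cap_gb, rounded down to whole GB, with Xms == Xmx."""
    heap_gb = int(min(physical_ram_gb / 2, cap_gb))
    return [f"-Xms{heap_gb}g", f"-Xmx{heap_gb}g"]


print(heap_flags(64))  # ['-Xms30g', '-Xmx30g']  (the 64 GB servers above)
print(heap_flags(16))  # ['-Xms8g', '-Xmx8g']
```

To stay on the threshold that the official documentation calls safe on most systems, pass `cap_gb=26`.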