es 分词器(五)之elasticsearch-analysis-jieba 8.7.0

devtools/2024/10/18 16:52:52/

elasticsearchanalysisjieba_870_0">es 分词器(五)之elasticsearch-analysis-jieba 8.7.0

今天咱们就来讲一下es jieba 8.7.0 分词器的实现,以及8.x其它版本的实现方式,如果想直接使用es 结巴8.x版本,请直接修改pom文件的elasticsearch.version版本号即可,然后打包安装就行,不需要做太多的操作。

elasticsearchjiebaplugin_4">一、elasticsearch-jieba-plugin

最近更新的版本为8.4.1,最近更新的时间停留在2022年,从这之后便无人维护此开源项目
GitHub地址:​​https://github.com/sing1ee/elasticsearch-jieba-plugin/tree/8.4.1​​

elasticsearchanalysisjieba_7">二、elasticsearch-analysis-jieba

最近更新的版本为6.8.17,比上面的插件更惨,已经有三年无人维护了。
Github地址:​​https://github.com/huaban/elasticsearch-analysis-jieba/tree/dependabot/maven/org.elasticsearch-elasticsearch-6.8.17​​

elasticsearchjiebaplugin_10">三、决定换壳elasticsearch-jieba-plugin

当前我开发的项目采用的版本为8.7.0,目前在网上无法找到与之匹配的版本。
ik分词器用户比jieba分词器用户多,因为会对应的es版本不断更新,目前ik分词器的版本已经更新至8.12.2,2024年5月14日位置es的最新版本为8.14.x
2024年5月14日es最新版本为8.14.x

elasticsearchanalysisjieba_16">四、编译elasticsearch-analysis-jieba分词器

由于原有的插件【elasticsearch-analysis-jieba】已经很久没有人使用,但我又感觉【elasticsearch-analysis-jieba】这个名称比【elasticsearch-jieba-plugin】【https://github.com/sing1ee/elasticsearch-jieba-plugin/tree/8.4.1】这个好听一点,所以我本地新开了一个【elasticsearch-analysis-jieba】项目,将这个【elasticsearch-jieba-plugin】这个项目的代码复制到新建的项目中,因为这个【elasticsearch-jieba-plugin】使用的是gradle管理,我想使用的是maven仓库,所以修改了一下。

image-20240515221232488

4.1 新增pom.xml文件

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"><name>elasticsearch-analysis-jieba</name><modelVersion>4.0.0</modelVersion><groupId>org.elasticsearch</groupId><artifactId>elasticsearch-analysis-jieba</artifactId><version>${elasticsearch.version}</version><packaging>jar</packaging><description>jieba Analyzer for Elasticsearch</description><inceptionYear>2011</inceptionYear><properties><elasticsearch.version>8.7.0</elasticsearch.version><maven.compiler.target>17</maven.compiler.target><elasticsearch.assembly.descriptor>${project.basedir}/src/main/assemblies/plugin.xml</elasticsearch.assembly.descriptor><elasticsearch.plugin.name>analysis-jieba</elasticsearch.plugin.name><elasticsearch.plugin.classname>org.elasticsearch.plugin.analysis.jieba.AnalysisJiebaPlugin</elasticsearch.plugin.classname><elasticsearch.plugin.jvm>true</elasticsearch.plugin.jvm><tests.rest.load_packaged>false</tests.rest.load_packaged><skip.unit.tests>true</skip.unit.tests></properties><licenses><license><name>The Apache Software License, Version 2.0</name><url>http://www.apache.org/licenses/LICENSE-2.0.txt</url><distribution>repo</distribution></license></licenses><developers><developer><name>INFINI Labs</name><email>hello@infini.ltd</email><organization>INFINI Labs</organization><organizationUrl>https://infinilabs.com</organizationUrl></developer></developers><parent><groupId>org.sonatype.oss</groupId><artifactId>oss-parent</artifactId><version>9</version></parent><distributionManagement><snapshotRepository><id>oss.sonatype.org</id><url>https://oss.sonatype.org/content/repositories/snapshots</url></snapshotRepository><repository><id>oss.sonatype.org</id><url>https://oss.sonatype.org/service/local/staging/deploy/maven2/</url></repository></distributionManagement><repositories><repository><id>oss.sonatype.org</id><name>OSS Sonatype</name><releases><enabled>true</enabled></releases><snapshots><enabled>true</enabled></snapshots><url>https://oss.sonatype.org/content/repositories/releases/</url></repository></repositories><dependencies><dependency><groupId>org.elasticsearch</groupId><artifactId>elasticsearch</artifactId><version>${elasticsearch.version}</version><scope>compile</scope></dependency><dependency><groupId>com.huaban</groupId><artifactId>jieba-analysis</artifactId><version>1.0.2</version></dependency><dependency><groupId>org.apache.logging.log4j</groupId><artifactId>log4j-api</artifactId><version>2.19.0</version></dependency><dependency><groupId>junit</groupId><artifactId>junit</artifactId><version>4.13.2</version><scope>test</scope></dependency></dependencies><build><plugins><plugin><groupId>org.apache.maven.plugins</groupId><artifactId>maven-compiler-plugin</artifactId><version>3.5.1</version><configuration><source>${maven.compiler.target}</source><target>${maven.compiler.target}</target></configuration></plugin><plugin><groupId>org.apache.maven.plugins</groupId><artifactId>maven-surefire-plugin</artifactId><version>2.11</version><configuration><includes><include>**/*Tests.java</include></includes></configuration></plugin><plugin><groupId>org.apache.maven.plugins</groupId><artifactId>maven-source-plugin</artifactId><version>2.1.2</version><executions><execution><id>attach-sources</id><goals><goal>jar</goal></goals></execution></executions></plugin><plugin><artifactId>maven-assembly-plugin</artifactId><configuration><appendAssemblyId>false</appendAssemblyId><outputDirectory>${project.build.directory}/releases/</outputDirectory><descriptors><descriptor>${basedir}/src/main/assemblies/plugin.xml</descriptor></descriptors><archive><manifest><mainClass>fully.qualified.MainClass</mainClass></manifest></archive></configuration><executions><execution><phase>package</phase><goals><goal>single</goal></goals></execution></executions></plugin></plugins></build><profiles><profile><id>disable-java8-doclint</id><activation><jdk>[1.8,)</jdk></activation><properties><additionalparam>-Xdoclint:none</additionalparam></properties></profile><profile><id>release</id><build><plugins><plugin><groupId>org.sonatype.plugins</groupId><artifactId>nexus-staging-maven-plugin</artifactId><version>1.6.3</version><extensions>true</extensions><configuration><serverId>oss</serverId><nexusUrl>https://oss.sonatype.org/</nexusUrl><autoReleaseAfterClose>true</autoReleaseAfterClose></configuration></plugin><plugin><groupId>org.apache.maven.plugins</groupId><artifactId>maven-release-plugin</artifactId><version>2.1</version><configuration><autoVersionSubmodules>true</autoVersionSubmodules><useReleaseProfile>false</useReleaseProfile><releaseProfiles>release</releaseProfiles><goals>deploy</goals></configuration></plugin><plugin><groupId>org.apache.maven.plugins</groupId><artifactId>maven-compiler-plugin</artifactId><version>3.5.1</version><configuration><source>${maven.compiler.target}</source><target>${maven.compiler.target}</target></configuration></plugin><plugin><groupId>org.apache.maven.plugins</groupId><artifactId>maven-gpg-plugin</artifactId><version>1.5</version><executions><execution><id>sign-artifacts</id><phase>verify</phase><goals><goal>sign</goal></goals></execution></executions></plugin><plugin><groupId>org.apache.maven.plugins</groupId><artifactId>maven-source-plugin</artifactId><version>2.2.1</version><executions><execution><id>attach-sources</id><goals><goal>jar-no-fork</goal></goals></execution></executions></plugin><plugin><groupId>org.apache.maven.plugins</groupId><artifactId>maven-javadoc-plugin</artifactId><version>2.9</version><executions><execution><id>attach-javadocs</id><goals><goal>jar</goal></goals></execution></executions></plugin></plugins></build></profile></profiles>
</project>

4.2 修改plugin-descriptor.properties文件

# Elasticsearch plugin descriptor file
# This file must exist as 'plugin-descriptor.properties' at
# the root directory of all plugins.
#
# A plugin can be 'site', 'jvm', or both.
#
### example site plugin for "foo":
#
# foo.zip <-- zip file for the plugin, with this structure:
#   _site/ <-- the contents that will be served
#   plugin-descriptor.properties <-- example contents below:
#
# site=true
# description=My cool plugin
# version=1.0
#
### example jvm plugin for "foo"
#
# foo.zip <-- zip file for the plugin, with this structure:
#   <arbitrary name1>.jar <-- classes, resources, dependencies
#   <arbitrary nameN>.jar <-- any number of jars
#   plugin-descriptor.properties <-- example contents below:
#
# jvm=true
# classname=foo.bar.BazPlugin
# description=My cool plugin
# version=2.0.0-rc1
# elasticsearch.version=2.0
# java.version=1.7
#
### mandatory elements for all plugins:
#
# 'description': simple summary of the plugin
description=${project.description}
#
# 'version': plugin's version
version=${project.version}
#
# 'name': the plugin name
name=${elasticsearch.plugin.name}
#
# 'classname': the name of the class to load, fully-qualified.
classname=${elasticsearch.plugin.classname}
#
# 'java.version' version of java the code is built against
# use the system property java.specification.version
# version string must be a sequence of nonnegative decimal integers
# separated by "."'s and may have leading zeros
java.version=${maven.compiler.target}
#
# 'elasticsearch.version' version of elasticsearch compiled against
# You will have to release a new version of the plugin for each new
# elasticsearch release. This version is checked when the plugin
# is loaded so Elasticsearch will refuse to start in the presence of
# plugins with the incorrect elasticsearch.version.
elasticsearch.version=${elasticsearch.version}

4.3 新增plugin-security.policy文件

grant {// needed because of the hot reload functionalitypermission java.net.SocketPermission "*", "connect,resolve";permission java.lang.RuntimePermission "setContextClassLoader";
};

4.4 构建插件

打包

image-20240515221635642

找到打包之后的zip包

image-20240515221711379

放到elasticsearch-8.7.0/plugin/analysis-jieba目录下。

image-20240515221917297

现在,再手动重启一下es就将elasticsearch-analysis-jieba分词器安装好啦。

五、测试jieba分词器

在kibana中创建索引

PUT jieba_index
{"settings": {"analysis": {"analyzer": {"my_ana": {"tokenizer": "jieba_index","filter": ["lowercase"]}}}}
}

文本分词器

PUT jieba_index/_analyze
{"analyzer" : "my_ana","text" : "黄河之水天上来"
}

返回结果

{"tokens": [{"token": "黄河","start_offset": 0,"end_offset": 2,"type": "word","position": 0},{"token": "黄河之水天上来","start_offset": 0,"end_offset": 7,"type": "word","position": 0},{"token": "之水","start_offset": 2,"end_offset": 4,"type": "word","position": 1},{"token": "天上","start_offset": 4,"end_offset": 6,"type": "word","position": 2},{"token": "上来","start_offset": 5,"end_offset": 7,"type": "word","position": 2}]
}

http://www.ppmy.cn/devtools/41974.html

相关文章

TypeScript学习日志-第二十四天(webpack构建ts+vue3)

webpack构建tsvue3 一、构建项目目录 如图&#xff1a; shim.d.ts 这个文件用于让ts识别.vue后缀的 后续会说 并且给 tsconfig.json 增加配置项 "include": ["src/**/*"] 二、基础构建 安装依赖 安装如下依赖&#xff1a; npm install webpack -D …

2024软件测试必问的常见面试题1000问!

01、您所熟悉的测试用例设计方法都有哪些&#xff1f;请分别以具体的例子来说明这些方法在测试用例设计工作中的应用。 答&#xff1a;有黑盒和白盒两种测试种类&#xff0c;黑盒有等价类划分法&#xff0c;边界分析法&#xff0c;因果图法和错误猜测法。白盒有逻辑覆盖法&…

CUDA is not availabe on this machine.

assert torch.cuda.is_available(), "CUDA is not availabe on this machine." AssertionError: CUDA is not availabe on this machine. 这个错误信息表明你尝试在PyTorch中使用CUDA&#xff08;也就是NVIDIA的GPU加速&#xff09;&#xff0c;但是你的机器上似乎没…

函数递归练习

目录 1.分析下面选择题 2.实现求第n个斐波那契数 3.编写一个函数实现n的k次方&#xff0c;使用递归实现。 4.写一个递归函数DigitSum(n)&#xff0c;输入一个非负整数&#xff0c;返回组成它的数字之和 5.递归方式实现打印一个整数的每一位 6.实现求n的阶乘 1.分析下面选择…

【git】发生冲突后回滚提交

gerrit 冲突&#xff0c; 无法合并到主干 那么先回滚 参考这里的 reset 操作&#xff1a; 回滚 到上一个提交 $ git reset --soft HEAD~1 # 數字表示移動到 HEAD後面第幾個刚提交的会撤回&#xff0c; stash 刚刚提交的 然后去pull 最新的 修改冲突&#xff1a; 最后再…

springboot实现文件防盗链设计

shigen坚持更新文章的博客写手&#xff0c;擅长Java、python、vue、shell等编程语言和各种应用程序、脚本的开发。记录成长&#xff0c;分享认知&#xff0c;留住感动。 个人IP&#xff1a;shigen &#x1f44b;&#x1f44b;&#x1f44b;hello&#xff0c;伙伴们好久不见&…

React: memo

React.memo 允许你的组件在 props 没有改变的情况下跳过重新渲染。 const MemoizedComponent memo(SomeComponent, arePropsEqual?)React 通常在其父组件重新渲染时重新渲染一个组件。你可以使用 memo 创建一个组件&#xff0c;当它的父组件重新渲染时&#xff0c;只要它的新…

算法提高之香甜的黄油

算法提高之香甜的黄油 核心思想&#xff1a;spfa 遍历所有点作为起点 spfa求最短路最后求和返回 求最小 #include <iostream>#include <cstring>#include <algorithm>using namespace std;const int N 810,M 3000,INF 0x3f3f3f3f;int n,p,m;int id[N];…