Implementing Full-Text Search with ElasticSearch


Original article: blog.csdn.net/weixin_44671737/article/details/114456257
Abstract
1 Technology selection
The search engine service uses ElasticSearch.
The externally exposed web service is built with Spring Boot web.
1.1 ElasticSearch
1.2 Spring Boot
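1.3 ik analyzer
Out of the box, ElasticSearch has no real Chinese word segmentation: its default standard analyzer splits Chinese text into single characters, so a Chinese analysis plugin has to be installed. Word segmentation is the foundation of Chinese information retrieval. This project uses ik; after downloading it, place it in the plugins directory of the elasticSearch installation.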
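To see what the two ik analyzers used later in this article produce, the `_analyze` API can be called directly. The following is a minimal sketch using only the JDK HTTP client (Java 11+). It assumes ElasticSearch is reachable at http://localhost:9200 with the ik plugin installed; the class name `IkAnalyzeDemo` and the sample text are made up for illustration.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class IkAnalyzeDemo {

    /** Builds the JSON body for POST /_analyze; only quotes and backslashes are escaped. */
    static String analyzeBody(String analyzer, String text) {
        String escaped = text.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"analyzer\":\"" + analyzer + "\",\"text\":\"" + escaped + "\"}";
    }

    /** Sends the text to the _analyze endpoint and returns the raw JSON response. */
    static String analyze(String baseUrl, String analyzer, String text) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/_analyze"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(analyzeBody(analyzer, text)))
                .build();
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString())
                .body();
    }

    public static void main(String[] args) throws Exception {
        String baseUrl = "http://localhost:9200";      // assumption: local single-node cluster
        String text = "\u5168\u6587\u68c0\u7d22";      // Chinese for "full-text search"
        System.out.println(analyze(baseUrl, "ik_max_word", text));
        System.out.println(analyze(baseUrl, "ik_smart", text));
    }
}
```

`ik_max_word` aims for the finest-grained segmentation and `ik_smart` for the coarsest, so comparing the two responses shows why the mapping later in this article indexes with the former and searches with the latter.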
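2 Environment setup
You need elasticSearch installed, plus kibana (optional) and the ik analysis plugin.
1. Install elasticSearch from the elasticsearch official website. The author uses version 7.5.1.
2. Download the ik plugin from the ik plugin GitHub page. Make sure the plugin version is exactly the same as your elasticsearch version.
3. In the plugins directory under the elasticsearch installation directory, create a directory named ik and unzip the downloaded plugin into it. es loads the plugin automatically on startup.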
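One way to confirm that the plugin was picked up is to look at the output of `GET /_cat/plugins`, which lists one plugin per line as node name, component and version. The helper below is a small sketch that checks such output for a given plugin and version. The class name `PluginCheck` and the sample line are illustrative, and the component name `analysis-ik` is an assumption about how the ik plugin registers itself.

```java
class PluginCheck {

    /**
     * Returns true if the text output of GET /_cat/plugins contains a line whose
     * component column equals pluginName and whose version column equals version.
     * Each line is expected to look like: "<node> <component> <version>".
     */
    static boolean hasPlugin(String catPluginsOutput, String pluginName, String version) {
        for (String line : catPluginsOutput.split("\\R")) {
            String[] columns = line.trim().split("\\s+");
            if (columns.length >= 3
                    && columns[1].equals(pluginName)
                    && columns[2].equals(version)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String sample = "node-1 analysis-ik 7.5.1\n"; // made-up sample line
        System.out.println(hasPlugin(sample, "analysis-ik", "7.5.1")); // true
        System.out.println(hasPlugin(sample, "analysis-ik", "7.6.0")); // false
    }
}
```

A version mismatch like the second call is the case step 2 warns about.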


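3 Project architecture
1. Fetch the data and tokenize it with the ik analysis plugin
2. Store the data in the es engine
3. Search the stored data using es queries
4. Expose the service externally through the es Java client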
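To make the flow above concrete, here is a deliberately tiny in-memory analogue of steps 1 to 3 in plain Java. It is not how es is implemented and it does not use ik: whitespace splitting stands in for the analyzer, and a map from term to document ids stands in for the index. All names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class TinyInvertedIndex {
    // term -> ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Stand-in for an analyzer: lower-case the text and split it on whitespace. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase().trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Steps 1 and 2: tokenize the document and record each term in the index. */
    void add(int docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    /** Step 3: return the ids of documents containing at least one query term. */
    Set<Integer> search(String query) {
        Set<Integer> result = new TreeSet<>();
        for (String term : tokenize(query)) {
            result.addAll(postings.getOrDefault(term, Collections.emptySet()));
        }
        return result;
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add(1, "full text search with elasticsearch");
        index.add(2, "spring boot web service");
        System.out.println(index.search("search service")); // [1, 2]
        System.out.println(index.search("kibana"));         // []
    }
}
```

The real system replaces `tokenize` with the ik analyzers and `postings` with an es index, and adds relevance scoring on top.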

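4 Result
4.1 Search page
A simple search box similar to Baidu's is all that is needed.

4.2 Search results page

Clicking the first search result opens one of my own blog posts. To avoid data copyright issues, everything stored in the es engine is the author's own blog data.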
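5 Implementation
5.1 The entity for full-text search
The following entity class is defined from the basic information of a blog post. The key field is each post's url: to read a post returned by the search, the user is redirected to that url.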
    package com.lbh.es.entity;

    import com.fasterxml.jackson.annotation.JsonIgnore;
    import javax.persistence.*;

    /**
     * PUT articles
     * {
     *   "mappings": {
     *     "properties": {
     *       "author": {"type": "text"},
     *       "content": {"type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_smart"},
     *       "title": {"type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_smart"},
     *       "createDate": {"type": "date", "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd"},
     *       "url": {"type": "text"}
     *     }
     *   },
     *   "settings": {
     *     "index": {
     *       "number_of_shards": 1,
     *       "number_of_replicas": 2
     *     }
     *   }
     * }
     * ---------------------------------------------------------------------
     * Copyright(c)lbhbinhao@163.com
     * @author liubinhao
     * @date 2021/3/3
     */
    // Assumption: the annotations are missing from the source as published. @Entity and
    // @Id are restored based on the javax.persistence import; any @Table name is unknown.
    @Entity
    public class ArticleEntity {
        @Id
        private long id;
        private String author;
        private String content;
        private String title;
        private String createDate;
        private String url;

        public String getAuthor() { return author; }
        public void setAuthor(String author) { this.author = author; }

        public String getContent() { return content; }
        public void setContent(String content) { this.content = content; }

        public String getTitle() { return title; }
        public void setTitle(String title) { this.title = title; }

        public String getCreateDate() { return createDate; }
        public void setCreateDate(String createDate) { this.createDate = createDate; }

        public String getUrl() { return url; }
        public void setUrl(String url) { this.url = url; }
    }
5.2 Client configuration
Configure the es client in Java.
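    import org.apache.http.HttpHost;
    import org.elasticsearch.client.RestClient;
    import org.elasticsearch.client.RestClientBuilder;
    import org.elasticsearch.client.RestHighLevelClient;
    import org.springframework.beans.factory.annotation.Value;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    import java.util.ArrayList;
    import java.util.List;

    /**
     * Copyright(c)lbhbinhao@163.com
     * @author liubinhao
     * @date 2021/3/3
     */
    // Assumption: the Spring annotations are missing from the source as published.
    // @Configuration, @Bean and @Value are restored here, and every property key
    // below is HYPOTHETICAL; use the keys from your own application properties.
    @Configuration
    public class EsConfig {
        @Value("${elasticsearch.schema}")
        private String schema;
        @Value("${elasticsearch.address}")
        private String address;
        @Value("${elasticsearch.connectTimeout}")
        private int connectTimeout;
        @Value("${elasticsearch.socketTimeout}")
        private int socketTimeout;
        @Value("${elasticsearch.tryConnTimeout}")
        private int tryConnTimeout;
        @Value("${elasticsearch.maxConnNum}")
        private int maxConnNum;
        @Value("${elasticsearch.maxConnectPerRoute}")
        private int maxConnectPerRoute;

        @Bean
        public RestHighLevelClient restHighLevelClient() {
            // Split the comma-separated "host:port" address list
            List<HttpHost> hostLists = new ArrayList<>();
            String[] hostList = address.split(",");
            for (String addr : hostList) {
                String host = addr.split(":")[0];
                String port = addr.split(":")[1];
                hostLists.add(new HttpHost(host, Integer.parseInt(port), schema));
            }
            // Convert to an HttpHost array
            HttpHost[] httpHost = hostLists.toArray(new HttpHost[]{});
            // Build the connection object
            RestClientBuilder builder = RestClient.builder(httpHost);
            // Timeout settings for the async HTTP connections
            builder.setRequestConfigCallback(requestConfigBuilder -> {
                requestConfigBuilder.setConnectTimeout(connectTimeout);
                requestConfigBuilder.setSocketTimeout(socketTimeout);
                requestConfigBuilder.setConnectionRequestTimeout(tryConnTimeout);
                return requestConfigBuilder;
            });
            // Connection pool limits for the async HTTP connections
            builder.setHttpClientConfigCallback(httpClientBuilder -> {
                httpClientBuilder.setMaxConnTotal(maxConnNum);
                httpClientBuilder.setMaxConnPerRoute(maxConnectPerRoute);
                return httpClientBuilder;
            });
            return new RestHighLevelClient(builder);
        }
    }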
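The address-splitting step in the configuration above assumes that every entry has the form host:port, and it would throw an ArrayIndexOutOfBoundsException on an entry without a port. A sketch of a more defensive parser in plain Java follows. The class name `HostAddressParser`, the nested `HostPort` type and the default-port fallback are illustrative choices, not part of the original project, and IPv6 literals are not handled.

```java
import java.util.ArrayList;
import java.util.List;

class HostAddressParser {

    /** Simple host/port pair; a real config would turn this into an HttpHost. */
    static final class HostPort {
        final String host;
        final int port;

        HostPort(String host, int port) {
            this.host = host;
            this.port = port;
        }

        @Override
        public String toString() {
            return host + ":" + port;
        }
    }

    /**
     * Parses a list such as "host1:9200,host2" into host/port pairs.
     * Entries without a port get defaultPort; blank entries are skipped.
     * A non-numeric port raises NumberFormatException.
     */
    static List<HostPort> parse(String address, int defaultPort) {
        List<HostPort> hosts = new ArrayList<>();
        for (String entry : address.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            int colon = trimmed.lastIndexOf(':');
            if (colon < 0) {
                hosts.add(new HostPort(trimmed, defaultPort));
            } else {
                String host = trimmed.substring(0, colon);
                int port = Integer.parseInt(trimmed.substring(colon + 1));
                hosts.add(new HostPort(host, port));
            }
        }
        return hosts;
    }

    public static void main(String[] args) {
        System.out.println(parse("127.0.0.1:9200, es-node-2", 9200));
        // prints [127.0.0.1:9200, es-node-2:9200]
    }
}
```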
5.3 Business logic
This covers the article search operations. Articles can be looked up by title, content and author.
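    import com.google.gson.Gson;
    import com.lbh.es.entity.ArticleEntity;
    import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
    import org.elasticsearch.action.get.GetRequest;
    import org.elasticsearch.action.get.GetResponse;
    import org.elasticsearch.action.index.IndexRequest;
    import org.elasticsearch.action.index.IndexResponse;
    import org.elasticsearch.action.search.SearchRequest;
    import org.elasticsearch.action.search.SearchResponse;
    import org.elasticsearch.action.support.master.AcknowledgedResponse;
    import org.elasticsearch.client.RequestOptions;
    import org.elasticsearch.client.RestHighLevelClient;
    import org.elasticsearch.client.indices.CreateIndexRequest;
    import org.elasticsearch.client.indices.CreateIndexResponse;
    import org.elasticsearch.common.settings.Settings;
    import org.elasticsearch.common.xcontent.XContentType;
    import org.elasticsearch.index.query.QueryBuilders;
    import org.elasticsearch.search.SearchHit;
    import org.elasticsearch.search.builder.SearchSourceBuilder;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.stereotype.Service;

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;

    /**
     * Copyright(c)lbhbinhao@163.com
     * @author liubinhao
     * @date 2021/3/3
     */
    // Assumption: @Service and @Autowired are restored; they are missing from the source
    // as published. ArticleRepository (a repository for ArticleEntity) is not shown in
    // the source; import it from wherever it is defined in your project.
    @Service
    public class ArticleService {
        private static final String ARTICLE_INDEX = "article";

        @Autowired
        private RestHighLevelClient client;
        @Autowired
        private ArticleRepository articleRepository;

        /** Creates the article index with its settings and ik-based mapping. */
        public boolean createIndexOfArticle() {
            Settings settings = Settings.builder()
                    .put("index.number_of_shards", 1)
                    .put("index.number_of_replicas", 1)
                    .build();
            // Every field, including url, must sit inside "properties"
            String mapping = "{\"properties\":{"
                    + "\"author\":{\"type\":\"text\"},"
                    + "\"content\":{\"type\":\"text\",\"analyzer\":\"ik_max_word\",\"search_analyzer\":\"ik_smart\"},"
                    + "\"title\":{\"type\":\"text\",\"analyzer\":\"ik_max_word\",\"search_analyzer\":\"ik_smart\"},"
                    + "\"createDate\":{\"type\":\"date\",\"format\":\"yyyy-MM-dd HH:mm:ss||yyyy-MM-dd\"},"
                    + "\"url\":{\"type\":\"text\"}"
                    + "}}";
            CreateIndexRequest indexRequest = new CreateIndexRequest(ARTICLE_INDEX)
                    .settings(settings)
                    .mapping(mapping, XContentType.JSON);
            CreateIndexResponse response = null;
            try {
                response = client.indices().create(indexRequest, RequestOptions.DEFAULT);
            } catch (IOException e) {
                e.printStackTrace();
            }
            if (response != null) {
                System.err.println(response.isAcknowledged() ? "success" : "failure");
                return response.isAcknowledged();
            } else {
                return false;
            }
        }

        /** Deletes the whole article index (not a single article). */
        public boolean deleteArticle() {
            DeleteIndexRequest request = new DeleteIndexRequest(ARTICLE_INDEX);
            try {
                AcknowledgedResponse response = client.indices().delete(request, RequestOptions.DEFAULT);
                return response.isAcknowledged();
            } catch (IOException e) {
                e.printStackTrace();
            }
            return false;
        }

        /** Indexes one article as a JSON document. */
        public IndexResponse addArticle(ArticleEntity article) {
            Gson gson = new Gson();
            String s = gson.toJson(article);
            // Create the index request
            IndexRequest indexRequest = new IndexRequest(ARTICLE_INDEX);
            // Document content
            indexRequest.source(s, XContentType.JSON);
            // Send the HTTP request through the client
            IndexResponse re = null;
            try {
                re = client.index(indexRequest, RequestOptions.DEFAULT);
            } catch (IOException e) {
                e.printStackTrace();
            }
            return re;
        }

        /** Copies every article from MySQL into the es index. */
        public void transferFromMysql() {
            articleRepository.findAll().forEach(this::addArticle);
        }

        public List<ArticleEntity> queryByKey(String keyword) {
            // Restrict the search to the article index instead of all indices
            SearchRequest request = new SearchRequest(ARTICLE_INDEX);
            /*
             * Create the search source builder.
             * Compared with matchQuery, multiMatchQuery targets multiple fields: with a single
             * field name it behaves like matchQuery; with several, e.g. field1 and field2,
             * a document matches if either field1 or field2 contains the text.
             */
            SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
            searchSourceBuilder.query(QueryBuilders.multiMatchQuery(keyword, "author", "content", "title"));
            request.source(searchSourceBuilder);
            List<ArticleEntity> result = new ArrayList<>();
            try {
                SearchResponse search = client.search(request, RequestOptions.DEFAULT);
                for (SearchHit hit : search.getHits()) {
                    Map<String, Object> map = hit.getSourceAsMap();
                    ArticleEntity item = new ArticleEntity();
                    item.setAuthor((String) map.get("author"));
                    item.setContent((String) map.get("content"));
                    item.setTitle((String) map.get("title"));
                    item.setUrl((String) map.get("url"));
                    result.add(item);
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
            // Empty list rather than null when the request fails
            return result;
        }

        public ArticleEntity queryById(String indexId) {
            GetRequest request = new GetRequest(ARTICLE_INDEX, indexId);
            GetResponse response = null;
            try {
                response = client.get(request, RequestOptions.DEFAULT);
            } catch (IOException e) {
                e.printStackTrace();
            }
            if (response != null && response.isExists()) {
                Gson gson = new Gson();
                return gson.fromJson(response.getSourceAsString(), ArticleEntity.class);
            }
            return null;
        }
    }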
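For reference, `QueryBuilders.multiMatchQuery(keyword, "author", "content", "title")` corresponds to a `multi_match` query in the REST query DSL. The sketch below builds that request body by hand in plain Java, with default options omitted, so it can be pasted into Kibana or sent with any HTTP client. The class and method names are invented for illustration, and the escaping only covers quotes and backslashes.

```java
import java.util.StringJoiner;

class MultiMatchBody {

    /** Escapes the two characters that would break a JSON string literal here. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Builds {"query":{"multi_match":{"query":...,"fields":[...]}}}. */
    static String build(String keyword, String... fields) {
        StringJoiner fieldList = new StringJoiner(",", "[", "]");
        for (String field : fields) {
            fieldList.add("\"" + escape(field) + "\"");
        }
        return "{\"query\":{\"multi_match\":{\"query\":\"" + escape(keyword)
                + "\",\"fields\":" + fieldList + "}}}";
    }

    public static void main(String[] args) {
        System.out.println(build("elasticsearch", "author", "content", "title"));
        // prints {"query":{"multi_match":{"query":"elasticsearch","fields":["author","content","title"]}}}
    }
}
```

Sent as the body of `GET /article/_search`, this matches a document when any of the three fields matches the keyword after analysis.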
5.4 External API
This works the same way as any web application developed with springboot.
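    import com.lbh.es.entity.ArticleEntity;
    import org.elasticsearch.action.index.IndexResponse;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.PostMapping;
    import org.springframework.web.bind.annotation.RequestBody;
    import org.springframework.web.bind.annotation.RequestMapping;
    import org.springframework.web.bind.annotation.RestController;

    import java.util.List;

    /**
     * Copyright(c)lbhbinhao@163.com
     * @author liubinhao
     * @date 2021/3/3
     */
    // Assumption: the Spring MVC annotations are missing from the source as published and
    // are restored here. Only /home/query is grounded (it is the search form's action);
    // all other paths are HYPOTHETICAL.
    @RestController
    @RequestMapping("/home")
    public class ArticleController {
        @Autowired
        private ArticleService articleService;

        @GetMapping("/create") // hypothetical path
        public boolean create() {
            return articleService.createIndexOfArticle();
        }

        @GetMapping("/delete") // hypothetical path
        public boolean delete() {
            return articleService.deleteArticle();
        }

        @PostMapping("/add") // hypothetical path
        public IndexResponse add(@RequestBody ArticleEntity article) {
            return articleService.addArticle(article);
        }

        @GetMapping("/transfer") // hypothetical path
        public String transfer() {
            articleService.transferFromMysql();
            return "successful";
        }

        @GetMapping("/query") // matches the search form's action="/home/query"
        public List<ArticleEntity> query(String keyword) {
            return articleService.queryByKey(keyword);
        }
    }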
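5.5 Pages
The pages use thymeleaf, mainly because the author really does not know front-end development and only understands a little basic h5, so this is just a quick page that is good enough to display results.
Search page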
    <html lang="en" xmlns:th="http://www.thymeleaf.org">
    <head>
        <meta charset="UTF-8" />
        <meta name="viewport" content="width=device-width, initial-scale=1.0" />
        <title>YiyiDu</title>
        <style>
            input:focus {
                border: 2px solid rgb(62, 88, 206);
            }
            input {
                text-indent: 11px;
                padding-left: 11px;
                font-size: 16px;
            }
        </style>
        <style class="input/css">
            .input {
                width: 33%;
                height: 45px;
                vertical-align: top;
                box-sizing: border-box;
                border: 2px solid rgb(207, 205, 205);
                border-right: 2px solid rgb(62, 88, 206);
                border-bottom-left-radius: 10px;
                border-top-left-radius: 10px;
                outline: none;
                margin: 0;
                display: inline-block;
                background: url(/static/img/camera.jpg) no-repeat 0 0;
                background-position: 565px 7px;
                background-size: 28px;
                padding-right: 49px;
                padding-top: 10px;
                padding-bottom: 10px;
                line-height: 16px;
            }
        </style>
        <style class="button/css">
            .button {
                height: 45px;
                width: 130px;
                vertical-align: middle;
                text-indent: -8px;
                padding-left: -8px;
                background-color: rgb(62, 88, 206);
                color: white;
                font-size: 18px;
                outline: none;
                border: none;
                border-bottom-right-radius: 10px;
                border-top-right-radius: 10px;
                margin: 0;
                padding: 0;
            }
        </style>
    </head>
    <body>
    <div style="font-size: 0px;">
        <div align="center" style="margin-top: 0px;">
            <img src="../static/img/yyd.png" th:src="@{/static/img/yyd.png}" alt="一亿度" width="280px" class="pic" />
        </div>
        <div align="center">
            <form action="/home/query">
                <input type="text" class="input" name="keyword" />
                <input type="submit" class="button" value="一亿度下" />
            </form>
        </div>
    </div>
    </body>
    </html>
Search results page
    <html lang="en" xmlns:th="http://www.thymeleaf.org">
    <head>
        <link rel="stylesheet" href="https://cdn.staticfile.org/twitter-bootstrap/4.3.1/css/bootstrap.min.css">
        <meta charset="UTF-8">
        <title>xx-manager</title>
    </head>
    <body>
    <header th:replace="search.html"></header>
    <div class="container my-2">
        <ul th:each="article : ${articles}">
            <a th:href="${article.url}">
                <li th:text="${article.author}+${article.content}"></li>
            </a>
        </ul>
    </div>
    <footer th:replace="footer.html"></footer>
    </body>
    </html>

