Mastering Elasticsearch by Analogy, "Outclass Your Peers" Series: Search
程序源代码
2020-12-07
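Figure 1 shows how Elasticsearch executes a search request. The index in the figure has two shards, each with one replica. After documents are located and scored, only the top 10 are retrieved by default. A REST API search request is sent to the node the client is connected to; based on the indices being queried, that node forwards the request in parallel to one copy (primary or replica) of every relevant shard. Once enough sorting and ranking information has been collected from all shards, only the shards that hold the documents to be returned are asked for their content. This routing behavior is configurable through the search_type parameter; the default shown in Figure 1 is called query then fetch (query_then_fetch).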
Figure 1. How a search request is routed
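The two phases described above can be sketched in a few lines of Python. This is a minimal in-memory model, not Elasticsearch code: the shard contents and scores in SHARDS and the helpers query_phase and query_then_fetch are hypothetical, and real shards rank documents with Lucene.

```python
import heapq

# Hypothetical shard contents: doc_id -> (score, _source). Illustration only.
SHARDS = {
    0: {"a": (2.5, {"title": "A"}), "b": (0.7, {"title": "B"}), "c": (1.9, {"title": "C"})},
    1: {"d": (3.1, {"title": "D"}), "e": (1.2, {"title": "E"})},
}


def query_phase(shard_docs, k):
    """Query phase: a shard returns only (score, doc_id) pairs for its local top k."""
    return heapq.nlargest(k, ((score, doc_id) for doc_id, (score, _) in shard_docs.items()))


def query_then_fetch(shards, size=10):
    """Coordinating node: merge per-shard top lists, then fetch _source for the winners."""
    candidates = []
    for shard_id, docs in shards.items():
        for score, doc_id in query_phase(docs, size):
            candidates.append((score, doc_id, shard_id))
    # Global top `size` across all shards.
    top = heapq.nlargest(size, candidates)
    # Fetch phase: only shards that own a winning document are contacted again.
    fetched_from = sorted({shard_id for _, _, shard_id in top})
    hits = [
        {"_id": doc_id, "_score": score, "_source": shards[shard_id][doc_id][1]}
        for score, doc_id, shard_id in top
    ]
    return hits, fetched_from


hits, fetched_from = query_then_fetch(SHARDS, size=2)
print([h["_id"] for h in hits], fetched_from)  # ['d', 'a'] [0, 1]
```

With size=1 the only winning document lives on shard 1, so query_then_fetch(SHARDS, size=1) reports that only shard 1 is contacted in the fetch phase.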
I. Structure of a search request
1. Choosing the search scope
# search every index in the cluster without conditions
curl '172.16.1.127:9200/_search?pretty'
curl '172.16.1.127:9200/_all/_search?pretty'
curl '172.16.1.127:9200/*/_search?pretty'
# search the get-together index without conditions, similar to SQL select * from get-together;
curl '172.16.1.127:9200/get-together/_search?pretty'
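# ES 6 allows only one mapping type per index (types are deprecated), so this is equivalent to the above; typed endpoints like this are deprecated in ES 7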
curl '172.16.1.127:9200/get-together/_doc/_search?pretty'
# search the get-together and dbinfo indices without conditions
curl '172.16.1.127:9200/get-together,dbinfo/_doc/_search?pretty'
# wildcard index names: indices whose names start with get-toge, excluding get-together
curl '172.16.1.127:9200/get-toge*,-get-together/_search?pretty'
2. Basic components of a search request
select ...
from ...
where ...
order by ...
limit ...
where <-> query
select ... <-> _source
size + from <-> limit
order by <-> sort
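query: configures the query and filter DSL that restricts what is searched, similar to the WHERE clause of a SQL query.
size: the number of documents to return, similar to the row count in a SQL LIMIT clause.
from: used together with size for pagination, similar to the offset in a SQL LIMIT clause. As the result set grows, fetching pages far down the list becomes expensive, because every shard has to return its top from + size hits to the coordinating node, which sorts them all; by default from + size may not exceed index.max_result_window (10,000), and search_after or the scroll API is the usual choice for deep pagination. (The deferred-join idea from SQL, searching first for the IDs on one page and then fetching the fields by ID, may also apply to ES, although query_then_fetch already loads _source only for the requested page.)
_source: specifies how the _source field is returned. By default the complete _source is returned, similar to SELECT * in SQL; configuring _source filters the returned fields.
sort: similar to the ORDER BY clause in SQL; by default, results are sorted by document score.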
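The cost of deep pages can be estimated with a back-of-the-envelope model: every shard produces its own top from + size hits, and the coordinating node sorts all of them just to keep size. The Python function below (from_size_cost is a hypothetical name, not an Elasticsearch API) only illustrates that arithmetic; 10,000 is the default of index.max_result_window.

```python
def from_size_cost(from_, size, num_shards, max_result_window=10_000):
    """Rough cost model for from/size paging under query_then_fetch.

    Returns (hits each shard must return, hits the coordinating node must sort).
    """
    window = from_ + size
    if window > max_result_window:
        # Elasticsearch rejects such a request unless index.max_result_window is raised.
        raise ValueError(f"from + size = {window} exceeds max_result_window = {max_result_window}")
    return window, window * num_shards


print(from_size_cost(0, 10, 5))    # first page: (10, 50)
print(from_size_cost(990, 10, 5))  # page 100: (1000, 5000)
```

The page size stays at 10, yet on five shards page 100 makes the coordinating node sort 100 times as many hits as page 1.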
# from is zero-based in ES: from=10&size=10 returns the 11th to 20th hits
curl '172.16.1.127:9200/get-together/_search?from=10&size=10&pretty'
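# sort by date, ascending
curl '172.16.1.127:9200/get-together/_search?sort=date:asc&pretty'
# same sort, returning only the title and date fields of _source
curl '172.16.1.127:9200/get-together/_search?sort=date:asc&_source=title,date&pretty'
# same sort, limited to documents whose title matches elasticsearch (q uses query_string syntax)
curl '172.16.1.127:9200/get-together/_search?sort=date:asc&q=title:elasticsearch&pretty'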
3. Request-body searches
curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d '
{
"query": {
"match_all": {}
},
"from": 10,
"size": 10
}'
# return only the name and date fields
curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d '
{
"query": {
"match_all": {}
},
"_source": [
"name",
"date"
]
}'
# return the fields under location plus date, but not location.geolocation
curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d '
{
"query": {
"match_all": {}
},
"_source": {
"includes": [
"location.*",
"date"
],
"excludes": [
"location.geolocation"
]
}
}'
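How includes and excludes interact can be modelled locally. The sketch below is an approximation under stated assumptions: it flattens nested objects into dotted paths (Elasticsearch keeps the nesting in the returned _source), it uses Python's fnmatch where square brackets would behave differently from Elasticsearch patterns, and the event document and helper names are hypothetical.

```python
from fnmatch import fnmatchcase


def flatten(doc, prefix=""):
    """Flatten nested objects into dotted paths: {"location": {"name": x}} -> {"location.name": x}."""
    out = {}
    for key, value in doc.items():
        path = prefix + key
        if isinstance(value, dict):
            out.update(flatten(value, path + "."))
        else:
            out[path] = value
    return out


def path_matches(path, pattern):
    """A pattern selects a path if it matches the path itself or one of its parent objects."""
    parts = path.split(".")
    return any(fnmatchcase(".".join(parts[:i]), pattern) for i in range(1, len(parts) + 1))


def filter_source(doc, includes, excludes=()):
    """Keep leaf paths selected by an include pattern and by no exclude pattern."""
    return {
        path: value
        for path, value in flatten(doc).items()
        if any(path_matches(path, p) for p in includes)
        and not any(path_matches(path, p) for p in excludes)
    }


# Hypothetical event document, for illustration only.
event = {
    "title": "Some meetup",
    "date": "2013-04-17T19:00",
    "location": {"name": "Some venue", "geolocation": "39.75,-104.99"},
}
print(filter_source(event, ["location.*", "date"], ["location.geolocation"]))
# {'date': '2013-04-17T19:00', 'location.name': 'Some venue'}
```

Excludes win over includes here, which is why location.geolocation is dropped even though location.* selects it.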
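# sorting on a text field such as name requires fielddata, which is loaded into heap memory;
# sorting on a keyword sub-field (e.g. name.keyword, if the mapping defines one) is usually cheaper
curl -XPOST "172.16.1.127:9200/get-together/_mapping/_doc?pretty" -H 'Content-Type: application/json' -d'
{
"properties": {
"name": {
"type": "text",
"fielddata": true
}
}
}'
# similar to SQL: order by created_on asc, name desc, _score desc
curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d '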
{
"query": {
"match_all": {}
},
"sort": [
{
"created_on": "asc"
},
{
"name": "desc"
},
"_score"
]
}'
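A sort array is applied key by key: each later key only breaks ties left by the earlier ones, and _score sorts descending by default. The Python sketch below reproduces that ordering on hypothetical hits; it compares raw strings, whereas Elasticsearch sorts a text field by its fielddata terms.

```python
def multi_sort(hits, sort_spec):
    """Order hits by several (field, "asc"/"desc") keys, like an ES "sort" array.

    Python's sort is stable (also with reverse=True), so sorting by the least
    significant key first and the most significant key last gives the combined order.
    """
    result = list(hits)
    for field, order in reversed(sort_spec):
        result.sort(key=lambda hit: hit[field], reverse=(order == "desc"))
    return result


# Hypothetical hits; ISO-8601 date strings of the same format sort correctly as text.
hits = [
    {"_id": "1", "created_on": "2012-06-15", "name": "denver", "_score": 1.0},
    {"_id": "2", "created_on": "2012-06-15", "name": "boulder", "_score": 2.0},
    {"_id": "3", "created_on": "2012-03-01", "name": "clojure", "_score": 0.5},
    {"_id": "4", "created_on": "2012-06-15", "name": "denver", "_score": 3.0},
]
spec = [("created_on", "asc"), ("name", "desc"), ("_score", "desc")]
print([h["_id"] for h in multi_sort(hits, spec)])  # ['3', '4', '1', '2']
```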
curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d '
{
"query": {
"match_all": {}
},
"from": 0,
"size": 10,
"_source": [
"name",
"organizer",
"description"
],
"sort": [
{
"created_on": "desc"
}
]
}'
-- equivalent SQL (the hyphenated table name needs quoting, MySQL style here)
select name, organizer, description
from `get-together`
order by created_on desc
limit 0, 10;
4. Structure of the response
curl '172.16.1.127:9200/_search?q=title:elasticsearch&_source=title,date&pretty'
{
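"took" : 13, # time the search took, in milliseconds
"timed_out" : false, # whether the search timed out
"_shards" : {
"total" : 28, # number of shards searched
"successful" : 28, # shards that succeeded
"skipped" : 0, # shards that were skipped
"failed" : 0 # shards that failed
},
"hits" : {
"total" : 7, # number of matching documents (an object with value and relation in ES 7 and later)
"max_score" : 1.0128567, # highest document score
"hits" : [ # array of matching documents
{
"_index" : "get-together", # index the document belongs to
"_type" : "_doc", # mapping type of the document
"_id" : "103", # document ID
"_score" : 1.0128567, # relevance score
"_routing" : "2", # routing value supplied at index time; it decides the shard but is not the shard number
"_source" : { # the requested _source fields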
"date" : "2013-04-17T19:00",
"title" : "Introduction to Elasticsearch"
}
},
{
"_index" : "get-together",
"_type" : "_doc",
"_id" : "105",
"_score" : 1.0128567,
"_routing" : "2",
"_source" : {
"date" : "2013-07-17T18:30",
"title" : "Elasticsearch and Logstash"
}
},
...
]
}
}
II. Queries and filters
1. match
curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d '
{
"query": {
"match_all": {}
}
}'
# match analyzes the query text, so hadoop is looked up the same way the title field was indexed
curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d '
{
"query": {
"match": {
"title": "hadoop"
}
}
}'
# operator and: name must contain both elasticsearch and denver (the default operator is or)
curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d '
{
"query": {
"match": {
"name": {
"query": "Elasticsearch Denver",
"operator": "and"
}
}
}
}'
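The effect of operator can be shown with a toy analyzer. In this Python sketch, analyze is a crude stand-in for the standard analyzer (lowercase, split on non-word characters) and match is a hypothetical helper, not the Elasticsearch implementation; the group names are made up.

```python
import re


def analyze(text):
    """Crude stand-in for the standard analyzer: lowercase, split on non-word characters."""
    return re.findall(r"\w+", text.lower())


def match(field_text, query, operator="or"):
    """match query: analyze the query text, then require any ("or") or all ("and") terms."""
    doc_terms = set(analyze(field_text))
    found = [term in doc_terms for term in analyze(query)]
    return all(found) if operator == "and" else any(found)


print(match("Elasticsearch San Francisco", "Elasticsearch Denver"))         # True
print(match("Elasticsearch San Francisco", "Elasticsearch Denver", "and"))  # False
print(match("Denver Elasticsearch Meetup", "Elasticsearch Denver", "and"))  # True
```

Word order does not matter for match; only the presence of the terms does, which is what match_phrase changes.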
# match_phrase with slop 1: the two terms may be one position further apart than in the phrase
curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d '
{
"query": {
"match_phrase": {
"name": {
"query": "enterprise london",
"slop": 1
}
}
},
"_source": [
"name",
"description"
]
}'
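For a two-term phrase, slop can be understood as how many position moves are allowed to line the terms up. The Python sketch below handles only phrases of exactly two distinct terms and is a simplification of Lucene's sloppy phrase matching; analyze and two_term_phrase_match are hypothetical helpers.

```python
import re


def analyze(text):
    """Crude stand-in for the standard analyzer: lowercase, split on non-word characters."""
    return re.findall(r"\w+", text.lower())


def two_term_phrase_match(field_text, phrase, slop=0):
    """Simplified match_phrase check for a phrase of exactly two distinct terms.

    The second term is expected one position after the first; the match is
    allowed if its actual position is at most `slop` moves away from that.
    Swapping the two terms therefore costs 2.
    """
    first, second = analyze(phrase)
    positions = list(enumerate(analyze(field_text)))
    firsts = [i for i, term in positions if term == first]
    seconds = [i for i, term in positions if term == second]
    return any(abs((p2 - 1) - p1) <= slop for p1 in firsts for p2 in seconds)


print(two_term_phrase_match("Enterprise search London", "enterprise london", slop=0))  # False
print(two_term_phrase_match("Enterprise search London", "enterprise london", slop=1))  # True
print(two_term_phrase_match("London enterprise", "enterprise london", slop=1))         # False
print(two_term_phrase_match("London enterprise", "enterprise london", slop=2))         # True
```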
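# match_phrase_prefix: the last term den is treated as a prefix and expanded to at most max_expansions terms
curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d '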
{
"query": {
"match_phrase_prefix": {
"name": {
"query": "Elasticsearch den",
"max_expansions": 1
}
}
},
"_source": [
"name"
]
}'
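max_expansions caps how many indexed terms the final prefix may expand to (the default is 50). The Python sketch below walks a hypothetical, sorted term dictionary; which terms a real shard picks depends on its own index, so treat the output as illustrative only.

```python
import bisect


def expand_last_term(term_dictionary, prefix, max_expansions=50):
    """Collect at most max_expansions indexed terms starting with prefix, in term order."""
    terms = sorted(term_dictionary)
    start = bisect.bisect_left(terms, prefix)
    expansions = []
    for term in terms[start:]:
        if len(expansions) == max_expansions or not term.startswith(prefix):
            break
        expansions.append(term)
    return expansions


# Hypothetical terms of the name field.
terms = {"denver", "dense", "dentist", "elasticsearch", "london"}
print(expand_last_term(terms, "den", max_expansions=1))  # ['dense']
print(expand_last_term(terms, "den"))                    # ['dense', 'dentist', 'denver']
```

With a dictionary like this one, max_expansions 1 would look only for "elasticsearch dense" and miss a group named "Elasticsearch Denver", which is why very small values can hide expected results.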
# multi_match: run the same match query against several fields
curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d '
{
"query": {
"multi_match": {
"query": "elasticsearch hadoop",
"fields": [
"name",
"description"
]
}
}
}'
2. term
curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d '
{
"query": {
"term": {
"tags": "elasticsearch"
}
},
"_source": [
"name",
"tags"
]
}'
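Unlike match, a term query does not analyze its input. The Python sketch below assumes tags is a text field indexed with something like the standard analyzer (approximated by lowercasing and splitting); for a keyword field the term would instead be compared with the exact original value. The helper names are hypothetical.

```python
import re


def indexed_terms(text):
    """Crude stand-in for the standard analyzer applied to a text field at index time."""
    return set(re.findall(r"\w+", text.lower()))


def term_query(field_text, term):
    """term: the search term is used verbatim, with no analysis."""
    return term in indexed_terms(field_text)


def match_query(field_text, query):
    """match: the query text goes through the same analyzer first."""
    return bool(indexed_terms(field_text) & indexed_terms(query))


print(term_query("Elasticsearch", "elasticsearch"))   # True
print(term_query("Elasticsearch", "Elasticsearch"))   # False: the index holds "elasticsearch"
print(match_query("Elasticsearch", "Elasticsearch"))  # True
```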
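Like the term query, a term filter restricts the results to documents that contain a specific term, but it does not calculate scores (and Elasticsearch can cache filter results).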
curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d '
{
"query": {
"bool": {
"filter": {
"term": {
"tags": "elasticsearch"
}
}
}
}
}'
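Like the term query, the terms query searches for several terms in one document field. For example, the following query finds documents whose tags contain "jvm" or "hadoop".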
curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d '
{
"query": {
"terms": {
"tags": [
"jvm",
"hadoop"
]
}
},
"_source": [
"name",
"tags"
]
}'
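# documents with at least 2 of the 3 tags: minimum_should_match only counts should clauses,
# so a single terms query under must cannot enforce it; give each tag its own term query under should
curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d '
{
"query": {
"bool": {
"should": [
{ "term": { "tags": "jvm" } },
{ "term": { "tags": "hadoop" } },
{ "term": { "tags": "lucene" } }
],
"minimum_should_match": 2
}
}
}'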
3. query_string
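# URI search: q uses query_string syntax; with no field given, index.query.default_field is searched (all eligible fields by default in ES 6 and later)
curl -XGET '172.16.1.127:9200/get-together/_search?q=nosql&pretty'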
curl -XPOST '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d '
{
"query": {
"query_string": {
"query": "nosql"
}
}
}'
# search only the description field (across all indices)
curl -XPOST '172.16.1.127:9200/_search?pretty' -H 'Content-Type: application/json' -d '
{
"query": {
"query_string": {
"default_field": "description",
"query": "nosql"
}
}
}'
# search the description and tags fields
curl -XPOST '172.16.1.127:9200/_search?pretty' -H 'Content-Type: application/json' -d '
{
"query": {
"query_string": {
"fields": ["description", "tags"],
"query": "nosql"
}
}
}'
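# field-qualified terms with boolean operators: name contains nosql and description does not contain mongodb
curl -XPOST '172.16.1.127:9200/_search?pretty' -H 'Content-Type: application/json' -d '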
{
"query": {
"query_string": {
"query": "name:nosql AND -description:mongodb"
}
}
}'
# parentheses group clauses; [a TO b] is an inclusive range
curl -XPOST '172.16.1.127:9200/_search?pretty' -H 'Content-Type: application/json' -d '
{
"query": {
"query_string": {
"query": "(tags:search OR tags:lucene) AND (created_on:[1999-01-01 TO 2001-01-01])"
}
}
}'
III. Compound queries
1. bool query
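# must: david attended; should: clint or andy attended (minimum_should_match 1 requires at least one); must_not: events dated before 2013-06-30
curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d '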
{
"query": {
"bool": {
"must": [
{
"term": {
"attendees": "david"
}
}
],
"should": [
{
"term": {
"attendees": "clint"
}
},
{
"term": {
"attendees": "andy"
}
}
],
"must_not": [
{
"range": {
"date": {
"lt": "2013-06-30T00:00"
}
}
}
],
"minimum_should_match": 1
}
}
}'
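The matching logic of must, should, must_not and minimum_should_match can be written as plain predicates. This Python sketch models matching only, not scoring; bool_query, attendee and date_before are hypothetical helpers, the events are made up, and the default for minimum_should_match follows the Elasticsearch 7.x query-context rule (1 when there are should clauses but no must or filter clauses, otherwise 0).

```python
def bool_query(doc, must=(), should=(), must_not=(), minimum_should_match=None):
    """Decide whether doc matches a bool query whose clauses are predicates (doc -> bool)."""
    if not all(clause(doc) for clause in must):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    if minimum_should_match is None:
        # Simplified default: filter clauses are not modelled here.
        minimum_should_match = 1 if should and not must else 0
    return sum(1 for clause in should if clause(doc)) >= minimum_should_match


def attendee(name):
    return lambda doc: name in doc["attendees"]


def date_before(limit):
    # ISO-8601 strings with the same layout compare correctly as text.
    return lambda doc: doc["date"] < limit


query = dict(
    must=[attendee("david")],
    should=[attendee("clint"), attendee("andy")],
    must_not=[date_before("2013-06-30T00:00")],
    minimum_should_match=1,
)
# Hypothetical events.
print(bool_query({"attendees": ["david", "clint"], "date": "2013-07-01T18:00"}, **query))  # True
print(bool_query({"attendees": ["david"], "date": "2013-07-01T18:00"}, **query))           # False
print(bool_query({"attendees": ["david", "andy"], "date": "2013-05-01T18:00"}, **query))   # False
```

Dropping minimum_should_match from this query would make the should clauses optional (the default becomes 0 because a must clause is present), so the second event would then match.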
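# the same documents expressed with must clauses only (assuming every event has a date); relevance scores may differ from the query above
curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d '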
{
"query": {
"bool": {
"must": [
{
"term": {
"attendees": "david"
}
},
{
"range": {
"date": {
"gte": "2013-06-30T00:00"
}
}
},
{
"terms": {
"attendees": [
"clint",
"andy"
]
}
}
]
}
}
}'
2. bool filter
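# the same conditions in filter context: no relevance scores are computed and results can be cached
curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d '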
{
"query": {
"bool": {
"filter": {
"bool": {
"must": [
{
"term": {
"attendees": "david"
}
}
],
"should": [
{
"term": {
"attendees": "clint"
}
},
{
"term": {
"attendees": "andy"
}
}
],
"must_not": [
{
"range": {
"date": {
"lt": "2013-06-30T00:00"
}
}
}
],
"minimum_should_match": 1
}
}
}
}
}'
IV. Other queries and filters
1. range query and filter
# where created_on > 2012-06-01 and created_on < 2012-09-01
curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d '
{
"query": {
"range": {
"created_on": {
"gt": "2012-06-01",
"lt": "2012-09-01"
}
}
}
}'
# the same range in filter context (no scoring)
curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d '
{
"query": {
"bool": {
"filter": {
"range": {
"created_on": {
"gt": "2012-06-01",
"lt": "2012-09-01"
}
}
}
}
}
}'
# on a string field, range compares indexed terms lexicographically: terms after c and before e
curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d '
{
"query": {
"range": {
"name": {
"gt": "c",
"lt": "e"
}
}
}
}'
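The bound keywords are easy to mix up, so here they are as a Python predicate: gt and lt are exclusive, gte and lte inclusive, and strings compare lexicographically. in_range is a hypothetical helper; dates are passed as same-format ISO strings, whereas Elasticsearch parses them into timestamps.

```python
def in_range(value, gt=None, gte=None, lt=None, lte=None):
    """Apply range bounds like a range query: gt/lt exclusive, gte/lte inclusive."""
    if gt is not None and not value > gt:
        return False
    if gte is not None and not value >= gte:
        return False
    if lt is not None and not value < lt:
        return False
    if lte is not None and not value <= lte:
        return False
    return True


print(in_range("2012-07-04", gt="2012-06-01", lt="2012-09-01"))  # True
print(in_range("2012-09-01", gt="2012-06-01", lt="2012-09-01"))  # False: lt excludes the bound
print(in_range("denver", gt="c", lt="e"))                         # True
print(in_range("elasticsearch", gt="c", lt="e"))                  # False: "elasticsearch" sorts after "e"
```

The last line is a common surprise with the name range above: a term starting with e is greater than "e" itself, so lt "e" excludes every such term.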
2. prefix query and filter
# prefix is not analyzed: liber must match the start of an indexed (lowercased) term
curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d '
{
"query": {
"prefix": {
"title": "liber"
}
}
}'
curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d '
{
"query": {
"bool": {
"filter": {
"prefix": {
"title": "liber"
}
}
}
}
}'
3. wildcard query
# create the wildcard-test index by indexing two documents
curl -XPOST '172.16.1.127:9200/wildcard-test/_doc/1?pretty' -H 'Content-Type: application/json' -d '
{
"title":"The Best Bacon Ever"
}'
curl -XPOST '172.16.1.127:9200/wildcard-test/_doc/2?pretty' -H 'Content-Type: application/json' -d '
{
"title":"How to raise a barn"
}'
# ba*n matches both bacon and barn (* stands for any number of characters)
curl '172.16.1.127:9200/wildcard-test/_search?pretty' -H 'Content-Type: application/json' -d'
{
"query": {
"wildcard": {
"title": {
"wildcard": "ba*n"
}
}
}
}'
# ba?n matches only barn, not bacon (? stands for exactly one character)
curl '172.16.1.127:9200/wildcard-test/_search?pretty' -H 'Content-Type: application/json' -d'
{
"query": {
"wildcard": {
"title": {
"wildcard": "ba?n"
}
}
}
}'
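Both wildcard examples can be checked locally by translating the pattern into a regular expression and testing it against each lowercased token. This Python sketch assumes the title field was tokenized roughly like the standard analyzer; wildcard_regex and wildcard_query are hypothetical helpers.

```python
import re


def wildcard_regex(pattern):
    """Translate a wildcard pattern (* = any characters, ? = one character) into a regex."""
    pieces = []
    for ch in pattern:
        if ch == "*":
            pieces.append(".*")
        elif ch == "?":
            pieces.append(".")
        else:
            pieces.append(re.escape(ch))
    return re.compile("".join(pieces) + r"\Z", re.DOTALL)


def wildcard_query(field_text, pattern):
    """Match the (unanalyzed) pattern against each lowercased token of the text."""
    regex = wildcard_regex(pattern)
    return any(regex.match(token) for token in re.findall(r"\w+", field_text.lower()))


print(wildcard_query("The Best Bacon Ever", "ba*n"))  # True
print(wildcard_query("How to raise a barn", "ba*n"))  # True
print(wildcard_query("The Best Bacon Ever", "ba?n"))  # False
print(wildcard_query("How to raise a barn", "ba?n"))  # True
```

Because every term has to be tested, patterns that start with a wildcard are the slowest case.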
4. exists filter
curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d'
{
"query": {
"bool": {
"filter": {
"exists": {
"field": "location_event.geolocation"
}
}
}
}
}'
5. missing filter (the missing query was removed in Elasticsearch 5.0; use exists inside must_not instead)
curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d'
{
"query": {
"bool": {
"must_not": {
"exists": {
"field": "reviews"
}
}
}
}
}'
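What counts as "existing" is worth spelling out, since must_not plus exists returns exactly the documents for which it is false. The Python sketch below approximates the rule on a document's JSON source: a missing field, null and [] do not exist, while "" and [None, "foo"] do. Treating arrays of only nulls as missing is an assumption here, mapping effects such as "index": false or ignore_above are not modelled, and field_exists is a hypothetical helper.

```python
def field_exists(doc, path):
    """Approximate the exists query on a document's JSON source."""
    value = doc
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return False
        value = value[key]
    if value is None:
        return False
    if isinstance(value, list):
        # Assumption: arrays holding only nulls count as missing.
        return any(item is not None for item in value)
    return True


print(field_exists({"reviews": None}, "reviews"))  # False
print(field_exists({"reviews": []}, "reviews"))    # False
print(field_exists({"reviews": ""}, "reviews"))    # True
print(field_exists({"location_event": {"geolocation": "39.75,-104.99"}}, "location_event.geolocation"))  # True
```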
6. Turning any query into a filter
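# placing a query under bool.filter turns it into a filter: it only decides match or no match, without scoring
curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d'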
{
"query": {
"bool": {
"filter": {
"query_string": {
"query": "name:\"Elasticsearch\""
}
}
}
}
}'
V. Choosing the best query for the job