触类旁通Elasticsearch之吊打同行系列:搜索篇
点击上方蓝色字体,选择“设为星标”

ES的搜索请求执行流程如图1所示。图中索引包含两个分片,每个分片有一个副本分片。在给文档定位和评分后,缺省只会获取排名前10的文档。REST API搜索请求被发送到所连接的节点,该节点根据要查询的索引,将这个请求依次发送到所有的相关分片(主分片或者副本分片)。从所有分片收集到足够的排序和排名信息后,只有包含所需文档的分片被要求返回相关内容。这种搜索路由的行为是可配置的,图1展示的默认行为,称为查询后获取(query_then_fetch)。

图1 搜索请求是如何路由的
一、搜索请求的结构
1. 确定搜索范围
# 无条件搜索整个集群curl '172.16.1.127:9200/_search?pretty'curl '172.16.1.127:9200/_all/_search?pretty'curl '172.16.1.127:9200/*/_search?pretty'# 无条件搜索get-together索引,类似于SQL中的select * from get-together;curl '172.16.1.127:9200/get-together/_search?pretty'# 在ES6中已经废弃了type的概念,所以功能同上curl '172.16.1.127:9200/get-together/_doc/_search?pretty'# 无条件搜索get-together、dbinfo两个索引curl '172.16.1.127:9200/get-together,dbinfo/_doc/_search?pretty'# 模糊匹配索引名称,包含get-toge开头的索引,但不包括get-togethercurl '172.16.1.127:9200/+get-toge*,-get-together/_search?pretty'
2. 搜索请求的基本模块
select ...from ...where ...order by ...limit ...where <-> queryselect ... <-> _sourcesize + from <-> limitorder by <-> sort
query:配置查询和过滤器DSL,限制搜索的条件,类似于SQL查询中的where子句。
size:返回文档的数量,类似于SQL查询中的limit子句中的数量。
from:和size一起使用,from用于分页操作,类似于SQL查询中的limit子句中的偏移量。如果结果集合不断增加,获取某些靠后的翻页将会成为代价高昂的操作。(SQL中延迟关联的思想应该也可用于ES,先搜索出某一页的ID,再通过ID查询字段。)
_source:指定_source字段如何返回,默认返回完整的_source字段,类似于SQL中的select *。通过配置_source,将过滤返回的字段。
sort:类似于SQL中的order by子句,用于排序,默认的排序是基于文档的得分。
# ES的from从0开始curl '172.16.1.127:9200/get-together/_search?from=10&size=10&pretty'
curl '172.16.1.127:9200/get-together/_search?sort=date:asc&pretty'curl '172.16.1.127:9200/get-together/_search?sort=date:asc&_source=title,date&pretty'curl '172.16.1.127:9200/get-together/_search?sort=date:asc&q=title:elasticsearch&pretty'3. 基于请求主体的搜索请求
curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d '{"query": {"match_all": {}},"from": 10,"size": 10}'
# 只返回name和date字段curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d '{"query": {"match_all": {}},"_source": ["name","date"]}'
# 返回location开头的字段和日期字段,但不返回location.geolocation字段curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d '{"query": {"match_all": {}},"_source": {"include": ["location.*","date"],"exclude": ["location.geolocation"]}}'
# 类似于SQL中的order by created_on asc, name desc, _scorecurl -XPOST "172.16.1.127:9200/get-together/_mapping/_doc?pretty" -H 'Content-Type: application/json' -d'{"properties": {"name": {"type": "text","fielddata": "true"}}}'curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d '{"query": {"match_all": {}},"sort": [{"created_on": "asc"},{"name": "desc"},"_score"]}'
curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d '{"query": {"match_all": {}},"from": 0,"size": 10,"_source": ["name","organizer","description"],"sort": [{"created_on": "desc"}]}'
select name, organizer, descriptionfrom get-togetherorder by created_on desclimit 0, 10;
4. 回复的结构
curl '172.16.1.127:9200/_search?q=title:elasticsearch&_source=title,date&pretty'{"took" : 13, # 查询执行所用的毫秒数"timed_out" : false, # 是否超时"_shards" : {"total" : 28, # 搜索的分片数"successful" : 28, # 成功的分片数"skipped" : 0, # 跳过的分片数"failed" : 0 # 失败的分片数},"hits" : {"total" : 7, # 匹配的文档数"max_score" : 1.0128567, # 最高文档得分"hits" : [ # 命中文档的数组{"_index" : "get-together", # 文档所属索引"_type" : "_doc", # 文档所属类型"_id" : "103", # 文档ID"_score" : 1.0128567, # 相关性得分"_routing" : "2", # 文档所属的分片号"_source" : { # 请求的_source字段"date" : "2013-04-17T19:00","title" : "Introduction to Elasticsearch"}},{"_index" : "get-together","_type" : "_doc","_id" : "105","_score" : 1.0128567,"_routing" : "2","_source" : {"date" : "2013-07-17T18:30","title" : "Elasticsearch and Logstash"}},...]}}
二、查询和过滤器

1. match
curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d '{"query": {"match_all": {}}}'
curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d '{"query": {"match": {"title": "hadoop"}}}'
curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d '{"query": {"match": {"name": {"query": "Elasticsearch Denver","operator": "and"}}}}'
curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d '{"query": {"match_phrase": {"name": {"query": "enterprise london","slop": 1}}},"_source": ["name","description"]}'
curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d '{"query": {"match_phrase_prefix": {"name": {"query": "Elasticsearch den","max_expansions": 1}}},"_source": ["name"]}'
curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d '{"query": {"multi_match": {"query": "elasticsearch hadoop","fields": ["name","description"]}}}'
2. term
curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d '{"query": {"term": {"tags": "elasticsearch"}},"_source": ["name","tags"]}'
和term查询相似,可以使用term过滤器来限制结果文档,使其包含特定的词条,不过无须计算得分。
curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d '{"query": {"bool": {"filter": {"term": {"tags": "elasticsearch"}}}}}'
和term查询类似,terms查询可以搜索某个文档字段中的多个词条。例如下面的查询搜索标签含有“jvm”或“hadoop”的文档。
curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d '{"query": {"terms": {"tags": ["jvm","hadoop"]}},"_source": ["name","tags"]}'
curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d '{"query": {"bool": {"minimum_should_match": 2,"must": {"terms": {"tags": ["jvm","hadoop","lucene"]}}}}}'
3. query_string
curl -XGET '172.16.1.127:9200/get-together/_search?q=nosql&pretty'curl -XPOST '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d '{"query": {"query_string": {"query": "nosql"}}}'
curl -XPOST '172.16.1.127:9200/_search?pretty' -H 'Content-Type: application/json' -d '{"query": {"query_string": {"default_field": "description","query": "nosql"}}}'
curl -XPOST '172.16.1.127:9200/_search?pretty' -H 'Content-Type: application/json' -d '{"query": {"query_string": {"fields": ["description", "tags"],"query": "nosql"}}}'
curl -XPOST '172.16.1.127:9200/_search?pretty' -H 'Content-Type: application/json' -d '{"query": {"query_string": {"query": "name:nosql AND -description:mongodb"}}}'
curl -XPOST '172.16.1.127:9200/_search?pretty' -H 'Content-Type: application/json' -d '{"query": {"query_string": {"query": "(tags:search OR tags:lucene) AND (created_on:[1999-01-01 TO 2001-01-01])"}}}'
三、复合查询
1. bool查询
curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d '{"query": {"bool": {"must": [{"term": {"attendees": "david"}}],"should": [{"term": {"attendees": "clint"}},{"term": {"attendees": "andy"}}],"must_not": [{"range": {"date": {"lt": "2013-06-30T00:00"}}}],"minimum_should_match": 1}}}'
curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d '{"query": {"bool": {"must": [{"term": {"attendees": "david"}},{"range": {"date": {"gte": "2013-06-30T00:00"}}},{"terms": {"attendees": ["clint","andy"]}}]}}}'
2. bool过滤器
curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d '{"query": {"bool": {"filter": {"bool": {"must": [{"term": {"attendees": "david"}}],"should": [{"term": {"attendees": "clint"}},{"term": {"attendees": "andy"}}],"must_not": [{"range": {"date": {"lt": "2013-06-30T00:00"}}}]}}}}}'
四、其它查询和过滤器
1. range查询和过滤器
# where created_on > 2012-06-01 and created_on < 2012-09-01curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d '{"query": {"range": {"created_on": {"gt": "2012-06-01","lt": "2012-09-01"}}}}'
curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d '{"query": {"bool": {"filter": {"range": {"created_on": {"gt": "2012-06-01","lt": "2012-09-01"}}}}}}'
curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d '{"query": {"range": {"name": {"gt": "c","lt": "e"}}}}'
2. prefix查询和过滤器
curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d '{"query": {"prefix": {"title": "liber"}}}'
curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d '{"query": {"bool": {"filter": {"prefix": {"title": "liber"}}}}}'
3. wildcard查询
# 创建索引,添加两个文档curl -XPOST '172.16.1.127:9200/wildcard-test/_doc/1?pretty' -H 'Content-Type: application/json' -d '{"title":"The Best Bacon Ever"}'curl -XPOST '172.16.1.127:9200/wildcard-test/_doc/2?pretty' -H 'Content-Type: application/json' -d '{"title":"How to raise a barn"}'# “ba*n”会匹配bacon和barncurl '172.16.1.127:9200/wildcard-test/_search?pretty' -H 'Content-Type: application/json' -d'{"query": {"wildcard": {"title": {"wildcard": "ba*n"}}}}'# “ba?n”只会匹配barn,不会匹配baconcurl '172.16.1.127:9200/wildcard-test/_search?pretty' -H 'Content-Type: application/json' -d'{"query": {"wildcard": {"title": {"wildcard": "ba?n"}}}}'
4. exists过滤器
curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d'{"query": {"bool": {"filter": {"exists": {"field": "location_event.geolocation"}}}}}'
5. missing过滤器
curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d'{"query": {"bool": {"must_not": {"exists": {"field": "reviews"}}}}}'
6. 将任何查询转变为过滤器
curl '172.16.1.127:9200/get-together/_search?pretty' -H 'Content-Type: application/json' -d'{"query": {"bool": {"filter": {"query_string": {"query": "name:\"Elasticsearch\""}}}}}'
五、为任务选择最好的查询

版权声明:
文章不错?点个【在看】吧! ?
评论




