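With synonyms, you can apply an uploaded synonym file to the synonym dictionary of Alibaba Cloud Elasticsearch and search with the updated dictionary. Alibaba Cloud Elasticsearch supports two ways to use synonyms: uploading a synonym file and referencing synonyms directly. This topic provides an example of each method.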
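Background information
All commands in this topic can be run in the Kibana console. For information about how to log on to the Kibana console, see Log on to the Kibana console.
Method 1: Upload a synonym file
Prerequisite: A synonym file has been uploaded. For details, see Upload a synonym file.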
The following example configures synonyms with a token filter and uses aliyun_synonyms.txt as the test file. The file contains a single line: begin, start.
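To make the file format concrete, here is a minimal Python sketch of how a Solr-style synonym line such as `begin, start` can be read. The function name `parse_synonym_line` is hypothetical and the sketch only mirrors the two common rule shapes (comma-separated equivalents and explicit `=>` mappings); it is not the parser Elasticsearch uses internally.

```python
def parse_synonym_line(line):
    """Parse one Solr-style synonym rule into (inputs, outputs).

    Hypothetical helper for illustration only.
    Returns None for blank lines and '#' comments.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    if "=>" in line:
        # Explicit mapping: terms on the left are replaced by terms on the right.
        left, right = line.split("=>", 1)
        inputs = [t.strip() for t in left.split(",") if t.strip()]
        outputs = [t.strip() for t in right.split(",") if t.strip()]
    else:
        # Equivalent synonyms: every term expands to every term in the group.
        inputs = [t.strip() for t in line.split(",") if t.strip()]
        outputs = list(inputs)
    return inputs, outputs


print(parse_synonym_line("begin, start"))
# (['begin', 'start'], ['begin', 'start'])
```

With the test file above, `begin` and `start` form one equivalence group, which is why a query for either term can match documents containing the other.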
- Create an index.
PUT /aliyun-index-test
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "by_smart": {
            "type": "custom",
            "tokenizer": "ik_smart",
            "filter": ["by_tfr", "by_sfr"],
            "char_filter": ["by_cfr"]
          },
          "by_max_word": {
            "type": "custom",
            "tokenizer": "ik_max_word",
            "filter": ["by_tfr", "by_sfr"],
            "char_filter": ["by_cfr"]
          }
        },
        "filter": {
          "by_tfr": { "type": "stop", "stopwords": [" "] },
          "by_sfr": { "type": "synonym", "synonyms_path": "analysis/aliyun_synonyms.txt" }
        },
        "char_filter": {
          "by_cfr": { "type": "mapping", "mappings": ["| => |"] }
        }
      }
    }
  }
}
- Configure the synonym field title.
- Example for Elasticsearch versions earlier than 7.0
PUT /aliyun-index-test/_mapping/doc
{
  "properties": {
    "title": {
      "type": "text",
      "analyzer": "by_max_word",
      "search_analyzer": "by_smart"
    }
  }
}
- Example for Elasticsearch 7.0 and later
PUT /aliyun-index-test/_mapping
{
  "properties": {
    "title": {
      "type": "text",
      "analyzer": "by_max_word",
      "search_analyzer": "by_smart"
    }
  }
}
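Important: Starting with version 7.0, open source Elasticsearch removed the concept of mapping types and uses _doc by default. Do not specify a type when you configure the index mapping; otherwise, an error is returned.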
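The version rule above can be captured in a small helper if you generate requests from a script. This is a sketch under stated assumptions: the function names `mapping_path` and `document_path` are hypothetical, and the pre-7.0 type name `doc` is simply the one used in this topic's examples.

```python
def mapping_path(index, major_version):
    """Return the mapping endpoint for an index (hypothetical helper).

    Versions earlier than 7.0 take a type name in the path (this topic uses
    'doc'); 7.0 and later must omit it.
    """
    if major_version >= 7:
        return f"/{index}/_mapping"
    return f"/{index}/_mapping/doc"


def document_path(index, doc_id, major_version):
    """Return the endpoint for a single document (hypothetical helper)."""
    type_name = "_doc" if major_version >= 7 else "doc"
    return f"/{index}/{type_name}/{doc_id}"


print(mapping_path("aliyun-index-test", 6))   # /aliyun-index-test/_mapping/doc
print(mapping_path("aliyun-index-test", 7))   # /aliyun-index-test/_mapping
print(document_path("aliyun-index-test", 1, 7))  # /aliyun-index-test/_doc/1
```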
- Validate the synonyms.
GET /aliyun-index-test/_analyze
{
  "analyzer": "by_smart",
  "text": "begin"
}
If the command succeeds, the following result is returned.
{
  "tokens": [
    { "token": "begin", "start_offset": 0, "end_offset": 5, "type": "ENGLISH", "position": 0 },
    { "token": "start", "start_offset": 0, "end_offset": 5, "type": "SYNONYM", "position": 0 }
  ]
}
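One way to read this result is that tokens sharing the same position are treated as alternatives for the same word. The following Python sketch groups the returned tokens by position; the function name `tokens_by_position` is hypothetical, and the sample response is copied from the result above.

```python
import json


def tokens_by_position(analyze_response):
    """Group token strings from an _analyze response by their position."""
    groups = {}
    for tok in analyze_response["tokens"]:
        groups.setdefault(tok["position"], []).append(tok["token"])
    return groups


# Sample response copied from the result shown above.
response = json.loads("""
{
  "tokens": [
    {"token": "begin", "start_offset": 0, "end_offset": 5, "type": "ENGLISH", "position": 0},
    {"token": "start", "start_offset": 0, "end_offset": 5, "type": "SYNONYM", "position": 0}
  ]
}
""")

print(tokens_by_position(response))  # {0: ['begin', 'start']}
```

A position that holds more than one token indicates that the synonym filter added an alternative at that position.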
- Add data for further testing.
- Example for Elasticsearch versions earlier than 7.0
PUT /aliyun-index-test/doc/1
{ "title": "Shall I begin?" }
PUT /aliyun-index-test/doc/2
{ "title": "I start work at nine." }
- Example for Elasticsearch 7.0 and later
PUT /aliyun-index-test/_doc/1
{ "title": "Shall I begin?" }
PUT /aliyun-index-test/_doc/2
{ "title": "I start work at nine." }
- Run a search to validate the synonyms.
GET /aliyun-index-test/_search
{
  "query": { "match": { "title": "begin" } },
  "highlight": {
    "pre_tags": ["<red>", "<blue>"],
    "post_tags": ["</red>", "</blue>"],
    "fields": { "title": {} }
  }
}
If the command succeeds, the following result is returned.
{
  "took": 11,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 2,
    "max_score": 0.41048482,
    "hits": [
      {
        "_index": "aliyun-index-test",
        "_type": "doc",
        "_id": "2",
        "_score": 0.41048482,
        "_source": { "title": "I start work at nine." },
        "highlight": { "title": ["I <red>start</red> work at nine."] }
      },
      {
        "_index": "aliyun-index-test",
        "_type": "doc",
        "_id": "1",
        "_score": 0.39556286,
        "_source": { "title": "Shall I begin?" },
        "highlight": { "title": ["Shall I <red>begin</red>?"] }
      }
    ]
  }
}
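If you check the result from a script rather than by eye, a sketch like the following pulls out the highlighted fragments per document. The helper names `highlighted_titles` and `total_hits` are hypothetical, and the sample is a trimmed copy of the response above. The sample carries `"_type": "doc"` and an integer `hits.total`, which suggests it was produced on a version earlier than 7.0; from 7.0 onward `hits.total` is an object with a `value` field, so the sketch accepts both shapes.

```python
def highlighted_titles(search_response):
    """Return (document id, highlighted title fragments) for every hit."""
    results = []
    for hit in search_response["hits"]["hits"]:
        fragments = hit.get("highlight", {}).get("title", [])
        results.append((hit["_id"], fragments))
    return results


def total_hits(search_response):
    """Read hits.total as an int for both pre-7.0 and 7.0+ response shapes."""
    total = search_response["hits"]["total"]
    return total["value"] if isinstance(total, dict) else total


# Trimmed copy of the response shown above.
sample = {
    "hits": {
        "total": 2,
        "hits": [
            {"_id": "2", "_source": {"title": "I start work at nine."},
             "highlight": {"title": ["I <red>start</red> work at nine."]}},
            {"_id": "1", "_source": {"title": "Shall I begin?"},
             "highlight": {"title": ["Shall I <red>begin</red>?"]}},
        ],
    }
}

for doc_id, fragments in highlighted_titles(sample):
    print(doc_id, fragments)
```

Document 2 is highlighted on `start` even though the query was `begin`, which confirms that the synonym rule took effect.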
Method 2: Reference synonyms directly
The following example references synonyms directly and uses the IK dictionary for tokenization.
- Create an index.
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_synonyms": {
          "filter": ["lowercase", "my_synonym_filter"],
          "tokenizer": "ik_smart"
        }
      },
      "filter": {
        "my_synonym_filter": {
          "synonyms": ["begin,start"],
          "type": "synonym"
        }
      }
    }
  }
}
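The preceding command works as follows:
- It defines a synonym filter named my_synonym_filter and configures its synonym list.
- It defines an analyzer named my_synonyms that uses the ik_smart tokenizer.
- After ik_smart tokenizes the text, the analyzer converts all letters to lowercase and then applies the synonym filter.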
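The order of operations described above (tokenize, lowercase, expand synonyms) can be sketched in plain Python. This is a toy model, not the Elasticsearch implementation: a simple regular expression stands in for the ik_smart tokenizer and only handles ASCII words, and the `SYNONYMS` table and `analyze` function are hypothetical names.

```python
import re

# Expansion table equivalent to the rule "begin,start".
SYNONYMS = {"begin": ["start"], "start": ["begin"]}


def analyze(text):
    """Toy analyzer: tokenize, lowercase, then add synonyms at the same position."""
    tokens = []
    # Stand-in for ik_smart: split into runs of ASCII letters and digits.
    for position, match in enumerate(re.finditer(r"[A-Za-z0-9]+", text)):
        term = match.group(0).lower()  # the "lowercase" filter
        base = {"start_offset": match.start(), "end_offset": match.end(),
                "position": position}
        tokens.append({"token": term, "type": "ENGLISH", **base})
        # The synonym filter emits alternatives that share offsets and position.
        for synonym in SYNONYMS.get(term, []):
            tokens.append({"token": synonym, "type": "SYNONYM", **base})
    return tokens


print([(t["token"], t["position"]) for t in analyze("Shall I begin?")])
# [('shall', 0), ('i', 1), ('begin', 2), ('start', 2)]
```

Lowercasing before the synonym filter matters: the rule is written in lowercase, so a capitalized `Begin` would not match it otherwise.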
- Configure the synonym field title.
- Example for Elasticsearch versions earlier than 7.0
PUT /my_index/_mapping/doc
{
  "properties": {
    "title": {
      "type": "text",
      "analyzer": "my_synonyms"
    }
  }
}
- Example for Elasticsearch 7.0 and later
PUT /my_index/_mapping
{
  "properties": {
    "title": {
      "type": "text",
      "analyzer": "my_synonyms"
    }
  }
}
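Important: Starting with version 7.0, open source Elasticsearch removed the concept of mapping types and uses _doc by default. Do not specify a type when you configure the index mapping; otherwise, an error is returned.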
- Validate the synonyms.
GET /my_index/_analyze
{
  "analyzer": "my_synonyms",
  "text": "Shall I begin?"
}
If the command succeeds, the following result is returned.
{
  "tokens": [
    { "token": "shall", "start_offset": 0, "end_offset": 5, "type": "ENGLISH", "position": 0 },
    { "token": "i", "start_offset": 6, "end_offset": 7, "type": "ENGLISH", "position": 1 },
    { "token": "begin", "start_offset": 8, "end_offset": 13, "type": "ENGLISH", "position": 2 },
    { "token": "start", "start_offset": 8, "end_offset": 13, "type": "SYNONYM", "position": 2 }
  ]
}
- Add data for further testing.
- Example for Elasticsearch versions earlier than 7.0
PUT /my_index/doc/1
{ "title": "Shall I begin?" }
PUT /my_index/doc/2
{ "title": "I start work at nine." }
- Example for Elasticsearch 7.0 and later
PUT /my_index/_doc/1
{ "title": "Shall I begin?" }
PUT /my_index/_doc/2
{ "title": "I start work at nine." }
- Run a search to validate the synonyms.
GET /my_index/_search
{
  "query": { "match": { "title": "begin" } },
  "highlight": {
    "pre_tags": ["<red>", "<blue>"],
    "post_tags": ["</red>", "</blue>"],
    "fields": { "title": {} }
  }
}
If the command succeeds, the following result is returned.
{
  "took": 11,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 2,
    "max_score": 0.41913947,
    "hits": [
      {
        "_index": "my_index",
        "_type": "doc",
        "_id": "2",
        "_score": 0.41913947,
        "_source": { "title": "I start work at nine." },
        "highlight": { "title": ["I <red>start</red> work at nine."] }
      },
      {
        "_index": "my_index",
        "_type": "doc",
        "_id": "1",
        "_score": 0.39556286,
        "_source": { "title": "Shall I begin?" },
        "highlight": { "title": ["Shall I <red>begin</red>?"] }
      }
    ]
  }
}
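To see why a query for `begin` returns both documents, the following toy model compares matching with and without synonym expansion. It is a sketch only: the regular expression stands in for ik_smart, scoring is ignored, and the names `analyze`, `match`, `DOCS`, and `SYNONYMS` are hypothetical.

```python
import re

SYNONYMS = {"begin": ["start"], "start": ["begin"]}
DOCS = {"1": "Shall I begin?", "2": "I start work at nine."}


def analyze(text, use_synonyms=True):
    """Return the set of terms a text produces (lowercased, optionally expanded)."""
    terms = set()
    for word in re.findall(r"[A-Za-z0-9]+", text):
        term = word.lower()
        terms.add(term)
        if use_synonyms:
            terms.update(SYNONYMS.get(term, []))
    return terms


def match(query, docs, use_synonyms=True):
    """Return ids of documents that share at least one term with the query."""
    query_terms = analyze(query, use_synonyms)
    return sorted(doc_id for doc_id, title in docs.items()
                  if query_terms & analyze(title, use_synonyms))


print(match("begin", DOCS))                      # ['1', '2']
print(match("begin", DOCS, use_synonyms=False))  # ['1']
```

Without the synonym rule only document 1 matches, which is the difference the search result above demonstrates.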