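Search boxes on common e-commerce sites such as JD.com (京东) and Taobao (淘宝) show a drop-down of suggested keywords as you type. Typing 面膜 (facial mask) or its pinyin initials mm, for example, brings up suggestions such as 面膜 面霜 and 面膜 补水. A recent project required me to add this kind of search suggestion feature, and I ran into quite a few pitfalls along the way, so I am writing them down here.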
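One common way to implement search suggestions is with a relational database such as MySQL or Oracle: a SQL LIKE query can do prefix matching. That is fine when the data set is small, but the index data of a typical e-commerce platform is very large, so the queries become slow and the user experience suffers. The other approach is to use a search engine: because a search engine indexes every token produced by analysis, results come back quickly.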
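To make the database approach concrete, here is a minimal sketch of what a prefix match computes when no index helps. It assumes the keywords sit in an in-memory list instead of a table; the class and method names (`PrefixScanSketch`, `prefixScan`) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PrefixScanSketch {
    // Returns every keyword that starts with the prefix,
    // roughly what `keyword LIKE 'prefix%'` asks for.
    static List<String> prefixScan(List<String> keywords, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String keyword : keywords) {   // visits every entry, matching or not
            if (keyword.startsWith(prefix)) {
                hits.add(keyword);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> keywords = Arrays.asList("面膜", "面霜", "面膜 补水", "口红");
        System.out.println(prefixScan(keywords, "面膜"));
    }
}
```

The loop touches every keyword on every request, so the cost grows with the size of the data set.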
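Inverted index
Elasticsearch uses a structure called an inverted index for fast full-text search. An inverted index consists of a list of the unique words that appear in the documents and, for each word, where it appears in the documents.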
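A forward index maps each document ID to the terms in that document's fields:
Doc ID | Term | Term |
---|---|---|
1 | 中国 (China) | 中华人民共和国 (People's Republic of China) |
2 | 中国 (China) | 美国 (United States) |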
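An inverted index tokenizes the document fields and maps each term back to the IDs of the documents that contain it:
Term | Doc 1 | Doc 2 |
---|---|---|
中国 (China) | X | X |
中华人民共和国 (People's Republic of China) | X | |
美国 (United States) | | X |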
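Search suggestion with Elasticsearch works the same way: an analyzer splits the text into tokens, the inverted index is used to find the documents that contain those tokens, and the matches are returned for display.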
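The forward-to-inverted conversion described above can be sketched in a few lines of plain Java. This is only an illustration, not how Elasticsearch stores its index: it assumes the documents are already tokenized, and `InvertedIndexSketch` and `build` are names invented for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class InvertedIndexSketch {
    // Maps each term to the sorted set of document IDs that contain it.
    static Map<String, Set<Integer>> build(Map<Integer, List<String>> forward) {
        Map<String, Set<Integer>> inverted = new TreeMap<>();
        for (Map.Entry<Integer, List<String>> doc : forward.entrySet()) {
            for (String term : doc.getValue()) {
                Set<Integer> ids = inverted.get(term);
                if (ids == null) {
                    ids = new TreeSet<>();
                    inverted.put(term, ids);
                }
                ids.add(doc.getKey());
            }
        }
        return inverted;
    }

    public static void main(String[] args) {
        // Forward index: document ID -> terms (already tokenized).
        Map<Integer, List<String>> forward = new TreeMap<>();
        forward.put(1, Arrays.asList("中国", "中华人民共和国"));
        forward.put(2, Arrays.asList("中国", "美国"));

        Map<String, Set<Integer>> inverted = build(forward);
        System.out.println(inverted.get("中国"));   // prints [1, 2]
        System.out.println(inverted.get("美国"));   // prints [2]
    }
}
```

Looking up a term is then a single map access, no matter how many documents there are.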
Integrating Elasticsearch with Spring Boot for search suggestions
The pom.xml file
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>1.5.1.RELEASE</version>
</parent>
<dependencies>
<!-- Spring Boot Elasticsearch dependency -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>
<!-- Spring Boot Web dependency -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>
<!-- Junit -->
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.12</version>
</dependency>
<dependency>
<groupId>org.nlpcn</groupId>
<artifactId>nlp-lang</artifactId>
<version>1.7.6</version>
</dependency>
</dependencies>
This sets up a simple Spring Boot project integrated with Elasticsearch, plus the nlp-lang jar for pinyin conversion.
The annotated entity
@Document(indexName = "cityindex", type = "citysuggest")
public class CitySuggest implements Serializable{
/**
*
*/
private static final long serialVersionUID = 1L;
@Field(type=FieldType.Long)
private Long id;
@Field(type=FieldType.String)
private String keyword;
@CompletionField(analyzer="ik_smart",searchAnalyzer="ik_smart",payloads=false)
private Completion suggesttag;
public Long getId() {
return id;
}
public void setId(Long id) {
this.id = id;
}
public Completion getSuggesttag() {
return suggesttag;
}
public void setSuggesttag(Completion suggesttag) {
this.suggesttag = suggesttag;
}
public String getKeyword() {
return keyword;
}
public void setKeyword(String keyword) {
this.keyword = keyword;
}
}
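Note: within one index, the mapping of the id field (here @Field(type=FieldType.Long)) must be the same in every type; otherwise the suggestion field defined below will not work.
Completion is the field type that implements suggestions; it corresponds to "type": "completion" in the official documentation.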
This project uses Elasticsearch 2.3.3 and Spring Boot 1.5.1.RELEASE (the version declared in the pom.xml above).
Tokenizing the text and indexing the suggestions
public boolean updateSuggest(City city) {
AnalyzeRequestBuilder requestBuilder = new AnalyzeRequestBuilder(esClient, AnalyzeAction.INSTANCE, "cityindex", city.getCityname());
requestBuilder.setAnalyzer("ik_smart");
AnalyzeResponse response = requestBuilder.get();
List<AnalyzeToken> tokens = response.getTokens();
List<String> input = new ArrayList<String>();
List<CitySuggest> citySuggests = new ArrayList<CitySuggest>();
for (AnalyzeToken token : tokens) {
if (token.getTerm().length() < 2) {
continue;
}
if (!input.contains(token.getTerm())) {
input.add(token.getTerm());
}
}
// Build one suggestion document per keyword: the term itself, its full pinyin, and its pinyin initials
for(int i=0,j=input.size();i<j;i++){
CitySuggest citySuggest = new CitySuggest();
List<String> itemInput = new ArrayList<String>();
itemInput.add(input.get(i));
itemInput.add(Pinyin.list2StringSkipNull(Pinyin.pinyin(input.get(i)),""));
itemInput.add(Pinyin.list2StringSkipNull(Pinyin.firstChar(input.get(i)),""));
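// Completion takes its inputs as a String[]; this stands in for the list2String helper, whose definition is not shown
Completion completion = new Completion(itemInput.toArray(new String[itemInput.size()]));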
completion.setOutput(input.get(i));
citySuggest.setId((i+1L));
citySuggest.setSuggesttag(completion);
citySuggest.setKeyword(input.get(i));
citySuggests.add(citySuggest);
}
for(int i=0;i<citySuggests.size();i++){
citySuggestRepository.save(citySuggests.get(i));
}
return true;
}
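The keyword handling in the method above boils down to two steps: drop single-character and duplicate tokens, then give each remaining term three inputs (the term, its full pinyin, its pinyin initials). Here is a dependency-free sketch of just those two steps. The names `SuggestInputSketch`, `filterTerms` and `buildInputs` are hypothetical, and the pinyin strings are passed in by hand here, whereas the method above gets them from nlp-lang's Pinyin class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class SuggestInputSketch {
    // Drops terms shorter than two characters and duplicates, keeping first-seen order.
    static List<String> filterTerms(List<String> terms) {
        Set<String> kept = new LinkedHashSet<>();
        for (String term : terms) {
            if (term.length() >= 2) {
                kept.add(term);
            }
        }
        return new ArrayList<>(kept);
    }

    // Collects the inputs for one term: the term itself, its full pinyin, and its initials.
    // The pinyin values are supplied by the caller (an assumption of this sketch).
    static String[] buildInputs(String term, String fullPinyin, String initials) {
        return new String[] { term, fullPinyin, initials };
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("面膜", "的", "补水", "面膜");
        System.out.println(filterTerms(tokens));
        System.out.println(Arrays.toString(buildInputs("面膜", "mianmo", "mm")));
    }
}
```

Registering the pinyin forms as extra inputs is what lets a query like mm come back with 面膜.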
The method that fetches suggestions
public List<String> suggest(String prefix) {
CompletionSuggestionBuilder suggestion = SuggestBuilders.completionSuggestion("complete");
suggestion.analyzer("ik_smart");
// suggesttag is the field that holds the suggestion data
suggestion.text(prefix).field("suggesttag");
SearchResponse response = this.esClient.prepareSearch("cityindex").setTypes("citysuggest").addSuggestion(suggestion).execute().actionGet();
Suggest suggest = response.getSuggest();
// No suggestion data at all
if (suggest == null) {
return new ArrayList<String>();
}
List<? extends Suggest.Suggestion.Entry<? extends Suggest.Suggestion.Entry.Option>> list = response.getSuggest().getSuggestion("complete").getEntries();
List<String> suggestList = new ArrayList<String>();
if (list == null) {
return suggestList; // empty list, consistent with the no-data case above
} else {
for (Suggest.Suggestion.Entry<? extends Suggest.Suggestion.Entry.Option> e : list) {
for (Suggest.Suggestion.Entry.Option option : e) {
suggestList.add(option.getText().toString());
}
}
}
return suggestList;
}
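As a mental model for what the query above returns, the sketch below keeps a sorted map from every input to its output and answers a prefix with a range lookup. It is a simplification for illustration only: Elasticsearch's completion suggester uses its own in-memory structure, not a TreeMap, and `CompletionSketch` with its methods is a name made up here.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

class CompletionSketch {
    // Sorted map from each input string to the outputs it should suggest.
    private final TreeMap<String, Set<String>> outputsByInput = new TreeMap<>();

    // Registers one output under several inputs (term, full pinyin, initials).
    void add(String output, String... inputs) {
        for (String input : inputs) {
            Set<String> outputs = outputsByInput.get(input);
            if (outputs == null) {
                outputs = new LinkedHashSet<>();
                outputsByInput.put(input, outputs);
            }
            outputs.add(output);
        }
    }

    // Returns the distinct outputs of all inputs that start with the prefix.
    // Assumes no input contains the character U+FFFF, which serves as the upper bound.
    List<String> suggest(String prefix) {
        Set<String> result = new LinkedHashSet<>();
        for (Set<String> outputs
                : outputsByInput.subMap(prefix, prefix + Character.MAX_VALUE).values()) {
            result.addAll(outputs);
        }
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        CompletionSketch index = new CompletionSketch();
        index.add("面膜", "面膜", "mianmo", "mm");
        index.add("面霜", "面霜", "mianshuang", "ms");
        System.out.println(index.suggest("mm"));    // prints [面膜]
        System.out.println(index.suggest("mian"));  // prints [面膜, 面霜]
    }
}
```

Because the map is sorted, the lookup only walks the keys that share the prefix, and the pinyin inputs resolve to the same Chinese output as the term itself.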
Testing the endpoint
http://10.0.0.80:8080/api/city/suggest?content=mm
Sample response
{
"result": 0,
"msg": "获取数据成功",
"nowtime": 1551255460698,
"suggests": [
"面膜"
]
}
Creating the mapping dynamically
POST /gangyanindex/goodsuggest/_mapping
{
"goodsuggest": {
"properties": {
"suggesttag": {
"max_input_length": 50,
"payloads": false,
"analyzer": "ik_smart",
"preserve_position_increments": true,
"type": "completion",
"preserve_separators": true
},
"id": {
"index": "not_analyzed",
"type": "string"
},
"keyword": {
"type": "string"
}
}
}
}