Building High-Performance Real-Time Search in a Rails App with Redis


李华顺

Name: 李华顺 (Jason Lee)

Twitter: @huacnlee

Github: http://github.com/huacnlee

者也 (zheye.org), 淘宝 MED (Taobao MED)


Search engine projects currently available

But I am not going to talk about them!

Background

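• Built the website 者也 (zheye.org);
• Needed an efficient search feature similar to Quora's;
• Developed with Ruby on Rails, with MongoDB as the database;
• Chinese search, which requires word segmentation;
• Needed character-by-character (as-you-type) matching;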

http://quora.com/

http://www.mongodb.org/

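Requirements for this search feature

• Return search results the instant a key is pressed;
• MongoDB support;
• No complex queries needed; a single field serves as the search condition;
• Character-by-character (as-you-type) matching;
• Word segmentation and fuzzy matching;
• Real-time index updates;
• Sorting;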


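Why not Sphinx or another open-source project?

• Query speed cannot meet the need to respond the instant a key is pressed
• No ready-made component was available for MongoDB
• Character-by-character matching is required
• The index must be updated in real time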


The initial implementation

Redis commands used: SET, KEYS *keyword*, MGET

    require 'json'

    class Ask
      after_create do
        key = "quora:#{self.title.downcase}"
        # Store the value as JSON so that search can parse it back
        $redis.set(key, {:id => self.id, :title => self.title, :type => self.type}.to_json)
      end

      before_destroy do
        $redis.del("quora:#{self.title_was.downcase}")
      end

      def self.search(text, limit = 10)
        words = RMMSeg.split(text)
        keys = $redis.keys("*#{words.collect(&:downcase).join("*")}*")[0, limit]
        return [] if keys.empty?  # MGET needs at least one key
        items = $redis.mget(*keys).map { |r| JSON.parse(r) }
        # Descending by type; return the sorted array (sort does not modify items)
        items.sort { |a, b| b['type'] <=> a['type'] }
      end
    end

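Problems

• Gets slower and slower once the data exceeds 100,000 records
• Segmented-word search only matches when the words are entered in the same order as in the title
• No way to sort the results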
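The order problem follows from the glob pattern itself: joining the segmented words with * fixes their order. A minimal sketch, using Ruby's File.fnmatch as a stand-in for the glob matching done by KEYS (an assumption made for illustration; glob_for, key_matches? and SAMPLE_KEY are hypothetical names, and the key is modeled on the demo titles):

```ruby
# Stand-in for `KEYS *word1*word2*`: the glob pattern is order-sensitive.
# File.fnmatch only approximates Redis glob matching; it is used here
# so the sketch runs without a Redis server.

# Build the same pattern the initial implementation passes to KEYS.
def glob_for(words)
  "*#{words.map(&:downcase).join('*')}*"
end

# True when the key matches the pattern built from the words, in that order.
def key_matches?(words, key)
  File.fnmatch(glob_for(words), key)
end

# Hypothetical key, shaped like "quora:#{title.downcase}".
SAMPLE_KEY = "quora:ruby 和 python 那个更好"

puts key_matches?(%w[Ruby Python], SAMPLE_KEY)  # words in title order
puts key_matches?(%w[Python Ruby], SAMPLE_KEY)  # same words, reversed
```

The first call matches and the second does not, even though both queries contain the same words.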

How can it be improved?

Making use of Redis features

• Sets (keyword index): SADD, SREM, SINTER, SUNION
• Hashes (entity data): HSET, HMGET, HDEL
• Sorted Sets (prefix-match index): ZADD, ZRANK, ZRANGE

http://redis.io/commands/sinter

http://redis.io/commands/sunion

http://redis.io/commands/sadd

http://redis.io/commands/srem

http://redis.io/commands/hmget

http://redis.io/commands/hdel

http://redis.io/commands/hset

http://redis.io/commands/zadd

http://redis.io/commands/zrank

http://redis.io/commands/zrange

Redis-Search index structure

Demo data:

Ask

    { 'id' : 1, 'title' : 'Ruby on Rails 为什么如此高效？', 'score' : 4 }
    { 'id' : 2, 'title' : 'Ruby 编程入门应该看什么书籍？', 'score' : 20 }
    { 'id' : 3, 'title' : 'Ruby 和 Python 那个更好 ?', 'score' : 13 }
    { 'id' : 4, 'title' : '做 Python 开发应该用什么开发工具比较好？', 'score' : 5 }

Topic

    { 'id' : 1, 'name' : 'Ruby', 'score' : 5 }
    { 'id' : 2, 'name' : 'Rails', 'score' : 18 }
    { 'id' : 3, 'name' : 'Rubies', 'score' : 10 }
    { 'id' : 4, 'name' : 'Rake', 'score' : 4 }
    { 'id' : 5, 'name' : 'Python', 'score' : 2 }

Keyword index (Sets)

    topic:ruby    [1]
    topic:rails   [2]
    topic:rubies  [3]
    topic:rake    [4]
    ask:ruby      [1,2,3]
    ask:rails     [1]
    ask:python    [3,4]
    ask:什么      [1,2,4]
    ......

Score sort index (Sets)

    ask:_score_:1    4
    ask:_score_:2    20
    ask:_score_:3    13
    ask:_score_:4    5
    topic:_score_:1  5
    topic:_score_:2  18
    topic:_score_:3  10
    topic:_score_:4  4
    ......

Prefix-match index (Sorted Sets), built when prefix_index_enable = true

    1. r
    2. ra
    3. rai
    4. rail
    5. rails*
    6. rak
    7. rake*
    8. ru
    9. rub
    10. rubi
    11. rubie
    12. rubies*
    13. ruby*

‣ Entries marked with * are actual words
‣ Entries are stored in sorted order automatically

http://redis.io/commands%23sorted_set

http://redis.io/commands%23set

http://redis.io/commands%23set
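The prefix list above can be reproduced with a few lines of Ruby. This is a minimal in-memory sketch, assuming each word contributes all of its proper prefixes plus the full word marked with *, and that a plain sorted array stands in for the Redis Sorted Set; prefix_entries and PREFIX_INDEX are hypothetical names, not part of redis-search:

```ruby
require "set"

# Build the entries of a prefix index for a list of words.
# Every proper prefix is stored once; the full word is stored with a
# trailing "*" so it can be told apart from a mere prefix.
def prefix_entries(words)
  entries = Set.new
  words.each do |word|
    w = word.downcase
    (1...w.length).each { |len| entries << w[0, len] }
    entries << "#{w}*"
  end
  # Members with equal scores in a Sorted Set are ordered
  # lexicographically; a sorted array imitates that here.
  entries.to_a.sort
end

# The four topic names starting with "r" from the demo data.
PREFIX_INDEX = prefix_entries(%w[Ruby Rails Rubies Rake])

PREFIX_INDEX.each_with_index do |entry, i|
  puts "#{i + 1}. #{entry}"
end
```

With a real Redis server each entry would be added with ZADD using the same score, which is what keeps the members in lexicographic order.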

Indexed record data (Hashes)

Topic

    topic:1 { 'id' : 1, 'name' : 'Ruby' }
    topic:2 { 'id' : 2, 'name' : 'Rails' }
    topic:3 { 'id' : 3, 'name' : 'Rubies' }
    topic:4 { 'id' : 4, 'name' : 'Rake' }
    topic:5 { 'id' : 5, 'name' : 'Python' }

Ask

    ask:1 { 'id' : 1, 'title' : 'Ruby on Rails 为什么如此高效？' }
    ask:2 { 'id' : 2, 'title' : 'Ruby 编程入门应该看什么书籍？' }
    ask:3 { 'id' : 3, 'title' : 'Ruby 和 Python 那个更好 ?' }
    ask:4 { 'id' : 4, 'title' : '做 Python 开发应该用什么开发工具比较好？' }

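Prefix-match search process

Prefix index (* marks an actual word):

    1. r  2. ra  3. rai  4. rail  5. rails*  6. rak  7. rake*
    8. ru  9. rub  10. rubi  11. rubie  12. rubies*  13. ruby*

    Input   Rank   Words found
    r       1      [rails, rake, rubies, ruby]
    ru      8      [rubies, ruby]
    ruby    13     [ruby]

The first two commands below omit the name of the prefix index key, as on the slide.

1. Look up the rank of the input in the prefix index (rub, for example, is at rank 9):

    redis> ZRANK r

2. Fetch the entries between rank 1 and rank 101, and keep those marked with *:

    redis> ZRANGE 1 100+1

3. Take the union of the keyword sets of the words found:

    redis> SUNIONSTORE topic:rubies+ruby topic:rubies topic:ruby

4. Sort by score:

    redis> SORT topic:rubies+ruby BY topic:_score_:* DESC LIMIT 0 10

5. The ids are returned to redis-search: [2,3,1,4] for r, [3,1] for ru, [1] for ruby.

6. Load the record data:

    redis> HMGET topic 2 3 1 4

Result (for the input r)

    { 'id' : 2, 'name' : 'Rails', 'score' : 18 }
    { 'id' : 3, 'name' : 'Rubies', 'score' : 10 }
    { 'id' : 1, 'name' : 'Ruby', 'score' : 5 }
    { 'id' : 4, 'name' : 'Rake', 'score' : 4 }

Source of the prefix index algorithm: http://antirez.com/post/autocomplete-with-redis.html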

http://redis.io/commands/zrange

http://redis.io/commands/sort


http://redis.io/commands/zrank

http://redis.io/commands/sunionstore
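The whole prefix-match flow can be traced without a Redis server. The following is a minimal in-memory sketch over the Topic demo data, assuming arrays and hashes as stand-ins for the Sorted Set, Sets and score keys; complete_ids, TOPICS, TOPIC_KEYWORDS and TOPIC_PREFIXES are hypothetical names, and the rank lookup finds the first entry starting with the input rather than calling ZRANK on an exact member:

```ruby
# Topic demo data: id => record.
TOPICS = {
  1 => { "name" => "Ruby",   "score" => 5 },
  2 => { "name" => "Rails",  "score" => 18 },
  3 => { "name" => "Rubies", "score" => 10 },
  4 => { "name" => "Rake",   "score" => 4 },
  5 => { "name" => "Python", "score" => 2 }
}

# Keyword index: word => ids (stand-in for the topic:<word> Sets).
TOPIC_KEYWORDS = {
  "ruby" => [1], "rails" => [2], "rubies" => [3], "rake" => [4], "python" => [5]
}

# Prefix index in sorted order (stand-in for the Sorted Set).
TOPIC_PREFIXES = %w[r ra rai rail rails* rak rake* ru rub rubi rubie rubies* ruby*]

def complete_ids(input, limit = 10)
  prefix = input.downcase
  # Stand-in for ZRANK: position of the first entry with this prefix.
  start = TOPIC_PREFIXES.index { |e| e.start_with?(prefix) }
  return [] unless start

  # Stand-in for ZRANGE start start+100: scan forward, keep starred entries.
  words = []
  TOPIC_PREFIXES[start, 101].each do |entry|
    break unless entry.start_with?(prefix)
    words << entry.chomp("*") if entry.end_with?("*")
  end

  # Stand-in for SUNIONSTORE: union of the ids of every word found.
  ids = words.flat_map { |w| TOPIC_KEYWORDS.fetch(w, []) }.uniq

  # Stand-in for SORT ... BY score DESC LIMIT 0 limit.
  ids.sort_by { |id| -TOPICS[id]["score"] }.first(limit)
end

%w[r ru ruby].each do |input|
  puts "#{input} -> #{complete_ids(input).inspect}"
end
```

The final HMGET step corresponds to looking the returned ids up in TOPICS.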


Segmented-word search process

    Input            Segmented words        Intersection (in Redis)
    Ruby             [ruby]                 [1,2,3]
    Ruby 什么        [ruby, 什么]           [1,2]
    Ruby 什么书籍    [ruby, 什么, 书籍]     [2]

1. Intersect the keyword sets of the segmented words:

    redis> SINTERSTORE ask:ruby+什么+书籍 ask:ruby ask:什么 ask:书籍

2. Sort by score:

    redis> SORT ask:ruby+什么+书籍 BY ask:_score_:* DESC LIMIT 0 10

3. The ids are returned to redis-search: [2,3,1] for the first input, [2,1] for the second, [2] for the third.

4. Load the record data:

    redis> HMGET ask 2 3 1

Result (for the input Ruby)

    { 'id' : 2, 'title' : 'Ruby 编程入门应该看什么书籍？', 'score' : 20 }
    { 'id' : 3, 'title' : 'Ruby 和 Python 那个更好 ?', 'score' : 13 }
    { 'id' : 1, 'title' : 'Ruby on Rails 为什么如此高效？', 'score' : 4 }

http://redis.io/commands/sinterstore
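The intersection and sort steps can be checked against the Ask demo data with a short in-memory sketch. It assumes the input has already been segmented into words (the job of rmmseg-cpp in the real gem) and uses plain arrays in place of Redis Sets; query_ids, ASKS and ASK_KEYWORDS are hypothetical names, and the entry for 书籍 is derived from the demo titles rather than shown on the slide:

```ruby
# Ask demo data: id => score (titles omitted, only the score is needed to sort).
ASKS = {
  1 => { "score" => 4 },
  2 => { "score" => 20 },
  3 => { "score" => 13 },
  4 => { "score" => 5 }
}

# Keyword index: word => ids (stand-in for the ask:<word> Sets).
ASK_KEYWORDS = {
  "ruby"   => [1, 2, 3],
  "rails"  => [1],
  "python" => [3, 4],
  "什么"   => [1, 2, 4],
  "书籍"   => [2]  # assumption: only ask 2 contains this word
}

# words: the already segmented query, for example ["Ruby", "什么"].
def query_ids(words, limit = 10)
  sets = words.map { |w| ASK_KEYWORDS.fetch(w.downcase, []) }
  return [] if sets.empty?

  # Stand-in for SINTERSTORE: ids present in every keyword set,
  # independent of the order in which the words were typed.
  ids = sets.reduce(:&)

  # Stand-in for SORT ... BY score DESC LIMIT 0 limit.
  ids.sort_by { |id| -ASKS[id]["score"] }.first(limit)
end

puts query_ids(%w[Ruby]).inspect
puts query_ids(%w[Ruby 什么]).inspect
puts query_ids(%w[Ruby 什么 书籍]).inspect
```

Because set intersection is commutative, the word order restriction of the KEYS-based implementation disappears.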



so...

Redis-Search

ActiveRecord

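Redis-Search features

• On an iMac, searches over 1,000,000+ records respond in under 10 ms per query;
• Search index updated in real time;
• Chinese word-segmentation search (rmmseg-cpp);
• Prefix-match search;
• No-SQL: no need to query the original database;
• Search by Hanyu Pinyin (chinese_pinyin);
• ActiveRecord and Mongoid support;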

https://github.com/pluskid/rmmseg-cpp

https://github.com/flyerhzm/chinese_pinyin

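Limitations of Redis-Search

• Only one field can be searched (alias search will be added later);
• Limited sorting options (only one at present);
• Extra conditions support only =, not > or < ...;
• Pinyin search can be slightly off for some homophones;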

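Use cases

• Article search;
• User search;
• Country and city matching;
• Friend matching;
• Category and tag matching;
• Matching other names (for example shop names, addresses, brands, books, movies, music ...);
• Related-content matching;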

How to use it?

Gemfile

    gem 'redis', '>= 2.1.1'
    gem 'chinese_pinyin', '0.4.1'
    gem 'rmmseg-cpp-huacnlee', '0.2.9'
    gem 'redis-namespace', '~> 1.1.0'
    gem 'redis-search', '0.7.0'

Install

    shell> bundle install

Configuration

config/initializers/redis_search.rb

    require "redis"
    require "redis-namespace"
    require "redis-search"

    redis = Redis.new(:host => "127.0.0.1", :port => "6379")
    redis.select(3)
    # Set a namespace to avoid clashing with other projects
    redis = Redis::Namespace.new("your_app_name:search", :redis => redis)

    Redis::Search.configure do |config|
      config.redis = redis
      # Threshold for prefix-match search. Set it according to the longest
      # text you need to prefix-match; the shorter the better.
      config.complete_max_length = 100
      # Whether to enable pinyin search
      config.pinyin_match = true
    end

Model configuration

    class User
      include Mongoid::Document
      include Redis::Search

      field :name
      field :tagline
      field :email
      field :followers_count, :type => Integer, :default => 0
      field :sex, :type => Integer, :default => 0

      # Enable the search index for this model
      # title_field          the field to search on
      # prefix_index_enable  whether to use character-by-character (prefix) matching
      # score_field          the field to sort by
      # condition_fields     extra condition fields
      # ext_fields           fields stored in the Hash; redis-search no longer queries
      #                      the original database, so any field needed for display
      #                      must be listed here
      redis_search_index(:title_field => :name,
                         :prefix_index_enable => true,
                         :score_field => :followers_count,
                         :condition_fields => [:sex],
                         :ext_fields => [:email, :tagline])
    end

Once configured, Redis-Search automatically updates the index and the Hash data in Redis whenever a record is created, updated, or destroyed, so you do not need to handle updates yourself.
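What keeping the index in sync on create, update and destroy involves can be pictured with a small in-memory sketch. It is an illustration under stated assumptions, not the gem's implementation: TinyIndex and its methods are hypothetical, a Ruby Hash of Sets stands in for the keyword Sets, and another Hash stands in for the record Hashes.

```ruby
require "set"

# Hypothetical in-memory index that mirrors the Redis operations by name.
class TinyIndex
  attr_reader :keywords, :records

  def initialize
    @keywords = Hash.new { |hash, word| hash[word] = Set.new }
    @records = {}
  end

  # On create: add the id to every keyword set and store the record data.
  def add(id, words, attrs)
    words.each { |w| @keywords[w.downcase] << id }  # like SADD
    @records[id] = attrs.merge("words" => words)    # like HSET
  end

  # On destroy: drop the record data and remove the id from its keyword sets.
  def remove(id)
    record = @records.delete(id)                    # like HDEL
    return unless record
    record["words"].each do |w|
      key = w.downcase
      @keywords[key].delete(id)                     # like SREM
      @keywords.delete(key) if @keywords[key].empty?
    end
  end

  # On update: the old words must be removed before the new ones are added,
  # otherwise stale keywords would keep pointing at the record.
  def update(id, words, attrs)
    remove(id)
    add(id, words, attrs)
  end
end

index = TinyIndex.new
index.add(1, %w[Ruby Rails], "title" => "Ruby on Rails")
index.update(1, %w[Python], "title" => "Python")
puts index.keywords.keys.inspect
```

The update case is the reason the record keeps its own list of words: without it there would be no cheap way to know which keyword sets still contain the id.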

Querying

Prefix-match search:

    rails c> Redis::Search.complete('User', 'hua', :conditions => {:sex => 1}, :limit => 20)

Regular segmented-word search:

    rails c> Redis::Search.query('Ask', 'Ruby敏捷开发', :conditions => {:state => 1}, :limit => 20)

Project home: https://github.com/huacnlee/redis-search

Thanks
