28
PGroonga & Zulip Powered by Rabbit 2.2.1 PGroonga & Zulip Kouhei Sutou ClearCode Inc. Zulip & PGroonga Night 2017-09-06

PGroonga & Zulip

Embed Size (px)

Citation preview

  1. 1. PGroonga & Zulip Powered by Rabbit 2.2.1 PGroonga & Zulip Kouhei Sutou ClearCode Inc. Zulip & PGroonga Night 2017-09-06
  2. 2. PGroonga & Zulip Powered by Rabbit 2.2.1 PGroonga Pronunciation: pz:ln PostgreSQL extension PostgreSQL Fast full text search All languages are supported!
  3. 3. PGroonga & Zulip Powered by Rabbit 2.2.1 Fast? Need to measure to confirm Targets textsearch (built-in) pg_bigm (third party)
  4. 4. PGroonga & Zulip Powered by Rabbit 2.2.1 PGroona and textsearch 0 0.2 0.4 0.6 0.8 1 1.2 1.4 PostgreSQL OR MySQL database America Data: English Wikipedia (Many records and large docs) N records: About 5.3millions Average text size: 6.4KiB Elapsedtime(ms) (Shorterisbetter) Query PGroonga textsearch
  5. 5. PGroonga & Zulip Powered by Rabbit 2.2.1 As fast as textsearch textsearch textsearch uses word based full text search textsearch PostgreSQL has enough performance for the approach PostgreSQL
  6. 6. PGroonga & Zulip Powered by Rabbit 2.2.1 textsearch and Japanese textsearch Asian languages including Japanese aren't supported Need plugin Plugin exists but isn't maintained
  7. 7. PGroonga & Zulip Powered by Rabbit 2.2.1 Japanese support Need one of them N-gram approach support N-gram Japanese specific word based approach support PGroonga supports both of them
  8. 8. PGroonga & Zulip Powered by Rabbit 2.2.1 PostgreSQL and N-gram PostgreSQLN-gram PostgreSQL is slow with N- gram approach PostgreSQLN-gram N-gram approach: pg_trgm (contrib) Japanese isn't supported by default pg_bigm (third-party)
  9. 9. PGroonga & Zulip Powered by Rabbit 2.2.1 PGroona and pg_bigm 0 0.5 1 1.5 2 2.5 3 311 14706 20389 Data: Japanese Wikipedia (Many records and large documents) N records: About 0.9millions Average text size: 6.7KiB Fast Fast Elapsedtime(sec) (Lowerisbetter) N hits PGroonga pg_bigm
  10. 10. PGroonga & Zulip Powered by Rabbit 2.2.1 PGroonga is fast stably PGroonga PostgreSQL needs "recheck" for N-gram approach PostgreSQLN-gramrecheck Seq search after index search PGroonga doesn't need PGroonga Only index search OK
  11. 11. PGroonga & Zulip Powered by Rabbit 2.2.1 Wrap up textsearch is fast but Asian langs aren't supported textsearch pg_bigm supports Japanese but is slow for large hits pg_bigm PGroonga is fast and supports all languages PGroonga
  12. 12. PGroonga & Zulip Powered by Rabbit 2.2.1 FYI: textsearch, PGroonga and Groonga 0 0.2 0.4 0.6 0.8 1 1.2 1.4 PostgreSQL OR MySQL database America Data: English Wikipedia (Many records and large docs) N records: About 5.3millions Average text size: 6.4KiB Groonga is 30x faster than others Elapsedtime(ms) (Shorterisbetter) Query PGroonga Groonga textsearch
  13. 13. PGroonga & Zulip Powered by Rabbit 2.2.1 Zulip and PGroonga Zulip uses textsearch by default Zuliptextsearch Japanese isn't supported Zulip supports PGroonga as option ZulipPGroonga Implemented by me
  14. 14. PGroonga & Zulip Powered by Rabbit 2.2.1 Zulip: full text search Zulip Zulip is chat tool Zulip Latency is important for UX UX Index update is heavy Delay index update
  15. 15. PGroonga & Zulip Powered by Rabbit 2.2.1 Delay index update CREATE TABLE zerver_message ( rendered_content text, -- ... Column for full text search search_tsvector tsvector ); -- Index for full text search CREATE INDEX zerver_message_search_tsvector ON zerver_message USING gin (search_tsvector);
  16. 16. PGroonga & Zulip Powered by Rabbit 2.2.1 Delay index update -- Execute append_to_fts_update_log() on change CREATE TRIGGER zerver_message_update_search_tsvector_async BEFORE INSERT OR UPDATE OF rendered_content ON zerver_message FOR EACH ROW EXECUTE PROCEDURE append_to_fts_update_log();
  17. 17. PGroonga & Zulip Powered by Rabbit 2.2.1 Delay index update -- Insert ID to fts_update_log table CREATE FUNCTION append_to_fts_update_log() RETURNS trigger LANGUAGE plpgsql AS $$ BEGIN INSERT INTO fts_update_log (message_id) VALUES (NEW.id); RETURN NEW; END $$;
  18. 18. PGroonga & Zulip Powered by Rabbit 2.2.1 Delay index update -- Keep ID to be updated CREATE TABLE fts_update_log ( id SERIAL PRIMARY KEY, message_id INTEGER NOT NULL );
  19. 19. PGroonga & Zulip Powered by Rabbit 2.2.1 Delay index update -- Execute do_notify_fts_update_log() -- on INSERT CREATE TRIGGER fts_update_log_notify AFTER INSERT ON fts_update_log FOR EACH STATEMENT EXECUTE PROCEDURE do_notify_fts_update_log();
  20. 20. PGroonga & Zulip Powered by Rabbit 2.2.1 Delay index update -- NOTIFY to fts_update_log channel! CREATE FUNCTION do_notify_fts_update_log() RETURNS trigger LANGUAGE plpgsql AS $$ BEGIN NOTIFY fts_update_log; RETURN NEW; END $$;
  21. 21. PGroonga & Zulip Powered by Rabbit 2.2.1 Delay index update cursor.execute("LISTEN ftp_update_log") # Wait cursor.execute("SELECT id, message_id FROM fts_update_log") ids = [] for (id, message_id) in cursor.fetchall(): cursor.execute("UPDATE zerver_message SET search_tsvector = " "to_tsvector('zulip.english_us_search', " "rendered_content) " "WHERE id = %s", (message_id,)) ids.append(id) cursor.execute("DELETE FROM fts_update_log WHERE id = ANY(%s)", (ids,))
  22. 22. PGroonga & Zulip Powered by Rabbit 2.2.1 PGroonga: index update PGroonga PGroonga's index update is fast too PGroonga PGroonga's search while index update is still fast PGroonga
  23. 23. PGroonga & Zulip Powered by Rabbit 2.2.1 Perf characteristics Searchthroughput Update throughput PGroonga Searchthroughput Update throughput GIN Keep search performance while many updates Decrease search performance while updating
  24. 24. PGroonga & Zulip Powered by Rabbit 2.2.1 Update and lock Update without read locks Write locks are required
  25. 25. PGroonga & Zulip Powered by Rabbit 2.2.1 GIN: Read/Write GIN Conn1 Conn2 INSERT start SELECT start Blocked INSERT finish SELECT finish GIN Slow down!
  26. 26. PGroonga & Zulip Powered by Rabbit 2.2.1 PGroonga: Read/Write PGroonga Conn1 Conn2 INSERT start SELECT start INSERT finish SELECT finish PGroonga No slow down!
  27. 27. PGroonga & Zulip Powered by Rabbit 2.2.1 Wrap up Zulip: Low latency for UX ZulipUX Delay index update PGroonga: Keeps fast search with update PGroonga Chat friendly characteristics
  28. 28. PGroonga & Zulip Powered by Rabbit 2.2.1 More PGroonga features PGroonga Query expansion Support synonyms Similar search Find similar messages Fuzzy search Stemming