Technology Conference - Rakuten Ichiba
22nd October, 2016
E-Commerce Company
Rakuten, Inc.
Japan Ichiba
Cross border trading
Taiwan Ichiba
Rakuten Ichiba
Agenda
1. R-Framework
2. Rakuten Ichiba iOS App
3. Redis Cluster
4. Rakuten Catalog Platform
R-Framework
22nd October, 2016
Daniel Berlanga
EC Marketplace Mall Development Department
Updating legacy systems
Question for (frontend) developers:
• Who’s using new frameworks?
• Who’s using jQuery?
• Who’s using something that you’d call legacy?
How to renew a system
• Rebuild it from scratch with a new technology
• Time-consuming
• Big impact
[Diagram: DB, API, Controllers, and Frontend (HTML, CSS, JavaScript), marked ❌]
How to renew a system
Change the system one piece at a time:
• Less impact
• Backward compatibility
Phase 1: Set a base of standard code shared by developers
Phase 2: ???? (Use the shared base of standard code to move to a new technology)
Phase 3: PROFIT!!!
Step 1: Creating a base of shared code
[Chart: amount of code required vs. amount of shared code]
R Framework
R is a JavaScript framework open to everyone in Rakuten Ichiba
• Common code sharing
• Unit tests
• Automatic building
• Componentization
• Improve development performance
• Scalability
R family
[Diagram: the R family
• Rmodules (modules)
• R
• R.ui: R.ui.Slideshow, R.ui.Lightbox, R.ui.Tabs
• R.api: R.api.browsingHistory, R.api.itemRecommend, R.api.kaimawari, R.api.scv
• R.Item, R.Shop, R.Search Keyword
• R util: enums, browser, Cache, cookies, display, Storage, user, DataRequester, Validator, loadImage
• Version labels: jquery 1.12.0, 2.2.0; visearch 1.0.0; 2.6.1]
Rmodules
Glue for all the components. Provides the definition of the dynamic modules and enables loading on request
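The "load on request" idea can be sketched as a registry that stores one factory per module and only runs it the first time the module is asked for. This is an illustrative Python sketch rather than the framework's own JavaScript; `ModuleRegistry`, `define`, `require`, and `is_loaded` are hypothetical names, not R's actual API.

```python
from typing import Any, Callable, Dict


class ModuleRegistry:
    """Hypothetical registry: modules are defined up front, built on first request."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], Any]] = {}
        self._loaded: Dict[str, Any] = {}

    def define(self, name: str, factory: Callable[[], Any]) -> None:
        # Defining a module only stores its factory; nothing is built yet.
        self._factories[name] = factory

    def require(self, name: str) -> Any:
        # Build the module the first time it is requested, then reuse it.
        if name not in self._loaded:
            if name not in self._factories:
                raise KeyError(f"module not defined: {name}")
            self._loaded[name] = self._factories[name]()
        return self._loaded[name]

    def is_loaded(self, name: str) -> bool:
        return name in self._loaded


registry = ModuleRegistry()
# The module body here is a placeholder; a real module would be a UI component.
registry.define("R.ui.Tabs", lambda: {"name": "Tabs"})
```

A page that never calls `registry.require("R.ui.Tabs")` never pays for building that module, which is the point of deferring the load.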
R
Collection of useful functions shared among all the JavaScript parts, providing
compatibility with all supported browsers
• R.enums
• R.browser
• R.Cache
• R.cookies
• R.display
• R.Storage
• R.user
• R.Validator
R.ui
UI components are used like widgets: they accept data and options and
behave consistently across different pages
• R.ui.Slideshow
• R.ui.Lightbox
• R.ui.Tabs
• …
R.api
JavaScript wrappers for all used APIs:
• Abstraction of the API implementation
• Avoid the use of magic numbers: enumerated values are implemented
• Used as plain JavaScript functions, with no need to worry about server-client interaction
• Timeout error handling
• Invalid data handling
• Methods for easier/faster testing
• Mockup data available for testing
• Cached responses
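The bullets above describe a common wrapper pattern. The sketch below shows it in Python for brevity (R itself is JavaScript): an enum instead of magic numbers, a timeout turned into a dedicated error, payload validation, a small response cache, and a mock transport for tests. Every name here (`ItemRecommendApi`, `Sort`, `ApiTimeout`, `ApiBadData`, `mock_transport`, the payload shape) is an assumption made for illustration, not the real R.api interface.

```python
import time
from enum import Enum
from typing import Any, Callable, Dict, Tuple


class Sort(Enum):
    """Enumerated values instead of magic numbers (the values are made up)."""
    NEWEST = 1
    CHEAPEST = 2


class ApiTimeout(Exception):
    """Raised when the backend does not answer in time."""


class ApiBadData(Exception):
    """Raised when the backend answers with an unexpected payload."""


# A transport takes (params, timeout_seconds) and returns a decoded payload.
Transport = Callable[[Dict[str, Any], float], Any]


class ItemRecommendApi:
    def __init__(self, transport: Transport, timeout_s: float = 2.0,
                 cache_ttl_s: float = 60.0) -> None:
        self._transport = transport
        self._timeout_s = timeout_s
        self._cache_ttl_s = cache_ttl_s
        self._cache: Dict[Tuple[str, Sort], Tuple[float, list]] = {}

    def recommend(self, item_id: str, sort: Sort = Sort.NEWEST) -> list:
        key = (item_id, sort)
        cached = self._cache.get(key)
        if cached and time.monotonic() - cached[0] < self._cache_ttl_s:
            return cached[1]  # cached response, no transport call
        try:
            payload = self._transport({"item": item_id, "sort": sort.value},
                                      self._timeout_s)
        except TimeoutError as exc:
            raise ApiTimeout(f"no answer within {self._timeout_s}s") from exc
        # Validate the payload shape before handing it to the caller.
        if not isinstance(payload, dict) or not isinstance(payload.get("items"), list):
            raise ApiBadData("expected a dict with an 'items' list")
        self._cache[key] = (time.monotonic(), payload["items"])
        return payload["items"]


def mock_transport(params: Dict[str, Any], timeout_s: float) -> Any:
    """Mockup data for tests: no server needed."""
    return {"items": [f"rec-for-{params['item']}"]}
```

Callers only see `ItemRecommendApi(mock_transport).recommend("42", Sort.CHEAPEST)`; swapping the mock for a real HTTP transport does not change the call site.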
FIN
@danikaze
/in/danikaze
Reconstruct a million-user App
22nd October, 2016
Jin Nagumo
EC Strategy Department
Lead engineer of Ichiba iOS App
Agenda
• About Rakuten Ichiba iOS app
• Problems
• Strategy & Action
• Lesson learned
About Rakuten Ichiba iOS app
• The first app listed when searching “Rakuten” in the App Store
• Main app with the basic Rakuten Ichiba functions
• Share Rakuten Ichiba items with the iMessage Extension (iOS 10)
Tens of millions of active users every day!
Problems
Ill-formed Structure With No Strategy
Easily Broken Code Everywhere
Fragile & Difficult To Add New Features
Small Failure Causes Huge Negative Impact
Conservative Mindset Impedes Innovation
How To Right This Wrong?
Rewrite From Scratch
If we go with refactoring…
• Time spent on legacy >>>>>>>> Time spent on redesign
• No way to deal with wrongly chosen technology
• Very limited space for a starting point
• Too fragile to survive
Reconstruct the app and get everything done correctly this time!
Fully Reconstruct With Careful Redesign
Data Model & Basic Business Logic
Local & Remote Data Provider
Data Source Containing Business Logic For Specific UI
Screen-Specific UI
Shared UI Components
Rakuten Ichiba Kit
Detailed Documentation For The Overall Architecture
Implementation Tutorials To Prevent Bad Code
Revised Code Review Perspective
- No faultfinding, no personal preferences
- Decreasing code dependency
- “If you think you are fully capable of debugging this code alone, then you can mark it as approved”
- Important place/process for us to learn from each other
Horizontal Assignment To Achieve Max Productivity
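Previously (each developer owns one feature across all layers):
• Developer 1: Feature 1 (UI, N/W, B/L)
• Developer 2: Feature 2 (UI, N/W, B/L)
• Developer 3: Feature 3 (UI, N/W, B/L)
Currently (each developer owns one layer across Features 1, 2, and 3):
• Developer 1: UI
• Developer 2: N/W
• Developer 3: B/L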
Lesson Learned
Refactor Early, Refactor Often
• Code base = patient
• Continuously refactor including backbone
• Consistent guidelines & tutorials
• The right person does the right thing
• Stateless code makes things testable
Redis Cluster in Ichiba
22nd October, 2016
Kejun Huang Twitter: @iandyh GitHub: @iandyh
EC Marketplace RMS Development Department
Redis Cluster supports some of our most important services (Taiwan)
Why Redis Cluster
• Redis is great: a data structure server with great performance (1 million QPS with pipelining)
• Distributed Redis is needed because a single instance is not enough
• Production ready since March 2015, and it keeps getting better
• Easy to provision and manage (compared to the existing solution)
Redis Cluster Introduction
Master A (slots: 0-5460), Master B (slots: 5461-10922), Master C (slots: 10923-16383)
Slave of A, Slave of B, Slave of C (each replicates from its master)
Slot = crc16(key) mod 16384. The client maintains a map between slots and Redis nodes.
Redis Cluster nodes use gossip to learn each other's state and make decisions according to the view held by the majority of the masters.
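The slot lookup described above can be sketched as follows. Redis Cluster uses the XMODEM variant of CRC16 and 16384 slots; the node names and slot ranges are the three-master example from the slide. Hash tags (the `{...}` part of a key) are left out of this sketch, and `node_for_key` is a hypothetical helper, not part of any client library.

```python
from typing import Dict, Tuple

NUM_SLOTS = 16384

# Slot ranges from the three-master example (inclusive bounds).
SLOT_MAP: Dict[str, Tuple[int, int]] = {
    "Master A": (0, 5460),
    "Master B": (5461, 10922),
    "Master C": (10923, 16383),
}


def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0, MSB first."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: bytes) -> int:
    # Hash tags are deliberately ignored in this sketch.
    return crc16_xmodem(key) % NUM_SLOTS


def node_for_key(key: bytes) -> str:
    """Client-side lookup: find the master whose slot range contains the key."""
    slot = key_slot(key)
    for node, (low, high) in SLOT_MAP.items():
        if low <= slot <= high:
            return node
    raise ValueError(f"slot {slot} is not assigned")
```

Because the mapping is computed on the client, a request can go straight to the right master without a proxy in between.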
The limitation of Redis Cluster
• It supports up to only 1000 nodes, so very large clusters are not possible
• If the majority of the masters cannot be reached, the cluster stops functioning
• You must have at least one slave for each master
• During a small window, acknowledged writes can be lost
Why we chose Redis Cluster
• Applications were already using Redis
• We do not want to maintain Twemproxy + Sentinel + Redis
• We mainly use it as a cache, so write safety is not the greatest concern
• It performed well in testing, especially during failover, with no human involvement
• Scaling up (adding nodes to the cluster) is easy
• Client support in Java is mature
How we use Redis Cluster
• Redis is configured as an LRU cache (allkeys-lru) with persistence (AOF and RDB) turned off. Dangerous commands are renamed.
• We currently have 7 clusters running in production. Some applications have a dedicated cluster; others share one. Applications only connect to masters.
• Runs on machines with 4 cores, 32 GB of memory, and a 1 Gbit network. 4 Redis instances per machine, with maxmemory set to ~6 GB.
• Collectd, Graphite, and Grafana for monitoring.
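The cache-only setup above maps onto a handful of standard `redis.conf` directives. The sketch below renders them from Python; the directive names are real Redis settings, but the exact values used in production and the list of renamed commands are not given on the slide, so `6gb`, `FLUSHALL`, `KEYS`, and the obscured name are assumptions for illustration.

```python
from typing import Dict, List

# Settings implied by the slide. "6gb" approximates the "~6GB" maxmemory figure.
CACHE_SETTINGS: Dict[str, str] = {
    "cluster-enabled": "yes",
    "maxmemory": "6gb",
    "maxmemory-policy": "allkeys-lru",
    "appendonly": "no",  # AOF off
    "save": '""',        # RDB snapshots off
}

# Which commands were renamed is not stated; these two are only examples.
RENAMED_COMMANDS: Dict[str, str] = {
    "FLUSHALL": '""',        # an empty name disables the command
    "KEYS": "KEYS_a1b2c3",   # hypothetical obscured name
}


def render_redis_conf(settings: Dict[str, str], renamed: Dict[str, str]) -> str:
    """Render one 'directive value' line per setting, then the renames."""
    lines: List[str] = [f"{name} {value}" for name, value in settings.items()]
    lines += [f"rename-command {cmd} {new}" for cmd, new in renamed.items()]
    return "\n".join(lines) + "\n"


conf_text = render_redis_conf(CACHE_SETTINGS, RENAMED_COMMANDS)
```

With `allkeys-lru` and no persistence, a restarted node simply comes back empty and refills from traffic, which fits the cache-only use described here.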
Things we have learned
• The bottleneck is network IO most of the time, especially for requests with large payloads
• Keep the size of each Redis instance small
• Client bugs: Redis Cluster introduced a new protocol
• Persistence: it is difficult to recover from RDB files
• Connecting directly to Redis is awesome, but it increases operational difficulty
• Upgrading Redis itself is not a happy process
In-house tools to manage the cluster
• Visualise cluster topology
• Wrap `redis-trib.rb` with a UI to prevent human mistakes
Future work
1. Because of legacy code, we are mostly only using get/set commands in Redis
2. Currently we are still using hard-coded IPs for Redis instances
3. Unify access through one single client
4. Automatic resource allocation
Rakuten Catalog Platform
Data Science x Product Catalog Management
22nd October, 2016
Product Catalog Section - EC Company
Rakuten Catalog Platform - EC Company
Rakuten Institute of Technology
Agenda
I. What is Rakuten Catalog Platform - Ryuma
II. Data Science for Product Catalog - KJ
III. Data Delivery by Rakuten Catalog Platform - Suguru
Speakers
PART 1: What is Rakuten Catalog Platform (RYUMA IKEDA)
PART 2: Data Science for Product Catalog (KEIJI SHINZATO)
PART 3: Data Delivery by Rakuten Catalog Platform (SUGURU SUZUKI)
Speakers
RYUMA IKEDA
PART 1: What is Rakuten Catalog Platform
E-Commerce Company: Manager, Taxonomist, Product Manager
Leads strategic planning and implementation of the product catalog platform that powers the website’s navigation and core product search functionality, and leads the design of product taxonomies. Previously, as a taxonomist at Amazon, developed taxonomies and the corresponding navigation. Loves cats.
Locations
Worldwide Development & Operations
Seattle, San Francisco, Irvine, Boston, NYC, Tokyo, Dalian, Bangalore
Rakuten Catalog Platform
[Diagram labels: Merchants, RMS, Rakuten Ichiba, Customers; Catalog Data: Merchant’s Product, Master Specs, Taxonomy & Metadata; functions: Search, Contents, Navigation, Price Comparison, Inventory, Page, Order, Sales, Analytics]
Example of Catalog Data
Merchant’s Product, Master Specs, Taxonomy
Over 230,000,000 Products
350,000 Categories
Product Data Journey
Collection
• Merchants (1) submit product data through the shop management system (2)
Processing
• Product data is stored in the product database (3)
• Data science techniques (4) are applied together with operators (5) to enrich product data (6) with several KPIs (7)
Delivery
• Product data is delivered to the front-end database (8)
• Customers (9) see product data on PCs (10) and smartphones (11)
How it works? - SERPs -
As-is / To-be
“I am browsing the men’s sneakers category. But why are boots showing up?”
[Figure labels: Sneakers, Sneakers, Boots]
Classification
We are adopting several classification methods:
• Manual classification (operator review)
• Rule-based classification (text filter)
• Auto-classification (machine learning)
How it works? - Faceted Navigation -
As-is / To-be
“I want to refine the search by brand, series, size, and many other detailed conditions, but...”
Attribute Extraction
To enrich the types of search filters and expand their coverage:
1. Extract attribute values (brand, color, size, and others) from product information
2. Use these values for faceted navigation
Speakers
KEIJI SHINZATO
PART 2: Data Science for Product Catalog
Rakuten Institute of Technology: Lead Scientist
Joined Rakuten in 2011 as an expert in natural language processing. Before joining Rakuten, worked at Kyoto University as a post-doctoral researcher. Research interests are knowledge acquisition, information extraction, sentiment analysis, and text mining. Loves craft beer.
Our challenges
• Auto classification
– A large number of categories (30,000 categories!)
– Hierarchical machine learning approach
• Attribute extraction (focusing on “brand”)
– Ambiguity
• パーカー (“Parker” the brand / hoodie), ブラウン (“Braun” the brand / the color brown)
– Dictionary-based approach
Overview of brand extraction
[Diagram: Rakuten Product Data (category, title, description) → Morphological Analysis (tokenization, PoS tagging) → Brand Extraction (extract the leftmost candidates) → Input data with brands. The Brand Dictionary is built from product titles and their categories using heuristic rules and machine learning, followed by manual evaluation.]
Brand dictionary
• Brand expression with its relevant category
– The method only uses brand expressions whose relevant category matches that of the given product
• 100K entries
Brand expression | Relevant category
力王 | Gardening & Tools
中部電磁器工業 | Computers & Networking
キメラパーク | Women's Clothing
シュガーローズ | Women's Clothing
サスクワッチファブリックス | Women's Clothing
藤栄 | Home Decor, Housewares & Furniture
ミキモト | Beauty, Cosmetics & Fragrances
エドウィンゴルフ | Sports & Outdoors
AKI WORLD | Sports & Outdoors
工房飛竜 | Toys, Hobbies & Games
パーカー | Home & Office Supplies
ハイライトキャバレー | Men's Clothing
杉野 | Men's Clothing
カウネット | Kitchen, Dining & Bar
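A minimal sketch of the dictionary lookup, assuming the title has already been tokenized by morphological analysis: walk the tokens from the left and return the first one that the dictionary lists as a brand for the product's category. The three dictionary entries come from the table; `extract_brand` and the example titles are hypothetical.

```python
from typing import Dict, List, Optional, Set

# A small slice of the brand dictionary: expression -> relevant categories.
BRAND_DICTIONARY: Dict[str, Set[str]] = {
    "パーカー": {"Home & Office Supplies"},
    "ミキモト": {"Beauty, Cosmetics & Fragrances"},
    "杉野": {"Men's Clothing"},
}


def extract_brand(tokens: List[str], category: str) -> Optional[str]:
    """Return the leftmost token that is a brand for this product's category."""
    for token in tokens:
        if category in BRAND_DICTIONARY.get(token, set()):
            return token
    return None


# Made-up titles: a pen sold under office supplies, and a hoodie sold under men's clothing.
pen = extract_brand(["パーカー", "ボールペン", "黒"], "Home & Office Supplies")
hoodie = extract_brand(["メンズ", "パーカー", "黒"], "Men's Clothing")
```

Here `pen` is `"パーカー"` while `hoodie` is `None`: the category check is what keeps the hoodie from being tagged with the pen brand.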
Collecting brands from semi-structured data
[Figure: Women's Clothing product pages with brands in semi-structured form: Table, Listing]
Collecting brands from product titles using machine learning
JAN | Brand
4948872 | Sony
4992739 | Coca-Cola
: | :
[Diagram: a product (product title, product description) with JAN 4948872XXXXXX and title “SONY PlayStation4 Black CUH-2000BB01 (1TB)” is matched against the JAN/brand table, giving the labeled title “<BRAND>SONY</BRAND> PlayStation4 Black CUH-2000BB01 (1TB)”, which is fed to a machine learning algorithm (Conditional Random Fields)]
JAN: Japan Article Number
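One reading of this slide is that the JAN/brand table supplies automatic labels for training the sequence model. The sketch below only covers that labeling step, not the CRF itself. It assumes the brand is keyed by the first seven characters of the JAN, as in the table, and that a case-insensitive match in the title is good enough; `label_title` is a hypothetical helper.

```python
import re
from typing import Dict, Optional

# JAN prefix -> brand, taken from the table on the slide.
JAN_BRANDS: Dict[str, str] = {"4948872": "Sony", "4992739": "Coca-Cola"}


def label_title(jan: str, title: str) -> Optional[str]:
    """Wrap the brand in <BRAND> tags if the JAN-derived brand occurs in the title."""
    brand = JAN_BRANDS.get(jan[:7])
    if brand is None:
        return None  # unknown JAN prefix: no label can be produced
    match = re.search(re.escape(brand), title, flags=re.IGNORECASE)
    if match is None:
        return None  # brand known, but not mentioned in the title
    start, end = match.span()
    return f"{title[:start]}<BRAND>{title[start:end]}</BRAND>{title[end:]}"


labeled = label_title("4948872XXXXXX", "SONY PlayStation4 Black CUH-2000BB01 (1TB)")
```

Titles labeled this way could then serve as training examples, so the model learns to spot brands in titles that have no usable JAN.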
Performance
• Brands were manually assigned to 500 randomly selected product titles
– % of products with brands: 69.6% (348/500)
• Precision: 91.9% (204/222)
• Recall: 58.6% (204/348)
• We can automatically assign correct brands to 92M of the 230M products!
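The percentages follow directly from the four counts on the slide. The short calculation below reproduces them and extrapolates the share of correctly branded titles to the whole catalog. That last step is a naive estimate (it assumes the 500-title sample is representative) and gives roughly 94M, the same order as the slide's 92M figure, whose exact basis is not stated.

```python
sample_size = 500   # manually annotated product titles
with_brand = 348    # titles that actually contain a brand
extracted = 222     # titles for which the system output a brand
correct = 204       # extracted brands that were right

precision = correct / extracted        # how often an extracted brand is right
recall = correct / with_brand          # how many of the real brands were found
brand_rate = with_brand / sample_size  # share of titles that contain a brand
correct_share = correct / sample_size  # share of all titles with a correct brand

catalog_size = 230_000_000
estimated_correct = correct_share * catalog_size  # naive extrapolation

print(f"precision {precision:.1%}, recall {recall:.1%}, brand rate {brand_rate:.1%}")
print(f"about {estimated_correct / 1e6:.0f}M products with a correct brand")
```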
Speakers
SUGURU SUZUKI
PART 3: Data Delivery by Rakuten Catalog Platform
E-Commerce Company: Manager, Software Developer
Joined Rakuten in 2007. Leads the technical side of Rakuten Catalog Platform in Japan: data feed in/out, store, taxonomy management (Genre/Tag/Attribute), distribution, and classification.
My favorite Rakuten Ichiba genreId is 100300.
Rakuten Catalog Platform
[Diagram: the Rakuten Catalog Platform overview from Part 1, with Data Delivery added]
40,000 Shops, 230,000,000 Merchant’s Products, 60 Services
Data Delivery Cycle
[Diagram: Catalog Data (Merchant’s Product, Master Specs, Taxonomy & Metadata) surrounded by Quality, Volume, and Speed; aims: Faster, Cheaper]
Quality Management/Control
Item Registration
Collection
• Merchants (1) submit product data through the shop management system (2)
Non-Structured Data… (As is)
Merchants submit:
• Item Name: [Free shipping] [Asuraku (next-day delivery)] [10x points] [Opan] [Baby panda] [Cotton 100%] [S/M/L] [Bust 82 – 94 cm] [Cute] Panda T-Shirts / White
• Item Description: Panda T-Shirts S/M/L
• CategoryID: Ladies' Fashion > Others
• Image: with “Free shipping” and “Point * 10” badges
Rich/Structured Data (To be)
Item Name: Panda T-Shirts
Image: Pure Image
Master Specs (Facet):
Size: S
Color: White
Brand: Rakuten
Texture: Soft
Material: Cotton
Series: Panda
Character: Opan
Point: 10 times
Shipping: Free
[Diagram: data from Merchants goes through Item Name Clean up and Annotation (S M L, Cotton 100%, Panda, Opan, White) to produce Master Specs]
Volume Management/Control
Product Data Volume
Processing
• Product data is stored in the product database (3)
• Data science techniques (4) are applied together with operators (5) to enrich product data (6) with several KPIs (7)
230,000,000 Merchant’s Products
Volume Management/Control
Prepare rich/structured data for processing
Planning
i. Collect data in one place
ii. Process it while reducing pre-processing cost
iii. Deliver it to one place
Collection → Processing → Delivery
Delivery Management/Control
Product Data Delivery
[Diagram: Merchants and the 60 services (including overseas companies) that receive product data: Auction, RMS, Search Engine, kobo, Check Out, Review, Rakuten Search, Books, Ranking, Advertisement, TOP page, Item Page, Affiliate, Browsing History, Web Service, Super DB, Report, Auto, Racoupon, BI Tool]
Delivery (for customers)
• Product data is delivered to the front-end database (8)
• Customers (9) see product data on PCs (10) and smartphones (11)
Data Delivery Concept
1. Data delivery from 1 place to all services
2. Automation (no operation)
Data Delivery from Single Point
[Diagram: Merchant’s Product data passes through Processing and is delivered from a single point, the Search Engine/API, to the same 60 services (including overseas companies). The Navigation API delivers Taxonomy & Facets: Display Products, Name, Node, Level, Display Order.]
Data Delivery & Control by API
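Navigation API: Taxonomy & Facets
Taxonomy: [
  {
    navigationId: 1000000,
    navigationName: "カラー",
    navigationLayout: pallet,
    navigationDisplayLimit: 17,
    .
    .
    .
Deliver Logic for Faceted Navigation
List -> colorPallet
カラー (Color): 黒 (black), グレー (gray), 白 (white), 茶 (brown), 抹茶 (matcha green), 黄土色 (ocher), 赤 (red), ピンク (pink), オレンジ (orange), 黄色 (yellow), 紫 (purple), 緑 (green), 青 (blue)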
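The idea of delivering display logic through the API can be sketched as a client that picks its widget from `navigationLayout` and caps the values at `navigationDisplayLimit`. The four field names and their values come from the slide; `render_facet`, the returned shape, and the rule that any non-`pallet` layout falls back to a plain list are assumptions for illustration.

```python
from typing import Any, Dict, List


def render_facet(navigation: Dict[str, Any], values: List[str]) -> Dict[str, Any]:
    """Pick a widget from navigationLayout and cap the number of values shown."""
    limit = navigation["navigationDisplayLimit"]
    # Assumed rule: "pallet" switches the plain list to a color palette widget.
    widget = "colorPallet" if navigation["navigationLayout"] == "pallet" else "list"
    return {
        "title": navigation["navigationName"],
        "widget": widget,
        "values": values[:limit],
        "hidden": max(0, len(values) - limit),
    }


color_navigation = {
    "navigationId": 1000000,
    "navigationName": "カラー",
    "navigationLayout": "pallet",
    "navigationDisplayLimit": 17,
}
colors = ["黒", "グレー", "白", "茶", "抹茶", "黄土色", "赤",
          "ピンク", "オレンジ", "黄色", "紫", "緑", "青"]
facet = render_facet(color_navigation, colors)
```

Changing `navigationLayout` or `navigationDisplayLimit` on the server then changes how every consuming service shows the facet, without a client release.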
Summary
[Diagram: Catalog Data (Merchant’s Product, Master Specs, Taxonomy & Metadata) with Quality, Volume, and Speed; aims: Faster, Cheaper]
• Structured data / Correction
• Rich/Structured data
• Deliver from Single Point / Automation
We are Hiring!
http://global.rakuten.com/corp/careers/engineering/