Migration
from mysql to
elasticsearch
Our Service
User can find jobs
that fit him/her
Company can find suitable
candidates for its jobs.
Matching on Structured Data
• Both User and Job data are stored in mysql

• Records in Job table corresponds to multiple records in
child tables, so do records in User table (1:N relationship)
Job
Location Salary Others
User
Location Salary Others
Number of records
Table Number of records
User 529,683
child tableX of user 2,984,111
child tableY of user 1,966,161
… …
Table Number of records
Job 160,305
child tableA of job 2,359,512
child tableB of job 232,202
… …
For user searchFor Job search
Complicated SQL Query
SELECT parentTable.id FROM parentTable

INNER JOIN childTableD ON childTableD.id = parentTable.id 

LEFT OUTER JOIN childTableA ON childTableA.id = parentTable.id AND childTableA.code = 273230450

….

INNER JOIN (

SELECT id FROM childTableB

WHERE code IN (11232) AND id IN (SELECT id FROM parentTable WHERE code = 1 )

UNION

….

SELECT id FROM childTableC

WHERE code IN (1, 2, 3) AND id IN (SELECT id FROM parentTable WHERE code = 5 )

) temp2 ON parentTable.id = temp2.id

WHERE

AND childTableA.id IS NULL

….

AND parentTable.id IN (SELECT id FROM childTableC WHERE code IN (1, 2, 3))

AND (

parentTable.id IN (SELECT id FROM childTableB WHERE code IN (11232))

OR 

….

)

) ORDER BY childTableD.timestamp DESC, parentTable.updatedOn DESC LIMIT 101 OFFSET
• And SQL queries are constructed dynamically in Java.
We are struggling handling job/user search
• Lots of slow ( > 1s) queries of mysql

• Some of them are pretty slow ( > 2s )
160305
Top 20 slow queries of a day
Total Count Average Sec Query
1 209 1.4 SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc
2 197 1.4 SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc
3 175 1.9 SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc
4 158 1.9 SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc
5 113 3.2 SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc
6 96 1.1 INSERT INTO tableC (id, lastLoginDate, rank, sortKey, createdOn, updatedOn) SELECT id, DATE(FROM_UNIXTIME(IFNUL,
7 85 3.3 SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc
8 85 1.1 INSERT INTO tableD (id, tableD, createdOn, updatedOn) SELECT id, 2, UNIX_TIMESTAMP(), UNIX_TIMESTAMP() FROM
9 84 1.1 INSERT INTO tableC (id, lastLoginDate, rank, sortKey, createdOn, updatedOn) SELECT id,DATE(FROM_UNIXTIME(IFNUL
10 82 1.1 INSERT INTO tableD (id, tableD, createdOn, updatedOn) SELECT id, 1, UNIX_TIMESTAMP(), UNIX_TIMESTAMP() FROM
11 80 3.2 SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc
12 79 3.4 SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc
13 77 4.1 SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc
14 70 1.9 SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc
15 70 4.3 SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc
16 69 3 SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc
17 69 5.9 SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc
18 67 2.6 SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc
19 65 1.9 SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc
20 63 4.1 SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc
Most of them are related to user search
Solution for the slow query problem
• SQL tuning

-> Done (4 different sorted tables, bit-operation). 

-> But still slow

• Introducing elasticsearch(ES)

-> Maybe. But ES is a full-text search engine.

Can ES handle structured data too? Not sure 🤔

• Do an experiment before introducing ES to our project

-> Create an ES cluster with production data in dev env



• The result: ES query(0.5s) is faster than mysql one(2s)
The first target
• User data characteristics

• Data are updated in real-time.

• A number of target records increases quickly 

Less than 10,000 in Jan/2017 

More than 500,000 in Aug/2017

• Job data characteristics

• Data are updated three times a day

• Data growth isn’t fast

• Data structure will change due to a business reason
Let’s migrate job search first!
Basic Strategy
• Data can be recovered from mysql 

-> Data can be recovered by batch even if entire ES cluster is collapsed

• Try to minimize use case of ES

-> Only focus on searching list of jobs which match the given search
condition

-> Do not deal with trivial queries for a specific job (use mysql)

• Use high performance machine

-> 3 physical data nodes (20CPU,32GB RAM 200GB SSD)

• All job related data stored in ES

-> We had a option that we get only ids by ES and get necessary data
from other data source (such as mysql, redis). But we didn’t choose it
for simple implementation
Performance Tuning
• Servers in Cluster

3 master nodes: To avoid split brain

3 data nodes: 2 maybe enough, but for safety

• Index settings

1 shard: 1 shard has better performance than 2 shards

2 replicas: each data node has 1 shard

• Indexing performance setting

indices.store.throttle.max_bytes_per_sec is set 200mb (default
20mb)
Performance Tuning
• Memory Settings

• Thread Pool Settings
Default Tuned
indices.fielddata.cache.size Unbounded 10%
indices.queries.cache.size 10% 10%
indices.requests.cache.size 1% 1%
Default Tuned
thread_pool.index.queue_size 200 1,000
thread_pool.bulk.queue_size 200 1,000
thread_pool.search.queue_size 1,000 5,000
Preparation Before Migration
• Stress Test

-> Use gatling (load and performance test tool)

-> Use queries executed in beta

-> High load search while indexing

-> Check the maximum request per second of the cluster (900req/sec)

-> Long run (more than 24 hours with load)

• HA and failure Test

-> Search function works even when one data node is available

-> Top page of our service is available even when entire cluster is down

-> Circuit breaker behavior

• Monitoring and Alert

-> Use prometheus and alertmanager

-> Server metrics, JVM metrics, ES specific metrics such as query cache size

-> Alert test before operation
As a result of
migration
Latency of API for job search
Before Migration (mysql)
After Migration (elasticsearch)
ES Cluster metrics
Server
ES Cluster metrics
JVM
Positive side effects
• Dedicated full text search engine (groonga) is no longer needed

Everything can be done in ES

• More flexible and precise geo location search than geohash

• Simpler code

-> No SQL tricks and No need to refer many tables

• Avoid constructing SQL query by string concatenation
Future Plan
• adapt ES to other implementation

-> Job related features that we avoided adapting before 

-> User search (the main target)

• Better full text search with neologd

Migration from mysql to elasticsearch

  • 1.
  • 2.
    Our Service User canfind jobs that fit him/her Company can find suitable candidates for its jobs.
  • 3.
    Matching on StructuredData • Both User and Job data are stored in mysql • Records in Job table corresponds to multiple records in child tables, so do records in User table (1:N relationship) Job Location Salary Others User Location Salary Others
  • 4.
    Number of records TableNumber of records User 529,683 child tableX of user 2,984,111 child tableY of user 1,966,161 … … Table Number of records Job 160,305 child tableA of job 2,359,512 child tableB of job 232,202 … … For user searchFor Job search
  • 5.
    Complicated SQL Query SELECTparentTable.id FROM parentTable INNER JOIN childTableD ON childTableD.id = parentTable.id LEFT OUTER JOIN childTableA ON childTableA.id = parentTable.id AND childTableA.code = 273230450 …. INNER JOIN ( SELECT id FROM childTableB WHERE code IN (11232) AND id IN (SELECT id FROM parentTable WHERE code = 1 ) UNION …. SELECT id FROM childTableC WHERE code IN (1, 2, 3) AND id IN (SELECT id FROM parentTable WHERE code = 5 ) ) temp2 ON parentTable.id = temp2.id WHERE AND childTableA.id IS NULL …. AND parentTable.id IN (SELECT id FROM childTableC WHERE code IN (1, 2, 3)) AND ( parentTable.id IN (SELECT id FROM childTableB WHERE code IN (11232)) OR …. ) ) ORDER BY childTableD.timestamp DESC, parentTable.updatedOn DESC LIMIT 101 OFFSET • And SQL queries are constructed dynamically in Java.
  • 6.
    We are strugglinghandling job/user search • Lots of slow ( > 1s) queries of mysql • Some of them are pretty slow ( > 2s ) 160305
  • 7.
    Top 20 slowqueries of a day Total Count Average Sec Query 1 209 1.4 SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc 2 197 1.4 SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc 3 175 1.9 SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc 4 158 1.9 SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc 5 113 3.2 SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc 6 96 1.1 INSERT INTO tableC (id, lastLoginDate, rank, sortKey, createdOn, updatedOn) SELECT id, DATE(FROM_UNIXTIME(IFNUL, 7 85 3.3 SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc 8 85 1.1 INSERT INTO tableD (id, tableD, createdOn, updatedOn) SELECT id, 2, UNIX_TIMESTAMP(), UNIX_TIMESTAMP() FROM 9 84 1.1 INSERT INTO tableC (id, lastLoginDate, rank, sortKey, createdOn, updatedOn) SELECT id,DATE(FROM_UNIXTIME(IFNUL 10 82 1.1 INSERT INTO tableD (id, tableD, createdOn, updatedOn) SELECT id, 1, UNIX_TIMESTAMP(), UNIX_TIMESTAMP() FROM 11 80 3.2 SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc 12 79 3.4 SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc 13 77 4.1 SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc 14 70 1.9 SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc 15 70 4.3 SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc 16 69 3 SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc 17 69 5.9 SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc 18 67 2.6 SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc 19 65 1.9 SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc 20 63 4.1 SELECT COUNT(*) FROM tableA WHERE tableA.id IN ( SELECT tableA.id FROM tableA LEFT OUTER JOIN tableB ON sc Most of them are related to user search
  • 8.
    Solution for theslow query problem • SQL tuning
 -> Done (4 different sorted tables, bit-operation). 
 -> But still slow • Introducing elasticsearch(ES)
 -> Maybe. But ES is a full-text search engine.
 Can ES handle structured data too? Not sure 🤔 • Do an experiment before introducing ES to our project
 -> Create an ES cluster with production data in dev env
 • The result: ES query(0.5s) is faster than mysql one(2s)
  • 9.
    The first target •User data characteristics • Data are updated in real-time. • A number of target records increases quickly 
 Less than 10,000 in Jan/2017 
 More than 500,000 in Aug/2017 • Job data characteristics • Data are updated three times a day • Data growth isn’t fast • Data structure will change due to a business reason Let’s migrate job search first!
  • 10.
    Basic Strategy • Datacan be recovered from mysql 
 -> Data can be recovered by batch even if entire ES cluster is collapsed • Try to minimize use case of ES
 -> Only focus on searching list of jobs which match the given search condition
 -> Do not deal with trivial queries for a specific job (use mysql) • Use high performance machine
 -> 3 physical data nodes (20CPU,32GB RAM 200GB SSD) • All job related data stored in ES
 -> We had a option that we get only ids by ES and get necessary data from other data source (such as mysql, redis). But we didn’t choose it for simple implementation
  • 11.
    Performance Tuning • Serversin Cluster
 3 master nodes: To avoid split brain
 3 data nodes: 2 maybe enough, but for safety • Index settings
 1 shard: 1 shard has better performance than 2 shards
 2 replicas: each data node has 1 shard • Indexing performance setting
 indices.store.throttle.max_bytes_per_sec is set 200mb (default 20mb)
  • 12.
    Performance Tuning • MemorySettings • Thread Pool Settings Default Tuned indices.fielddata.cache.size Unbounded 10% indices.queries.cache.size 10% 10% indices.requests.cache.size 1% 1% Default Tuned thread_pool.index.queue_size 200 1,000 thread_pool.bulk.queue_size 200 1,000 thread_pool.search.queue_size 1,000 5,000
  • 13.
    Preparation Before Migration •Stress Test
 -> Use gatling (load and performance test tool)
 -> Use queries executed in beta
 -> High load search while indexing
 -> Check the maximum request per second of the cluster (900req/sec)
 -> Long run (more than 24 hours with load) • HA and failure Test
 -> Search function works even when one data node is available
 -> Top page of our service is available even when entire cluster is down
 -> Circuit breaker behavior • Monitoring and Alert
 -> Use prometheus and alertmanager
 -> Server metrics, JVM metrics, ES specific metrics such as query cache size
 -> Alert test before operation
  • 14.
    As a resultof migration
  • 15.
    Latency of APIfor job search Before Migration (mysql) After Migration (elasticsearch)
  • 16.
  • 17.
  • 18.
    Positive side effects •Dedicated full text search engine (groonga) is no longer needed
 Everything can be done in ES • More flexible and precise geo location search than geohash • Simpler code
 -> No SQL tricks and No need to refer many tables • Avoid constructing SQL query by string concatenation
  • 19.
    Future Plan • adaptES to other implementation
 -> Job related features that we avoided adapting before 
 -> User search (the main target) • Better full text search with neologd