1
Shaunak Kashyap
Developer Advocate at Elastic
@shaunak
Elasticsearch for SQL users
2
The Elastic Stack
ElasticsearchStore, Index & Analyze
KibanaUser Interface
SecurityMonitoring AlertingPlugins
Logstash BeatsIngest
Elastic Cloud: Elasticsearch as a ServiceHosted Service
3
Agenda
Search queries
Data modeling
Architecture
1
2
3
2
4
Agenda
Search queries
Data modeling
Architecture
1
3
5
Agenda
Search queries
Data modeling
1
2
3 Architecture
6
Search Queries
https://www.flickr.com/photos/samhames/4422128094
7
CREATE TABLE IF NOT EXISTS emails (
sender VARCHAR(255) NOT NULL,
recipients TEXT,
cc TEXT,
bcc TEXT,
subject VARCHAR(1024),
body MEDIUMTEXT,
datetime DATETIME
);
CREATE INDEX emails_sender ON emails(sender);
CREATE FULLTEXT INDEX emails_subject ON emails(subject);
CREATE FULLTEXT INDEX emails_body ON emails(body);
curl -XPOST 'http://localhost:9200/enron' -d'
{
"mappings": {
"email": {
"properties": {
"sender": { "type": "string", "index": "not_analyzed" },
"recipients": { "type": "string", "index": "not_analyzed" },
"cc": { "type": "string", "index": "not_analyzed" },
"bcc": { "type": "string", "index": "not_analyzed" },
"subject": { "type": "string", "analyzer": "english" },
"body": { "type": "string", "analyzer": "english" }
}
}
}
Schemas
8
Loading the data
9
[LIVE DEMO]
• Search for text in a single field
• Search for text in multiple fields
• Search for a phrase
https://github.com/ycombinator/es-enron
10
Other Search Features
Stemming Synonyms Did you mean?
• Jump, jumped, jumping • Queen, monarch • Monetery => Monetary
11
Data Modeling
https://www.flickr.com/photos/samhames/4422128094https://www.flickr.com/photos/ericparker/7854157310
12
To analyze or not to analyze?
PUT cities/city/1
{
"city": "Atlanta",
"population": 447841
}
PUT cities/city/2
{
"city": "New Albany",
"population": 8829
}
PUT cities/city/3
{
"city": "New York",
"population": 8406000
}
POST cities/_search
{
"query": {
"match": {
"city": "New Albany"
}
}
}
QUERY
+ = ?
13
To analyze or not to analyze?
PUT cities/city/1
{
"city": "Atlanta",
"population": 447841
}
PUT cities/city/2
{
"city": "New Albany",
"population": 8829
}
PUT cities/city/3
{
"city": "New York",
"population": 8406000
}
Term Document IDs
Albany 2
New 2,3
Atlanta 1
York 3
14
To analyze or not to analyze?
PUT cities
{
"mappings": {
"city": {
"properties": {
"city": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
MAPPING
Term Document IDs
New Albany 2
New York 3
Atlanta 1
PUT blog/post/1
{
"author_id": 1,
"title": "...",
"body": "..."
}
PUT blog/post/2
{
"author_id": 1,
"title": "...",
"body": "..."
}
PUT blog/post/3
{
"author_id": 1,
"title": "...",
"body": "..."
}
15
Relationships: Application-side joins
PUT blog/author/1
{
"name": "John Doe",
"bio": "..."
}
POST blog/author/_search
{
"query": {
"match": {
"name": "John"
}
}
}
QUERY 1
POST blog/post/_search
{
"query": {
"match": {
"author_id": <each id from query 1 result>
}
}
}
QUERY 2
PUT blog/post/1
{
"author_name": "John Doe",
"title": "...",
"body": "..."
}
PUT blog/post/2
{
"author_name": "John Doe",
"title": "...",
"body": "..."
}
16
Relationships: Data denormalization
POST blog/post/_search
{
"query": {
"match": {
"author_name": "John"
}
}
}
QUERY
PUT blog/post/3
{
"author_name": "John Doe",
"title": "...",
"body": "..."
}
17
Relationships: Nested objects
PUT blog/author/1
{
"name": "John Doe",
"bio": "...",
"blog_posts": [
{
"title": "...",
"body": "..."
},
{
"title": "...",
"body": "..."
},
{
"title": "...",
"body": "..."
}
]
}
POST blog/author/_search
{
"query": {
"match": {
"name": "John"
}
}
}
QUERY
18
Relationships: Parent-child documents
PUT blog/author/1
{
"name": "John Doe",
"bio": "..."
}
POST blog/post/_search
{
"query": {
"has_parent": {
"type": "author",
"query": {
"match": {
"name": "John"
}
}
}
QUERY
PUT blog
{
"mappings": {
"author": {},
"post": {
"_parent": {
"type": "author"
}
}
}
} PUT blog/post/1?parent=1
{
"title": "...",
"body": "..."
}
PUT blog/post/2?parent=1
{
"title": "...",
"body": "..."
}
PUT blog/post/3?parent=1
{
"title": "...",
"body": "..."
}
19
Architecture
https://www.flickr.com/photos/samhames/4422128094
https://www.flickr.com/photos/haribote/4871284379/
20
RDBMS Triggers
database by Creative Stall from the Noun Project
1 2
21
Async replication to Elasticsearch
1
2 3
ESSynchronizer
flow by Yamini Ahluwalia from the Noun Project
22
Async replication to Elasticsearch with Logstash
1
2 3
23
Forked writes from application
1
2
24
Forked writes from application (more robust)
1
2
queue by Huu Nguyen from the Noun Project
ESSynchronizer3
4
25
Forked writes from application (more robust with Logstash)
1
2
3
4
26
Questions?
@shaunak
https://www.flickr.com/photos/nicknormal/2245559230/

Elasticsearch for SQL Users