Practical Machine Learning and Rails

Andrew Cantino   VP Engineering, Mavenlink    @tectonic
Ryan Stout       Founder, Agile Productions   @ryanstout
This talk will
- introduce machine learning

- make you ML-aware

- have examples
This talk will not
- give you a PhD

- implement algorithms

- cover collaborative filtering, optimization, clustering, advanced
  statistics, genetic algorithms, classical AI, NLP, ...
What is Machine Learning?

Many different algorithms that predict data from other data using applied statistics.
"Enhance and rotate 20 degrees"
What data?

The web is data:
- user decisions
- APIs
- A/B tests
- databases
- logs
- streams
- browser versions
- reviews
- clicktrails
Okay. We have data.
What do we do with it?


We classify it.
Classification

    :)    OR    :(
Classification

• Documents
    o Sort email (Gmail's importance filter)
    o Route questions to appropriate expert (Aardvark)
    o Categorize reviews (Amazon)

• Users
    o Expertise; interests; pro vs. free; likelihood of paying; expected future karma

• Events
    o Abnormal vs. normal
Algorithms: Decision Tree Learning

Features are tested at the internal nodes; labels (here, spam probabilities) sit at the leaves:

Email contains word "viagra"?
  no  -> Email contains word "Ruby"?
           no  -> P(Spam) = 10%
           yes -> P(Spam) = 5%
  yes -> Email contains attachment?
           no  -> P(Spam) = 70%
           yes -> P(Spam) = 95%
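As a quick illustration (a sketch, not code from the talk), classifying with a tree like the one above just means walking from the root to a leaf. The Node struct, its field names, and the example email below are made up:

  # Walk the decision tree: internal nodes test a feature, leaves hold P(Spam).
  Node = Struct.new(:test, :no_branch, :yes_branch, :probability) do
    def spam_probability(email)
      return probability unless test                       # leaf: return the label
      (test.call(email) ? yes_branch : no_branch).spam_probability(email)
    end
  end

  leaf = ->(p) { Node.new(nil, nil, nil, p) }

  # The tree from the slide above.
  tree = Node.new(
    ->(e) { e[:words].include?("viagra") },
    Node.new(->(e) { e[:words].include?("ruby") }, leaf[0.10], leaf[0.05]),
    Node.new(->(e) { e[:attachment] },             leaf[0.70], leaf[0.95])
  )

  email = { words: %w[hello ruby meetup], attachment: false }
  tree.spam_probability(email)   # => 0.05

Learning the tree (choosing which feature to split on at each node) is the hard part, and is what libraries do for you.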
Algorithms: Support Vector Machines (SVMs)

Graphics from Wikipedia
Algorithms: Naive Bayes

• Break documents into words and treat each word as an independent feature

• Surprisingly effective on simple text and document classification

• Works well when you have lots of data
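A rough sketch of the idea in plain Ruby (the NaiveBayes class and its method names are illustrative, not from the talk; in a Rails app you would more likely reach for an existing gem):

  # Count how often each word appears in spam vs. ham documents, then score
  # new documents by multiplying per-word probabilities under each class.
  class NaiveBayes
    def initialize
      @word_counts = Hash.new { |h, label| h[label] = Hash.new(0) }
      @doc_counts  = Hash.new(0)
    end

    def train(label, words)
      @doc_counts[label] += 1
      words.uniq.each { |w| @word_counts[label][w] += 1 }
    end

    def classify(words)
      @doc_counts.keys.max_by { |label| score(label, words) }
    end

    private

    # log P(label) + sum of log P(word | label), with add-one smoothing
    def score(label, words)
      total = @doc_counts.values.sum.to_f
      prior = Math.log(@doc_counts[label] / total)
      words.uniq.reduce(prior) do |sum, w|
        sum + Math.log((@word_counts[label][w] + 1.0) / (@doc_counts[label] + 2.0))
      end
    end
  end

  nb = NaiveBayes.new
  nb.train(:spam, %w[viagra hello])
  nb.train(:ham,  %w[ruby hello])
  nb.classify(%w[hello viagra])   # => :spam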
Algorithms: Naive Bayes

You received 100 emails, 70 of which were spam.
Word                 Spam with this word   Ham with this word

viagra               42 (60%)              1 (3.3%)

ruby                 7 (10%)               15 (50%)

hello                35 (50%)              24 (80%)



A new email contains hello and viagra. The probability that it
is spam is:
P(S|hello,viagra) = P(S) * P(hello,viagra|S) / P(hello,viagra)
                  = 0.7 * (0.5 * 0.6)        / (0.59 * 0.43)
                  = 82%
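The same arithmetic spelled out in Ruby, using the numbers from the table above (this simply reproduces the slide's calculation, including its assumption that "hello" and "viagra" occur independently in the denominator):

  p_spam           = 0.70   # 70 of the 100 emails were spam
  p_hello_given_s  = 0.50   # 35 of the 70 spam emails contain "hello"
  p_viagra_given_s = 0.60   # 42 of the 70 spam emails contain "viagra"
  p_hello          = 0.59   # 59 of the 100 emails contain "hello"
  p_viagra         = 0.43   # 43 of the 100 emails contain "viagra"

  p_spam * (p_hello_given_s * p_viagra_given_s) / (p_hello * p_viagra)
  # => 0.8277... (about 82%)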
Algorithms: Neural Nets

Input layer (features) -> Hidden layer -> Output layer (classification)

Graphics from Wikipedia
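The slide only names the layers; purely as a sketch (not from the talk, with invented weights), a single forward pass through such a network can be written in plain Ruby:

  # Feed the input features forward: input layer -> hidden layer -> output layer.
  # In practice the weights and biases are learned from training data.
  def sigmoid(x)
    1.0 / (1.0 + Math.exp(-x))
  end

  def layer(inputs, weights, biases)
    weights.each_with_index.map do |row, i|
      sigmoid(row.zip(inputs).sum { |w, x| w * x } + biases[i])
    end
  end

  features = [1.0, 0.0, 1.0]                            # input layer
  hidden   = layer(features, [[0.2, -0.5, 0.8],
                              [0.7,  0.1, -0.3]], [0.0, 0.1])
  output   = layer(hidden, [[1.5, -2.0]], [0.2])        # output layer
  # output.first is a score between 0 and 1, e.g. P(spam)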
Curse of Dimensionality

The more features and labels that you have, the more data that you need.

http://www.iro.umontreal.ca/~bengioy/yoshua_en/research_files/CurseDimensionality.jpg
Overfitting

• With enough parameters, anything is possible.

• We want our algorithms to generalize and infer, not memorize specific training examples.

• Therefore, we test our algorithms on different data than we train them on (see the sketch below).
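For example, a minimal hold-out split in Ruby, reusing the hypothetical NaiveBayes sketch from earlier (the data and split ratio are made up):

  # Hold out part of the labeled data so the classifier is evaluated on
  # examples it never saw during training.
  labeled_emails = [
    [%w[viagra cheap],  :spam],
    [%w[ruby meetup],   :ham],
    [%w[hello viagra],  :spam],
    [%w[hello ruby],    :ham],
    # ... many more labeled examples
  ]

  examples = labeled_emails.shuffle(random: Random.new(42))  # fixed seed for repeatability
  split    = (examples.size * 0.75).floor

  training_set = examples[0...split]
  test_set     = examples[split..]

  nb = NaiveBayes.new
  training_set.each { |words, label| nb.train(label, words) }

  correct  = test_set.count { |words, label| nb.classify(words) == label }
  accuracy = correct.fdiv(test_set.size)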
