Intro to Python for DS
2 birds. One stone.
 
WIFI: MakeOffices 5GHz
Password: Internet!23
 
http://bit.ly/thinkful-dc-python
TJ Stalcup
Lead DC Mentor @Thinkful
API Evangelist @540
Pokemon Master
About Us
What's your name?
What do you do?
Why are you interested in data science or python?
About you
Online bootcamp since 2012. We have worked
with over 6,000 students around the world,
paired with over 400 mentors.
 
1:1 Mentoring, learn by building projects and
turn YOUR ideas into portfolio pieces
 
Guaranteed
About Thinkful
Local DC Crew
 
Learn why DS is a thing
 
What is Python
 
How do we use it with a real world project?
 
How do I learn more?
TONIGHT: Learn Python by Doing
What is a Data Scientist?
Example: LinkedIn 2006
“[LinkedIn] was like arriving at a conference reception
and realizing you don’t know anyone. So you just stand
in the corner sipping your drink—and you probably
leave early.”
-LinkedIn Manager, June 2006
Enter: Jonathan Goldman
Data Scientist
Joined LinkedIn in 2006, only 8M users (450M in 2016)
Started experiments to predict people’s networks
Engineers were dismissive: “you can already import your address
book”
DS Process
Frame the question
Collect the raw data
Process the data
Explore the data
Communicate results
Frame the Question
What questions do we want to answer?
What connections (type and number) lead to higher
user engagement?
Which connections do people want to make but are
currently limited from making?
How might we predict these types of connections with
limited data from the user?
Collect the Data
What data do we need to answer these questions?
Connection data (who is connected to whom?)
Demographic data (what is the profile of the
connection?)
Engagement data (how do they use the site)
Process the Data
How is the data “dirty” and how can we clean it?
• User input
• Redundancies
• Feature changes
• Data model changes
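A minimal sketch of what cleaning might look like for the first two bullets (messy user input and redundant records). The records, field names, and the `clean_profiles` helper are made up for illustration; they are not LinkedIn's data or method.

```python
def clean_profiles(raw_profiles):
    """Normalize free-text user input and drop redundant records."""
    seen = set()
    cleaned = []
    for profile in raw_profiles:
        # User input: collapse extra whitespace and normalize capitalization.
        name = " ".join(profile["name"].split()).title()
        company = " ".join(profile["company"].split()).title()
        key = (name, company)
        # Redundancies: keep only the first copy of each (name, company) pair.
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"name": name, "company": company})
    return cleaned


# Hypothetical raw records, as users might have typed them.
raw_profiles = [
    {"name": "  jack  smith ", "company": "acme corp"},
    {"name": "Jack Smith", "company": "ACME CORP"},
    {"name": "jill jones", "company": "Globex"},
]

for profile in clean_profiles(raw_profiles):
    print(profile["name"], "-", profile["company"])
# Jack Smith - Acme Corp
# Jill Jones - Globex
```

Feature changes and data model changes usually need case-by-case handling, so they are left out of this sketch.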
Explore the Data
What are the meaningful patterns in the data?
• Triangle closing
• Time overlaps
• Geographic overlaps
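"Triangle closing" can be sketched in a few lines: if two people share a connection, suggest that they connect and close the triangle. The network below and the `suggest_connections` helper are hypothetical, and ranking by mutual-connection count is an assumption for illustration, not a description of what LinkedIn actually built.

```python
from collections import Counter


def suggest_connections(network, user):
    """Rank people the user is not yet connected to by mutual connections."""
    direct = network[user]
    mutual_counts = Counter()
    for friend in direct:
        for friend_of_friend in network[friend]:
            # Skip the user and people they already know.
            if friend_of_friend != user and friend_of_friend not in direct:
                mutual_counts[friend_of_friend] += 1
    return mutual_counts.most_common()


# Hypothetical network: each person maps to the set of people they know.
network = {
    "Jack": {"Jill", "Bob"},
    "Jill": {"Jack", "Bob", "Ann"},
    "Bob": {"Jack", "Jill", "Ann", "Cal"},
    "Ann": {"Jill", "Bob"},
    "Cal": {"Bob"},
}

print(suggest_connections(network, "Jack"))
# [('Ann', 2), ('Cal', 1)]
```

Ann shares two connections with Jack (Jill and Bob), so she is suggested ahead of Cal, who shares only one.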
Communicate Findings
How do we communicate this? To whom?
Marketing - this will enable us to sell X more ad space. Results in X
more impressions per day
Product - this will allow us to build X more features
Development - this will allow us to grow our team by X
Sales - this will attract X more premium accounts
C-Level - this will result in $$$ more revenue
8M to 450M users in 10 years
The Result
 
 
Career Whack-A-Mole
Why DS now?
Big Data: datasets whose size is beyond the
ability of typical database software tools to
capture, store, manage, and analyze
Big Data
Trend "started" in 2005
Web 2.0 - Majority of content is created by users
Mobile accelerates this — data/person skyrockets
The Data Problem
We are generating more data
every year than existed
before.........
The Solution
There goes my hero....
 
watch 'em as they code....
Just need to do everything....
Knowledge of statistics, algorithms, & software
Comfort with languages & tools (Python, SQL, Tableau)
Inquisitiveness and intellectual curiosity
Strong communication skills
It’s all Teachable!
Let's Learn Python Tonight
Python for Programming
Great for Data Science
Robotics
Web Development (Python/Django)
Automation
Let's Learn Python Tonight
firstName = 'TJ'
lastName = "Stalcup"
age = 34  # wow, much old
print(firstName)  # TJ
print(firstName + lastName)  # TJStalcup
print(firstName + ' ' + lastName)  # TJ Stalcup
print(lastName + ', ' + firstName)  # Stalcup, TJ
print(age * 2)  # 68, hopefully retired

def greet(name):
    # print() puts a single space between its arguments
    print('Hello', name)

greet('Jack')  # Hello Jack
greet('Jill')  # Hello Jill
greet('Bob')  # Hello Bob
greet(firstName)  # Hello TJ
greet(firstName + ' ' + lastName)  # Hello TJ Stalcup
The Model
Our model is going to be a Decision Tree.
 
Decision trees predict the most likely outcome based on input.
 
You can think of it like a computer building a version of 20 questions.
Decision Trees - Golf?
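To make the "20 questions" idea concrete, here is a hand-written decision tree for the golf question. The rules (outlook, humidity, wind) are illustrative assumptions, not taken from the slide; a real decision tree model learns which questions to ask from example data instead of having them typed in by hand.

```python
def should_play_golf(outlook, humidity, windy):
    """Walk a tiny hand-built decision tree, one question at a time."""
    # Question 1: what is the outlook?
    if outlook == "overcast":
        return True
    if outlook == "sunny":
        # Question 2 (sunny branch): is the humidity high?
        return humidity != "high"
    # Remaining branch (rainy). Question 2: is it windy?
    return not windy


print(should_play_golf("overcast", "high", True))   # True
print(should_play_golf("sunny", "high", False))     # False
print(should_play_golf("rainy", "normal", False))   # True
```

Each `if` is one question, and each `return` is a leaf of the tree: the predicted outcome for inputs that reach it.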
The Notebook
We're going to use a Google-hosted Python notebook to build this
model. This app is called Colaboratory (Collaboration + Laboratory)
 
http://colab.research.google.com
 
New Notebook > New Python3 Notebook
Shortcomings
Our model has a few weaknesses:
 
-Limited inputs
-Assumptions
Coming Soon....
Intro to SQL
Intro to Tableau
Intro to Statistics
 
http://meetup.com/Thinkful-DC
Data Science @ Thinkful
Flexible, project-based curriculum to help you become the data
scientist you want to be
You don’t just learn skills, you get to make things
Mentor support from experts in the industry
Also, there's a job guarantee
Link to the third-party audited jobs report:
https://www.thinkful.com/bootcamp-jobs-stats
Thinkful Graduates Job Guaranteed
Learning Mentor
Career Mentor
Program Manager
Local Community
You
Unprecedented Support
http://bit.ly/dc-ds-trial
Initial 2-week trial course
Start with Python and Statistics
Unlimited Q&A Sessions
Option to continue with full bootcamp
Financing & scholarships available
Aaron Lamphere
Trial Program Manager
 
Thinkful Two Week Trial
