Poisson Regression Models
for Count Data
Outline
• Review
• Introduction to Poisson regression
• A simple model: equiprobable model
• Pearson and likelihood-ratio test statistics
• Residual analysis
• Poisson regression with a covariate (Poisson time trend model)
Review of Regression
You may have come across:
Dependent Variable                             Regression Model
Continuous                                     Linear
Binary                                         Logistic
Multicategory, unordered (nominal variable)    Multinomial Logit
Multicategory, ordered (ordinal variable)      Cumulative Logit
Regression
In this session:
Dependent Variable                             Regression Model
Continuous                                     Linear
Binary                                         Logistic
Multicategory, unordered (nominal variable)    Multinomial Logit
Multicategory, ordered (ordinal variable)      Cumulative Logit
Count variable                                 Poisson Regression (Log-linear model)
Data
Data for this session are assumed to be:
• A count variable Y (e.g. number of accidents,
number of suicides)
• One categorical variable (X) with C possible
categories (e.g. days of week, months)
• Hence Y has C possible outcomes y1, y2, …, yC
Introduction: Poisson regression
• Poisson regression is a form of regression analysis used to model count data (if all explanatory variables are categorical, then we are modelling contingency tables, i.e. cell counts).
• The model describes the expected frequencies (expected counts).
• The model specifies how the count variable depends on the explanatory variables (e.g. the level of the categorical variable).
Introduction: Poisson regression
• Poisson regression models are generalized linear
models with the logarithm as the (canonical) link
function.
• It assumes that the response variable Y has a Poisson distribution and that the logarithm of its expected value can be modelled by a linear combination of unknown parameters.
• Sometimes known as a log-linear model, in
particular when used to model contingency
tables (i.e. only categorical variables).
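Written out for observation i, the log link gives the following form. This is a minimal sketch; the coefficients β_0, …, β_p and covariates x_{i1}, …, x_{ip} are notation introduced here for illustration rather than taken from the slides.

```latex
\log(\mu_i) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip},
\qquad
\mu_i = E(y_i) = \exp\left(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}\right)
```

Because the link is logarithmic, effects are multiplicative on the expected count: a one-unit increase in x_{ij} multiplies μ_i by e^{β_j}, holding the other covariates fixed.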
Example: Suicides (count variable) by Weekday (categorical variable) in France
Day     Suicides   Percentage
Mon     1001       15.2%
Tues    1035       15.7%
Wed     982        14.9%
Thur    1033       15.7%
Fri     905        13.7%
Sat     737        11.2%
Sun     894        13.6%
Total   6587       100.0%
Introduction: Poisson regression
• Let us first look at a simple case: the
equiprobable model (here for a 1-way
contingency table)
Equiprobable Model
• An equiprobable model means that:
– All outcomes are equally probable (equally likely).
– That is, for our example, we assume a uniform
distribution for the outcomes across days of week (Y
does not vary with days of week X).
• The equiprobable model is given by:
P(Y=y1) = P(Y=y2) = … = P(Y=yC) = 1/C
i.e. we expect an equal distribution across days
of week.
• Given the data, we can test whether the assumption of the equiprobable model (H0) holds.
Example 1: Suicides by Weekday in France
Day     Suicides   Percentage
Mon     1001       15.2%
Tues    1035       15.7%
Wed     982        14.9%
Thur    1033       15.7%
Fri     905        13.7%
Sat     737        11.2%
Sun     894        13.6%
Total   6587       100.0%
H0: Each day is equally likely for suicides (i.e. the expected proportion of suicides is 100/7 = 14.3% each day).
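The expected count under H0 and the Pearson statistic listed in the outline can be computed in plain Python. This is a minimal sketch; `equiprobable_expected` and `pearson_x2` are illustrative helper names, not functions from a statistics library.

```python
# Suicides by weekday in France (counts from the table above)
observed = {
    "Mon": 1001, "Tues": 1035, "Wed": 982, "Thur": 1033,
    "Fri": 905, "Sat": 737, "Sun": 894,
}

def equiprobable_expected(counts):
    """Expected count per category under H0: n / C for every category."""
    n = sum(counts)
    return [n / len(counts)] * len(counts)

def pearson_x2(counts, expected):
    """Pearson statistic: sum of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(counts, expected))

counts = list(observed.values())
expected = equiprobable_expected(counts)
x2 = pearson_x2(counts, expected)
df = len(counts) - 1  # C - 1 degrees of freedom

print(f"expected per day: {expected[0]:.1f}")  # expected per day: 941.0
print(f"Pearson X^2 = {x2:.2f} on {df} df")    # Pearson X^2 = 71.95 on 6 df
```

Under H0 the statistic is compared with a χ² distribution on C − 1 = 6 degrees of freedom; a large value, driven here mainly by the low Saturday count, is evidence against the equiprobable model.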
Example 2: Traffic Accidents by Weekday
Day     Accidents  Percentage
Mon     11         11.8%
Tues    9          9.7%
Wed     7          7.5%
Thur    10         10.8%
Fri     15         16.1%
Sat     18         19.4%
Sun     23         24.7%
Total   93         100.0%
H0: Each day is equally likely for an accident (i.e. the expected proportion of accidents is 100/7 = 14.3% each day).
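For the accident data, both the Pearson statistic and the likelihood-ratio statistic G² (also listed in the outline) can be sketched as follows. The helper names `lr_g2` and `pearson_x2` are hypothetical, and the sketch assumes the equiprobable H0, so every expected count is n / C.

```python
import math

# Traffic accidents by weekday, Mon .. Sun (counts from the table above)
accidents = [11, 9, 7, 10, 15, 18, 23]

def lr_g2(counts):
    """Likelihood-ratio statistic G^2 = 2 * sum(o * log(o / e)), e = n / C.
    Categories with a zero count contribute 0 to the sum."""
    e = sum(counts) / len(counts)
    return 2 * sum(o * math.log(o / e) for o in counts if o > 0)

def pearson_x2(counts):
    """Pearson statistic X^2 = sum((o - e)^2 / e), e = n / C."""
    e = sum(counts) / len(counts)
    return sum((o - e) ** 2 / e for o in counts)

g2 = lr_g2(accidents)
x2 = pearson_x2(accidents)
df = len(accidents) - 1
print(f"G^2 = {g2:.2f}, X^2 = {x2:.2f}, df = {df}")
```

The two statistics are of similar size here (about 14.00 and 14.56 on 6 df). Both exceed the 5% critical value of the χ² distribution with 6 df (about 12.59), so the equiprobable model would be rejected at that level.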
Hypothesis Testing
• H0: Each day is equally likely for an accident.
• Alternative null hypotheses are:
– H0: Each working day is equally likely for an accident.
– H0: Saturday and Sunday are equally likely for an accident.
• Omitted variables? For example, the distance driven on each day of the week.
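As an illustrative worked example (not part of the original slides), the two alternative null hypotheses can be checked with the same Pearson statistic, applied only to the subset of accident counts each hypothesis concerns:

```latex
\begin{align*}
\text{Working days (Mon--Fri):}\quad & n = 52,\; e = 52/5 = 10.4\\
X^2 &= \frac{(11-10.4)^2 + (9-10.4)^2 + (7-10.4)^2 + (10-10.4)^2 + (15-10.4)^2}{10.4}
     = \frac{35.2}{10.4} \approx 3.38 \quad (4\ \text{df})\\
\text{Weekend (Sat, Sun):}\quad & n = 41,\; e = 41/2 = 20.5\\
X^2 &= \frac{(18-20.5)^2 + (23-20.5)^2}{20.5} = \frac{12.5}{20.5} \approx 0.61 \quad (1\ \text{df})
\end{align*}
```

Both values are below the usual 5% critical values of the χ² distribution (about 9.49 for 4 df and 3.84 for 1 df), so neither restricted hypothesis is contradicted by these counts. Like the full test, this check ignores possible omitted variables such as distance driven.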
Poisson regression – without a covariate
• We can express this equiprobable model more formally as a Poisson regression model (without a covariate), which models the expected frequency.
Poisson regression
• We assume a Poisson distribution with parameter μ for the random component, i.e. yi ~ Poisson(μ), so that

$$P(Y = y_i) = \frac{\mu^{y_i} e^{-\mu}}{y_i!}, \qquad y_i = 0, 1, 2, \ldots$$

• Y is a random variable that takes only non-negative integer values (0, 1, 2, …).
• The Poisson distribution has a single parameter (μ), which is both its mean and its variance.
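The probability mass function can be checked numerically with a minimal plain-Python sketch. The helper name `poisson_pmf` and the value μ = 3 are illustrative choices, not taken from the slides. The sketch truncates the infinite support at y = 59, an assumption that is harmless for μ = 3 because the remaining tail probability is negligible.

```python
import math

def poisson_pmf(y: int, mu: float) -> float:
    """P(Y = y) = mu**y * exp(-mu) / y!  for y = 0, 1, 2, ..."""
    if y < 0:
        return 0.0
    return mu ** y * math.exp(-mu) / math.factorial(y)

mu = 3.0            # illustrative parameter value
ys = range(0, 60)   # truncated support 0..59
probs = [poisson_pmf(y, mu) for y in ys]

total = sum(probs)
mean = sum(y * p for y, p in zip(ys, probs))
var = sum((y - mean) ** 2 * p for y, p in zip(ys, probs))
print(round(total, 6), round(mean, 6), round(var, 6))  # 1.0 3.0 3.0
```

The probabilities sum to 1, the mean and variance both equal μ, and P(Y = 0) = e^{-μ} is strictly positive, which is why the support starts at 0 rather than 1.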
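Poisson regression: Simple Model (No Covariate)
• We aim to model the expected value of Y. It can be shown that this is the parameter μ, hence we aim to model μ.
• We can write the equiprobable model defined earlier as a simple Poisson model (no explanatory variables), i.e. the mean of Y does not change with the category of X (e.g. day of week):

$$\mu_i = E(y_i) = 1/C, \qquad \log(\mu_i) = \alpha, \quad i = 1, \ldots, C,$$

where α is a constant, here α = log(1/C).
• These values are on the proportion scale. On the count scale, with a total of n observations, μ_i = n/C and α = log(n/C) (for the suicide data, 6587/7 = 941 per day).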
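For the count-scale model log(μ_i) = α, the maximum-likelihood estimate is α̂ = log(ȳ) = log(n/C). A minimal Newton-Raphson sketch confirms this numerically. The function `fit_intercept_only` is a hypothetical helper written for illustration. In practice one would use a GLM routine, for example `statsmodels.api.GLM` with `family=statsmodels.api.families.Poisson()`.

```python
import math

# Suicides by weekday (Mon .. Sun) from the example table
y = [1001, 1035, 982, 1033, 905, 737, 894]

def fit_intercept_only(counts, tol=1e-12, max_iter=100):
    """Newton-Raphson for the Poisson model log(mu_i) = alpha, i = 1..C.

    Log-likelihood (up to a constant): sum(y) * alpha - C * exp(alpha)
    Score:        sum(y) - C * exp(alpha)
    Information:  C * exp(alpha)
    Assumes at least one positive count.
    """
    total, C = sum(counts), len(counts)
    # Start above the solution (max count >= mean), so the iterates
    # decrease monotonically towards it without overshooting.
    alpha = math.log(max(counts) + 0.5)
    for _ in range(max_iter):
        mu = math.exp(alpha)
        step = (total - C * mu) / (C * mu)  # score / information
        alpha += step
        if abs(step) < tol:
            break
    return alpha

alpha_hat = fit_intercept_only(y)
print(alpha_hat, math.log(sum(y) / len(y)))  # the two values should agree (log 941)
print(math.exp(alpha_hat))                   # fitted mean count per day, about 941
```

Starting from log(max count + 0.5) rather than 0 avoids a huge first Newton step, which would overflow `math.exp` for counts of this size.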
