Showing posts with label information. Show all posts
Showing posts with label information. Show all posts

Wednesday, April 3, 2013

Estimating Reliability - Psychometrics

# Each item has a item characteristic curve (ICC) of:
PL3 = function(theta,a, b, c) c+(1-c)*exp(a*(theta-b))/(1+exp(a*(theta-b)))

# Let's imagine we have a test with parameters a,b,c
nitems = 50

test.a = rnorm(nitems)*.2+1
test.b = rnorm(nitems)
test.c = runif(nitems)*.3

# I classical test theory we have the assumption that total variance of the test results is equal to the variance of the true scores (defined as the expected outcome of the test for a given subject) plus measurement error [Var(X) = Var(T(theta)) + Var(E)].

# We "know" that classical test theory is wrong because we know that the error must be a function of ability (theta) since we know that no matter how good (or bad) a student is, the student will never get above the maximum (or below the minimum) score of the test.

# Let's generate the student pool.
nstud = 5000
stud.theta = rnorm(nstud)

# First let's generate our expected outcomes by student.
prob.correct = matrix(NA, ncol=nitems, nrow=nstud)
for (i in 1:nitems) prob.correct[,i]=PL3(stud.theta, test.a[i], test.b[i], test.c[i])

# Generate true scores
true.score = apply(prob.correct, 1, sum)

# Generate a single sample item responses for test 1.
x1.responses = matrix(runif(nitems*nstud), ncol=nitems, nrow=nstud)
# Generate total scores
x1.score = apply(x1.responses, 1, sum)

# The test sampling seems to be working correctly
cor(true.score, x1.score)

# Let's sample a parrellel test
x2.responses = matrix(runif(nitems*nstud), ncol=nitems, nrow=nstud)
# Generate total scores
x2.score = apply(x2.responses, 1, sum)

# Classical reliability is defined as the correlation between two parrellel test scores (I think):
rel.XX = cor(x1.score, x2.score)
  # Looks like our estimated reliability is about .84 or .85

plot(x1.score, x2.score, main="Highly Reliable Test fit along a 45 degree line")



# In theory we can find the reliability by performing the following calculation:
# rho = var(T)/var(x1)
rel.TX = var(true.score)/var(x1.score)
# This seems to give an estiamte that varies between .79 and .88

# Alternatively according to theory rho = 1 - var(E)/var(X)
E = true.score-x1.score
rel.EX =1-var(E)/var(x1.score)
  # Interestingly, this seems to produce a different estimate which is between .82 and .85 typically.

# An alternative reliability estimate is the Kuder-Richardson Formula 20
# K-R20= k/k-1 * (1-sum(p*q)/var(x1))
# With k defined as number of items
# and p probability correct with q = 1-p

# First let's calculate the estimated p by item.
P = apply(x1.responses, 2, mean)
Q = 1-P

# Now for the K-R20
rel.KR20 = nitems/(nitems-1)*(1-sum(P*Q)/var(x1.score))
  # This estimate seems to be between .84 and .86

# In order to calculate the IRT approximations of the classical test theory reliability we first need to calculate information.
# I = p*q when d=1 and p and q are the true probabilities of getting the correct answer.
# Which is equivalent to the prob correct on each item
I = prob.correct*(1-prob.correct)

# Test information can be found as the sum of the information
x1.I = apply(I, 1, sum)

# The unconditional standard error estimate.
SE = 1/x1.I^.5

# Average (unconditional) standard error
uSE = mean(SE)

# An IRT reliability estimate Thissen (2000) could be formed as:
rel.IRT = 1-uSE^2
  # This estimate is around .89.  However, I have not estimated the student abilities which might lead to a different level of reliability if I needed to.

rel.XX
rel.EX
rel.TX
rel.KR20
rel.IRT

Wednesday, November 28, 2012

CAT using Kullback–Leibler vs Fisher Information

R Code

# This post builds on my recent post that builds a simulation of a computer adaptive test (CAT) selecting items from an infinite item pool using fisher information as the criteria.

http://www.econometricsbysimulation.com/2012/11/computer-adaptive-test-assuming.html

# This post is very similar except that it uses instead the Kullback-Leibler (KL) information criteria to select items.

# More information about the KL can be found at:

http://www.econometricsbysimulation.com/2012/11/selecting-your-first-item-on-computer.html


#########################################################
# Specify the initial conditions:

true.ability = rnorm(1)

# Load three parameter model ICC
PL3 = function(theta,a, b, c) c+(1-c)*exp(a*(theta-b))/(1+exp(a*(theta-b)))

# Load three parameter item information:
PL3.info = function(theta, a, b, c) a^2 *(PL3(theta,a,b,c)-c)^2/(1-c)^2 * (1-PL3(theta,a,b,c))/PL3(theta,a,b,c)

#########################################################
#  Mock Computer Adaptive Test

# First let's specify and initial guess

est.start = 0

# How much do we adjust our estimate of person ability when all answer's are either right or wrong.
est.jump = .7

# Number of items on the test. This will be the end condition.
num.items = 50

# Set the other parameters in the 3PL.
a.base = 3
c.base = .1

# We will draw all of the random outcomes in advance so that both methods of selecting items get the "same" draws.
rdraw = runif(num.items)

### Let's generate a vector to hold both ability estimates.  Fisher and KL start at the same estimate.
ability.est = est.start

# Let's first generate a empty data frame to hold the set of item taken.
items = data.frame(a=NA,b=NA,c=NA,response=NA,p=NA, ability.est=NA)

i = 1

# For this first mock test we will not select items from a pool but instead assume the pool is infinite and has an item with an a=a.base, c=c.base, and b equal to whatever the current guess is.

# Let's select our first item - a,b,c, response are scalars that will be reused to simplify coding.
a=a.base
c=c.base

### This is the first divergence from the alternative CAT simulation.

### In order to select an item we must specify first a theta distribution to select the best item for.

### I will use 1000 draws from theta~normal()
theta.dist = rnorm(1000)

### Drawing from the second link posted above the KL information is defined as:
KL = function(theta.true,theta.estimate, a, b, c) {
  # For the true value
  p.true = PL3(theta.true,a,b,c)
  q.true = 1-p.true
 
  # For the estimate
  p.estimate = PL3(theta.estimate,a,b,c)
  q.estimate = 1-p.estimate

  # The following line is the value to be returned to the KL function:
  p.true*log(p.true/p.estimate) + q.true*log(q.true/q.estimate)
  }

### In order to select the best item from all the infinite number of items I will use a maximimization routine to select the b while a and c are held constant.

### I want to define a function to maximize the KL over for a given theta estimate.
KLmax = function(b) mean(KL(theta.true=theta.dist, theta.estimate=ability.est[i], a=a, b=b, c=c))

### Now we want to optimize the KLmax function looking for the choice of b that gives us the highest value.
optim(0,fn=KLmax , method="Brent", lower=-6, upper=6, control=list(fnscale = -1))

b=optim(0,fn=KLmax , method="Brent", lower=-6, upper=6, control=list(fnscale = -1))$par

# Probability of getting the item correct
p=PL3(true.ability, a,b,c)

# Note that all of the ps are already draw
response =  p > rdraw[i]

# The Item Characteristic Curve (ICC) gives the probability of getting the item correct.
# Thus, a .9 is the max of the runifrom that should produce a response of TRUE or correct or 1 (as far as R is concerned these TRUE and 1 are the same as is FALSE and 0)

items[i,] = c(a=a, b=b, c=c, response=response, p=p, ability.est=ability.est[i])

# We have now successfully administered our first item.

# Should do our first MLE estimation?

# Not quite, unfortunately MLE requires in the bianary case that the student has answered at least one question right and at least one question wrong.

i=1+1

# Instead we will just adjust the ability estimate by the fixed factor (est.jump)
ability.est[i] = ability.est[i-1]-(-1)^(response)*est.jump

# Now we administer the second item:
# We will continue this until we get some heterogeneity in the responses
response.v = items$response
response.ave = sum(response.v)/length(response.v)

while ((response.ave == ceiling(response.ave)) & (num.items >= i) ) {
  # This condition will no longer be true when at least one of the items is ansered correctly and one of the items answered incorrectly.
  ability.est[i] = ability.est[i-1]-(-1)^(response)*est.jump

  a=a.base
  c=c.base
 
  ### Using the KL information criteria
  b=optim(0,fn=KLmax , method="Brent", lower=-6, upper=6, control=list(fnscale = -1))$par
 
  p=PL3(true.ability, a,b,c)

  response =  p > rdraw[i]

  items[i,] = c(a=a, b=b, c=c, response=response, p=p, ability.est=ability.est[i])

  response.v = items$response
  response.ave = sum(response.v)/length(response.v)

  i=i+1
}

items
true.ability

# Now that we have some heterogeneity of responses we can use the MLE estimator
MLE = function(theta) sum(log((items$response==T)*PL3(theta, items$a, items$b, items$c) +
                              (items$response==F)*(1-PL3(theta, items$a, items$b, items$c))))

optim(0,MLE, method="Brent", lower=-6, upper=6, control=list(fnscale = -1))
  # Okay, it seems to be working properly  now we will loop through using the above function.
 
# The only thing we need change is the ability estimate.
while (num.items >= i) {
  ability.est[i] = optim(0,MLE, method="Brent", lower=-6, upper=6, control=list(fnscale = -1))$par

  a=a.base
  c=c.base
 
  ### Using the KL information criteria
  b=optim(0,fn=KLmax , method="Brent", lower=-6, upper=6, control=list(fnscale = -1))$par
 
  p=PL3(true.ability, a,b,c)
 
  response =  p > rdraw[i]

  items[i,] = c(a=a, b=b, c=c, response=response, p=p, ability.est=ability.est[i])

  response.v = items$response
  response.ave = sum(response.v)/length(response.v)

  i=i+1
}

ability.est[i] = optim(0,MLE, method="Brent", lower=-6, upper=6, control=list(fnscale = -1))$par

# Save the paths for comparison with Fisher information
items.KL = items
ability.est.KL = ability.est

#######################################################################
#####
##### Now let's compare this with the Fisher information item selection
#####
#######################################################################


ability.est = est.start
items = data.frame(a=NA,b=NA,c=NA,response=NA,p=NA, ability.est=NA)
i = 1
a=a.base
c=c.base
### Now b just equals the ability estimate
  b=ability.est[i]
p=PL3(true.ability, a,b,c)
response =  p > rdraw[i]
items[i,] = c(a=a, b=b, c=c, response=response, p=p, ability.est=ability.est[i])
i=1+1
ability.est[i] = ability.est[i-1]-(-1)^(response)*est.jump
response.v = items$response
response.ave = sum(response.v)/length(response.v)

while ((response.ave == ceiling(response.ave)) & (num.items >= i) ) {
  ability.est[i] = ability.est[i-1]-(-1)^(response)*est.jump
  a=a.base
  c=c.base
  ### Change how b is selected
    b=ability.est[i]
  p=PL3(true.ability, a,b,c)
  response =  p > rdraw[i]
  items[i,] = c(a=a, b=b, c=c, response=response, p=p, ability.est=ability.est[i])
  response.v = items$response
  response.ave = sum(response.v)/length(response.v)
  i=i+1
}
while (num.items >= i) {
  ability.est[i] = optim(0,MLE, method="Brent", lower=-6, upper=6, control=list(fnscale = -1))$par
  a=a.base
  c=c.base
  ### Change how b is selected
    b=ability.est[i]
  p=PL3(true.ability, a,b,c)
  response =  p > rdraw[i]
  items[i,] = c(a=a, b=b, c=c, response=response, p=p, ability.est=ability.est[i])
  response.v = items$response
  response.ave = sum(response.v)/length(response.v)
  i=i+1
}

# One final ability estimate calculation
ability.est[i] = optim(0,MLE, method="Brent", lower=-6, upper=6, control=list(fnscale = -1))$par

ability.est.Fisher = ability.est
items.Fisher = items

#######################################################################

minmax = function(x) c(min(x),max(x))

plot(0:num.items, ability.est.Fisher, type="l", main="CAT Estimates Ideally Converge on True Ability",
  , xlab="Number of Items Administered", ylim=minmax(c(ability.est.Fisher,ability.est.KL)),
  ylab="Estimated Ability", lwd=3, col="blue")
lines(0:num.items, ability.est.KL, lwd=2, col="green")
abline(h=true.ability, col="red", lwd=2)

## Graph I

# I have included two different graphs because I thought both of these were quite peculiar.
# The first graph is strange because in this chance draw the KL looks like it is doing as well or better than the Fisher.
# In almost all other samples that I ran it seemed to do considerably worse.

## Graph II


# Let's see what items both methods are choosing and how they relate to the true ability estimate.
plot(1:num.items, items.Fisher$b, type="l", main="Item's are selected using different criteria",
  , xlab="Number of Items Administered", ylim=minmax(c(items.KL$b,items.Fisher$b)),
  ylab="Estimated Ability", lwd=3, col="blue")
lines(1:num.items, items.KL$b, lwd=2, col="green")
abline(h=true.ability, col="red", lwd=2)



# Blue is Fisher
# Green is KL

# I have included two different graphs because I thought both of these were quite peculiar.
# The first graph is strange because in this chance draw the KL looks like it is doing as well or better than the Fisher.
# In almost all other samples that I ran it seemed to do considerably worse.




# Let's see what items both methods are choosing and how they relate to the true ability estimate.
plot(1:num.items, items.Fisher$b, type="l", main="Item's are selected using different criteria",
  , xlab="Number of Items Administered", ylim=minmax(c(items.KL$b,items.Fisher$b)),
  ylab="Item Difficulty", lwd=3, col="blue")
lines(1:num.items, items.KL$b, lwd=2, col="green")
abline(h=true.ability, col="red", lwd=2)


# The y-axis should read item difficulty.

# We can see the items tend to be closer to zero in difficulty for the KL selection (these items correspond to the most recent graph)

# From this simulation, it appears KL is an inferior item selection criteria.

# It is possible that this simulation is not taking into consideration all of the features of the KL criteria.  Perhaps, if the item pool has differences in the parameters a and c then KL will perform better.  I will leave that for a later simulation.

Wednesday, November 21, 2012

Estimating Person Characteristics from IRT Data - 3PL Model

Original Code

# One of the basic tasks in item response theory is estimating the ability of a test taker from the responses to a series of items (questions).

# Let's draw the same pool of items that we have used on several previous posts:

# First let's imagine we have a pool of 100 3PL items.
set.seed(101)

npool = 500

pool = as.data.frame(cbind(item=1:npool, a=abs(rnorm(npool)*.25+1.1), b=rnorm(npool), c=abs(rnorm(npool)/7.5+.1)))
summary(pool)

# Drawing on code from a previous post we can calculate several useful functions:

# (http://www.econometricsbysimulation.com/2012/11/matching-item-information.html)

# Each item has a item characteristic curve (ICC) of:
PL3 = function(theta,a, b, c) c+(1-c)*exp(a*(theta-b))/(1+exp(a*(theta-b)))

# Let's imagine that we have a single test taker and that test taker has taken the first 15 items from the pool.

items.count = 15
items.taken = pool[1:items.count,]

# And that the person has a latent theta ability of 1
theta = 1.3

# Let's calculate the cut points for each of the items.
items.cut = PL3(theta, items.taken$a, items.taken$b, items.taken$c)

# We can see how the cut point works by graphing
plot(0,0, type="n", xlim=c(-3,3),ylim=c(0,1), xlab=~theta,
           ylab="Probability of Correct Response", yaxs = "i", xaxs = "i" , main="Item Characteristics Curves and Ability Level")

for(i in 1:items.count) {
  lines(seq(-3,3,.1),PL3(seq(-3,3,.1), items.taken$a[i], items.taken$b[i], items.taken$c[i]), lwd=2)
  abline(h=items.cut[i], col="blue")
}
abline(v=theta,col="red", lwd=3)



# Now let's draw a uniform draw that we will use to calculate whether each item as passed.

rdraw = runif(items.count)

# Finally, we will calculate item responses
item.responses = 0
item.responses = items.cut > rdraw

###############################################
# Done with Simulation - Time for Estimation

# We want to use the information we know about the items (the item parameters and the responses) in order to estimate a best guess at the true ability of the test taker.

# First we must check if the person got all of the items either correct or incorrect.

sum(item.responses)
# If this is either a 0 or a number equal to the number of items then we cannot esimate an interior maximum without additional assumptions.

# We will attempt to recover our theta value using the r command optim

# First we need to specify the function to optimize over.

MLE = function(theta) sum(log((item.responses==T)*PL3(theta, items.taken$a, items.taken$b, items.taken$c) +
                              (item.responses==F)*(1-PL3(theta, items.taken$a, items.taken$b, items.taken$c))))
# The optimization function takes as its argument the choice variables to be optimized (theta).
# The way the above optimization works is that you specify the probility of each response piecewise.
# If the response is correct, then you count the CDF of theta up to that point as contributing to the probability of observing a correct outcome.  If the response is negative, then you count it as contributing the the probability of a incorrect outcome.  You then choose the theta that produces the greatest total probability.

MLEval = 0
theta.range = seq(-3,3,.1)
for(i in 1:length(theta.range)) MLEval[i] =MLE(theta.range[i])

plot(theta.range, MLEval, type="l", main="Maximim Likelihood function", xlab= ~theta, ylab="Sum of Log Likelihood")
abline(v=theta, col="blue")
# We can visually see that the maximum of the slope will not be at the true value though it will be close.

optim(0,MLE, method="Brent", lower=-6, upper=6, control=list(fnscale = -1))
abline(v=optim(0,MLE, method="Brent", lower=-6, upper=6, control=list(fnscale = -1))$par, col="red")


# We can see that we can estimate theta reasonably well with just 15 items from a paper test (red line estimate, blue line true).  However, looking at the graph of the ICCs, we can see that for most of the items, the steepest point (where they have the most discriminating power) is at an ability set lower than the test taker's ability.  Thus, this test provides the most information about a person who has a lower ability than the person with a theta=1.2.

# We can use R's optim function to find the ideal theta that would maximize the information from this test.
# Item information is:
PL3.info = function(theta, a, b, c) a^2 *(PL3(theta,a,b,c)-c)^2/(1-c)^2 * (1-PL3(theta,a,b,c))/PL3(theta,a,b,c)

# Notice, this is not the best way of defining the test information function since the items are not arguments.
test.info = function(theta) sum(PL3.info(theta, items.taken$a, items.taken$b, items.taken$c))

# Construct a vector to hold the test information
info = 0
for(i in 1:length(theta.range)) info[i]=test.info(theta.range[i])

plot(theta.range, info, type="l", main="Information Peaks Slightly Above 0", xlab= ~theta, ylab="Information")
abline(v=theta, col="blue")
# But we want to know about the test taker at theta

optim(0,test.info, method="Brent", lower=-6, upper=6, control=list(fnscale = -1))
# The person this test would be best suited to evaluate would have an ability rating of .19
abline(v=optim(0,test.info, method="Brent", lower=-6, upper=6, control=list(fnscale = -1))$par, col="red")