Thursday, 27 October 2011

Logistic Regression Cost Function

I'm going to calculate the cost function for two different lines. The first, shown in green, does not correctly classify all the points as some are on the wrong side of the line. The blue line classifies the points correctly but is probably not the best possible line.
The cost function for logistic regression is

J(θ) = (1/m) Σ [ -y(i) log(h(x(i))) - (1-y(i)) log(1-h(x(i))) ],

where the sum runs over the m training examples i = 1, …, m.
Note that when y=1, the part inside the square brackets, which is Cost(h(x),y), becomes (omitting the (i) superscripts):

Cost(h(x),y)=-ylog(h(x)) - (1-y)log(1-h(x))
= -1*log(h(x)) - (1-1)log(1-h(x))  
= -log(h(x))  

and when y=0,

Cost(h(x),y)=-ylog(h(x)) - (1-y)log(1-h(x))
=  -0*log(h(x)) - (1-0)log(1-h(x)) 
=  -log(1-h(x)). 
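If you want to play with these two cases numerically, here is a minimal Python sketch (the function name example_cost is my own) of the per-example cost, where h is the predicted value h(x) and y is the true label:

```python
import math

def example_cost(h, y):
    """Cost(h(x), y) = -y*log(h(x)) - (1-y)*log(1-h(x)); log is natural log."""
    return -y * math.log(h) - (1 - y) * math.log(1 - h)

# When y = 0 only the second term survives; when y = 1 only the first does.
print(example_cost(0.119202922, 0))   # -log(1 - 0.1192...) ≈ 0.1269
print(example_cost(0.119202922, 1))   # -log(0.1192...)     ≈ 2.1269
```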

I'll work out Cost(h(x),y) in the table and then finish off underneath.  For the green line, θ0=-3, θ1=-1 and θ2=1. Why are the values like this and not θ0=3, θ1=1 and θ2=-1?

When we predict y, we use the value of h(x). When h(x) ≥ 0.5 we predict y=1, and when h(x) < 0.5 we predict y=0. In the graph below, which is of the sigmoid function, the red line is the function h(x). The horizontal axis is z=θTx (note that the axis is not x). Now h(x) = g(θTx) ≥ 0.5 exactly when z=θTx ≥ 0.

From wiki: http://en.wikipedia.org/wiki/File:Logistic-curve.svg
Going back to our line, I wanted y=1 above the line and y=0 below it. I chose it that way; I could have chosen it the other way too, as it's a pretty terrible line either way! This means we need the coefficients to be such that θTx ≥ 0 where we want y=1, that is, above the green line. Checking, you find that -3 - x1 + x2 ≥ 0 works. (You can check this by putting in a point above the line, say (2,6): -3 - 2 + 6 = 1 ≥ 0, so the inequality holds. If it didn't, you would switch the signs of all the coefficients to make it work!)
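Here is a small sketch of that check in Python (the helper names sigmoid and h are mine): it evaluates θTx for the green line's coefficients and confirms that a point above the line gives h(x) ≥ 0.5, i.e. a prediction of y=1.

```python
import math

def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z))"""
    return 1.0 / (1.0 + math.exp(-z))

# Green line coefficients: theta0 = -3, theta1 = -1, theta2 = 1
theta0, theta1, theta2 = -3.0, -1.0, 1.0

def h(x1, x2):
    """h(x) = g(theta^T x), with x0 = 1."""
    return sigmoid(theta0 + theta1 * x1 + theta2 * x2)

# (2, 6) lies above the line x2 = x1 + 3: theta^T x = 1 >= 0,
# so h(x) >= 0.5 and we predict y = 1 there, as intended.
print(h(2, 6))   # ≈ 0.731, predict y = 1
print(h(2, 2))   # below the line: theta^T x = -3, h ≈ 0.047, predict y = 0
```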

Please note that log in the calculations is natural log, sometimes written ln.

| x1 | x2 | y | θTx = θ0 + θ1x1 + θ2x2 = -3 - x1 + x2 | h(x) = g(θTx) | Cost(h(x),y) = -y log(h(x)) - (1-y) log(1-h(x)) |
|----|----|---|---------------------------------------|---------------|--------------------------------------------------|
| 1 | 2 | 0 | -3 - 1*1 + 1*2 = -2 | 1/(1+e^2) = 0.119202922 | -log(1 - 0.119202922) = 0.126928011 |
| 2 | 3 | 0 | -2 | 0.119202922 | 0.126928011 |
| 2 | 4 | 0 | -1 | 0.2689414214 | 0.3132616875 |
| 3 | 5 | 0 | -1 | 1/(1+e^1) = 0.2689414214 | 0.3132616875 |
| 1 | 4 | 0 | 0 | 1/(1+e^0) = 0.5 | 0.6931471806 |
| 5 | 4 | 1 | -4 | 0.01798621 | 4.0181499279 |
| 5 | 6 | 1 | -2 | 0.119202922 | -log(0.119202922) = 2.126928011 |
| 4 | 6 | 1 | -1 | 0.2689414214 | 1.3132616875 |
| 5 | 7 | 1 | -1 | 0.2689414214 | 1.3132616875 |
| 3 | 6 | 1 | 0 | 0.5 | 0.6931471806 |
|   |   |   |   | ΣCost(h(x),y) | 11.0382750722 |

The large cost values correspond to the points on the wrong side of the line. Now that we have the sum of the costs over the training examples, we can calculate J. As there are 10 training examples, m = 10, so our cost is


J(θ) = (1/10) * 11.0382750722 ≈ 1.1038.
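If you'd rather not work the table out by hand, here is a short Python sketch (the helper names are mine) that recomputes the per-example costs for the green line, assuming the ten (x1, x2, y) points read off the table above; it prints the summed cost ≈ 11.0383 and J ≈ 1.1038.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def example_cost(h, y):
    return -y * math.log(h) - (1 - y) * math.log(1 - h)

# The ten training points (x1, x2, y) from the table above.
data = [(1, 2, 0), (2, 3, 0), (2, 4, 0), (3, 5, 0), (1, 4, 0),
        (5, 4, 1), (5, 6, 1), (4, 6, 1), (5, 7, 1), (3, 6, 1)]

theta0, theta1, theta2 = -3.0, -1.0, 1.0   # green line

total = 0.0
for x1, x2, y in data:
    total += example_cost(sigmoid(theta0 + theta1 * x1 + theta2 * x2), y)

m = len(data)
print(total)       # ≈ 11.0383
print(total / m)   # J(theta) ≈ 1.1038
```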

For the blue line, 0 = -29 + 4x1 + 3x2, we again want points above the line to give y=1, so we need -29 + 4x1 + 3x2 ≥ 0 to hold above the line, which it does. This gives us θ0=-29, θ1=4 and θ2=3.

| x1 | x2 | y | θTx = θ0 + θ1x1 + θ2x2 = -29 + 4x1 + 3x2 | h(x) = g(θTx) | Cost(h(x),y) = -y log(h(x)) - (1-y) log(1-h(x)) |
|----|----|---|------------------------------------------|---------------|--------------------------------------------------|
| 1 | 2 | 0 | -29 + 4*1 + 3*2 = -19 | 1/(1+e^19) = 5.60279640614594E-009 ≈ 0.0000000056 | 5.60279646470696E-009 ≈ 0.0000000056 |
| 2 | 3 | 0 | -12 | 6.14417460221472E-006 | 6.14419347772537E-006 |
| 2 | 4 | 0 | -9 | 0.0001233946 | 0.0001234022 |
| 3 | 5 | 0 | -2 | 0.119202922 | 0.126928011 |
| 1 | 4 | 0 | -13 | 2.26032429790357E-006 | 2.26032685249035E-006 |
| 5 | 4 | 1 | 3 | 0.9525741268 | 0.0485873516 |
| 5 | 6 | 1 | 9 | 0.9998766054 | 0.0001234022 |
| 4 | 6 | 1 | 5 | 0.9933071491 | 0.0067153485 |
| 5 | 7 | 1 | 12 | 0.9999938558 | 6.14419347772537E-006 |
| 3 | 6 | 1 | 1 | 0.7310585786 | 0.3132616875 |
|   |   |   |   | ΣCost(h(x),y) | 0.4957537573 |

If you look at the column marked h(x), you can see that for all the training examples where y=0 we have small values of h(x), that is h(x) < 0.5, and for all examples where y=1 we have large h(x), that is h(x) ≥ 0.5. This is what we wanted, since we predict y=1 when h(x) ≥ 0.5 and y=0 when h(x) < 0.5. The accuracy of the predictions is reflected in the low sum of costs.

The total cost function for this h(x) is J(θ)= 1/10 * 0.4957537573  ≈ 0.0496.
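The same calculation with the blue line's coefficients, again assuming the ten training points from the tables above, reproduces this figure:

```python
import math

# Same ten training points; blue line coefficients theta = (-29, 4, 3).
data = [(1, 2, 0), (2, 3, 0), (2, 4, 0), (3, 5, 0), (1, 4, 0),
        (5, 4, 1), (5, 6, 1), (4, 6, 1), (5, 7, 1), (3, 6, 1)]
theta0, theta1, theta2 = -29.0, 4.0, 3.0

total = 0.0
for x1, x2, y in data:
    h = 1.0 / (1.0 + math.exp(-(theta0 + theta1 * x1 + theta2 * x2)))  # g(theta^T x)
    total += -y * math.log(h) - (1 - y) * math.log(1 - h)              # per-example cost

print(total)              # ≈ 0.4958
print(total / len(data))  # J(theta) ≈ 0.0496
```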
