The cost function for logistic regression is

J(θ) = (1/m) Σ_{i=1..m} [ -y^(i) log(h(x^(i))) - (1-y^(i)) log(1-h(x^(i))) ].
Note that when y=1, the part inside the square brackets, which is Cost(h(x),y), becomes (omitting the superscripts):
Cost(h(x),y) = -y log(h(x)) - (1-y) log(1-h(x))
             = -1 * log(h(x)) - (1-1) log(1-h(x))
             = -log(h(x))
and when y=0,
Cost(h(x),y) = -y log(h(x)) - (1-y) log(1-h(x))
             = -0 * log(h(x)) - (1-0) log(1-h(x))
             = -log(1-h(x)).
I'll work out Cost(h(x),y) in the table and then finish off underneath. For the green line, θ0=-3, θ1=-1 and θ2=1. Why are the values like this and not θ0=3, θ1=1 and θ2=-1?
When we predict y, we use the value of h(x): when h(x) ≥ 0.5 we predict y=1, and when h(x) < 0.5 we predict y=0. In the graph below, which is of the sigmoid function, the red line is the function h(x). The horizontal axis is z = θᵀx (note that the axis is not x). Now h(x) = g(θᵀx) ≥ 0.5 exactly when z = θᵀx ≥ 0.
From wiki: http://en.wikipedia.org/wiki/File:Logistic-curve.svg
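The prediction rule above can be written as a short Python sketch (the `g`, `h` and `predict` helpers are my own names; x is taken as [1, x1, x2] so that the θ0 term is included):

```python
import math

def g(z):
    """Sigmoid: g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def h(theta, x):
    """Hypothesis h(x) = g(theta' * x), where x = [1, x1, x2] includes the bias term."""
    return g(sum(t * xi for t, xi in zip(theta, x)))

def predict(theta, x):
    """Predict y = 1 when h(x) >= 0.5, i.e. when theta' * x >= 0; otherwise y = 0."""
    return 1 if h(theta, x) >= 0.5 else 0

theta = [-3, -1, 1]               # the green line: theta0=-3, theta1=-1, theta2=1
print(h(theta, [1, 1, 2]))        # ≈ 0.1192, so...
print(predict(theta, [1, 1, 2]))  # ...the prediction is 0
```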
Please note that log in the calculations is natural log, sometimes written ln.
| x1 | x2 | y | θᵀx = θ0 + θ1x1 + θ2x2 = -3 - x1 + x2 | h(x) = g(θᵀx) | Cost(h(x),y) = -y log(h(x)) - (1-y) log(1-h(x)) |
|---|---|---|---|---|---|
| 1 | 2 | 0 | -3 - 1*1 + 1*2 = -2 | 1/(1+e^2) = 0.119202922 | -log(1 - 0.119202922) = 0.126928011 |
| 2 | 3 | 0 | -2 | 0.119202922 | 0.126928011 |
| 2 | 4 | 0 | -1 | 0.2689414214 | 0.3132616875 |
| 3 | 5 | 0 | -1 | 1/(1+e^1) = 0.2689414214 | 0.3132616875 |
| 1 | 4 | 0 | 0 | 1/(1+e^0) = 0.5 | 0.6931471806 |
| 5 | 4 | 1 | -4 | 0.01798621 | 4.0181499279 |
| 5 | 6 | 1 | -2 | 0.119202922 | -log(0.119202922) = 2.126928011 |
| 4 | 6 | 1 | -1 | 0.2689414214 | 1.3132616875 |
| 5 | 7 | 1 | -1 | 0.2689414214 | 1.3132616875 |
| 3 | 6 | 1 | 0 | 0.5 | 0.6931471806 |
|   |   |   |   | ΣCost(h(x),y) | 11.0382750722 |
The largest cost values correspond to the points on the wrong side of the line, that is, the examples where y=1 but h(x) < 0.5 (or y=0 but h(x) ≥ 0.5). Now that we have the sum of the costs over the training examples, we can calculate J. As there are 10 training examples, m=10, so our cost is

J(θ) = 1/10 * 11.0382750722 ≈ 1.1038.
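If you want to verify the table and the value of J without doing the arithmetic by hand, a short Python sketch like the following reproduces the calculation (the `J` helper and the `data` list are just illustrative names):

```python
import math

# The ten training examples (x1, x2, y) from the table above.
data = [(1, 2, 0), (2, 3, 0), (2, 4, 0), (3, 5, 0), (1, 4, 0),
        (5, 4, 1), (5, 6, 1), (4, 6, 1), (5, 7, 1), (3, 6, 1)]

def J(theta, examples):
    """J(theta) = (1/m) * sum over the examples of -y*log(h) - (1-y)*log(1-h)."""
    m = len(examples)
    total = 0.0
    for x1, x2, y in examples:
        z = theta[0] + theta[1] * x1 + theta[2] * x2
        hx = 1.0 / (1.0 + math.exp(-z))
        total += -y * math.log(hx) - (1 - y) * math.log(1 - hx)
    return total / m

print(J([-3, -1, 1], data))  # ≈ 1.1038
```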
For the line 0 = -29 + 4x1 + 3x2, we want points above the line to give y=1, so we have to write it so that -29 + 4x1 + 3x2 ≥ 0 holds above the line, which it does. This gives us θ0=-29, θ1=4 and θ2=3.
| x1 | x2 | y | θᵀx = θ0 + θ1x1 + θ2x2 = -29 + 4x1 + 3x2 | h(x) = g(θᵀx) | Cost(h(x),y) = -y log(h(x)) - (1-y) log(1-h(x)) |
|---|---|---|---|---|---|
| 1 | 2 | 0 | -29 + 4*1 + 3*2 = -19 | 1/(1+e^19) = 5.60279640614594E-009 ≈ 0.0000000056 | 5.60279646470696E-009 ≈ 0.0000000056 |
| 2 | 3 | 0 | -12 | 6.14417460221472E-006 | 6.14419347772537E-006 |
| 2 | 4 | 0 | -9 | 0.0001233946 | 0.0001234022 |
| 3 | 5 | 0 | -2 | 0.119202922 | 0.126928011 |
| 1 | 4 | 0 | -13 | 2.26032429790357E-006 | 2.26032685249035E-006 |
| 5 | 4 | 1 | 3 | 0.9525741268 | 0.0485873516 |
| 5 | 6 | 1 | 9 | 0.9998766054 | 0.0001234022 |
| 4 | 6 | 1 | 5 | 0.9933071491 | 0.0067153485 |
| 5 | 7 | 1 | 12 | 0.9999938558 | 6.14419347772537E-006 |
| 3 | 6 | 1 | 1 | 0.7310585786 | 0.3132616875 |
|   |   |   |   | ΣCost(h(x),y) | 0.4957537573 |
If you look at the column marked h(x), you can see that for all the training examples where y=0 we have small values of h(x), that is h(x) < 0.5, and for all examples where y=1 we have large values, that is h(x) ≥ 0.5. This is what we wanted, since we predict y=1 when h(x) ≥ 0.5 and y=0 when h(x) < 0.5. The accuracy of the predictions is reflected in the low sum of costs.
The cost for this h(x) is J(θ) = 1/10 * 0.4957537573 ≈ 0.0496.
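As a final check, the following self-contained Python sketch (again just illustrative, not course code) verifies that every example falls on the correct side of this line and reproduces the sum and J above:

```python
import math

data = [(1, 2, 0), (2, 3, 0), (2, 4, 0), (3, 5, 0), (1, 4, 0),
        (5, 4, 1), (5, 6, 1), (4, 6, 1), (5, 7, 1), (3, 6, 1)]

theta = (-29, 4, 3)
total = 0.0
for x1, x2, y in data:
    hx = 1.0 / (1.0 + math.exp(-(theta[0] + theta[1] * x1 + theta[2] * x2)))
    assert (1 if hx >= 0.5 else 0) == y  # every example is predicted correctly
    total += -y * math.log(hx) - (1 - y) * math.log(1 - hx)

print(total)       # ≈ 0.4958, the sum of the costs
print(total / 10)  # ≈ 0.0496, J(theta)
```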