Tuesday, 11 October 2011

Machine Learning: Working out the Cost Function

I'll give an example, and then work out the cost function for three different pairs of values of θ0 and θ1.
The cost function J(θ0, θ1) measures how good a fit a line is to the data. If it's a good fit, then it's going to give you better predictions. The line we're trying to make as good a fit as possible is hθ(x) = θ0 + θ1x. The idea is to minimise the value of J. (I'm not going to talk about how to minimise it here, just how to calculate it from given values of θ0 and θ1.)

Training examples

x | y
--+---
2 | 1
2 | 4
5 | 4
5 | 8
9 | 8
9 | 11

Below I'll work out the cost function for various values of θ0 and θ1. The values are chosen purely as illustrations, not in any systematic way.

Working out the Cost Function



First, we'll take θ0 = 0 and θ1 = 0. This gives h(x) = 0, which we draw on the graph as the horizontal line y = 0 along the x-axis. The crosses are the points given by the training examples.

x | y  | h(x) = 0 | h(x) - y     | (h(x) - y)²
--+----+----------+--------------+------------
2 | 1  | 0        | 0 - 1 = -1   | 1
2 | 4  | 0        | 0 - 4 = -4   | 16
5 | 4  | 0        | 0 - 4 = -4   | 16
5 | 8  | 0        | 0 - 8 = -8   | 64
9 | 8  | 0        | 0 - 8 = -8   | 64
9 | 11 | 0        | 0 - 11 = -11 | 121
Sum of (h(x) - y)²: 282

Now, the cost function is J(θ0, θ1) = 1/(2m) Σ(h(x) - y)², where m is the number of values in the training set (here m = 6) and Σ(h(x) - y)² just means the sum of all the values in the squared column above. Putting the values in gives


J(0, 0) = 1/(2×6) × 282 = 282/12 = 23.5.
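As a quick sanity check, the calculation above can be reproduced in a few lines of Python. This is a sketch I've added (the function and variable names are my own, not from the course):

```python
# Training examples from the table above.
xs = [2, 2, 5, 5, 9, 9]
ys = [1, 4, 4, 8, 8, 11]

def cost(theta0, theta1):
    """J(theta0, theta1) = 1/(2m) * sum((h(x) - y)^2), where h(x) = theta0 + theta1*x."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

print(cost(0, 0))  # 23.5
```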


We want to minimise J, so let's try some other values of θ0 and θ1. Next, we'll try θ0 = 0 and θ1 = 1. This gives h(x) = x, which is the line y = x.

x | y  | h(x) = x | h(x) - y    | (h(x) - y)²
--+----+----------+-------------+------------
2 | 1  | 2        | 2 - 1 = 1   | 1
2 | 4  | 2        | 2 - 4 = -2  | 4
5 | 4  | 5        | 5 - 4 = 1   | 1
5 | 8  | 5        | 5 - 8 = -3  | 9
9 | 8  | 9        | 9 - 8 = 1   | 1
9 | 11 | 9        | 9 - 11 = -2 | 4
Sum of (h(x) - y)²: 20

So J(0, 1) = 1/(2×6) × 20 = 20/12 = 1.67 (to 2 decimal places). This is better than the answer we had for the previous values of θ0 and θ1, as we are trying to minimise the cost function.

Finally, we'll take θ0 = 1 and θ1 = 1. This gives h(x) = 1 + x, which is the line y = 1 + x on the graph.

x | y  | h(x) = 1 + x | h(x) - y     | (h(x) - y)²
--+----+--------------+--------------+------------
2 | 1  | 3            | 3 - 1 = 2    | 4
2 | 4  | 3            | 3 - 4 = -1   | 1
5 | 4  | 6            | 6 - 4 = 2    | 4
5 | 8  | 6            | 6 - 8 = -2   | 4
9 | 8  | 10           | 10 - 8 = 2   | 4
9 | 11 | 10           | 10 - 11 = -1 | 1
Sum of (h(x) - y)²: 18

So J(1, 1) = 1/(2×6) × 18 = 18/12 = 1.5, which again is an improvement.
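All three values can be computed and compared in one go. Again, this is a sketch I've added using the same training set; the names are my own:

```python
# Training examples from the tables above.
xs = [2, 2, 5, 5, 9, 9]
ys = [1, 4, 4, 8, 8, 11]

def cost(theta0, theta1):
    """J(theta0, theta1) = 1/(2m) * sum((h(x) - y)^2), where h(x) = theta0 + theta1*x."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

for t0, t1 in [(0, 0), (0, 1), (1, 1)]:
    print(f"J({t0}, {t1}) = {cost(t0, t1):.2f}")
# J(0, 0) = 23.50
# J(0, 1) = 1.67
# J(1, 1) = 1.50
```

Each pair of values gives a smaller J than the last, which is exactly the direction a minimisation method such as gradient descent would push the parameters.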

13 comments:

  1. Nice explanation! Please do more such posts!!

  2. This was awesome. Great help!

  3. Hi, is it possible to get J(θ0, θ1) = 0?
    Thanks,

    1. Yes. Imagine points (0,0), (1,1), and (2,2) fitted by the line f(x) = 0 + x (so θ0 = 0 and θ1 = 1, like in the second example above).

      h(x) - y would be 0 for every example, so the sum of the squares is 0. A perfect fit!

  4. Finally I got answers for my questions. Thanks

  5. What is the cost function in your training set when x = 2 and y = 1?

  7. what if the cost function = 0?
