The cost function J(θ0, θ1) measures how well a line fits the data: the better the fit, the better the predictions the line gives. The line we're trying to make as good a fit as possible is hθ(x) = θ0 + θ1x. The idea is to minimise the value of J. (I'm not going to talk about how to minimise it here, just how to calculate it for given values of θ0 and θ1.)
Training examples
x | y |
---|---|
2 | 1 |
2 | 4 |
5 | 4 |
5 | 8 |
9 | 8 |
9 | 11 |
Below I'll work out the cost function for various values of θ0 and θ1. The values are chosen purely as illustrations and not in any particular way.
Working out the Cost Function
First, let's try θ0=0 and θ1=1... no wait, first θ0=0 and θ1=0, which gives h(x)=0.

x | y | h(x)=0 | h(x)-y | (h(x)-y)² |
---|---|---|---|---|
2 | 1 | 0 | 0-1=-1 | 1 |
2 | 4 | 0 | 0-4=-4 | 16 |
5 | 4 | 0 | 0-4=-4 | 16 |
5 | 8 | 0 | 0-8=-8 | 64 |
9 | 8 | 0 | 0-8=-8 | 64 |
9 | 11 | 0 | 0-11=-11 | 121 |

Sum of (h(x)-y)² = 282
Now the cost function is J(θ0, θ1) = (1/(2m)) Σ(h(x)-y)², where m is the number of examples in the training set (here m = 6) and Σ(h(x)-y)² just means the sum of all the values in the squared column above. Putting the values in gives

J(0, 0) = (1/(2×6)) × 282 = 282/12 = 23.5.
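As a quick sanity check, here's a minimal Python sketch of the same calculation (the names `cost`, `xs` and `ys` are just my own choices for this post, not from any library):

```python
# Training examples from the table above.
xs = [2, 2, 5, 5, 9, 9]
ys = [1, 4, 4, 8, 8, 11]

def cost(theta0, theta1, xs, ys):
    """J(theta0, theta1) = 1/(2m) * sum of (h(x) - y)^2, with h(x) = theta0 + theta1*x."""
    m = len(xs)
    squared_errors = sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys))
    return squared_errors / (2 * m)

print(cost(0, 0, xs, ys))  # 23.5, matching the hand calculation above
```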
We want to minimise J, so let's try some other values of θ0 and θ1. Next we'll try θ0=0 and θ1=1, which gives h(x)=x, i.e. the line y=x.
x | y | h(x)=x | h(x)-y | (h(x)-y)² |
---|---|---|---|---|
2 | 1 | 2 | 2-1=1 | 1 |
2 | 4 | 2 | 2-4=-2 | 4 |
5 | 4 | 5 | 5-4=1 | 1 |
5 | 8 | 5 | 5-8=-3 | 9 |
9 | 8 | 9 | 9-8=1 | 1 |
9 | 11 | 9 | 9-11=-2 | 4 |

Sum of (h(x)-y)² = 20
So J(0, 1) = (1/(2×6)) × 20 = 20/12 = 1.67 (to 2 decimal places). This is lower than the value we got for the previous θ0 and θ1, which is what we want, since we are trying to minimise the cost function.
Finally, we'll try θ0=1 and θ1=1, which gives h(x)=1+x, i.e. the line y=1+x.
x | y | h(x)=1+x | h(x)-y | (h(x)-y)² |
---|---|---|---|---|
2 | 1 | 3 | 3-1=2 | 4 |
2 | 4 | 3 | 3-4=-1 | 1 |
5 | 4 | 6 | 6-4=2 | 4 |
5 | 8 | 6 | 6-8=-2 | 4 |
9 | 8 | 10 | 10-8=2 | 4 |
9 | 11 | 10 | 10-11=-1 | 1 |

Sum of (h(x)-y)² = 18
So J(1, 1) = (1/(2×6)) × 18 = 18/12 = 1.5, which again is an improvement.
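If you'd rather not do the arithmetic by hand, the sketch from earlier (assuming the `cost` function defined above) reproduces all three values:

```python
for theta0, theta1 in [(0, 0), (0, 1), (1, 1)]:
    print(f"J({theta0}, {theta1}) = {cost(theta0, theta1, xs, ys):.2f}")
# J(0, 0) = 23.50
# J(0, 1) = 1.67
# J(1, 1) = 1.50
```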
Q: Is it possible to get J(θ0, θ1) = 0?

A: Yes. Imagine the points (0,0), (1,1) and (2,2) fitted with the line h(x) = 0 + x (so θ0 = 0 and θ1 = 1, like in the second example above). Then h(x)-y is 0 for every example, the sum of the squares is 0, and so J = 0: a perfect fit!
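Using the same `cost` sketch from earlier in the post (my own addition, not part of the original answer), you can check this numerically:

```python
print(cost(0, 1, [0, 1, 2], [0, 1, 2]))  # 0.0 - the line y = x passes through every point exactly
```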