Tuesday, 11 October 2011

Machine Learning: Working out the Cost Function

I'll give an example, and then work out the cost function for three different pairs of values of θ0 and θ1.
The cost function J(θ0, θ1) measures how good a fit a line is to the data. If it's a good fit, then it's going to give you better predictions. The line we're trying to make as good a fit as possible is hθ(x) = θ0 + θ1x. The idea is to minimise the value of J. (I'm not going to talk about how to minimise it here, just how to calculate it from given values of θ0 and θ1.)

Training examples

x | y
--+---
2 | 1
2 | 4
5 | 4
5 | 8
9 | 8
9 | 11

Below I'll work out the cost function for various values of θ0 and θ1. The values are chosen purely as illustrations, not in any systematic way.

Working out the Cost Function



First, we'll take θ0 = 0 and θ1 = 0. This gives h(x) = 0, which we draw on the graph as the horizontal line y = 0 along the x-axis. The crosses are the points given by the training examples.

x | y  | h(x) = 0 | h(x) - y     | (h(x) - y)²
--+----+----------+--------------+------------
2 | 1  | 0        | 0 - 1 = -1   | 1
2 | 4  | 0        | 0 - 4 = -4   | 16
5 | 4  | 0        | 0 - 4 = -4   | 16
5 | 8  | 0        | 0 - 8 = -8   | 64
9 | 8  | 0        | 0 - 8 = -8   | 64
9 | 11 | 0        | 0 - 11 = -11 | 121
Sum of (h(x) - y)²: 282

Now, the cost function is J(θ0, θ1) = 1/(2m) Σ(h(x) - y)², where m is the number of values in the training set (here m = 6) and Σ(h(x) - y)² just means the sum of all the values in the squared column above. Putting the values in gives


J(0, 0) = 1/(2×6) × 282 = 282/12 = 23.5.
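As a quick sanity check, the calculation above can be reproduced in a few lines of Python. This is a sketch I've added (the function and variable names are my own, not from the course):

```python
# Training examples from the table above.
xs = [2, 2, 5, 5, 9, 9]
ys = [1, 4, 4, 8, 8, 11]

def cost(theta0, theta1):
    """J(theta0, theta1) = 1/(2m) * sum((h(x) - y)^2), where h(x) = theta0 + theta1*x."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

print(cost(0, 0))  # 23.5
```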


We want to minimise J, so let's try some other values of θ0 and θ1. Next, we'll try θ0 = 0 and θ1 = 1. This gives h(x) = x, which is the line y = x.

x | y  | h(x) = x | h(x) - y    | (h(x) - y)²
--+----+----------+-------------+------------
2 | 1  | 2        | 2 - 1 = 1   | 1
2 | 4  | 2        | 2 - 4 = -2  | 4
5 | 4  | 5        | 5 - 4 = 1   | 1
5 | 8  | 5        | 5 - 8 = -3  | 9
9 | 8  | 9        | 9 - 8 = 1   | 1
9 | 11 | 9        | 9 - 11 = -2 | 4
Sum of (h(x) - y)²: 20

So J(0, 1) = 1/(2×6) × 20 = 20/12 = 1.67 (to 2 decimal places). This is better than the answer we had for the previous values of θ0 and θ1, as we are trying to minimise the cost function.

Finally, we'll take θ0 = 1 and θ1 = 1. This gives h(x) = 1 + x, which is the line y = 1 + x on the graph.

x | y  | h(x) = 1 + x | h(x) - y     | (h(x) - y)²
--+----+--------------+--------------+------------
2 | 1  | 3            | 3 - 1 = 2    | 4
2 | 4  | 3            | 3 - 4 = -1   | 1
5 | 4  | 6            | 6 - 4 = 2    | 4
5 | 8  | 6            | 6 - 8 = -2   | 4
9 | 8  | 10           | 10 - 8 = 2   | 4
9 | 11 | 10           | 10 - 11 = -1 | 1
Sum of (h(x) - y)²: 18

So J(1, 1) = 1/(2×6) × 18 = 18/12 = 1.5, which again is an improvement.
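All three values can be computed and compared in one go. Again, this is a sketch I've added using the same training set; the names are my own:

```python
# Training examples from the tables above.
xs = [2, 2, 5, 5, 9, 9]
ys = [1, 4, 4, 8, 8, 11]

def cost(theta0, theta1):
    """J(theta0, theta1) = 1/(2m) * sum((h(x) - y)^2), where h(x) = theta0 + theta1*x."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

for t0, t1 in [(0, 0), (0, 1), (1, 1)]:
    print(f"J({t0}, {t1}) = {cost(t0, t1):.2f}")
# J(0, 0) = 23.50
# J(0, 1) = 1.67
# J(1, 1) = 1.50
```

Each pair of values gives a smaller J than the last, which is exactly the direction a minimisation method such as gradient descent would push the parameters.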

13 comments:

  1. Nice explanation! Please do more such posts!!

  2. This was awesome. Great help!

  3. Hi, is it possible to get J(θ0, θ1) = 0?
    Thanks,

    1. Yes. Imagine points (0,0), (1,1), and (2,2) fitted by the line f(x) = 0 + x (so θ0 = 0 and θ1 = 1, like in the second example above).

      h(x) - y would be 0 for every example, so the sum of the squares is 0. A perfect fit!

  4. Finally I got answers for my questions. Thanks

  5. What is the cost function in your training set when x = 2 and y = 1?

  7. what if the cost function = 0?
