Cogsci 109 Assignment 4
Due: Tues Nov 12, 2013 10:59AM
For this homework, you will want to look at the example code on the
course website (linregress.m, predyval.m, linregresslargerexample.m,
leaveoutcode.m, myfitdemo.m, myfitfun.m).
1. Linear Regression
For this question you will use the data files:
xdata
ydata
xdata holds the x-coordinates of a set of data and ydata holds
the corresponding y-coordinates. You might want to look at the MATLAB commands load
and hold on.
Put the following code in a script file called regression.m:
a) write the code to plot the 2-dimensional data as green stars. (use load and plot commands)
b) write the code to fit the best (in the least squares sense) linear
fit to the data. (use the \ command)
c) write the code to draw this on the graph of the data as a red line. (keep the green stars showing)
(Use the hold on and plot commands)
d) write the code to fit the best (in the least squares sense) quadratic
(second order) fit to the data.
e) write the code to draw this on the graph of the data as a blue line.
(keep the red line and the green stars as well.)
f) write the code to fit the best fifth order fit to the data and draw
this on the graph of the data as a black curve (again keep the
previous curves and the data). Store this final figure (with green stars and red/blue and black lines/curves) as hw4q1.jpg.
g) Which of the three lines/curves looks
like the best fit to you? Put this as a comment in the code: "I think
that the _________ (fill in linear, quadratic or fifth-order) model is
a better fit to the data."
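As a rough illustration of the backslash approach in parts b) through e), here is a minimal sketch on synthetic data. It does not use the course xdata/ydata files, and the variable names (A1, A2, p1, p2, xs) are assumptions made for this example only. Each column of the design matrix corresponds to one parameter of the fit.

```matlab
% Synthetic stand-in data (NOT the course xdata/ydata files).
x = (0:0.5:5)';                 % column vector of x-coordinates
y = 2*x.^2 - 3*x + 1;           % exactly quadratic, so the fit is easy to check

% Linear fit y = m*x + b: one design-matrix column per parameter.
A1 = [x, ones(size(x))];
p1 = A1 \ y;                    % least-squares solution [m; b]

% Quadratic fit y = m*x.^2 + n*x + b.
A2 = [x.^2, x, ones(size(x))];
p2 = A2 \ y;                    % least-squares solution [m; n; b]

% Draw the data and both fits on the same axes.
xs = linspace(min(x), max(x), 200)';
plot(x, y, 'g*');
hold on
plot(xs, [xs, ones(size(xs))] * p1, 'r-');
plot(xs, [xs.^2, xs, ones(size(xs))] * p2, 'b-');
hold off
```

A higher-order fit follows the same pattern with more columns (x.^5, x.^4, and so on) in the design matrix.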
2. Overfitting Calculation
This question builds on the one above and uses the same data.
In this question we will try to answer the question
"Is the data better fit by a 2nd order fit or a 5th order fit?"
You may use the code in hw4q2.m.
You will need to write the code for the function predictyval.m, which takes
two inputs: xleftout, the x value that you want a predicted y value for, and a
parameter vector (e.g. mnpqrb from lecture) that supplies the parameters of the fit.
The function must output the predicted y value at the x value xleftout.
Note that you can write one function predictyval.m that handles
both the quadratic and fifth order fits, but if you prefer you can write
two separate functions.
Which fit gives a lower root mean square "leave one out error"?
Make sure that the numbers agree with your intuition from the graphs.
Put this in a comment in your code predictyval.m with the following
words (with the blanks filled in with either "The quadratic fit" or
"The fifth order fit"):
_______ Gives a lower root mean
square leave-one-out error. Therefore I think that ____________ will
generalize better to unseen data from the same distribution.
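The leave-one-out idea can be sketched as follows. This is only an illustration on synthetic data, not the contents of hw4q2.m or leaveoutcode.m; the helper design and the variables orders, rmsErr and sqErr are hypothetical names introduced here. Each point is removed in turn, the model is fit to the remaining points, and the squared prediction error at the removed point is recorded.

```matlab
% Synthetic stand-in data (NOT the course xdata/ydata files).
x = (0:0.5:5)';
y = 2*x.^2 - 3*x + 1 + 0.3*sin(7*x);   % quadratic trend plus a small deterministic wiggle

% Hypothetical helper: design matrix for a polynomial of the given order,
% columns ordered from the highest power down to the constant term.
design = @(xv, order) bsxfun(@power, xv(:), order:-1:0);

orders = [2 5];
rmsErr = zeros(size(orders));
for k = 1:numel(orders)
    sqErr = zeros(size(x));
    for i = 1:numel(x)
        keep = true(size(x));
        keep(i) = false;                              % leave point i out
        p = design(x(keep), orders(k)) \ y(keep);     % fit on the remaining points
        yPred = design(x(i), orders(k)) * p;          % predict the left-out point
        sqErr(i) = (yPred - y(i))^2;
    end
    rmsErr(k) = sqrt(mean(sqErr));                    % root mean square error
end
fprintf('order %d: RMS leave-one-out error %.4f\n', [orders; rmsErr]);
```

The model with the smaller value in rmsErr is the one that predicted the held-out points more accurately on this particular data set.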
3. Nelder-Mead algorithm written homework -- Please bring this to class with you on Nov 12
Consider the contour plot below (where contours are drawn at even
spacing of the Error function). Assume that the contour plot is a
good representative of the error surface (that the surface varies
smoothly between the contours) and that the highest contour is the
outside one.
a) Draw the next position of the simplex after one step of
the Nelder-Mead algorithm (assume an alpha of 1).
b) If the new point had been the new best point, how would the simplex change?
c) If the new point had been worse than the remaining two, how would the
simplex change?
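For reference, the basic reflection step of the Nelder-Mead algorithm can be sketched numerically. The error function and the starting simplex below are hypothetical and unrelated to the contour plot in this question; only the reflection step is shown, and the follow-up moves (expansion, contraction, shrink) are omitted.

```matlab
% Hypothetical error function and starting simplex, for illustration only.
errFun = @(p) (p(1) - 3)^2 + (p(2) - 2)^2;
simplex = [0 0; 1 0; 0 1];          % one vertex per row
alpha = 1;                          % reflection coefficient

errs = zeros(3, 1);
for i = 1:3
    errs(i) = errFun(simplex(i, :));
end
[~, order] = sort(errs);            % ascending: best vertex first, worst last
worst = simplex(order(end), :);
centroid = mean(simplex(order(1:end-1), :), 1);   % centroid of all vertices except the worst

% Reflect the worst vertex through the centroid of the others.
reflected = centroid + alpha * (centroid - worst);
```

Here the worst vertex is (0, 0), the centroid of the other two is (0.5, 0.5), and the reflected point is (1, 1).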
4. Function fitting with non-linear parameters in MATLAB
In this program you will use MATLAB's fminsearch routine to fit
the function y = ax + b sin(cx) + d to the data above (xdata, ydata). Note that
for full credit, you must do this the efficient way, where you use
the Nelder-Mead algorithm to search only for the one non-linear parameter (as
in the myfitdemo/myfitfun example in the class notes, NOT as in
myfitdemoslow.m/myfitfunslow.m).
You will use the programs hw4fitdemo.m and hw4fitfun.m,
but you will have to add lines to hw4fitfun.m to do all the work.
Save the figure of the final fit in hw4q4.jpg.
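The efficient approach relies on the fact that, once c is fixed, the model is linear in a, b and d, so those three can be found with backslash while fminsearch searches over c alone. The sketch below shows that idea on synthetic data generated from known parameters. It is not the code in myfitfun.m or hw4fitfun.m, and the names design, linpar, sse, cStart, cBest and abd are assumptions made for this example.

```matlab
% Synthetic stand-in data generated from known parameters (NOT the course data).
x = linspace(0, 10, 50)';
y = 0.5*x + 2*sin(1.5*x) + 1;

% For a fixed c the model is linear in a, b, d, so they come from backslash.
design = @(c) [x, sin(c*x), ones(size(x))];
linpar = @(c) design(c) \ y;                         % [a; b; d] for this c
sse    = @(c) sum((design(c) * linpar(c) - y).^2);   % squared error as a function of c alone

cStart = 1.4;                      % hypothetical starting guess near the true value
cBest  = fminsearch(sse, cStart);  % Nelder-Mead search over the single parameter c
abd    = linpar(cBest);            % linear parameters at the best c
```

The result depends on the starting guess: the error as a function of c can have several local minima, so a poor cStart may lead to a different fit.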
5. Checking Generalization
You will now do the same leave-one-out computations you did in
question 2 above to estimate the future error rate with this type of
model fit to the data.
You will use the program hw4q5.m.
You will need to add two lines (indicated in the code):
%%%%%%%% write next line to compute yp
%%%%%%%%% write next line to compute yleftout (you can write it in one
%%%%%%%%% line of code (you don't need to write a predictyval function
You will also need to comment out the incremental plotting parts in
hw4fitfun.m (as mentioned in that file).
Save your figure of the overlaid different fits (for leaving different points out) as hw4q5.jpg.
How does this model (y = ax + b sin(cx) + d) compare to the polynomial models from
Question 2? Write your answer in a comment in your hw4q5.m code.
6. K-means written homework -- Please bring this to class with you on Nov 12
Consider the example below. Cluster centers (or means) are depicted as X's, data points are dots. The box on the top (with 1 in the corner) represents a
system with 8 data points and K=2 cluster centers. To help you answer the question, the dividing line midway between the two centers is shown by a dashed line.
a) In the box in the center (with 2 in the right corner), draw the next position of the means (in the next iteration) and
b) in the box on the bottom (with 3 in the corner), draw the next position after that (in the next iteration) of the means.
You may want to continue drawing dashed lines to keep track of how the space is being divided (and show us your work).
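The two steps that make up one K-means iteration (assign every point to its nearest center, then move each center to the mean of its assigned points) can be sketched as below. The points and starting centers are hypothetical and are not the ones in the figure for this question; the variable names are assumptions for this example.

```matlab
% Hypothetical 2-D points and starting centers (NOT the points in the figure).
pts = [0 0; 0 1; 1 0; 1 1; 5 5; 5 6; 6 5; 6 6];
centers = [0 0; 1 1];

for iter = 1:2
    % Assignment step: squared distance from every point to every center.
    d2 = zeros(size(pts, 1), size(centers, 1));
    for k = 1:size(centers, 1)
        diffs = bsxfun(@minus, pts, centers(k, :));
        d2(:, k) = sum(diffs.^2, 2);
    end
    [~, label] = min(d2, [], 2);       % index of the nearest center for each point

    % Update step: move each center to the mean of its assigned points.
    for k = 1:size(centers, 1)
        if any(label == k)             % leave a center in place if it has no points
            centers(k, :) = mean(pts(label == k, :), 1);
        end
    end
end
```

With these points, the centers are at (0.5, 0.5) and (5.5, 5.5) after the second pass.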
[Figure: three boxes numbered 1 (Initial State), 2, and 3]
c) In general (not just for the example above), when will the K-means algorithm stop? (What condition must be fulfilled? Be as specific as you can.) You do not have to write it as an equation, but you may.
What to Hand In
Hand in your script files regression.m, predictyval.m, hw4q2.m (which
you may have modified), hw4fitdemo.m, hw4fitfun.m, and hw4q5.m, as well as
hw4q1.jpg, hw4q4.jpg, and hw4q5.jpg, on TED.
Hand in your written homework at the start of class Tuesday, November 12th.