Introduction to Machine Learning
Homework #2
Due Thursday March 1st at 11:30am
Perceptrons
Suppose you want to train a two-input perceptron to perfectly classify
the following dataset (the class labels are either +1 or -1):
2, 6, -1
1, 3, +1
3, 9, +1
Prove that the perceptron cannot learn this task, using inequalities
expressed in terms of the weights w0, w1, and w2. That is, for each
instance, write an inequality that must be true for the perceptron to
classify it correctly. Then show that they cannot all be true at the
same time.
Perceptrons
How much information does it take to describe a two-input perceptron?
The standard way uses a vector of three real weights: [w0, w1, w2].
Because the decision surface is a line, you might think that you could
represent a two-input perceptron with just two reals, a slope and
intercept, like this: [w0/w2, w1/w2, 1]. What's wrong with this idea?
That is, what information is lost with this representation?
Logistic Regression
The MNIST dataset is a
well-studied collection of handwritten digits. It is often used to
test multi-class classification algorithms, where there is one class
for each of the 10 digits (0 - 9).
I've made two files available for you:
- The raw MNIST data, which is a text file
containing 10,000 rows. Each row contains 28 * 28 = 784 integers in
the range 0 to 255.
Each integer is the pixel value from a 28 x 28 image of a
handwritten digit. Every row corresponds to a vector in the dataset
that is to be clustered.
- The labels for the raw data are in a file with
10,000 rows. The first row contains the correct digit label for the
first row in the raw data. The second row is the label for the
second instance, and so on.
Implement 2-class logistic regression using gradient descent as
outlined in the lecture notes. You can do either batch or stochastic
versions of the algorithm. You will only use your algorithm for this
dataset, so you can hard-wire in the number of instances and the size
of each instance. The goal is not to write a generic version of the
algorithm (though you can if you wish). The goal is to understand how
it works on real data.
Use your algorithm to learn a classifier that determines whether an
input image is an 8 or one of the other digits and record the
classification accuracy on the training set (the full dataset I
provided). Note that you'll have to come up with some stopping
criterion, which could be to simply run for a fixed number of
iterations and then quit.
After training is complete, create a 28x28 image of the learned
weights. The largest weight (most positive) should map to black, the
smallest weight (most negative) to white, and the other weights should
linearly interpolate between those extremes.
Submit the following:
- Your code
- The training set accuracy on the 8-vs-others classification
problem
- The weight image along with a brief paragraph of what the image
tells you about what was learned.