Introduction to Machine Learning

Homework #2

Due Thursday March 1st at 11:30am

Perceptrons

Suppose you want to train a two-input perceptron to perfectly classify the following dataset (the class labels are either +1 or -1):

2, 6, -1
1, 3, +1
3, 9, +1

Prove that the perceptron cannot learn this task, using inequalities expressed in terms of the weights w0, w1, and w2. That is, for each instance, write an inequality that must be true for the perceptron to classify it correctly. Then show that they cannot all be true at the same time.

Perceptrons

How much information does it take to describe a two-input perceptron? The standard way uses a vector of three real weights: [w0, w1, w2]. Because the decision surface is a line, you might think that you could represent a two-input perceptron with just two reals, a slope and intercept, like this: [w0/w2, w1/w2, 1]. What's wrong with this idea? That is, what information is lost with this representation?

Logistic Regression

The MNIST dataset is a well-studied collection of handwritten digits. It is often used to test multi-class classification algorithms, where there is one class for each of the 10 digits (0 - 9).

I've made two files available for you:

The raw MNIST data, which is a text file containing 10,000 rows. Each row contains 28 * 28 = 784 integers in the range 0 to 255. Each integer is the pixel value from a 28 x 28 image of a handwritten digit. Every row corresponds to a vector in the dataset that is to be clustered.
The labels for the raw data are in a file with 10,000 rows. The first row contains the correct digit label for the first row in the raw data. The second row is the label for the second instance, and so on.

Implement 2-class logistic regression using gradient descent as outlined in the lecture notes. You can do either batch or stochastic versions of the algorithm. You will only use your algorithm for this dataset, so you can hard-wire in the number of instances and the size of each instance. The goal is not to write a generic version of the algorithm (though you can if you wish). The goal is to understand how it works on real data.

Use your algorithm to learn a classifier that determines whether an input image is an 8 or one of the other digits and record the classification accuracy on the training set (the full dataset I provided). Note that you'll have to come up with some stopping criterion, which could be to simply run for a fixed number of iterations and then quit.

After training is complete, create a 28x28 image of the learned weights. The largest weight (most positive) should map to black, the smallest weight (most negative) to white, and the other weights should linearly interpolate between those extremes.

Submit the following:

Your code
The training set accuracy on the 8-vs-others classification problem
The weight image along with a brief paragraph of what the image tells you about what was learned.