Introduction to Machine Learning

CMSC 478

Spring 2018

Contact Information

Instructor: Tim Oates, ITE-336, oates@umbc.edu, x5-3082, Office hours: TBD

TA: Akshay Subramanya, ITE-334H, akshayv1@umbc.edu, Office hours: Mon/Fri 11:00am - 12:30pm

Modes of communication: I try to be responsive via email and prefer that over phone calls. If I'm in my office outside of office hours meeting with someone, please come back later. If I'm in my office with the door open and I'm alone, feel free to ask if I have time to talk then.

We'll also use Slack for discussions. I'll create channels for various topics, but I find that most people conduct most conversations in the "general" channel. Because the discussions can get quite long, if you want a response from me or the TA, tag our usernames in your post so that we're sure to see it.

Course Mechanics

Grades will be based on a midterm exam, a final exam, a project, and five homework assignments. The homeworks are crucial for solidifying what you learn in class. The final will focus on the material not covered on the midterm, but may ask you to integrate what you've learned during the entire semester.

The weights on the various items are as follows:

Grades will be on a standard 10-point scale based on your total points for the course as follows:

Total Points (P)    Grade
P >= 90             A
90 > P >= 80        B
80 > P >= 70        C
70 > P >= 60        D

There may be an upward curve (i.e., some number of additional points given to everyone in the class), but there may not be. If you want a specific grade, earn the appropriate number of points from the table above.

Late policy: All assignments (homeworks and the various components of the course project) must be turned in at the beginning of class on the date that they are due. I understand that students have many demands on their time that vary in intensity over the course of the semester. Therefore, you will be allowed 3 late days without penalty for the entire semester. You can turn in 3 different assignments one day late each, or one assignment 3 days late, and so on.

Note that your first late day begins as soon as I start class on the day something is due. Your second late day begins at 11:31am on the following day, and so on.

Once these late days are used, a penalty of 33% will be imposed for each day (or fraction thereof) an assignment is late (33% for one day, 66% for two, 100% for three or more). Late days cannot be used for exams, but they can be used for anything else with a due date.

Hard copy is required for all assignments. If you are not on campus for the start of class (what?!?!) and want to email your assignment to me or the TA to establish that it was done on time, that's OK. However, you must hand in hard copy before the assignment will be graded.

Project: The project is meant to give students deeper exposure to a topic in machine learning than they would get from the lectures, readings, and discussions alone. The most successful projects often blend the student's own research with machine learning, e.g., by applying machine learning techniques to a problem in another area, or by bringing an insight from another area to a problem in machine learning. However, projects need not be tied to ongoing research. Many good projects in the past have applied existing algorithms to a domain/dataset of interest to the student, such as Texas Hold'em, the stock market, sporting events, and so on. Students can come up with their own project ideas, or they can come see me and we'll brainstorm ideas together.

Projects may be done by individuals or teams of two people. However, teams of two will be expected to do significantly more work than is expected of an individual project. More information on projects can be found here.

Textbook

It's hard to find a good text for an introductory course in machine learning. It turns out that Hal Daume down at UMCP is writing what looks like a very good text. The downside is that it is not complete, but the parts that we will cover are in good shape. The upside is that it's free! You can get it here: CIML.

There are a few other online texts that we'll draw from. You'll find links to them below in the schedule of topics, with specific sections that you should read.

Tools

You can use any programming language and any toolset for homeworks and your project, but Python has (almost) become the default language for machine learning at scale. Therefore, all of the in-class examples where we run an actual algorithm will use scikit-learn. A very easy way to get everything you may need is to install anaconda, which includes Python, scikit-learn, and Jupyter notebooks for working with data and presenting results.
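If you have not used scikit-learn before, here is a minimal sketch (my own illustrative example, not part of any assignment) of the kind of workflow we'll use in class: load a built-in dataset, fit a decision tree, and check accuracy on held-out data. The dataset and parameter choices here are arbitrary.

    # Minimal scikit-learn example: fit a small decision tree on the built-in
    # iris dataset and report accuracy on a held-out test split. Runs in a
    # Jupyter notebook or a plain Python script once anaconda is installed.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import accuracy_score

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0)

    clf = DecisionTreeClassifier(max_depth=3)  # shallow tree to limit over-fitting
    clf.fit(X_train, y_train)
    print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))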

Syllabus

This syllabus is subject to small changes, but due dates and exam dates will not change.


Class | Date | Topic | Events/Readings
1 | Tue 30 Jan | Course overview, what is machine learning? Tools, decision trees | Chapter 1 of CIML
2 | Thu 1 Feb | Decision trees, over-/under-fitting, geometry | Book chapter on DTs (thanks to Tom Mitchell)
3 | Tue 6 Feb | k-nearest neighbor, k-means clustering | Chapter 3 of CIML
4 | Thu 8 Feb | Perceptron | Chapter 4 of CIML; Homework 1 assigned
5 | Tue 13 Feb | Perceptron, logistic regression | Slides, thanks to Tom Dietterich
6 | Thu 15 Feb | Logistic regression, maximum entropy models |
7 | Tue 20 Feb | Neural networks, backpropagation | Chapter 10 of CIML; Homework 1 due; Homework 2 assigned
8 | Thu 22 Feb | Neural networks, improved training, regularization |
9 | Tue 27 Feb | Deep learning, tools | Deep learning book Chapter 14 through 14.3
10 | Thu 1 Mar | Deep learning | Deep learning book Chapter 9 through 9.3; Homework 2 due
11 | Tue 6 Mar | Deep learning for language and vision |
12 | Thu 8 Mar | Reinforcement learning | Parts of Chapters 3, 4, 5, and 6 of the online RL book; Project proposal due
13 | Tue 13 Mar | Reinforcement learning | RL Problem; Dynamic Programming; Temporal Difference
14 | Thu 15 Mar | Reinforcement learning |
  | Tue 20 Mar | Spring Break |
  | Thu 22 Mar | Spring Break |
15 | Tue 27 Mar | Probabilistic learning, generative and discriminative models | Chapter 9 of CIML; Homework 3 assigned
16 | Thu 29 Mar | Probabilistic learning, naive Bayes, graphical models |
17 | Tue 3 Apr | Midterm Exam on content of classes 1 - 14 |
18 | Thu 5 Apr | Probabilistic learning, Bayes nets | Bayes nets slides
19 | Tue 10 Apr | Probabilistic learning, latent Dirichlet allocation | LDA slides; LDA summary by David Blei
20 | Thu 12 Apr | Buffer day | Homework 3 due
21 | Tue 17 Apr | Linear methods | Chapter 7 of CIML; Homework 4 assigned
22 | Thu 19 Apr | Linear methods | Project mid-term report due
23 | Tue 24 Apr | Kernel methods | Chapter 11 of CIML; SVM slides
24 | Thu 26 Apr | Kernel methods |
25 | Tue 1 May | Ensemble learning | Chapter 13 of CIML; Bias/Variance slides; Boosting slides; Homework 4 due; Homework 5 assigned
26 | Thu 3 May | Ensemble learning |
27 | Tue 8 May | Dimensionality reduction - LDA, PCA, ICA | Chapter 4 Section 3 of ESL for LDA; Chapter 15 Section 2 of CIML for PCA; LDA and PCA slides
28 | Thu 10 May | Dimensionality reduction |
29 | Tue 15 May | Final exam review | Homework 5 due
  | Thu 17 May | Final Exam, 10:30AM - 12:30PM |
  | Thu 24 May | Final project writeup due |