Homeworks (40%): Both math and programming problems
Miniexams (15%): Two during the semester
Midterm Exam (15%): Linear and Logistic Regression, Naive Bayes, SVMs (subject to change)
Final Exam (25%): Everything else (Nearest Neighbors, Neural Networks, Decision Trees, Boosting, Clustering, PCA, etc.), plus pre-midterm topics (subject to change)
Gradescope Quizzes (5%): Short multiple-choice quizzes given in randomly chosen (possibly all) lectures to encourage attendance and attention. Only the best 10 quiz scores count.
Lecture 2: MLE & MAP
Maximum Likelihood Estimation (MLE)
Goal: Find the parameter θ that maximizes the likelihood of the observed data D.
$$P(D \mid \theta) = \prod_{n=1}^{N} P(x_n \mid \theta)$$
Question: Given this model and the observed data, what is the most likely value of θ?
$$\hat{\theta}_{\mathrm{MLE}} = \arg\max_{\theta} P(D \mid \theta)$$
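As an illustration of the argmax above, here is a minimal sketch for a Bernoulli (coin-flip) model, which is an assumed example and not taken from the lecture. The data and the function names `bernoulli_log_likelihood` and `mle_grid_search` are hypothetical. The sketch maximizes the log-likelihood over a grid and compares the result with the closed-form Bernoulli MLE (the sample mean).

```python
import math

def bernoulli_log_likelihood(theta, data):
    """log P(D | theta) for i.i.d. Bernoulli observations (each 0 or 1)."""
    heads = sum(data)
    tails = len(data) - heads
    return heads * math.log(theta) + tails * math.log(1.0 - theta)

def mle_grid_search(data, num_points=999):
    """Approximate argmax_theta P(D | theta) on an evenly spaced grid in (0, 1)."""
    grid = [(i + 1) / (num_points + 1) for i in range(num_points)]
    return max(grid, key=lambda t: bernoulli_log_likelihood(t, data))

# Hypothetical coin flips: 7 heads, 3 tails.
data = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]

theta_closed_form = sum(data) / len(data)  # closed-form Bernoulli MLE
theta_grid = mle_grid_search(data)         # numerical maximizer on the grid
print(theta_closed_form, theta_grid)
```

Maximizing the log-likelihood gives the same maximizer as maximizing the likelihood because the logarithm is monotonic, and it avoids numerical underflow from multiplying many small probabilities.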
Bayesian Learning
Question: How to incorporate prior knowledge?
Bayes’s Rule:
$$P(\theta \mid D) = \frac{P(D \mid \theta)\, P(\theta)}{P(D)}$$
where P(θ) is the prior distribution over θ, P(D∣θ) is the likelihood of the data given θ, and P(D) is the marginal likelihood of the data, which is independent of θ. Therefore,
P(θ∣D)∝P(D∣θ)P(θ)
Maximum A Posteriori Estimation (MAP)
$$\hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta} P(\theta \mid D) = \arg\max_{\theta} P(D \mid \theta)\, P(\theta)$$
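To make the effect of the prior concrete, here is a small sketch under assumptions not stated in the lecture: a Bernoulli likelihood with a Beta(alpha, beta) prior. In that case the posterior is proportional to θ^(heads + alpha − 1) (1 − θ)^(tails + beta − 1), and its mode has a closed form. The function name `bernoulli_map_beta_prior` and the data are hypothetical.

```python
def bernoulli_map_beta_prior(data, alpha, beta):
    """MAP estimate of theta for Bernoulli data under an assumed Beta(alpha, beta) prior.

    The posterior mode is (heads + alpha - 1) / (N + alpha + beta - 2).
    This formula assumes heads + alpha > 1 and tails + beta > 1,
    so that the mode lies strictly inside (0, 1).
    """
    heads = sum(data)
    n = len(data)
    return (heads + alpha - 1) / (n + alpha + beta - 2)

# Hypothetical coin flips: 7 heads, 3 tails.
data = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]

theta_map = bernoulli_map_beta_prior(data, alpha=2, beta=2)      # prior favors 0.5
theta_uniform = bernoulli_map_beta_prior(data, alpha=1, beta=1)  # flat prior
print(theta_map, theta_uniform)
```

With the flat Beta(1, 1) prior the MAP estimate coincides with the MLE (0.7 here), while the Beta(2, 2) prior pulls the estimate toward 0.5.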
Lecture 3-4: Linear Regression
Singular Value Decomposition (SVD):
A=UΣV⊤
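$U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ are orthogonal matrices.
$\Sigma \in \mathbb{R}^{m \times n}$ is a diagonal matrix with singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$ on the diagonal, where $r$ is the rank of $A$.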
The squared singular values $\sigma_i^2$ are the nonzero eigenvalues of $A^\top A$ and $AA^\top$.
The columns of $V$ are the right singular vectors of $A$, which are also eigenvectors of $A^\top A$.
The columns of $U$ are the left singular vectors of $A$, which are also eigenvectors of $AA^\top$.
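The properties listed above can be checked numerically. The following is a minimal sketch using NumPy's `np.linalg.svd` on a hypothetical random 4×3 matrix; the matrix and the variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))  # hypothetical example with m = 4, n = 3

# full_matrices=True returns U (m x m), the singular values s, and V^T (n x n).
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Build the m x n matrix Sigma with the singular values on its diagonal.
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)
reconstruction_ok = np.allclose(A, U @ Sigma @ Vt)

# Squared singular values match the eigenvalues of A^T A (sorted descending).
eigvals = np.linalg.eigvalsh(A.T @ A)[::-1]
eigvals_ok = np.allclose(eigvals, s**2)

# Each column v_i of V satisfies (A^T A) v_i = sigma_i^2 v_i.
V = Vt.T
eigvecs_ok = np.allclose(A.T @ A @ V, V * s**2)

print(reconstruction_ok, eigvals_ok, eigvecs_ok)
```

Note that `np.linalg.svd` returns V transposed, so the right singular vectors are the rows of `Vt`.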
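$$
\begin{aligned}
\mathrm{RSS}(w) &= \sum_{n=1}^{N} (y_n - w^\top x_n)^2 \\
&= \sum_{n} (y_n - w^\top x_n)(y_n - x_n^\top w) \\
&= \sum_{n} \left( -2 y_n w^\top x_n + w^\top x_n x_n^\top w \right) + \text{const} \\
&= w^\top (X^\top X) w - 2 (X^\top y)^\top w + \text{const}
\end{aligned}
$$
Set the gradient to zero:
$$\nabla_w \mathrm{RSS} = 2 (X^\top X) w - 2 (X^\top y) = 0$$
Least Mean Squares solution (assuming $X^\top X$ is invertible):
$$w_{\mathrm{LMS}} = (X^\top X)^{-1} (X^\top y)$$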
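The closed-form solution can be sketched in NumPy as follows. The synthetic data, `w_true`, and the function name `least_squares_normal_equations` are hypothetical, and the sketch assumes X⊤X is invertible. It solves the linear system (X⊤X) w = X⊤y instead of forming the inverse explicitly, which is generally the more numerically stable choice.

```python
import numpy as np

def least_squares_normal_equations(X, y):
    """Solve (X^T X) w = X^T y for w; assumes X^T X is invertible."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Hypothetical synthetic data: N = 50 samples, 3 features, small Gaussian noise.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.01 * rng.standard_normal(50)

w_hat = least_squares_normal_equations(X, y)

# Cross-check against NumPy's built-in least-squares solver.
w_ref, *_ = np.linalg.lstsq(X, y, rcond=None)

# The gradient 2 (X^T X) w - 2 (X^T y) should vanish at the solution.
gradient = 2 * (X.T @ X) @ w_hat - 2 * (X.T @ y)
print(w_hat, w_ref, gradient)
```

When X⊤X is singular or badly conditioned, `np.linalg.lstsq` is usually the safer option because it does not rely on X⊤X being invertible.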
Probabilistic Interpretation of Linear Regression: