Machine Learning Course SS 13 U Stuttgart

See my general teaching page for previous versions of this lecture.

Exploiting large-scale data is a central challenge of our time. Machine Learning is the core discipline to address this challenge, aiming to extract useful models and structure from data. Studying Machine Learning is motivated in multiple ways: 1) as the basis of commercial data mining (Google, Amazon, Picasa, etc.), 2) as a core methodological tool for data analysis in all sciences (vision, linguistics, software engineering, but also biology, physics, neuroscience, etc.), and finally, 3) as a core foundation of autonomous intelligent systems.

This lecture introduces modern methods in Machine Learning, including discriminative as well as probabilistic generative models. A preliminary outline of topics is:

  • motivation
  • probabilistic modeling and inference
  • regression and classification methods (kernel methods, Gaussian Processes, Bayesian kernel logistic regression, relations)
  • discriminative learning (logistic regression, Conditional Random Fields)
  • feature selection
  • boosting and ensemble learning
  • representation learning and embedding (kernel PCA and derivatives, deep learning)
  • graphical models
  • inference in graphical models (MCMC, message passing, variational)
  • learning in graphical models
Students should bring basic knowledge of linear algebra, probability theory and optimization.
  • This is the central website of the lecture. Links to slides, exercise sheets, announcements, etc. will all be posted here.
  • See the 01-introduction slides for further information.
Schedule, slides & exercises
date | topics | slides | exercises (due on 'date'+1)
08.04. | Introduction & Organization | 01-introduction (notation) |
15.04. | Regression: linear regression, non-linear features (polynomial, RBFs, piece-wise), regularization, cross validation, Ridge/Lasso, kernel trick | 02-regression | e01-intro
22.04. | Classification: classification, discriminative function, logistic regression, binary & multi-class case, conditional random fields | 03-classification | e02-linearRegression
29.04. | Classification (cont.) | | e03-classification
13.05. | Breadth of ML ideas | 04-ideas | e04-PCA-PLS
27.05. | Breadth of ML ideas (cont.) | | e05-WEKA-boosting
03.06. | SVMs (by Vien Ngo) | 05-vien-SVM | e06-SVM-NN
10.06. | Deep Learning; Probability basics | |
17.06. | Bayesian Regression & Classification | 07-BayesianRegressionClassification | e08-GaussianProcesses
24.06. | Graphical Models | 08-graphicalModels | e09-graphicalModels
01.07. | Inference in Graphical Models | 09-graphicalModels-Inference | e10-inference
08.07. | Learning with Graphical Models | 10-graphicalModels-Learning | e11-EM (../data/gauss.txt ../data/mixture.txt)
15.07. | Summary | 13-MachineLearning-script |
[1] The Elements of Statistical Learning: Data Mining, Inference, and Prediction by Trevor Hastie, Robert Tibshirani and Jerome Friedman. Springer, Second Edition, 2009.
full online version available
(recommended: read introductory chapter)

[2] Pattern Recognition and Machine Learning by Christopher M. Bishop. Springer, 2006.
(especially chapter 8, which is fully online)

[email by Stefan Otte:] This is a nice little (26 pages) linear algebra and matrix calculus reference. It's used for the ML class at Stanford. Maybe it's interesting for your ML class. link

[email by Stefan Otte:] Feature selection, l1 vs. l2 regularization, and rotational invariance Paper: link Comments: link

[email by Stefan Otte:]
I recently watched a very good Google Tech Talk on the topic of
ensembles. In the talk "The Counter-Intuitive Properties of
Ensembles for Machine Learning, or, Democracy Defeats Meritocracy",
W. Philip Kegelmeyer argues (put simply) that one should use
ensembles for supervised learning. This may be of interest to some
of the students.

Here are a few of my notes:
- Boosting: overfitting, sensitive to outliers.
- "Ensembles of experts": diversity of experts --> diversity in error
--> robustness/no overfitting
- "Out of Bag validation" (OOB) to determine ensemble size (vs.
learning the weights for the voting (which does not scale))
- unstable classifiers (e.g. decision trees) are a good fit for ensembles
- decision trees without pruning work well with ensembles. (pruning is
normally expensive!)
- "Ensembles of bozos": LOTS of bozos which train on tiny subsets (1%)
of the data
- traditional < experts < bozos
- training bozos is faster than training one traditional sage!

Best regards,
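
The out-of-bag (OOB) validation mentioned in the notes above can be sketched in a few lines of Python. This is a minimal illustration and not code from the talk or from the lecture: the 1-D toy data with 10% label noise, decision stumps as the unstable base classifier, the 5% subset size, and all function names (fit_stump, predict_stump, oob_error_curve) are assumptions made for this example. Each ensemble member is trained on a small random subset; every data point is then scored only by the members that never saw it, which gives a validation error for each ensemble size without a separate hold-out set.

```python
import random

def fit_stump(sample):
    """Fit a depth-1 tree on (x, y) pairs: predict `left` for x <= t, else 1 - left."""
    best = None
    for t, _ in sample:              # candidate thresholds are the sampled x values
        for left in (0, 1):          # try both label orientations
            errors = sum((left if x <= t else 1 - left) != y for x, y in sample)
            if best is None or errors < best[0]:
                best = (errors, t, left)
    _, t, left = best
    return t, left

def predict_stump(stump, x):
    t, left = stump
    return left if x <= t else 1 - left

def oob_error_curve(data, n_members, subset_size, rng):
    """Return the out-of-bag error after each added ensemble member."""
    n = len(data)
    votes = [[0, 0] for _ in range(n)]   # per point: OOB votes for class 0 and class 1
    curve = []
    for _ in range(n_members):
        bag = [rng.randrange(n) for _ in range(subset_size)]  # sample with replacement
        stump = fit_stump([data[i] for i in bag])
        in_bag = set(bag)
        for i, (x, _) in enumerate(data):
            if i not in in_bag:          # only members that never saw point i may vote
                votes[i][predict_stump(stump, x)] += 1
        wrong = counted = 0
        for (v0, v1), (_, y) in zip(votes, data):
            if v0 + v1 == 0:
                continue                 # point has no OOB vote yet
            counted += 1
            wrong += (1 if v1 > v0 else 0) != y   # ties are resolved as class 0
        curve.append(wrong / counted if counted else 1.0)
    return curve

# Toy data (an assumption for this sketch): label is 1 if x > 0.5, with 10% label noise.
rng = random.Random(0)
data = []
for _ in range(300):
    x = rng.random()
    y = int(x > 0.5)
    if rng.random() < 0.1:
        y = 1 - y
    data.append((x, y))

curve = oob_error_curve(data, n_members=50, subset_size=15, rng=rng)
for size in (1, 5, 10, 25, 50):
    print(f"{size:3d} members: OOB error = {curve[size - 1]:.3f}")
```

One plausible way to use the curve is to stop adding members once the OOB error stops improving, which avoids learning voting weights as noted above.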