# Hauptseminar Machine Learning (WS 13/14)

Exploiting large-scale data is a central challenge of our time. Machine Learning is the core discipline to address this challenge, aiming to extract useful models and structure from data. Studying Machine Learning is motivated in multiple ways: 1) as the basis of commercial data mining (Google, Amazon, Picasa, etc.), 2) as a core methodological tool for data analysis in all sciences (vision, linguistics, software engineering, but also biology, physics, neuroscience, etc.), and finally 3) as a core foundation of autonomous intelligent systems. In this seminar, students will present seminal papers from the area of Machine Learning. A background in Machine Learning, e.g. from the Machine Learning course, is required.

This advanced seminar will be held entirely in English. INFOTEC, cybernetics and other master's students are welcome.

Participants have to give a presentation and write a summary paper.

#### Presentation

• 20 min presentation of the paper
• 10 min Q&A
• The other students should be able to grasp the paper afterwards!
• The other students will give you feedback.
• DATES: 15th, 22nd and 29th of January 2014 (check table below).

#### Summary paper

• Do not plagiarize! Writing a summary paper means that you describe, in your own words, the paper’s motivation, contributions, limitations and relations to other work. When referring to the authors’ work, say “the authors propose…” or “they developed…”.
• Summary papers must be written in the style of ICML (Int. Conf. on Machine Learning) using their style files (preferably LaTeX). Find these style files online.
• The bibliography should follow scientific standards, preferably using BibTeX as described in the ICML style.
• Total of ~3500 words with the following content:
1. Motivation and problem: What was the authors’ motivation for this research? What problem are they trying to solve?
2. State of the art and contributions: What was the state of the art BEFORE this paper, and what do the authors aim and claim to contribute to it with this work?
3. Summarize the methods, techniques, theory, algorithms, etc. that they develop.
4. Summarize their evaluation results.
5. Research and discuss the impact that this paper had on later research (e.g. use Google Scholar to find citations of this paper).
6. Add a personal assessment of the paper including critique and suggestions for improvements.
• DEADLINE: 26th of February 2014.
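
The formatting requirements above can be met with a skeleton along the following lines. This is only a sketch under stated assumptions: the style file name `icml2014.sty`, the `\icml...` title macros and the bibliography file `seminar.bib` are placeholders that may differ from the style package you download, so check the example paper shipped with it. The citation key `li2011knows` is taken from the paper list on this page.

```latex
% Minimal summary-paper skeleton (a sketch, not an official template).
\documentclass{article}

% ASSUMPTION: the downloaded ICML style package provides icml2014.sty;
% substitute the file name of the version you actually use.
\usepackage{icml2014}
\usepackage{natbib} % author-year citations via \citep and \citet

% ASSUMPTION: title macros as in the ICML example paper; verify them there.
\icmltitlerunning{Summary: Knows What It Knows}

\begin{document}

\twocolumn[
\icmltitle{Summary of ``Knows What It Knows''}
\icmlauthor{Your Name}{your.address@example.org}
\icmladdress{Your institute}
\vskip 0.3in
]

\section{Motivation and Problem}
% Use your own words and attribute clearly, e.g. ``the authors propose''.
\citet{li2011knows} propose a framework for self-aware learning.

\section{State of the Art and Contributions}
\section{Methods}
\section{Evaluation Results}
\section{Impact on Later Research}
\section{Personal Assessment}

% ASSUMPTION: seminar.bib is a hypothetical file holding your BibTeX entries.
\bibliography{seminar}
\bibliographystyle{icml2014}

\end{document}
```

The six sections mirror the six content items listed above, so a reader can check the summary against the requirements at a glance.
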

| Date | Speaker | Selected Paper |
| --- | --- | --- |
| 2014.01.15 | Vincke J. | “A View of the EM algorithm that justifies incremental, sparse and other variants” |
| 2014.01.15 | Scheuefele K. | “Active Learning for Parameter Estimation in Bayesian Networks” |
| 2014.01.15 | Mehlbeer F. | “Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data” |
| 2014.01.15 | Fuchs S. | “Active Learning with Statistical Models” |
| 2014.01.22 | Hirschmann S. | “Discovering Hidden Variables: A Structure-Based Approach” |
| 2014.01.22 | Fleischer L. | “Identifying Hierarchical Structure in Sequences: A Linear-Time Algorithm” |
| 2014.01.22 | Hamann M. | “Graphical Models: Structure Learning” |
| 2014.01.29 | Fontanarosa R. | “The Infinite Hidden Markov Model” |
| 2014.01.29 | Rupp T. | “Knows what it knows: A Framework for Self-Aware Learning” |
| 2014.01.29 | Ziegenhagel A. | “Support Vector Machine Learning for Interdependent and Structured Output Spaces” |

#### Papers

NOTE: Some of these papers are rather long journal versions; others are much shorter conference papers. For the long papers we have now also provided the corresponding conference papers, which you can take as the basis for your report. (The exception is the seminal historical papers from the 1970s and 1980s, for which shorter versions typically do not exist. But they’re easier to read anyway.)

J. Weston, F. Ratle, and R. Collobert: Deep learning via semi-supervised embedding. In Proc. of the 25th int. conf. on machine learning (ICML 2008), 2008. [Bibtex]

@InProceedings{ weston:08,
author  = "J. Weston and F. Ratle and R. Collobert",
title = "Deep Learning via Semi-Supervised Embedding",
booktitle  = "Proc.\ of the 25th Int.\ Conf.\ on Machine Learning (ICML
2008)",
year = "2008",
}
I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun: Support vector machine learning for interdependent and structured output spaces. In Proceedings of the twenty-first international conference on machine learning, 104–, ACM, 2004. [Bibtex]

@inproceedings{Tsochantaridis:2004:SVM:1015330.1015341,
author = {Tsochantaridis, Ioannis and Hofmann, Thomas and Joachims, Thorsten and Altun, Yasemin},
title = {Support vector machine learning for interdependent and structured output spaces},
booktitle = {Proceedings of the twenty-first international conference on Machine learning},
series = {ICML '04},
year = {2004},
isbn = {1-58113-838-5},
pages = {104--},
url = {http://doi.acm.org/10.1145/1015330.1015341},
doi = {10.1145/1015330.1015341},
acmid = {1015341},
publisher = {ACM},
address = {New York, NY, USA},
pdf = {http://dl.acm.org/ft_gateway.cfm?id=1015341&ftid=273123&dwn=1&CFID=370986592&CFTOKEN=39715685}
}
I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun: (long version) large margin methods for structured and interdependent output variables. Journal of machine learning research, 6, 1453-1484, MIT Press, 2005. [Bibtex]

@Article{ tsochantaridis:05,
author  = "I. Tsochantaridis and T. Joachims and T. Hofmann and
Y. Altun",
title = "(LONG VERSION) Large margin methods for structured and interdependent output variables",
journal  = "Journal of Machine Learning Research",
volume  = "6",
pages = "1453-1484",
year = "2005",
publisher  = "MIT Press",
pdf={http://machinelearning.wustl.edu/mlpapers/paper_files/TsochantaridisJHA05.pdf}
}
S. Tong and D. Koller: Active learning for parameter estimation in Bayesian networks. In Advances in neural information processing systems (NIPS 2000), 2001. [Bibtex]

@InProceedings{ tong-koller:01,
title = "Active learning for parameter estimation in Bayesian
networks",
author  = "S. Tong and D. Koller",
booktitle  = "Advances in Neural Information Processing Systems (NIPS
2000)",
year = "2001",
pdf={http://ai.stanford.edu/~koller/Papers/Tong+Koller:NIPS00.pdf}
}
B. Taskar, C. Guestrin, and D. Koller: Max-margin Markov networks. In Advances in neural information processing systems (NIPS 2003), 16, MIT Press, 2004. [Bibtex]

@InCollection{ taskar:04,
author  = "Ben Taskar and Carlos Guestrin and Daphne Koller",
title = "Max-Margin Markov Networks",
booktitle  = "Advances in Neural Information Processing Systems (NIPS
2003)",
volume  = "16",
publisher  = "MIT Press",
year = "2004",
pdf={http://books.nips.cc/papers/files/nips16/NIPS2003_AA04.pdf}
}
C. G. Nevill-Manning and I. H. Witten: Identifying hierarchical structure in sequences: a linear-time algorithm. Journal of artificial intelligence research, 7, 67-82, 1997. [Bibtex]

@Article{ nevillmanning-witten:97,
author  = "Craig G. Nevill-Manning and Ian H. Witten",
title = "Identifying hierarchical structure in sequences: A
linear-time algorithm",
journal  = "Journal of Artificial Intelligence Research",
volume  = "7",
pages = "67-82",
year = "1997",
pdf={http://arxiv.org/pdf/cs/9709102.pdf}
}
R. M. Neal and G. E. Hinton: A view of the EM algorithm that justifies incremental, sparse, and other variants. Learning in graphical models, 89, 355–368, 1998. [Bibtex]

@Article{ neal-hinton:98,
title = {A view of the EM algorithm that justifies incremental,
sparse, and other variants},
author  = {Neal, R.M. and Hinton, G.E.},
journal  = {Learning in graphical models},
volume  = {89},
pages = {355--368},
year = {1998},
}
T. P. Minka: Expectation propagation for approximate Bayesian inference. In Proc. of the 17th annual conf. on uncertainty in AI (UAI 2001), 362-369, 2001. [Bibtex]

@InProceedings{ minka:01-uai,
author  = "T. P. Minka",
title = "Expectation propagation for approximate {B}ayesian
inference",
booktitle  = "Proc. of the 17th Annual Conf.\ on Uncertainty in AI (UAI
2001)",
pages = "362-369",
year = "2001",
pdf={http://arxiv.org/pdf/1301.2294v1.pdf}
}
L. Li, M. L. Littman, T. J. Walsh, and A. L. Strehl: Knows what it knows: a framework for self-aware learning. Machine learning, 82, 399–443, Springer, 2011. [Bibtex]

@Article{ li2011knows,
title = {Knows what it knows: a framework for self-aware learning},
author  = {Li, Lihong and Littman, Michael L and Walsh, Thomas J and
Strehl, Alexander L},
journal  = {Machine learning},
volume  = {82},
number  = {3},
pages = {399--443},
year = {2011},
publisher  = {Springer},
pdf={http://www.research.rutgers.edu/~lihong/pub/Li08Knows.pdf}
}
J. Lafferty, A. McCallum, and F. Pereira: Conditional random fields: probabilistic models for segmenting and labeling sequence data. In Int. conf. on machine learning (ICML 2001), 282-289, 2001. [Bibtex]

@InProceedings{ lafferty:01,
title = "Conditional Random Fields: Probabilistic Models for
Segmenting and Labeling Sequence Data",
author  = "J. Lafferty and A. McCallum and F. Pereira",
booktitle  = "Int.\ Conf.\ on Machine Learning (ICML 2001)",
pages = "282-289",
year = "2001",
pdf={http://www.cis.upenn.edu/~pereira/papers/crf.pdf}
}
Kschischang, Frey, and Loeliger: Factor graphs and the sum-product algorithm. IEEE transactions on information theory, 47, 2001. [Bibtex]

@Article{ kschischang-frey-loelinger:01,
author  = "Kschischang and Frey and Loeliger",
title = "Factor graphs and the sum-product algorithm",
journal  = "IEEE Transactions on Information Theory",
volume  = "47",
year = "2001",
pdf= {http://www.psi.toronto.edu/pubs/2001/frey2001factor.pdf}
}
Heckerman: Graphical models: structure learning. In The handbook of brain theory and neural networks (2nd edition), MIT Press, 2002. [Bibtex]

@InCollection{ heckerman:02,
author  = "Heckerman",
year = "2002",
title = "Graphical Models: Structure Learning",
booktitle  = "The Handbook of Brain Theory and Neural Networks (2nd
edition)",
publisher  = "MIT Press",
pdf = {http://mlg.eng.cam.ac.uk/zoubin/course04/hbtnn2e-III.pdf}
}
G. Elidan, N. Lotner, N. Friedman, and D. Koller: Discovering hidden variables: a structure-based approach. In NIPS, 479-485, 2000. [Bibtex]

@InProceedings{ elidan-et-al:00,
author  = "Gal Elidan and Noam Lotner and Nir Friedman and Daphne
Koller",
title = "Discovering Hidden Variables: A Structure-Based Approach",
booktitle  = "{NIPS}",
pages = "479-485",
year = "2000",
pdf = {http://www.cs.huji.ac.il/~nir/Papers/ELFK1.pdf}
}
A. P. Dempster, N. M. Laird, and D. B. Rubin: Maximum likelihood from incomplete data via the EM algorithm. Journal of the royal statistical society. series b (methodological), 1–38, JSTOR, 1977. [Bibtex]

@Article{ dempster1977maximum,
title = {Maximum likelihood from incomplete data via the EM
algorithm},
author  = {Dempster, Arthur P and Laird, Nan M and Rubin, Donald B},
journal  = {Journal of the Royal Statistical Society. Series B
(Methodological)},
pages = {1--38},
year = {1977},
publisher  = {JSTOR},
pdf = {http://people.cs.missouri.edu/~chengji/mlbioinfo/dempster_em.pdf}
}
D. A. Cohn, Z. Ghahramani, and M. I. Jordan: (long version) active learning with statistical models. In Advances in neural information processing systems, 7, 705–712, The MIT Press, 1995. [Bibtex]

@InProceedings{ cohn-ghahramani-jordan:95,
author  = "David A. Cohn and Zoubin Ghahramani and Michael I. Jordan",
title = "(LONG VERSION) Active Learning with Statistical Models",
booktitle  = "Advances in Neural Information Processing Systems",
volume  = "7",
publisher  = "The {MIT} Press",
editor  = "G. Tesauro and D. Touretzky and T. Leen",
pages = "705--712",
year = "1995",
pdf = {http://www.jair.org/media/295/live-295-1554-jair.pdf}
}
M. I. Jordan, D. A. Cohn, and Z. Ghahramani: Active learning with statistical models. MASSACHUSETTS INST OF TECH CAMBRIDGE ARTIFICIAL INTELLIGENCE LAB, 1995. [Bibtex]

@misc{jordan1995active,
title={Active Learning with Statistical Models},
author={Jordan, Michael I and Cohn, David A and Ghahramani, Zoubin},
year={1995},
publisher={MASSACHUSETTS INST OF TECH CAMBRIDGE ARTIFICIAL INTELLIGENCE LAB},
pdf={http://www.textfiles.com/bitsavers/pdf/mit/ai/aim/AIM-1522.pdf}
}
M. J. Beal, Z. Ghahramani, and C. E. Rasmussen: The infinite hidden Markov model. In Advances in neural information processing systems 14, MIT Press, 2002. [Bibtex]

@InProceedings{ beal-et-al:02,
author  = "Matthew J. Beal and Zoubin Ghahramani and Carl Edward
Rasmussen",
title = "The Infinite Hidden Markov Model",
booktitle  = "Advances in Neural Information Processing Systems 14",
editor  = "T. Dietterich and S. Becker and Z. Ghahramani",
publisher  = "MIT Press",
year = "2002",
pdf = {http://books.nips.cc/papers/files/nips14/AA01.pdf}
}
H. Akaike: A new look at the statistical model identification. IEEE transactions on automatic control, AC–19, 716–723, 1974. For a reprint see E. Parzen et al. (Eds.), Selected Papers of Hirotugu Akaike, Springer Series in Statistics, 1998. [Bibtex]

@Article{ akaike:74,
author  = "H. Akaike",
title = "A new look at the statistical model identification",
journal  = "IEEE Transactions on Automatic Control",
volume  = "AC--19",
pages = "716--723",
year = "1974",
note = "For a reprint see E. Parzen et al. (Eds.), \emph{Selected
Papers of Hirotugu Akaike}, Springer Series in Statistics,
1998",
}