Machine learning is fashionable. But what is it and how can it be put to good use in the social sciences? This introductory course provides an overview of some of the most important machine learning techniques and their social science applications. Those applications can be grouped into several sub-categories:
1. Preparing data for statistical analysis: Sometimes data are so voluminous that hand-coding them is near-impossible. We can leverage clever computer algorithms to do the coding for us. For example, we could use an artificial neural network to detect whether tweets, of which there are millions, come from a social bot or from a legitimate source.
2. Doing statistical analysis: As social scientists we are used to building models with numerous parametric assumptions. What if we let algorithms leverage the data to obtain the model for us? That way, we may detect complex contingencies not previously theorized.
3. Pattern recognition: How do variables hang together and what groups do our cases form in terms of those variables? For example, political parties take positions on numerous issues. Can we group those issues into ideologies? Based on the issues, can we place the parties into clusters?
4. Anomaly detection: Some phenomena, such as war, are fortunately rare. However, this makes analyzing them challenging. A whole subfield of machine learning is dedicated to the detection of such rare events or anomalies.
The Course
Through lectures and group exercises, the course shows applications in each area. After discussing the general principles of machine learning, the course spends three days on supervised machine learning techniques (relevant for application areas 1 and 2), one day on pattern recognition (application area 3), and one day on anomaly detection (application area 4). Each day, students will learn the intuition behind the techniques, how they can be implemented in R, how they should be interpreted, and how they can be applied in the social sciences. The course is designed to minimize mathematical complexity, although students can always look up the details in the vignettes made available for the course. Both classification and regression tasks are considered. In the former, we seek to predict class membership; in the latter, we predict a numeric score. Interpretation is key, and we spend a great deal of time on various metrics and their implementations.
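To make the distinction between classification and regression concrete, here is a minimal sketch using k-nearest neighbors, the first technique on the course list. It is not course material: the course itself works in R, and this version uses plain Python purely for illustration. The function names (`nearest`, `knn_classify`, `knn_regress`) and the toy party data are invented for the example.

```python
from collections import Counter
from math import dist


def nearest(train_x, query, k):
    """Return the indices of the k training points closest to query (Euclidean distance)."""
    order = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], query))
    return order[:k]


def knn_classify(train_x, labels, query, k=3):
    """Classification: predict the most common class among the k nearest neighbors."""
    votes = Counter(labels[i] for i in nearest(train_x, query, k))
    return votes.most_common(1)[0][0]


def knn_regress(train_x, scores, query, k=3):
    """Regression: predict the mean numeric score of the k nearest neighbors."""
    idx = nearest(train_x, query, k)
    return sum(scores[i] for i in idx) / k


# Hypothetical toy data: each party is a point in a two-issue space.
positions = [(0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.8, 0.9), (0.9, 0.8), (0.7, 0.7)]
blocs = ["left", "left", "left", "right", "right", "right"]  # class labels
vote_share = [10.0, 12.0, 14.0, 30.0, 32.0, 28.0]            # numeric scores

new_party = (0.25, 0.2)
predicted_bloc = knn_classify(positions, blocs, new_party)        # a class label
predicted_share = knn_regress(positions, vote_share, new_party)   # a number
```

The same neighbors drive both predictions. Only the final step differs: a majority vote for class membership versus an average for a numeric score.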
The course covers the following algorithms/techniques: (1) k-nearest neighbors; (2) probabilistic learning (including naïve Bayes and linear and quadratic discriminant analysis); (3) classification and regression trees, random forests, and model trees; (4) regression with regularization; (5) artificial neural networks; (6) boosting; (7) principal component analysis; (8) cluster analysis; (9) SMOTE; and (10) support vector machines.
Course leader
Marco Steenbergen is professor of political methodology at the University of Zurich. His methodological interests span choice models, machine learning, measurement, and multilevel analysis.
Prerequisites
The course assumes a basic familiarity with probability theory and with linear regression analysis. Prior familiarity with machine learning or related fields (e.g., NLP) is not required. However, a good knowledge of R is essential for the successful completion of the course. Students should know how to read in data, how to transform variables, how to work with model objects, and how to create graphs.
Course aim
This introductory course provides an overview of some of the most important machine learning techniques and their social science applications.
Credits info: 10 EC
The Summer School cannot grant credits. We only issue a certificate of attendance, i.e., we certify your presence.
If you are considering using Summer School workshops to obtain credits (ECTS), you will have to check with your home institution (contact the person or institute responsible for your degree) to find out whether it recognizes the Summer School and how many credits can be earned from a workshop/course with roughly 35 hours of teaching, no graded work, and no exams.
Fee info
CHF 700: Reduced fee: 700 Swiss Francs per weekly workshop for students (requires proof of student status).
CHF 1100: Normal fee: 1100 Swiss Francs per weekly workshop for all others.