In this course, you will learn how to create groups from data. For example, you might want to detect different types of web users based on a set of variables that contain information on aspects of online activities (e.g., content preferences, time spent online, etc.). Or you might try to understand vaccination hesitancy by identifying groups of people with similar sets of concerns. In these kinds of scenarios, we often face datasets with many observations and large numbers of potentially relevant variables. It would be impossible to find groups of similar cases by just browsing the data or skimming tables.
Discovering groups in your data can be achieved by performing cluster analysis or latent class analysis. Additionally, these techniques allow for a fruitful description of the phenomenon of interest and enable follow-up analyses, for example, on how group membership (e.g., type of web users) is associated with other variables (e.g., gender, SES, life satisfaction, personality).
Cluster analysis can be described as a bottom-up approach, where various algorithms are deployed to find similar cases (e.g., persons, organizations, schools, countries) in your data. Similar cases will be grouped to create a given number of maximally different clusters.
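As a rough illustration of this bottom-up idea, the following minimal sketch implements a basic k-means loop, which is only one of many clustering algorithms. It is written in plain Python to stay dependency-free (the course itself uses R). The function name `k_means`, the choice of the first k cases as starting centres, and the toy `users` data are assumptions made for this example, not course material.

```python
from math import dist


def k_means(points, k, max_iter=100):
    """Group points into k clusters with a basic k-means loop."""
    # Assumption: use the first k cases as the initial cluster centres.
    centres = [tuple(p) for p in points[:k]]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assign every case to its nearest centre (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: dist(p, centres[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each centre to the mean of the cases assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster is empty
                centres[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centres


# Made-up toy data: (hours online per day, sites visited per day)
users = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centres = k_means(users, k=2)
print(labels)
```

With this toy data, the three light users and the three heavy users end up in separate clusters. Real analyses typically standardise the variables first and compare solutions with different numbers of clusters.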
Latent class analysis, in contrast, can be seen as a top-down approach in which we assume a probabilistic model to explain group membership. Latent class analysis works with the actual distribution of your data using a statistical model. You can include covariates, and the procedure will provide goodness-of-fit measures, which can be used to compare different solutions.
This course contains an introduction to both cluster analysis and latent class analysis. We will spend two days on cluster analysis, two on latent class analysis, and one day on more advanced techniques to group cases, including machine learning variants.
You will work on data provided by the course instructor. However, we strongly encourage you to bring your own data if possible.
Upon completion of this course, you will have a good understanding of cluster analysis and latent class analysis, their differences, advantages, and disadvantages. You will know how to run these models on your data.
Robin Samuel, University of Luxembourg
While the course is designed to be introductory, participants should be familiar with univariate and bivariate statistics. If you have never been exposed to bivariate correlation and chi-square tests (e.g., in the context of cross-tabs, also known as contingency tables), this is probably not the course for you. Ideally, you will have some elementary knowledge of (OLS) regression as well.
We will use the software R. R allows running both cluster analyses and latent class analyses. While some familiarity with R would be useful, this is not strictly necessary as long as you have some knowledge of working with other statistical software packages using syntax (e.g., Stata or SPSS) and are willing to learn.
This course contains an introduction to cluster analysis and latent class analysis. Both analytical techniques allow you to detect concepts or groups in your data that are not directly measurable or observable.
The Summer School cannot grant credits. We only deliver a certificate of attendance, i.e., we certify your presence.
CHF 700: Reduced fee: 700 Swiss Francs per weekly workshop for students (requires proof of student status).
CHF 1100: Normal fee: 1100 Swiss Francs per weekly workshop for all others.