Statistical Inference and Machine Learning
Everyone has heard of big data. Many people have big data. But only some people know what to do with big data when they have it.
So what’s the problem? Well, the big problem is that the data is big: the size, complexity and diversity of datasets increase every day. This means that we need new technological or methodological solutions for analysing data. There is a great demand for people with the skills and know-how to do big data analytics.
Extract information from large datasets
This free online course equips you for working with these solutions by introducing you to selected statistical and machine learning techniques used for analysing large datasets and extracting information.
Of course, we can’t teach everything in one course, so we have focused on giving an overview of a selection of common methods. You will become familiar with predictive analysis, dimension reduction, machine learning and clustering techniques. You will also discover how simple decision trees can help us make informed decisions and you can dive into statistical learning theory.
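To give a flavour of what a "simple decision tree" means, here is a minimal sketch of the smallest possible tree: a single yes/no split on one variable. It is written in Python purely for illustration and is not taken from the course materials, which use R, H2O Flow and WEKA. The toy data, and the function names `fit_stump` and `predict`, are made up for this example.

```python
# Toy data, invented for illustration: (hours_of_study, passed_exam)
samples = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 1), (5.0, 1), (6.0, 1)]


def fit_stump(data):
    """Fit a one-split decision tree (a "stump") on a single feature.

    The learned rule is: predict 1 when x > threshold, otherwise 0.
    The threshold chosen is the one that misclassifies the fewest points.
    """
    xs = sorted(x for x, _ in data)
    # Candidate thresholds are the midpoints between consecutive sorted values.
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    best_threshold, best_errors = None, len(data) + 1
    for t in candidates:
        # Count points whose predicted class (x > t) disagrees with the label.
        errors = sum((x > t) != bool(y) for x, y in data)
        if errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold, best_errors


def predict(threshold, x):
    """Apply the learned rule to a new observation."""
    return 1 if x > threshold else 0


threshold, errors = fit_stump(samples)
print(f"split at x > {threshold}, training errors: {errors}")
print("prediction for 4.2 hours:", predict(threshold, 4.2))
```

Real decision trees repeat this kind of split recursively, across many variables, and usually score splits with measures such as Gini impurity or entropy rather than a raw error count.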
Explore real-world big data problems
These methods will be described through case studies that explain how each is applied to solve real-world problems. You can also develop your coding skills by applying the techniques you’ve just learnt to complete hands-on tasks and obtain results.
Just as there are many statistical and machine learning methods for big data analytics, there are also many software packages (see ‘Requirements’ below) that can be used for this purpose. In this course, we will expose you to three such packages, so that you can start to become familiar with using different tools, and can gain confidence in going further with these packages or using others that may come your way.
More courses in the Big Data series
This is the second in a series of four short courses from the ARC Centre of Excellence for Mathematical and Statistical Frontiers at Queensland University of Technology (QUT).
You can also join the other three courses in the series.
You will get the most from this course if you have a basic understanding of statistics and mathematics at an undergraduate level.
In this course you will be using the following free tools. Please review the product websites below to ensure your system meets the minimum requirements:
R and RStudio Desktop (open source edition)
You will complete practical exercises using RStudio, so you’ll need to be familiar enough with R to:
- install a package
- import data
- read and run starter code
- develop a solution, or read through an existing solution and understand it.
NOTE: You must first have a working installation of R to use RStudio.
H2O Flow can be used as a stand-alone package for big data analytics or can be used in conjunction with R. This package will allow you to tackle larger problems that you might encounter in your own work.
WEKA is a popular workbench for machine learning and statistical analysis. It comprises a very wide range of tools that are suitable for big data analysis.
Knowing R, H2O Flow and WEKA will give you a powerful, flexible and scalable set of tools to manipulate and analyse big data.