Big Data and Machine Learning Design for Education

University of Manchester, Manchester Institute of Education, MA Digital Technologies and Communication in Education.

“Truth is a matter of the imagination. The soundest fact may fail or prevail in the style of its telling” (Ursula Le Guin, The Left Hand of Darkness)

“The invention of the ship was also the invention of the shipwreck” (Paul Virilio)


‘Big Data’ is a term with a lot of hype around it. If you take this unit hoping to know, by the end, exactly what big data is, what it is not and exactly what to do with it, you will be disappointed! The first thing to recognise is that there is no settled agreement on what is meant by big data. The next thing to recognise is that even if we did manage to agree on what to do with big data (how to capture and analyse it), by the time we had agreed, the tools used to capture and analyse data would have changed and we would have to start again.


What you will hopefully learn from this unit is which factors contribute to debates about what big data and machine learning are, and how they could be used ethically in education. You will also learn how to go on learning about these factors. To do this, we will also have to decide what we mean by ‘ethical’ and ‘education’, and ask what data is and how it is captured, stored, manipulated, analysed, presented and used in the world. We will ask how this applies to the education sector, from government policy through edTech development strategy to classroom practice, and how we can take this into account when designing for learning.


The debate about data use in education often focuses heavily on running analytics on existing data. But this misses a huge part of the data cycle, and provenance is central to the first half of the unit: where does the data we use come from? Who decided what data is captured, and how, in the first place? This provenance has in turn been shaped by earlier data practices, so exploring the full cycle is essential to understanding the debates around analytics.



Week 1

In the first week we begin by looking at how this unit is structured, introducing some ideas we will explore on our journey and what kinds of activities we will do. We start to think about ideas around open data and citizen science, and go on a data hunt using a tool such as GooseChase.


Several key readings for the unit come from Ben Williamson. This week is also a good time to start exploring policy documentation on educational data, such as this one from the UK.


Week 2

This week we start exploring the whole cycle of the data science process. We particularly look at the perspectives of those who see potential in big data approaches to disrupt or at least influence education for the better.


We will focus on emerging examples of how learning analytics and educational data mining are already being used in policy and practice and how they are affecting approaches to research, while our practical work will look at how we store data. We introduce Francis Galton, considered by many to be the ‘father of big data’, to explore the historical underpinnings of data-driven approaches.


Week 3

This week we take a critical perspective, exploring how the implementation of these approaches is playing out in practice, what new possibilities are being enabled and what possible ‘shipwrecks’ are emerging. We take a critical data/software studies approach to these issues in the education context, organised around three key themes:


    • Representations of data dilemmas
    • Galton: representation and intention
    • Capital, innovation and the economics of edTech


The provenance of data-driven tools informs their design and implementation in educational contexts. Audrey Watters is an author whose work helps us take a critical lens to these tools, including who owns them and who funded their development. To get ‘into the minds’ of designers and developers, we design a data-capturing app using paper prototyping techniques.


Week 4

Continuing our exploration of how design intentions determine how edTech tools capture and use data, we explore persuasive design and gamification, and the theories of behaviour and learning that underpin them. Nir Eyal is one of the authors we explore.


Putting these theories into practice, we continue designing our own educational data capture app by producing a wireframe, a common approach among UX designers and developers.


Week 5

In preparation for the first assignment, this week we walk through a case study applying a critical data studies approach to the edTech application ClassDojo. We will draw on all we have learnt about development and design to consider what we need to investigate about the data practices of this tool. We learn about case study methods in research and about an approach called multimodal discourse analysis, which we use to explore the tool’s interface by ‘reverse engineering’ its design intentions.


Week 6

In small groups, every student presents their findings from a mini case study on a tool of their choice. This forms part of the first assessment of the unit (see Assessment section below).


Week 7

Moving into the second half of the unit, we start to explore some of the data repositories that have already captured, and made freely available, ‘big’ data in the field of education and learning. We will see that these datasets are not necessarily ‘big’ in themselves but have the potential to be used in ‘big’ ways. Having explored the intentions behind the capture of data, we start to look more closely at the tools used to perform analytics on that data, and at a range of techniques being applied in education contexts. We learn about data engineering processes, including data cleaning and scraping, to see some of the decision-making that goes into them, just as we did with the design process.

We learn how to use a web scraping tool such as Mozdeh, which scrapes social media sites and performs various analyses, such as sentiment analysis, on the data.


Week 8

This week explores machine learning in more depth. Technologies from this field underpin the majority of data mining approaches and, increasingly, learning analytics where data is ‘big’. Cathy O’Neil’s work on Weapons of Math Destruction is central to the themes of this week.



We explore some key terms in machine learning in order to understand the ideas behind some of the approaches informing educational technology design and data-driven educational policy. We will experiment with a machine learning tool called RapidMiner, which lets us set up machine learning processes without any coding skills, to get a sense of what is happening ‘under the hood’. We can cover very little of this huge field in one week, but experiencing these tools will help you better understand the literature describing their use.


Week 9

This week heads into ‘machine seeing’ in relation to visualisation. Visualisation is usually thought of as coming towards the end of the data cycle, as a way of presenting findings. But it can also be used for exploratory data analysis (EDA): a starting point for getting a sense of what we can learn from our data and what further questions we could ask of it.


Visualisations can be misleading, so we explore a range of good and bad examples to develop data literacy, and consider how visualisations can influence educational policy and instructional design. We look at practical tools such as Tableau, and explore key authors such as Adrian Mackenzie and Lev Manovich.


Week 10

Week 10 prepares for the second assignment of the unit assessment. We explore examples of studies that have been done with a ‘big’ educational dataset, investigating their analytic/algorithmic approaches, and select a case to investigate further. Each student will then explore their chosen case and produce a research poster on their findings.


Week 11

In week 11 we hold a research poster session in which students display their research posters, each representing a chosen mini case study, and discuss them with each other.


Week 12

In this final week we discuss and debate the ideas we have explored in the unit. How will you take what you have learnt into your future designs, into informing and debating policy, into your creative future?


Assessment

The assessment for the unit is divided into two parts: the first part enables students to evidence understanding of design factors in ‘big’ data capture and the second to evidence understanding of the implications for education of data analytics and machine learning.

The first assessment is subdivided into two parts: a presentation of a case study interrogating the design of an edTech tool that captures large amounts of data, and a set of insightful questions that each student develops in response to each of the presentations.

The second assessment is also subdivided into two parts: the first is a research poster in which each student presents a mini case study exploring an aspect of ‘big data’ analytics (such as an example of educational data mining, learning analytics or visualisations in learning); the second is a reflective and comparative commentary on all the research posters.

