Learning analytics and machine learning in higher education with Mike Sharkey
The higher education sector has been using machine learning for some time, mostly to predict if students are at risk of dropping out of a program. In this series, I wanted to include an interview with someone who had been involved in this work, to see what L&D could learn from it.
Mike Sharkey is one of the leaders in this area. His most recent past role was as Vice President of Analytics at Blackboard. It was great to be reminded by Mike in this interview that learning is a complex area with many factors at play. Because of this, it's not a simple area in which to apply machine learning.
During the interview, Mike and I also briefly talk about privacy issues with analytics. As workplaces start to adopt more data-driven approaches to learning, there is a lot we can learn from the work that has already been done in higher education, including how privacy issues can be approached.
Download the ‘How artificial intelligence is changing the way L&D is working’ eBook
To go along with the podcast series on AI and L&D, we have released an eBook with transcripts of all the interviews. The eBook also gives a brief explanation of what AI is and an overview of how it is being used in L&D.
In the eBook you will learn:
- Some of the jargon behind the technologies e.g. what data scientists mean when they talk about ‘training a model’.
- How AI is being used in L&D today to gain insights and automate learning.
- Why you should be starting to look at using chatbots in your learning programs.
- How you can get started with recommendation engines.
Subscribe using your favourite podcast player or RSS
Useful links for the podcast
- Find out more about Mike Sharkey
- Find out more about Data and graphs
- This example of a university course that is using the AI we discussed
Transcript - Learning analytics and machine learning in higher education with Mike Sharkey
Robin: Mike, you come from a higher education background. What are some of the trends you're seeing in higher-ed and analytics at the moment?
Mike: Well, it's a really interesting and creative space, and I'm very happy to be part of it. The biggest trend that I’m looking at has to do with impact, and what kind of impact data and analytics can have in higher-ed.
The best way to illustrate this is to look at comparable industries where data and analytics have made a big impact. The financial sector has been doing quantitative stock market analysis for 60-plus years. Manufacturing has been using data-driven methods for quality and prediction for a long time. And, more recently, there's e-commerce. There's a lot of positive data and analytics work in those industries.
Why data-driven approaches in education are challenging
Mike: Then you look at industries where it hasn't been taken up. Education and healthcare are probably two industries where it really hasn't fulfilled its promise. The differences to me are obvious. The industries where data is working well – e.g. e-commerce, manufacturing and finance – have very quantifiable, measurable outcomes. I buy a stock for a dollar, I sell it for a dollar fifty, I've made a profit on it. Manufacturing is measured in number of units, throughput, cycle times, defect rates. With e-commerce, people always talk about, ‘Oh, we'll do this like Amazon does, where it says what other people like, and you also bought this thing …’ Well, there's a ‘buy’ button on Amazon. People click it and they buy something; it’s quantifiable.
If you look at education, there's no buy button. There's no point at which I click a button and say, ‘Hey, I now know how to add fractions.’ In healthcare, there's no ‘I'm cured’ button. Because education doesn’t have the same quantitative outcome that other industries do, I don't think education has lived up to what it can accomplish with data and analytics. It certainly hasn't lived up to what many of the hype artists in the industry have claimed it can do.
Robin: I think of the learning experience as the input that doesn’t always have a direct correlation to the outcome. There are a whole lot of complicated factors involved.
Let's digress for a moment into healthcare. A friend had started in plant genetics research, and then she decided to retrain as a doctor. Within months of her training, she thought, ‘Oh, I thought this was a science.’ There are a whole lot of things about medicine that are just rules of thumb. There are things that work, and they know they work, but they haven't got the actual research to prove it. She was really shocked, coming from a science background.
Education is mission critical
Mike: What you're bringing up is a good point, which is mission criticality. Let's go back to my Amazon example. If Amazon makes a mistake, where you're looking for ceiling fans and it suggests that you buy a power drill, how mission critical is that mistake? Not very. If I'm in healthcare or education, and I suggest something that's incorrect, it could be potentially bad, right? Suggesting that someone take a different course, or a different pill, that turns out to be wrong can have bad outcomes. Mission criticality has a huge role.
I live in Phoenix, Arizona, in the US. We had an incident about three or four months ago with a driverless car that killed a pedestrian. As they dug further into what happened, it turned out the car detected the pedestrian in plenty of time; what didn't work was the software that determined whether this was something it needed to stop for. It was following its rules, and the rule said, ‘No, I'm not going to stop. That's something I can keep going with,’ like a bush or a plastic bag. The worst-case scenario happened. Mission criticality has a huge effect on the impact of data, which is why using data in education demands more care.
Robin: That is a really great example of the hype of over-promising, of the technology not quite being there yet. That's a nice way of thinking about it.
Mike: Gartner’s Hype Cycle is a bit of a truism. I did a blog post on this about a year or two ago, and talked about the hype cycle with analytics and higher education. I tend to be an optimist; I always put a positive spin on things. Obviously, the hype cycle is a negative thing. By ‘the hype cycle’, I'm saying: someone says, ‘Oh, we have this machine learning algorithm that can predict X, Y, and Z, and will improve retention at your institution by 15 percentage points.’ There's a lot of hyperbole.
There is a lot of hype around machine learning and artificial intelligence
Mike: It's bad, because what happens is that people buy into it. They take it at face value. When the results come in, or fail to come in, they are let down. They say, ‘Oh, this analytics thing is a bunch of hogwash, and it doesn't give us the results we want.’ But being an optimist, I believe that we have to pass through this hype cycle in order to get to the real stuff. People are starting to stop believing everything that's claimed. Start smaller; bite off a smaller project that can show some results. It may not change the world, but you can show the impact. Take steps toward the outcome, as opposed to believing that you’ll get amazing outcomes in just six months.
Robin: The example you just used, of the retention, is a really common one that's used in higher education. Are you saying that's not often the reality of what's being delivered with the technology?
The key is what you do with the data – the intervention
Mike: There's an easy reason for that: it’s because it's not the technology that makes the difference, it's the intervention – it's the human part. Break down a project of predicting at-risk students, which is your core retention use case. I’m taking data about my students, and I run it through a predictive model. The model says, ‘Of your thousand students, here are the 87 who are most at-risk. You should reach out to them.’ That's what the technology does.
The technology, in most cases, doesn't do the reaching out. It doesn't do the intervention. That's where the rubber meets the road. It's the intervention; it's reaching out to someone that is critical.
Who reaches out? Is it the faculty member? Is it an advisor? Do you send a message to the student? What is the expected treatment? Are they going to get extra tutoring? Are they going to get a loan? Are they going to drop a class and take another one? There are a lot of different paths to take. The algorithm might tell you that these 87 students are at-risk, but it's not going to tell you why they're at-risk.
Half of the students might be at-risk because they don't get the subject matter, they're struggling in the economics course and they don't get the material. The other half might be at-risk because their child is sick, or they're travelling with work, or a financial aid cheque didn't come through and they can't pay their tuition, or a pay cheque didn't come through and they can't pay their rent. That's what has to be treated.
Identifying at-risk students is good, but the reason that the machine can't solve everything is because the machine can't have that interaction and intervention with a student.
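To make Mike's division of labour concrete, here is a minimal Python sketch of the part the technology does: ranking students by a model's risk score and returning the top k for follow-up. The student IDs, the scores, and the `top_at_risk` function are hypothetical; a real system would get its scores from a trained predictive model. As Mike stresses, the output is only a list of names: it says nothing about *why* a student is at risk and performs no intervention.

```python
from typing import Dict, List, Tuple


def top_at_risk(risk_scores: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Return the k students with the highest predicted risk, highest first.

    risk_scores maps a hypothetical student ID to a model's estimated
    probability of dropping out. This only decides who to contact; the
    reaching out, and working out the reason, is left to people.
    """
    ranked = sorted(risk_scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Illustrative scores only, not real model output.
scores = {"s001": 0.12, "s002": 0.91, "s003": 0.47, "s004": 0.78}
for student_id, risk in top_at_risk(scores, 2):
    print(f"Reach out to {student_id} (predicted risk {risk:.2f})")
```

In Mike's example, `k` would be 87 and `scores` would hold a thousand entries; the choice of cut-off is itself an operational decision about how many students the advisors can realistically contact.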
Robin: I think about machine learning and AI as bringing out what it means to be human. Being human isn’t just about finding patterns in data, or seeing patterns. It’s about being empathic and understanding what the people need: how do I help someone through this problem? The intervention by an individual faculty member needs to be personalised.
Data complements human decision making
Mike: Yes. The phrase I use is that ‘data complements the human decision-making process. It doesn't replace it.’ Data can enhance decision making. In my case of one thousand students and here are the 87 who are at-risk: if that was left up to the human, the human might say, ‘Well, I'm just going to call all one thousand students.’ And then they take a few weeks to cycle through them. Or they might say, ‘I'm going to take the 500 students who have the lowest grades in class, and I'll filter my list that way.’
There's a lot of value in saying, ‘You know what? I'm going to give you the 87 students who I think are most at-risk and would most benefit from your help today.’ I don't want to underestimate the value of that, because from an operational standpoint, that can be a huge thing.
I've worked at and with many large universities that have tens of thousands and hundreds of thousands of students. I'm always thinking about scale. To be able to identify those at-risk students without having to cycle through the whole list is tremendous. But that's just identifying that they might be at-risk; it’s not intervening and treating. When I say ‘data complements the human decision-making process’, I mean it can make the process more efficient. It helps, but it's not going to solve or obviate the human need to help with intervention.
Robin: This is probably partly the problem of using machine learning in these sorts of more nebulous areas, like education.
Machine learning is about diagnosing, but then what is also needed is getting the prescription right. We have been working with a client who works in healthcare education. She's actually done a whole lot of research on remediation for doctors who were struggling with exams. What's really fascinating is that she developed a whole complicated mental model that she uses when she is having a conversation with a doctor in training who is struggling.
To be able to diagnose their problem is a holistic thing. It isn't always about their performance. Sometimes it’s about the sort of patients that they are seeing, or their cultural background, their family situation. There were just so many factors that she was bringing into that.
Mike: Yes, absolutely.
Robin: We have been building a study planner app based on her approach that is driven by a neural network. It's still a little bit experimental because we’re trying to recreate her thinking. But it's interesting because essentially it’s the second part of what you’re talking about. It’s trying to figure out what the right intervention is.
I wonder whether or not it's easier to just have a human intervention.
It takes time to develop the expertise to build the right solution
Mike: There was an article about a year or so ago about a large online class at a technical school. I believe it was at Georgia Tech, in the United States. The professor worked with a company to build basically an AI chatbot, as we would call it these days. The article said, ‘Oh, these students were so happy, and they got help, and some of them didn't even know it was a chatbot. They thought it was a teaching assistant assigned to the class.’ But there was a little line buried in the article: the instructor had taught this class for 10 years, the same class for 10 years. You mentioned the mental model; he knew all of the pitfalls that students hit in this class. He was able to seed the chatbot with all of these different scenarios because he knew what could go wrong.
Now, that's wonderful; it’s a good testament to dedication and understanding your art. But what it isn't is scalable. That worked for one course that he taught for ten years. The brass ring in higher-ed technology is scalability. You want to be able to do this for one student, 10 students, 100 students, or 10,000 students. A lot of this stuff can't really be scaled. The learning part of machine learning may work in one discipline, but not in another.
There are often examples where the technology has worked well but it may not work for everybody in the same way.
Robin: Also, even if he's been teaching that for 10 years, if he only teaches it one semester a year, he's maybe only actually delivered each lecture 10 times.
I've heard academics talk about how they might only get to one bit of content once a year, and then they don't have the chance to refine it until the next year. One of the frustrating things about the area is that the cycles are sometimes so long.
Robin: Have you seen any really great results from machine learning approaches in higher-ed, where you think it's actually worked really well?
Mike: There are cases where it has worked. I’m thinking about the work I’ve done in the past, and that peers and competitors have done. The results are very real, but modest. Retention is something that's always important. If I have one thousand students and 900 students come back, then I've got a 90% retention rate.
When you look at doing these interventions, these models around student risk, what I've found and seen is that if you are not doing any intervention, machine learning is not going to help. If you have a pervasive model for supporting students and you layer machine learning on top of that, you might see retention improve by one or two percentage points.
Even small improvements in retention can have a huge impact
Mike: If you weren't doing anything and you start using machine learning data techniques, plus you add new student advisors to do the intervention, you might see a five or six percentage point improvement. Those aren't small numbers.
I was at a university where one per cent translated to nine million dollars a year in benefits. Not only is the school benefiting, the students are benefiting because they're getting assistance. I don't want to minimise those numbers. But the problem is, they don't sound sexy. No-one's going to spend a lot of money on a tool that's going to get them a one per cent improvement.
To answer the question: yes, there are institutions I've worked with that have done these initiatives, done them well, and have seen improvements in retention rate.
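The numbers Mike quotes are simple arithmetic, but the difference between a percentage-point change and a relative change matters when judging claims like these. A minimal sketch, assuming that ‘one per cent’ in his example means one percentage point of retention and that the benefit scales linearly; the $9 million figure is the one from his university, while the cohort numbers below are illustrative only. The function names are hypothetical.

```python
def retention_rate(returned: int, enrolled: int) -> float:
    """Share of enrolled students who come back, as a percentage."""
    if enrolled <= 0:
        raise ValueError("enrolled must be positive")
    return 100.0 * returned / enrolled


def point_change(before: float, after: float) -> float:
    """Change in percentage points (not a relative percentage change)."""
    return after - before


def estimated_benefit(points: float, value_per_point: float) -> float:
    """Assumes, for illustration, that benefit scales linearly with points gained."""
    return points * value_per_point


baseline = retention_rate(900, 1000)  # Mike's example: 90% retention
improved = retention_rate(920, 1000)  # hypothetical two-point improvement
gain = point_change(baseline, improved)
print(baseline, improved, gain)
print(estimated_benefit(gain, 9_000_000))
```

A two-point gain from 90% is a relative improvement of only about 2.2%, which is partly why, as Mike says, these results "don't sound sexy" even when the money involved is large.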
Robin: That's the scale factor again. But also, what you’re talking about with the integrated approach is the whole diagnosis leading into the intervention.
Mike: What you find is, people ask, ‘Why does one school succeed and not another?’ One of the most common hallmarks I see is a positive data culture, usually driven from the top down. The leader, usually a president or a provost, isn't just buying into the next shiny thing. They understand data and they understand the benefits.
You will see things like quantitative reports at staff meetings. When people talk about problems, they'll say, ‘Well, get me the data. Show me, is this a problem or not?’ When you see those kinds of characteristics in leaders, that tends to correlate with successful projects.
Robin: Yes. In workplaces we're finding what I call data-driven organisations. When you start to talk about measurement, logging of learning activities, people will automatically say, ‘Can we get it into our other data systems? Because we don't want it only in a learning system, we want to bring it into our whole data system.’ They've got this data-driven culture you're talking about.
Robin: A couple of times, when I've talked to people about the possibilities of predictive performance and predictive technologies, people have quickly said, ‘Oh that's sounding very Big Brother.’
Having technology possibly watching people's behaviour and performance all the time, does that bring up a whole lot of concerns around privacy for people? In the higher-ed sector, how have you seen some of that, sort of, play out? What have been some key decisions around data and privacy?
Privacy issues and data
Mike: It's absolutely a topic. I think privacy in higher-ed was a topic even before this push on data and machine learning. What's happening with GDPR is having an impact as well. We talked before about the mission-critical nature of education, and then there is the ethics of knowing. If I have a predictive model that says here are the 87 students who are most at-risk, and if I think those 87 students aren't going to pass the class, do I have an obligation to take them out of that class? It's an interesting ethical question. The ethics of privacy are absolutely in play.
Respecting learners’ data
Mike: I'm happy to be in an industry like higher education where privacy is not overlooked. Every once in a while, you get issues and data set problems. But for the most part, folks are very respectful of privacy. Personally, I want a balance. As someone who's taught and has taken classes, and has children who take classes, I'm very respectful of privacy. It’s about agency: ‘Who is looking at it, and what are they going to do with it?’ I'm fine with the school looking at the student's data. The student is at the school and they've given some agency to that school to be able to use the data to help them.
That's very different from selling it to third parties. The balance I want is to respect that agency relationship so that a school can look at data, as long as it's for the benefit of the student. But what I don't want is things locked down so tight that it stifles innovation. That is definitely a worry.
One of the biggest things we talk about is opt-in versus opt-out. I've heard people say that if you're going to use a student's data to do predictive models, you need them to opt in first to use it. Anyone who deals with data knows: if you've got 100% of your students, and you ask them to opt in, you're going to get a single digit percentage of them who will opt in to say, ‘Yes, you can use my data.’
That will stifle any machine learning approach, because it needs a lot of data to work. Go for a balance where privacy is respected, as it should be, but not stifled so much that it hurts the technology and the ability to help students succeed.
Robin: There's a really powerful word that you're using there, Mike: respect. Being respectful of people's individual privacy, and of what you can actually help them to achieve. That's a really nice point.
The ethical dilemma you talked about of how you help those people, or what you do about those people that are foundering is fascinating. To just drill into it a moment: last time I was tutoring at a university, this particular university had a minimum attendance requirement to be able to sit your final assessments. There was a disagreement with one student where they had not attended enough; the student then failed, based on performance.
It would have been easier for that student to be told they weren't eligible to sit because of their attendance, and to go along with that, rather than be given a fail result for performance.
Mike: A lot of times, dropping out or withdrawing is an okay solution – you've got a lot of this going back to mission criticality – but then you look at things like: that has a financial impact, the student may lose the tuition fee. Now you're making a decision that is impacting that person's wallet, and that is not a small decision to make.
Robin: Although, I think that particular student was struggling at that moment as well. It’s a really interesting set of dilemmas, dealing with these at an individual level. But at the scale of higher education, it's a different problem again. Are you seeing a trend where the data is collected automatically, without needing students to opt in? Where that's just part of the conditions of being in the course?
Mike: I think it should be. If you are taking a class at a university, it shouldn't come as a surprise that someone has the ability to look at your data. When I say ‘look at your data’, this is probably the biggest misnomer or misconception. I'm not looking at Jane Smith's data and when she logged in, right? We're looking in aggregate. When people say ‘look at my data’, no-one is snooping, trying to eavesdrop on you and what you are doing. Your data is aggregated with other students’ data, and we use models and correlations and statistical techniques to try and find meaning in those data.
This is where some of the breakdown happens. We'll say, ‘Yes, the university is looking at your data to help students succeed.’ Then people take it personally: ‘Well, that's my data, why are you looking?’ In reality, your data is thrown into a pot with data of other students to try and make meaning out of it.
As a consumer myself, I understand. Whenever someone says they have access to my data, there is a concern. As consumers, we should have that concern for ourselves. But I firmly believe that should be an understood part of the relationship: you are at this university and so data is used in the tools.
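Mike's point that ‘looking at your data’ usually means looking at aggregates can be illustrated with a small sketch. It assumes hypothetical, de-identified records of a weekly activity count (metadata such as the number of emails sent, not their content) paired with a pass/fail outcome, and reports a pass rate per activity band so that no individual record is surfaced. The band boundaries, the minimum group size, and the data are all invented for illustration; a real analysis would use proper statistical models and an institution's own privacy rules.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical bands of weekly activity (e.g. emails sent per week).
BANDS: List[Tuple[str, int, int]] = [("low", 0, 4), ("medium", 5, 14), ("high", 15, 10**9)]
# Hypothetical rule: suppress any band with too few students to stay anonymous.
MIN_GROUP_SIZE = 3


def band_for(count: int) -> str:
    """Map an activity count to its band name."""
    for name, low, high in BANDS:
        if low <= count <= high:
            return name
    raise ValueError(f"no band for count {count}")


def pass_rate_by_band(records: Iterable[Tuple[int, bool]]) -> Dict[str, float]:
    """records: (weekly_activity_count, passed) pairs with no student identifiers."""
    totals: Dict[str, int] = defaultdict(int)
    passes: Dict[str, int] = defaultdict(int)
    for count, passed in records:
        band = band_for(count)
        totals[band] += 1
        passes[band] += int(passed)
    return {
        band: passes[band] / totals[band]
        for band in totals
        if totals[band] >= MIN_GROUP_SIZE
    }


sample = [(2, False), (3, True), (1, False), (8, True), (10, True), (6, False), (20, True)]
print(pass_rate_by_band(sample))  # the "high" band is suppressed: only one record
```

The output is a statement about groups of students, which is the kind of ‘pot’ Mike describes; the suppression rule is one simple way to stop an aggregate from quietly pointing at a single person.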
Data in workplaces is different
Mike: It's not like a corporate relationship. In a corporate situation someone might say, ‘How can you read my emails? How did you find out that I was looking for another job?’ Well, you work at a company, it's their email system; you should have no expectation of privacy around it. I think it's a little different with a university. We're not going to be reading emails. But to say that you sent 15 emails this week, and we look at that activity, and see if it's indicative of whether or not you may be successful in the class, absolutely I'll look at that. I can look at other metadata about what you do without looking at the content of what you did.
Robin: On another podcast with Lori Hoffman, she talked about it as being almost a digital body language; looking at that, at a sort of combined pattern level, rather than individual level.
Mike: No, but it's very easy to cross that threshold. There are universities that are using GIS data to see how active you are on campus. Are you using the library? Are you using the student rec centre? I understand that, but now you have my physical location. Now you start to cross that threshold; I don't care how nice you say you're going to be, you're tracking my geolocation, and that's getting really personal. It's very easy to start crossing between metadata and something that is a lot more sensitive.
Robin: Geolocation is interesting. It's one of the things I personally have had issues with: being tracked by Google. I don't mind anything online being tracked. It's actually the location. It's interesting how the physicality changes some of our perceptions of things.
Mike: Oh yes, you go to the Google My Maps page and you see where you've been. Google can show you where you've been, because I carry my phone around with me and it has that. It's a good thing I'm not doing anything illegal.
Robin: Higher-ed has been working with machine learning for a lot longer than corporate learning has, Mike. What do you think are the big lessons that could be transferred from what's been happening in higher-ed into corporate learning?
Getting started – focus on outcomes first
Mike: So, the biggest thing is focusing on the outcome first. What are you going to do with it? The ‘so what?’ The ‘what's the problem you're trying to solve?’ When I talk about machine learning, I actually take technology out of play.
Let’s say we're talking about student risk and retention. Instead of machine learning, I have a crystal ball. I am now a magic person with a crystal ball. Instead of a machine learning model that tells me these are the 87 students out of a thousand who are at risk, my crystal ball tells me the 87 out of the thousand. Now I write those 87 students’ names on a list.
There's no more argument over ‘what's your technique?’ ‘How accurate is the model?’ It's a crystal ball, it's 100% accurate. The big question is, what are you going to do with that list of 87 names? Are you going to contact them? Talk to them? Are you going to drop them out of classes? Who is going to take that action? Who is accountable and responsible for taking that action? What is the ultimate goal? Are you trying to get the student to pass a class? To graduate in four years? To get a job?
Realistically, you can take the technology off the table, and focus on the intervention and outcomes. The biggest lesson from using data in higher-ed for these purposes is this: if you haven't first figured out what your goal is, and what you are going to do with the results of the data, then you shouldn't be starting with the data technology side of it, because it's only going to confuse things more. I think this lesson can 100% apply on the corporate side.