When do predictions about students' future performance help?
One common use of statistical models is to make predictions about the future. These predictions are ubiquitous in our tech-infused lives today. Meteorologists deliver predictions about the weather to you via the apps on your phone. Your credit card company might flag a transaction as suspicious using a prediction from a model. Your bank might approve or deny a loan based on the predicted probability you’ll default. And if you’ve ever made a household budget in Excel, you’re just making predictions about future expenses.
Educators often encounter predictions, too. Large-scale assessments like NWEA’s MAP Growth make predictions about students’ expected growth. Learning apps and computer-adaptive tests (CATs) use predictions about a student’s level of proficiency to determine which question(s) to present next. And VVAAS – Virginia’s Visualization and Analytics Solution – can, among other things, predict a given student’s probability of passing a certain SOL test (Virginia’s annual statewide assessments) given their testing history.
More broadly, if we have the historical data, we can fit statistical models to predict basically anything we want about a student – how many absences they’ll have, how many days they’ll be suspended, whether they’ll drop out of school, if they’ll pass their end-of-course Algebra test, etc.
The more important question to me is not whether we can predict some student-level (or teacher-level or school-level) occurrence or attribute, but whether we should. And I don’t mean should here in the sense of some ethical, Minority-Report precog hand-wringing over predictions. What I mean is whether it’s actually worth the neuronal cycles to noodle over them.
Usually, it’s not.
Predictions are useful if they inform an action. This might be an action taken by an automated system, or it might be a manual action that a human has to, uh, manually implement. For instance, your credit card company might automatically send you a message if it predicts that a transaction is fraudulent. (I’ve gotten more than one of these while filling up at a Buc-ee’s gas station in the middle of the night driving to see my parents in Florida.) Or you might decide to grab a raincoat if your weather app predicts it’ll rain this afternoon.
But let’s say we can predict, in the fall of a given school year, whether a student – we’ll call her Emma Woodhouse – will pass her Algebra assessment in the spring. And let’s even assume that this prediction is pretty accurate. What action does this inform? If our model says she has a 95% probability of passing her test, what do we do with that data? Do we not teach her because she’ll be fine? What if she has a predicted 10% probability of passing? Do we also not teach her because there’s no way she’ll pass? What about at 50% probability?
The problem here is that there’s a fundamental mismatch between the unit of the prediction and the unit of the intervention. If I get an alert that says there may be a fraudulent transaction on my credit card, I can intervene to stop that transaction. The problem is that someone else is using my card, and my intervention directly stops someone from using my card. If I get an alert that says it’s going to rain, I can intervene by wearing a raincoat. The problem is that I might get wet, and wearing a raincoat stops me from getting wet. If I get an alert that says Emma Woodhouse is likely to fail her Algebra assessment, and I’m her teacher, I can intervene by…getting her to pass her test? What does that even mean?
Let’s take a step back and think about what “passing an algebra test” represents. We might reasonably argue that this means that Emma is proficient in algebra. But this is also kinda vague. More concretely, it might mean that Emma has mastered some set of standards that experts have determined represent the fundamental concepts of algebra. If we think about it this way, “algebraic proficiency” is a latent construct that can only be measured indirectly, via some expert-defined standards. We can then make some inference about Emma's algebraic proficiency using her performance on these standards-based questions. So on an algebra test, we might measure Emma’s ability to factor a polynomial, to graph a linear equation, to solve a system of equations, etc., and then we infer her proficiency in algebra from her performance on these questions.
What does this have to do with predictions? Well, if we're predicting Emma's probability of passing her algebra test, we’re making a prediction about the latent construct, but the unit of intervention is the standard (or even some indicator within the standard). The prediction that Emma has a 50% probability of passing her algebra test doesn’t tell me what standards she has performed well or poorly on, and so it doesn’t actually tell me what I should do to help her. It would be like me going in for a physical and my doctor telling me that I’m “in poor health” but not giving me any specific health indicators.
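To make the mismatch concrete, here’s a toy sketch (all numbers and standard names are invented for illustration): the single pass probability summarizes the latent construct and leaves nothing to act on, while a standards-level breakdown points at a unit a teacher can actually teach.

```python
# Two ways to report on Emma, with made-up numbers.

# The latent-construct prediction: one number, no action implied.
pass_probability = 0.50  # "Emma has a 50% chance of passing" -- now what?

# A standards-level breakdown: hypothetical per-standard mastery estimates.
standard_mastery = {
    "factor polynomials": 0.85,
    "graph linear equations": 0.90,
    "solve systems of equations": 0.30,  # <- an actionable target
}

# The weakest standard is a concrete unit of intervention.
weakest = min(standard_mastery, key=standard_mastery.get)
print(weakest)  # -> solve systems of equations
```

The pass probability and the mastery numbers might even come from the same underlying model – the point is only that the second report is stated at the unit a teacher can act on.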
All of that said, there are obviously cases where predictions are useful for educators. The computer-adaptive testing example I gave before feels like a good use of predictions about student performance/proficiency. These systems (and learning applications) continually estimate a student’s proficiency, then present questions that align with that estimate. These predictions are useful because they automatically trigger a useful action – the estimated proficiency (the prediction) tells the system which question is “best” for the student, and the system then gives that question to the student.
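As a rough illustration of that loop – this is a toy Rasch-model sketch, not any testing vendor’s actual algorithm – the system predicts the probability of a correct answer, picks the unasked item where that prediction is closest to 50% (the most informative item under this simple model), and nudges its proficiency estimate after each response:

```python
import math

def p_correct(ability, difficulty):
    """Predicted probability of a correct answer under a Rasch (1PL) model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def next_item(ability_estimate, item_bank, asked):
    """Pick the unasked item whose difficulty is closest to the estimate,
    i.e., where the predicted probability is nearest 50%."""
    candidates = [d for d in item_bank if d not in asked]
    return min(candidates, key=lambda d: abs(d - ability_estimate))

def update_ability(ability_estimate, difficulty, correct, step=0.5):
    """Nudge the estimate toward the evidence (a crude gradient step)."""
    surprise = (1.0 if correct else 0.0) - p_correct(ability_estimate, difficulty)
    return ability_estimate + step * surprise

# Simulate a short adaptive session with a made-up item bank.
item_bank = [-2.0, -1.0, 0.0, 1.0, 2.0]  # item difficulties
ability, asked = 0.0, set()
for response in [True, True, False]:     # pretend the student answers
    item = next_item(ability, item_bank, asked)
    asked.add(item)
    ability = update_ability(ability, item, response)
```

The key feature is that the prediction and the action live at the same unit: the model predicts performance on a *question*, and the action it triggers is choosing a *question*.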
I suppose a school could set up a system where students with predicted probabilities of passing their year-end assessment between, say, 50% and 60% (“bubble kids”) are automatically given extra tutoring so they’re more likely to pass, but that honestly feels like more of an intervention for the school (i.e., it will make the school’s numbers look better) than one for the students. It definitely has a tail-wagging-the-dog vibe to me.
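Mechanically, such a system would be trivial to build – which is part of my point: the hard part isn’t the prediction or the trigger, it’s deciding whether the triggered action actually serves students. A hypothetical sketch (names, probabilities, and thresholds all invented):

```python
def flag_for_tutoring(predictions, low=0.50, high=0.60):
    """Return students whose predicted pass probability falls in [low, high] --
    the 'bubble kids' a school might auto-assign to extra tutoring."""
    return [name for name, p in predictions.items() if low <= p <= high]

# Made-up predicted pass probabilities for four students.
predictions = {"Emma": 0.95, "Harriet": 0.55, "Jane": 0.10, "Frank": 0.58}
print(flag_for_tutoring(predictions))  # -> ['Harriet', 'Frank']
```

Note what the rule quietly does: Emma (95%) and Jane (10%) get nothing, not because tutoring wouldn’t help them, but because helping them wouldn’t move the school’s pass rate.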
So, I don’t think these types of predictions are always useless, but we want to be sure we’re not being seduced by some clean, tidy metric that purports to neatly summarize a problem but ultimately doesn’t give us much to act on. As a rule of thumb, if you can’t draw a direct line from the prediction you’re getting to a specific action you’d take, you should question whether the prediction is actually helpful.