Is your data worth its cost?
Back in high school, I ran track and cross country. Every Sunday, my teammates and I would meet somewhere to do our long runs, which typically ranged anywhere from 6 to 15 miles, depending on where we were in our training cycle.
I grew up in Northern Virginia, which, despite its (deserved) reputation as a hellscape of chain restaurants and perpetual traffic, is actually home to some beautiful places to run – including the Manassas Battlefield Park and the Prince William Forest Park.
Although I liked running with my teammates much more than I liked running alone, the traffic in NoVa is truly the stuff of nightmares, particularly given that I was driving in the pre-podcast, pre-audiobook days, and I could easily exhaust an entire burned CD of early-2000s emo music in the time it took to get to wherever it was we were running. And so I developed a heuristic for deciding if I would meet up with my friends or run on my own:
The run needed to be longer than the round-trip driving time.
If we were going to run 6 miles at a 7:30/mile pace, that’s 45 minutes of running. So the round-trip driving time needed to be less than 45 minutes; otherwise, it wasn’t worth going.
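For anyone who likes to see the arithmetic spelled out, here is a minimal sketch of that heuristic in Python. The function names and the 40- and 60-minute drive times are made up for illustration; only the 6 miles and the 7:30 pace come from the example above.

```python
def run_minutes(miles: float, pace_min_per_mile: float) -> float:
    """Total running time in minutes at a steady pace."""
    return miles * pace_min_per_mile


def worth_the_drive(miles: float, pace_min_per_mile: float,
                    round_trip_drive_min: float) -> bool:
    """The run has to last longer than the round-trip drive."""
    return run_minutes(miles, pace_min_per_mile) > round_trip_drive_min


# 6 miles at a 7:30/mile pace (7.5 minutes per mile) is 45 minutes of running.
print(run_minutes(6, 7.5))          # 45.0

# Hypothetical drive times: a 40-minute round trip passes, a 60-minute one does not.
print(worth_the_drive(6, 7.5, 40))  # True
print(worth_the_drive(6, 7.5, 60))  # False
```

The comparison is deliberately strict: a drive that takes exactly as long as the run doesn’t make the cut.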
We should think about data in education in a similar way.
Often, we hear that educators need to “use data.” Ok, sure, great. But what data? And what are we using it for? If you can’t answer those questions, then you’re not off to a great start. Or worse, if whoever tells you that you need to “use data” can’t answer those questions, then you should run away, quickly, and not look back.
And even if you can answer those questions, the next step is to think about the costs of getting the data and the benefits of using it. If the benefits don’t outweigh the costs, then it’s not worth collecting the data.
If it takes me an hour to drive somewhere for a 10-minute run, I’m not going.
This is just very basic cost/benefit analysis, return-on-investment stuff. In education, though, the costs and benefits don’t typically come in neat and tidy metrics like dollars, so we have to think more abstractly about things like person-hours, social capital, opportunity cost, and student learning.
Say you work at a school – you’re a teacher, an administrator, whatever. The actual role doesn’t matter much. And say someone at your school maintains a “master” spreadsheet that contains ALL OF THE STUDENT DATA. (N.b.: I don’t think you should recruit a spreadsheet to do the job of a database, but I also acknowledge that people use the tools available to them.) Suppose that someone has the idea to include student attendance data in this spreadsheet, and the plan is to update attendance data in the sheet weekly. Is this worth doing?
Well, maybe. What decision(s) do I want to make using this data? This will help me frame the benefits. If I want to use it to determine who ought to receive an intervention that tends to improve attendance for many students, then that’s a benefit. If I want to use it just in case someone maybe asks me about it and I want an answer readily available, then that’s…less of a benefit.
But we can’t stop here, even if we determine that the data will help us make a decision that will (probably) benefit students. Remember, I liked running with my friends more than I liked running alone. If I only considered the benefits, I would have spent an even larger proportion of my teenage life in the driver’s seat of my 1993 Toyota Corolla.
Because there’s a cost to gathering this data. Maybe you can simply download readily available CSV extracts from your student information system (SIS), copy and paste them into your master spreadsheet, and everything just works because someone has engineered your spreadsheet appropriately. Great – that’s a relatively low cost (at least until the functionality breaks, or the report format changes, and you have to fix it). But maybe that’s not how it works. Maybe you’re asking people to do lots of manual data entry and reconciliation each week, and that kind of work is tedious and unrewarding and not at all what anyone got into the field of education to do. That feels like a pretty steep cost to me.
My job is to work with data. I tend to think it’s useful. But not all data is equally useful, and not all data is so useful that it justifies the (potentially high) cost of collecting, cleaning, and analyzing it.
This isn’t to say that “high-cost” data isn’t worth using. Well-conducted classroom observations can be very “expensive” in that they take lots of administrator time and effort, but they provide some of the most beneficial data a school can get. Conversely, some of the extracts offered by digital learning applications require little more than a click of a “Download” button to retrieve, but the data some of them provide is not worth even this trivial amount of effort.
The best advice I can offer is to consider both the costs and the benefits of a potential data element before you decide you want to start tracking it. Otherwise, you might find yourself proverbially in deadlocked traffic on I-66, listening to My Chemical Romance’s Three Cheers for Sweet Revenge straight through for the third time, all for a 2-mile jog.