Thursday, August 13, 2009
Craft and Evidence
Matthew Crawford's Shop Class as Soulcraft is a great book to argue with, since it's prickly and peculiar and weirdly un-self-aware. I'll admit a temperamental allergy to any argument that smacks of “those manly men with their earthy authenticity,” and the book sometimes shades into that. That said, I have to admit that I laughed out loud, half-guiltily, at his invocation of The Postcard.
(For older or younger readers: in the '90s, before online applications became commonplace, faculty job applicants mailed thick paper applications and waited for paper responses. More often than not, the only response would be The Postcard, which acknowledged receipt – fair enough – and then asked you to check boxes indicating your race and gender. To an unemployed white guy, The Postcard was offensive beyond belief. “Give us an excuse not to hire you.” No, screw you. Then you'd feel like a reactionary prick for being offended, and feel bad for that, but you still needed a job, dammit. So now you get to be unemployed and self-loathing. That's just ducky. Now, with online applications, the demographic questions usually get asked upfront, where they blend in with everything else. Substantively, there's no difference, but at least it feels less insulting. If we can't offer jobs, we can at least recognize applicants' basic human dignity.)
The valuable part of the book for me, though, is its discussion of craft and the sense of individual agency.
Crawford rightly takes issue with the easy equation of 'white collar' with 'intellectually challenging,' and of 'blue collar' with 'mindless.' Anyone who has actually worked in both settings (hi!) can attest that working with recalcitrant materials can require real ingenuity, and that many office jobs are just about as brainless as you can get without actually starting to decompose. (Dilbert and The Office draw their popularity from noticing exactly that.) From that correct observation, Crawford also notes that part of the joy of certain kinds of hands-on work comes from the relative autonomy it affords. When you're trying to diagnose a funny engine behavior, it's just you and the engine. You get the engine to work or you don't. (Of course, it isn't always that simple. But the case is recognizable.) When you're jockeying for position in an office, by contrast, direct measures of performance are scarce, so it often comes down to office politics, which can feel like junior high all over again. Having a sense of control over your own work can free you from that gnawing sense of dissatisfaction when you really can't explain to others just what you do all day.
It struck me that this sense of ownership of craft is part of what's behind resistance to evidence-based policy in higher ed.
Done correctly, evidence-based policy (or what we academics call 'outcomes assessment') shifts the basis for decision-making from 'expert opinion' to actual observed facts, preferably gathered over a large sample size. In deciding whether a given practice makes sense, data counts. The idea is that some facts are counterintuitive, so simply relying on what longtime practitioners say is right and proper will lead to suboptimal results. Rather than deferring to credentials, authority, or seniority, we are supposed to defer to documented outcomes. Solutions that work are better than solutions that don't, regardless of where they come from or whose position they threaten.
What Crawford's book helped me to crystallize was why something as obviously good as data-based decisionmaking is so widely resisted on the ground. It effectively reduces the practitioner's sense of control over his own work. At some level, it threatens to reduce the craftsman to a mere worker.
Take away the sense of ownership of craft, even with the best of intentions (like improving outcomes for students), and the reaction will be vicious, heated, and often incoherent. Since there's really no basis for arguing that student results are irrelevant – without students, it's not clear that we need teachers – the arguments will be indirect. The measure is bad; the statistics are misleading; this is an excuse to fire people; this is an excuse to destroy academic freedom; this is about administrative control; this is a fad; blah blah blah.
I draw hope, though, from Crawford's correct observation that the 'white collar mind/blue collar body' split isn't really true. The same can apply here. Outcomes assessment done right is focused on where students end up. How you get them there is where the real craft comes in. How, exactly, do the most successful programs work? (For that matter, without assessing outcomes, how do we even know which programs are the most successful?)
On an individual level, professors do this all the time. We try different ways of explaining things, of posing problems, of structuring simulations, and then judge how well they worked. But student outcomes encompass far more than the sum of individual classes; without some sort of institutional effort, those extra factors go largely unaddressed (or, worse, addressed only according to custom or internal politics).
That could involve some displacement of traditional craft practice, but it hardly eliminates the role of craft. For a while I've been mentally toying with a scheme that looks like this: separate teaching from grading, then reward teaching that results in good grades. The instructor wouldn't grade his own class; he'd trade with someone else, ideally at another institution. (In that scheme, we could also do away with evaluative class observations and most uses of student course evaluations. Replace 'expert opinion' with observable facts. If you manage to succeed with your students using a method I don't personally get, the success is what matters. Likewise, if you consistently fail, the fact that some big muckety-muck somewhere endorses your method means exactly nothing.) That way, you're eliminating the obvious conflict of interest that tempts some scared faculty to resort to grade inflation. The grades won't be theirs to inflate.
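The grader-swap scheme above boils down to a mapping with no fixed points: every section gets a grader, and no instructor ever grades his own students. A minimal sketch (the section names are my own invention, purely for illustration):

```python
# Illustrative sketch of the grader-swap idea: rotate grading duties one
# step around the ring of participating sections, so that no instructor
# is ever assigned to grade the section he or she actually taught.
# (Section labels below are hypothetical.)

def assign_graders(sections):
    """Map each section to a grader drawn from a different section."""
    if len(sections) < 2:
        raise ValueError("Need at least two sections to swap grading")
    # Rotating by one position guarantees nobody grades themselves.
    return {sections[i]: sections[(i + 1) % len(sections)]
            for i in range(len(sections))}

pairing = assign_graders(["Smith sec. 1", "Jones sec. 2", "Lee sec. 3"])
for taught_by, graded_by in pairing.items():
    assert taught_by != graded_by  # the core conflict-of-interest check
```

Across institutions the same rotation works on (campus, section) pairs; the only property that matters is that the mapping has no fixed points, so the grades are never the instructor's own to inflate.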
Admittedly, this method wouldn't work as cleanly at, say, the graduate level, but I see it working fairly well for most undergrad courses. Your job as the instructor is not to threaten/cajole/judge, but to coach students on how to produce high-quality work; the anonymous grader becomes the common enemy, putting you and your students on the same side. Students make use of your help or they don't, and the results speak for themselves. Faculty who get consistently better results get recognition, and those who get consistently poor results are given the chance to improve; those who still fail after a reasonable shot are shown the door.
Getting back to Crawford, though, I was disappointed that he largely reinscribes the white collar/blue collar dualism in his description of two different ways of knowing. In an extended rant against Japanese repair manuals -- seriously, it's in there -- he draws a distinction between inflexible rule-based knowledge and hard-won life wisdom, clearly favoring the latter. The implication seems to be that knowledge is either 'explicit' -- that is, theoretical and absolute -- or 'tacit,' meaning acquired through non-transferable practice. Think 'theoretical physics' versus 'practicing mechanic.'
Well, okay, but there's a much more interesting kind of knowledge that draws on each. It's the kind of knowledge that social scientists deal with every single day. It's the statistical tendency. The rule based on aggregated observations, rather than deductive logic. It's inductive, probabilistic, empirical, and useful as hell. Baseball fans call it sabermetrics. Economists call it heuristics. (Score one for baseball fans.) It's based on real world observation, but real world observation across lots of people.
This is the kind of knowledge that helps us get past the well-documented (though unaddressed by Crawford) observation biases that real people have. Practitioners of sabermetrics, for example, found that some of the longstanding hunches of baseball scouts simply didn't stand up to scrutiny. Individual craft practice falls prey to individual biases, individual blind spots, and individual prejudices. Testing those assumptions against accumulated evidence isn't applying procrustean logic to messy reality. If anything, it's reality-based theorizing.
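The advantage of aggregated, probabilistic knowledge over any one practitioner's hunch can be shown with a toy simulation (all numbers below are synthetic, chosen only for illustration): each observer sees a small, noisy sample of reality, but pooling the samples recovers the underlying tendency more reliably than any individual can.

```python
import random

random.seed(0)            # reproducible toy run
TRUE_RATE = 0.30          # the real underlying tendency (hypothetical)
OBSERVERS = 20            # individual practitioners
SAMPLE_PER_OBSERVER = 30  # each sees only a small slice of reality

# Each observer estimates the rate from his own limited experience.
estimates = []
for _ in range(OBSERVERS):
    hits = sum(random.random() < TRUE_RATE
               for _ in range(SAMPLE_PER_OBSERVER))
    estimates.append(hits / SAMPLE_PER_OBSERVER)

# The aggregated, "sabermetric" view: pool everyone's observations.
pooled = sum(estimates) / len(estimates)

# Because the pooled estimate is an average of the individual ones,
# it can never be farther from the truth than the worst hunch,
# and it is usually far closer.
worst_individual_error = max(abs(e - TRUE_RATE) for e in estimates)
assert abs(pooled - TRUE_RATE) <= worst_individual_error
```

The guarantee in the final assertion is purely arithmetic (an average can't deviate more than its most extreme component), which is the modest but real sense in which the aggregate view corrects for individual blind spots.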
Done correctly, that's exactly what any outcomes-based or evidence-based system does. And rather than crushing individual craft, it actually gives the thoughtful practitioner useful fodder for improvement.
Though I have my misgivings about Crawford's book, I owe it a real debt. The key to getting outcomes assessment to mean something on the ground is to distinguish it from the false binary of craft versus theory. It's in between, and yet distinct. It's empirical, but not individual. The fact that a thinker as subtle as Crawford could miss that category completely suggests that the task won't be easy, but it also suggests that doing the task right could make a real contribution.
Mostly, this is because most of them actually did it by email, rather than by online form. Since I was emailing directly to the search committee, and since they're not supposed to ask for this information, I couldn't send it to them.
The reason I oppose the increasing focus on standardized tests is their "high-stakes" nature. I don't think any given test should be invested with too much significance.
But the idea of having all homework and all tests graded by strangers gets you the benefit of that objectivity without the pressure. Ideally we'd figure out a way to do this while still leaving teachers the freedom to design their own curriculum and respond to the interests and abilities of their individual classes -- that would be the hard part. Still, it's a really appealing idea in some ways.
Of course, the actual task of grading would suck even worse if you had to sit down with a bunch of crappy work by students you don't even *know* every night...
In deciding whether a given practice makes sense, data counts.
Relevant data count. [*]
If your goal is retention of students, the relevant data are the ABC rates for the class, which can be improved several different ways. If your goal is retention of knowledge (what used to be called "learning" before Learning College came to mean retention of students), then the relevant data might be the ABC rate in a later class.
For example, if a CC is teaching "critical thinking" (and who isn't these days), the measure isn't how many students pass or even if they get some "critical thinking question" correct on a particular exam, it is whether they do well as juniors in classes where they have to think critically.
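The commenter's distinction between the two kinds of "relevant data" can be made concrete with a toy record set (the student names and grades below are invented for illustration): the retention-of-students measure is the ABC rate in the course itself, while the retention-of-knowledge measure follows the same students into the later course.

```python
# Hypothetical transcript records: (student, course, grade).
records = [
    ("Ann", "Comp I", "A"), ("Ann", "Comp II", "B"),
    ("Ben", "Comp I", "C"), ("Ben", "Comp II", "D"),
    ("Cal", "Comp I", "B"), ("Cal", "Comp II", "A"),
    ("Dee", "Comp I", "F"),  # never reached the later course
]

PASSING = {"A", "B", "C"}

def abc_rate(course):
    """Fraction of grades in a course that are A, B, or C."""
    grades = [g for _, c, g in records if c == course]
    return sum(g in PASSING for g in grades) / len(grades)

# "Retention of students": how many got through the intro course.
intro_rate = abc_rate("Comp I")  # 3 of 4 students

# "Retention of knowledge": of those who passed the intro course,
# how many earned an ABC grade in the follow-on course.
passed_intro = {s for s, c, g in records
                if c == "Comp I" and g in PASSING}
later = [g for s, c, g in records
         if c == "Comp II" and s in passed_intro]
knowledge_rate = sum(g in PASSING for g in later) / len(later)  # 2 of 3
```

The two rates can move independently (inflating intro grades raises the first number while lowering the second), which is exactly why the choice of measure matters.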
The value of standardized tests has to be thought about in the same way. There are some nice ones in physics, and it is clear that some teaching techniques that target the things measured by those tests will improve performance on that test, but I have only seen one study that shows a correlation between that test and later success in a particular field. The counterexample in physics is that physics graduate students often do as badly on one test as the undergrads, and that test is supposed to measure skills needed to get to graduate school!
[*] - Data are plural.
And then you'd have to read your own students' work ANYWAY (doubling your workload) so you'd know where and in what interesting ways they were missing the point!
When students draft a paper and I provide feedback, they expect that their responses to the feedback will directly result in a better grade on the final product. But some of those students also hold on to the high school mentality of writing in that they see evaluation as highly subjective, of the sort where if the teacher doesn't "like" you or the way you write, you get a bad grade.
I can see the benefit of separating myself from the grading process, but I wonder if my students would even then write drafts, since in at least some of their minds, they just need to please an unknown grader, and they perceive my feedback on their draft to be just as subjective as the evaluation of the person grading the final product.
Is there any work out there done on how students view this separation of instruction and evaluation?
1. One of the issues faculty have with assessment is that it implies that we are not experts in what we do. I.e., the craftspeople are not respected for their skill in the craft. Am I saying that I advocate not bothering to hook up an oxygen sensor to my engine to make sure the repair was made correctly or not sitting on the stool to test its strength? No, of course not. But assessment needs to be carefully developed to reflect each discipline. Too often, (at least my) administrators seem to think the answer is simple data. In the Humanities, sometimes simple data is difficult to come by. But most faculty would probably welcome an opportunity to sit down and develop reasonable assessments with their administrators, get some training in the skills necessary to administer the assessment instruments agreed upon (too often, it's not as simple as accepting that our tests or other assignments are legitimate), and in the process remind administrators that their discipline has merit. That's too rarely the case, unfortunately. My administrators want data, and they want that data to come from tests, and they want those tests to be administered online with multiple choice, and they want those online multiple-choice tests to be free. OK . . . . How do I assess speaking skills in a foreign language through such a test? Writing skills? Mastery of details of what the US deems a relatively arcane culture?
2. How do you develop the baseline? I could give my beginning language students the final exam for the second term at the beginning of the first semester, then again at the end of the second (i.e., when it's appropriate). That would be proof that they had learned. However, what effect would it have on my students' motivation to take a test they failed miserably? Help me develop reasonable instruments! They don't teach this stuff in grad school!
3. Mary suggests, "Ideally we'd figure out a way to [uncouple teaching and grading] while still leaving teachers the freedom to design their own curriculum and respond to the interests and abilities of their individual classes -- that would be the hard part." No kidding--does anyone actually believe that faculty will accept the complete loss of control over their own courses that this would entail? That would set higher ed back about 200 years!
We assume, all along, that the outcomes are the result of good, or bad, teaching/instruction. Of course there is the whole matter of the raw material in the first place.
If we see a wide range of scores on the tests, will we then take the time to correlate the scores of students to their ranking by the admissions office when admitted? Perhaps compare the outcome assessment scores to GPA from HS and to SAT/ACT scores or some other measure of "ability?"
Perhaps it's time we told ourselves that we can't just let anyone in, and expect great outcomes. Sometimes, yes. Anecdotes will be rampant.
Let's hold Admissions accountable. (Sorry, all you open enrollment schools--your faculty I guess will just have to assume "it's all their fault.")
External examiners are not there to change individual grades but to confirm the overall structure and delivery of the course and its assessment.
There are a variety of benefits to this system from external checkpoints to sharing of good practice to mentoring and feedback to staff.
This worked well for quite some time, but in the last decade it has become a fairly contentious area. In many instances these EE appointments are just political rubber-stamping, and when choosing one I have often been guided toward someone who will not really raise any issues or ask too many questions (lord forbid).
DD, I don't fully agree with your discussion of heuristics as a "third way" of knowing. I think they are a formal, intellectual form of the craft knowledge that Crawford is discussing. My opinion may stem from my background in the sciences, where we're taught early on to think heuristically at the bench. Like a mechanic with an engine, we are regularly forced to bang our ideas up against reality and see how well they work. We think about the results in a formal way, but it is still, however, a craft. (Lots of times our ideas don't work, of course. I have a sign over my desk that says "Nature is a mother.")
I do, however, fully agree that much of the resistance to measurement in the academic community is a reaction to loss of "craftsmanship" status. (Crawford discusses similar resistance to job standardization in auto manufacturing.) However, if what the students actually want is just a freakin' education, rather than something individually crafted, then this is just not going to work out long term. Not all individually crafted education is even as good as what could be mass-produced.
That said, I am not sure that blind grading would work as a measurement tool. If done well, grading is a teaching tool; take it out of the equation, and the product suffers. I think I might be inclined to use a statistical sampling system: take a demographically representative sample of students, track them through their college careers, and watch their performance relative to their predicted success based on "incoming quality" (high school grades and SAT scores). With enough students, you should be able to tease out the effectiveness of individual teachers.
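The sampling idea the commenter describes amounts to a simple value-added comparison. A deterministic toy sketch (every number below is invented): predict each student's score from "incoming quality" alone, then see whether a given teacher's students systematically beat or trail the prediction.

```python
from statistics import mean

# Invented records: (teacher, incoming HS GPA, final score).
# By construction, teacher B's students score 5 points above what
# GPA alone would predict, and teacher A's score 5 points below B's.
students = [
    ("A", 2.0, 70), ("A", 3.0, 80), ("A", 4.0, 90),
    ("B", 2.0, 75), ("B", 3.0, 85), ("B", 4.0, 95),
]

# Ordinary least-squares fit of score on GPA, computed by hand.
gpas = [g for _, g, _ in students]
scores = [s for _, _, s in students]
gbar, sbar = mean(gpas), mean(scores)
slope = (sum((g - gbar) * (s - sbar) for _, g, s in students)
         / sum((g - gbar) ** 2 for g in gpas))
intercept = sbar - slope * gbar

def value_added(teacher):
    """Mean residual (actual minus predicted) for one teacher's students."""
    resid = [s - (intercept + slope * g)
             for t, g, s in students if t == teacher]
    return mean(resid)

# Teacher B comes out ahead once incoming quality is controlled for.
assert value_added("B") > value_added("A")
```

Real value-added models are far more contested than this (noisy scores, non-random assignment, small classes), but the residual-after-controlling-for-intake logic is the same.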
Every year I show up with hundreds of other government teachers to grade thousands of essay questions. My wife does the same for English. It requires careful development of exam questions, development of a rubric all follow in grading, and it costs a lot of money to make sure everyone grades every exam question the same. Of course, AP teachers do not know what questions will be asked on the exam or have any influence in choosing them.
I recommend everyone do it at least once. See just what it would take to do what DD suggests. Would DD propose allowing CC teachers know the questions and choose them? Would all agree on what questions to use? What textbooks? What rubrics to use? Who would pay for shipping these exams around? For meetings to develop rubrics and questions?