Thursday, April 13, 2006
Can't Buy Me Love
We have student evaluations, of course, and formal observations by a peer, the chair, and the dean. The rest is mostly self-generated by the professor.
I’ve never seen a peer observation that was anything less than glowing. While I understand the impulse, the sheer abundance of superlatives renders them quite useless as evaluative tools. The process falls victim to the ‘you first’ problem – the first honest peer evaluation would expose both the observer and the observed to all manner of awkwardness. Any ideas out there on how to make peer observations meaningful? Ideologically, I like the concept, but the execution just hasn’t been helpful.
Among the info I don’t have, though I had at my previous college, are student grades and course attrition rates.
From asking around, it sounds like student grades stopped being considered about ten years ago. I don’t know if it was at the behest of the faculty, or just as a byproduct of some long-forgotten IT change, but it’s the way it is. When I’ve suggested gathering that information, I’ve received the “what planet are you from?” look. But it’s important, and not just in an evil way.
At my previous college, grades and drop rates were reported each term in (relatively) easily digested form. It wasn’t that hard to spot patterns, which gave a context for student evaluations. Some professors graded hard but got student respect; I knew they were the real deal. Some graded generously and got student respect, and some graded hard and generated student antipathy; in those cases, I relied more on observations. And, memorably, some graded generously but still generated student antipathy. They couldn’t buy love. They were, uniformly, train wrecks.
Then you have the cult favorites – hard grades, glowing evaluations, but only about half of the students make it to the end of the course. Again, without numbers, it’s hard to tell.
It was useful to have that information, since all but the most egregiously incompetent could usually pull it together long enough for a decent observation, and, in the absence of context, low student evaluations could always be explained by (hypothetical) high standards.
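The pattern-spotting described above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not a system either college used: the `TermReport` fields, the `read_pattern` function, and every cutoff value are hypothetical, since the post gives no numbers.

```python
from dataclasses import dataclass

# Hypothetical cutoffs, chosen only for illustration; the post gives no numbers.
GENEROUS_GPA = 3.2     # mean grade (4.0 scale) at or above this counts as generous
RESPECTED_EVAL = 4.0   # mean student rating (1-5 scale) at or above this counts as respect
LOW_COMPLETION = 0.6   # share of students finishing; below this means high attrition


@dataclass
class TermReport:
    instructor: str
    mean_gpa: float
    mean_eval: float
    completion_rate: float


def read_pattern(report: TermReport) -> str:
    """Return a rough label for the grade/evaluation pattern in one term report."""
    generous = report.mean_gpa >= GENEROUS_GPA
    respected = report.mean_eval >= RESPECTED_EVAL
    if not generous and respected:
        # Hard grades plus glowing evaluations: check who actually finished.
        if report.completion_rate < LOW_COMPLETION:
            return "cult favorite: check attrition"
        return "hard grader, respected"
    if generous and not respected:
        return "generous but disliked: serious concern"
    # Generous and respected, or hard and disliked: the numbers alone are ambiguous.
    return "ambiguous: rely on observations"


for report in [
    TermReport("Prof. A", 2.5, 4.5, 0.9),
    TermReport("Prof. B", 2.5, 4.5, 0.5),
    TermReport("Prof. C", 3.6, 2.5, 0.9),
]:
    print(report.instructor, "->", read_pattern(report))
```

The labels are a prompt for a closer look, not a verdict; the two ambiguous cases are exactly the ones where observations have to carry the weight.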
I’d love to develop a way to do a speedy-but-thorough content analysis on the written comments on the back of student evaluations. Some of them are self-refuting (The prof is a mean ass dude how come he dint give me an a what an asshole this school suxxx). Some are revealing of serious issues (chronic instructor lateness particularly brings out student snarkiness, and I have to say, I don’t blame them). Some are unintentionally funny (one of mine from several years ago – “Now I write more clearer.” Uh, thanks.) But most are fairly vague and positive, and therefore not terribly useful. When you’re plowing through thousands of them, it would be helpful to have some way to separate the banal from the revealing.
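One crude way to start separating the banal from the revealing is plain keyword matching. The sketch below is an assumption-laden first pass, not a real content analysis: the `triage` function and both word lists are hypothetical and would need tuning against actual comments.

```python
import re

# Hypothetical word lists, for illustration only.
ISSUE_TERMS = {"late", "lateness", "cancelled", "canceled", "absent", "unprepared", "rude"}
BANAL_TERMS = {"good", "great", "nice", "fine", "ok", "okay", "liked", "fun"}


def triage(comment: str) -> str:
    """Label a comment 'flag', 'banal', or 'read' using simple word matching."""
    words = set(re.findall(r"[a-z']+", comment.lower()))
    if words & ISSUE_TERMS:
        return "flag"    # mentions a concrete problem such as lateness
    if len(words) <= 8 and words & BANAL_TERMS:
        return "banal"   # short and vaguely positive
    return "read"        # anything else still needs a human reader


for comment in [
    "Great class, liked it.",
    "He was late to almost every class.",
    "Now I write more clearer.",
]:
    print(triage(comment), "|", comment)
```

Anything this simple will miss sarcasm and misspellings, so it can only shrink the pile, not replace reading it.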
Measurement issues again. This is becoming a theme. Or is it a motif? Sigh.
Do you have something like that in place?
Getting a grade report doesn't seem like a big deal. Where I teach, each faculty member gets a report of grades assigned by course. There are also totals for the faculty member and a comparison to the department and the College. This report is useful as a faculty member and for committees reviewing faculty.
This website looks like a good introduction and resource:
If I can help out more, send me an email (firstname.lastname@example.org).
I wonder, though, about Chris's idea of tracking student performance before or after a given course. He seems to want to find evidence of the life-changing teachers, which is laudable, but statistically problematic. Say I am teaching an intro-composition course. My pool of students will most likely be entering Freshmen. They have no prior data to correlate post-class performance with.
Ok, say I have a Sophomore-level course. Numbers would also indicate that those who move on to their Junior year are the more successful, so the overall university attrition rate would have to be factored in.
Jumping topic: what is needed, it seems, are better metrics.
Student evaluations often don't ask specific, concrete questions.
* Did the course follow the syllabus (getting what you signed up for)?
* Were the reading load and assignments spaced evenly through the semester (consideration of student lives and respect for their time)?
* Was the instructor accessible for questions, specifically: office hours held as posted, e-mails answered, phone number given, IM-available, etc.?
* Were the course objectives met (which assumes that a course has demonstrable objectives)?
* Did the student feel respected as a professional (subjective, but indicative of instructor orientation toward the learner--the more successful treat the student with respect, even if holding them to a high degree of work)?
This list, of course, may be added and expanded, but it seeks to move the questions from impression-based (The instructor used effective teaching methods.) to objective and demonstrable facts.
Here is an evaluation resource that begins to approach what I am calling for.
1. Grades - whether averaged in a semester or whether, as someone suggested, looked at over time - can have little-to-nothing to do with how well an instructor instructs. a) You can be the best teacher in the world, but some students are not going to put in the effort after your course to demonstrate what they learned from you, and b) Sometimes you get a bad class - or a really great class - and that can weight the grades in weird ways. Last semester I had a class where nearly every student got a B - only like 2 A's and 4 grades below a B. That's not normal for me. But that's what they earned. I don't know what that is supposed to "mean" in terms of my assessment as an instructor. Yes, grades can show something in extreme cases, but alone, they don't show much, I think, in most cases.
2) Student evaluations - given at the end of the semester, when students are overwhelmed and don't yet know how they'll do in the course without having taken the final and when they are stressed out - don't necessarily offer adequate assessment either. a) Students often can have a negative perception of an instructor at this particular point, especially if the instructor is "tough"; b) you're asking them to evaluate something that they're not finished with - that they haven't had time to process. Didn't any of you take a class that you hated at the time but then 3 or 5 or 10 years later you realized how great it was and how much you learned? Yeah, it's impossible to measure that. The best way that I can figure to try to account for that is something we do at my university, which is that in their exit interview for graduation students name the instructor who was most influential to their education, but even that might just be a popularity thing.
3) In-class observation, for the reasons you mention, is not an accurate indicator of how good/bad a teacher is. I'd also argue that one of the reasons it doesn't work is that good teaching happens over time - not just in one class meeting - and there is no way for somebody who drops in for one class to get a sense of how the dynamic of the class works on a regular basis.
I'm not sure what my point is, but I suppose that the problem for me with such assessment measures is that they really have nothing to do with my teaching. Ultimately, I can perform to whatever instrument you choose to get a decent evaluation score, and none of it "means" anything in terms of what kind of teacher I am. There is no quick and dirty way to deal with any of this, as much as those on the administrative side may wish there is, I think.
Perhaps you could develop some sort of follow-up evaluation, where former students of the professors were asked to write letters evaluating them. You could randomly select students or have professors recommend students whom they would like to have surveyed.
I have the worst student drop rate this semester I've ever had (in 10 years of teaching). Is it me? Is it the new textbook I'm piloting? Is it the students? Or, more likely, is it some combination of all three?
The difficulty, I think, is finding an evaluation process that is actually helpful to the teacher. The glowing observation makes me feel good, but no one seems to address the question, how can I be a better teacher?
Once, as a grad student, I was praised by the director of the writing program because I had the best grade distribution. I know I'm a hard grader; my students have said as much. My students also "like" me. But that doesn't mean I'm a good teacher.
These are some really thoughtful comments -- thanks, everyone!
SGID looks like a great development tool, but an iffy evaluation tool. There's absolutely a place for a good development tool, but the two aren't interchangeable.
PPP and Dr. Crazy both make some excellent points. The dilemma of administration is that you know quite well that your information is limited and somewhat reflective of the way it was collected, but you have to make decisions anyway. At a teaching-oriented college like a cc, it's absolutely crucial that we only give tenure to people who are good in the classroom. (Research universities have very different evaluation criteria and issues.) How to define and capture that is the issue.
Very long-term follow-up is a good thing in itself, but at a school with a short degree and relatively high attrition (though not by cc standards), there are natural limits to it. It also doesn't help when the professor in question is relatively close to tenure, since we have to make decisions fairly quickly.
I like the idea of focusing on relatively objective criteria (does the professor show up for class?), but I'd be concerned that they only scrape the very minimum of what good teaching is. A reasonably well-trained border collie will show up for class faithfully. I want to know if good teaching occurred.
Post-tests are flawed indicators for a number of reasons. Most basically, students generally don't take them seriously. They often reflect previous preparation more than 'value added' by a given class. This is probably more true in a class like history or English than in math.
Even with weak teaching, I'd like to be able to make a distinction between relatively fixable mistakes (the rookie instructor who is far too credulous about giving extra credit, say) and permanent limits.
There's no clear way to do it well, but it has to be done. Such is the world of administration...
Yay! I'm not on there.
Seriously, though, there's just too much self-selection in the people who post to ratemyprofessor for it to be all that indicative.
My faith in student evaluations went way down when I read studies of them done by economists. This is a fascinating survey thereof; one really interesting bit is that looks matter in men's teaching evaluations more than women's!
I don't have a link for it, but I've heard that professor ratings for the term correlate highly with ratings given by people not in the class, who are presented only with a 15-second video, and still correlate highly with ratings given by people presented only with a still picture. So either people are shallow and don't factor in anything, or more intriguingly, people are such good judges that they can determine teaching ability from a photo.
Is this course a requirement?
Rate your interest (on a scale of 1-5) in the subject matter of this course before you took it.
Rate your interest now.
What grade do you expect to get in this class?
What percentage of the work did you do for this class?
There are others, but to me those provide crucial interpretative information for the assessment of the instructor himself or herself.
A couple of nice summaries:
Theall and Franklin. 2001. New Directions for Institutional Research. No. 109:45-56.
The problem is that any strongly designed research study looking at lots and lots of situations is not generalizable to an individual situation. For example, no matter how much the research shows that gender makes very little difference to ratings, there will always be a teacher who the students call a beeyotch behind her back because she won't put up with crap.
Also, student ratings DO correlate weakly but significantly with student learning. This does NOT mean that student ratings are not *also* affected by extraneous factors such as attractiveness. Unfortunately.
SGIDs are a great tool, but as Dean Dad points out, not for evaluative purposes. They do tend to result in slightly better student evals for the class in which they are conducted, though, possibly because students respond well when the teacher demonstrates interest in their learning needs.
To get better student comments, ask better questions on the comments section. Instead of leaving it open as a free space, ask "What was the most valuable aspect of this class for your learning? What could be improved to help you learn better in this class?"
Of course, as an administrator you could just skip reading the comments. Bain points out that the numbers and the comments are fairly congruent; you don't learn more by spending time reading comments.
Dean Dad, you are very enlightened, please forgive me for stating something you already know: NOT EVERYONE CAN BE ABOVE AVERAGE. :) I get so mad when new faculty members come to me having been scolded by their chairs because their numbers are "below average." Just like about half of the rest of the department!!!
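A quick worked example of the "below average" point, using made-up numbers: the ten scores below are hypothetical, invented only to show the arithmetic.

```python
from statistics import mean, median

# Hypothetical mean evaluation scores for a ten-person department (1-5 scale).
scores = [4.8, 4.6, 4.5, 4.4, 4.3, 4.1, 4.0, 3.9, 3.8, 3.1]

dept_mean = mean(scores)
dept_median = median(scores)

# Everyone here has a respectable score, yet half the department is "below average".
below_mean = [s for s in scores if s < dept_mean]
below_median = [s for s in scores if s < dept_median]

print(f"below the mean: {len(below_mean)} of {len(scores)}")
print(f"below the median: {len(below_median)} of {len(scores)}")
```

In this made-up department, an instructor with a 4.1 out of 5 would be scolded for being below average.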
Anonymous Admirer of Dean Dad and Teaching Center Bottom Dweller
And I'd think that for peer evals, one could do the same thing. It seems to me that asking people to judge if X is good or bad is inevitably going to lead to inflation and useless information. Instead, why not have open-ended questions on specific areas, with perhaps an expectation that the faculty person being evaluated will generate a question or two of their own? Something specific to them, that they really want good feedback on (pacing of lecture, ability to generate discussion, ideas for group work, whatever)? It would make peer evals way more useful for faculty--possibly less useful in terms of "evaluating" folks, but perhaps not. Worth a try, I'd think.
The cure is to make the questions as specific as possible (along the lines of what PPP suggests) and to use the survey results carefully and in conjunction with other sources of information.