Thursday, April 13, 2006


Can't Buy Me Love

Evaluating faculty is one of the most important parts of my job, yet some of the most basic information needed to do it right isn’t available.

We have student evaluations, of course, and formal observations by a peer, the chair, and the dean. The rest is mostly self-generated by the professor.

I’ve never seen a peer observation that was anything less than glowing. While I understand the impulse, the sheer abundance of superlatives renders them quite useless as evaluative tools. They fall victim to the ‘you first’ problem – the first honest peer evaluation would expose both the observer and the observed to all manner of awkwardness. Any ideas out there on how to make peer observations meaningful? Ideologically, I like the concept, but the execution just hasn’t been helpful.

Among the info I don’t have, though I had at my previous college, are student grades and course attrition rates.

From asking around, it sounds like student grades stopped being considered about ten years ago. I don’t know if it was at the behest of the faculty, or just as a byproduct of some long-forgotten IT change, but it’s the way it is. When I’ve suggested gathering that information, I’ve received the “what planet are you from?” look. But it’s important, and not just in an evil way.

At my previous college, grades and drop rates were reported each term in (relatively) easily digested form. It wasn’t that hard to spot patterns, which gave a context for student evaluations. Some professors graded hard but got student respect; I knew they were the real deal. Some graded generously and got student respect, and some graded hard and generated student antipathy; in those cases, I relied more on observations. And, memorably, some graded generously but still generated student antipathy. They couldn’t buy love. They were, uniformly, train wrecks.

Then you have the cult favorites – hard grades, glowing evaluations, but only about half of the students make it to the end of the course. Again, without numbers, it’s hard to tell.
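
The pattern-spotting described above could be roughed out as a small decision rule. This is only an illustrative sketch: the function name, the yes/no inputs, and the 50% completion cutoff (loosely taken from "about half") are assumptions for the example, not anything either college actually reported.

```python
def read_pattern(grades_hard: bool, students_respect: bool, completion_rate: float) -> str:
    """Map grading style, student response, and completion rate to a rough reading.

    All three inputs are hypothetical summaries; real reports would need
    judgment calls about what counts as "hard" or "respect".
    """
    # Cult favorite: hard grades, glowing evaluations, but about half finish.
    if grades_hard and students_respect and completion_rate <= 0.5:
        return "cult favorite"
    # Hard grader who still earns student respect.
    if grades_hard and students_respect:
        return "real deal"
    # Generous grader who still generates antipathy.
    if not grades_hard and not students_respect:
        return "train wreck"
    # Generous and respected, or hard and disliked: the numbers don't settle it.
    return "rely on observations"


print(read_pattern(True, True, 0.9))
print(read_pattern(False, False, 0.9))
```

The point of the sketch is that the last branch covers two quite different cases, which is exactly where the numbers alone run out and observations have to carry the weight.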

It was useful to have that information, since all but the most egregiously incompetent could usually pull it together long enough for a decent observation, and, in the absence of context, low student evaluations could always be explained by (hypothetical) high standards.

I’d love to develop a way to do a speedy-but-thorough content analysis on the written comments on the back of student evaluations. Some of them are self-refuting (The prof is a mean ass dude how come he dint give me an a what an asshole this school suxxx). Some are revealing of serious issues (chronic instructor lateness particularly brings out student snarkiness, and I have to say, I don’t blame them). Some are unintentionally funny (one of mine from several years ago – “Now I write more clearer.” Uh, thanks.) But most are fairly vague and positive, and therefore not terribly useful. When you’re plowing through thousands of them, it would be helpful to have some way to separate the banal from the revealing.

Measurement issues again. This is becoming a theme. Or is it a motif? Sigh.

At my school, IT generates "grade tendency" reports every semester. You (faculty, administrators) don't see individual student grades, but you do see (a) the average of all the grades assigned by the prof that term; (b) whether that was above or below how other faculty graded these same students; and (c) how this compares to the college average.
Do you have something like that in place?
No, but it would help. I don't need to know which student got what; I just need to know what the broad tendencies are.
There must be a seminar somewhere where IT folks learn to give you that "No way, it just can't be done" look in response to any request.
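
For what it's worth, the three numbers in a "grade tendency" report like the one described above are not hard to compute once the grades are in one place. Here is a minimal sketch, assuming grades arrive as (student, professor, grade points) records on a 4.0 scale. The record layout and the function name are made up for illustration, not what any college's IT department actually uses.

```python
from statistics import mean


def grade_tendency(records, prof):
    """Summarize one professor's grading without exposing individual students.

    records: list of (student_id, professor, grade_points) tuples (hypothetical layout).
    Assumes the professor assigned at least one grade in the records.
    """
    own = [g for s, p, g in records if p == prof]
    taught = {s for s, p, g in records if p == prof}
    # Grades the same students received from every other professor.
    elsewhere = [g for s, p, g in records if p != prof and s in taught]
    return {
        "prof_avg": mean(own),                                  # (a)
        "same_students_elsewhere_avg": mean(elsewhere) if elsewhere else None,  # (b)
        "college_avg": mean(g for s, p, g in records),          # (c)
    }


records = [
    ("s1", "X", 4.0), ("s2", "X", 3.0),
    ("s1", "Y", 2.0), ("s3", "Y", 3.0),
]
print(grade_tendency(records, "X"))
```

Only the averages come out the other end, so nobody needs to know which student got what.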

Getting a grade report doesn't seem like a big deal. Where I teach, each faculty member gets a report of grades assigned by course. There are also totals for the faculty member and a comparison to the department and the College. This report is useful as a faculty member and for committees reviewing faculty.
You should probably just scrap the current protocol and base your evaluations on the contents of their page on
The problem with all the methods of teaching evaluation I've seen is that they're chronically short-term. Looking at a single semester, even when you see the 'real-deal' combination of hard grades/student respect, it's difficult to know whether the instructor has succeeded as an educator or an entertainer. More instructive, I've always thought, would be measures of an instructor's long-term impact on his or her students' abilities as students. What if you could compare the average grades of a sample of students during the semester or two prior to taking Prof. X's class with the average grades of the same students during the semester or two following? There are tremendous methodological details to work out (all grades? only grades in related subject areas?), and it might indeed be an IT nightmare, but it could be very revealing. It might have a nice side-effect, too, of promoting collegiality among faculty: it dramatizes the fact that all of us who teach enjoy the benefits of every other good teacher at our institution, in the form of well-prepared students.
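
The before/after comparison could be sketched roughly as follows. This is a toy version under stated assumptions: each student is reduced to two lists of grade points (the semester or two before the course and the semester or two after), the function name is invented for the example, and none of the methodological questions above (which grades count, which subjects) are settled by it.

```python
from statistics import mean


def before_after_shift(students):
    """Average change in students' mean grades around a given course.

    students: list of (grades_before, grades_after) pairs, each a list of
    grade points (hypothetical layout). Students missing either side are
    skipped, since there is nothing to compare.
    """
    shifts = [
        mean(after) - mean(before)
        for before, after in students
        if before and after
    ]
    return mean(shifts) if shifts else None


sample = [
    ([2.0, 3.0], [3.0, 3.0]),  # 2.5 before, 3.0 after
    ([3.0], [3.5]),
    ([], [4.0]),               # no prior grades, so skipped
]
print(before_after_shift(sample))
```

Note that skipping students with no grades on one side quietly shrinks the sample, so the result says nothing about students who had no record beforehand or who left afterward.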
At my doctoral institution, we conducted SGIDs (Small Group Instructional Diagnosis) for each other during the midpoint of the semester. I found them incredibly useful, as did our faculty supervisor in evaluating our work.

This website looks like a good introduction and resource:

If I can help out more, send me an email (
I like the idea of grade tendency reports. As an instructor I would like to know how I rank in my grading compared with my peers. Grade inflation is a concern among part-timers as well.

I wonder, though, about Chris's idea of tracking student performance before or after a given course. He seems to want to find evidence of the life-changing teachers, which is laudable, but statistically problematic. Say I am teaching an intro-composition course. My pool of students will most likely be entering Freshmen. They have no prior data to correlate post-class performance with.

Ok, say I have a Sophomore-level course. Numbers would also indicate that those who move on to their Junior year are the more successful, so the overall university attrition rate would have to be factored in.

Jumping topic: what is needed, it seems, are better metrics.

Student evaluations often don't ask specific, concrete questions.

* Did the course follow the syllabus (getting what you sign up for)
* was the reading load and assignments spaced evenly through the semester (consideration of student lives and respect for their time)
* was the instructor accessible for questions, specifically: office hours held as posted, e-mails answered, phone number given, IM-available, etc.
* were the course objectives met (which assumes that a course has demonstrable objectives)
* did the student feel respected as a professional (subjective, but indicative of instructor orientation toward the learner--the more successful treat the student with respect, even if holding them to a high degree of work)

This list, of course, may be added and expanded, but it seeks to move the questions from impression-based (The instructor used effective teaching methods.) to objective and demonstrable facts.

here is an evaluation resource that begins to approach what I am calling for.
Ok, I'm going to leap in and play devil's advocate a bit here, not because I don't think that there should be some accountability on the part of instructors but because I think that the idea that any method of assessment can in a fool-proof way quantify good teaching is deeply flawed.

1) Grades - whether averaged in a semester or whether, as someone suggested, looked at over time - can have little-to-nothing to do with how well an instructor instructs. a) You can be the best teacher in the world, but some students are not going to put in the effort after your course to demonstrate what they learned from you, and b) Sometimes you get a bad class - or a really great class - and that can weight the grades in weird ways. Last semester I had a class where nearly every student got a B - only like 2 A's and 4 grades below a B. That's not normal for me. But that's what they earned. I don't know what that is supposed to "mean" in terms of my assessment as an instructor. Yes, grades can show something in extreme cases, but alone, they don't show much, I think, in most cases.

2) Student evaluations - given at the end of the semester, when students are overwhelmed and don't yet know how they'll do in the course without having taken the final and when they are stressed out - don't necessarily offer adequate assessment either. a) Students often can have a negative perception of an instructor at this particular point, especially if the instructor is "tough"; b) you're asking them to evaluate something that they're not finished with - that they haven't had time to process. Didn't any of you take a class that you hated at the time but then 3 or 5 or 10 years later you realized how great it was and how much you learned? Yeah, it's impossible to measure that. The best way that I can figure to try to account for that is something we do at my university, which is that in their exit interview for graduation students name the instructor who was most influential to their education, but even that might just be a popularity thing.

3) In-class observation, for the reasons you mention, is not an accurate indicator of how good/bad a teacher is. I'd also argue that one of the reasons it doesn't work is that good teaching happens over time - not just in one class meeting - and there is no way for somebody who drops in for one class to get a sense of how the dynamic of the class works on a regular basis.

I'm not sure what my point is, but I suppose that the problem for me with such assessment measures is that they really have nothing to do with my teaching. Ultimately, I can perform to whatever instrument you choose to get a decent evaluation score, and none of it "means" anything in terms of what kind of teacher I am. There is no quick and dirty way to deal with any of this, as much as those on the administrative side may wish there is, I think.
The undergraduate institution I attended asked students to write letters of recommendation for profs. who were up for tenure review. The Prof. selected a bunch of students that he/she had taught during their time at the college and we each received a little letter from the dean explaining the process and what they were looking for in the letter. I was always pleased to be included and, not realizing the politics of letters of recommendation, wrote a very honest letter with my opinion of the professor's strengths and weaknesses in the classroom.

Perhaps you could develop some sort of follow-up evaluation, where former students of the professors were asked to write letters evaluating them. You could randomly select students or have professors recommend students whom they would like to have surveyed.
This is such an important area, and the most difficult, I think. I have been observed and observed others this semester. I admit, my observation was glowing. But I also think the teacher should not be blamed for the sins of the students, as it were.
I have the worst student drop rate this semester I've ever had (in 10 years of teaching). Is it me? Is it the new textbook I'm piloting? Is it the students? Or, more likely, is it some combination of all three?
The difficulty, I think, is finding an evaluation process that is actually helpful to the teacher. The glowing observation makes me feel good, but no one seems to address the question, how can I be a better teacher?
Once, as a grad student, I was praised by the director of the writing program because I had the best grade distribution. I know I'm a hard grader; my students have said as much. My students also "like" me. But that doesn't mean I'm a good teacher.
Was that a P.D.Q. Bach reference at the end? You're my hero, Dean Dad.
Yes, it was P.D.Q. Bach. Nice catch!

These are some really thoughtful comments -- thanks, everyone!

SGID looks like a great development tool, but an iffy evaluation tool. There's absolutely a place for a good development tool, but they aren't interchangeable.

PPP and Dr. Crazy both make some excellent points. The dilemma of administration is that you know quite well that your information is limited and somewhat reflective of the way it was collected, but you have to make decisions anyway. At a teaching-oriented college like a cc, it's absolutely crucial that we only give tenure to people who are good in the classroom. (Research universities have very different evaluation criteria and issues.) How to define and capture that is the issue.

Very long-term follow-up is a good thing in itself, but at a school with a short degree and relatively high attrition (though not by cc standards), there are natural limits to it. It also doesn't help when the professor in question is relatively close to tenure, since we have to make decisions fairly quickly.

I like the idea of focusing on relatively objective criteria (does the professor show up for class?), but I'd be concerned that they only scrape the very minimum of what good teaching is. A reasonably well-trained border collie will show up for class faithfully. I want to know if good teaching occurred.

Post-tests are flawed indicators for a number of reasons. Most basically, students generally don't take them seriously. They often reflect previous preparation more than 'value added' by a given class. This is probably more true in a class like history or English than in math.

Even with weak teaching, I'd like to be able to make a distinction between relatively fixable mistakes (the rookie instructor who is far too credulous about giving extra credit, say) and permanent limits.

There's no clear way to do it well, but it has to be done. Such is the world of administration...
I must admit I aspire to the "cult favorite" category - because, really, what are we graduate students if not the ones who made it to the end of the class and gave the glowing evaluations?
base your evaluations on the contents of their page on

Yay! I'm not on there.
Seriously, though, there's just too much self-selection in the people who post to ratemyprofessor for it to be all that indicative.

My faith in student evaluations went way down when I read studies of them done by economists. This is a fascinating survey thereof; one really interesting bit is that looks matter in men's teaching evaluations more than women's!

I don't have a link for it, but I've heard that professor ratings for the term correlate highly with ratings given by people not in the class, who are presented only with a 15-second video, and still correlate highly with ratings given by people presented only with a still picture. So either people are shallow and don't factor in anything, or more intriguingly, people are such good judges that they can determine teaching ability from a photo.
My current institution asks students a whole bunch of standard questions on each evaluation that have nothing to do with the instructor him or herself. Some of the most useful, I think, are the following:

Is this course a requirement?

Rate your interest (on a scale of 1-5) in the subject matter of this course before you took it.

Rate your interest now.

What grade do you expect to get in this class?

What percentage of the work did you do for this class?

There are others, but to me those provide crucial interpretative information for the assessment of the instructor him or herself.
There is more research on student evaluations than on any other issue in higher education.

A couple of nice summaries:

Theall and Franklin. 2001. New Directions for Institutional Research. No. 109:45-56.

The problem is that any strongly designed research study looking at lots and lots of situations is not generalizable to an individual situation. For example, no matter how much the research shows that gender makes very little difference to ratings, there will always be a teacher who the students call a beeyotch behind her back because she won't put up with crap.

Also, student ratings DO correlate weakly but significantly with student learning. This does NOT mean that student ratings are not *also* affected by extraneous factors such as attractiveness. Unfortunately.

SGIDs are a great tool, but as Dean Dad points out, not for evaluative purposes. They do tend to result in slightly better student evals for the class in which they are conducted, though, possibly because students respond well when the teacher demonstrates interest in their learning needs.

To get better student comments, ask better questions on the comments section. Instead of leaving it open as a free space, ask "What was the most valuable aspect of this class for your learning? What could be improved to help you learn better in this class?"

Of course, as an administrator you could just skip reading the comments. Bain points out that the numbers and the comments are fairly congruent; you don't learn more by spending time reading comments.

Dean Dad, you are very enlightened, please forgive me for stating something you already know: NOT EVERYONE CAN BE ABOVE AVERAGE. :) I get so mad when new faculty members come to me having been scolded by their chairs because their numbers are "below average." Just like about half of the rest of the department!!!

Anonymous Admirer of Dean Dad and Teaching Center Bottom Dweller
Student evals: I have found that I can make these useful by giving the students specific areas for feedback. "This is a new course, and I would really like your ideas about the text selection and reading load." "I regularly assign online discussion to go along with this course: please give me feedback on what you think the benefits and drawbacks of this were." "I have to admit that I don't much care for lecturing, and I tend to try to generate discussion instead. But sometimes there are specific topics or areas that work well in lecture. Can you let me know if there are specific questions about the course that you think would have been good lecture topics?" And so on. One thing I steer away from is "did you like X or not"--I'm more interested in substantive feedback on specific issues.

And I'd think that for peer evals, one could do the same thing. It seems to me that asking people to judge if X is good or bad is inevitably going to lead to inflation and useless information. Instead, why not have open-ended questions on specific areas, with perhaps an expectation that the faculty person being evaluated will generate a question or two of their own? Something specific to them, that they really want good feedback on (pacing of lecture, ability to generate discussion, ideas for group work, whatever)? It would make peer evals way more useful for faculty--possibly less useful in terms of "evaluating" folks, but perhaps not. Worth a try, I'd think.
I think student evaluations of professors are roughly analogous to what a company would get if it asked customers to evaluate the salespeople. Does the customer like the sales rep because she works hard to solve customer problems? Or does he like the rep because she gives him big, unnecessary discounts? A company that based retention/promotion decisions on such evaluations, without looking below the surface, would soon find itself out of business.

The cure is to make the questions as specific as possible (along the lines of what PPP suggests) and to use the survey results carefully and in conjunction with other sources of information.