At a regional, comprehensive university in the Midwest, the mathematics faculty have been grappling with what it means to be an effective teacher and how to evaluate such effectiveness. They have concluded that student evaluations should not be the primary means of evaluating teaching. Hence, the department is in the process of articulating a statement of expectations for teaching from which appropriate assessment instruments will be developed.
Background and Purpose
The University of Wisconsin Oshkosh is one of thirteen four-year campuses in the University of Wisconsin System. UW Oshkosh typically has an undergraduate student body of 9,000-10,000 plus another 1,000-1,300 graduate students. It offers master's degrees in selected areas, including an MS Mathematics Education degree administered by the Mathematics Department.
The mathematics department is primarily a service department, providing mathematics courses for general education, for business administration and for prospective and in-service teachers. Over 80% of the credits generated by the department are earned by students who are not majors or minors in mathematics. The largest share of credits is generated in intermediate and college algebra. About one-third of all students enrolled in service courses earn a grade below C or withdraw, with permission, after the free drop period.
The system for evaluating teaching effectiveness, particularly evaluation for the purpose of determining merit pay increments, has not had wide support for many years. Minor changes have been proposed with the intention of making the policy more widely acceptable. More recently, more substantive changes have been considered. These will be described after providing some background on personnel decisions, on the role of teaching effectiveness in those decisions, and on the role of student input into those decisions.
Faculty are involved in four kinds of personnel decisions: renewal, promotion, merit review and post tenure review. Each of these decisions involves an evaluation of teaching, professional growth and service. Our experience has probably been similar to that of many departments of our size. There has been dissatisfaction for many years with the manner in which teaching has been evaluated, but there has been no consensus on how to change it. We have generally required probationary faculty to demonstrate that they are effective teachers without giving them a working definition of effective teaching. Instead, we have generally relied on how their student "evaluations" compared with those of other teachers.
The UW System Board of Regents has, since 1975, mandated the use of "student evaluations" to assess teaching effectiveness. The mathematics department responded to the Regent mandate by constructing a nine-item form. Students were given nine prompts (e.g., "prepared for class," "relates well to students," "grading") and were asked to rate each instructor on a five-point scale ranging from "very poor" to "outstanding." The ninth item was "Considering all aspects of teaching that you feel are important, assign an overall rating to your instructor." Almost all of the items were designed to elicit subjective responses. Department policy called for deriving a single number for each class by taking a weighted average in which the ninth, "overall," item was weighted one-half and the remaining eight items, taken together as a composite group, were weighted the other half. The numbers assigned to an instructor for each course were then averaged over courses to determine a single numerical rating for each instructor. For a number of years, this single numerical score represented the preponderance of evidence of teaching effectiveness used in retention decisions and merit allocation. Peer evaluation was a factor in some instances, but most often those were situations in which the peer evaluation was meant to "balance" or mitigate the effects of the student evaluation. Over time, the department identified several faults in this process. Some were faults with the survey instrument; others were faults in interpreting and using the results. A partial listing follows.
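The scoring rule just described can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the department's actual procedure: we assume that each class is summarized by its mean rating on each item, that the "composite group" of items 1-8 is their unweighted mean, and that course scores were averaged without weighting by enrollment. The function names are hypothetical.

```python
from statistics import mean

def class_score(item_means):
    """Score for one class under the old policy (sketch, hypothetical name).

    item_means: nine class-average ratings on the 1-5 scale, with the
    "overall" item (item 9) last. Assumption: items 1-8 are combined
    into one composite by simple averaging.
    """
    if len(item_means) != 9:
        raise ValueError("expected nine item means")
    composite = mean(item_means[:8])  # items 1-8 as one composite
    overall = item_means[8]           # item 9, the "overall rating"
    return 0.5 * overall + 0.5 * composite

def instructor_rating(classes):
    """Single rating for an instructor: unweighted mean of class scores (assumption)."""
    return mean(class_score(c) for c in classes)

# Illustrative input: one instructor teaching two classes.
print(class_score([4] * 8 + [3]))                   # 3.5
print(instructor_rating([[4] * 8 + [3], [3] * 9]))  # 3.25
```

The sketch makes the weighting visible: the single "overall" item carried as much weight as the other eight items combined.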
1. Most of the items in the survey were written from the perspective that an effective teacher is essentially a presenter and an authority figure. This is not the image of an effective teacher that is described in the current standards documents.
2. The student opinions were not entirely consistent with other measures of teaching effectiveness. For example, some faculty found they could improve their student evaluation ratings by doing things they did not consider to be effective practice or by ceasing to do things that they believed were good instructional practices.
3. Faculty objected to calling the data collected from students "evaluations" but did not object to calling them "opinions," since the data were meant to be input to an evaluation performed by faculty members.
4. The Regents had mandated the use of student evaluation data but had not mandated that it represent the preponderance of evidence of effective teaching.
5. We had been led to evaluate teaching effectiveness by comparing single numerical scores, and eventually each score was translated into a decile. So if half of the students rated you "average" and the other half "good," you would receive a 3.5 on the five-point scale. But that might have placed you at the 20th percentile of the departmental ranking. What started out as a fairly good evaluation (between average and good) was ultimately translated into a rating that suggested a poor showing (since you were in the bottom fifth of all teachers).
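The translation from a mean rating to a rank can be illustrated as follows. The departmental scores below are invented solely for illustration, and the percentile definition used (the percentage of scores strictly below a given score) is one common convention among several; the source does not say which one the department used.

```python
from bisect import bisect_left
from statistics import mean

def percentile_rank(score, all_scores):
    """Percentage of entries in all_scores strictly below score (one common convention)."""
    ordered = sorted(all_scores)
    below = bisect_left(ordered, score)  # number of entries < score
    return 100 * below / len(ordered)

# Half the students choose "average" (3), half choose "good" (4).
ratings = [3] * 10 + [4] * 10
my_score = mean(ratings)
print(my_score)  # 3.5

# Invented departmental scores, including this instructor's 3.5.
department = [3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 4.0, 4.1, 4.2]
print(percentile_rank(my_score, department))  # 20.0
```

Where a 3.5 lands depends on how tightly colleagues' scores are clustered, not on the students' own judgment, which placed the instructor between average and good.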
Probationary faculty are subject to renewal every two years. The decision to renew is made by a committee of tenured faculty who rate each candidate on a five-point scale with "meets expectations" in the center. The decision to promote in rank is made by a committee of faculty in the upper ranks, who likewise rate each candidate on a five-point scale with "meets expectations" in the center. Merit reviews have been conducted every two years and have been independent of other reviews. Each faculty member has been rated on a nine-point scale by separate committees that evaluate, respectively, teaching, professional growth and service. The post tenure review, conducted by a committee of tenured faculty, is based on merit reviews over four years and results in a designation of "meets expectations" or "needs a plan of improvement."
In renewal and promotion decisions, the appropriate personnel committee evaluates teaching after examining evidence submitted by the candidate. Candidates are required to submit student evaluations from each course taught, including students' free-response comments. Each year there are also about two reports of class visits by peers. Finally, candidates write a summary of their teaching accomplishments and may supplement it with copies of syllabi and sample exams.
In merit review, the rating for teaching has been determined by adding 0.4 times a weighted average of student evaluations to 0.6 times a rating assigned by a committee. The committee has not had access to student evaluations; it bases its ratings on two-page statements from the faculty. Faculty are directed to include in their statements: curriculum activity (course and instructional materials development), classroom activities different from traditional lecture, assessment (exams, quizzes, assignments), grading standards, and accommodations made (new preparations, more than two preparations, night classes, adapting to schedule changes). The committee rates each faculty member on a scale of 1-9.
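The merit formula for teaching can be written as a short function. The source does not say how a student-evaluation average on the five-point scale was combined with a committee rating on the 1-9 scale, so the linear rescaling below (mapping 1-5 onto 1-9) is purely an assumption, as are the function names.

```python
def rescale_5_to_9(x):
    """Hypothetical linear map from the 1-5 scale onto the 1-9 scale (1 -> 1, 5 -> 9)."""
    if not 1 <= x <= 5:
        raise ValueError("student average must lie between 1 and 5")
    return 1 + 2 * (x - 1)

def merit_teaching_rating(student_avg, committee_rating):
    """0.4 x (rescaled) student-evaluation average + 0.6 x committee rating, on 1-9."""
    if not 1 <= committee_rating <= 9:
        raise ValueError("committee rating must lie between 1 and 9")
    return 0.4 * rescale_5_to_9(student_avg) + 0.6 * committee_rating

# Illustrative input: a 3.5 student average and a committee rating of 5.
print(round(merit_teaching_rating(3.5, 5), 2))  # 5.4
```

Under this assumed rescaling, a 3.5 student average counts as 6.0 on the nine-point scale before weighting. Because the committee never saw the student data, the two components entered the formula as independent inputs.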
The department has not been satisfied with the merit policy and has taken some steps to reform it. The reforms deal with the following items.
1. Rather than have the committee rate each individual on a scale of 1-9, the department would develop a statement of what is expected, and the committee would rate faculty on a five-point scale with "meets expectations" in the middle. The expectations would be stated in observable terms. This would give the teaching evaluation committee's merit decision the same form as the decisions for renewal and promotion. It would also make the decision process less arbitrary.
2. The old policy did not promote improvement in teaching. Faculty were given a rating (which they usually perceived as being too low relative to colleagues) and that rating did not provide a direction for improvement. But with the "expectations" approach we can ask for evidence that the instructor has thought about teaching, has done some honest reflection on teaching, and has a realistic plan for improvement. For a department it may be more important over time that each member has some identifiable program for improving teaching than it is to compare "performance" and determine whose is in the top quartile and whose is in the bottom quartile.
3. The old policy did not promote improvement in programs offered by the department. For example, we are required to have an assessment plan. That plan is supposed to ensure that data are collected and used to make decisions to improve programs. But none of our evaluation of teaching is connected to promoting the assessment plan. A revised merit policy should recognize individual efforts that help the department meet its collective responsibilities.
In summary, our experience with "student evaluation" has led to general acceptance of the following principles.
1. The opinions of students should be collected systematically as input to an evaluation of teaching to be made by faculty. The input should be properly referred to as "student opinion" or as "student satisfaction" and not be referred to as "student evaluation."
2. The survey instrument should, as much as possible, ask students to record observations of what they perceive in the classroom; it should not ask for direct evaluative judgment on those observations.
3. Faculty should have "ownership" of the items in the survey instrument, by selecting them from a larger bank of items or by constructing them to match with teaching standards they have adopted.
4. Student opinions most often reflect classroom performance. Classroom performance is one aspect of teaching effectiveness. Other very important aspects are curriculum planning, materials development and assessment. Generally, students are not reliable sources for input into evaluating those components of effective teaching.
Having articulated these principles, we began to develop a new survey instrument. The general strategy was to build consensus on a statement of teaching standards and construct a new survey instrument that would ask students whether they observed behavior reflecting those standards. After about a year's work had produced a rough draft of a statement of standards, the task was temporarily abandoned because the University developed a new instrument. The new SOS, as it was called, contained 30 items. The department policy committee recommended that we adopt a subset of the new SOS. It asked department members to examine each of the SOS items and to choose a subset of ten or more items on which they would prefer to be judged. After several rounds of surveying the faculty, there was general consensus on ten items. These ten items became the New Mathematics Department SOS. [See Appendix.]
The new survey is probably not what would have resulted if we had developed teaching standards and then developed a survey to match them. However, the new survey had several properties that we sought. It has a separate scale for each item, and most items ask students to make an observation rather than an evaluation. For example, consider this item: "During most class periods, this teacher raises thought-provoking ideas and asks challenging questions _______." The choices for the blank are: "several times, more than a few times, a few times, one or two times, zero times." Of course, we have an implied value system that "several times" is preferable to "zero times." But students may not share that value system. Indeed, some may respond (accurately for their class) "more than a few times" and believe that is not the mark of an effective teacher. As another example, consider the prompt: "The teacher is ______ attentive and considerate when listening to students' questions and ideas." The choices for the blank are "never or almost never, often not, usually, almost always, always." The implied value is that teachers should encourage dialogue and student participation.
The new survey instrument, consisting of the selected subset of the university instrument items, took about a year to get into place. Meanwhile, work on developing a set of department standards for teaching has not progressed. Rather, we will be redirecting that effort to developing a statement of expectations for teaching. The expectations are different from standards in the sense that standards relate exclusively to classroom management and performance, while the expectations will include aspects of teaching (e.g., planning, assessing, developing curriculum, experimenting with new strategies) that occur outside the classroom, aspects which students are not generally able to observe. The expectations will be outcomes that will be judged by peers, not by students. Some sample proposed statements of expectations include the following:
(1) Every faculty member should be involved, perhaps with a group of others in the department, in a program of improvement. That program can and, perhaps, should include making visits to classes of colleagues and having colleagues visit their classes.
(2) Every faculty member should be involved in the assessment of students, courses and programs and in the formulation of reasonable strategies for improving those students, courses and programs.
(3) Every faculty member should be involved in curriculum building by designing new courses, revising old courses, and constructing meaningful activities for students.
(4) Every faculty member should collect student opinions of his/her teaching and should consider those opinions as part of a program for improving his/her teaching.
After work on the expectations has been completed we expect to resume the quest to construct a statement of teaching standards that can be translated into prompts for a department SOS instrument. That is likely to take at least two more years.
Use of Findings
The merit evaluation for the last two decades had been independent of all other evaluations. The department decided that the merit policy should not continue as an independent process but take as input the results of review for renewal of probationary faculty and post tenure review of tenured faculty. Expectations for probationary faculty had already been articulated in the Renewal/Tenure Policy. It remains to complete a list of expectations for tenured faculty in the Post Tenure Review Policy. We have broadened the definition of effective teaching to include performance on tasks that usually take place outside of the classroom. We have placed more emphasis on encouraging improvement than on making comparisons between one faculty member and another.
We have taken steps to reduce the weight given to student opinions. In fact, student opinions will no longer factor directly into merit points. Rather, student opinion is one of several factors considered by the Renewal Committee or the Post Tenure Review Committee as they make an evaluation. At the same time, we have tried to gather student opinions on items that we believe students can observe and report with some objectivity. We have also taken steps to describe aspects of teaching that students do not usually observe. As a result, we hope to have a policy that focuses more on professional development than on forcing faculty to compete in a zero-sum game.
As a result of this experience we recommend the following to all departments confronted with the very difficult task of evaluating teaching.
1. Focus as much as possible on improving teaching and as little as possible on rewards and punishments. Faculty can be motivated to participate in a program of professional development much more readily than they can be persuaded to participate in a system designed to reward or punish.
2. Focus as much as possible on developing and building consensus for your own statement of standards of teaching as a foundation for evaluating teaching. Evaluation should be based on an explicit standard, a standard with which all faculty can identify.
3. Focus on a definition of teaching that extends to tasks that are performed outside the classroom. Teaching effectiveness is much more than a "performance" in the classroom.
4. Find a realistic role for student input. Determine the role that you want student data to play in the evaluation process.
5. Do not expect anything more than temporary, partial solutions but continue to seek more permanent and complete solutions.
Appendix: Student Opinion of Instruction Items
1. This teacher makes _____ use of class time.
5 - very good; 4 - good; 3 - satisfactory; 2 - poor; 1 - very poor
2. The syllabus (course outline) did _____ job in explaining course requirements.
5 - a very good; 4 - a good; 3 - an adequate; 2 - not do an adequate; 1 - (there is no syllabus/course outline)
3. This teacher raises thought-provoking ideas and asks challenging questions _____ during most class periods.
5 - several times; 4 - more than a few times; 3 - a few times; 2 - one or two times; 1 - zero times
4. This teacher provides _____ feedback on my progress in the course.
5 - a great deal of; 4 - more than an average amount of; 3 - an average amount of; 2 - little; 1 - very little or no
5. The assignments (papers, performances, projects, exams, etc.) in this class are _____ used as learning tools.
5 - always; 4 - almost always; 3 - sometimes; 2 - seldom; 1 - never
6. Tests _____ assess knowledge of facts and understanding of concepts instead of memorization of trivial details.
5 - always; 4 - almost always; 3 - usually; 2 - often do not; 1 - almost never
7. The quality of teaching in this course is giving me _____ opportunity to gain factual knowledge and important principles.
5 - a very good; 4 - a good; 3 - an average; 2 - a poor; 1 - a very poor
8. This teacher is _____ attentive and considerate when listening to students' questions and ideas.
5 - always; 4 - almost always; 3 - usually; 2 - often not; 1 - never or almost never
9. The work assigned contributes _____ to my understanding of the subject.
5 - a very significant amount; 4 - a significant amount; 3 - a good amount; 2 - little; 1 - very little
10. The difficulty level of this course is _____ challenging to me.
5 - constantly; 4 - often; 3 - sometimes; 2 - almost never; 1 - never