Resources for Teaching Evaluation
Student Evaluations of Teaching:
What the Literature Tells Us

Student evaluations of teaching have been studied intensively over the years. At this point, over 2000 studies have been reported, with more results coming in all the time. Although the findings of these studies are not always consistent with each other, the experts tend to agree on a number of points. On this page, we summarize some of the findings that most experts embrace, as well as those experts' advice on use and interpretation of student evaluations. The names in parentheses after the items refer to authors in the list of resources at the bottom of this page.

A Few Things Most Experts Agree On

  • Ratings of overall teaching effectiveness are moderately correlated with independent measures of student learning and achievement.
  • An instructor's ratings for a given course tend to be relatively consistent over successive years; there is not much variation in student ratings for an individual instructor regardless of whether the form is administered to current students or to alumni.
  • Certain factors can influence student ratings--though less dramatically than we might suppose. For example:
    • student motivation: ratings are lower in required courses than in courses for which students had a prior interest, courses in the major, or those taken as electives (Cashin, Franklin, Ory, Centra)
    • ratings for higher-level courses (including grad courses) tend to be higher than for lower-level courses (but differences are small) (Cashin, Ory)
    • ratings given during the final exam are lower than those given during a regular class period (Ory, Franklin, Cashin)
    • students give highest ratings to classes in arts and humanities, lowest to classes in physical sciences, math and engineering (Ory, Franklin, Cashin)
    • TAs and first-year teachers receive lower ratings than others (Centra, Ory, Franklin)
    • new or revised courses often get lower-than-expected ratings the first time out (Franklin, Centra)
  • Some factors that we may think influence student evaluations seem to have little or no influence. These include:
    • time of day/schedule (Franklin, Cashin, Centra)
    • student ability/GPA (Franklin, Cashin)
    • instructor's rank, age, and research productivity (Ory, Cashin)
  • Some of the factors that faculty think influence ratings (level of difficulty, amount of work, instructor's style, gender, class size) are much more ambiguously related. To learn more about these findings and the studies on which they are based, see the resources listed at the bottom of this page.

Advice on Use and Interpretation

 Note: The following recommendations are taken from the University of Washington website, with only slight modifications to reflect UND policy and practice.

Student course ratings have many uses, particularly if viewed over time and across courses. Student ratings provide information that instructors can use to identify areas of strength and areas needing improvement in their teaching. Furthermore, departments and teaching units can use student ratings in the aggregate to assess the overall performance of multi-course and multi-instructor units, as well as to evaluate individual instructors for personnel reasons, such as decisions regarding retention, promotion, tenure and merit pay.

The nine recommendations listed below can provide helpful guidelines for the use of student course ratings in personnel decisions.

    1. Student ratings must be used in concert with other data that relate to the quality of a faculty member's teaching, rather than as a sole indicator of teaching quality. Other sources such as peer reviews of classroom sessions, peer reviews of curricular materials, and faculty self-reflection should be assessed in addition to student evaluations to gain a true sense of the teaching skills and performance of a faculty member. Consideration of these other sources of evidence is especially important because student ratings alone do not provide sufficient evidence of the extent of student learning in a course.
    2. Evaluations from more than a single section should be used in making any decision about teaching quality. Research has shown that ratings from at least five courses are necessary to assure adequate reliability. The validity of the ratings for measuring teaching quality is increased as a greater variety of course formats is represented in the data upon which decisions are based. Trends in ratings across years may also be important in assessing teaching.
    3. Global ratings of teaching effectiveness are most appropriate to use in personnel decisions. "Overall ratings of the teacher and the course tend to correlate more closely with student achievement scores than do other items." (Centra) Other, more specific items should be used by the faculty member for review of specific skills and areas for improvement. (For examples of global ratings, see items 21 and 22 on the UND USAT form.)
    4. Small differences in individual evaluations should not be used as a basis for differential decisions. Because student ratings yield numerical averages, there is a temptation to overestimate the precision of the averages that are presented. Small differences in ratings may not be meaningful. It is better to deal with much broader classifications, such as Excellent/Good/Acceptable/Unacceptable or Significantly Exceeds Expectations/Meets Expectations/Falls Short of Expectations/Falls Significantly Short of Expectations.
    5. Interpretations of student ratings averages should be guided by awareness that students tend to rate faculty at or near the high end of the scale. It is therefore not appropriate to use the median (or 50th percentile) as a presumed dividing line between strong and weak teachers. More appropriate would be to assume that the majority of teachers are strong. It is also appropriate, when evaluating average ratings of individual instructors, to consider relevant comparisons (see Recommendation 6) and specific characteristics of courses taught (see Recommendation 7).
    6. Comparative data should be used but with caution. Department and university-wide comparison data will be reported on the summary report forms. However, for comparisons to be useful, the normative group should be based on more than a narrow population of instructors. Smaller departments may not want to rely on departmental norms but use norms calculated for a number of similar departments or for the school or college as a whole. At times, it may be better to compare ratings of similar courses across departments rather than ratings of dissimilar courses within departments.
    7. Course characteristics should be considered when interpreting results. For example, large lecture courses typically receive lower ratings than smaller courses, new courses being taught for the first time receive lower ratings than well-established courses, and introductory courses for non-majors receive lower ratings than higher division courses for majors. (See Student Evaluations of Teaching: What the Literature Tells Us for more information on factors that influence ratings.) Adjustments for course type should be made in order to have a fairer sense of the faculty member's teaching skills. One way to adjust for course types is by choosing similar courses for normative comparisons.
    8. Faculty members should be given an opportunity to respond to evaluation results. Faculty should have an opportunity to discuss the objectives of the course, how the teaching methods were used to meet those objectives, and how circumstances in the course might have affected evaluations. Furthermore, other evaluation information gained from a given course (see Recommendation 1) can aid with the interpretation of ratings results.
    9. Administration of course ratings should be scheduled to maximize the number of respondents. Generally, evaluations will have greater validity when higher proportions of the enrolled students complete evaluation forms. Ratings may not be an accurate reflection of the entire class when smaller proportions of students respond. This problem can be particularly acute in small classes. It is recommended that at least two-thirds of enrolled students be included in the results for the results to warrant confidence. As proportions decrease, particularly in small classes, there is greater opportunity for the rating of one or a few students to disproportionately affect the results.
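
The numeric thresholds in Recommendations 2 and 9 can be checked mechanically before ratings are interpreted. The following is a minimal sketch in Python, not part of any UND or University of Washington procedure. The function names and the 1-to-5 rating scale used in the usage note are assumptions made for illustration only.

```python
from statistics import mean

# Thresholds taken from the recommendations above.
MIN_COURSES = 5  # Recommendation 2: ratings from at least five courses


def meets_response_threshold(respondents, enrolled):
    """Recommendation 9: at least two-thirds of enrolled students responded.

    Integer arithmetic (3 * respondents >= 2 * enrolled) avoids
    floating-point rounding at the exact two-thirds boundary.
    """
    if enrolled <= 0:
        raise ValueError("enrolled must be positive")
    if not 0 <= respondents <= enrolled:
        raise ValueError("respondents must be between 0 and enrolled")
    return 3 * respondents >= 2 * enrolled


def enough_courses(course_count):
    """Recommendation 2: are enough courses represented in the data?"""
    return course_count >= MIN_COURSES


def mean_shift_from_one_rating(ratings, extra_rating):
    """Change in the class mean when one additional rating is included.

    Illustrates how a single student can move the average,
    especially in a small class (hypothetical helper).
    """
    if not ratings:
        raise ValueError("ratings must not be empty")
    return mean(list(ratings) + [extra_rating]) - mean(ratings)
```

For example, on an assumed 1-to-5 scale, `mean_shift_from_one_rating([4, 4, 4, 4], 1)` moves the mean by about 0.6 of a point, while the same single rating added to forty ratings of 4 moves it by less than 0.1. This is the disproportionate effect that Recommendation 9 warns about in small classes.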

Resources

Braskamp, L.A., & Ory, J.C. (1994). Assessing Faculty Work: Enhancing Individual and Institutional Performance. San Francisco: Jossey-Bass.

Cashin, William E. (1999). "Student Ratings of Teaching: Uses and Misuses" in Changing Practices in Evaluating Teaching, ed. Peter Seldin. Bolton, MA: Anker. (pp. 25-44.)

John A. Centra. (1993) Reflective Faculty Evaluation: Enhancing Teaching and Determining Faculty Effectiveness. San Francisco: Jossey-Bass.

Barbara Gross Davis, "Tools for Teaching: Student Rating Forms." Available on-line at
http://teaching.berkeley.edu/bgd/ratingforms.html

Jennifer Franklin (2001). "Interpreting the Numbers: Using a Narrative to Help Others Read Student Evaluations of Your Teaching Accurately," in Techniques and Strategies for Interpreting Student Evaluations. ed. Karron G. Lewis. New Directions for Teaching and Learning, #87. San Francisco: Jossey-Bass (pp. 85-100).

"Evaluation of Teaching Using Student Ratings." A website maintained by New York University Center for Teaching Excellence. http://www.nyu.edu/cte/white.html

Lewis, K.G. (2001). Techniques and Strategies for Interpreting Student Evaluations. New Directions for Teaching and Learning, 87. San Francisco: Jossey-Bass.

National Research Council. (2003). Evaluating and Improving Undergraduate Teaching in Science, Technology, Engineering, and Mathematics. Washington, D.C.: National Academies Press. Available on-line at http://www.nap.edu/books/0309072778/html/

John C. Ory (2001). "Faculty Thoughts and Concerns About Student Ratings," in Techniques and Strategies for Interpreting Student Evaluations. ed. Karron G. Lewis. New Directions for Teaching and Learning, #87. San Francisco: Jossey-Bass (pp. 3-15).

Seldin, Peter. (1999) Changing Practices in Evaluating Teaching: A Practical Guide to Improved Faculty Performance and Promotion/Tenure Decisions. Bolton, MA: Anker Publishing.

All books and articles listed are available in the OID library.

 
 
 
Office of Instructional Development
409 Twamley Hall
Campus Box 7104
Telephone: 701.777.3325
Fax: 701.777.2925
oid@und.nodak.edu