Assessment in online learning--it's a matter of time

Date: March-April 2014
From: Journal of College Science Teaching(Vol. 43, Issue 4)
Publisher: National Science Teachers Association

Abstract: 

Taking online courses is becoming a more common part of the college experience, but very little is known about student behaviors and strategies related to online assessment. This article reviews how students in an online Earth and Space Science course interact with various online assessments. Our two main findings are that our students do not use self-assessment tools effectively, and time spent on online exams is surprisingly short. We discuss how the use of self-assessment tools can probably be improved through careful online course design, but the short time spent on online exams is partially due to the nature of the online environment itself. We make a number of design suggestions that can encourage good test-taking strategies in the online environment.

Today, "digital natives" (Prensky, 2001) make up the vast majority of college students, and many of them are increasingly testing their digital abilities in distance education courses. The debate on how proficient they are in the use of e-learning technology is still ongoing (e.g., Margarayan, Littlejohn, & Vojt, 2011); nevertheless, little is known about their strategies for online assessment.

We recently designed a two-credit online course in Earth and Space Science for elementary education majors (Cervato, Kerton, Peer, Hassall, & Schmidt, 2013). The course is taken by 35 to 48 students per semester, 64% to 94% of whom are first- and second-year students, and is divided into 15 weekly content modules consisting of online multimedia content and offline readings, projects, and experiments. The course was delivered using the Blackboard Learn 9 Learning Management System (LMS). Each module was designed to include a suite of online formative and summative assessments (worth 60% of the final grade): multiple-choice self-assessment (SA) questions to help students navigate the associated online and text readings; a nonproctored, open-book, multiple-choice quiz on the module content; and a proctored exam every three or four modules. The remaining 40% of the final grade was based on various semester-long projects.

Over three semesters we experimented with different ways of using the various assessment tools and explored how our students interacted with them. We have found that (a) our students do not use SA tools effectively, and (b) time spent on online exams is surprisingly short.

Use of self-assessments

We implemented a simple style of self-assessment--consisting of 10 questions randomly drawn from a large pool--that students were to take after completing the online module. Students were given unlimited time to complete the questions and an unlimited number of attempts. Completion of each SA quiz with a score of 8 or better was worth 0.3% of the final grade. The goal of the SA was to allow students to test their understanding of the content material without fear of negative consequences for incorrect answers. Our intention was that students would use the feedback from the SA to identify areas of weakness, leading to more efficient use of their study time and ultimately better performance in the course as a whole.

We found that in Semester 1, when use of the SA was required, students accessed the SA frequently, but there was no correlation between SA usage and a student's final grade in the course. In Semester 2, no credit was associated with the activity, and almost none of the students, including the better ones, used the SA quizzes to assist them with the course (see Figures 1 and 2); in Semester 3, we did not use the SA quizzes at all. In Semesters 2 and 3, overall student performance on quizzes and exams was better than in the first semester. This was unexpected, as positive correlations between SA usage and student grades have been reported in the literature, at least for face-to-face courses (Kibble et al., 2011; Smith, 2007).

Why was SA so ineffective in our course, to the point that removing it improved student performance? We believe the foremost reason is that using SA effectively requires the ability to reflect on one's own learning progress, a high-level metacognitive and knowledge skill, in the language of Bloom's Taxonomy (e.g., Heer, 2009), that many of our students have not yet developed. One way to describe students' learning progress is as a journey from novice to expert (Heck & Wild, 2011). Expert learners have developed metacognitive skills that allow them to monitor their learning, change strategies, and assess their progress. Novice learners have not--they view metacognitive opportunities such as formative assessment as duties or requirements. In this sense, our students are novice learners both in science and in the online environment. Another possibility is that the way SA was included in the course, as a separate task following the reading, is not an effective way to engage and scaffold learning in this group of students. We will return to this idea after we analyze a second aspect of online assessment: how much time students invest in online testing.

Duration/timing of online assessment

On the basis of our experience in face-to-face courses and best practices for designing and administering multiple-choice exams (e.g., Brothen & Wambach, 2004; Chronicle of Higher Education, 2010), we decided that 1-2 minutes per question would allow sufficient time for all students to complete the exams. We therefore set a maximum of 30 minutes for the open-book quizzes (10 questions), enough to encourage students to read the textbook beforehand while still leaving time to double-check answers, and 1 hour for the 30-question exams. Each quiz randomly selects questions from a test bank pool provided by the textbook publisher, and each exam pulls questions from the combined pools of the quizzes that covered the same material. Questions are administered one at a time, backtracking is permitted, and the LMS warns the student before submission if an answer was left blank. After submission, students can see the questions, their answers, and the points they earned for each question. They do not see the correct answers.

To evaluate our approach, each semester we analyzed how much time students spent on each assessment (see Tables 1 and 2). On average, students spent between 10 and 13 minutes on each exam (worth up to 10% of the course grade) and between 13 and 16 minutes on each quiz (worth up to 1% of the course grade). In terms of average time per question, this translates to roughly half a minute per question on exams and one and a half minutes per question on quizzes. In both cases this includes the time a student would spend reading a question, reading up to four multiple-choice answers, evaluating the options, making a selection, and submitting it. For the quizzes it also includes any time students spent consulting their notes, textbook, or other available resources, which likely accounts for the time difference between quizzes and exams. We found no significant correlation between exam or quiz score and time spent on it, and no correlation between time spent on quizzes or exams and final course grade.
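As a concrete illustration of how these per-question times and correlations are obtained, the short sketch below (in Python) processes a hypothetical export of LMS timing records; the sample values and variable names are ours for illustration and are not actual course data.

    from statistics import correlation, mean

    # Hypothetical LMS export for one 30-question exam:
    # minutes each student spent and the score each earned.
    minutes = [14.2, 9.8, 11.5, 7.9, 16.0, 10.3]
    scores = [18, 21, 17, 22, 19, 20]

    # Average time per question, in seconds.
    print(mean(minutes) * 60 / 30)

    # Pearson r between time spent and score, as reported in the
    # "r (time vs. score)" columns of Tables 1 and 2.
    print(correlation(minutes, scores))  # requires Python 3.10+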

We were not surprised by the lack of correlation between online evaluation time and final grade; from personal experience with our students, we know that a long exam duration could equally be due to a poor student struggling or a good student being meticulous and rechecking answers, and this intuition has been confirmed by many educational and psychological studies, as reported by Schnipke and Scrams (2002). However, we were puzzled by the very short time students spent on the online assessments and, in particular, by the fact that the time spent was not influenced by the portion of the course grade linked to each assessment type. We could not find studies showing how much time students typically spend on multiple-choice tests in face-to-face courses; our own experience suggests that students may spend as little as 10 minutes or as long as 50 (the entire testing period) on a 30-question exam. It is worth noting that over three semesters the longest any student spent on an exam was only 32 minutes, nowhere close to the total time available.

We evaluated three hypotheses regarding why the students were completing the exams so rapidly compared with our face-to-face courses: first, the exams were too easy; second, the students had been exposed to too many questions from the exam pool beforehand; and third, the online environment itself encouraged rapid test taking. We think the first hypothesis is unlikely, as the questions originated from a test bank that had already been reviewed as part of the publisher's quality control process. Both instructors then reviewed the larger initial pool of questions in advance, removing questions that only emphasized memorization of jargon and terminology and selecting questions of comparable difficulty and style to those used in the equivalent face-to-face courses.

The difference in pool size probably does explain one pattern we see in the exam durations: Exam 1 always took longer than the other exams. The first exam has the largest pool (127 questions), whereas the other exams have comparable-sized pools (98, 82, and 73 questions, respectively) and comparable, shorter, average completion times. We used a "Monte Carlo" computer model to explore the effect of this "pool exposure" on student response times by simulating the quiz and exam question assignments for thousands of trials and determining the average number of previously viewed questions expected on each exam. For example, our model suggests that a student taking Exam 1 will, on average, have already seen seven of its questions during the quizzes. To explore the effect this could have on exam duration, we assume that students only "glance" at these questions for a short period of time (5 or 10 seconds) and then recalculate the average time students spend on each new question. For example, for Exam 4 of fall 2011, we calculate a time per question of 21 seconds (total average time divided by 30 questions). From our models, though, we expect that, on average, a student will have previously seen 12 of the exam questions. Assuming they glance at those questions for 5 (or 10) seconds, the average time spent on each of the remaining 18 new questions increases to 31 (or 28) seconds. Even accounting for the effect of pool exposure, the total time spent per question by our students (column 7 of Table 2) still seems unusually short.
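The sketch below reproduces the spirit of this Monte Carlo calculation and the adjusted-time arithmetic under simplifying assumptions of our own: that Exam 1 draws its 30 questions uniformly from a 127-question pool built from three quiz sub-pools (the split into sub-pools of 43, 42, and 42 questions is illustrative, not the actual course configuration), and that each quiz draws 10 questions from its own sub-pool.

    import random

    def average_previously_seen(quiz_pool_sizes, per_quiz=10, exam_size=30, trials=10000):
        # Monte Carlo estimate of how many exam questions a student has
        # already seen on the quizzes, assuming each quiz draws per_quiz
        # questions from its own sub-pool and the exam draws exam_size
        # questions uniformly from the combined pool.
        combined_pool = sum(quiz_pool_sizes)
        total_overlap = 0
        for _ in range(trials):
            seen, offset = set(), 0
            for pool in quiz_pool_sizes:
                seen.update(offset + q for q in random.sample(range(pool), per_quiz))
                offset += pool
            exam = set(random.sample(range(combined_pool), exam_size))
            total_overlap += len(exam & seen)
        return total_overlap / trials

    def adjusted_time_per_new_question(total_minutes, seen, glance_seconds, exam_size=30):
        # Seconds per *new* question if each previously seen question
        # receives only a quick glance of glance_seconds.
        return (total_minutes * 60 - seen * glance_seconds) / (exam_size - seen)

    # Illustrative split of the 127-question Exam 1 pool into three quiz sub-pools.
    print(average_previously_seen([43, 42, 42]))          # roughly 7 questions

    # Fall 2011 Exam 4: 10.31 minutes on average, about 12 questions seen before.
    print(adjusted_time_per_new_question(10.31, 12, 5))   # about 31 seconds
    print(adjusted_time_per_new_question(10.31, 12, 10))  # about 28 seconds

With the 5-second glance assumption, this reproduces the roughly 31 seconds per new question quoted above for Exam 4 of fall 2011.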

Our final hypothesis is that our students' apparent haste in completing their exams was related to the environment in which assessment was taking place: online. Ideally one would test this by administering a paper exam to half of the students and the same test online to the other half. Because our course is fully online, we could not do this. However, we suggest two aspects of the online environment that could influence exam times.

First, research comparing people's reading habits online versus on paper has shown that web pages and books are read differently (Liu, 2005). In particular, screen-based reading behavior involves mainly scanning and spotting keywords rather than in-depth reading. It is likely that this style of reading carries over to the online testing environment, meaning that our students not only skim previously viewed questions but also skim all exam questions.

Second, there are numerous online resources and books on strategies for taking multiple-choice tests (e.g., Dobbin, 1984; Landsberger, 2013). These include previewing the test, pacing yourself to leave time to review your answers at the end, critically evaluating all options given for each question, answering easier questions first and spending more time on harder ones, and reviewing all questions and answers at the end. Although these strategies might be used by students taking pen-and-paper exams, our online environment does not seem to nurture appropriate test-taking behavior. Of particular concern is that, given the short amount of time our students spent on online assessment, it is practically impossible that they backtracked to review answers before they submitted the exam, even though this option was available. Exam questions in our course were displayed one at a time, following "best practices," so that students could focus on each question and would be less likely to accidentally miss questions. Unfortunately, the one-at-a-time display does not encourage backtracking because students must navigate through the questions sequentially, and when they have reached Question 30, they might not want to spend the time to review all questions one by one.

Future directions for course design and research

There is evidence that SA is most effective, in that it leads to a better overall course grade, when it is not formally graded and is presented as an optional activity (Kibble, 2007); in our case, however, this led to SA being ignored rather than used. Given the nature of our students, novice learners in science and in the online learning environment, is there a way to implement SA in the online environment that makes it a useful learning tool? Our current course design is shown schematically in Figure 3 (top of figure). The SA activity is presented to the students as a separate task to be completed after moving through the subject content. When the SA is optional, students clearly decide that time spent on the SA is not time well spent and almost uniformly choose not to complete it. When we required it, students focused on getting the points and the right answer rather than on using the tool to learn (Kerton & Cervato, 2012). An alternative design, schematically illustrated in Figure 3 (bottom of figure), embeds the SA throughout the module so that, from the students' perspective, two tasks have been reduced to a single one. An analogy from face-to-face classes is the use of anonymous "clicker" (personal response system) questions to periodically assess student understanding during a lecture. One possible benefit of this integrated design is that students will become comfortable with the idea that reviewing and self-testing are necessary and useful aspects of the learning process. We encourage instructors to try this SA design and encourage LMS developers to create tools that facilitate this approach.

Although probably little can be done to combat the online style of reading, there are ways to improve LMS design for the delivery of exam questions. For example, features allowing students to electronically mark questions they want to review; to have easy access to individual questions, perhaps through the use of thumbnails or a clickable list of question numbers; and to receive system prompts to review questions if time remains would all make online exams more user-friendly.

In closing, we think that the study of student behavior during online evaluation as part of college courses is an area that is ripe for further research. Much of the research on examinee behavior on online exams has been done in the context of large standardized tests such as the ACT, SAT, LSAT, and GRE (e.g., Camara, 2002; Schnipke & Scrams, 2002), and most studies examining student behavior in online courses tend to focus more on issues of cheating and security (e.g., Harmon, Lambrinos, & Buffolino, 2010; Rowe, 2004) rather than on student test-taking strategies and noncheating-related behavior. Because online courses are becoming more prevalent and the LMS automatically and unobtrusively records timing information, more research can and should be done at the individual course level to better understand student behavior in online testing in order to optimize students' learning experience.

Acknowledgments

We thank Iowa State University (ISU) intern Angela Zhang for her work collecting exam and test timing data. Many thanks also to Joanne Olson from the ISU School of Education for informative and illuminating discussions related to teacher training.

References

Brothen, T., & Wambach, C. (2004). The value of time limits on internet quizzes. Teaching of Psychology, 31, 62-64.

Camara, W. (2002). Examinee behavior and scoring of CBTs. In C. N. Mills, M. T. Potenza, J. J. Fremer, & W. C. Ward (Eds.), Computer-based testing: Building the foundation for future assessments. Mahwah, NJ: Erlbaum.

Cervato, C., Kerton, C. R., Peer, A., Hassall, L., & Schmidt, A. (2013). The big crunch: A hybrid solution to earth and space science instruction for elementary education majors. Journal of Geoscience Education, 61, 173-186.

Chronicle of Higher Education. (2010). How many multiple choice questions? Retrieved from http://chronicle.com/forums/index.php?topic=65799.0

Dobbin, J. E. (1984). How to take a test. Princeton, NJ: Educational Testing Service.

Harmon, O. R., Lambrinos, J., & Buffolino, J. (2010). Assessment design and cheating risk in online instruction. Online Journal of Distance Learning Administration, 13(3). Retrieved from http://www.westga.edu/%7Edistance/ojdla/Fall133/harmon_lambrinos_buffolino133.html

Heck, J., & Wild, M. R. (2011). Expert learners. Retrieved from http://expertlearners.com/srl.php

Heer, R. (2009). A model of learning objectives. Retrieved from http://www.celt.iastate.edu/teaching/RevisedBlooms1.html

Kerton, C. R., & Cervato, C. (2012). Self-assessment in online learning: Why bother? [Abstract for Paper No. 34-3]. Geological Society of America Abstracts with Programs, 44(7), p. 113. Retrieved from https://gsa.confex.com/gsa/2012AM/webprogram/Paper210157.html

Kibble, J. D. (2007). Use of unsupervised online quizzes as formative assessment in a medical physiology course: Effects of incentives on student participation and performance. Advances in Physiology Education, 31, 253-260.

Kibble, J. D., Johnson, T. R., Mohammed, K. K., Nelson, L. D., Riggs, G. H., Borrero, J. L., & Payer, A. F. (2011). Insights gained from the analysis of performance and participation in online formative assessment. Teaching and Learning in Medicine, 23(2), 125-129.

Landsberger, J. (2013). Multiple choice tests. Retrieved from http://www.studygs.net/tsttak3.htm

Liu, Z. (2005). Reading behavior and the digital environment. Journal of Documentation, 61(6), 700-712.

Margaryan, A., Littlejohn, A., & Vojt, G. (2011). Are digital natives a myth or reality? University students' use of digital technologies. Computers & Education, 56(2), 429-440.

Prensky, M. (2001). Digital natives, digital immigrants: Part 1. On the Horizon, 9(5), 1-6.

Rowe, N. C. (2004). Cheating in online student assessment: Beyond plagiarism. Online Journal of Distance Learning Administration, 7(2). Retrieved from http://www.westga.edu/~distance/ojdla/summer72/rowe72.html

Schnipke, D. L., & Scrams, D. J. (2002). Exploring issues of examinee behavior: Insights gained from response-time analyses. In C. N. Mills, M. T. Potenza, J. J. Fremer, & W. C. Ward (Eds.), Computer-based testing: Building the foundation for future assessments (pp. 236-266). Mahwah, NJ: Erlbaum.

Smith, G. (2007). How does student performance in formative assessments relate to learning assessed by exams? Journal of College Science Teaching, 37, 28-34.

Charles Kerton (kerton@iastate.edu) is an associate professor in the Department of Physics and Astronomy and Cinzia Cervato is a professor in the Department of Geological and Atmospheric Sciences, both at Iowa State University in Ames.

Caption: FIGURE 1 Semester 1 self-assessment (SA) usage. Each circle shows the number of SA activities completed by a student and the average time spent on each SA. The diameter of the circle is proportional to the student's final letter grade in the course, as shown in the lower right. Note that the majority of students completed all 14 required SA activities and that there is no strong correlation between SA usage and final course grade.

Caption: FIGURE 2 Semester 2 self-assessment (SA) usage. When not graded, SA usage dropped dramatically. Note that only two students completed all 14 available SA activities and that there is no correlation between SA usage and final course grade.

Caption: FIGURE 3 Self-assessment (SA) in online modules. Two alternative designs for including SA in an online module are shown schematically. In the sequential design (top), SA is completed as a separate activity after all of the subject content has been covered. In the integrated design (bottom), SA is dispersed throughout the coverage of the subject content. One possible advantage of the integrated design is that SA can be viewed by the student as another aspect of the content module rather than as an extra task.

TABLE 1
Quiz summary.

Evaluation       Number of   Average time (a)   Time per question   Average        r (time
                 questions   (minutes)          (seconds)           score (a)      vs. score)

Fall 11 quiz     10          15.85 (5.0)        95                  7.82 (0.8)      0.46
Spring 12 quiz   10          13.16 (3.6)        79                  7.64 (0.9)      0.42
Fall 12 quiz     10          15.84 (3.8)        95                  8.48 (0.6)     -0.07

(a) standard deviation is given in parentheses.

TABLE 2
Exam summary.

               Average total       Time per question   Average        r (time       Average # of    Adjusted time per
Evaluation     time (minutes) (a)  (seconds)           score (a)      vs. score)    new questions   question (seconds) (b)

Fall 11--1     14.26 (5.9)         28.5                17.35 (3.3)     0.33         23              35.7 (34.2)
Fall 11--2     10.71 (5.3)         21.4                17.55 (4.2)     0.01         18              32.4 (29.0)
Fall 11--3     8.79 (2.7)          17.6                16.26 (3.4)     0.16         15              30.2 (25.2)
Fall 11--4     10.31 (4.3)         20.6                18.77 (3.6)    -0.16         18              31.0 (27.7)
Spring 12--1   13.46 (5.2)         26.9                18.30 (3.8)    -0.10         23              33.6 (32.1)
Spring 12--2   9.70 (3.2)          19.4                20.11 (4.7)     0.03         18              29.0 (25.7)
Spring 12--3   8.67 (3.0)          17.3                20.96 (4.7)     0.08         15              29.7 (24.7)
Spring 12--4   8.33 (2.2)          16.7                22.11 (4.8)    -0.11         18              24.4 (21.1)
Fall 12--1     17.57 (5.8)         35.1                19.60 (2.9)     0.12         23              44.3 (42.8)
Fall 12--2     11.11 (3.6)         22.2                21.88 (4.2)    -0.31         18              33.7 (30.4)
Fall 12--3     11.34 (4.7)         22.7                22.07 (5.0)    -0.41         15              40.4 (35.4)
Fall 12--4     12.45 (4.7)         24.9                22.07 (4.6)    -0.26         18              38.2 (34.8)

(a) standard deviation is given in parentheses.

(b) assumes the student spends 5 (or 10) seconds on each previously
viewed question.
