Tuesday, 27 February 2007

Learning Research Quiz Results 2002-2007

Since 2002, I have offered the 15-question Learning Research Quiz on the Work-Learning Research website. Over 1200 people have taken the quiz, and 991 of them answered all 15 questions. A blind monkey randomly guessing the answers would score about 24% (there are three to seven answer choices per question).
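The chance baseline is the average, across questions, of one divided by the number of answer choices. Here is a minimal Python sketch of that arithmetic. The per-question choice counts are hypothetical placeholders (the post only says each question has three to seven choices), so the printed figure illustrates the method and is not meant to reproduce the 24% exactly.

```python
def chance_score(choice_counts):
    """Expected fraction correct when guessing uniformly at random on every question."""
    return sum(1 / k for k in choice_counts) / len(choice_counts)

# Hypothetical counts for a 15-question quiz (an assumption, not the real quiz layout).
hypothetical_counts = [3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 7]
print(f"{chance_score(hypothetical_counts):.1%}")
```

Because questions with fewer choices are easier to guess, the baseline depends on the mix of question sizes, not just on the average number of choices.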

The questions are scenario-based questions presenting relatively realistic learning-design situations. The answer choices are plausible---there are no throwaway answer choices. The questions are thus not easy to answer. They are specifically designed to measure people's understanding of fundamental learning principles like repetition, spacing, feedback, retrieval practice, instructional objectives, etc.

People who answered the questions are self-selected from visitors to the Work-Learning Research website so we can't really be sure who this sample represents. In general, it is my experience that most people who come to the website are experienced learning professionals who are passionate about learning, training, teaching, and/or instructional design.

Here are the results for the 991 people who answered all 15 questions:

If 24% is a completely random score, what should we expect from those who take the quiz? Would you expect them to do twice as well (around 50% correct?) or do better, say 60%, 70%, 80%, 90%, or 100%?

On average our 991 participants scored 32% correct, barely better than chance responding!!

Scoring 1% to 25% correct were 279 respondents.
Scoring 25% to 50% correct were 610 respondents.
Scoring 50% to 75% correct were 101 respondents.
Scoring 75% to 100% correct was 1 respondent, who scored 80% correct.
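
Converting the four bands above into shares of the 991 respondents makes the skew easier to see. This is a small Python sketch that uses only the counts reported above; the band labels are just dictionary keys.

```python
# Respondent counts per score band, as reported in the post.
bands = {"1-25%": 279, "25-50%": 610, "50-75%": 101, "75-100%": 1}

total = sum(bands.values())  # should match the 991 complete responses
shares = {band: count / total for band, count in bands.items()}

for band, share in shares.items():
    print(f"{band}: {share:.1%}")
```

The two lowest bands together account for roughly nine in ten respondents.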

Association affiliation had little impact on average correct:

ISPI members scored 36% correct.
ASTD members scored 33% correct.
Other trade association members scored 28% correct.
Research association members scored 30% correct.
Non-affiliated respondents scored 31% correct.

Age of respondents had little impact on average correct:

Under 21 years of age scored 26%
21-25 scored 28%
26-35 scored 32%
36-45 scored 33%
46-55 scored 30%
56-65 scored 35%
66-75 scored 33%

Education degree had little impact on average correct:

"Ph.D. in psychology, learning, etc." scored 35% correct.
"Ph.D. in other discipline" scored 34% correct.
"Masters in psychology, learning, etc." scored 33% correct.
"Masters in other discipline" scored 32% correct.
"Bachelors" scored 31% correct.
"Other" scored 29% correct.

Job Title had little impact on average correct:

"Trainer" scored 32% correct.
"Instructional Designer" scored 33% correct.
"Performance Consultant" scored 33% correct.
"Human Performance Technologist" scored 38% correct.
"E-Learning Specialist" scored 29% correct.
"Learning Technology Developer" scored 33% correct.
"Teacher" scored 31% correct.
"Professor (with little research activity)" scored 33% correct.
"Learning Researcher (professors too)" scored 31% correct.
"Manager of Instructional Development" scored 34% correct.
"Manager of Training" scored 34% correct.
"Student" scored 28% correct.
"Other" scored 29% correct.

Date of Taking the Quiz had little impact on average correct:

2002 scored 30%
2003 scored 33%
2004 scored 31%
2005 scored 33%
2006 scored 30%
2007 scored 33%

Conclusion

Because the sample of respondents is difficult to define, any conclusions must remain speculative. Still, the results suggest a massive competence gap. The 32% average score---and the stubborn lack of improvement regardless of experience, education, and age---suggests that most people in the learning-and-performance field are unprepared for roles as designers of learning, at least as far as their ability to apply knowledge of learning research.

Comments

Will, I am aghast at such responses! Do we all really know so little?

I do have a request: how about some feedback, like the opportunity to see the questions again with the answers, so we can try to learn something from it? Thanks, Ann

Ann, I've added links to the quiz feedback in the body of the post. Thanks!

I'm not aghast (surprised) but am appalled. I entered the field with Pipe and Mager (et al.) and have seen no progress in the practice or the competence of the practitioners.
Odiorne saw none, also. Has anyone?

In my experience, 70% in the trade are dullards, 20% are voluble trend fashionistas, and 10% are skilled, often self-taught.

Training, performance improvement, and task analysis were "rationalized" in WWII, and yet the trade still does not have a standards handbook on massed vs. distributed practice, for example.

Well, I got 2 out of 15 correct. That is substantially worse than the average, which is, as you point out, barely above what they would get from pure guesswork.

(Actually, the 32 percent is about exactly what you would expect. It's an old adage among trivia game players: 'when in doubt pick 3' (i.e., C, the middle response). And this quiz fits true to the pattern: A was the correct response 2 times, B 4 times, C 6 times, D 2 times, E none, and F once.)
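
The tally in the parenthetical above can be checked with a few lines of arithmetic. This Python sketch takes the answer-key counts exactly as quoted in the comment and computes the score someone would get by choosing the same letter on every question. It assumes every question offers at least the letter being picked, which holds for C since each question has three or more choices.

```python
# How often each letter was the keyed answer, as tallied in the comment.
key_counts = {"A": 2, "B": 4, "C": 6, "D": 2, "E": 0, "F": 1}

total_questions = sum(key_counts.values())
# Score from answering every question with the same letter.
always_pick = {letter: n / total_questions for letter, n in key_counts.items()}

best = max(always_pick, key=always_pick.get)
print(best, f"{always_pick[best]:.0%}")
```

Under that tally, always answering C yields 6 of 15, or 40%, which is above the 32% average reported in the post.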

All this is a round-about way of saying: have you considered the possibility that it's the quiz that's the problem, not the quiz-takers?

I mean - I went into the test with the expectation that I might not do well. I have a healthy doubt of my own abilities. But I am not a 2 out of 15 in my own field. That's an unreasonable result.

There is, in my view, a systematic flaw in this test. And it can be expressed generally as the following:

The test author believes (based on some research, which is never cited) that "Learning is better if F" where 'F' is some principle, such as "Performance objectives that most clearly specify the desired learner behavior will produce the best instructional design."

This principle is treated as linear. That is to say, the more the principle is exemplified in the answer (per the author's interpretation) the more learning will be better.

But these principles are not linear. There is a point of diminishing returns. There is a point at which slavish adherence to the principle produces more problems than good. Experienced designers understand this, and hence build some slack into the application of the principles.

Question 1 provides a good object lesson:

The feedback states: "Performance objectives that most clearly specify the desired learner behavior will produce the best instructional design."

Option B (which I selected) is: “As each web page is developed, and after the full website is developed, each web page should be tested in both Netscape Navigator and Internet Explorer.”

Option C (which is considered correct) is: Same as B, with the addition of the following: “One month after completing the training, learners should test each web page during its development at least twice 90 percent of the time, and test each web page once after the whole website is complete at least 98 percent of the time.”

Now the question is, is the performance objective "more clearly stated" in C than in B? According to the author (obviously) it is. But sometimes making things more precisely stated does not make them more clear. It does not even make them more precise.

Which is clearer:

a. Test the page after design

b. Test the page 98 percent of the time after design

In my view, (a) is clearer.

Moreover, (b) is no more precise than (a). Because what (a) *means* is "Test the page 100 percent of the time after design".

Therefore, it would be unreasonable to select (c) on the ground that it is clearer. The unthinking effort to make it more precise went over the top and resulted in a statement that is more an example of nonsense than clarity.

The entire test is constructed this way. I got a couple where it was pretty obvious what the examiner was looking for. But otherwise, I picked what I felt was the best answer, which in every case was the less extreme version of the over-the-top choice.

In question number 2, for example, the principle is: "When the learning and performance contexts are similar, more information will be retrieved from memory."

Well, this is generally true. But will somebody prepare better spending a week on the road, living in a hotel, unable to keep up with work at home in Boston or to be there to help the kids? Being on the road creates an impact. So even if the test is being conducted in San Francisco, there comes a point where the advantage of studying and testing in a similar environment is overwhelmed by the disadvantage of being on the road.

The test author created an extreme case - a test location in San Francisco instead of a test location in downtown Boston. Thus, complications that an experienced person would automatically take into account - the time lost in airports, the rigors of travel, etc. - are built into their thinking.

The only way to get through such questions is to be able to figure out what the author is looking for. In this case, I looked at the example and it was pretty clear that it would be based on 'similarity of environment' and not any real question about 'effective learning'. It was one of the two I got right.

But the author's intention is very deliberately disguised throughout the test. Or more accurately, the test addresses such a specific context that only people who work in that specific context have any real chance of divining the author's intent (and as it turns out, the context was so narrow it didn't even show up statistically).

This, I think, is one of the problems of testing generally, and not just this test in particular.

In a test like this, each question is designed to measure only one point of learning (more precisely: to measure responses only along one vector). Theoretically, you could have questions that measure more than one vector, but it results in confusing questions and too many possible responses.

If the test measures simple things, that's fine. The question of whether 2+2=4 is not going to be impacted by external considerations.

But if the test measures complex phenomena, then it is going to systematically misrepresent the student's understanding of the phenomena.

Specifically, a very simple one-dimensional understanding will fare as well as (and in this case, better than) a complex, nuanced understanding. People who understand a discipline as a set of one-dimensional principles will do the best - understanding simply becomes a case of picking which principle applies, then selecting the example that fits the best.

This test fails because it is too narrowly defined to let the simple understanders spot the principle being defined, and too dependent on single principles to give people who genuinely understand the phenomena any advantage.

The test author is right: don’t trust gurus.

Unfortunately, the test author didn't consider the possibility of recursion.

Stephen,

Thanks for taking the time to write such an exhaustive analysis of my test.

Your critique is intriguing, but off the mark.

The test clearly is not perfect. No test is perfect. Yes, it grades questions in a rather one-dimensional way, but that's okay for the purposes it was designed for---to get a general idea of how much people know about the most fundamental principles of learning.

It is not appropriate to think of the scoring system employed in comparison to the straw man of typical tests we are all used to. In typical tests, it is considered a failure to get an answer wrong. I evaluate my test results in a probabilistic manner, under the assumption that those who know more about learning will do better on the test. For example, an expert in learning may get a 75% not a 40%. This takes care of the dimensionality problem you mentioned.

One of the skills of being a professional is to be able to notice what aspects of the environment are most important to pay attention to. Your criticism of the multidimensionality of the test questions is misplaced. A truly knowledgeable person must know the most critical dimensions to pay attention to. That is one of the hallmarks of expertise---to know what is important.

Again, and I believe I have reiterated this many times, the test is admittedly not perfect. The test could, for example, be improved by validating it with real learning experts other than myself.

Here's a more ideal design, implementable with more time and resources than I had available:

1. Develop a set of questions based on fundamental research-based learning principles.

2. Select a group of learning researchers to offer suggestions for improvement of each test item. Then improve the questions.

2A. If we had unlimited time and resources, we could create experiments for each question, where we actually created different learning interventions for each answer choice and randomly assigned real learners to the various interventions to see which answer choice actually produced the best learning results. I didn't feel this was necessary since I based my questions on the collective wisdom of dozens of the world's most knowledgeable learning researchers.

3. Have a second group of learning researchers take the test as test takers. Look at the result for each question. Keep questions that are reliably answered in the "correct" way---in other words keep the questions in which all, or almost all, of the experts agree. Discard questions where the experts disagree.

4. Provide the remaining questions as the assessment.
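
Step 3 above, keeping only the questions on which the expert panel agrees, can be sketched in a few lines of Python. Everything here is hypothetical: the function name, the sample answers, and the 90% cutoff standing in for "all, or almost all" are illustrative assumptions, not part of any procedure that was actually run.

```python
def keep_agreed_questions(expert_answers, answer_key, min_agreement=0.9):
    """Return question ids where at least min_agreement of experts chose the keyed answer."""
    kept = []
    for question_id, answers in expert_answers.items():
        agreement = sum(a == answer_key[question_id] for a in answers) / len(answers)
        if agreement >= min_agreement:
            kept.append(question_id)
    return kept

# Hypothetical data: three questions, five experts each.
answer_key = {"Q1": "C", "Q2": "A", "Q3": "B"}
expert_answers = {
    "Q1": ["C", "C", "C", "C", "C"],  # unanimous, so kept
    "Q2": ["A", "B", "A", "D", "A"],  # experts disagree, so discarded
    "Q3": ["B", "B", "B", "B", "B"],  # unanimous, so kept
}
print(keep_agreed_questions(expert_answers, answer_key))
```

Where the cutoff is set determines how many questions survive, so it would need to be chosen before looking at the experts' answers.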

While this procedure would certainly result in a better test, I am confident that the test as currently designed still can separate the wheat from the chaff.

As you can imagine, using the exhaustive approach I described above would be rather onerous, which is why I created the test in a more straightforward manner.

By the way, since you commented on the lack of research citations in my feedback---an intentional design decision to keep the feedback reasonably short and accessible---let me say that each of these questions is based on dozens or hundreds of studies from the world's preeminent refereed journals on learning, instruction, and memory. To get access to my research, interested readers may visit my catalog of research reports at www.work-learning.com/catalog/

In your critique of my Question 1, you completely failed to include the actual scenario-based question. The context in which decisions are made---as can be included in the question scenarios---is critically important and goes back to the primacy of noticing. An expert in instructional design would have considered the importance of creating an objective that can be measured. Only the "correct" choice creates such an objective.

Critiques are most valuable if they include suggestions for improvement. Your critique complained about the limitations of testing, but did not suggest improvements. How would you create a test that assessed people's knowledge of fundamental learning principles?


Hi Will,
From your conclusions, either everyone who has taken your test is a dullard and/or is uneducated (including members of various professional organizations and, likely, some PhDs in education) or your test is flawed.

If Occam's Razor (the simplest explanation is the correct one) is applied here, then the problem is with your test. You challenge Stephen Downes to be more constructive with his feedback; please accept mine, as I have tried to do just that. I have found many problems; here is a sample:

- Confusing wording - Q1, Answer "C". I'm a big believer in the ABCD (audience, behaviour, condition, degree) approach to learning objectives, but when the written objective is so confusing, the simpler objective is better. The problem is that in the stem you talk about websites, but in the outcome you talk about webpages. Question stems and responses should be not just related, but the same, so when reading the stems, I couldn’t understand how someone would test a website 90% of the time – you either test it or you don’t. Of course, your objective was about testing webpages; even then the wording is odd and takes far too much cognitive processing.

- Content problem - Q3, Answer B - 10 min review, 20 min new material. Your feedback states “Choice A is best because it provides the most repetition.” so if Choice D would have been to provide 25 minutes of review and 5 minutes of new material then it would have been correct? Research suggests that an adult’s attention span is 15 to 20 minutes and that the first 10 minutes are most productive. If this is the case – then all of the productive time in your correct scenario has been spent on review with diminishing attention going to the new materials. Answer “A” balances 5 minutes of good review and then the remaining time for new material which seems more prudent and more grounded in theory.

- Too simplistic, not enough context - Q8 even in your feedback you acknowledge that you don’t know “It seems reasonable that, after a couple of hours of instruction, learners who receive interesting yet irrelevant information may be able to re-energize themselves to pay attention to further learning materials.” Yes, it does seem reasonable and the telling of on topic and off topic stories to vary pace, engage, refresh, motivate, entertain, etc. is the same in the classroom, boardroom, lunchroom and locker room. Done right, it can be a good thing and if this client is a professional comedian, then he should be able to do it right. You should remove this question from your statistics.

- Poorly written distracters – Q9. There are plenty of evaluation experts that will tell you that the six distracters that you provided are fraught with problems that render the question almost useless in its discriminating power. The key issue is not the use of statements like “Choices B and C are correct,” but their use mixed with distracters like “The prequestions will have little if any effect.” The fair way to write this question would be to enumerate all of the choices in the stem, then have all of the distracters match, so the distracters would be along the lines of “Statement 1 alone is correct” / “Statements 1 and 3 are correct”. BTW, your distracters in Q1 and Q5 (e.g. ‘same as B but add’) suffer from the same problem but for a different reason – I’m sure John Sweller and Ruth Colvin Clark talk about it in their books …

- ??? – Q11 your feedback says “They knew 50 of the answers and guessed correctly on 10 without really knowing the answer. How do we know this?” well, you don’t. I don’t know you or your work, but for someone who claims that the feedback they have written is based on research and who offers to sell that research – this is shocking. Any evaluation expert and/or statistician will say that the premise you base your feedback on is false. If what you wrote was true, then anyone who scores 100% on a test, really only knows 90% and guessed at the final 10%. You cannot twist probability in statistics like this. If probability in testing were so predictable then in any 4-distracter multiple choice test the minimum anyone can get is 25% because to do worse would be statistically impossible.

- Illogical question – Q14 asks “The instructional guru recommends that they develop two tests, one with questions and one without, and see which one works best.” This is obviously a zen question – what is a test with no questions? I guess it is a blank page. So which produces better learning outcomes – blank pages or pages with questions written on them?

- Missing Details – Q15 compares a classroom video seminar with an e-learning video-based course. Without further details about the elearning course, one could assume that a classroom seminar would include other participants and some kind of facilitator – which would mean there would also be opportunities for conversation, collaboration, questions and feedback. The inclusion of words like ‘classroom’ and ‘seminar’ connote all of the interaction that usually takes place in those environments. Thus without a better explanation of the context of instruction in both presentations the question leaves too much room for argument – so the answers cannot be argued definitively.
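
The Q11 point in the list above, that the number of lucky guesses is a random quantity and not a fixed share, can be illustrated with a short binomial calculation. This Python sketch assumes a hypothetical test of 15 questions with four choices each, every one answered by blind guessing; those numbers are chosen for illustration and are not the quiz's actual layout.

```python
from math import comb

def prob_at_most(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p): chance of k or fewer lucky guesses."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# Assumed setup: 15 questions, 4 choices each, every answer a blind guess.
n_questions, p_guess = 15, 0.25
below_chance = prob_at_most(3, n_questions, p_guess)  # 3 of 15 is 20%, under 25%
print(f"{below_chance:.2f}")
```

Under these assumptions a pure guesser lands below the 25% "chance" level a little under half the time, so 25% is an average over many guessers, not a floor for any one of them.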

I enjoyed this little test and you did present some arguments in the feedback that I found reassuring to my own practice and some that challenged my understanding. However I must disagree with you that it has any kind of predictive or explanatory power for the state of Instructional Technology. Occam’s Razor …

CR,

Thanks for your suggestions for improvements.

I think Stephen's point, though, was that the whole way of asking the questions was flawed, because they offered several interpretations, depending upon what the reader focused on.

As I wrote previously, this design actually enables expertise to be demonstrated because experts can tell what to focus on when given the real-world messiness of multidimensionality.

Will

Just discovered your blog and the test. Great resources - thanks.

However, I think CR has it spot on. The test is an excellent tool for self-assessment and learning but I wouldn't take it seriously as a way of evaluating competence. I did quite well :-)

Hi..can i get the test results for free? I am Jessa Marie Agnes. I have no pay pal account and i also have no credit cards. I take the Personality test because it is our assignment but i cannot get the results for i have no money to pay. i am still a college student here in the Philippines and i have just an allowance of 500php in a week..pls help me.I need the result. If you will send it to me for free just send it to my email address gabjessa@hotmail.com. Thank You.

The quiz feedback is free and is listed at the bottom of the original post. Here it is again:

http://www.work-learning.com/quiz_questions_feedback.htm
