Clark Quinn and I have started debating top-tier issues in the workplace learning field. In the first one, we debated who has the ultimate responsibility in our field. In the second one, we debated whether the tools in our field are up to the task.
In this third installment of the series, we've engaged in an epic battle about the worth of the 4-Level Kirkpatrick Model. Clark and I believe that these debates help elucidate critical issues in the field. I also think they help me learn. This debate still intrigues me, and I know I'll come back to it in the future to gain wisdom.
And note, Clark and I certainly haven't resolved all the issues raised. Indeed, we'd like to hear your wisdom and insights in the comments section.
--------------------------
Will:
I want to pick on the second-most renowned model in instructional design, the 4-Level Kirkpatrick Model. It produces some of the most damaging messaging in our industry. Here’s a short list of its treacherous triggers:
- It completely ignores the importance of remembering to the instructional design process.
- It pushes us learning folks away from a focus on learning—where we have the most leverage.
- It suggests that Level 4 (organizational results) and Level 3 (behavior change) are more important than measuring learning—but this is an abdication of our responsibility for the learning results themselves.
- It implies that Level 1 (learner opinions) is on the causal chain from training to performance, but two major meta-analyses show this to be false—smile sheets, as now utilized, are not correlated with learning results!
If you force me, I’ll share a quote from a top-tier research review that damns the Kirkpatrick model with a roar. “Buy the ticket, take the ride.”
Clark:
I laud that you’re not mincing words! And I’ll agree and disagree. To address your concerns: 1) Kirkpatrick is essentially orthogonal to the remembering process. It’s not about learning, it’s about aligning learning to impact. 2) I also think that Kirkpatrick doesn’t push us away from learning, though it isn’t exclusive to learning (despite everyday usage). Learning isn’t the only tool, and we should be willing to use job aids (read: performance support) or any other mechanism that can impact the organizational outcome. We need to be performance consultants! 3) Learning in and of itself isn’t important; it’s what we’re doing with it that matters. You could ensure everyone could juggle chainsaws, but unless it’s Cirque du Soleil, I wouldn’t see the relevance.
So I fully agree with Kirkpatrick on working backwards from the org problem and figuring out what we can do to improve workplace behavior. Level 2 is about learning, which is where your concerns are, in my mind, addressed. But then you need to go back and see if what they’re able to do now is what is going to help the org! And I’d counter that the thing I worry about is the faith that if we do learning, it is good. No, we need to see if that learning is impacting the org. 4) Here’s where I agree: Level 1 (and his numbering) led people down the garden path; people seem to think it’s ok to stop at Level 1! Which is maniacal, because what learners think has essentially zero correlation with whether it’s working (as you aptly say). So it has led to some really bad behavior, serious enough to make me think it’s time for some recreational medication!
Will:
Actually, I’m flashing back to grad school. “Orthogonal” was one of the first words I remember learning in the august halls of my alma mater. But my digression is perpendicular to this discussion, so forget about it! Here’s the thing. A model that is supposed to align learning to impact ought to have some truth about learning baked into its DNA. It’s less than half-baked, in my not-so-humble opinion.
As they might say in the movies, the Kirkpatrick Model is not one of God's own prototypes! We're responsible people, so we ought to have a model that doesn’t distract us from our most important leverage points. Working backward is fine, but we’ve got to go all the way through the causal path to get to the genesis of the learning effects. Level 1 is a distraction, not a root. Yes, Level 2 is where the K-Model puts learning, but learning back in 1959 is not the same animal that it is today. We actually have a pretty good handle on how learning works now. Any model focused on learning evaluation that omits remembering is a model with a gaping hole.
Clark:
Ok, now I’m confused. Why should a model of impact need to have learning in its genes? I don’t care whether you move the needle with performance support, formal learning, or magic jelly beans; what K talks about is evaluating impact. What you measure at Level 2 is whether they can do the task in a simulated environment. Then you see if they’re applying it at the workplace, and whether it’s having an impact.
No argument that we need an approach to evaluate whether we’re having the impact at Level 2 that we should, but to me that’s a separate issue. Kirkpatrick just doesn’t care what tool we’re using, nor should it. Kirkpatrick doesn’t care whether you’re using behavioral, cognitive, constructivist, or voodoo magic to make the impact, as long as you’re trying something.
We should be defining our metric for Level 2, arguably, to be some demonstrable performance that we think is appropriate, but I think the model can safely be ignorant of the measures we choose at Levels 2, 3, and 4. It’s about making sure we have the chain. I’d be worried, again, that talking about learning at Level 2 might let folks off the hook about Levels 3 and 4 (which we see all too often) and make it a matter of faith. So I’m gonna argue that including the learning in the K model is less optimal than keeping it independent. Why make it more complex than need be? So, now, what say you?
Will:
Clark! How can you say the Kirkpatrick model is agnostic to the means of obtaining outcomes? Level 2 is “LEARNING!” It’s not performance support, it’s not management intervention, it’s not methamphetamine. Indeed, the model was focused on training.
The Kirkpatricks (Don and Jim) have argued—I’ve heard them live and in the flesh—that the four levels represent a causal pathway from 1 to 4. In addition, the notion of working backward implies that there is a causal connection between the levels. The four-level model implies that a good learner experience is necessary for learning, that learning is necessary for on-the-job behavior, and that successful on-the-job behavior is necessary for positive organizational results. Furthermore, almost everybody interprets it this way.
The four levels imply impact at each level, but look at all the factors that they are missing! For example, learners need to be motivated to apply what they’ve learned. Where is that in the model? Motivation can be an impact too! We as learning professionals can influence motivation. There are other impacts we can make as well. We can make an impact on what learners remember, whether learners are supported back on the job, etc.
Here’s what a 2012 seminal research review from a top-tier scientific journal concluded: “The Kirkpatrick framework has a number of theoretical and practical shortcomings. [It] is antithetical to nearly 40 years of research on human learning, leads to a checklist approach to evaluation (e.g., ‘we are measuring Levels 1 and 2, so we need to measure Level 3’), and, by ignoring the actual purpose for evaluation, risks providing no information of value to stakeholders” (p. 91). That’s pretty damning!
Clark:
I don’t see the Kirkpatrick model as an evaluation of the learning experience, but rather of the learning impact. I see it as determining the effect of a programmatic intervention on an organization. Sure, there are lots of other factors (motivation, org culture, effective leadership), but if you try to account for everything in one model you’re going to accomplish nothing. You need some diagnostic tools, and Kirkpatrick’s model is one.
If they can’t perform appropriately at the end of the learning experience (Level 2), that’s not a Kirkpatrick issue; the model just lets you know where the problem is. Once they can, and it’s not showing up in the workplace (Level 3), then you get into the org factors. It is about creating a chain of impact on the organization, not evaluating the learning design. I agree that people misuse the model, so when people only do 1 or 2, they’re wasting time and money. Kirkpatrick himself said he should’ve numbered it the other way around.
Now if you want to argue that that, in itself, is enough reason to chuck it, fine, but then let’s replace it with another impact model: a different name, the same intent of focusing on the org impact, workplace behavior changes, and then the intervention. I hear a lot of venom directed at the Kirkpatrick model, but I don’t see it as ‘antithetical to learning’.
And I worry the contrary; I see too many learning interventions done without any consideration of the impact on the organization. Not just compliance, but ‘we need a course on X’ and they do it, without ever looking to see whether a course on X will remedy the biz problem. What I like about Kirkpatrick is that it does (properly used) put the focus on the org impact first.
Will:
Sounds like you’re holding on to Kirkpatrick because you like its emphasis on organizational performance. Let’s examine that for a moment. Certainly, we’d like to ensure that Intervention X produces Outcome Y. You and I agree. Hugs all around. Let’s move away from learning for a moment. Let’s go Mad Men and look at advertising. Today, advertising is very sophisticated, especially online advertising because companies can actually track click-rates, and sometimes can even track sales (for items sold online). So, in a best-case scenario, it works this way:
- Level 1 – Web surfers say they like the advertisement.
- Level 2 – Web surfers show comprehension by clicking on link.
- Level 3 – Web surfers spend time reading/watching on splash page.
- Level 4 – Web surfers buy the product offered on the splash page.
A business person’s dream! Except that only a very small portion of sales actually happen this way (although, I must admit, the rate is increasing). But let’s look at a more common example. When a car is advertised, it’s impossible to track advertising through all four levels. People who buy a car at a dealer can’t be definitively tracked to an advertisement.
So, would we damn our advertising team? Would we ask them to prove that their advertisement increased car sales? Certainly, they are likely to be asked to make the case…but it’s doubtful anybody takes those arguments seriously… and shame on folks who do!
In case I’m ignorant of how advertising works behind the scenes—which is a possibility; I’m a small “m” mad man—let me use some other organizational roles to make my case.
- Is our legal team asked to prove that their performance in defending a lawsuit is beneficial to the company? No, everyone appreciates their worth.
- Do our recruiters have to jump through hoops to prove that their efforts have organizational value? They certainly track their headcounts, but are they asked to prove that those hires actually do the company good? No!
- Do our maintenance staff have to get out spreadsheets to show how their work saves on the cost of new machinery? No!
- Do our office cleaning professionals have to utilize regression analyses to show how they’ve increased morale and productivity? No again!
There should be a certain disgust in feeling we have to defend our good work every time…when others don’t have to.
I use the Mad Men example to say that all this OVER-EMPHASIS on proving that our learning is producing organizational outcomes might be a little too much. A couple of drinks is fine, but drinking all day is likely to be disastrous.
Too many words is disastrous too…But I had to get that off my chest…
Clark:
I do see a real problem in communication here, because I see that the folks you cite *do* have to have an impact. They aren’t just assumed to be effective; they have to meet some level of effectiveness. To use your examples: the legal team has to justify its activities in terms of the impact on the business. If they’re too tightened down about communications in the company, they might limit liability, but they can also stifle innovation. And if they don’t provide suitable prevention against legal action, they’re turfed out. Similarly, recruiters have to show that they’re not interviewing too many or too few people, and that they’re getting the right ones. They’re held up against retention rates and other measures. The maintenance staff does have to justify headcount against the maintenance costs, and those costs against the alternative of replacing equipment (or outsourcing the servicing). And the office cleaning folks have to ensure they’re meeting environmental standards at an efficient rate. There are standards of effectiveness everywhere in the organization except L&D. Why should we be special?
Let’s go on: sales has to estimate numbers for each quarter, and put that up against costs. They have to hit their numbers, or explain why (and if their initial estimates are low, they can be chastised for not being aggressive enough). They also worry about the cost of sales, hit rates, and time to a signature. Marketing, too, has to justify expenditure. To use your example, they do care about how many people come to the site, how long they stay, how many pages they hit, etc. And they try to improve these. At the end of the day, the marketing investment has to impact sales. Eventually, they do track site activity to dollars. They have to. If they don’t, we get boondoggles. If you don’t rein in marketing initiatives, you get shenanigans where existing customers are boozed up and given illegal gifts that eventually cause a backlash against the company. Shareholders get a wee bit stroppy when they find that investments aren’t paying off, and that the company is losing money unnecessarily.
It’s not a case of ‘if you build it, it is good’! You and I both know that much of what is done in the name of formal learning (and org L&D activity in general) isn’t valuable. People take orders and develop courses where a course isn’t needed. Or create learning events that don’t achieve the outcomes. Kirkpatrick is the measure that tracks learning investments back to impact on the business, and that’s something we have to start paying attention to. As someone once said, if you’re not measuring, why bother? Show me the money! And if you’re just measuring your efficiency (that your learning is having the desired behavioral change), how do you know that behavior change is necessary to the organization?

Until we get out of the mode where we do the things we do on faith, and start understanding whether we have a meaningful impact on the organization, we’re going to continue to be the last to have an influence on the organization, and the first to be cut when things are tough. Yet we have the opportunity to be as critical to the success of the organization as IT! I can’t stand by and watch us continue to do learning without knowing that it’s of use. Yes, we do need to measure our learning for effectiveness as learning, as you argue, but we also have to know that what we’re helping people be able to do is what’s necessary. Kirkpatrick isn’t without flaws (the numbering, Level 1, etc.), but it’s a clear value chain that we need to pay attention to. I’m not saying this in lieu of measuring our learning effectiveness, but in addition. I can’t see it any other way.
Will:
Okay, I think we’ve squeezed the juice out of this tobacco. I would have said “orange” but the Kirkpatrick Model has been so addictive for so long…and black is the new orange anyway…
I want to pick up on your great examples of individuals in an organization needing to have an impact. You noted, appropriately, that everyone must have an impact. The legal team has to prevent lawsuits, recruiters have to find acceptable applicants, maintenance has to justify their worth compared to outsourcing options, cleaning staff have to meet environmental standards, salespeople have to sell, and so forth.
Here is the argument I’m making: Employees should be held to account within their circles of maximum influence, and NOT so much in their circles of minimum influence.
So for example, let’s look at the legal team.

Doesn’t it make sense that the legal team should be held to account for the number of lawsuits and amount paid in damages more than they should be held to account for the level of innovation and risk taking within the organization?
What about the cleaning professionals?

Shouldn’t we hold them more accountable for measures of perceived cleanliness and targeted environmental standards than for the productivity of the workforce?
What about us learning-and-performance professionals?

Shouldn’t we be held more accountable for whether our learners comprehend and remember what we’ve taught them than for whether they end up increasing revenue and lowering expenses?
I agree that we learning-and-performance professionals have NOT been properly held to account. As you say, “There are standards of effectiveness everywhere in the organization except L&D.” My argument is that we, as learning-and-performance professionals, should have better standards of effectiveness—but that we should have these largely within our circles of maximum influence.
Among other things, we should be held to account for the following impacts:
- Whether our learning interventions create full comprehension of the learning concepts.
- Whether they create decision-making competence.
- Whether they create and sustain remembering.
- Whether they promote a motivation and sense-of-efficacy to apply what was learned.
- Whether they prompt actions directly, particularly when job aids and performance support are more effective.
- Whether they enable successful on-the-job performance.
- Et cetera.
Final word, Clark?
Clark:
First, I think you’re hoist by your own petard. You’re comparing apples and your squeezed orange. Legal is measured by lawsuits, cleaning by cleanliness, and learning by learning. Ok, that sounds good, except that legal is measured by lawsuits against the organization. And cleaning is measured by the cleanliness of the premises. Where’s the learning equivalent? It has to be: impact on decisions that affect organizational outcomes. None of the classic learning evaluations evaluate whether the objectives are right, which is what Kirkpatrick does. They assume that, basically, and then evaluate whether they achieve the objective.
That said, Will, if you can throw around diagrams, I can too. Here’s my attempt to represent the dichotomy. Yes, you’re successfully addressing the impact of the learning on the learner. That is, can they do the task? But I’m going to argue that that’s not what Kirkpatrick is for. It’s to address the impact of the intervention on the organization. The big problem is, to me, whether the objectives we’ve developed the learning to achieve are aligned with organizational need. There’s plenty of evidence they’re not.

So here I’m trying to show what I see K doing. You start with the needed business impact: more sales, lower compliance problems, what have you. Then you decide what has to happen in the workplace to move that needle. Say it’s shorter time to sales, so the behavior is decided to be timeliness in producing proposals. Let’s say the intervention is training on the proposal template software. You design a learning experience to address that objective, to develop the ability to use the software. You use the type of evaluation you’re talking about to see if it’s actually developing their ability. Then you use K to see if it’s actually being used in the workplace (are people using the software to create proposals?), and then to see if it’s affecting your metrics of quicker turnaround. (And, yes, you can see if they like the learning experience, and adjust that.)
And if any one element isn’t working (learning, uptake, impact), you debug that. But K is evaluating the impact process, not the learning design. It should flag if the learning design isn’t working, but it’s not evaluating your pedagogical decisions, etc. It’s not focusing on what the Serious eLearning Manifesto cares about, for instance. That’s what your learning evaluations do: they check to see if Level 2 is working. But not whether Level 2 is affecting Level 4, which is what ultimately needs to happen. Yes, we need Level 2 to work, but then the rest has to fall in line as well.
My point about orthogonality is that K is evaluating the horizontal, and you’re saying it should address the vertical. That, to me, is like saying we’re going to see if the car runs by ensuring the engine runs. Even if it does, if the engine isn’t connected through the drivetrain to the wheels, it’s irrelevant. So we do want a working, well-tuned engine, but we also want a clutch or torque converter, transmission, universal joint, driveshaft, differential, etc. Kirkpatrick looks at the drivetrain; learning evaluations look at the engine.
We don’t have to come to a shared understanding, but I hope this at least makes my point clear.
Will:
Okay readers! Clark and I have fought to a stalemate… He says that the Kirkpatrick model has value because it reminds us to work backward from organizational results. I say the model is fatally flawed because it doesn’t incorporate wisdom about learning. Now it’s your turn to comment. Can you add insights? Please do!