
    When Everyone Has the Same Training, No One Has an Advantage

    Bernhard Kerres · April 6, 2026 · 11 min read

    The corporate learning market just passed $400 billion. AI roleplay is now a bundled feature in platforms that sit on 42 million enterprise seats. Any L&D manager can check the "conversation practice" box without procuring a single new tool. And every company's leaders are now practicing with the same generic scenarios, the same agreeable AI, the same five-minute exercises.

    So here's the question no one is asking: if every organization trains their leaders with the same off-the-shelf content, where does the competitive advantage come from?


    The market split no one talks about

    Something counterintuitive is happening in corporate learning. Commodity platforms are growing — the global e-learning market is projected to exceed $375 billion by 2026. Self-paced, scalable, affordable. Millions of seats purchased.

    And at the same time, premium executive education is growing faster. The executive education market is projected to grow from $9.8 billion in 2025 to $28.3 billion by 2035. Harvard Business School's executive education division hit a record $245 million in tuition revenue in FY2024 — up 9% from the year before, while charging more per participant than ever. Customized programs now make up 58% of all executive education spending.

    The market didn't choose one or the other. It split in two. Commodity for breadth. Bespoke for depth. Compliance training, technical onboarding, software skills — that's where self-paced platforms excel. Nobody needs a custom-designed course to learn Excel.

    But leadership? The conversation where a director needs to realign a resistant team around a new strategy? The feedback session that determines whether a high-potential stays or leaves? The negotiation that shapes a partnership for the next five years?

    Organizations are spending more on those moments, not less. Despite having cheaper options available. That should tell us something.


    The completion problem — and why it's the wrong metric

    The most commonly cited metric in corporate learning is completion rate. And the numbers are stark.

    Self-paced online learning platforms consistently show completion rates between 5% and 15%. This has remained largely unchanged across multiple years of data. Cohort-based programs with instructor involvement and peer accountability consistently achieve completion rates between 85% and 90% — a finding confirmed across studies by Josh Bersin, the Harvard Business Review (2023), and Course Report (2024).

    The difference is roughly six- to eighteenfold (85/15 ≈ 5.7 at the low end; 90/5 = 18 at the high end). That alone should give any L&D leader pause before assuming that making content available is the same as developing capability.

    For context: across 433 completed sessions on RolePlays.ai, our completion rate is 95%. The average session lasts 49 minutes — not a five-minute exercise, but a sustained, meaningful practice conversation. Users don't drop off, because the experience demands something of them: realistic pushback, genuine difficulty, and feedback tied to specific moments in their conversation. When practice feels real, people stay.

    But here's the harder truth: completion is the wrong metric. Even a 95% completion rate doesn't tell you whether behavior changed. A leader can complete a session, read the feedback, earn a strong score — and still freeze when the real conversation happens. Completion tells you someone engaged with the practice. It tells you nothing about whether they can use it.

    That's why measurement needs to go further.


    What Kirkpatrick taught us seventy years ago (that we still ignore)

    Donald Kirkpatrick's four-level evaluation model has been the standard framework for measuring training effectiveness since the 1950s. The four levels are straightforward:

    Level 1 — Reaction. Did participants enjoy the training? The "smile sheet."

    Level 2 — Learning. Did they acquire new knowledge or skills? The quiz, the assessment, the certificate.

    Level 3 — Behavior. Did they actually apply what they learned on the job? Did their conversations, decisions, and actions change?

    Level 4 — Results. Did the behavioral change produce measurable business outcomes?
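
    One toy way to see the shape of the model is as an evidence ladder, where each level demands a harder kind of proof than the one below it. The example metrics in this sketch are illustrative choices, not something Kirkpatrick's framework prescribes.

    ```python
    # The four Kirkpatrick levels as an evidence ladder. The example
    # metrics are illustrative, not prescribed by the model itself.

    KIRKPATRICK = {
        1: ("Reaction", "Did they enjoy it?", ["satisfaction survey"]),
        2: ("Learning", "Did they acquire knowledge or skills?", ["quiz score", "certificate"]),
        3: ("Behavior", "Did they apply it on the job?", ["observed conversation change"]),
        4: ("Results", "Did it move business outcomes?", ["retention delta", "deal outcomes"]),
    }

    for level, (name, question, evidence) in KIRKPATRICK.items():
        print(f"Level {level} ({name}): {question}  e.g. {', '.join(evidence)}")
    ```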

    The model is clear. The industry's application of it is not.

    According to ATD research, 90% of organizations measure Level 1. 83% measure Level 2. But only 35% consistently evaluate Level 3 or Level 4. The vast majority of corporate learning measurement stops at "did they like it?" and "did they pass the test?" — two metrics that research has shown do not reliably predict whether behavior actually changes.

    Most commodity learning platforms are structurally designed to measure Levels 1 and 2. They track satisfaction scores, completion rates, quiz results, and time spent. These are easy metrics. They fill dashboards. They make quarterly reports look productive.

    But they don't answer the question that matters: when your leader walked into the room, did they do something different?


    The Stanford proof: equal knowledge, radically different behavior

    A 2024 study from Stanford University (Shaikh et al., published at CHI '24, the premier Human-Computer Interaction conference) ran a controlled experiment that maps precisely onto the Kirkpatrick framework — and exposes the gap.

    Forty participants were split into two groups. Both received identical training material on conflict resolution strategies. One group studied the material only. The other group also practiced with an AI simulation before facing an actual conflict with a real person.

    At Level 2 (Learning), both groups scored equally on a knowledge quiz. Same strategies. Same recall. Same recognition. On paper, identically prepared.

    At Level 3 (Behavior), the results were dramatically different. The group that had practiced reduced their use of competitive strategies — threats, ultimatums, appeals to authority — by 67%. They doubled their use of cooperative strategies. The control group, armed with identical knowledge, couldn't translate what they'd learned into action.

    One participant from the study-only group said it plainly: reading about the strategies was easy, but implementing them felt impossible.

    This is the gap that Kirkpatrick Level 2 metrics will never reveal. And it's the gap that most corporate learning programs never close — because they stop measuring before behavior enters the picture.


    Why generic AI makes the problem worse

    The Stanford researchers discovered something else that matters: standard AI language models are fundamentally unsuited for practicing difficult conversations. They're too agreeable. They cave too quickly. They don't maintain realistic pushback.

    This is by design. Consumer AI is optimized for user satisfaction and engagement — which means it tells you what you want to hear. When a leader practices giving critical feedback to an AI that immediately accepts the feedback and agrees to improve, that leader walks away feeling confident. They've practiced nothing. They've rehearsed success against an opponent that didn't show up.

    The Stanford team had to build a sophisticated multi-step architecture — classifying conflict strategies at each turn, planning the AI's next move based on negotiation theory, scoring responses for their impact on the conversation's trajectory — to create what they called a "Goldilocks zone." A simulation challenging enough to be useful, but not so rigid that practice becomes futile.
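
    To make the shape of that loop concrete, here is a minimal, illustrative sketch of a classify-plan-score turn cycle. To be clear, this is not the Stanford team's code: the keyword matcher stands in for their LLM-based strategy classifier, and the resistance updates and scoring rule are assumptions invented for this sketch.

    ```python
    # Illustrative classify -> plan -> score turn loop for a practice
    # simulator. NOT the Stanford Rehearsal implementation; the cue
    # lists, resistance updates, and scores are invented for this sketch.

    COMPETITIVE_CUES = ("or else", "you have to", "my manager says")
    COOPERATIVE_CUES = ("what if", "i understand", "could we")

    def classify_strategy(user_turn: str) -> str:
        """Stage 1: label the learner's move (an LLM classifier in the paper)."""
        turn = user_turn.lower()
        if any(cue in turn for cue in COMPETITIVE_CUES):
            return "competitive"
        if any(cue in turn for cue in COOPERATIVE_CUES):
            return "cooperative"
        return "neutral"

    def plan_move(strategy: str, resistance: float) -> tuple[str, float]:
        """Stage 2: choose the persona's counter-move and update its resistance.

        Cooperative moves earn gradual concessions; competitive moves harden
        the persona instead of producing the instant agreement a vanilla
        chatbot tends toward.
        """
        if strategy == "cooperative":
            resistance = max(0.0, resistance - 0.15)
            move = "concede_partially" if resistance < 0.4 else "ask_clarifying_question"
        elif strategy == "competitive":
            resistance = min(1.0, resistance + 0.15)
            move = "restate_position"
        else:
            move = "deflect"
        return move, resistance

    def score_turn(strategy: str) -> int:
        """Stage 3: score the turn's impact on the conversation trajectory."""
        return {"cooperative": 1, "neutral": 0, "competitive": -1}[strategy]

    def run_session(turns: list[str], resistance: float = 0.7) -> int:
        """Drive the loop over a scripted session; return the trajectory score."""
        total = 0
        for turn in turns:
            strategy = classify_strategy(turn)
            move, resistance = plan_move(strategy, resistance)
            total += score_turn(strategy)
            print(f"learner: {strategy:11} -> persona: {move} (resistance {resistance:.2f})")
        return total

    if __name__ == "__main__":
        session = [
            "You have to accept this deadline, or else.",
            "I understand this is a big change for your team.",
            "What if we phased the rollout over two sprints?",
        ]
        print("trajectory score:", run_session(session))
    ```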

    This matters because the fastest-growing category in corporate learning is now "AI roleplay as a feature" — bundled into platforms that millions of employees already access. The feature exists. The question is whether it produces behavioral change, or just the feeling of it.


    What you invest in tells your leaders what they're worth

    Here's where this stops being about platforms and starts being about people.

    When an organization hands its leaders a generic chatbot and a library of pre-built scenarios — the same scenarios available to every other company on the same platform — it sends a message. The message is: your development is a checkbox. We've provided access. Good luck.

    When an organization invests in understanding its specific leadership challenges, builds scenarios grounded in its culture and strategic context, and designs practice around proven frameworks with expert facilitation — that sends a different message. The message is: your growth matters to us. We built this for you.

    Leaders notice.

    LinkedIn's own research (2024 Workplace Learning Report) shows that companies with strong learning cultures report 57% retention rates, compared to 27% for companies with only moderate learning cultures. Gallup found that personalized learning experiences increase retention by 47%. And 93% of employees say they're more likely to stay with an organization that invests in their career development.

    But here's the finding from LinkedIn's data that should give commodity-platform buyers pause: employees spend more time developing skills when learning is tied to their specific career goals. A mass-developed curriculum proved less valuable than identifying the specific skills each employee needs.

    The data doesn't say "give people access to a library." It says "invest in them specifically." There's a world of difference between the two.


    What bespoke actually means

    I was in Germany recently, training the management team of a bank on nonviolent communication. The theory part went well — they understood the framework, they did the exercises, they could explain the principles back to me. All good.

    But the real shift happened when they used a specifically tailored roleplay scenario we'd built for their context. That was the moment it stuck. Not the lecture. Not the framework on the whiteboard. The practice — with a persona that felt like someone from their world.

    And then something happened that I've come to recognize as the sign that bespoke training is working: one of the participants came to me immediately afterward and said, "Bernhard, could we model one of the personas like this?" And he gave me very specific instructions — a particular kind of resistance, a particular communication pattern, a particular dynamic from his team. "That's exactly what my people need to practice with."

    That's the moment a generic platform will never produce. The participant stopped being a consumer of content and started co-designing their own development. They didn't want a library. They wanted their conversation.

    When we work with an organization at RolePlays.ai, that's what we build toward. We start with their learning objectives — not our scenario library. What leadership challenge are they actually trying to solve? What does their culture reward and punish? What conversations are their people avoiding?

    From there, we build scenarios grounded in proven frameworks — McKinsey, ICF, Kegan's Immunity to Change — with personas designed to reflect the real dynamics of their organization. Not generic "difficult employees," but the specific kind of resistance, deflection, and complexity that their leaders actually face.

    Our AI doesn't agree after two turns. It pushes back. It deflects. It changes direction mid-conversation. It creates the realistic pressure that makes practice meaningful — the same kind of pressure the Stanford researchers found essential for producing behavioral change.
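
    For illustration only, here is the kind of persona brief that sits behind such a scenario. Every field in this sketch is hypothetical; it shows the idea, not our production schema.

    ```python
    # Hypothetical persona brief for a bespoke roleplay scenario.
    # Field names are invented for this sketch and do not reflect
    # the RolePlays.ai implementation.

    persona = {
        "role": "regional sales director, twelve years at the company",
        "context": "told to adopt a strategy she argued against last quarter",
        "resistance_patterns": [
            "deflects feedback onto process and resourcing problems",
            "cites past successes to dismiss the new direction",
            "changes topic mid-conversation when cornered",
        ],
        "forbidden_behaviors": [
            "agreeing within the first two turns",
            "accepting vague feedback without asking for a concrete example",
        ],
        "concession_condition": "sustained, specific acknowledgment of her concerns",
    }
    ```

    The format is beside the point. What matters is that the resistance pattern comes from the client's world, not from a generic scenario library.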

    And because we work with facilitators and coaches, the practice doesn't exist in isolation. It's debriefed. It's connected to a larger learning journey. It's designed for Kirkpatrick Level 3 from the start — not as an afterthought.

    This takes more time, more expertise, and more investment than activating a feature in an existing platform. That's the point. The conversations that define leadership careers deserve more than a checkbox.


    Designing for Level 3

    In a forthcoming piece, we'll take Kirkpatrick's four levels and turn them around — not as an evaluation tool applied after training, but as a design methodology that shapes training from the start. What does it mean to begin with the business outcome you need, define the behavioral changes required to get there, and only then build the learning experience? That's the approach we use when we go into an organization, and it's fundamentally different from the content-first model that dominates commodity platforms.

    Because the question was never whether your leaders have access to training.

    The question is whether they can do something different on Monday morning.


    If you're ready to move beyond Level 2, let's talk.


    References

    Shaikh, O., Chai, V., Gelfand, M. J., Yang, D., & Bernstein, M. S. (2024). Rehearsal: Simulating conflict to teach conflict resolution. Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI '24). ACM. https://doi.org/10.1145/3613904.3642159

    Kirkpatrick, D. L. (1994). Evaluating Training Programs: The Four Levels. Berrett-Koehler Publishers.

    Future Market Insights. (2025). Executive Education Program Market: Global Analysis Report 2025–2035.

    Harvard Business School. (2024). FY2024 Annual Report: Executive Education Division.

    ATD (Association for Talent Development). Research on Training Evaluation Practices.

    LinkedIn Learning. (2024). 2024 Workplace Learning Report.