We may be at an AI crossroads, especially with large language models. The pressure and the hype are still real, yet the verdict on the first generation of pilots seems to be that they are not going well. There are plenty of technical and otherwise prosaic reasons for this, but I want to focus on one framing mistake I see again and again, one I believe is an inevitable seed of failure: treating AI as a drop-in replacement for people instead of as a tool that enhances human work.
At the end of the day, to make any difference for your organization, AI has to touch the human world. If you treat AI like an autonomous black box and don’t invest in the human side of the equation - training, culture, oversight - all of the standard LLM shortcomings will surface, and the story they tell will be one of a deficient interface with the humans the system was supposed to serve.
A quick personal note: I started out very skeptical of generative AI. Initially, I did not think to ask much of myself regarding how I used it, and there is much about the "replacement" frame that encourages that headspace. Over time I realized that there is an art to using these systems. Prompting is a skill. The model isn’t a person, but it is worth communicating with it as carefully as you would with one.
That realization unlocked a tremendous amount of value for me. If you want to capture similar value, you need to invest in making sure your people are proficient with the tools you give them.
It's often helpful to use the benefit of hindsight and examine earlier, related trends. As "data science" rose in profile, organizations began to hire various sorts of quantitative experts and often expected immediate business breakthroughs. In many cases, what happened instead was failed communication and cultural mismatch: brilliant people retreated into technical work that didn’t align with business priorities. The result was wasted time, missed opportunities, and frustrated managers.
Intelligence, natural or artificial, is not a one-dimensional panacea. The journey that produces a highly technical person brings a certain culture and set of priorities with it - and that culture might not align with your company’s. If you don’t explicitly bridge those gaps through communication, shared expectations, and oversight, you’ll end up with outputs that are technically impressive but not useful.
The same is true of LLMs. Prompting and oversight are forms of communication. If teams don’t learn to “talk” to the model in ways that reflect their business needs, they will get outputs they cannot use. That mismatch can cause real problems - and sometimes real harm.
There was an illustrative, tragic incident at the National Eating Disorders Association: an LLM-powered chatbot intended to help people in crisis began dispensing crash-dieting advice - doing exactly the wrong things. This wasn’t simply a technical failure but also an operational one: inadequate supervision, insufficient monitoring, and a failure to treat the bot as part of a human-facing system that required training and oversight. If you were running a crisis hotline staffed by humans, you wouldn't assume that staff were a fire-and-forget solution who would get everything right forever without supervision... yet that appears to be how NEDA handled its AI system.
Technological disasters are rarely only about technology. The interface where humans and machines meet is almost always a critical failure point. Cybersecurity offers plenty of examples: you can design technically excellent systems, but if you don’t consider the broader system, including the human element, outcomes range from irrelevance to disaster.
So what should business leaders do?
Treat AI as a tool that enhances human capability.
Invest in training and develop cultural practices for interacting with AI tools.
Monitor outputs and build governance and oversight into workflows (a minimal sketch of what this can look like follows below).
In short: enhance, don’t replace.
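To make that last recommendation less abstract, here is a minimal sketch, in Python, of what "oversight built into the workflow" can look like. Everything in it is a hypothetical stand-in: generate_draft() represents whatever model call you actually make, and BLOCKED_PHRASES and the other policy checks are placeholders for rules your own domain experts would write. The shape is the point: every model output passes through explicit checks, and anything flagged is routed to a person rather than to a user.

```python
# Illustrative sketch only: the model call and the policy rules are placeholders.
from dataclasses import dataclass, field


# Phrases a human-facing system should never send unreviewed. In a real
# deployment this list would come from domain experts and be far richer.
BLOCKED_PHRASES = ["crash diet", "skip meals", "calorie deficit"]


@dataclass
class Draft:
    prompt: str
    text: str
    flags: list = field(default_factory=list)


def generate_draft(prompt: str) -> Draft:
    """Placeholder for the real model call; canned text keeps the demo self-contained."""
    canned = {
        "How do I lose weight fast?": "You could try a crash diet and skip meals.",
    }
    return Draft(prompt=prompt, text=canned.get(prompt, "Here is some general, supportive guidance."))


def run_policy_checks(draft: Draft) -> Draft:
    """Attach a flag for every rule the draft trips; any flag means human review."""
    lowered = draft.text.lower()
    for phrase in BLOCKED_PHRASES:
        if phrase in lowered:
            draft.flags.append(f"blocked phrase: {phrase!r}")
    if len(draft.text) < 20:
        draft.flags.append("suspiciously short response")
    return draft


def route(draft: Draft) -> str:
    """Send clean drafts onward; escalate flagged drafts to a person."""
    if draft.flags:
        # In production this would open a ticket or page an on-call reviewer, not just print.
        print(f"ESCALATED to human review ({draft.prompt!r}): {draft.flags}")
        return "escalated"
    print(f"SENT ({draft.prompt!r}): {draft.text}")
    return "sent"


if __name__ == "__main__":
    for prompt in ["How do I support a friend in crisis?",
                   "How do I lose weight fast?"]:
        route(run_policy_checks(generate_draft(prompt)))
```

The specific checks will always be domain-specific and imperfect. The design choice that matters is that human review is a step the workflow routes through, not an afterthought bolted on after something has already gone wrong.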