Your generative AI project is going to fail

Your genAI project is almost certainly going to fail. But take heart: You probably shouldn’t have been using AI to solve your business problem, anyway. This seems to be an accepted fact among the data science crowd, but that wisdom has been slow to reach enterprise executives. For example, data scientist Noah Lorang once suggested, “There is a very small subset of business problems that are best solved by machine learning; most of them just need good data and an understanding of what it means,” yet 87% of those surveyed by Bain & Company said they’re developing genAI applications.

For some, that’s the exact right approach. For many others, it’s not.

We have collectively gotten so far ahead of ourselves with genAI that we’re setting ourselves up for failure. That failure comes from a variety of sources, including data governance or data quality issues, but the primary problem right now is expectations. People dabble with ChatGPT for an afternoon and expect it to be able to resolve their supply chain issues or customer support questions. It won’t. But AI isn’t the problem; we are.

“Expectations set purely based on vibes”

Shreya Shankar, a machine learning engineer at Viaduct, argues that one of the blessings and curses of genAI is that it seemingly eliminates the need for data preparation, which has long been one of the hardest aspects of machine learning. “Because you’ve put in such little effort into data preparation, it’s very easy to get pleasantly surprised by initial results,” she says, which then “propels the next stage of experimentation, also known as prompt engineering.”

Rather than do the hard, dirty work of data preparation, with all the testing and retraining to get a model to yield even remotely useful results, people are jumping straight to dessert, as it were. This, in turn, leads to unrealistic expectations: “Generative AI and LLMs are a little more interesting in that most people don’t have any form of systematic evaluation before they ship (why would they be forced to, if they didn’t collect a training dataset?), so their expectations are set purely based on vibes,” Shankar says.

Vibes, as it turns out, are not a good data set for successful AI applications.

The real key to machine learning success is something that is mostly missing from genAI: the constant tuning of the model. “In ML and AI engineering,” Shankar writes, “teams often expect too high of accuracy or alignment with their expectations from an AI application right after it’s launched, and often don’t build out the infrastructure to continually inspect data, incorporate new tests, and improve the end-to-end system.” It’s all the work that happens before and after the prompt, in other words, that delivers success. For genAI applications, partly because of how fast it is to get started, much of this discipline is lost.
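To make “systematic evaluation” concrete, here is a minimal sketch of a repeatable check that runs before and after every prompt change. It is an illustration under stated assumptions, not Shankar’s method: stub_model is a hypothetical stand-in for a real genAI call, and the prompts and checks are invented for the example.

```python
from typing import Callable, List, Tuple

# Each case pairs a prompt with a check the response must satisfy.
EvalCase = Tuple[str, Callable[[str], bool]]


def stub_model(prompt: str) -> str:
    """Hypothetical stand-in for a real genAI call; returns canned text."""
    canned = {
        "What is your refund window?": "Refunds are accepted within 30 days.",
        "Do you ship to Canada?": "I'm not sure.",
    }
    return canned.get(prompt, "")


def run_evals(
    model: Callable[[str], str], cases: List[EvalCase]
) -> Tuple[float, List[str]]:
    """Return the pass rate and the prompts whose responses failed their check."""
    if not cases:
        raise ValueError("at least one eval case is required")
    failures = [prompt for prompt, check in cases if not check(model(prompt))]
    pass_rate = 1 - len(failures) / len(cases)
    return pass_rate, failures


# Invented example cases; a real suite grows as new failures are discovered.
CASES: List[EvalCase] = [
    ("What is your refund window?", lambda r: "30 days" in r),
    ("Do you ship to Canada?", lambda r: "yes" in r.lower()),
]

if __name__ == "__main__":
    rate, failed = run_evals(stub_model, CASES)
    print(f"pass rate: {rate:.0%}; failing prompts: {failed}")
```

The point is less the code than the habit: every surprising response becomes a new case, so expectations rest on a measured pass rate rather than on vibes.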

Things also get more complicated with genAI because responses to a given prompt are not consistent. I love the way Amol Ajgaonkar, CTO of product innovation at Insight, puts it. Sometimes we think prompting ChatGPT or a similar system is like having a mature conversation with an adult. It’s not, he says, but rather, “It’s like giving my teenage kids instructions. Sometimes you have to repeat yourself so it sticks.” Making it more complicated, “Sometimes the AI listens, and other times it won’t follow instructions. It’s almost like a different language.” Learning how to converse with genAI systems is both art and science and requires considerable experience to do well. Unfortunately, many gain too much confidence from their casual experiments with ChatGPT and set expectations much higher than the tools can deliver, leading to disappointing failure.

Put down the shiny new toy

Many are sprinting into genAI without first considering whether there are simpler, better ways of accomplishing their goals. Santiago Valdarrama, founder of Tideily, recommends that most teams not start with machine learning (or genAI); the first step should generally be simple heuristics, or rules. He cites two advantages of this approach: “First, you’ll learn much more about the problem you need to solve. Second, you’ll have a baseline to compare against any future machine-learning solution.”
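A rules-first baseline can be very small. The sketch below is one possible shape, assuming a support-ticket routing problem; the tickets, labels, and keyword lists are all invented for illustration and do not come from Valdarrama. The accuracy it reports is the number any later model, genAI or otherwise, would have to beat.

```python
from typing import Callable, List, Tuple

# Hypothetical labeled examples: (ticket text, correct queue).
LABELED_TICKETS: List[Tuple[str, str]] = [
    ("I was charged twice this month", "billing"),
    ("Please refund my last invoice", "billing"),
    ("The app crashes when I log in", "technical"),
    ("Error 500 when uploading a file", "technical"),
    ("How do I change my delivery address?", "general"),
    ("I can't pay because the checkout button does nothing", "technical"),
]

# Hand-written keyword rules; the word lists are illustrative assumptions.
BILLING_WORDS = ("charged", "refund", "invoice", "payment")
TECHNICAL_WORDS = ("crash", "error", "bug")


def route_by_rules(text: str) -> str:
    """Route a ticket using simple substring matching, billing rules first."""
    lowered = text.lower()
    if any(word in lowered for word in BILLING_WORDS):
        return "billing"
    if any(word in lowered for word in TECHNICAL_WORDS):
        return "technical"
    return "general"


def accuracy(predict: Callable[[str], str], examples: List[Tuple[str, str]]) -> float:
    """Fraction of examples for which the predicted queue matches the label."""
    correct = sum(predict(text) == label for text, label in examples)
    return correct / len(examples)


if __name__ == "__main__":
    baseline = accuracy(route_by_rules, LABELED_TICKETS)
    print(f"rules baseline accuracy: {baseline:.2f}")
    for text, label in LABELED_TICKETS:
        if route_by_rules(text) != label:
            print(f"missed: {text!r} (expected {label})")
```

Both of the advantages Valdarrama names show up even at this scale: the missed ticket teaches something about the problem, and the accuracy figure gives any future model a concrete bar to clear.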

As with software development, where the hardest work isn’t coding but rather figuring out which code to write, the hardest thing in AI is figuring out whether and how to apply AI. When simple rules need to yield to more complicated rules, Valdarrama suggests switching to a simple model. Note the continued stress on “simple.” As he says, “simplicity always wins” and should dictate decisions until more complicated models are absolutely necessary.

So, back to genAI. Yes, it might be what your business needs to deliver customer value in a given scenario. Maybe. It’s more likely that solid analysis and rules-based approaches will yield the desired results. For those who are determined to use the shiny new thing, well, even then it’s still best to start small and simple and learn how to use genAI successfully.