Generative AI Tools May Have Passed the Bar, But Are They Ready for Professional-Grade Work?

It all comes down to how the models were trained.

Remember when, in 2011, a supercomputer won a game show? It wasn’t that long ago, but the idea of an AI-powered answer engine winning a trivia contest almost seems quaint by today’s standards of what’s possible with generative AI tools. The stakes have been raised from game-show parlor tricks to decidedly more real-world achievements like passing the bar exam and the LSAT. But what does that mean for professional workers?

These amazing technological feats have business leaders asking how AI will transform their industries. And, similar to the excitement we saw in 2011 following the game show stunt—nearly every major news outlet covered it at the time—the answers are filled with a mix of hype and real, practical insight. Perhaps nowhere is that fever pitch hotter than in the legal field, where conjecture around the potential for generative AI in everything from legal research to contract development has been building for years. Now, we’re at the point where a judge in Colombia has used AI to inform a ruling in a court case. In fact, according to new research from Thomson Reuters, 82 percent of legal professionals believe generative AI solutions can be applied to legal work and 59 percent say it should be applied to legal work. 


Tackling Case Law

Just as the trivia supercomputer’s turn in the spotlight did not immediately translate into commercial success, today’s generative AI solutions face some critical hurdles before they can graduate from passing the bar exam to adding value in a real-world, professional legal setting.

Chief among those is data: the data used to train the models and the data furnished to them in prompts. While the technology behind generative AI is shockingly proficient at quickly ingesting information and producing convincing-sounding answers, those responses are not always correct. Idiosyncrasies in the information used to train the system and nuances in complex topics like legal precedent can create unintended consequences for users who aren’t scrutinizing the details.

“We have seen countless examples where generative AI technology confidently produces false or misleading information, also known as hallucinations,” says David Wong, chief product officer at Thomson Reuters. A hallucination occurs when the AI algorithm reviews a corpus of text and makes predictions based on patterns it sees—but fails to incorporate all relevant information into that pattern-recognition process. 


“My team saw this frequently when we started working with the first version of a particular generative AI tool in 2020,” says Wong. “When we asked it a legal question about information privacy laws in Michigan, for example, the technology could accurately cite Michigan’s Biometric Information Privacy Act—for which similar, but slightly different, laws exist in four other states throughout the US. But due to language repetition and similarity in the laws, when the AI was asked a follow-up question about an information privacy law in a different state, it produced a similar result and then cited a non-existent law by just replacing the state name.”

“Another quirk we observed in applying off-the-shelf generative AI solutions to real-world legal applications was answer instability,” Wong continues. “Essentially, we would ask the same factual question multiple times and receive slightly different results each time.” 


No Room for Error

These examples highlight the importance of building products that give generative AI models access to comprehensive, authoritative reference data, either through their training inputs or by prompting the models with results retrieved from searches of that reference data. That process also needs to be intermediated by human subject matter experts who understand the nuances and context behind the information.

“In practical terms, that means highly specific generative AI use cases, like a focus on very technical case law, must be able to synthesize existing large language model data in combination with facts and information from proprietary content databases to deliver accurate answers and cite the correct sources to support outputs,” says Wong. In other words, the AI needs to be able to show its work, not just the answer. Achieving that level of transparency and certainty requires an extraordinarily powerful search engine on the front end and proven reference data on the back end.
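To make that idea concrete, here is a minimal, hypothetical sketch of the retrieval-augmented pattern described above: retrieve passages from a vetted reference corpus, prompt the model with those passages, and require citations so the answer can be checked against its sources. The corpus, scoring function, and prompt wording are illustrative stand-ins, not Thomson Reuters’ implementation.

```python
# Illustrative sketch of retrieval-augmented generation with citations.
# Everything here (corpus, retrieval scoring, prompt wording) is a
# hypothetical stand-in, not any vendor's actual system.

from dataclasses import dataclass

@dataclass
class Passage:
    source_id: str  # citation key, e.g. a statute or case identifier
    text: str

# Hypothetical reference corpus; a real product would query a search
# engine over vetted legal content instead of an in-memory list.
CORPUS = [
    Passage("MI-PRIVACY", "Michigan statute text on biometric identifiers ..."),
    Passage("IL-BIPA", "Illinois statute text on biometric privacy ..."),
]

def retrieve(query: str, corpus: list[Passage], k: int = 2) -> list[Passage]:
    """Naive keyword-overlap scoring, standing in for a real search engine."""
    terms = set(query.lower().split())
    ranked = sorted(corpus, key=lambda p: -len(terms & set(p.text.lower().split())))
    return ranked[:k]

def build_prompt(question: str, passages: list[Passage]) -> str:
    """Ground the model: answer only from the passages, and cite each one used."""
    sources = "\n".join(f"[{p.source_id}] {p.text}" for p in passages)
    return (
        "Answer using ONLY the sources below. Cite a source id for every claim, "
        "and reply 'not found in sources' if they are silent on the question.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )

question = "What does Michigan law say about biometric privacy?"
prompt = build_prompt(question, retrieve(question, CORPUS))
# A language model would be called with this prompt; because the answer must
# cite [source_id] passages, its claims can be verified against the corpus.
print(prompt)
```

The design choice that matters here is the citation requirement: it is what lets a human expert trace each claim in the output back to an authoritative source.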

Until then, it is important for professionals to ask questions such as: How was the model trained? What search queries and retrieved results are used to produce its answers? What role did human experts play in tuning the results and checking for inaccuracies?


“We are in the midst of one of the most exciting periods of technological transformation since the advent of the internet,” Wong says. “But, for every success story, there will be countless cautionary tales of companies that rode the wave of euphoria but didn’t sweat the details.” 

As we get deeper into the world of professional applications for generative AI, it will be critical that AI systems incorporate the reference data relevant to each profession, that they are developed under strict governance standards, and that subject matter experts remain integral to the training and development process.

This story was written by Thomson Reuters and produced by WIRED Brand Lab.