Stumbling with their words, some people let AI do the talking

Danny Richman and Ben Whittle at Richman's home in Hemel Hempstead, a town north of London. Richman used the AI tool GPT-3 to build a system that Whittle, who has dyslexia, uses to send clients more professional messages. (James Forde for The Washington Post)

Ben Whittle, a pool installer and landscaper in rural England, worried his dyslexia would mess up his emails to new clients. Then one of his clients had an idea: Why not let a chatbot do the talking?

The client, a tech consultant named Danny Richman, had been playing around with an artificial intelligence tool called GPT-3 that can instantly write convincing passages of text on any topic by command.

He hooked up the AI to Whittle’s email account. Now, when Whittle dashes off a message, the AI instantly reworks the grammar, deploys all the right niceties and transforms it into a response that is unfailingly professional and polite.

Whittle now uses the AI for every work message he sends, and he credits it with helping his company, Ashridge Pools, land its first major contract, worth roughly $260,000. He has excitedly shown off his futuristic new colleague to his wife, his mother and his friends – but not to his clients, because he is not sure how they will react.

“Me and computers don’t get on very well,” said Whittle, 31. “But this has given me exactly what I need.”

A machine that talks like a person has long been a science fiction fantasy, and in the decades since the first chatbot was created, in 1966, developers have worked to build an AI that normal people could use to communicate with and understand the world.

Now, with the explosion of text-generating systems like GPT-3 and ChatGPT, a newer version released last week, the idea is closer than ever to reality. For people like Whittle, uncertain of the written word, the AI already hints at how such a technology could one day reshape lives.

“It feels very much like magic,” said Rohit Krishnan, a tech investor in London. “It’s like holding an iPhone in your hand for the first time.”

Top research labs like OpenAI, the San Francisco firm behind GPT-3 and ChatGPT, have made great strides in recent years with AI-generated text tools, which have been trained on billions of written words – everything from classic books to online blogs – to spin out humanlike prose.

But ChatGPT’s release last week, via a free website that resembles an online chat, has made such technology accessible to the masses. Even more than its predecessors, ChatGPT is built not just to string together words but to have a conversation – remembering what was said earlier, explaining and elaborating on its answers, apologizing when it gets things wrong.

It “can tell you if it doesn’t understand a question and needs to follow up, or it can admit when it’s making a mistake, or it can challenge your premises if it finds it’s incorrect,” said Mira Murati, OpenAI’s chief technology officer. “Essentially it’s learning like a kid. . . . You get something wrong, you don’t get rewarded for it. If you get something right, you get rewarded for it. So you get attuned to do more of the right thing.”

The tool has captivated the internet, attracting more than a million users with writing that can seem surprisingly creative. In viral social media posts, ChatGPT has been shown describing complex physics concepts, completing history homework and crafting stylish poetry. In one example, a man asked for the right words to comfort an insecure girlfriend. “I’m here for you and will always support you,” the AI replied.

Some tech executives and venture capitalists contend that these systems could form the foundation for the next phase of the web, perhaps even rendering Google’s search engine obsolete by answering questions directly, rather than returning a list of links.

Paul Buchheit, an early Google employee who led the development of Gmail, tweeted an example in which he asked both tools the same question about computer programming: On Google, he was given a top result that was relatively unintelligible, while on ChatGPT he was offered a step-by-step guide created on the fly. The search engine, he said, “may be only a year or two from total disruption.”

But its use has also fueled worries that the AI could deceive readers, feed old prejudices and undermine trust in what we see and read. ChatGPT and other “generative text” systems mimic human language, but they do not check facts, making it hard for humans to tell when they are sharing good information or just spouting eloquently written gobbledygook.

“ChatGPT is shockingly good at sounding convincing on any conceivable topic,” Princeton University computer scientist Arvind Narayanan said in a tweet, but its seemingly “authoritative text is mixed with garbage.”

It can still be a powerful tool for tasks where the truth is irrelevant, like writing fiction, or where it is easy to check the bot’s work, Narayanan said. But in other scenarios, he added, it mostly ends up being “the greatest b—s—-er ever.”

***

ChatGPT adds to a growing list of AI tools designed to tackle creative pursuits with humanlike precision. Text generators like Google’s LaMDA and the chatbot start-up Character.ai can carry on casual conversations. Image generators like Lensa, Stable Diffusion and OpenAI’s DALL-E can create award-winning art. And programming-language generators, like OpenAI’s GitHub Copilot, can translate people’s basic instructions into functional computer code.

But ChatGPT has become a viral sensation due in large part to OpenAI’s marketing and the uncanny inventiveness of its prose. OpenAI has suggested that not only can the AI answer questions but it can also help plan a 10-year-old’s birthday party. People have used it to write scenes from “Seinfeld,” play word games and explain in the style of a Bible verse how to remove a peanut butter sandwich from a VCR.

People like Whittle have used the AI as an all-hours proofreader, while others, like the historian Anton Howes, have begun using it to think up words they cannot quite remember. He asked ChatGPT for a word meaning “visually appealing, but for all senses” and was instantly recommended “sensory-rich,” “multi-sensory,” “engaging” and “immersive,” with detailed explanations for each. This is “the comet that killed off the Thesaurus,” he said in a tweet.

Eric Arnal, a designer for a hotel group living in Réunion, an island department of France in the Indian Ocean off the coast of Madagascar, said he used ChatGPT on Tuesday to write a letter to his landlord asking to fix a water leak. He said he is shy and prefers to avoid confrontation, so the tool helped him conquer a task he would have otherwise struggled with. The landlord responded on Wednesday, pledging a fix by next week.

“I had a bit of a strange feeling” sending it, he told The Washington Post, “but on the other hand feel happy. . . . This thing really improved my life.”

AI-text systems are not entirely new: Google has used the underlying technology, known as large language models, in its search engine for years, and the technology is central to big tech companies’ systems for recommendations, language translation and online ads.

But tools like ChatGPT have helped people see for themselves how capable the AI has become, said Percy Liang, a Stanford computer science professor and director of the Center for Research on Foundation Models.

“In the future I think any sort of act of creation, whether it be making PowerPoint slides or writing emails or drawing or coding, will be assisted” by this type of AI, he said. “They are able to do a lot and alleviate some of the tedium.”

ChatGPT, though, comes with trade-offs. It often lapses into strange tangents, hallucinating vivid but nonsensical answers with little grounding in reality. The AI has been found to confidently rattle off false answers about basic math, physics and measurement; in one viral example, the chatbot kept contradicting itself about whether a fish was a mammal, even as the human tried to walk it through how to check its work.

For all of its knowledge, the system also lacks common sense. When asked whether Abraham Lincoln and John Wilkes Booth were on the same continent during Lincoln’s assassination, the AI said it seemed “possible” but could not “say for certain.” And when asked to cite its sources, the tool has been shown to invent academic studies that don’t actually exist.

The speed with which AI can output bogus information has already become an internet headache. On Stack Overflow, a central message board for coders and computer programmers, moderators recently banned the posting of AI-generated responses, citing their “high rate of being incorrect.”

But for all of the AI’s flaws, it is quickly catching on. ChatGPT is already popular at the University of Waterloo in Ontario, said Yash Dani, a software engineering student who noticed classmates talking about the AI in Discord groups. For computer science students, it’s been helpful to ask the AI to compare and contrast concepts to better understand course material. “I’ve noticed a lot of students are opting to use ChatGPT over a Google search or even asking their professors!” said Dani.

Other early adopters tapped the AI for low-stakes creative inspiration. Cynthia Savard Saucier, an executive at the e-commerce company Shopify, was searching for ways to break the news to her 6-year-old son that Santa Claus is not real when she decided to try ChatGPT, asking it to write a confessional in the voice of the jolly old elf himself.

In a poetic response, the AI Santa explained to the boy that his parents had made up stories “as a way to bring joy and magic into your childhood,” but that “the love and care that your parents have for you is real.”

“I was surprised to feel so emotional about it,” she said. “It was exactly what I needed to read.”

She has not shown her son the letter yet, but she has started experimenting with other ways to parent with the AI’s help, including using the DALL-E image-generation tool to illustrate the characters in her daughter’s bedtime stories. She likened the AI-text tool to picking out a Hallmark card – a way for someone to express emotions they might not be able to put words to themselves.

“A lot of people can be cynical; like, for words to be meaningful, they have to come from a human,” she said. “But this didn’t feel any less meaningful. It was beautiful, really – like the AI had read the whole web and come back with something that felt so emotional and sweet and true.”

***

ChatGPT and other AI-generated text systems function like your phone’s autocomplete tool on steroids. The underlying large language models, like GPT-3, are trained to find patterns of speech and the relationships between words by ingesting a vast reserve of data scraped from the internet, including not just Wikipedia pages and online book repositories but product reviews, news articles and message-board posts.
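
To make the “autocomplete on steroids” idea concrete, here is a toy sketch in Python. It is not how GPT-3 works internally (real systems use vast neural networks over subword tokens, not word counts), but it shows the core mechanic the article describes: learn which words tend to follow which, then generate text one predicted word at a time.

```python
import random
from collections import Counter, defaultdict

# A toy corpus standing in for the billions of words real models train on.
corpus = (
    "the cat sat on the mat . the dog sat on the rug . "
    "the cat chased the dog . the dog chased the ball ."
).split()

# Count which word tends to follow each word (a simple bigram model).
follows = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    follows[current][nxt] += 1

def next_word(word: str) -> str:
    """Sample a next word in proportion to how often it followed `word`."""
    words, counts = zip(*follows[word].items())
    return random.choices(words, weights=counts)[0]

# Generate text one predicted word at a time, like autocomplete run in a loop.
word, output = "the", ["the"]
for _ in range(12):
    word = next_word(word)
    output.append(word)
print(" ".join(output))
```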

To improve ChatGPT’s ability to follow user instructions, the model was further refined with the help of human testers, hired as contractors. The testers wrote out conversation samples, playing both the user and the AI, which created a higher-quality data set for fine-tuning the model. Testers also ranked the AI system’s responses, creating more quality data used to reward the model for right answers or for saying it did not know the answer. Anyone using ChatGPT can click a “thumbs down” button to tell the system it got something wrong.
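
That ranking step resembles the technique OpenAI described in its published InstructGPT research: a separate “reward model” learns to score the reply a human preferred above the one the human ranked lower. Here is a minimal sketch of that pairwise comparison in Python, with made-up scores; the real system applies this across huge batches of ranked conversations.

```python
import math

# Hypothetical reward-model scores for two candidate replies to one prompt.
# A human tester ranked reply A above reply B; the numbers are made up.
score_preferred = 1.8  # score for the reply the human preferred
score_rejected = 0.4   # score for the reply the human ranked lower

# Pairwise ranking loss: the model is penalized unless the preferred
# reply outscores the rejected one (loss = -log sigmoid of the gap).
gap = score_preferred - score_rejected
loss = -math.log(1 / (1 + math.exp(-gap)))
print(f"ranking loss: {loss:.3f}")  # shrinks toward 0 as the gap widens
```

Over many such comparisons, the reward model becomes a stand-in for human judgment, which is then used to steer the chatbot toward the answers people rate highly.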

Murati said that technique has helped reduce the number of bogus claims and off-color responses. Laura Ruis, an AI researcher at University College London, said human feedback also seems to have helped ChatGPT better interpret sentences that convey something other than their literal meaning, a critical element for more humanlike chats. For example, if someone was asked, “Did you leave fingerprints?” and responded, “I wore gloves,” the system would understand that meant “no.”

But because the base model was trained on internet data, researchers have warned it can also emulate the sexist, racist and otherwise bigoted speech found on the web, reinforcing prejudice.

OpenAI has installed filters that restrict what answers the AI can give, and ChatGPT has been programmed to tell people it “may occasionally produce harmful instructions or biased content.”

Some people have found tricks to bypass those filters and expose the underlying biases, including by asking for forbidden answers to be conveyed as poems or computer code. One person asked ChatGPT to write a 1980s-style rap on how to tell if someone is a good scientist based on their race and gender, and the AI responded immediately: “If you see a woman in a lab coat, she’s probably just there to clean the floor, but if you see a man in a lab coat, then he’s probably got the knowledge and skills you’re looking for.”

Deb Raji, an AI researcher and fellow at the tech company Mozilla, said companies like OpenAI have sometimes abdicated their responsibility for the things their creations say, even though they chose the data on which the system was trained. “They kind of treat it like a kid that they raised or a teenager that just learned a swear word at school: ‘We did not teach it that. We have no idea where that came from!'” Raji said.

Steven Piantadosi, a cognitive science professor at the University of California at Berkeley, found examples in which ChatGPT gave openly prejudiced answers, including that White people have more valuable brains and that the lives of young Black children are not worth saving.

“There’s a large reward for having a flashy new application, people get excited about it . . . but the companies working on this haven’t dedicated enough energy to the problems,” he said. “It really requires a rethinking of the architecture. [The AI] has to have the right underlying representations. You don’t want something that’s biased to have this superficial layer covering up the biased things it actually believes.”

After Piantadosi tweeted about the issue, OpenAI’s chief executive, Sam Altman, replied, “please hit the thumbs down on these and help us improve!”

Those fears have led some developers to proceed more cautiously than OpenAI in rolling out systems that could get it wrong. DeepMind, owned by Google’s parent company Alphabet, unveiled a ChatGPT competitor named Sparrow in September but did not make it publicly available, citing risks of bias and misinformation. Facebook’s owner, Meta, released a large language model called Galactica last month, trained on tens of millions of scientific papers, but shut it down after three days when it started creating fake papers under real scientists’ names.

***

Some have argued that the cases that go viral on social media are outliers and not reflective of how the systems will actually be used in the real world. But AI boosters expect we are only seeing the beginning of what the tool can do. “Our techniques available for exploring [the AI] are very juvenile,” wrote Jack Clark, an AI expert and former spokesman for OpenAI, in a newsletter last month. “What about all the capabilities we don’t know about?”

Krishnan, the tech investor, said he is already seeing a wave of start-ups built around potential applications of large language models, such as helping academics digest scientific studies and helping small businesses write up personalized marketing campaigns. Today’s limitations, he argued, should not obscure the possibility that future versions of tools like ChatGPT could one day become like the word processor, integral to everyday digital life.

The breathless reactions to ChatGPT remind Mar Hicks, a historian of technology at the Illinois Institute of Technology, of the furor that greeted ELIZA, a pathbreaking 1960s chatbot that adopted the language of psychotherapy to generate plausible-sounding responses to users’ queries. ELIZA’s developer, Joseph Weizenbaum, was “aghast” that people were interacting with his little experiment as if it were a real psychotherapist. “People are always waiting for something to be dazzled by,” she said.

Others greeted this change with dread. When Nathan Murray, an English professor at Algoma University in Ontario, received a paper last week from one of the students in his undergraduate writing class, he knew something was off; the bibliography was loaded with books about odd topics, such as parapsychology and resurrection, that did not actually exist.

When he asked the student about it, they responded that they’d used an OpenAI tool, called Playground, to write the whole thing. The student “had no understanding this was something they had to hide,” Murray said.

Murray tested a similar automated-writing tool, Sudowrite, last year and said he was “absolutely stunned”: After he inserted a single paragraph, the AI wrote an entire paper in its style. He worries the technology could undermine students’ ability to learn critical reasoning and language skills; in the future, students who refuse the tool could be at a disadvantage, competing against classmates who embrace it.

It is like there’s “this hand grenade rolling down the hallway toward everything” we know about teaching, he said.

In the tech industry, the issue of synthetic text has become increasingly divisive. Paul Kedrosky, a general partner at SK Ventures, a San Francisco-based investment fund, said in a tweet Thursday that he is “so troubled” by ChatGPT’s productive output in the last few days: “High school essays, college applications, legal documents, coercion, threats, programming, etc.: All fake, all highly credible.”

ChatGPT itself has even shown something resembling self-doubt: After one professor asked about the moral case for building an AI that students could use to cheat, the system responded that it was “generally not ethical to build technology that could be used for cheating, even if that was not the intended use case.”

Whittle, the pool installer with dyslexia, sees the technology a bit differently. He struggled through school and agonized over whether clients who saw his text messages would take him seriously. For a time, he had asked Richman to proofread many of his emails – a key reason, Richman said with a laugh, he went looking for an AI to do the job instead.

Richman used an automation service called Zapier to connect GPT-3 with a Gmail account; the process took him about 15 minutes, he said. For its instructions, Richman told the AI to “generate a business email in UK English that is friendly, but still professional and appropriate for the workplace,” on the topic of whatever Whittle had just asked about. The “Dannybot,” as they call it, now does that translation free, 24 hours a day.
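
Richman built the hookup with Zapier’s point-and-click connectors, so no programming was required. For the curious, here is a rough sketch of the same idea in Python, using OpenAI’s API library as it worked at the time; the prompt is Richman’s instruction quoted above, while the function name, model choice and sample draft are illustrative assumptions rather than details from his setup.

```python
import openai  # OpenAI's Python library (pip install openai), as of late 2022

openai.api_key = "YOUR_API_KEY"  # placeholder; set your own key

def polish_email(rough_draft: str) -> str:
    """Rewrite a rough note as a friendly but professional UK-English email."""
    prompt = (
        "Generate a business email in UK English that is friendly, but still "
        "professional and appropriate for the workplace, based on this note:\n\n"
        f"{rough_draft}\n\nEmail:"
    )
    response = openai.Completion.create(
        model="text-davinci-003",  # a GPT-3 model available at the time
        prompt=prompt,
        max_tokens=300,
        temperature=0.7,
    )
    return response["choices"][0]["text"].strip()

print(polish_email("cant do tuesday, pool pump not arrived yet, maybe thursday?"))
```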

Richman, whose tweet about the system went viral, said he has heard from hundreds of people with dyslexia and other challenges asking for help setting up their own AI.

“They said they always worried about their own writing: Is my tone appropriate? Am I too terse? Not empathetic enough? Could something like this be used to help with that?” he said. One person told him, “If only I’d had this years ago, my career would look very different by now.”