Models from OpenAI and Anthropic are improving, but the writing seems flatter

OpenAI and Anthropic models are improving from a technical standpoint, but their writing style feels flatter than before. Many users are starting to sense this paradox without always being able to put it into words. Benchmark scores are rising and reasoning capabilities are sharpening, yet something seems to have faded from the very texture of the generated text. This article explores the phenomenon, its probable causes, and its implications for anyone who uses these tools daily.

A Very Real Paradox: More Capable, But Less Impactful

You only need to compare the current outputs of GPT-4 or Claude 3 with those obtained a year ago to notice a difference in tone. Responses are longer, better structured, and easier to read. But they have lost something essential: a certain uniqueness, a willingness to take risks in expression, a recognizable voice.

This observation, shared by many digital professionals, was summed up by entrepreneur Nav Toor on X (formerly Twitter): prompts that worked beautifully six months ago yield less satisfying results today. The writing sounds more uniform; the ideas are safer and more consensus-seeking. The model seems to have learned how to never disappoint, at the cost of never surprising.

The Quest for Neutrality as a Stylistic Trap

To understand this shift, one must look at the training processes. Large language models like GPT-4o or Claude 3.5 Sonnet are refined after their pre-training phase using techniques like RLHF (Reinforcement Learning from Human Feedback). Human evaluators score the answers, and the model learns to maximize these scores.

The problem is that human evaluators tend to prefer readable, polite, and polished answers. A bold phrasing might look incorrect to an evaluator unfamiliar with the context. An original metaphor might seem vague. As a result, the model converges toward an average style: acceptable to everyone, memorable to no one.

This is what some AI researchers call the “median writing problem”: optimizing for average human preference produces prose that is mediocre in the etymological sense of the word, meaning prose from the middle.
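
A toy illustration of that convergence, offered as a sketch rather than a description of any real RLHF pipeline: assume each response can be reduced to a single hypothetical "boldness" number between 0 and 1, and that each evaluator penalizes a response by the squared distance from their own taste. The names average_score, best_style, tastes, and candidates are invented for this example.

```python
from statistics import mean


def average_score(style, tastes):
    """Mean rating of one response across all evaluators.

    Each evaluator's rating is the negative squared distance between the
    response's boldness and the boldness that evaluator prefers
    (an assumption of this toy model, not a real reward function).
    """
    return mean(-(style - taste) ** 2 for taste in tastes)


def best_style(candidates, tastes):
    """Return the candidate style with the highest average rating."""
    return max(candidates, key=lambda style: average_score(style, tastes))


# Hypothetical evaluators: half prefer restrained prose, half prefer bold prose.
tastes = [0.0, 0.1, 0.2, 0.8, 0.9, 1.0]

# Hypothetical candidate responses, from very restrained (0.0) to very bold (1.0).
candidates = [0.0, 0.25, 0.5, 0.75, 1.0]

winner = best_style(candidates, tastes)
print(winner)
```

In this setup no evaluator's taste sits at 0.5, yet the middle style earns the best average score because it is never far from anyone. Optimizing for the average rating rewards the response nobody would have picked as a favorite.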

The Web as a Distorting Mirror: Training Data Contamination

A second factor exacerbates this phenomenon. Since LLM usage exploded in 2023 and 2024, a growing share of the content published on the Internet is itself generated by AI models. Blog articles, newsletters, LinkedIn posts, and product pages are now written partly or entirely by tools like ChatGPT or Claude.

This content then feeds back into the training corpus of future model versions. We are thus entering a stylistic feedback loop: the model learns to imitate a web that is already imitating it. Each iteration homogenizes the output a bit more. Writing becomes circular, self-referential, and stripped of the creative friction that only direct human experience can generate.
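
The loop can be sketched with a deliberately simple model. This is an assumption-laden toy, not a measurement of any real training corpus: each text is reduced to one hypothetical "style" number, and each model generation is assumed to rewrite every text a fixed fraction of the way toward the corpus average. The functions next_generation and simulate and the pull parameter are invented for the example.

```python
from statistics import mean, pstdev


def next_generation(corpus, pull=0.5):
    """Return the corpus after one round of model rewriting.

    Every style value moves a fraction `pull` of the way toward the
    current corpus average (the toy stand-in for "imitating the web").
    """
    centre = mean(corpus)
    return [style + pull * (centre - style) for style in corpus]


def simulate(corpus, generations, pull=0.5):
    """Track stylistic spread (population standard deviation) per generation."""
    spreads = [pstdev(corpus)]
    for _ in range(generations):
        corpus = next_generation(corpus, pull)
        spreads.append(pstdev(corpus))
    return spreads


# Hypothetical starting web: a wide range of human styles.
human_web = [0.0, 0.2, 0.5, 0.9, 1.0]

for generation, spread in enumerate(simulate(human_web, generations=4)):
    print(generation, round(spread, 3))
```

With pull set to 0.5 the average style never moves, but the spread around it halves at every generation. A real corpus also keeps receiving fresh human text, which this sketch ignores, so it shows the direction of the effect rather than its size.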

When the Whole Web Begins to Look Alike

This dynamic has consequences that reach far beyond the AI tools themselves. The attentive reader notices that blog posts look more and more alike, professional emails share the same phrasings, and newsletters adopt identical structures. This is no coincidence: it is the invisible signature of models trained on the same data and deployed on a massive scale.

The web’s stylistic diversity, which used to be its wealth, is progressively eroding. Paradoxically, while OpenAI and Anthropic models are improving technically, they are also contributing to this collective impoverishment of online language.

Comparison of Major Models: Performance vs. Writing Quality

Model	Developer	Technical Performance (Benchmarks)	Perceived Stylistic Quality	Observed Trend
GPT-4o	OpenAI	Very high	Competent, but uniform	Flattening of style since late 2024
Claude 3.5 Sonnet	Anthropic	Very high	Fluid, but consensus-seeking	Tendency to avoid strong opinions
GPT-4 (initial version)	OpenAI	High	More distinctive, sometimes rough	Nostalgic reference point for many users
Claude 2	Anthropic	Medium to high	More distinct voice	Deemed more “human” by some writers

Why This Matters for Content Professionals

For writers, marketers, journalists, and content creators, this evolution is not trivial. Using an LLM to produce text that blends into the background is a short-sighted strategy. If everyone uses the same tools configured the same way, differentiation becomes impossible.

This also raises a deeper question about human added value in the writing process. If the machine produces flawless but bland prose, it is up to the human to provide narrative tension, a sharp point of view, personal anecdotes, and unexpected metaphors. AI then becomes a formatting tool, not a creative one.

Concrete Strategies to Bypass Stylistic Flattening

Given this state of affairs, several approaches can help bring originality back into AI-assisted productions:

Inject a strong voice into the prompt: Instead of simply asking to “write an article about X,” it is more effective to specify the tone, cultural references, phrasings to avoid, or even provide examples of your own writing.
Use AI as a co-pilot, not an autopilot: Generate a first draft, then heavily rework it to insert personal observations, concrete examples from real experience, and phrasings that break the expected rhythm.
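
As a minimal sketch of the first strategy, the voice instructions can be assembled into the prompt programmatically. The function build_voice_prompt and its parameters are hypothetical names chosen for this example; it only builds a text prompt and does not call any particular vendor's API.

```python
def build_voice_prompt(topic, tone, references=(), avoid=(), samples=()):
    """Assemble a prompt that specifies voice, not just subject.

    topic      -- what the article is about
    tone       -- short description of the desired tone
    references -- cultural references the text may draw on
    avoid      -- phrasings the model should never use
    samples    -- excerpts of the author's own writing to imitate
    """
    parts = [f"Write an article about {topic}.", f"Tone: {tone}."]
    if references:
        parts.append("Cultural references to draw on: " + ", ".join(references) + ".")
    if avoid:
        quoted = "; ".join(f'"{phrase}"' for phrase in avoid)
        parts.append("Never use these phrasings: " + quoted + ".")
    for number, sample in enumerate(samples, start=1):
        parts.append(f"Example {number} of my own writing:\n{sample.strip()}")
    return "\n\n".join(parts)


prompt = build_voice_prompt(
    topic="remote work",
    tone="dry, first person, short sentences",
    references=["George Orwell's essays"],
    avoid=["in today's fast-paced world", "game-changer"],
    samples=["I stopped attending status meetings in March. Nobody noticed."],
)
print(prompt)
```

The resulting text can be pasted into any chat interface. Keeping the banned phrasings and writing samples in one place makes it easy to reuse the same voice across drafts, which is where the second strategy, reworking the output by hand, takes over.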

The Responsibility of Model Developers

OpenAI and Anthropic are not deaf to these criticisms. Both companies have publicly acknowledged the limitations of RLHF when it comes to creativity and style. Anthropic, in particular, has invested in research on “Constitutional AI”, which replaces part of the human rating step with feedback guided by written principles, although that work is aimed primarily at safety rather than at style.

However, commercial constraints work against creative boldness. A model that delivers sharp answers, defends original positions, or adopts an unconventional style is more likely to offend certain users. For companies commercializing their models to millions of organizations, stylistic caution is a rational decision, even if it is culturally impoverishing.

Toward a New Generation of More Expressive Models?

Some signals are encouraging. Newer, more specialized models trained on high-quality literary or journalistic corpora show that it is possible to reconcile technical performance with stylistic richness. Startups like Mistral AI and open-source projects are exploring alternative training paths that prioritize linguistic diversity.

The question, therefore, is not whether LLMs can write with style, but whether the major companies developing them have enough economic incentives to make them do so. For now, the answer remains uncertain.

Conclusion: Reclaiming Control Over Writing Quality

OpenAI and Anthropic models are improving across many measurable criteria, but their writing feels flatter, and this observation deserves to be taken seriously. Technical performance does not guarantee expressive richness. As these tools become part of our workflows, it is crucial not to delegate creation entirely to systems optimized to please everyone and therefore to surprise no one.

The responsibility lies with users to preserve their stylistic uniqueness, using AI as an amplifier of their voice rather than its substitute. And it lies with model developers not to sacrifice creativity on the altar of commercial safety.

Do you notice this phenomenon in your daily practice? Share your experiences in the comments and explore our other resources on the professional use of LLMs to deepen your understanding of these tools.

Régis

Regis Vansnick is a recognized expert with extensive experience at the intersection of technology, business, and innovation. His professional career is marked by a deep understanding of digital transformation and strategic management.
