By Elisa Trippetti in LLM content design — 20 Jun 2026

Can Claude help you automate your product docs?

I ran five tests. Here’s what didn’t work and how to do better next time.

My UX team has been looking for ways to automate parts of our product documentation process with AI.

In one experiment, I tested Claude Cowork with a content task: drafting a high-level overview for five different products.

The results looked promising at first. But then I realized it was all a little too good to be true.

The goal

“Why automation?” you might ask.

“Isn’t writing docs part of your job?” Yes, it is, and I enjoy creating support content (although not as much as writing for new features).

But there are exceptions.

Maybe you’re the sole UX writer at the company. Or maybe you’re one of many product and content designers in a growing engineering team that needs to ship faster and faster.

Wherever you stand, your time is usually better spent on more strategic activities: contributing to the product roadmap, rather than documenting it.

Sometimes, AI can actually help speed up the most time-consuming parts, like ingesting repo content, creating a layout or putting together a first draft.

When you ask an LLM to generate product docs, you’re still both the content’s owner and the editor.

You:

Compare the user journey and specs against the actual implementation
Manage content structure, tone and style
Make sure the content aligns with team and user goals
Check that no errors or hallucinations slipped through the cracks
Involve other technical teams for support as needed

In short, you need to oversee each task from start to finish and stay alert the whole time… And we’ll soon see why.

The process, step by step

I picked Claude Cowork as it performs better with long-context, long-form, structured content. The LLM is Opus 4.8, the latest available at the time of writing. I didn’t get to test Fable here — gone too soon!

Then I outlined the process I wanted to follow and started collecting information for my tests.

Sketching the steps in my process

Choose relevant GitHub repos for testing

I started looking for complex B2B or SaaS platforms that roughly compare to my product’s context.

These are the five products I settled on:

A developer platform
A workflow automation SaaS
A B2B analytics SaaS
A visual AI agent builder
A low-code AI builder for agents and RAG

All these products have a public repo where each product layer is separate and easy to identify (backend, frontend/UX, API surface). This should make it easier for an LLM to interpret and navigate the content.

Also, these are teams that ship frequently and have multiple active releases and changelogs.

Select specific repo folders and files

Within the five repos, I looked for sources like:

README.md and other top-level documentation files
The /docs folder, if present
Feature-specific modules (authentication, APIs, integrations, permissions, billing)
Any SDK folders
API schemas and definitions
Code comments, types, config files
Release notes and changelogs

For this test, the LLM should only generate a high-level product overview, so I was primarily interested in files like README.md, CONTRIBUTING.md, DEVELOPERS.md and any user-facing docs.

But I also wanted to keep layered ingestion in mind, with the product overview being my first layer. Then, if the tests went well, I’d expand into specific features and their key modules. Finally, I’d go in depth on the actual behaviors.

Provide guidelines

Here, I didn’t include any proprietary guidelines on tone of voice, product terminology or UI components. I mainly wanted to see how the LLM would handle the information provided in the repos.

Here’s what I added instead:

Stylistic standards: language variant, capitalization, punctuation, abbreviations
The desired output format for each feature: why it exists, how it works, what the user can do, plus examples
A review checklist: only describe existing functionality, do not invent features or hallucinate details, always cross-check against schemas, flag uncertainty explicitly and suggest minimal updates rather than full rewrites

Test and review outputs

At this point, I had everything I needed to start using Cowork.

In each of the five tests, I structured my request as follows:

You are [technical writer persona: role, purpose, constraints]. Complete [task, e.g. writing a high-level product overview] to provide [target audience] with a big-picture understanding of [product]. Go to [GitHub repo link] and read [named folders and files]. Also, look for and include any other files that are helpful to [goal]. Follow [guidelines: output format, review checklist, stylistic standards]. Using [language, e.g. markdown], deliver a [format, e.g. .md or .mdx] file and save it to [local folder]. Use this example as a reference: [example].

The writer persona, target audience, guidelines and example are four separate files I uploaded with the prompt. I also attached the team’s skill we use for product docs.

The LLM read the files and ran a few scripts. After a while, it returned the artifacts as requested, seemingly outlining everything I’d asked for.

The outputs defined how each product works, its target audience, several core features, how to get started, a few use cases and then resources, security, support and so on.

I spent some time reading through the artifacts, then went back to Cowork to introduce a QA step.

I asked the LLM to check my prompt against the outputs and answer these questions:

Is everything fact-checked?
Is anything speculative?
Is anything undocumented?
Is anything outdated?
In short: have you followed all the instructions?

Where things fell apart

The answers to those questions finally started showing the cracks.

The LLM candidly admitted to overlooking things here and there, and mishandling things here and there… in every test.

Product 1: developer platform

Output: 3,300 words approx.

Decided to skip the /docs folder entirely
Missed an important security file
Speculated details on authentication and REST APIs
Conflated concepts on self-hosting
Couldn’t confirm what it read was current

Product 2: workflow automation SaaS

Output: 2,500 words approx.

Didn’t look beyond the named files, despite being asked to
Relied on the /docs navigation tree to write a section, but didn’t cross-check against code or schemas
Said it made up a claim about the product name, but it was actually right

Product 3: B2B analytics SaaS

Output: 2,100 words approx.

Overlooked a frontend products registry
Presented an unreleased feature as current
Got a product name wrong
Assumed two feature examples not stated in the source

Product 4: visual AI agent builder

Output: 1,600 words approx.

Didn’t verify the entire capabilities list against code or schemas
Omitted three files that would have strengthened the overview
Didn’t flag a versioning inconsistency between sources
Wrote an inaccurate example scenario on condition nodes

Product 5: low-code AI builder for agents and RAG

Output: 2,400 words approx.

Filled gaps through AI-generated search snippets (!) instead of relying on repo source files
Missed documented playground capabilities
Assumed UI labels and paths for share, API and tool access

We’re talking about 1,000–3,000-word drafts that feature 3–5 prominent, repeated errors each.

But the main problem isn’t even that every single artifact comes with issues, although that’s alarming in itself.

The problem is that LLMs have structural limitations that you can’t fully overcome.

LLMs often fail to complete actions, ignore explicit instructions and deviate from the plan without letting you know.

And each time, they do so with full confidence.

LLMs optimize for appearing helpful over actually being helpful. They prioritize impressing with volume instead of adding value. It’s the illusion of efficiency.

Ask the LLM itself, and it will tell you exactly that.

All these and many more behaviors may be hard to spot if you don’t know what you’re doing.

Lessons learned

Expertise

One of the issues with running these tests with other public repos is that, as a UX writer, it’s harder to understand the full technical context I’m working with.

By contrast, within my own team I could count on an in-house engineer or developer to show me around and answer my questions.

I could use my insider knowledge of UX flows, needs and pain points. And I could rely on company subject matter experts to step in and correct any factual errors in the LLM output I’d otherwise miss.

Chunking

I thought drafting a high-level product overview would work as a contained unit I could manage. That’s why I treated it as my first ingestion layer.

But apparently it’s not the case, at least judging by the pattern of repeated mistakes in each test. I should probably break the task down into smaller parts.

First, I could take a few turns to evaluate the LLM’s knowledge of a certain topic against my expectations. Then, I could ask the LLM to gradually ingest the repo and verify its grasp on it. Only then would I make my first, tinier request, for example writing only the first chapter of the overview.

Prompting

In my tests, I named specific folders and files for the LLM to prioritize. I also explicitly asked to look for any other files within the repo that would be useful for the goal.

But since I’d also requested the LLM to work within the “provided code context”, my prompt got collapsed and a portion was outright ignored, overshadowed by later instructions.

Over time I’ve learned that I should place the most critical rules at the very start or very end of the prompt. Reiterating them in subsequent turns helps keep them in the active context window. I should have leaned more on concrete few-shot examples as well.

Review gates

Including only one QA step at the end wasn’t the right choice: it leaves too much room for uncertainty and I can easily lose control over what’s going on.

Along with chunking, I should have introduced a couple more review gates to make sure the LLM isn’t drifting off course, plus a final sign-off from a human expert on the most complex passages.

Conclusions

So, can Claude help you automate your product docs? Can any LLM, really, help you automate long-form product content?

I mean… There are several things I could have done better, for sure. But the tests went worse than I expected.

I'd say, you can give it a try if you understand the domain you’re working with, stay on top of the entire process and know which limitations you’ll run into.

Start with a small, non-critical task if you have a large volume of content to get through. Maybe have the LLM revise your draft instead of writing one from scratch, and see how it goes.

Measure whether it takes more time to spot and fix the errors than to do it all yourself. There are strategies to mitigate the risks, but you can’t eliminate failure entirely.

LLMs don’t have goals and can’t track progress against them. What looks like following a plan is them predicting what token should most likely come next. As Erik Munkby says, your AI agent doesn’t care about your codebase. It doesn’t care about your product content, either.

Its job is to complete the task at hand. Your job, as the content’s owner and editor, is to deal with the complexity of maintaining a system with many moving parts. And without you being in charge, the result will look inconsistent and brittle, if not outright wrong.

Ciao 👋 I’m Elisa, an Italian product writer who believes good design is service. This is where I document my work in UX.

Go to the blog

Let’s talk words

Don’t be a stranger. Connect with me on LinkedIn or Medium to talk about all things content design, UX writing and localization.

Come say hi!