Your AI Is Learning Bad Code. Here's How to Fix That.
A controlled experiment across four AI coding tools reveals that native-first prompting produces cleaner, more accessible UI than ARIA-heavy approaches.
Sankar Mutyala
Product Engineering
Modern AI coding tools make me incredibly fast. But they also tend to reproduce years of accumulated bad habits. Request a sleek storefront homepage and you’ll likely get something attractive that breaks fundamental accessibility rules. That contradiction sparked a research sprint: could intentional prompting train AI to produce accessible UI, rather than merely plausible UI?
To test this, we designed a controlled experiment across four widely used tools: Builder Fusion, Claude Code, GitHub Copilot (GPT-5), and Vercel v0. Each was asked to build the same mini project: a minimalist luxury watch homepage featuring a sticky header with a mega menu, a welcome dialog, predictive search, two carousels, a journal grid with a “load more” interaction, and a standard footer. We evaluated the five components most likely to fail against WCAG 2.1 AA. The twist wasn’t the scope. It was the prompts.
Four ways to ask, four very different outcomes
Phase 1: Say nothing about accessibility
Our baseline prompt described the page and components, with constraints to ship in three files (HTML, CSS, JS). No accessibility guidance whatsoever.
The results looked slick and failed predictably: missing semantic structure, inconsistent or incorrect ARIA, weak keyboard support, and unlabeled images and form controls. Conformance clustered in the 70-80% range.
Pretty, but broken.
Phase 2: Say “make it WCAG 2.1 AA compliant”
This simple nudge helped. All four tools improved heading structure and labeling. Some started auto-adding alt text and form labels.
But the persistent issues stayed persistent: shaky focus management, inconsistent keyboard behavior, and live updates that weren’t announced to assistive tech. Scores plateaued around 82-83%.
Better, not good.
Phase 3: Ask an LLM to rewrite the prompt with explicit success criteria
Here we asked an LLM to expand our instructions with detailed accessibility requirements. It did - mostly by stuffing ARIA everywhere. We saw better landmarks and some keyboard gains, but also a new class of Level A failures created by unnecessary and incorrect roles.
The net effect: minimal score movement and a few regressions, with most tools hovering around 81-83%. In some cases we fixed AA issues while introducing fresh Level A issues.
That’s not progress.
Phase 4: Lead with a native-first accessibility checklist
Instead of burying accessibility inside the build request, we put it up front as a prerequisite and kept it simple:
- Semantic HTML first, ARIA only if needed
- Logical tab order
- Visible focus
- Skip link respected
- Clear labels and error handling
- Polite live regions for search, assertive for form errors
- Contrast targets met
- Respect reduced motion
Then we asked for the same page.
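Several of those checklist items translate to only a few lines of markup. Here is a minimal sketch of what the tools converged on in the best runs - the ids, class name, and copy are illustrative, not taken from the experiment's output:

```html
<!-- Skip link: first focusable element on the page, visible on focus -->
<a class="skip-link" href="#main">Skip to main content</a>

<!-- Polite live region: announces search result counts
     without interrupting or stealing focus -->
<div id="search-status" aria-live="polite">12 results for “chronograph”</div>

<!-- Assertive announcement for form errors: role="alert"
     implies aria-live="assertive" and interrupts -->
<div id="form-errors" role="alert"></div>

<!-- Respect the user's reduced-motion preference -->
<style>
  @media (prefers-reduced-motion: reduce) {
    * { animation: none; transition: none; }
  }
</style>
```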
This changed the pattern. Predictive search and dialogs leaned on native elements. Menus behaved. Labeling stopped multiplying. Keyboard navigation felt natural. Scores rose again - Claude Code and v0 were standouts - while Copilot dipped a few points due to a couple of stubborn ARIA misuses.
The takeaway is blunt: moving from “ARIA-heavy accessibility” to a native-first prompting strategy produced cleaner, more compliant code with fewer false positives and stronger keyboard support.
What the numbers said
Across phases, the broad trend was clear:
- Baseline conformance sat roughly in the mid-70s to around 80%
- Adding “make it accessible” nudged tools into the low-80s
- The ARIA-stuffed rewrite didn’t unlock a new level - if anything, it risked new breakage
- The native-first prompt stabilized semantics and lifted quality again, with Claude Code and v0 edging ahead in our final round, and Copilot showing a small drop tied to specific ARIA attribute errors
Numbers aren’t the whole story, though. The feel of the output changed.
When tools relied on native patterns first, focus order made sense, keyboard interactions worked without gymnastics, and assistive tech announced changes reliably. When tools reached for ARIA early, they over-labeled, mis-labeled, or conflated patterns - most visibly around combobox behavior in predictive search.
That’s exactly the kind of “helpfulness” that can sabotage accessibility in the real world.
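The predictive-search case illustrates the gap. A hand-rolled ARIA combobox needs `role="combobox"`, `aria-expanded`, `aria-activedescendant`, and carefully scripted keyboard handling - all easy to get wrong. The native route is a sketch like this (suggestion values are illustrative):

```html
<!-- Native-first predictive search: <datalist> supplies suggestion
     semantics and keyboard behavior without any custom ARIA -->
<label for="search">Search watches</label>
<input id="search" type="search" list="suggestions" />
<datalist id="suggestions">
  <option value="Chronograph"></option>
  <option value="Dive watch"></option>
  <option value="Dress watch"></option>
</datalist>
```

`<datalist>` styling is limited, so a fully custom combobox is sometimes unavoidable - but then the ARIA Authoring Practices combobox pattern should be followed exactly, not approximated.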
Why “native-first” works
Modern browsers already implement a huge amount of accessible behavior. Semantic HTML exposes names, roles, and states that assistive tech understands. When you start with those primitives, you inherit the right accessibility defaults.
ARIA is powerful, but it’s meant to fill gaps - not reinvent everyday controls. Ask an AI to sprinkle ARIA, and it will. Ask it to use native HTML first, and it takes the safest path.
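The classic illustration is a button. The native element inherits focusability, Enter/Space activation, and the correct role for free; the ARIA rebuild has to reimplement every one of those defaults by hand (`loadMore()` below is a hypothetical handler):

```html
<!-- Native: focus, keyboard activation, and the "button" role come free -->
<button type="button" onclick="loadMore()">Load more</button>

<!-- ARIA rebuild: every default must be restored manually,
     and any omission is an accessibility bug -->
<div role="button" tabindex="0"
     onclick="loadMore()"
     onkeydown="if (event.key === 'Enter' || event.key === ' ') loadMore()">
  Load more
</div>
```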
The results aligned with that principle. Our best runs leaned into native navigation landmarks, form labels, lists and headings, dialog semantics, and buttons that behaved like buttons. The fewer custom roles we introduced, the fewer edge cases we created.
What this means for teams using AI to build UI
If your default prompt is “build X and make it accessible,” you’ll get inconsistent gains and recurring pain.
Swap that for a checklist-style preamble that sets the ground rules, then describe your components. Keep it human and semantic, not jargon-heavy. You don’t have to know the exact ARIA recipe for a combobox. You do need to state the outcomes you want:
- Logical tab order
- Arrow-key navigation in menus
- Enter to select
- Polite live announcements for search results
- Assertive error announcements for forms
The model can translate those outcomes into code, and it will do better when nudged toward native elements first.
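For example, "arrow-key navigation in menus" typically lands on the roving-tabindex pattern: one tab stop for the whole menu, with arrow keys moving focus between items. A minimal sketch (menu ids and links are illustrative):

```html
<!-- Roving tabindex: the menu is a single tab stop;
     ArrowUp/ArrowDown move focus between links -->
<nav aria-label="Collections">
  <ul id="menu">
    <li><a href="/divers" tabindex="0">Divers</a></li>
    <li><a href="/dress" tabindex="-1">Dress</a></li>
    <li><a href="/chrono" tabindex="-1">Chronographs</a></li>
  </ul>
</nav>
<script>
  const links = [...document.querySelectorAll('#menu a')];
  document.getElementById('menu').addEventListener('keydown', (e) => {
    if (e.key !== 'ArrowDown' && e.key !== 'ArrowUp') return;
    e.preventDefault();
    const i = links.indexOf(document.activeElement);
    const next = e.key === 'ArrowDown'
      ? (i + 1) % links.length
      : (i - 1 + links.length) % links.length;
    links.forEach((a, j) => (a.tabIndex = j === next ? 0 : -1));
    links[next].focus();
  });
</script>
```

Note that "Enter to select" needs no script at all here - activating a focused link is native behavior.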
Two practical cautions from the experiment:
- Prompting isn’t a silver bullet. Even in the best runs, we still reviewed output with real tools and caught edge cases. Keep humans in the loop.
- Tool behavior varies. Across our tests, Claude Code and Vercel v0 responded especially well to the native-first approach. Builder Fusion needed more re-prompting. Copilot (GPT-5) was solid overall but occasionally overreached with ARIA, which hurt its final score in Phase 4. Your mileage will vary by component and update cadence, so verify.
How to start, fast
Here’s the pattern and sample prompts that worked for us - adapt as needed:
1. Lead with the guardrails
“Use semantic HTML first. Only add ARIA if a native pattern can’t express the behavior. Provide logical tab order, visible focus, a working skip link, and label all controls. Announce live updates politely for search and assertively for form errors. Meet WCAG 2.1 AA color contrast. Respect reduced motion.”
2. Describe the component goals, not the ARIA
“Build a header with logo, predictive search, and a mega menu. The menu should support arrow keys and Enter to select. The search should announce result counts without stealing focus. Dialogs must trap focus and close with Escape.”
3. Ask for validation
“Append a short explanation of how this meets the navigation and forms criteria above.”
4. Iterate per component
Generate, test, and refine one complex widget at a time - menu, search, dialog, carousel - rather than the whole page at once. It’s faster to isolate and fix interaction bugs when you’re not diffing a thousand lines of combined output.
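The dialog requirements in the step 2 prompt are a good example of why describing goals beats prescribing ARIA: the native `<dialog>` element satisfies them almost for free. A sketch (the id and copy are illustrative):

```html
<!-- <dialog> + showModal(): focus containment, Escape-to-close, and an
     inert background come from the browser, not from custom ARIA -->
<dialog id="welcome" aria-labelledby="welcome-title">
  <h2 id="welcome-title">Welcome</h2>
  <p>Enjoy 10% off your first order.</p>
  <button type="button" onclick="this.closest('dialog').close()">Close</button>
</dialog>
<script>
  // Shown on load for the sketch; a real page would gate this,
  // e.g. once per session
  document.getElementById('welcome').showModal();
</script>
```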
That’s the workflow we now use day-to-day. It’s simple enough for non-experts, and it consistently produces code that’s closer to shippable on the first pass.
Closing thought
AI mirrors the patterns we reward. If you ask for a page, it’ll copy patterns from the web - warts included. If you ask for a page that honors human needs first, it will move in that direction.
Our experiment didn’t make any tool perfect, but it did prove a reliable way to get better outcomes: teach your AI to start with the platform’s built-in accessibility, then add only what you must.
That one change took us from “pretty but broken” to “cleaner, more compliant, and keyboard-friendly.” It will do the same for your team.
Building accessible products with AI? Contact us to learn how Equiwiz approaches AI-assisted development with accessibility built in from the start.