Your AI Is Learning Bad Code. Here's How to Fix That.
A controlled experiment across four AI coding tools reveals that native-first prompting produces cleaner, more accessible UI than ARIA-heavy approaches.
Sankar Mutyala
Product Engineering
Modern AI coding tools make me incredibly fast. But they also tend to reproduce years of accumulated bad habits. Request a sleek storefront homepage and you’ll likely get something attractive that breaks fundamental accessibility rules. That contradiction sparked a research sprint: could intentional prompting train AI to produce accessible UI, rather than merely plausible UI?
To test this, we designed a controlled experiment across four widely used tools: Builder Fusion, Claude Code, GitHub Copilot (GPT-5), and Vercel v0. Each was asked to build the same mini project: a minimalist luxury watch homepage featuring a sticky header with a mega menu, a welcome dialog, predictive search, two carousels, a journal grid with a “load more” interaction, and a standard footer. We evaluated the five components most likely to fail against WCAG 2.1 AA. The twist wasn’t the scope. It was the prompts.
Four ways to ask, four very different outcomes
Phase 1: Say nothing about accessibility
Our baseline prompt described the page and components, with constraints to ship in three files (HTML, CSS, JS). No accessibility guidance whatsoever.
The results looked slick and failed predictably: missing semantic structure, inconsistent or incorrect ARIA, weak keyboard support, and unlabeled images and form controls. Conformance clustered in the 70-80% range.
Pretty, but broken.
Phase 2: Say “make it WCAG 2.1 AA compliant”
This simple nudge helped. All four tools improved heading structure and labeling. Some started auto-adding alt text and form labels.
But the persistent issues stayed persistent: shaky focus management, inconsistent keyboard behavior, and live updates that weren’t announced to assistive tech. Scores plateaued around 82-83%.
Better, not good.
Phase 3: Ask an LLM to rewrite the prompt with explicit success criteria
Here we asked an LLM to expand our instructions with detailed accessibility requirements. It did - mostly by stuffing ARIA everywhere. We saw better landmarks and some keyboard gains, but also a new class of Level A failures created by unnecessary and incorrect roles.
The net effect: minimal score movement and a few regressions, with most tools hovering around 81-83%. In some cases we fixed AA issues while introducing fresh Level A issues.
That’s not progress.
Phase 4: Lead with a native-first accessibility checklist
Instead of burying accessibility inside the build request, we put it up front as a prerequisite and kept it simple:
- Semantic HTML first, ARIA only if needed
- Logical tab order
- Visible focus
- Skip link respected
- Clear labels and error handling
- Polite live regions for search, assertive for form errors
- Contrast targets met
- Respect reduced motion
Then we asked for the same page.
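Several of those checklist items translate to only a few lines of markup. Here is a minimal sketch of what the tools converged on in the best runs - the ids, class name, and copy are illustrative, not taken from the experiment's output:

```html
<!-- Skip link: first focusable element on the page, visible on focus -->
<a class="skip-link" href="#main">Skip to main content</a>

<!-- Polite live region: announces search result counts
     without interrupting or stealing focus -->
<div id="search-status" aria-live="polite">12 results for “chronograph”</div>

<!-- Assertive announcement for form errors: role="alert"
     implies aria-live="assertive" and interrupts -->
<div id="form-errors" role="alert"></div>

<!-- Respect the user's reduced-motion preference -->
<style>
  @media (prefers-reduced-motion: reduce) {
    * { animation: none; transition: none; }
  }
</style>
```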
This changed the pattern. Predictive search and dialogs leaned on native elements. Menus behaved. Labeling stopped multiplying. Keyboard navigation felt natural. Scores rose again - Claude Code and v0 were standouts - while Copilot dipped a few points due to a couple of stubborn ARIA misuses.
The takeaway is blunt: moving from “ARIA-heavy accessibility” to a native-first prompting strategy produced cleaner, more compliant code with fewer false positives and stronger keyboard support.
What the numbers said
Across phases, the broad trend was clear:
- Baseline conformance sat roughly in the mid-70s to around 80%
- Adding “make it accessible” nudged tools into the low-80s
- The ARIA-stuffed rewrite didn’t unlock a new level - if anything, it risked new breakage
- The native-first prompt stabilized semantics and lifted quality again, with Claude Code and v0 edging ahead in our final round, and Copilot showing a small drop tied to specific ARIA attribute errors
Numbers aren’t the whole story, though. The feel of the output changed.
When tools relied on native patterns first, focus order made sense, keyboard interactions worked without gymnastics, and assistive tech announced changes reliably. When tools reached for ARIA early, they over-labeled, mis-labeled, or conflated patterns - most visibly around combobox behavior in predictive search.
That’s exactly the kind of “helpfulness” that can sabotage accessibility in the real world.
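The predictive-search case illustrates the gap. A hand-rolled ARIA combobox needs `role="combobox"`, `aria-expanded`, `aria-activedescendant`, and carefully scripted keyboard handling - all easy to get wrong. The native route is a sketch like this (suggestion values are illustrative):

```html
<!-- Native-first predictive search: <datalist> supplies suggestion
     semantics and keyboard behavior without any custom ARIA -->
<label for="search">Search watches</label>
<input id="search" type="search" list="suggestions" />
<datalist id="suggestions">
  <option value="Chronograph"></option>
  <option value="Dive watch"></option>
  <option value="Dress watch"></option>
</datalist>
```

`<datalist>` styling is limited, so a fully custom combobox is sometimes unavoidable - but then the ARIA Authoring Practices combobox pattern should be followed exactly, not approximated.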
Why “native-first” works
Modern browsers already implement a huge amount of accessible behavior. Semantic HTML exposes names, roles, and states that assistive tech understands. When you start with those primitives, you inherit the right accessibility defaults.
ARIA is powerful, but it’s meant to fill gaps - not reinvent everyday controls. Ask an AI to sprinkle ARIA, and it will. Ask it to use native HTML first, and it takes the safest path.
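The classic illustration is a button. The native element inherits focusability, Enter/Space activation, and the correct role for free; the ARIA rebuild has to reimplement every one of those defaults by hand (`loadMore()` below is a hypothetical handler):

```html
<!-- Native: focus, keyboard activation, and the "button" role come free -->
<button type="button" onclick="loadMore()">Load more</button>

<!-- ARIA rebuild: every default must be restored manually,
     and any omission is an accessibility bug -->
<div role="button" tabindex="0"
     onclick="loadMore()"
     onkeydown="if (event.key === 'Enter' || event.key === ' ') loadMore()">
  Load more
</div>
```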
The results aligned with that principle. Our best runs leaned into native navigation landmarks, form labels, lists and headings, dialog semantics, and buttons that behaved like buttons. The fewer custom roles we introduced, the fewer edge cases we created.
What this means for teams using AI to build UI
If your default prompt is “build X and make it accessible,” you’ll get inconsistent gains and recurring pain.
Swap that for a checklist-style preamble that sets the ground rules, then describe your components. Keep it human and semantic, not jargon-heavy. You don’t have to know the exact ARIA recipe for a combobox. You do need to state the outcomes you want:
- Logical tab order
- Arrow-key navigation in menus
- Enter to select
- Polite live announcements for search results
- Assertive error announcements for forms
The model can translate those outcomes into code, and it will do better when nudged toward native elements first.
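For example, "arrow-key navigation in menus" typically lands on the roving-tabindex pattern: one tab stop for the whole menu, with arrow keys moving focus between items. A minimal sketch (menu ids and links are illustrative):

```html
<!-- Roving tabindex: the menu is a single tab stop;
     ArrowUp/ArrowDown move focus between links -->
<nav aria-label="Collections">
  <ul id="menu">
    <li><a href="/divers" tabindex="0">Divers</a></li>
    <li><a href="/dress" tabindex="-1">Dress</a></li>
    <li><a href="/chrono" tabindex="-1">Chronographs</a></li>
  </ul>
</nav>
<script>
  const links = [...document.querySelectorAll('#menu a')];
  document.getElementById('menu').addEventListener('keydown', (e) => {
    if (e.key !== 'ArrowDown' && e.key !== 'ArrowUp') return;
    e.preventDefault();
    const i = links.indexOf(document.activeElement);
    const next = e.key === 'ArrowDown'
      ? (i + 1) % links.length
      : (i - 1 + links.length) % links.length;
    links.forEach((a, j) => (a.tabIndex = j === next ? 0 : -1));
    links[next].focus();
  });
</script>
```

Note that "Enter to select" needs no script at all here - activating a focused link is native behavior.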
Two practical cautions from the experiment:
- Prompting isn’t a silver bullet. Even in the best runs, we still reviewed output with real tools and caught edge cases. Keep humans in the loop.
- Tool behavior varies. Across our tests, Claude Code and Vercel v0 responded especially well to the native-first approach. Builder Fusion needed more re-prompting. Copilot (GPT-5) was solid overall but occasionally overreached with ARIA, which hurt its final score in Phase 4. Your mileage will vary by component and update cadence, so verify.
How to start, fast
Here’s the pattern and sample prompts that worked for us - adapt as needed:
1. Lead with the guardrails
“Use semantic HTML first. Only add ARIA if a native pattern can’t express the behavior. Provide logical tab order, visible focus, a working skip link, and label all controls. Announce live updates politely for search and assertively for form errors. Meet WCAG 2.1 AA color contrast. Respect reduced motion.”
2. Describe the component goals, not the ARIA
“Build a header with logo, predictive search, and a mega menu. The menu should support arrow keys and Enter to select. The search should announce result counts without stealing focus. Dialogs must trap focus and close with Escape.”
3. Ask for validation
“Append a short explanation of how this meets the navigation and forms criteria above.”
4. Iterate per component
Generate, test, and refine one complex widget at a time - menu, search, dialog, carousel - rather than the whole page at once. It’s faster to isolate and fix interaction bugs when you’re not diffing a thousand lines of combined output.
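The dialog requirements in the step 2 prompt are a good example of why describing goals beats prescribing ARIA: the native `<dialog>` element satisfies them almost for free. A sketch (the id and copy are illustrative):

```html
<!-- <dialog> + showModal(): focus containment, Escape-to-close, and an
     inert background come from the browser, not from custom ARIA -->
<dialog id="welcome" aria-labelledby="welcome-title">
  <h2 id="welcome-title">Welcome</h2>
  <p>Enjoy 10% off your first order.</p>
  <button type="button" onclick="this.closest('dialog').close()">Close</button>
</dialog>
<script>
  // Shown on load for the sketch; a real page would gate this,
  // e.g. once per session
  document.getElementById('welcome').showModal();
</script>
```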
That’s the workflow we now use day-to-day. It’s simple enough for non-experts, and it consistently produces code that’s closer to shippable on the first pass.
Closing thought
AI mirrors the patterns we reward. If you ask for a page, it’ll copy patterns from the web - warts included. If you ask for a page that honors human needs first, it will move in that direction.
Our experiment didn’t make any tool perfect, but it did prove a reliable way to get better outcomes: teach your AI to start with the platform’s built-in accessibility, then add only what you must.
That one change took us from “pretty but broken” to “cleaner, more compliant, and keyboard-friendly.” It will do the same for your team.
Building accessible products with AI? Contact us to learn how Equiwiz approaches AI-assisted development with accessibility built in from the start.