Use case

Verify what your coding agent shipped

A coding agent writes plausible code fast, but a passing diff is not a working app. AuditWard points a QA agent and real security tooling at the deployed URL, drives the actual browser, probes the live target, and hands back triaged findings with screenshots and a PDF report.

The gap

A green diff is not a verified app.

Your agent reports the task done. The unit tests pass and the build is green. None of that tells you whether the signup form submits, whether the API leaks a stack trace, or whether the agent left a debug header in place. Those facts only exist on the running site.

To check an AI coding agent's output for real, you have to exercise the deployed app the way a user and an attacker would. That means clicking through the flows in a browser and probing the server with security tooling. AuditWard does both from one URL, so verifying an AI-built app does not become its own afternoon of manual QA.

Workflow

The verification run, step by step.

You give AuditWard the URL your agent deployed and a short note about what changed. After that first step, the remaining three stages (Planner, Explorer, Analyst) run on their own and return the evidence. You can kick it off from the dashboard or call qa_test straight from your coding agent over MCP.

01 POINT IT AT THE DEPLOY

Give it the live URL

Paste the deployed URL (a staging or preview link works) and add instructions like "I just added Stripe checkout and a password reset flow, check those first". If the app sits behind a login, the scan pauses to ask for credentials and resumes once you answer. Those answers are KMS-encrypted before storage.

02 PLANNER BUILDS THE CHECKLIST

A test plan, not a script

An LLM Planner reads the URL and your note, then writes a checklist of things to verify: the flows you flagged, the obvious user paths, and the surfaces worth a security look. There is nothing to record or maintain. The plan is built fresh for the app that is actually live.

03 EXPLORER DRIVES THE BROWSER

It clicks through for real

An Explorer agent runs the checklist in a real Chromium browser. It fills forms, follows links, submits the checkout, and watches what the page does. While it works, security tools probe the same target: curl, testssl.sh, Nuclei, Nmap, Gobuster, nslookup, and WhatWeb. Screenshots are captured at each step.

04 ANALYST TRIAGES THE EVIDENCE

Findings you can act on

An Analyst agent turns the raw evidence into findings, each with a confidence score, a severity, annotated screenshots, and tags for the frameworks it touches (PCI DSS 4.0, SOC 2, GDPR, OWASP Top 10, HIPAA, ISO 27001). You get a finding list in the dashboard and a pentest-style PDF report.
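
To make the shape of a finding concrete, here is a minimal Python sketch of how a consumer of the report might model and order findings. The field names, the severity labels, and the 0.0 to 1.0 confidence scale are all assumptions for illustration; the description above only says that each finding carries a confidence score, a severity, screenshots, and framework tags.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical severity labels; the real report may use different names.
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3, "info": 4}


@dataclass
class Finding:
    """Illustrative model of one triaged finding (all field names are assumed)."""
    title: str
    source: str                      # "QA" or "Security"
    severity: str                    # one of SEVERITY_ORDER's keys
    confidence: float                # assumed scale: 0.0 (guess) to 1.0 (certain)
    frameworks: List[str] = field(default_factory=list)   # e.g. "OWASP Top 10"
    screenshots: List[str] = field(default_factory=list)  # paths or URLs


def triage_order(findings: List[Finding]) -> List[Finding]:
    """Sort by severity first, then by descending confidence within a severity."""
    return sorted(
        findings,
        key=lambda f: (
            SEVERITY_ORDER.get(f.severity, len(SEVERITY_ORDER)),  # unknown labels last
            -f.confidence,
        ),
    )
```

Sorting on severity before confidence keeps a high-severity, lower-confidence finding above a low-severity one that is certain, which is usually the order you want to review them in.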

# From your coding agent, once AuditWard is added as an MCP server:
"Run a QA and security scan of https://preview.myapp.dev,
 I just shipped checkout and password reset. Report the findings."

What it catches

The kinds of things agents leave behind.

Coding agents tend to fail in repeatable ways. They wire up a flow that looks right but breaks at submission, ship a UI full of placeholder copy, and skip the security headers that no test was asking about. Here are common finding types from a verification run, split by the agent that surfaces them.

| Source | Finding type | What it looks like on an agent-built app |
| --- | --- | --- |
| QA | Broken flow | Checkout submits but the success page 500s, or a form posts to a route the agent never built. |
| QA | Placeholder content | Lorem ipsum, "Your Name Here", or stub product data left in a live page. |
| QA | Visual or layout bug | Overlapping elements, an unstyled error state, or a modal that opens but cannot be closed. |
| Security | Missing security headers | No Content-Security-Policy, HSTS, or X-Content-Type-Options on the deployed host. |
| Security | Weak TLS configuration | Outdated protocol versions or weak ciphers flagged by testssl.sh on the live endpoint. |
| Security | Information disclosure | A verbose stack trace, an exposed admin path, or a debug header the agent forgot to strip. |
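
The missing-headers row is easy to reproduce by hand. The Python sketch below shows the idea only; it is not AuditWard's implementation, and the function names are made up for this example. It checks a response's headers for the three named in the table (HSTS is sent as Strict-Transport-Security).

```python
import urllib.request
from typing import List, Mapping

# The three headers named in the table; HSTS is the Strict-Transport-Security header.
REQUIRED_HEADERS = (
    "Content-Security-Policy",
    "Strict-Transport-Security",
    "X-Content-Type-Options",
)


def missing_security_headers(headers: Mapping[str, str]) -> List[str]:
    """Return the required headers absent from `headers` (names compared case-insensitively)."""
    present = {name.lower() for name in headers}
    return [name for name in REQUIRED_HEADERS if name.lower() not in present]


def fetch_headers(url: str, timeout: float = 10.0) -> Mapping[str, str]:
    """Fetch response headers with a HEAD request.

    Note: urlopen raises urllib.error.HTTPError on 4xx/5xx responses,
    so callers may want to catch that and inspect the error's headers.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return dict(response.headers.items())
```

Against a live preview this would be called as missing_security_headers(fetch_headers("https://preview.myapp.dev")). A presence check like this says nothing about whether the header values are sensible, which is one reason a scanner's finding is more useful than a one-off script.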

Honest scope

What this run does and does not do.

A verification run tells you what is observable from the outside of the deployed app: broken user flows and the security issues a black-box scan can reach. It is a fast check on AI coding agent output, not a manual penetration test and not a code review. Read it as a strong first pass, not a sign-off.

What it covers well

The flows a user touches and the surfaces an attacker probes from outside. Browser-level QA on the live app, plus security findings from real pentest tooling, tagged to the frameworks each issue maps to. Run it on every preview deploy without setting anything up first.

Where you still need people

AuditWard does not replace a manual penetration test; it complements one. It is not a certified pentest and not a PCI ASV scan. Business-logic bugs that need a human to reason about intent, and risks that show up only in source code, still call for review by an engineer or a tester.

FAQ

Verifying agent output.

Do I scan the code or the running app?

The running app. AuditWard points its QA agent and security tooling at the deployed URL and exercises it as a user and an attacker would, so you verify what the agent actually shipped, not just the diff it wrote.

Can it verify an app that needs a login?

Yes. When the scan hits a login wall it pauses with structured questions. You answer in the dashboard or with qa_provide_context from your coding agent, and the scan resumes. Your answers are KMS-encrypted before storage.

Can I run this from my coding agent directly?

Yes. AuditWard ships an MCP server, so an agent like Claude Code can call qa_test on a deployed URL, poll qa_status, and pull the report with qa_report. MCP access is on the Starter plan and above.
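
As a rough illustration of that start, poll, and fetch sequence, here is a Python sketch. Only the tool names qa_test, qa_status, and qa_report come from the text above. The argument names (url, instructions, scan_id), the state values, and the response shapes are assumptions, and call_tool stands in for whatever MCP client your agent uses.

```python
import time
from typing import Any, Callable, Dict

# Stand-in for an MCP client: takes a tool name and arguments, returns the tool result.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def run_verification(
    call_tool: ToolCaller,
    url: str,
    instructions: str,
    poll_seconds: float = 15.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Start a scan, poll until it finishes, then return the report.

    Argument names, the "scan_id" key, and the "state" values are hypothetical.
    """
    started = call_tool("qa_test", {"url": url, "instructions": instructions})
    scan_id = started["scan_id"]  # assumed response field

    for _ in range(max_polls):
        status = call_tool("qa_status", {"scan_id": scan_id})
        state = status.get("state")  # assumed values: "running", "completed", "failed"
        if state == "completed":
            return call_tool("qa_report", {"scan_id": scan_id})
        if state == "failed":
            raise RuntimeError(f"scan {scan_id} failed: {status}")
        sleep(poll_seconds)

    raise TimeoutError(f"scan {scan_id} did not finish after {max_polls} polls")
```

Passing call_tool and sleep in as parameters keeps the loop independent of any particular MCP library and makes it easy to exercise with a fake client before pointing it at a real server.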

Does a passing run mean the app is secure?

No. A run checks what is observable from outside the deployed app. It is a fast first pass on agent output, not a manual penetration test, a code review, or a PCI ASV scan, and it does not replace a human security review.

How long does a verification run take?

It runs asynchronously. You start it, then watch the screenshot feed and checklist progress in the live dashboard or poll status from your agent. Time depends on how many flows the Planner queues and how large the target is.