Verify what your coding agent shipped
A coding agent writes plausible code fast, but a passing diff is not a working app. AuditWard points a QA agent and real security tooling at the deployed URL, drives a real browser, probes the live target, and hands back triaged findings with screenshots and a PDF report.
A green diff is not a verified app.
Your agent reports the task done. The unit tests pass and the build is green. None of that tells you whether the signup form submits, whether the API leaks a stack trace, or whether the agent left a debug header in place. Those facts only exist on the running site.
To check AI coding agent output for real, you have to exercise the deployed app the way a user and an attacker would. That means clicking through the flows in a browser and probing the server with security tooling. AuditWard does both from one URL, so verifying an AI-built app does not become its own afternoon of manual QA.
The verification run, step by step.
You give AuditWard the URL your agent deployed and a short note about what changed. From there the pipeline runs four stages on its own and returns the evidence. You can kick it off from the dashboard or call qa_test straight from your coding agent over MCP.
Give it the live URL
Paste the deployed URL (a staging or preview link works) and add instructions like "I just added Stripe checkout and a password reset flow, check those first". If the app sits behind a login, the scan pauses to ask for credentials and resumes once you answer. Those answers are KMS-encrypted before storage.
A test plan, not a script
An LLM Planner reads the URL and your note, then writes a checklist of things to verify: the flows you flagged, the obvious user paths, and the surfaces worth a security look. There is nothing to record or maintain. The plan is built fresh for the app that is actually live.
It clicks through for real
An Explorer agent runs the checklist in a real Chromium browser. It fills forms, follows links, submits the checkout, and watches what the page does. While it works, security tools probe the same target: curl, testssl.sh, Nuclei, Nmap, Gobuster, nslookup, and WhatWeb. Screenshots are captured at each step.
Findings you can act on
An Analyst agent turns the raw evidence into findings, each with a confidence score, a severity, annotated screenshots, and tags to the frameworks it touches (PCI DSS 4.0, SOC 2, GDPR, OWASP Top 10, HIPAA, ISO 27001). You get a finding list in the dashboard and a pentest-style PDF report.
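As a rough illustration of how such findings might be ordered once you export them, here is a minimal sketch. The `Finding` class, its field names, the severity labels, and the 0.0 to 1.0 confidence scale are all assumptions made for this example, not AuditWard's documented schema.

```python
from dataclasses import dataclass, field

# Assumed severity labels, most urgent first. The real labels may differ.
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3, "info": 4}


@dataclass
class Finding:
    """Hypothetical shape of one finding; field names are illustrative."""
    title: str
    severity: str
    confidence: float  # assumed to range from 0.0 to 1.0
    frameworks: list = field(default_factory=list)


def triage(findings, min_confidence=0.5):
    """Drop low-confidence findings, then sort by severity and confidence."""
    kept = [f for f in findings if f.confidence >= min_confidence]
    return sorted(kept, key=lambda f: (SEVERITY_ORDER[f.severity], -f.confidence))


if __name__ == "__main__":
    sample = [
        Finding("Missing HSTS header", "medium", 0.9, ["OWASP Top 10"]),
        Finding("Checkout success page returns 500", "high", 0.8),
        Finding("Possible admin path", "high", 0.3),
    ]
    for f in triage(sample):
        print(f.severity, f.title)
```

Sorting on severity first and confidence second puts the issues most worth a human look at the top of the list. The threshold is a knob to tune, not a recommendation.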
# From your coding agent, once AuditWard is added as an MCP server:
"Run a QA and security scan of https://preview.myapp.dev,
I just shipped checkout and password reset. Report the findings."
The kinds of things agents leave behind.
Coding agents tend to fail in repeatable ways. They wire up a flow that looks right but breaks at submission, ship a UI full of placeholder copy, and skip the security headers that no test was asking about. Here are common finding types from a verification run, split by the agent that surfaces them.
| Source | Finding type | What it looks like on an agent-built app |
|---|---|---|
| QA | Broken flow | Checkout submits but the success page 500s, or a form posts to a route the agent never built. |
| QA | Placeholder content | Lorem ipsum, "Your Name Here", or stub product data left in a live page. |
| QA | Visual or layout bug | Overlapping elements, an unstyled error state, or a modal that opens but cannot be closed. |
| Security | Missing security headers | No Content-Security-Policy, HSTS, or X-Content-Type-Options on the deployed host. |
| Security | Weak TLS configuration | Outdated protocol versions or weak ciphers flagged by testssl.sh on the live endpoint. |
| Security | Information disclosure | A verbose stack trace, an exposed admin path, or a debug header the agent forgot to strip. |
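To make the "Missing security headers" row concrete, here is a minimal sketch of the underlying check, assuming you already have a response's headers as a dictionary. The helper name is hypothetical and this is not how AuditWard implements the finding. It only shows the idea of comparing a response against the three headers named in the table.

```python
# Headers named in the table above. HSTS is sent as Strict-Transport-Security.
REQUIRED_HEADERS = (
    "Content-Security-Policy",
    "Strict-Transport-Security",
    "X-Content-Type-Options",
)


def missing_security_headers(response_headers):
    """Return the required headers absent from a response.

    Header names are compared case-insensitively, as HTTP requires.
    """
    present = {name.lower() for name in response_headers}
    return [h for h in REQUIRED_HEADERS if h.lower() not in present]


if __name__ == "__main__":
    sample = {"content-type": "text/html", "x-content-type-options": "nosniff"}
    print(missing_security_headers(sample))
    # prints ['Content-Security-Policy', 'Strict-Transport-Security']
```

A presence check like this says nothing about whether a header's value is well configured, so treat it as a starting point.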
What this run does and does not do.
A verification run tells you what is observable from the outside of the deployed app: broken user flows and the security issues a black-box scan can reach. It is a fast check on AI coding agent output, not a manual penetration test and not a code review. Read it as a strong first pass, not a sign-off.
What it covers well
The flows a user touches and the surfaces an attacker probes from outside. Browser-level QA on the live app, plus security findings from real pentest tooling, tagged to the frameworks each issue maps to. Run it on every preview deploy without setting anything up first.
Where you still need people
AuditWard complements a manual penetration test; it does not replace one. It is not a certified pentest and not a PCI ASV scan. Business-logic bugs that need a human to reason about intent, and risks that only show in source, still call for review by an engineer or a tester.
Verifying agent output.
Do I scan the code or the running app?
The running app. AuditWard points its QA agent and security tooling at the deployed URL and exercises it as a user and an attacker would, so you verify what the agent actually shipped, not just the diff it wrote.
Can it verify an app that needs a login?
Yes. When the scan hits a login wall it pauses with structured questions. You answer in the dashboard or with qa_provide_context from your coding agent, and the scan resumes. Your answers are KMS-encrypted before storage.
Can I run this from my coding agent directly?
Yes. AuditWard ships an MCP server, so an agent like Claude Code can call qa_test on a deployed URL, poll qa_status, and pull the report with qa_report. MCP access is on the Starter plan and above.
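The start, poll, and fetch sequence could be sketched as below. The tool names `qa_test`, `qa_status`, and `qa_report` come from the text above. Everything else is an assumption for illustration: the `call_tool` callable stands in for whatever your MCP client exposes, and the argument names (`url`, `instructions`, `scan_id`) and the `state` field with its values are hypothetical, not AuditWard's documented schema.

```python
import time


def run_verification(call_tool, url, note, poll_seconds=5.0, max_polls=120,
                     sleep=time.sleep):
    """Start a scan, poll until it finishes, then fetch the report.

    call_tool(name, arguments) -> dict is a stand-in for an MCP client call.
    Argument and field names here are assumptions, not a documented API.
    """
    started = call_tool("qa_test", {"url": url, "instructions": note})
    scan_id = started["scan_id"]

    for _ in range(max_polls):
        status = call_tool("qa_status", {"scan_id": scan_id})
        if status["state"] in ("completed", "failed"):
            break
        sleep(poll_seconds)
    else:
        # The loop ran out of polls without seeing a terminal state.
        raise TimeoutError(f"scan {scan_id} did not finish after {max_polls} polls")

    if status["state"] == "failed":
        raise RuntimeError(f"scan {scan_id} failed")
    return call_tool("qa_report", {"scan_id": scan_id})


if __name__ == "__main__":
    # Fake client so the sketch runs offline; replace with a real MCP client.
    states = iter(["running", "running", "completed"])

    def fake_call_tool(name, arguments):
        if name == "qa_test":
            return {"scan_id": "demo-1"}
        if name == "qa_status":
            return {"state": next(states)}
        return {"scan_id": arguments["scan_id"], "findings": []}

    report = run_verification(fake_call_tool, "https://preview.myapp.dev",
                              "checkout and password reset",
                              sleep=lambda seconds: None)
    print(report)
```

Because the run is asynchronous, a bounded poll count keeps an agent from waiting forever on a scan that never reaches a terminal state.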
Does a passing run mean the app is secure?
No. A run checks what is observable from outside the deployed app. It is a fast first pass on agent output, not a manual penetration test, a code review, or a PCI ASV scan, and it does not replace a human security review.
How long does a verification run take?
It runs asynchronously. You start it, then watch the screenshot feed and checklist progress in the live dashboard or poll status from your agent. Time depends on how many flows the Planner queues and how large the target is.