beacon
Scans business websites for security weaknesses and matches every finding to a documented breach where the same vulnerability was exploited. I built it because every security report I'd seen said "missing CSP header" and expected a managing partner to know what that means.
What it finds
The first version just checked HTTP headers. Then I scanned a real immigration agency and the headers were perfect — Vercel defaults handle most of it. The actual problems were zero SPF, no DMARC, and Hotjar recording every form interaction. The headers gave them an A while their clients' passport copies were being captured by a session recording tool.
That's when the tool stopped being a header checklist and started thinking about what matters. Seven scanners now: TLS (protocol, cipher, certificate chain, HSTS), headers (CSP, X-Frame-Options, referrer policy, permissions policy), DNS (SPF, DKIM across 18 common selectors, DMARC enforcement, DNSSEC), exposed paths (25 common files with body validators), third-party tracking (20 known trackers including session recording), forms (Google Forms, WhatsApp links, insecure uploads), and cookies (HttpOnly, Secure, SameSite flags). Each graded A through F.
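To illustrate what "DMARC enforcement" means in the DNS scanner, here is a minimal sketch that classifies a domain's DMARC policy from its TXT records. It is not beacon's actual code: `classifyDmarc`, `DmarcStatus`, and `isEnforced` are hypothetical names. The input is assumed to have the `string[][]` shape that Node's `resolveTxt` returns when queried for `_dmarc.<domain>`.

```typescript
// Hypothetical sketch, not beacon's implementation.
type DmarcStatus = "reject" | "quarantine" | "none" | "invalid" | "missing";

/**
 * Classify the DMARC policy found in a set of TXT records.
 * `txtRecords` has one inner array per record; each inner array holds that
 * record's string chunks, which are concatenated before parsing.
 */
function classifyDmarc(txtRecords: string[][]): DmarcStatus {
  for (const chunks of txtRecords) {
    const record = chunks.join("").trim();
    // Skip unrelated TXT records (SPF, site verification, and so on).
    if (!/^v=DMARC1\s*(;|$)/.test(record)) continue;
    // Look for the p= tag; "sp=" and "pct=" do not match this pattern.
    const match = /;\s*p\s*=\s*([a-z]+)/i.exec(record);
    const policy = match ? match[1].toLowerCase() : "";
    if (policy === "reject" || policy === "quarantine" || policy === "none") {
      return policy;
    }
    return "invalid"; // a DMARC record without a recognisable p= tag
  }
  return "missing";
}

/** Only quarantine and reject tell receivers to act on failing mail. */
function isEnforced(status: DmarcStatus): boolean {
  return status === "reject" || status === "quarantine";
}
```

Under this sketch, a record such as `v=DMARC1; p=none` exists but is not enforced: it classifies as "none" and `isEnforced` returns false, so publishing a DMARC record is not the same as having DMARC protection.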
Breach precedent matching
The database holds 115 verified data breaches. Every source is a link to an ICO enforcement action, FTC filing, FBI IC3 report, or court document. When beacon finds a missing CSP, it doesn't say "this is bad"; it shows you British Airways: 380,000 payment cards stolen, £20 million ICO fine. When it finds no DMARC, it shows you the FBI IC3 report documenting $55 billion in business email compromise losses.
Three entries include victim quotes for human-impact context. The precedent database covers 21 vulnerability categories across XSS, email spoofing, exposed files, cloud storage, session recording, credential theft, and more. Building this database changed how I thought about every finding. A missing header is abstract. A £20 million fine is concrete.
The false positive problem
Many modern sites, single-page apps in particular, return HTTP 200 for every URL. Request /.env on a Next.js site and you get the homepage with a 200 status. Without validation, every SPA gets flagged for every path.
Each of the 25 checked paths has a validator function that inspects the response body. Does it look like KEY=VALUE pairs, or does it look like HTML? Does /.git/HEAD start with "ref: refs/heads/", or is it a React app? With validators, false positives dropped to near zero.
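The validator idea can be sketched as follows. This is an illustration under assumptions, not the project's source: `BodyValidator`, `looksLikeHtml`, `isEnvFile`, `isGitHead`, `VALIDATORS`, and `isExposed` are hypothetical names, and only the two paths mentioned above are covered.

```typescript
// Hypothetical sketch of body validators; names are illustrative.
type BodyValidator = (body: string) => boolean;

/** True when the body opens like an HTML document (the SPA fallback case). */
function looksLikeHtml(body: string): boolean {
  return /^\s*<(!doctype|html|head|body)\b/i.test(body);
}

/** A .env file: every non-blank, non-comment line looks like KEY=VALUE. */
const isEnvFile: BodyValidator = (body) => {
  if (looksLikeHtml(body)) return false;
  const lines = body
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== "" && line.charAt(0) !== "#");
  return (
    lines.length > 0 &&
    lines.every((line) => /^(export\s+)?[A-Za-z_][A-Za-z0-9_]*\s*=/.test(line))
  );
};

/** .git/HEAD on a checked-out branch starts with "ref: refs/heads/". */
const isGitHead: BodyValidator = (body) => /^ref: refs\/heads\//.test(body);

const VALIDATORS: Record<string, BodyValidator> = {
  "/.env": isEnvFile,
  "/.git/HEAD": isGitHead,
};

/** A path counts as exposed only if the status is 200 AND the body validates. */
function isExposed(path: string, status: number, body: string): boolean {
  const validate = VALIDATORS[path];
  return status === 200 && validate !== undefined && validate(body);
}
```

With this shape, a homepage served at /.env with a 200 status is rejected because the body opens like HTML, while a body of KEY=VALUE lines is flagged. Paths without a registered validator are never reported.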
Industry context
Same scan, different interpretation. An immigration agency with no DMARC is not the same as a restaurant with no DMARC. Five industry profiles (immigration, law, accounting, healthcare, general) adjust severity levels based on what data the business handles. No DMARC on a general site is high severity. On an immigration agency, which sends payment instructions and handles passport data by email, it becomes critical, which means an automatic F. Session recording tools on a law firm portal get bumped to high because they capture case file interactions. Each profile appends industry-specific risk context: the immigration profile mentions GDPR Article 32 and the £150M UK solicitor invoice fraud epidemic.
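One way such a profile could be applied is sketched below. It is an assumption-laden illustration: `PROFILES`, `Override`, and `applyProfile` are hypothetical names, the `Finding` shape is guessed, and the table contains only the two adjustments described above, reusing the finding ids `dns-no-dmarc` and `third-party-hotjar` from the project's mapping. The real profiles presumably contain many more entries.

```typescript
// Hypothetical sketch; the override table holds only the two examples from the text.
type Severity = "critical" | "high" | "medium" | "low" | "info";
type Industry = "immigration" | "law" | "accounting" | "healthcare" | "general";

interface Finding {
  id: string;
  severity: Severity;
  context?: string; // industry-specific risk note, if any
}

interface Override {
  severity: Severity;
  context: string;
}

const PROFILES: Record<Industry, Record<string, Override>> = {
  immigration: {
    "dns-no-dmarc": {
      severity: "critical",
      context: "Payment instructions and passport data are sent by email.",
    },
  },
  law: {
    "third-party-hotjar": {
      severity: "high",
      context: "Session recording captures case file interactions.",
    },
  },
  accounting: {},
  healthcare: {},
  general: {},
};

/** Return new findings with the industry's overrides applied; input is not mutated. */
function applyProfile(industry: Industry, findings: Finding[]): Finding[] {
  return findings.map((finding) => {
    const override = PROFILES[industry][finding.id];
    return override ? { ...finding, ...override } : finding;
  });
}
```

Keeping the adjustment as a separate pass means the scanners stay industry-agnostic: the same raw findings can be reinterpreted under a different profile without rescanning.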
From the source
/** Map from scanner findings to breach database categories. */
const FINDING_TO_BREACH: Record<string, string[]> = {
  "headers-no-csp": ["xss", "supply-chain"],
  "dns-no-dmarc": ["email-spoofing"],
  "dns-no-spf": ["email-spoofing"],
  "paths-env": ["exposed-files"],
  "paths-git-head": ["exposed-files"],
  "third-party-hotjar": ["session-recording"],
  "forms-whatsapp-communication": ["whatsapp-consumer-tools"],
  // ... 40+ mappings across 20 categories
};
/** Pick the most impactful precedent from matching breaches. */
function pickPrecedent(categories: string[]): BreachPrecedent | undefined {
  // Rank by the length of `impact`, longest first; a missing `impact` counts as 0.
  const matches = BREACHES
    .filter(b => categories.includes(b.category))
    .sort((a, b) => (b.impact?.length ?? 0) - (a.impact?.length ?? 0));
  return matches[0];
}
const WEIGHTS: Record<Severity, number> = {
  critical: 40,
  high: 20,
  medium: 8,
  low: 2,
  info: 0,
};
export function computeGrade(findings: Finding[]): Grade {
  const criticals = findings.filter((f) => f.severity === "critical").length;
  const highs = findings.filter((f) => f.severity === "high").length;
  // Floor: any critical = F
  if (criticals > 0) return "F";
  let score = 100;
  for (const f of findings) score -= WEIGHTS[f.severity];
  // Cap: 2+ highs = cannot be above D
  if (highs >= 2) return gradeMax(scoreToGrade(score), "D");
  return scoreToGrade(score);
}
src/grade.ts; weights are calibrated against the 115-precedent breach database. Industry profiles (immigration, law, accounting, healthcare, general) reweight severity per finding before this function runs.
What it doesn't do
- Third-party detection works on the initial HTML only. Scripts loaded dynamically via Google Tag Manager are not detected. A site using GTM to load Hotjar will appear clean.
- DKIM selector enumeration checks 18 common selectors. Custom selectors used by some providers won't be found. A missing DKIM finding might be a false negative.
- This is passive analysis. No authentication bypass, no payload injection, no exploitation. A clean scan does not mean the site is secure; it means the publicly visible configuration has no obvious weaknesses.
- Cookie analysis only covers cookies set on the initial page load. Session cookies that appear after login are not captured.
- The breach precedent database is manually curated. It covers the most consequential incidents but will always be incomplete.
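To make the first limitation in the list above concrete, here is a rough sketch of signature matching against the initial HTML. It is not beacon's detector: `TRACKER_SIGNATURES` and `detectTrackers` are hypothetical names, the two signatures are illustrative, and the sample markup is abbreviated.

```typescript
// Hypothetical sketch: signature matching against the initial HTML only.
const TRACKER_SIGNATURES: Record<string, string> = {
  hotjar: "static.hotjar.com",
  clarity: "clarity.ms/tag/",
};

/** Return the names of trackers whose signature appears in the raw HTML. */
function detectTrackers(initialHtml: string): string[] {
  const html = initialHtml.toLowerCase();
  return Object.keys(TRACKER_SIGNATURES)
    .filter((name) => html.indexOf(TRACKER_SIGNATURES[name]) !== -1)
    .sort();
}

// Hotjar embedded directly: its host name is present in the served markup.
const directEmbed =
  '<script>/* loader */ load("https://static.hotjar.com/c/hotjar-");</script>';

// Hotjar configured inside a GTM container: only the GTM loader is served.
const viaTagManager =
  '<script async src="https://www.googletagmanager.com/gtm.js?id=GTM-XXXXXXX"></script>';

const directResult = detectTrackers(directEmbed);
const gtmResult = detectTrackers(viaTagManager);
```

In this sketch `directResult` contains "hotjar" while `gtmResult` is empty, because the tracker's URL only appears after the tag manager script runs in a browser. Closing that gap would require executing the page rather than reading its HTML.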
Stack
TypeScript + Node.js. Direct TLS socket inspection (no external scanner). DNS resolution via Node's dns/promises. HTML parsing for third-party detection. Next.js frontend with real-time scan progress. Two external dependencies total.
Use only on systems you own, or for which you have written authorisation from the owner. Pointing a scanner at a third-party domain without authorisation may be an offence under section 1 of the Computer Misuse Act 1990. Full methodology at /scope.