beacon

Scans business websites for security weaknesses and matches every finding to a documented breach where the same vulnerability was exploited. I built it because every security report I'd seen said "missing CSP header" and expected a managing partner to know what that means.

TypeScript · Node.js · Next.js · TLS sockets · DNS / dns-promises · HTML parsing
beacon.giuseppegiona.com · Source · 7 scanners · 115 precedents · 65 tests

What it finds

The first version just checked HTTP headers. Then I scanned a real immigration agency and the headers were perfect — Vercel defaults handle most of it. The actual problems were zero SPF, no DMARC, and Hotjar recording every form interaction. The headers gave them an A while their clients' passport copies were being captured by a session recording tool.

That's when the tool stopped being a header checklist and started thinking about what matters. Seven scanners now: TLS (protocol, cipher, certificate chain, HSTS), headers (CSP, X-Frame-Options, referrer policy, permissions policy), DNS (SPF, DKIM across 18 common selectors, DMARC enforcement, DNSSEC), exposed paths (25 common files with body validators), third-party tracking (20 known trackers including session recording), forms (Google Forms, WhatsApp links, insecure uploads), and cookies (HttpOnly, Secure, SameSite flags). Each graded A through F.
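To make the DNS checks concrete, here is a minimal sketch of how SPF and DMARC records might be classified once the TXT records have been fetched. The helper names (`flattenTxt`, `findSpf`, `dmarcPolicy`) are hypothetical and not taken from beacon's source; the lookup itself (for example `resolveTxt` from Node's dns/promises, which returns each record as an array of chunks) is left out so the logic can run offline.

```typescript
// Hypothetical helpers, not beacon's actual code.
type DmarcPolicy = "none" | "quarantine" | "reject";

/** TXT lookups return each record as string chunks; join them back together. */
function flattenTxt(records: string[][]): string[] {
  return records.map((chunks) => chunks.join(""));
}

/** Return the first SPF record among a domain's TXT records, if any. */
function findSpf(txt: string[]): string | undefined {
  return txt.find((r) => /^v=spf1(\s|$)/i.test(r.trim()));
}

/**
 * Read the p= tag from the TXT records at _dmarc.<domain>.
 * Returns undefined when there is no DMARC record or no valid p= tag.
 */
function dmarcPolicy(txt: string[]): DmarcPolicy | undefined {
  const record = txt.find((r) => /^v=dmarc1/i.test(r.trim()));
  if (!record) return undefined;
  for (const tag of record.split(";")) {
    const [key, value] = tag.split("=").map((s) => s.trim().toLowerCase());
    if (key === "p" && (value === "none" || value === "quarantine" || value === "reject")) {
      return value;
    }
  }
  return undefined;
}
```

Under this sketch, a missing SPF record is `findSpf(...) === undefined`, and DMARC counts as enforced only when the policy is "quarantine" or "reject"; a policy of "none" monitors but does not block spoofed mail.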

Breach precedent matching

115 verified data breaches. Every source is a link to an ICO enforcement action, FTC filing, FBI IC3 report, or court document. When beacon finds a missing CSP, it doesn't say "this is bad"; it shows you British Airways: 380,000 payment cards stolen, £20 million ICO fine. When it finds no DMARC, it shows you the FBI IC3 report documenting $55 billion in business email compromise losses.

Three entries include victim quotes for human-impact context. The precedent database covers 21 vulnerability categories across XSS, email spoofing, exposed files, cloud storage, session recording, credential theft, and more. Building this database changed how I thought about every finding. A missing header is abstract. A £20 million fine is concrete.

The false positive problem

Most modern sites return HTTP 200 for every URL. Request /.env on a Next.js site and you get the homepage with a 200 status. Without validation, every SPA gets flagged for every path.

Each of the 25 checked paths has a validator function that inspects the response body. Does it look like KEY=VALUE pairs, or does it look like HTML? Does /.git/HEAD start with "ref: refs/heads/" or is it a React app? With validators, false positives dropped to near zero.
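The two checks described above could look roughly like this. It is a sketch under assumptions: the names `BodyValidator`, `looksLikeHtml`, `VALIDATORS` and `isExposed` are illustrative, and the real validators may apply stricter rules.

```typescript
// Hypothetical validators: each takes a response body and decides whether it
// looks like the sensitive file rather than an SPA fallback page.
type BodyValidator = (body: string) => boolean;

/** True when the body starts like an HTML document. */
function looksLikeHtml(body: string): boolean {
  return /^\s*(<!doctype html|<html[\s>])/i.test(body);
}

/** /.env: not HTML, and at least one line shaped like KEY=VALUE. */
const validateEnv: BodyValidator = (body) => {
  if (looksLikeHtml(body)) return false;
  return body
    .split(/\r?\n/)
    .some((line) => /^\s*(export\s+)?[A-Za-z_][A-Za-z0-9_]*\s*=/.test(line));
};

/** /.git/HEAD: a symbolic ref pointing at a branch. */
const validateGitHead: BodyValidator = (body) =>
  body.trim().startsWith("ref: refs/heads/");

const VALIDATORS: Record<string, BodyValidator> = {
  "/.env": validateEnv,
  "/.git/HEAD": validateGitHead,
};

/** A path counts as exposed only on a 200 and a body that passes its validator. */
function isExposed(path: string, status: number, body: string): boolean {
  const validate = VALIDATORS[path];
  return status === 200 && validate !== undefined && validate(body);
}
```

The status code alone decides nothing here: a Next.js homepage served with 200 for /.env fails `validateEnv` because the body starts with an HTML doctype.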

Industry context

Same scan, different interpretation. An immigration agency with no DMARC is not the same as a restaurant with no DMARC. Five industry profiles (immigration, law, accounting, healthcare, general) adjust severity levels based on what data the business handles. No DMARC on a general site is high severity. On an immigration agency, which sends payment instructions and handles passport data by email, it becomes critical and therefore an automatic F. Session recording tools on a law firm portal get bumped to high because they capture case file interactions. Each profile appends industry-specific risk context: the immigration profile mentions GDPR Article 32 and the £150M UK solicitor invoice fraud epidemic.
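One way to picture the per-industry adjustment is an override table applied to findings before grading. This is a sketch, not the actual profile code: the `OVERRIDES` table, the `applyProfile` name and the `Finding` shape are assumptions, and only the two overrides named above are filled in. The finding ids reuse the ones from the mapping in the source excerpt.

```typescript
type Severity = "critical" | "high" | "medium" | "low" | "info";
type Industry = "immigration" | "law" | "accounting" | "healthcare" | "general";

// Assumed shape; the real Finding type likely carries more fields.
interface Finding {
  id: string;
  severity: Severity;
}

// Hypothetical override table: finding id -> severity for that industry.
// Empty entries are placeholders, not a claim that those profiles change nothing.
const OVERRIDES: Record<Industry, Record<string, Severity>> = {
  immigration: { "dns-no-dmarc": "critical" },
  law: { "third-party-hotjar": "high" },
  accounting: {},
  healthcare: {},
  general: {},
};

/** Return a new list with severities replaced where the profile has an override. */
function applyProfile(findings: Finding[], industry: Industry): Finding[] {
  const overrides = OVERRIDES[industry];
  return findings.map((f) => {
    const severity = overrides[f.id];
    return severity ? { ...f, severity } : f;
  });
}
```

With this shape, the same `dns-no-dmarc` finding stays high under the general profile and becomes critical under immigration, which then triggers the critical floor in the grade function.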

From the source

Breach precedent matching: maps findings to real incidents (src/data/precedents.ts)
/** Map from scanner findings to breach database categories. */
const FINDING_TO_BREACH: Record<string, string[]> = {
  "headers-no-csp": ["xss", "supply-chain"],
  "dns-no-dmarc":   ["email-spoofing"],
  "dns-no-spf":     ["email-spoofing"],
  "paths-env":      ["exposed-files"],
  "paths-git-head": ["exposed-files"],
  "third-party-hotjar": ["session-recording"],
  "forms-whatsapp-communication": ["whatsapp-consumer-tools"],
  // ... 40+ mappings across 20 categories
};

/** Pick the most impactful precedent from matching breaches. */
function pickPrecedent(categories: string[]): BreachPrecedent | undefined {
  const matches = BREACHES
    .filter(b => categories.includes(b.category))
    .sort((a, b) => (b.impact?.length ?? 0) - (a.impact?.length ?? 0));
  return matches[0];
}
Grade computation: deduction with hard floors (src/grade.ts)
const WEIGHTS: Record<Severity, number> = {
  critical: 40,
  high:     20,
  medium:    8,
  low:       2,
  info:      0,
};

export function computeGrade(findings: Finding[]): Grade {
  const criticals = findings.filter((f) => f.severity === "critical").length;
  const highs = findings.filter((f) => f.severity === "high").length;

  // Floor: any critical = F
  if (criticals > 0) return "F";

  let score = 100;
  for (const f of findings) score -= WEIGHTS[f.severity];

  // Cap: 2+ highs = cannot be above D
  if (highs >= 2) return gradeMax(scoreToGrade(score), "D");

  return scoreToGrade(score);
}
Figure 1: Grade as a function of (n_high, n_medium), piecewise scoring
[Figure: grid of score and grade for n_medium = 0 to 10 (x-axis) against n_high = 0 to 5 (y-axis), with a marker at n_high ≥ 2 ⇒ cap at D. Score function: score = 100 − 40·n_crit − 20·n_high − 8·n_med − 2·n_low; floor: n_crit > 0 ⇒ F. Grade bands: A ≥ 80, B 60–79, C 40–59, D 20–39, F < 20.]
Grade is a piecewise function of the finding severity counts. The visible slice fixes n_crit = n_low = 0; the dashed warn line marks where the n_high ≥ 2 cap kicks in (everything above it is held at D maximum no matter how clean the rest of the scan is). One critical anywhere is an automatic F — that floor is off this slice. Function lives in src/grade.ts; weights are calibrated against the 115-precedent breach database. Industry profiles (immigration, law, accounting, healthcare, general) reweight severity per finding before this function runs.
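The excerpt from src/grade.ts calls `scoreToGrade` and `gradeMax` without showing them. A minimal sketch of both follows, assuming the grade bands from the figure legend and assuming `gradeMax` returns the worse of two grades (so that "D" acts as a ceiling). `computeGrade` and `WEIGHTS` are repeated from the excerpt so the block runs on its own; `findingsOf` is a hypothetical helper for building test input.

```typescript
type Severity = "critical" | "high" | "medium" | "low" | "info";
type Grade = "A" | "B" | "C" | "D" | "F";
interface Finding { severity: Severity }

const WEIGHTS: Record<Severity, number> = { critical: 40, high: 20, medium: 8, low: 2, info: 0 };

// Assumed implementation: bands taken from the figure legend.
function scoreToGrade(score: number): Grade {
  if (score >= 80) return "A";
  if (score >= 60) return "B";
  if (score >= 40) return "C";
  if (score >= 20) return "D";
  return "F";
}

// Assumed implementation: the worse (later in the alphabet) grade wins.
const ORDER: Grade[] = ["A", "B", "C", "D", "F"];
function gradeMax(a: Grade, b: Grade): Grade {
  return ORDER.indexOf(a) >= ORDER.indexOf(b) ? a : b;
}

// Same logic as the src/grade.ts excerpt.
function computeGrade(findings: Finding[]): Grade {
  const criticals = findings.filter((f) => f.severity === "critical").length;
  const highs = findings.filter((f) => f.severity === "high").length;
  if (criticals > 0) return "F"; // floor: any critical is an F
  let score = 100;
  for (const f of findings) score -= WEIGHTS[f.severity];
  if (highs >= 2) return gradeMax(scoreToGrade(score), "D"); // cap at D
  return scoreToGrade(score);
}

// Hypothetical helper: expand severity counts into a findings list.
function findingsOf(counts: Partial<Record<Severity, number>>): Finding[] {
  const out: Finding[] = [];
  for (const sev of Object.keys(counts) as Severity[]) {
    for (let i = 0; i < (counts[sev] ?? 0); i++) out.push({ severity: sev });
  }
  return out;
}
```

Worked through by hand: one high and two mediums score 100 − 20 − 16 = 64, a B. Two highs with nothing else score 60, which would be a B on points but is capped to D. Four highs and one medium score 12, already an F, and the cap leaves it there.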

What it doesn't do

  • Third-party detection works on the initial HTML only. Scripts loaded dynamically via Google Tag Manager are not detected. A site using GTM to load Hotjar will appear clean.
  • DKIM selector enumeration checks 18 common selectors. Custom selectors used by some providers won't be found. A missing DKIM finding might be a false negative.
  • This is passive analysis. No authentication bypass, no payload injection, no exploitation. A clean scan does not mean the site is secure — it means the publicly visible configuration has no obvious weaknesses.
  • Cookie analysis only covers cookies set on the initial page load. Session cookies that appear after login are not captured.
  • The breach precedent database is manually curated. It covers the most consequential incidents but will always be incomplete.
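The first limitation is easy to reproduce with a toy detector that only sees the initial HTML. This is a sketch: the two signatures below are illustrative, and `SIGNATURES` and `detectTrackers` are hypothetical names rather than beacon's actual list of 20 trackers or its parsing approach.

```typescript
// Hypothetical signature list for illustration only.
interface TrackerSignature {
  name: string;
  pattern: RegExp; // no "g" flag, so test() keeps no state between calls
}

const SIGNATURES: TrackerSignature[] = [
  { name: "Hotjar", pattern: /static\.hotjar\.com/i },
  { name: "Google Tag Manager", pattern: /googletagmanager\.com\/gtm\.js/i },
];

/** Report trackers whose signature appears anywhere in the static HTML. */
function detectTrackers(html: string): string[] {
  return SIGNATURES.filter((s) => s.pattern.test(html)).map((s) => s.name);
}

// Hotjar embedded directly in the page: detected.
const direct = `<script src="https://static.hotjar.com/c/hotjar-123.js?sv=6"></script>`;

// Only the GTM loader is in the HTML; Hotjar would be injected later at
// runtime by the container, so a static scan never sees it.
const viaGtm = `<script async src="https://www.googletagmanager.com/gtm.js?id=GTM-XXXX"></script>`;
```

Running `detectTrackers` on `viaGtm` reports only Google Tag Manager, which is exactly the false negative described in the first bullet: seeing what the container loads would require executing the page in a browser.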

Stack

TypeScript + Node.js. Direct TLS socket inspection (no external scanner). DNS resolution via Node's dns/promises. HTML parsing for third-party detection. Next.js frontend with real-time scan progress. Two external dependencies total.

Authorised use

Use only on systems you own, or for which you have written authorisation from the owner. Pointing a scanner at a third-party domain without authorisation may be an offence under section 1 of the Computer Misuse Act 1990. Full methodology at /scope.