Methodology · v1.1

How beacon works

This page describes, check by check, what the beacon scanner does and what it deliberately doesn’t. It’s written for anyone who wants to understand or replicate the work — another security person, a journalist, a curious operator. The shorter operator-facing summary is at /scope; this page is the longer answer.

Version 1.1 · last reviewed 6 May 2026.

1. What beacon is for

Beacon is a defensive tool. It exists so that someone running a public website can read, in plain English, the externally visible signals a competent attacker would use to estimate whether the operator has taken information security seriously. It looks at what the public internet already shows, then scores it.

The threat model assumes three kinds of adversary: the opportunistic attacker who automates checks across the internet looking for low-hanging configuration mistakes; the targeted attacker who has already chosen the organisation and will make a small number of careful requests; and the spoofing attacker who wants to send email that the recipient treats as authentic.

Beacon doesn’t model insider threats, attackers with credentials, attackers with a foothold, or anything that requires exploiting an unknown vulnerability. Tools for those scenarios exist; this isn’t one of them.

The output is a grade and a set of annotated findings. The grade is calibrated against a database of public breach precedents. A finding is never a vulnerability disclosure — it’s a description of a configuration that has, historically, been associated with the documented incident cited next to it.

2. A few words used precisely

“Passive scan” is overloaded across the industry. The terms below mean what they say throughout this document.

Scan target
A registered domain and the public-facing services reachable through it. Internal services not resolvable from public DNS are out of scope. RFC 1918 addresses, link-local addresses, and cloud metadata endpoints are blocked at the request layer.
Operator
Whoever submits a scan to the hosted version, or runs the open-source CLI on their own machine. The operator is the party making the network requests.
Passive analysis
Network operations that satisfy all of these at once: no authentication is attempted; no payload that could change the target’s state is sent; no input is shaped to test a known vulnerability class; no probe is built to exhaust resources or trip rate limits.
Authorisation
You own the domain, you administer it on behalf of someone who does, or you have the owner’s written permission. Implied authorisation isn’t relied on.
Finding
An externally observable configuration state, with a severity, a short note on why it matters, and a citation to a public breach where the same state was a contributing factor.
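
The request-layer block in the scan-target definition can be pictured as a small address filter. This is a minimal sketch and not beacon's actual code: the function names are hypothetical, it covers IPv4 only, and it relies on the common cloud metadata address 169.254.169.254 sitting inside the link-local range.

```typescript
// Parse a dotted-quad IPv4 address into four octets, or null if malformed.
function parseIPv4(ip: string): number[] | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  return octets.every((o) => o >= 0 && o <= 255) ? octets : null;
}

// True when a resolved address must not be contacted (hypothetical helper).
function isBlockedIPv4(ip: string): boolean {
  const octets = parseIPv4(ip);
  if (octets === null) return true; // fail closed on anything unparseable
  const [a, b] = octets;
  if (a === 10) return true; // 10.0.0.0/8 (RFC 1918)
  if (a === 172 && b >= 16 && b <= 31) return true; // 172.16.0.0/12 (RFC 1918)
  if (a === 192 && b === 168) return true; // 192.168.0.0/16 (RFC 1918)
  if (a === 169 && b === 254) return true; // 169.254.0.0/16 link-local, covers 169.254.169.254
  return false;
}
```

A filter like this only helps if it runs on the address the hostname actually resolved to, immediately before the connection is opened.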

3. The seven checks, in detail

Beacon performs seven categories of network operation, described in the order they execute. Each description says what the check does, what it captures, and what it doesn’t.

3.1 TLS inspection

Open a TCP connection to the canonical hostname on port 443. Perform a TLS handshake using modern Node.js defaults. Read the negotiated protocol version, cipher suite, and certificate chain. Close the connection.

No application-layer request is sent. The certificate is parsed for issuer, subject, validity period, signature algorithm, and SAN list. HSTS, where present, is read from the separate single GET in 3.2.
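
The handshake-and-read step above can be sketched with Node's built-in tls module. This is an illustration under assumptions, not beacon's implementation: TlsSummary, parseSanList, and inspectTls are hypothetical names, the timeout value is a guess, and signature-algorithm parsing is left out.

```typescript
import * as tls from "node:tls";

// Shape of the summary this sketch returns; the field names are illustrative.
interface TlsSummary {
  protocol: string | null;
  cipher: string;
  subjectCN: string;
  issuerCN: string;
  validFrom: string;
  validTo: string;
  sans: string[];
}

// Node reports SANs as one string, e.g. "DNS:example.com, DNS:www.example.com".
function parseSanList(subjectaltname: string | undefined): string[] {
  if (!subjectaltname) return [];
  return subjectaltname
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.startsWith("DNS:"))
    .map((entry) => entry.slice("DNS:".length));
}

// Handshake on port 443 with Node defaults, read the session details, close.
function inspectTls(host: string, timeoutMs = 10_000): Promise<TlsSummary> {
  return new Promise((resolve, reject) => {
    const socket = tls.connect({ host, port: 443, servername: host }, () => {
      const cert = socket.getPeerCertificate(true);
      const summary: TlsSummary = {
        protocol: socket.getProtocol(),
        cipher: socket.getCipher().name,
        subjectCN: String(cert.subject?.CN ?? ""),
        issuerCN: String(cert.issuer?.CN ?? ""),
        validFrom: cert.valid_from,
        validTo: cert.valid_to,
        sans: parseSanList(cert.subjectaltname),
      };
      socket.end(); // nothing is written: no application-layer request
      resolve(summary);
    });
    socket.setTimeout(timeoutMs, () => socket.destroy(new Error("TLS handshake timed out")));
    socket.once("error", reject);
  });
}
```

With default settings, a chain that fails validation surfaces as an error instead of a summary, so a real scanner would need to decide how to report that case.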

3.2 Response headers

One HTTPS GET to the root path, https://<target>/, with a User-Agent that identifies the request as beacon. Redirects are followed for up to five hops, but only while they stay on the same registrable domain. Cross-domain redirects end the request.
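
The redirect rule can be written as a small decision function. A hedged sketch follows: decideRedirect and naiveRegistrableDomain are hypothetical names, and the registrable-domain helper simply takes the last two labels, which is wrong for suffixes such as co.uk. A real implementation would consult the Public Suffix List.

```typescript
const MAX_HOPS = 5;

type RedirectDecision = "follow" | "stop-cross-domain" | "stop-hop-limit";

// ASSUMPTION: last two labels stand in for the registrable domain.
// This is a simplification; it mishandles multi-label public suffixes.
function naiveRegistrableDomain(hostname: string): string {
  return hostname.toLowerCase().split(".").slice(-2).join(".");
}

// Decide what to do with a Location header seen after `hopsSoFar` redirects.
function decideRedirect(
  targetHost: string,
  currentUrl: string,
  location: string,
  hopsSoFar: number,
): RedirectDecision {
  if (hopsSoFar >= MAX_HOPS) return "stop-hop-limit";
  const next = new URL(location, currentUrl); // resolves relative Location values
  const sameSite =
    naiveRegistrableDomain(next.hostname) === naiveRegistrableDomain(targetHost);
  return sameSite ? "follow" : "stop-cross-domain";
}
```

For example, a hop from https://example.com/ to https://www.example.com/home would be followed, while a hop to https://other.net/ would end the request.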

Status, headers, and the first 256 KB of the body are read. The body is reused by the tracker check (3.5) and the form check (3.6) and isn’t kept beyond the scan. Headers inspected: CSP, HSTS, X-Frame-Options, X-Content-Type-Options, Referrer-Policy, Permissions-Policy.
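
As an illustration of the header inspection, the sketch below reports the value of each of the six headers, or null when it is absent. The abbreviations in the list above are expanded to their full header names; the function and type names are hypothetical.

```typescript
// Full names of the six inspected headers, lowercased for comparison.
const INSPECTED_HEADERS = [
  "content-security-policy",
  "strict-transport-security",
  "x-frame-options",
  "x-content-type-options",
  "referrer-policy",
  "permissions-policy",
] as const;

type HeaderReport = Record<string, string | null>;

// Header names are case-insensitive, so normalise before looking them up.
function inspectHeaders(headers: Record<string, string>): HeaderReport {
  const lower = new Map<string, string>();
  for (const [name, value] of Object.entries(headers)) {
    lower.set(name.toLowerCase(), value);
  }
  const report: HeaderReport = {};
  for (const name of INSPECTED_HEADERS) {
    report[name] = lower.get(name) ?? null;
  }
  return report;
}

// Names of inspected headers that the response did not send.
function missingHeaders(report: HeaderReport): string[] {
  return INSPECTED_HEADERS.filter((name) => report[name] === null);
}
```

Presence is only the first question. Whether a present value is a good one, such as a CSP that allows unsafe-inline, is a separate judgement this sketch does not attempt.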

3.3 Email authentication via DNS

DNS queries against the public hierarchy. No private resolvers, no zone transfers. Queries are: TXT at the apex for SPF; TXT at _dmarc.<domain> for DMARC; TXT at <selector>._domainkey.<domain> for each of eighteen common DKIM selectors; MX at the apex; DNSSEC chain state for the apex.

The selector list lives in src/data/selectors.ts and is open. Custom selectors used by some providers aren’t enumerated, so beacon can report DKIM as missing when it is in fact configured under a selector outside the list (a false negative for DKIM detection). No email is sent.
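
The query names and the first pass over the answers can be sketched as pure functions. Everything here is illustrative: SAMPLE_SELECTORS is a made-up excerpt and not the contents of src/data/selectors.ts, and the function names are hypothetical. The string[][] input shape matches what Node's resolveTxt returns, where each record arrives as an array of chunks.

```typescript
// ASSUMPTION: illustrative selectors only; the real list has eighteen entries.
const SAMPLE_SELECTORS = ["default", "google", "selector1", "selector2"];

// Names queried for one domain: SPF at the apex, DMARC and DKIM below it.
function queryNames(domain: string, selectors: string[]) {
  return {
    spf: domain,
    dmarc: `_dmarc.${domain}`,
    dkim: selectors.map((selector) => `${selector}._domainkey.${domain}`),
  };
}

// A TXT record may be split into chunks; join them before inspecting.
function joinTxt(records: string[][]): string[] {
  return records.map((chunks) => chunks.join(""));
}

// The SPF record is the apex TXT record that starts with "v=spf1".
function findSpf(txt: string[]): string | null {
  const hit = txt.find((record) => /^v=spf1(\s|$)/i.test(record));
  return hit ?? null;
}

// Return the DMARC policy ("none", "quarantine", "reject") or null.
function parseDmarcPolicy(record: string): string | null {
  const tags = new Map<string, string>();
  for (const part of record.split(";")) {
    const i = part.indexOf("=");
    if (i > 0) tags.set(part.slice(0, i).trim().toLowerCase(), part.slice(i + 1).trim());
  }
  if (tags.get("v") !== "DMARC1") return null;
  return tags.get("p") ?? null;
}
```

A DKIM lookup that returns no record for any listed selector shows only that none of those selectors is in use, which is why the result is flagged as a possible false negative.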

3.4 Exposed paths

A fixed set of unauthenticated HTTPS GETs to twenty-five paths that, when present and serving real content, indicate a known misconfiguration. The list is open and kept in the repo. Examples: /.git/HEAD, /.env, /backup.sql, /server-status.

Each response is checked against a category-specific validator that inspects only the leading bytes. A 200 with a generic single-page-app shell is treated as a false positive and discarded; a 200 whose body matches the real shape (e.g. ref: refs/heads/ for a Git HEAD) is recorded. No credentials, no payloads, no path traversal, no concatenation of user input into paths.
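
A leading-bytes validator of the kind described above might look like the sketch below. The Git HEAD rule follows the example in the text; the .env rule, the HTML-shell test, and the 512-character window are my assumptions for illustration, and all names are hypothetical.

```typescript
// ASSUMPTION: the size of the inspected prefix is not stated in the text.
const LEADING_CHARS = 512;

type Validator = (head: string) => boolean;

// Category-specific checks, keyed by path. Only two are shown.
const validators: Record<string, Validator> = {
  "/.git/HEAD": (head) => head.startsWith("ref: refs/heads/"),
  // ASSUMPTION: treat a line of the form NAME=value as the shape of a .env file.
  "/.env": (head) => /^[A-Z_][A-Z0-9_]*=/m.test(head),
};

// A generic single-page-app shell starts with an HTML doctype or <html> tag.
function looksLikeHtmlShell(head: string): boolean {
  const start = head.replace(/^\s+/, "").toLowerCase();
  return start.startsWith("<!doctype html") || start.startsWith("<html");
}

// Record a finding only for a 200 whose leading content has the real shape.
function classify(path: string, status: number, body: string): "recorded" | "discarded" {
  if (status !== 200) return "discarded";
  const head = body.slice(0, LEADING_CHARS);
  if (looksLikeHtmlShell(head)) return "discarded";
  const validator = validators[path];
  return validator !== undefined && validator(head) ? "recorded" : "discarded";
}
```

The paths themselves come from a fixed list, so nothing in this flow builds a request path from user input.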

This is the only check that requests paths the operator hasn’t pre-confirmed the target intends to expose. See section 4.

3.5 Third-party trackers

The HTML body from 3.2 is parsed without script execution. The parser reports external hostnames referenced from <script src>, <img src>, and <iframe src>. These are matched against an open list of twenty known analytics, ad, and session-recording services. Scripts loaded later by a tag manager won’t be detected.
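
To show what parsing without script execution means in practice, here is a rough sketch that pulls src hostnames out of the three tag types with a regular expression and matches them against a list. It is not beacon's parser: a regex is a stand-in for a proper HTML parser, KNOWN_TRACKERS is an illustrative three-entry excerpt and not the real list, and "external" is taken to mean any hostname other than the page's own.

```typescript
// ASSUMPTION: illustrative excerpt; the real list has twenty entries.
const KNOWN_TRACKERS = new Set([
  "www.google-analytics.com",
  "www.googletagmanager.com",
  "static.hotjar.com",
]);

// Collect hostnames referenced by src attributes on script, img, and iframe tags.
function externalHosts(html: string, pageUrl: string): string[] {
  const pageHost = new URL(pageUrl).hostname;
  // Requiring whitespace before "src" avoids matching attributes like data-src.
  const pattern =
    /<(?:script|img|iframe)\b[^>]*?\ssrc\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))/gi;
  const hosts = new Set<string>();
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    const raw = match[1] ?? match[2] ?? match[3];
    if (!raw) continue;
    try {
      const url = new URL(raw, pageUrl); // resolves relative and //host forms
      const isWeb = url.protocol === "http:" || url.protocol === "https:";
      if (isWeb && url.hostname !== pageHost) hosts.add(url.hostname);
    } catch {
      // Unparseable src values are ignored.
    }
  }
  return Array.from(hosts).sort();
}

// Keep only hostnames that appear on the tracker list.
function matchTrackers(hosts: string[]): string[] {
  return hosts.filter((host) => KNOWN_TRACKERS.has(host));
}
```

Because the markup is only read and never executed, anything a tag manager injects at runtime never appears in the input, which is the limitation noted above.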

3.6 Forms

The body is searched for <form> elements. For each, beacon reports the action target, method, and whether the action is on the same registrable domain or off-domain. Nothing is submitted.
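
The form report can be sketched along the same lines. Assumptions are labelled in the code: the names are hypothetical, the registrable-domain helper is the naive last-two-labels version and would need the Public Suffix List in real use, and the defaults follow standard HTML behaviour, where a form without an action submits to the page's own URL and a form without a method uses GET.

```typescript
interface FormReport {
  action: string; // absolute URL the form would submit to
  method: string; // upper-cased HTTP method
  offDomain: boolean;
}

// Read one attribute from an opening tag; null when it is absent.
function attr(tag: string, name: string): string | null {
  const pattern = new RegExp(`\\s${name}\\s*=\\s*(?:"([^"]*)"|'([^']*)'|([^\\s>]+))`, "i");
  const m = pattern.exec(tag);
  return m ? (m[1] ?? m[2] ?? m[3] ?? null) : null;
}

// ASSUMPTION: last two labels stand in for the registrable domain.
function naiveRegistrableDomain(hostname: string): string {
  return hostname.toLowerCase().split(".").slice(-2).join(".");
}

// Describe every <form> opening tag in the body. Nothing is submitted.
function reportForms(html: string, pageUrl: string): FormReport[] {
  const pageDomain = naiveRegistrableDomain(new URL(pageUrl).hostname);
  const tags = html.match(/<form\b[^>]*>/gi) ?? [];
  return tags.map((tag) => {
    const action = new URL(attr(tag, "action") ?? "", pageUrl); // empty action = page URL
    return {
      action: action.href,
      method: (attr(tag, "method") ?? "get").toUpperCase(),
      offDomain: naiveRegistrableDomain(action.hostname) !== pageDomain,
    };
  });
}
```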

3.7 Cookies

The Set-Cookie headers from 3.2 are read. Beacon reports whether HttpOnly, Secure, and SameSite are set, plus the declared path and domain scope. Cookies aren’t stored, replayed, or used. Cookies set after authentication or a client-side state change aren’t visible to the scanner.
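
Reading the attributes from one Set-Cookie value is mostly string splitting. The sketch below is a simplified reading of the header format with hypothetical names, and it takes one header value per call. It deliberately drops the cookie value, in line with the point that cookies aren't stored.

```typescript
interface CookieReport {
  name: string;
  httpOnly: boolean;
  secure: boolean;
  sameSite: string | null;
  path: string | null;
  domain: string | null;
}

// Parse one Set-Cookie header value into the attributes of interest.
function parseSetCookie(header: string): CookieReport {
  const parts = header.split(";").map((part) => part.trim());
  const pair = parts[0] ?? "";
  const eq = pair.indexOf("=");

  // Attribute names are case-insensitive; flags such as Secure have no value.
  const attrs = new Map<string, string>();
  for (const part of parts.slice(1)) {
    const i = part.indexOf("=");
    const key = (i === -1 ? part : part.slice(0, i)).trim().toLowerCase();
    if (key) attrs.set(key, i === -1 ? "" : part.slice(i + 1).trim());
  }

  return {
    name: eq === -1 ? pair : pair.slice(0, eq), // the value after "=" is discarded
    httpOnly: attrs.has("httponly"),
    secure: attrs.has("secure"),
    sameSite: attrs.get("samesite") ?? null,
    path: attrs.get("path") ?? null,
    domain: attrs.get("domain") ?? null,
  };
}
```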

4. The exposed-paths boundary

Six of the seven checks just observe what the target already serves to anyone. They’re hard to describe as anything other than reading the public surface. The remaining one, exposed paths (3.4), is different and worth being honest about.

That check requests paths the target may not have meant to expose. The list is well-known, the validators only look at leading bytes, no credentials are sent, and nothing is crafted to dodge a WAF. Even so, it’s the one check where authorisation matters: run beacon against domains you own, or where the owner has said yes. UK law takes unauthorised access seriously, and that responsibility sits with whoever runs the tool, not the tool itself.

The hosted version asks for an explicit authorisation warranty before each scan. The CLI runs on your machine, so the same applies in spirit and the warranty is self-attested.

5. What gets kept, for how long

Each hosted scan stores four items: the submitted domain, the operator’s IP address, the result, and the timestamp. These are kept for up to ninety days so that a complaint about misuse can be investigated, then deleted. That is short enough to limit harm if the store were ever exposed, and long enough to handle a routine complaint with the actual inputs and outputs.

Rate limit: one submission per thirty seconds per IP. Submissions that look like authorisation-warranty abuse — for example, repeated submissions of unrelated third-party domains from one IP — are refused.
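
The thirty-second rule is simple enough to show in full. This is a minimal in-memory sketch with hypothetical names, not the hosted service's code. It covers only the per-IP interval, not the abuse heuristics, and it assumes a refused submission does not restart the window.

```typescript
const WINDOW_MS = 30_000; // one submission per thirty seconds per IP

class SubmissionLimiter {
  // Time of the last accepted submission, keyed by IP address.
  private lastAccepted = new Map<string, number>();

  // Returns true and records the time if the submission is allowed.
  allow(ip: string, nowMs: number): boolean {
    const last = this.lastAccepted.get(ip);
    if (last !== undefined && nowMs - last < WINDOW_MS) {
      return false; // ASSUMPTION: a refusal leaves the window unchanged
    }
    this.lastAccepted.set(ip, nowMs);
    return true;
  }
}
```

Passing the clock in as an argument keeps the logic easy to test. In a running service the caller would supply Date.now() and would also need to evict old entries so the map doesn't grow without bound.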

The CLI sends nothing to me. If you run the CLI, you’re the only one with the inputs and outputs.

The full privacy notice is at /privacy.

6. Known limitations

  • The third-party tracker list is open and incomplete. Trackers loaded after the initial parse, including through Google Tag Manager and similar, won’t be picked up.
  • The DKIM selector list is finite. A target with a custom selector will be reported as missing DKIM even when it’s configured. The UI flags this as a possible false negative.
  • The exposed-paths validator only inspects leading bytes. A honeypot configured to serve plausible content for one of the checked paths will produce a false positive.
  • The grade calibration is based on a manually curated precedent database (115 incidents at the time of writing). It covers the most consequential public incidents from a small number of jurisdictions, not everything.
  • The hosted scanner runs from a small, identifiable set of egress addresses. A target whose WAF refuses traffic from those addresses produces empty findings, which can read as a clean result.
  • Results are point-in-time. A domain that scores well today may not score well next week. The current grade isn’t ongoing assurance.

7. Versions, corrections, contact

This page is versioned. Substantive changes update the version number and the date at the top. Earlier versions are kept and available on request.

If something here is wrong, or you’d like a correction or clarification, email [email protected]. I respond within one calendar month, usually faster.