threadr

Open source reconnaissance tool for security teams and penetration testers. Feed it an email address and it maps the identity graph: domains, IPs, usernames, breach history, certificates, open ports. The interesting part isn't what it finds — it's how it avoids being detected while finding it.

TypeScript · Lévy stable timing · Kolmogorov-Smirnov · Dempster-Shafer · Jaro-Winkler · spectral Laplacian · k-anonymity · Neo4j · Tor SOCKS5 · ECDSA P-256 · Docker
Source · 17 plugins · 331 tests · 7 OPSEC layers

Detection resistance

A reconnaissance tool that's trivially detectable is useless for authorised assessments where the target monitors for scanning. Seven layers handle this.

Per-plugin Tor circuit isolation gives each data source a different exit IP. Browser identity mimicry uses exact Chrome header order, sec-ch-ua client hints, and session cookie continuity. Timing is drawn from a Lévy stable distribution (α=1.5, infinite variance). I built it with Poisson arrivals first, but a two-sample Kolmogorov-Smirnov test against reference browsing rejected the null hypothesis that both delay samples came from the same distribution at p<0.001. A Poisson process has exponentially distributed delays with finite variance, which is a known signature. The Lévy distribution's heavy tail is close enough to real browsing that the KS test fails to reject it (p=0.68). The tradeoff: occasionally you get a 15-second delay where Poisson would have given 2 seconds.

Figure 1: Inter-request delay CDFs vs reference browsing (KS test, two-sample). x-axis: inter-request delay, log scale, 100 ms to 30 s. y-axis: F(t), cumulative probability. Series: Poisson (rate 1/2000 ms), rejected with D = 0.244, p < 0.001; Lévy stable (α=1.5, β=1), indistinguishable at p = 0.68; real browsing reference.
Empirical CDFs of inter-request delays. Sampling code in packages/shared/src/anonymity/timing.ts. The KS test rejects Poisson against the reference at p < 0.001 (D=0.244); the heavy-tailed Lévy stable distribution is indistinguishable from the reference at p=0.68. Curve values are computed from the analytical Poisson CDF and a Chambers-Mallows-Stuck Lévy sample, not a single production capture.
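
The comparison behind the figure can be sketched in a few lines. This is a minimal illustration of the two-sample KS statistic and its large-sample critical value, not the project's test code; the function names `ksStatistic` and `ksCritical` are hypothetical.

```typescript
// Two-sample Kolmogorov-Smirnov statistic: the largest vertical gap
// between the empirical CDFs of samples a and b.
function ksStatistic(a: number[], b: number[]): number {
  const x = [...a].sort((p, q) => p - q)
  const y = [...b].sort((p, q) => p - q)
  let i = 0
  let j = 0
  let d = 0
  while (i < x.length && j < y.length) {
    // Step both CDFs past the next smallest value, then compare heights.
    const v = Math.min(x[i], y[j])
    while (i < x.length && x[i] <= v) i++
    while (j < y.length && y[j] <= v) j++
    d = Math.max(d, Math.abs(i / x.length - j / y.length))
  }
  return d
}

// Large-sample critical value: reject "same distribution" at level alpha
// when D exceeds c(alpha) * sqrt((n + m) / (n * m)),
// with c(alpha) = sqrt(-ln(alpha / 2) / 2).
function ksCritical(alpha: number, n: number, m: number): number {
  return Math.sqrt(-Math.log(alpha / 2) / 2) * Math.sqrt((n + m) / (n * m))
}
```

A candidate delay sample would be rejected at p < 0.001 when `ksStatistic(candidate, reference) > ksCritical(0.001, candidate.length, reference.length)`. Failing to reject is evidence of similarity at that sample size, not proof of identical distributions.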

Markov chain cover traffic uses a 7-state browsing model, statistically indistinguishable from real browsing. Target decoys use k-anonymity: for every real target, k decoys of the same type are queried from the Tranco top 1000. Observer identification probability: 1/(k+1). The decoy pool started with random domains, but querying xj7k2m.xyz through a Tor exit is suspicious on its own. google.com and bbc.co.uk receive billions of queries daily. One more is invisible.
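
As an illustration of how Markov chain cover traffic can be driven, here is a minimal sketch of sampling a browsing walk from a transition matrix. The three states and their probabilities are invented for the example; the real model has 7 states and its matrix is not shown here.

```typescript
// Each row holds the probabilities of moving from one state to every other
// state; rows are assumed to sum to 1.
type TransitionMatrix = number[][]

// Hypothetical 3-state model (0 = search, 1 = read page, 2 = idle).
const EXAMPLE_MODEL: TransitionMatrix = [
  [0.1, 0.6, 0.3],
  [0.2, 0.5, 0.3],
  [0.5, 0.25, 0.25],
]

// Pick the next state by walking the cumulative distribution of the current row.
function nextState(P: TransitionMatrix, current: number, rng: () => number = Math.random): number {
  const row = P[current]
  let r = rng()
  for (let s = 0; s < row.length; s++) {
    r -= row[s]
    if (r < 0) return s
  }
  return row.length - 1 // guard for rows that sum to slightly under 1
}

// Generate a sequence of states, one per cover request.
function walk(P: TransitionMatrix, start: number, steps: number, rng: () => number = Math.random): number[] {
  const path = [start]
  for (let i = 0; i < steps; i++) {
    path.push(nextState(P, path[path.length - 1], rng))
  }
  return path
}
```

The `rng` parameter is injectable so the walk can be replayed deterministically in unit tests.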

Entity resolution

The original resolver averaged field similarities with hardcoded weights. Email: 0.95, username: 0.70. It kept returning 0.7 confidence when Gravatar said same person and WHOIS said different registrant. A weighted average can't represent "these sources disagree and I genuinely don't know."

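Dempster-Shafer fixes this. The uncertainty mass stays separate from belief and disbelief. When two sources conflict, the conflict mass K increases and Dempster's rule renormalises the remaining mass by 1/(1-K), so strongly conflicting evidence of similar weight leaves belief and disbelief roughly balanced instead of producing one confident number. The result says "high conflict, low certainty" instead of a misleading 0.7.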
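
A minimal sketch of Dempster's rule of combination on the two-hypothesis frame {same, different} shows the effect. The `Mass` shape and the example mass values are assumptions made for illustration, not the resolver's actual types or weights.

```typescript
// Mass assignment over the frame {same, different}.
// `unknown` is the mass left on the whole frame, i.e. "can't tell".
interface Mass {
  same: number
  different: number
  unknown: number
}

interface Combined extends Mass {
  conflict: number // K: mass that the two sources assign to contradictory hypotheses
}

function combine(a: Mass, b: Mass): Combined {
  const conflict = a.same * b.different + a.different * b.same
  if (conflict >= 1) {
    // Total conflict: Dempster's rule is undefined, so fall back to full uncertainty.
    return { same: 0, different: 0, unknown: 1, conflict: 1 }
  }
  const norm = 1 - conflict
  return {
    same: (a.same * b.same + a.same * b.unknown + a.unknown * b.same) / norm,
    different: (a.different * b.different + a.different * b.unknown + a.unknown * b.different) / norm,
    unknown: (a.unknown * b.unknown) / norm,
    conflict,
  }
}

// Hypothetical evidence: Gravatar says "same person", WHOIS says "different registrant".
const gravatar: Mass = { same: 0.8, different: 0, unknown: 0.2 }
const whois: Mass = { same: 0, different: 0.8, unknown: 0.2 }
const result = combine(gravatar, whois)
// conflict ≈ 0.64, same ≈ 0.44, different ≈ 0.44, unknown ≈ 0.11
```

With these inputs neither hypothesis wins and K is large, which is the signal a weighted average hides. Two agreeing sources (for example `same: 0.8` and `same: 0.6`) combine to `same ≈ 0.92` with K = 0.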

For 15 person nodes, O(n²) resolution runs 105 comparisons in <1ms. For 200 nodes it's 19,900 comparisons. Still fast, but the quadratic scaling is visible. Spectral clustering via normalised graph Laplacian with power iteration handles the larger cases. One thing I didn't expect: disconnected components produce multiple zero eigenvalues and the shift-and-deflate method breaks. Added a BFS component counter as a fast path.
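
The number of zero eigenvalues of a graph Laplacian equals the number of connected components, so counting components up front tells you whether the eigen-solver will hit the degenerate case. A minimal sketch of that check, plus the pair-count arithmetic, follows; `countComponents` and `pairCount` are hypothetical names and the edge-list representation is an assumption.

```typescript
// Number of unordered pairs among n nodes: n(n-1)/2.
function pairCount(n: number): number {
  return (n * (n - 1)) / 2
}

// Count connected components of an undirected graph with nodes 0..n-1
// using breadth-first search.
function countComponents(n: number, edges: Array<[number, number]>): number {
  const adj: number[][] = Array.from({ length: n }, () => [])
  for (const [u, v] of edges) {
    adj[u].push(v)
    adj[v].push(u)
  }
  const seen = new Array<boolean>(n).fill(false)
  let components = 0
  for (let start = 0; start < n; start++) {
    if (seen[start]) continue
    components++
    seen[start] = true
    const queue = [start]
    // Index-based queue avoids the O(n) cost of Array.prototype.shift.
    for (let head = 0; head < queue.length; head++) {
      for (const nb of adj[queue[head]]) {
        if (!seen[nb]) {
          seen[nb] = true
          queue.push(nb)
        }
      }
    }
  }
  return components
}
```

`pairCount(15)` gives 105 and `pairCount(200)` gives 19900, matching the figures above. If `countComponents` returns more than 1, each component can be clustered on its own.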

Cryptographic accounts

No email, no name, no KYC. Generate an ECDSA P-256 keypair in the browser. Public key = your identity. Server sends 32-byte random challenge, client signs with private key, server verifies. Stateless HMAC-SHA256 sessions. API keys encrypted at rest with AES-256-GCM via HKDF key derivation from the public key. Private key never leaves the client.
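
The challenge-response step can be sketched as follows. This is an illustration only: it runs both sides in one Node.js process using the built-in `node:crypto` module, whereas the real client generates and holds the key in the browser. The function names are hypothetical, and session issuance and API key encryption are left out.

```typescript
import { generateKeyPairSync, randomBytes, sign, verify, KeyObject } from "node:crypto"

// Client: generate an ECDSA keypair. "prime256v1" is OpenSSL's name for P-256.
function createIdentity(): { publicKey: KeyObject; privateKey: KeyObject } {
  return generateKeyPairSync("ec", { namedCurve: "prime256v1" })
}

// Server: issue a fresh 32-byte random challenge.
function issueChallenge(): Buffer {
  return randomBytes(32)
}

// Client: sign the challenge with the private key (ECDSA over SHA-256).
function signChallenge(challenge: Buffer, privateKey: KeyObject): Buffer {
  return sign("sha256", challenge, privateKey)
}

// Server: verify the signature against the registered public key.
function verifyChallenge(challenge: Buffer, signature: Buffer, publicKey: KeyObject): boolean {
  return verify("sha256", challenge, publicKey, signature)
}
```

A signature is only valid for the challenge it was produced for, so the server should issue a new challenge per login attempt and discard it after one use.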

From the source

Lévy stable sampling (Chambers-Mallows-Stuck, 1976) · packages/shared/src/anonymity/timing.ts
function levySample(scale: number, min = 100, max = 30000): number {
  const alpha = 1.5 // heavy tail without Cauchy extremes
  const beta = 1.0  // right-skewed (positive delays)

  // Chambers-Mallows-Stuck algorithm
  const U = (Math.random() - 0.5) * Math.PI
  const W = -Math.log(Math.random() || 1e-10)

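  const phi0 = Math.atan(beta * Math.tan(Math.PI * alpha / 2)) / alpha
  // CMS for alpha != 1:
  //   X = sin(a(U + phi0)) / cos(U)^(1/a) * (cos(U - a(U + phi0)) / W)^((1 - a)/a)
  // The cosine argument has to include U inside the alpha term. Without it the
  // base goes negative for U > pi/4 and Math.pow returns NaN.
  const factor = Math.pow(
    Math.cos(U - alpha * (U + phi0)) / W,
    (1 - alpha) / alpha
  ) * Math.sin(alpha * (U + phi0))
    / Math.pow(Math.cos(U), 1 / alpha)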

  return Math.min(Math.abs(factor) * scale + min, max)
}

k-anonymity decoy guarantee (identification probability 1/(k+1)) · packages/shared/src/anonymity/decoys.ts
function generateDecoys(real: string, type: SeedType, k: number): string[] {
  const pool = POOLS[type].filter(t => t !== real.toLowerCase())
  const count = Math.min(k, pool.length)
  const decoys: string[] = []
  const used = new Set<number>()

  const randomBytes = new Uint32Array(count)
  crypto.getRandomValues(randomBytes)

  for (let i = 0; i < count; i++) {
    let idx = randomBytes[i] % pool.length
    while (used.has(idx)) idx = (idx + 1) % pool.length
    used.add(idx)
    decoys.push(pool[idx])
  }
  return decoys
}

// k=3 → 25% identification probability
// k=5 → 16.7%
function identificationProbability(k: number): number {
  return 1 / (k + 1)
}

Known gaps

  • Spectral clustering assumes the graph is connected. Disconnected components produce multiple zero eigenvalues and the power-iteration-with-deflation method returns wrong values. BFS fast path handles this, but the eigendecomposition itself doesn't.
  • Node.js can't fully control TLS extensions, so the JA3 fingerprint is shuffled but not impersonated. An anti-bot system with a Chrome JA3 whitelist (rather than a bot blacklist) would flag the connection.
  • WHOIS parsing is regex-based. Every registrar formats output differently. The parser catches about 70% of cases. The other 30% return partial or empty data.
  • Social profile detection via HEAD requests has a ~15% false positive rate. LinkedIn returns 200 for non-existent profiles.
  • 331 tests but no integration tests. Everything is unit-tested against mocked data. Spinning up Neo4j + Redis + Tor in CI is doable but I haven't prioritised it.

Stack

React + Vite frontend with force-directed graph visualisation. Hono API server + BullMQ worker. Neo4j graph database with Cypher queries. Redis for job queue and caching. SQLite WAL for account storage. ECDSA P-256 auth. Tor SOCKS5 proxy. Docker Compose for local dev. Headless CLI for scripting.

Authorised use

Use only on systems and accounts you own, or for which you have written authorisation. Aggregating personal data from public sources still triggers obligations under UK GDPR and the Computer Misuse Act 1990. Methodology at /scope.