A Vault1337 companion tool

Insight | Web Threat Scanner

Open-source passive web threat scanner. Submit any URL and Insight fetches its public resources and analyses them based solely on their content, with no reliance on reputation databases.


Overview

Insight is a passive web threat scanner. Submit any URL and it fetches all of the page's public resources (HTML, scripts, HTTP headers, and TLS certificates), then analyses them based solely on their content. It makes no calls to reputation databases or external threat intelligence APIs.

Because detection is content-based, Insight can catch zero-day campaigns, freshly registered phishing domains, and newly injected skimmers that reputation feeds have not yet indexed. It is a companion tool to vault1337.com and shares its design system.

The result of each scan is a prioritised findings report covering JavaScript threats, phishing indicators, domain intelligence, security misconfigurations, and the full detected technology stack — each finding categorised by severity (CRITICAL / HIGH / MEDIUM / LOW / INFO) with supporting evidence extracted directly from the page content.

Disclaimer: All scan results are provided for informational purposes only and must be independently verified by a trained security analyst before any action is taken. Findings may contain false positives or miss threats not covered by the current detection rules.

Requirements

| Requirement | Version | Notes |
|---|---|---|
| Python | 3.11+ | Global or virtual environment |
| Node.js | 18+ | For the frontend dev server |
| Redis | 7+ | Required for the Celery task queue; Docker is the easiest option |

1. Start Redis

Redis must be running before the backend or Celery worker will start successfully.

Option A: Docker (any OS, recommended)
docker run -d -p 6379:6379 redis:7-alpine

Option B: Native (Linux / macOS)
redis-server

2. Backend

Step 1: Clone the repository
git clone https://github.com/DanDreadless/Insight.git
cd Insight/backend

Step 2: Install Python dependencies
pip install -r requirements.txt

Step 3: Configure the environment

Copy the sample env file and, at minimum, set a SECRET_KEY value.

cp ../.env.sample ../.env
# Edit ../.env and set SECRET_KEY to a long random string

Step 4: Run migrations and start the server
python manage.py migrate
python manage.py runserver

The Django API will be available at http://localhost:8000.
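
The SECRET_KEY from step 3 only needs to be a long, unpredictable string. As a minimal sketch, Python's standard secrets module can generate one; the 50-byte length is an arbitrary choice for this example, not a project requirement, and generate_secret_key is a hypothetical helper name.

```python
import secrets


def generate_secret_key(num_bytes: int = 50) -> str:
    """Return a URL-safe random string suitable for SECRET_KEY in .env."""
    # token_urlsafe encodes num_bytes of OS randomness as base64url text
    # (letters, digits, "-" and "_"), so no quoting is needed in .env.
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    print(f"SECRET_KEY={generate_secret_key()}")
```

Run it once and paste the printed line into .env. Django also ships get_random_secret_key() in django.core.management.utils, which serves the same purpose once the backend dependencies are installed.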

3. Celery Worker

Scans are executed as asynchronous Celery tasks. The worker must be running in a separate terminal — scans will queue but never run without it.

Start the worker from the backend/ directory:
celery -A insight worker -l info

4. Frontend

The React frontend runs on its own dev server and proxies all /api/ requests to Django on port 8000.

Install dependencies and start the dev server from the frontend/ directory:
npm install
npm run dev

The dev server starts at http://localhost:5173.

5. Verify

| URL | Expected |
|---|---|
| http://localhost:5173 | React frontend: URL input and scan interface |
| http://localhost:8000/api/health/ | {"status":"ok"} |
| http://localhost:8000/api/schema/swagger-ui/ | Interactive API documentation |

Run the test suite (from the backend/ directory)

# All tests
python manage.py test scanner

# Single module
python manage.py test scanner.tests.test_validators

Docker (Alternative)

If you prefer not to install Python and Node.js locally, the full stack can be started with Docker Compose.

Step 1: Configure and start (from the repository root)
cp .env.sample .env   # edit SECRET_KEY
docker-compose up --build

The frontend is served on port 5173 and the backend on port 8000.

Step 2: Teardown
# Stop and remove volumes
docker-compose down -v

# Full clean (removes images too)
docker-compose down --rmi all --volumes --remove-orphans

Environment Variables

All configuration lives in .env at the repo root. Copy from .env.sample — the only required change for local development is setting SECRET_KEY.

| Variable | Default | Notes |
|---|---|---|
| SECRET_KEY | (insecure sample) | Must be changed; startup fails in production if it is left unchanged |
| DEBUG | True | Set to False in production |
| REDIS_URL | redis://localhost:6379/0 | Use rediss:// (TLS) in production |
| DATABASE_URL | sqlite:///db.sqlite3 | Use PostgreSQL in production |
| CORS_ALLOWED_ORIGINS | http://localhost:5173 | Frontend origin |
| RATE_LIMIT_SCANS_PER_HOUR | 5 | Per IP address |
| MAX_SCAN_RESOURCES | 50 | Maximum number of external scripts analysed per scan |
| SCAN_TIMEOUT_SECONDS | 60 | Hard Celery task time limit |
| CARAPACE_URL | (unset) | Base URL of the Carapace API, e.g. http://carapace:8080. Screenshots are disabled if unset. |
| CARAPACE_API_KEY | (unset) | Optional API key sent to Carapace in the X-Api-Key header |
| CARAPACE_SCREENSHOT_TIMEOUT | 30 | Per-request timeout in seconds for Carapace render calls |
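
These variables map naturally onto environment lookups in a Django settings module. The sketch below is hypothetical: it shows one way to read them with the documented defaults using only os.environ. Insight's actual settings.py may use a different helper library, the env_bool and env_int names are illustrative, and treating CORS_ALLOWED_ORIGINS as a comma-separated list is an assumption.

```python
import os


def env_bool(name: str, default: bool) -> bool:
    """Interpret common truthy strings ("1", "true", "yes", "on") as True."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def env_int(name: str, default: int) -> int:
    """Read an integer variable, falling back to default when unset or empty."""
    raw = os.environ.get(name)
    return default if raw is None or raw.strip() == "" else int(raw)


# Defaults mirror the table above.
DEBUG = env_bool("DEBUG", True)
REDIS_URL = os.environ.get("REDIS_URL", "redis://localhost:6379/0")
DATABASE_URL = os.environ.get("DATABASE_URL", "sqlite:///db.sqlite3")
# ASSUMPTION: multiple origins are given as a comma-separated list.
CORS_ALLOWED_ORIGINS = [
    origin.strip()
    for origin in os.environ.get("CORS_ALLOWED_ORIGINS", "http://localhost:5173").split(",")
    if origin.strip()
]
RATE_LIMIT_SCANS_PER_HOUR = env_int("RATE_LIMIT_SCANS_PER_HOUR", 5)
MAX_SCAN_RESOURCES = env_int("MAX_SCAN_RESOURCES", 50)
SCAN_TIMEOUT_SECONDS = env_int("SCAN_TIMEOUT_SECONDS", 60)

# Carapace is optional: a missing or empty URL disables screenshots.
CARAPACE_URL = os.environ.get("CARAPACE_URL") or None
CARAPACE_API_KEY = os.environ.get("CARAPACE_API_KEY") or None
CARAPACE_SCREENSHOT_TIMEOUT = env_int("CARAPACE_SCREENSHOT_TIMEOUT", 30)
SCREENSHOTS_ENABLED = CARAPACE_URL is not None
```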

How It Works

When a URL is submitted, Insight fetches the page and all linked external scripts using an SSRF-safe HTTP fetcher. No external threat intelligence APIs are called — all analysis is performed against the raw content returned by the target server.

Five analysis modules run against every scan:

  • JavaScript analyser — 59 checks against all scripts collected from the page
  • HTML analyser — 33+ structural checks against the page markup
  • Domain intelligence — 14 checks on the hostname and TLD
  • Header analyser — 15 checks on HTTP response headers
  • SSL analyser — 6 checks on the TLS certificate

Each check emits zero or more findings. A context-collapse engine fires additional synthetic findings when combinations of signals indicate coordinated attack infrastructure.

Verdict & Scoring

Verdict derivation

| Verdict | Condition |
|---|---|
| MALICIOUS | Any CRITICAL finding |
| SUSPICIOUS | Any HIGH finding, or 2+ MEDIUM findings |
| CLEAN | Only LOW and INFO findings |
| UNKNOWN | No findings at all |
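
The verdict rules reduce to a few ordered comparisons over finding severities. The sketch below is illustrative, not Insight's code. The table does not say what a single MEDIUM finding with nothing higher produces, so the function treats that case as CLEAN; that is an assumption.

```python
from collections import Counter


def derive_verdict(severities: list[str]) -> str:
    """Map a list of finding severities to a verdict, checked in priority order."""
    if not severities:
        return "UNKNOWN"
    counts = Counter(severities)  # missing severities count as 0
    if counts["CRITICAL"]:
        return "MALICIOUS"
    if counts["HIGH"] or counts["MEDIUM"] >= 2:
        return "SUSPICIOUS"
    # ASSUMPTION: a lone MEDIUM finding does not raise the verdict above CLEAN;
    # the table above does not cover this case explicitly.
    return "CLEAN"
```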

Context collapse rules

Synthetic findings are generated when multiple signals combine to indicate a coordinated attack pattern:

  • High-risk TLD + external form action + missing security headers → HIGH "phishing infrastructure"
  • DGA domain + hidden iframe + obfuscated JS → CRITICAL "drive-by malware delivery"
  • Brand impersonation + phishing form (± new certificate) → CRITICAL "active phishing campaign"
  • Keylogger / skimmer + DevTools evasion → CRITICAL "sophisticated targeted malware"
  • Fake CAPTCHA / ClickFix UI + clipboard write → CRITICAL "ClickFix malware delivery"
  • Injected unknown external script + ClickFix / clipboard payload → CRITICAL "compromised site delivering ClickFix malware"
  • Newly registered domain (≤ 30 days) + high-risk TLD → HIGH "newly registered high-risk domain" — purpose-built attack infrastructure signal
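
One way to express these combinations is a rule table in which each rule lists the signals that must all be present. In the sketch below the signal tag names and the Rule shape are hypothetical, only some of the rules are encoded, and optional signals (such as the "± new certificate" in the brand-impersonation rule) are left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    required: frozenset[str]  # every tag must be present for the rule to fire
    severity: str
    title: str


# Hypothetical signal tags; each would be emitted by an individual check.
RULES = [
    Rule(frozenset({"high_risk_tld", "external_form_action", "missing_security_headers"}),
         "HIGH", "phishing infrastructure"),
    Rule(frozenset({"dga_domain", "hidden_iframe", "obfuscated_js"}),
         "CRITICAL", "drive-by malware delivery"),
    Rule(frozenset({"brand_impersonation", "phishing_form"}),
         "CRITICAL", "active phishing campaign"),
    Rule(frozenset({"fake_captcha_ui", "clipboard_write"}),
         "CRITICAL", "ClickFix malware delivery"),
    Rule(frozenset({"newly_registered_domain", "high_risk_tld"}),
         "HIGH", "newly registered high-risk domain"),
]


def collapse(signals: set[str]) -> list[tuple[str, str]]:
    """Return (severity, title) for every rule whose required tags are all present."""
    return [(rule.severity, rule.title) for rule in RULES if rule.required <= signals]
```

Keeping rules as data rather than nested if-statements makes it cheap to add a new combination without touching the individual checks.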

JavaScript Analysis

Scripts of 256 KB or less are run through jsbeautifier before analysis, and both the original and beautified forms are checked. Base64-encoded strings are decoded, and where the decoded plaintext is printable it is appended to the evidence block.
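
The Base64 step can be sketched with the standard library alone. This illustrates the idea rather than reproducing Insight's code: the 20-character minimum literal length and the "printable" test (str.isprintable() after UTF-8 decoding, with common whitespace allowed) are choices made for this example.

```python
import base64
import binascii
import re

# Quoted string literals made only of Base64 alphabet characters (plus padding).
B64_LITERAL = re.compile(r"""["']([A-Za-z0-9+/]{20,}={0,2})["']""")


def decode_base64_literals(js: str) -> list[str]:
    """Return the printable plaintext of each Base64 literal found in js."""
    decoded = []
    for match in B64_LITERAL.finditer(js):
        try:
            raw = base64.b64decode(match.group(1), validate=True)
            text = raw.decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid Base64, or not UTF-8 text
        # Keep only payloads a human can read in the evidence block.
        if text and all(ch.isprintable() or ch in "\r\n\t" for ch in text):
            decoded.append(text)
    return decoded
```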

CRITICAL

| Check | Pattern |
|---|---|
| Encoded payload execution | eval(atob(...)), eval(unescape(...)), eval(decodeURIComponent(...)) chains |
| Session theft | Cookie / localStorage read + outbound fetch / XHR / sendBeacon |
| Credential harvester | Form submit hijack + external fetch / sendBeacon + preventDefault |
| Keylogger | keydown / keyup listener reading event.key + outbound network call |
| Magecart skimmer | DOM query targeting card / CVV fields + exfiltration + encoding or polling |
| Crypto miner | Stratum protocol strings, CoinHive / CryptoLoot names, WebWorker + WASM pattern |
| Unix shell dropper | base64 -d \| bash pattern embedded in JS strings |
| PowerShell dropper | irm ... \| iex / Invoke-RestMethod ... \| iex embedded in JS strings |
| HTML smuggling | new Blob([...]) + URL.createObjectURL + auto-download trigger |
| Web3 wallet drainer | window.ethereum + eth_sendTransaction / eth_signTypedData / personal_sign |
| Malicious service worker | navigator.serviceWorker.register() from a blob: or data: URI |
| Remote code execution | fetch() + .then() / await + eval() / new Function() in the same async chain (compromised WordPress pattern) |
| Decrypt-then-execute | crypto.subtle.decrypt / importKey + eval() / new Function(): encrypted payload executed at runtime |
| ClickFix clipboard payload | navigator.clipboard.writeText() argument contains shell command indicators (PowerShell, mshta, cmd.exe, \| bash, etc.); content-based detection regardless of click handler |
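
As an illustration of the first row, a single regular expression can flag eval() wrapped directly around one of the decoding functions. This is a simplified sketch: it tolerates whitespace and an optional window. prefix, but it does not catch aliases (const e = eval) or indirect calls, which a real check would need to handle.

```python
import re

# eval( [window.]atob( ... | eval( unescape( ... | eval( decodeURIComponent( ...
ENCODED_EVAL = re.compile(
    r"\beval\s*\(\s*(?:window\.)?(atob|unescape|decodeURIComponent)\s*\("
)


def find_encoded_eval(js: str) -> list[str]:
    """Return the decoder name for each eval(decoder(...)) chain found in js."""
    return [m.group(1) for m in ENCODED_EVAL.finditer(js)]
```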

HIGH

| Check | Pattern |
|---|---|
| Obfuscator.io fingerprint | _0x array-rotation variable pattern |
| Character-code obfuscation | String.fromCharCode(...) building strings character by character |
| High entropy string | Shannon entropy > 5.5 bits/char on literals > 64 chars |
| Dynamic hidden iframe | createElement('iframe') + display:none / width:0 |
| Forced download | createElement('a') + .download + .click() |
| Beacon exfiltration | navigator.sendBeacon() to an external domain |
| Clipboard hijack | navigator.clipboard.writeText() outside a recognisable click handler |
| Script injection | document.write() injecting an external <script src="https://..."> |
| Shell string | bash -c execution string embedded in JS |
| C2 infrastructure | curl / wget to a bare IP address |
| External service worker | serviceWorker.register() loading from an external domain |
| Living off Trusted Sites (LoTS) | Exfiltration via Telegram Bot API, Discord webhook, Slack webhook, Google Apps Script, Webhook.site, Pipedream, RequestBin, Pastebin API |
| Dynamic module import | import('https://...') loading an ES module from an unknown external URL |

MEDIUM

| Check | Pattern |
|---|---|
| URL evasion | ['a','b','c'].join('') array-split string construction |
| Moderate entropy | Shannon entropy 4.8–5.5 bits/char (possibly encoded) |
| Auto-redirect | window.location inside setTimeout < 3000 ms |
| Right-click disable | contextmenu event + preventDefault() |
| DevTools detection | outerWidth / outerHeight delta, __REACT_DEVTOOLS_GLOBAL_HOOK__, console timing tricks |

HTML Analysis

| Severity | Check |
|---|---|
| CRITICAL | Phishing form: action domain ≠ page domain + brand keyword in page title |
| CRITICAL | Shell command (PowerShell, mshta, cmd, iex) embedded in an HTML data-* attribute or event handler; ClickFix payload storage pattern |
| CRITICAL | Shell command embedded in a hidden HTML element (display:none, hidden input, <template>); ClickFix payload storage pattern |
| HIGH | Phishing form: action domain ≠ page domain (no brand signal in title) |
| HIGH | Hidden iframe (display:none, width=0, height=0, off-screen position) |
| HIGH | <base href> pointing to an external domain (URL hijacking) |
| HIGH | Meta refresh redirect with delay ≤ 2 s to an external domain |
| HIGH | Login form transmitting credentials over plain HTTP |
| HIGH | Fake browser update page: browser/update terminology + executable download link (SocGholish / ClearFake) |
| HIGH | Fake CAPTCHA / ClickFix: human-verification text + Win+R / terminal execution instructions (including "click to fix", "browser verification", and "run the following command" variants) |
| HIGH | Clickjacking overlay: full-viewport fixed/absolute element with z-index > 100 + click handler |
| HIGH | External script loaded from an unknown domain that is also dns-prefetch-staged in the same page; a deliberate WordPress compromise pattern (e.g. WPCode injection) |
| MEDIUM | <base href> present with the same origin; verify it is intentional |
| MEDIUM | Meta refresh redirect (any delay) |
| MEDIUM | Right-click disabled via oncontextmenu="return false" |
| MEDIUM | Suspicious executable download link (.exe, .msi, .ps1, .bat, .hta, etc.) |
| MEDIUM | Inline script dominates page content (script > 3× non-script HTML) |
| MEDIUM | <noscript> block contains an external URL redirect |
| MEDIUM | Sensitive keywords in HTML comments (password, api_key, token, secret, etc.) |
| MEDIUM | Resources loaded from IPFS gateways (takedown-resistant phishing / drainer hosting) |
| MEDIUM | External script preloaded via <link rel="preload" as="script"> or <link rel="prefetch"> from an unknown domain; WordPress malware injection staging pattern |
| MEDIUM | External <script src> from an unknown domain without dns-prefetch staging |
| LOW | External scripts loaded without Subresource Integrity (SRI) |
| LOW | Password field missing autocomplete attribute |
| LOW | CSS user-select: none disabling text selection |
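
The two phishing-form rows can be sketched with the standard library's html.parser. In this illustration, BRAND_KEYWORDS is a hypothetical, abbreviated list, and "domain" is simplified to an exact hostname comparison (so www.example.com and login.example.com would count as different); a real check would compare registered domains.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Hypothetical, abbreviated brand list for the example.
BRAND_KEYWORDS = {"paypal", "microsoft", "apple", "netflix"}


class _FormCollector(HTMLParser):
    """Collect every <form action> value and the page <title> text."""

    def __init__(self) -> None:
        super().__init__()
        self.actions: list[str] = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.actions.append(dict(attrs).get("action") or "")
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def check_phishing_forms(page_url: str, html: str) -> list[tuple[str, str]]:
    """Return (severity, action_url) for each form that submits to another host."""
    parser = _FormCollector()
    parser.feed(html)
    page_host = urlsplit(page_url).hostname
    has_brand = any(brand in parser.title.lower() for brand in BRAND_KEYWORDS)
    findings = []
    for action in parser.actions:
        target = urljoin(page_url, action)  # relative or empty actions resolve to the page
        # Skip same-host targets and host-less schemes such as javascript: or mailto:.
        if urlsplit(target).hostname not in (None, page_host):
            findings.append(("CRITICAL" if has_brand else "HIGH", target))
    return findings
```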

Domain Intelligence

Domain checks run against the hostname of the scanned URL. Detection is based on the domain string itself, and no DNS lookups are made. The domain-age checks are the exception: they depend on WHOIS data.

| Severity | Check |
|---|---|
| CRITICAL | Subdomain token is a typosquat of a known brand (Levenshtein edit distance 1) |
| CRITICAL | Subdomain contains an exact brand keyword (e.g. paypal.attacker.com) |
| HIGH | SLD (registered domain) is a typosquat of a known brand (edit distance 1) |
| HIGH | IDN / homograph attack: Cyrillic or mixed-script characters in the domain |
| HIGH | Brand keyword in the registered domain (e.g. paypal-secure.com); the attacker owns the SLD |
| HIGH | DGA probability score > 0.8: strong algorithmic-generation signal, with characteristics consistent with C2 infrastructure |
| MEDIUM | High-risk TLD (.xyz, .top, .click, .loan, .zip, .cyou, and 20+ more); a strong context signal, but not conclusive alone |
| MEDIUM | DGA probability score 0.6–0.8 (consonant ratio + entropy + absence of English subwords) |
| MEDIUM | Digit substitution in SLD (e.g. g00gle, faceb00k) |
| MEDIUM | Excessive subdomain depth (> 4 labels) |
| MEDIUM | Hosted on an abuse-prone free platform with a long random subdomain (Cloudflare R2, Pages.dev, Firebase) |
| MEDIUM | Newly registered domain (≤ 30 days old): disproportionately present in threat feeds. Requires WHOIS data. |
| MEDIUM | Subdomain encodes a domain via dot-to-hyphen substitution (e.g. support-paypal-com.zapier.app), a phishing-as-a-service technique using free subdomain hosting; HIGH if hosted on an abuse-prone platform |
| MEDIUM | Delivery/postal brand keyword (USPS, FedEx, DHL) embedded in the SLD of a non-official site; fake parcel-notification phishing |
| INFO | Recently registered domain (31–90 days old); a context signal for analysts |

Brands monitored include: PayPal, Google, Microsoft, Apple, Amazon, Facebook, Instagram, Netflix, Steam, Coinbase, Binance, MetaMask, Ledger, Trezor, Trust Wallet, OpenSea, Roblox, Discord, Twitch, Spotify, Chase, Barclays, and more.
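
The typosquat and exact-brand rows rest on Levenshtein edit distance: the minimum number of single-character insertions, deletions, or substitutions that turns one string into another. The sketch below uses a hypothetical, abbreviated brand list. It also assumes that the last two labels form the registered domain, which is wrong for suffixes like co.uk; a real implementation would consult the Public Suffix List.

```python
# Abbreviated, hypothetical brand list for the example; the real list is longer.
BRANDS = ["paypal", "google", "microsoft", "apple", "amazon", "netflix", "coinbase"]


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert / delete / substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free when characters match)
            ))
        prev = curr
    return prev[-1]


def brand_signals(hostname: str) -> list[tuple[str, str, str]]:
    """Return (severity, offending label or token, brand) for brand-abuse signals."""
    labels = hostname.lower().rstrip(".").split(".")
    if len(labels) < 2:
        return []
    # ASSUMPTION: the last two labels are the registered domain (see lead-in).
    sld, subdomains = labels[-2], labels[:-2]
    findings = []
    for brand in BRANDS:
        for label in subdomains:
            for token in label.split("-"):
                # Exact brand token or one-edit typosquat in a subdomain.
                if token == brand or levenshtein(token, brand) == 1:
                    findings.append(("CRITICAL", token, brand))
        # One-edit typosquat of the brand as the registered domain itself.
        if sld != brand and levenshtein(sld, brand) == 1:
            findings.append(("HIGH", sld, brand))
    return findings
```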

Header Analysis

15 checks on HTTP response headers. All checks are passive — no additional requests are made.

Severity philosophy: Missing defensive headers are configuration debt, not threat indicators. Industry consensus (Cobalt, OWASP, Invicti, pentest report standards) rates them LOW/INFO. They only escalate in meaning when combined with active threat signals — handled by the context collapse engine.

| Severity | Check |
|---|---|
| HIGH | Site served over unencrypted HTTP: credentials and sessions exposed in plaintext |
| HIGH | End-of-life server software (Apache 2.2, PHP 5.x, IIS 6/7): unpatched CVEs, likely compromised or abandoned infrastructure |
| HIGH | CORS misconfiguration: wildcard Access-Control-Allow-Origin: * + Access-Control-Allow-Credentials: true |
| LOW | Missing X-Content-Type-Options: nosniff |
| LOW | Missing HSTS on an HTTPS site: SSL stripping possible via an active MitM |
| LOW | HSTS max-age below the recommended 1 year |
| LOW | Server header discloses software version (reconnaissance aid) |
| LOW | X-Powered-By header exposes backend technology |
| LOW | Insecure cookie flags: missing HttpOnly, Secure, or SameSite |
| LOW | CORS wildcard Access-Control-Allow-Origin: * (acceptable for public APIs, so noted as LOW) |
| LOW | CSP allows unsafe-inline or unsafe-eval, weakening XSS protection |
| INFO | Missing Content-Security-Policy (hardening gap, not a threat signal) |
| INFO | Missing X-Frame-Options (hardening gap, not a threat signal) |
| INFO | Missing Referrer-Policy (privacy gap) |
| INFO | Missing Permissions-Policy (rarely set by any site) |
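
Because these checks only inspect headers that were already received, each one is a plain lookup. The sketch below covers a subset of the table (CORS, nosniff, HSTS presence and max-age, plain HTTP, missing CSP). The function name and messages are illustrative, and one year is taken as 365 days (31,536,000 seconds).

```python
import re

ONE_YEAR = 31_536_000  # seconds in 365 days


def check_headers(headers: dict[str, str], is_https: bool) -> list[tuple[str, str]]:
    """Run a subset of the header checks; returns (severity, message) pairs."""
    h = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    findings = []

    acao = h.get("access-control-allow-origin", "").strip()
    acac = h.get("access-control-allow-credentials", "").strip().lower()
    if acao == "*" and acac == "true":
        findings.append(("HIGH", "CORS wildcard origin with credentials allowed"))
    elif acao == "*":
        findings.append(("LOW", "CORS wildcard Access-Control-Allow-Origin"))

    if h.get("x-content-type-options", "").strip().lower() != "nosniff":
        findings.append(("LOW", "Missing X-Content-Type-Options: nosniff"))

    if is_https:
        hsts = h.get("strict-transport-security")
        if hsts is None:
            findings.append(("LOW", "Missing HSTS"))
        else:
            m = re.search(r'max-age\s*=\s*"?(\d+)', hsts, re.IGNORECASE)
            if m is None or int(m.group(1)) < ONE_YEAR:
                findings.append(("LOW", "HSTS max-age below one year"))
    else:
        findings.append(("HIGH", "Site served over unencrypted HTTP"))

    if "content-security-policy" not in h:
        findings.append(("INFO", "Missing Content-Security-Policy"))
    return findings
```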

SSL Analysis

6 checks on the TLS certificate. The certificate is retrieved directly — no third-party certificate transparency APIs are used.

| Severity | Check |
|---|---|
| HIGH | Certificate expires in fewer than 14 days |
| HIGH | Self-signed certificate |
| HIGH | Hostname / SAN mismatch |
| HIGH | Let's Encrypt certificate issued to a brand-impersonating domain |
| MEDIUM | Deprecated TLS version (1.0 or 1.1) negotiated |
| INFO | Certificate issued within the last 7 days; a new certificate on a suspicious domain is a phishing indicator |

Technology Detection

Identifies the technology stack from HTML, script sources, HTTP headers, and cookies. Detected technologies are displayed on the results page as colour-coded badges with logos.

| Category | Technologies detected |
|---|---|
| CMS | WordPress, Drupal, Joomla, Ghost, Shopify, Wix, Squarespace, Webflow, HubSpot CMS |
| JS Framework | React, Next.js, Vue, Nuxt, Angular, Svelte, SvelteKit, Ember, Backbone.js, Astro, Remix, Gatsby, Solid.js |
| Build Tool | Vite, webpack |
| JS Library | jQuery, Lodash, Axios, GSAP, Three.js, Alpine.js, htmx, Socket.io, Chart.js, D3.js, Swiper, Pusher |
| CSS Framework | Bootstrap, Tailwind CSS, Bulma, Font Awesome, UIkit |
| Backend | PHP, Python, Node.js, Express, ASP.NET, Laravel, Django, Ruby on Rails, Java, Flask, FastAPI, Symfony, Spring Boot |
| Web Server | nginx, Apache, Caddy, Gunicorn, LiteSpeed |
| CDN | Cloudflare, AWS CloudFront, Fastly, Akamai, jsDelivr |
| Hosting | Vercel, Netlify, GitHub Pages, Firebase, Render |
| Analytics | Google Analytics, Google Tag Manager, Facebook Pixel, Hotjar, Intercom, Mixpanel, Plausible, Matomo, TikTok Pixel, LinkedIn Insight, Cloudflare Web Analytics |
| Security | Cloudflare Turnstile, reCAPTCHA, hCaptcha, Cloudflare Bot Management |
| Payment | Stripe, PayPal, Square, Klarna |
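
Technology detection of this kind is typically signature matching against the fetched content. The sketch below shows the idea with a handful of hypothetical signatures (a regex over the HTML, or over a named response header). Insight's real signature set and matching logic are not documented here and may differ.

```python
import re

# Hypothetical signatures: (technology, where to look, pattern).
SIGNATURES = [
    ("WordPress", "html", re.compile(r"/wp-content/|/wp-includes/")),
    ("jQuery", "html", re.compile(r"jquery(?:\.min)?\.js", re.IGNORECASE)),
    ("nginx", "header:server", re.compile(r"^nginx", re.IGNORECASE)),
    ("PHP", "header:x-powered-by", re.compile(r"PHP", re.IGNORECASE)),
    ("Cloudflare", "header:server", re.compile(r"^cloudflare$", re.IGNORECASE)),
]


def detect_technologies(html: str, headers: dict[str, str]) -> set[str]:
    """Return the names of all technologies whose signature matches."""
    lowered = {k.lower(): v for k, v in headers.items()}
    found = set()
    for tech, source, pattern in SIGNATURES:
        if source == "html":
            haystack = html
        else:
            haystack = lowered.get(source.split(":", 1)[1], "")
        if pattern.search(haystack):
            found.add(tech)
    return found
```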

Visual Renderer (Carapace)

Carapace is an optional companion service that captures a headless Chromium screenshot of each scanned URL. It renders the page with JavaScript fully enabled while intercepting and blocking all network requests. This lets dynamic overlays (ClickFix, SocGholish, ClearFake, drainers) execute and appear in the screenshot, revealing the actual attack UI. A verdict badge is composited onto every screenshot.

Carapace runs as a separate Docker service alongside the Insight backend. When CARAPACE_URL is configured, every scan automatically calls its POST /render endpoint and stores the screenshot with the scan result. If Carapace is unavailable, the scan continues normally; rendering is entirely best-effort.

In addition to the screenshot, Carapace returns a threat report with:

  • A risk score (0–100) derived from renderer-level observations
  • Threat flags — findings from the renderer converted directly into Insight findings under the Renderer category
  • Technology detections — a DOM-parsed tech stack that is merged with Insight's own BeautifulSoup-based detections, catching things a static parser may miss

Carapace is open-source and available at github.com/DanDreadless/Carapace. It is designed to be deployed as a sidecar alongside Insight; see the Carapace README for setup instructions.

Renderer Findings

Threat flags from Carapace are converted to Insight findings in the Renderer category. Sanitisation-behaviour codes that fire on almost every real page (e.g. BLOCKED_ELEMENT_SCRIPT, NETWORK_ATTEMPT_BLOCKED) are suppressed — only signals with independent threat value are surfaced.

| Severity | Flag | Description |
|---|---|---|
| CRITICAL | Drive-by download blocked | Renderer intercepted an automatic file download; filename, MIME type, and SHA-256 are recorded without executing the file |
| HIGH | JS eval detected | eval() or new Function() execution observed at render time (runtime obfuscation) |
| HIGH | Exfiltration attempt blocked | Outbound XHR / fetch to an external domain intercepted by the network block |
| HIGH | Credential field on HTTP | Password input found on an unencrypted page; credentials would be sent in plaintext |
| MEDIUM | Redirect chain detected | Multiple HTTP redirects observed before the final page load; common in traffic distribution systems |
| MEDIUM | Suspicious download link | Executable file linked from page content (.exe, .msi, .ps1, etc.) |
| MEDIUM | DevTools evasion detected | Renderer-side debugger / DevTools detection attempt observed |
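
The conversion from renderer flags to findings amounts to a filter plus a severity lookup. In the sketch below, the two suppressed codes come from the paragraph above. The other flag codes and the flag shape (a dict with a "code" key) are hypothetical, since this document does not list Carapace's actual flag identifiers.

```python
# Sanitisation-behaviour codes named above; they fire on almost every page.
SUPPRESSED = {"BLOCKED_ELEMENT_SCRIPT", "NETWORK_ATTEMPT_BLOCKED"}

# Hypothetical flag codes mapped to the severities in the table.
SEVERITY_BY_CODE = {
    "DRIVE_BY_DOWNLOAD_BLOCKED": "CRITICAL",
    "JS_EVAL_DETECTED": "HIGH",
    "EXFILTRATION_BLOCKED": "HIGH",
    "CREDENTIAL_FIELD_ON_HTTP": "HIGH",
    "REDIRECT_CHAIN": "MEDIUM",
    "SUSPICIOUS_DOWNLOAD_LINK": "MEDIUM",
    "DEVTOOLS_EVASION": "MEDIUM",
}


def to_findings(flags: list[dict]) -> list[dict]:
    """Turn renderer threat flags into Renderer-category findings."""
    findings = []
    for flag in flags:
        code = flag.get("code")
        if code in SUPPRESSED or code not in SEVERITY_BY_CODE:
            continue  # noise, or a code this sketch does not know about
        findings.append({
            "category": "Renderer",
            "severity": SEVERITY_BY_CODE[code],
            "code": code,
        })
    return findings
```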

API Endpoints

All endpoints are under /api/. No authentication is required — rate limiting is enforced per IP address.

| Method | Path | Description |
|---|---|---|
| POST | /api/scan/ | Submit a URL for scanning. Returns {"id": "...", "status": "PENDING"} with 202 on success, or 429 if rate limited. |
| GET | /api/scan/{id}/ | Poll scan status and retrieve full results once complete. |
| GET | /api/scan/{id}/stream/ | Server-Sent Events stream with events status_update, complete, and error. Auto-retries on disconnect (3 attempts, 2 s delay). |
| GET | /api/scan/{id}/source/ | Re-fetch the raw source of a URL that belongs to a completed scan. Only URLs within the original scan scope are permitted. |
| GET | /api/history/ | Paginated list of completed scans. Supports a ?q= URL substring filter and ?page=. |
| GET | /api/health/ | Health check; returns {"status":"ok"}. |
| GET | /api/schema/swagger-ui/ | Interactive API documentation (drf-spectacular). |

SSRF protection: All outbound HTTP requests made during a scan go through an SSRF-safe fetcher. It rejects non-HTTP(S) schemes, resolves DNS and blocks RFC 1918 / loopback / link-local addresses, caps response bodies at 5 MB, and enforces hard timeouts. These controls are designed to prevent the scan endpoint from being used to probe internal network resources.
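
The address-filtering part of such a fetcher can be sketched with the standard ipaddress and socket modules. This is an illustrative check, not Insight's implementation. It resolves the hostname and rejects the URL if any resolved address is private, loopback, link-local, or otherwise non-global; the size cap and timeouts are not shown.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_url_allowed(url: str) -> bool:
    """Return True only for http(s) URLs whose host resolves to public IPs."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        # parts.port raises ValueError for an invalid port, caught below.
        infos = socket.getaddrinfo(parts.hostname, parts.port or None,
                                   proto=socket.IPPROTO_TCP)
    except (socket.gaierror, ValueError):
        return False
    for *_, sockaddr in infos:
        ip = ipaddress.ip_address(sockaddr[0].split("%")[0])  # drop any IPv6 zone ID
        # is_global is False for RFC 1918, loopback, link-local and other
        # special-purpose ranges, so one check covers all of them.
        if not ip.is_global:
            return False
    return bool(infos)
```

A check like this is only safe if the fetcher then connects to the exact address it validated (otherwise DNS rebinding can swap in an internal IP between check and connect) and repeats the check on every redirect hop.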

Tech Stack

| Layer | Technology |
|---|---|
| Backend | Python 3.11 / Django 5.2 / Django REST Framework |
| Task queue | Celery + Redis |
| API docs | drf-spectacular (Swagger UI at /api/schema/swagger-ui/) |
| Frontend | React 19 / TypeScript / Vite / Tailwind CSS 4 |
| Database | SQLite (development) / PostgreSQL (production) |
| Cache / broker | Redis |
| Visual renderer | Carapace: headless Chromium screenshot service (optional sidecar) |

Acknowledgements

Insight is built on a number of excellent open-source libraries.
