Footprinting and Reconnaissance
Footprinting (passive information gathering) and reconnaissance (active probing) are typically the first phases of an engagement.
These phases collect intelligence about a target to identify potential vulnerabilities without (passive) or with (active) interaction with the target systems.
Footprinting (Passive)
Footprinting is the gathering and recording of publicly available information about a target (names, addresses, phone numbers, products used, relationships, etc.) without direct interaction with the target.
It relies on Open Source Intelligence (OSINT) such as search engines, public records, social media, and third‑party services.
Passive collection can reveal potential vulnerabilities and weaknesses.
Reconnaissance (Active)
Reconnaissance (active footprinting) uses scanning and probing to discover live hosts, open ports, services, and versions.
This is more intrusive than passive footprinting.
Common tools: nmap, masscan, and service-specific probes.
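As a sketch of how such a scan might be scripted, the snippet below builds (but does not run) an nmap service-version scan command; scanme.nmap.org is nmap's own authorized test host, and the port range is an arbitrary illustration.

```python
import subprocess  # only needed if you actually run the command

def build_nmap_command(target: str, ports: str = "1-1024") -> list:
    """Build an nmap service/version scan command for a given target."""
    # -sV: probe open ports to determine service and version info
    # -p:  restrict the scan to the given port range
    return ["nmap", "-sV", "-p", ports, target]

cmd = build_nmap_command("scanme.nmap.org")
print(" ".join(cmd))
# To actually run it (requires nmap installed and authorization to scan):
# result = subprocess.run(cmd, capture_output=True, text=True)
```

Only scan hosts you are explicitly authorized to test.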
Keep an organized inventory (spreadsheet or database) to track findings: names, dates, links, IP addresses, OS, exposed services, and login endpoints.
- A well‑structured inventory prevents duplicate work, helps find what is relevant, and prioritizes follow-up scanning or analysis.
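A minimal inventory can be kept as CSV; the sketch below uses illustrative field names (adapt them to your engagement) and the documentation IP range 203.0.113.0/24 for the sample entry.

```python
import csv
import io

# Illustrative columns; extend with whatever your engagement needs to track.
FIELDS = ["date", "name", "ip", "os", "exposed_services", "source_link"]

def add_finding(rows: list, **finding) -> None:
    """Append one finding, filling any missing columns with empty strings."""
    rows.append({f: finding.get(f, "") for f in FIELDS})

def to_csv(rows: list) -> str:
    """Render the inventory as CSV text, ready to save or import."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

rows = []
add_finding(rows, date="2024-01-15", name="mail server", ip="203.0.113.10",
            exposed_services="25/smtp", source_link="shodan")
print(to_csv(rows))
```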
Web Searches and Google Hacks (Google Dorking)
Google provides advanced search operators (directives) that enable focused OSINT collection. These operators can reveal exposed files, admin panels, login pages, and misconfigurations.
Google Dork queries can help uncover:
- Exposed sensitive files or data
- Misconfigured websites
- Hidden admin panels and login/backups
- Publicly available information useful for security research
| Purpose | Operator | Example |
|---|---|---|
| Search within a specific domain | site: | site:wiley.com cybersecurity |
| Find specific file types or extensions | filetype: / ext: | filetype:pdf "ethical hacking" |
| Pages with words in the title | intitle: / allintitle: | intitle:"login page" |
| Words in the URL | inurl: / allinurl: | inurl:admin |
| Words in page body | intext: / allintext: | intext:"confidential" |
| View Google's cached version (deprecated) | cache: | cache:example.com |
| Find related sites | related: | related:cnn.com |
| Pages that link to a domain (deprecated) | link: | link:starbucks.com |
Useful Combined Queries
- Find usernames: allintext:username filetype:log
- Find email lists: allintext:email filetype:txt
- Find SSH private keys: intitle:"index of" id_rsa -id_rsa.pub
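Combined queries like these are just strings built from the operators in the table, so a small helper can compose them; the function and parameter names here are illustrative.

```python
def dork(domain=None, filetype=None, intitle=None, inurl=None, intext=None):
    """Compose a Google dork query string from common operators."""
    parts = []
    if domain:
        parts.append(f"site:{domain}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    if intitle:
        parts.append(f'intitle:"{intitle}"')
    if inurl:
        parts.append(f"inurl:{inurl}")
    if intext:
        parts.append(f'intext:"{intext}"')
    return " ".join(parts)

print(dork(domain="example.com", filetype="pdf", intext="confidential"))
# site:example.com filetype:pdf intext:"confidential"
```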
WHOIS Database Records
WHOIS is a database of domain registration details (registrant contact, registrar, creation/expiration dates, name servers, and registration IPs) maintained by domain registrars and Regional Internet Registries (RIRs) overseen by ICANN (Internet Corporation for Assigned Names and Numbers).
- While privacy protections have limited some public WHOIS data, WHOIS remains useful for reconnaissance and attribution.
Accessing WHOIS Information
Use lookup services (e.g., whois.net, lookup.icann.org, domaintools.com) or query Regional Internet Registries (RIRs) directly.
DomainTools provides enhanced historic and contextual data (paid features available).
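Under the hood, WHOIS is a simple text protocol over TCP port 43 (RFC 3912). The sketch below shows a raw query against the .com registry's WHOIS server plus a small parser for "Field: value" lines; the parser works on any WHOIS text, while the network lookup assumes connectivity and that the server answers for the queried TLD.

```python
import socket

def whois_query(domain: str, server: str = "whois.verisign-grs.com") -> str:
    """Raw WHOIS lookup: send 'domain\\r\\n' over TCP 43, read until close."""
    with socket.create_connection((server, 43), timeout=10) as s:
        s.sendall(domain.encode() + b"\r\n")
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode(errors="replace")

def parse_field(record: str, field: str) -> str:
    """Pull the first 'Field: value' line out of a WHOIS record."""
    for line in record.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == field.lower():
            return value.strip()
    return ""
```

For example, `parse_field(whois_query("example.com"), "Registrar")` would extract the registrar name from the raw record.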
Regional Internet Registries (RIRs)
The five main RIRs are:
- APNIC — Asia Pacific Network Information Centre: https://www.apnic.net/ (Asia-Pacific)
- RIPE NCC — Réseaux IP Européens Network Coordination Centre: https://www.ripe.net/ (Europe, Middle East, parts of Central Asia)
- ARIN — American Registry for Internet Numbers: https://www.arin.net/ (North America)
- LACNIC — Latin American and Caribbean Internet Address Registry: https://www.lacnic.net/ (Latin America & Caribbean)
- AfriNIC — African Network Information Centre: https://afrinic.net/ (Africa)
Understanding Name Server Entries
Name servers hold DNS records for a domain; WHOIS records often list authoritative name servers.
From name server names and associated IPs you can infer hosting providers or CDNs (for example, AWS, Cloudflare) and identify where to continue infrastructure reconnaissance.
Use nslookup, dig, or online DNS lookup services to query A, MX, TXT, CNAME, and other records.
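Inferring a provider from name-server hostnames can be done with simple substring heuristics; the mapping below is illustrative and incomplete (the "awsdns" pattern is real Route 53 naming, the rest should be extended for your targets).

```python
# Heuristic substring -> provider mapping; extend as needed for your targets.
PROVIDER_HINTS = {
    "awsdns": "Amazon Route 53 (AWS)",
    "cloudflare": "Cloudflare",
    "azure-dns": "Microsoft Azure DNS",
}

def infer_provider(nameserver: str) -> str:
    """Guess the hosting provider from an authoritative name server's hostname."""
    host = nameserver.lower()
    for hint, provider in PROVIDER_HINTS.items():
        if hint in host:
            return provider
    return "unknown"

print(infer_provider("ns-2048.awsdns-64.com"))  # Amazon Route 53 (AWS)
```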
Third‑party Sources of Intelligence
Publicly traded companies and large organizations publish additional information (financial reports, executive names, partnerships) that can be collected.
In the US, consult the SEC EDGAR database for company filings (10‑K, annual reports).
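EDGAR also exposes a JSON submissions endpoint keyed by a company's CIK number (zero-padded to 10 digits); the helper below just builds that URL. Note the SEC asks clients to send a descriptive User-Agent header when fetching.

```python
def edgar_submissions_url(cik: int) -> str:
    """URL of the SEC EDGAR submissions JSON feed for a company.

    The CIK (Central Index Key) is zero-padded to 10 digits in the path.
    """
    return f"https://data.sec.gov/submissions/CIK{cik:010d}.json"

# Apple's CIK is 320193:
print(edgar_submissions_url(320193))
# https://data.sec.gov/submissions/CIK0000320193.json
```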
Sources for Collecting Intelligence
Collect and analyse information from websites, forums, news, public registers, and specialised databases.
- Validate collected information as data can be outdated, incorrect, or intentionally misleading.
Social Networks
Social platforms (X/Twitter, LinkedIn, Facebook, Instagram) are rich OSINT sources: marketing posts, employee activity, job postings, and user comments.
LinkedIn is especially useful for discovering employee names, roles, and technologies in use, details that are valuable for targeted social engineering.
Organizational Website Reconnaissance
Inspect target websites for contact info, site maps, robots.txt, investor relations, and job listings. Job postings often disclose technologies in use and team responsibilities.
Inspect client-side code for libraries and plugins (e.g., jQuery, Bootstrap, fancybox). Outdated components may map to known CVEs.
Meta tags and comments in HTML can reveal developer notes or framework details.
Location of login/admin pages may be discoverable through URL patterns or site structure.
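One quick win is parsing robots.txt, since Disallow entries often point at admin or private paths the site owner did not want indexed. A minimal parser sketch:

```python
def disallowed_paths(robots_txt: str) -> list:
    """Extract Disallow paths from robots.txt text.

    These entries often hint at admin panels, backups, or private areas.
    """
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#")[0].strip()  # drop trailing comments
        if line.lower().startswith("disallow:"):
            path = line.split(":", 1)[1].strip()
            if path:
                paths.append(path)
    return paths

sample = "User-agent: *\nDisallow: /admin/\nDisallow: /backup/\n"
print(disallowed_paths(sample))  # ['/admin/', '/backup/']
```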
Documents, Pictures & Metadata
Many file formats embed metadata (data about the file) which can include creator names, application used, creation/modification timestamps, and sometimes GPS coordinates for images. Information leakage through metadata can reveal sensitive details about the target.
Common file types with useful metadata:
.pdf, .doc, .docx, .xls, .xlsx, .ppt, .pptx, .jpg, .jpeg
Accessing Hidden Information
Hidden or embedded data can be exposed via file properties, EXIF readers, or forensic tools.
Many organizations publish documents and images on public sites—these files can leak metadata that was not intended for public consumption.
In digital forensics, investigators use tools and methods to examine file structures and metadata to uncover concealed content within seemingly innocuous files.
Quick methods
- File properties (Windows): Right‑click → Properties → Details tab (quick but may be incomplete).
- Use EXIF/metadata tools for comprehensive extraction.
EXIF / Metadata Tools
- EXIF stands for Exchangeable Image File Format; it mainly applies to images but metadata extraction tools support many document formats.
- ExifTool extracts extensive metadata from images and documents.
Practical workflow
- Collect files from the target website (documents, images).
- Run metadata extraction with ExifTool and record findings in your inventory.
- Correlate metadata (names, timestamps, locations) with other OSINT findings (social profiles, job posts, WHOIS data).
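The extraction step can be scripted around ExifTool's JSON output mode (the -j flag is a real ExifTool option; the field names pulled out below are common tags, but coverage varies by file type). The sketch builds the command and filters its JSON output, without running ExifTool here.

```python
import json
import subprocess  # only needed if you actually invoke exiftool

def exiftool_command(paths: list) -> list:
    """Command line for batch metadata extraction; -j selects JSON output."""
    return ["exiftool", "-j", *paths]

def interesting_fields(exif_json: str, keys=("Author", "Creator", "GPSPosition")):
    """Keep only the tags worth adding to the inventory, per file."""
    records = json.loads(exif_json)
    return [{k: r[k] for k in keys if k in r} for r in records]

# With real files you would run:
#   out = subprocess.run(exiftool_command(["report.pdf"]), capture_output=True, text=True)
#   findings = interesting_fields(out.stdout)
sample = '[{"SourceFile": "report.pdf", "Author": "j.smith", "Creator": "Word"}]'
print(interesting_fields(sample))  # [{'Author': 'j.smith', 'Creator': 'Word'}]
```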
Tools to continue with
- DomainTools: domain history, DNS, and WHOIS context.
- ExifTool: metadata extraction.
- GitHub and forum monitoring: search for leaked credentials, code, or project details.
- SpiderFoot: automated OSINT collection and correlation.
- Dmitry: command-line reconnaissance tool for quick domain/IP enumeration.
- Shodan: internet-wide search engine for discovering exposed devices and services.
- Wayback Machine: archived website snapshots for historical content analysis.
OSINT Tools
Maltego
Maltego is a visual link-analysis tool that ingests data points and uses "transforms" to discover and connect entities (domains, people, IPs, emails, infrastructure) to draw a bigger picture.
- Visual link analysis for aggregating relationships between domains, people, and infrastructure.
- Visual graphs often reveal correlations not obvious from raw data. Results can be saved or exported (XML/CSV) for inventorying.
GitHub and Online Forums
GitHub, forums, and social platforms (X, Reddit, StackOverflow, Glassdoor) frequently leak sensitive information like secrets, API tokens, credentials, internal docs, or developer discussions.
- Monitor and search these platforms to detect exposed secrets and project information.
- Example incident: a token leaked in a public GitHub repository exposed Mercedes-Benz source code
Best practices include using secret scanning tools, setting up alerts for sensitive keywords, and regularly auditing repositories for exposed data.
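Secret scanning at its core is pattern matching; the sketch below checks text against two well-known token formats (AWS access key IDs start with AKIA; the sample key is AWS's own documentation example, not a live credential). Real scanners such as those GitHub runs use many more patterns plus entropy checks.

```python
import re

# Well-known token shapes; real scanners maintain far larger pattern sets.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}

def scan_for_secrets(text: str) -> list:
    """Return (pattern_name, match) pairs for every candidate secret found."""
    hits = []
    for name, pattern in PATTERNS.items():
        hits += [(name, m.group()) for m in pattern.finditer(text)]
    return hits

# AKIAIOSFODNN7EXAMPLE is the placeholder key from AWS documentation.
leaked = 'aws_key = "AKIAIOSFODNN7EXAMPLE"'
print(scan_for_secrets(leaked))
```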
SpiderFoot
SpiderFoot is an automated OSINT collection tool that scans domains, IPs, emails, and addresses and reports discovered data. It supports various scan modes (Footprint, Investigate, Passive).
- Integrates with numerous data sources (WHOIS, DNS, social media, breach databases) to gather extensive information.
- Built-in modules cover a wide range of data points, from domain info to social media profiles.
Dmitry (Deepmagic Information Gathering Tool)
Dmitry is a lightweight command-line reconnaissance tool (included in Kali) for domain and IP lookups, subdomain enumeration, email discovery, and optional aggressive scans such as TCP port scanning.
- Useful for quickly enumerating possible subdomains that may be legacy or forgotten and could contain outdated technologies.
Shodan
Shodan is an internet-wide search engine that crawls the internet, indexes devices, services, and their exposed ports and banners.
It can reveal devices and services whose vulnerabilities have already been scanned and categorized.
Attackers use Shodan to find vulnerable devices; defenders can use it to discover exposed assets.
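Shodan searches are composed from filters such as port:, country:, and org: (these are real Shodan filters; the helper and its defaults are illustrative):

```python
def shodan_query(product=None, port=None, country=None, org=None) -> str:
    """Compose a Shodan search query string from common filters."""
    parts = []
    if product:
        parts.append(product)
    if port:
        parts.append(f"port:{port}")
    if country:
        parts.append(f'country:"{country}"')
    if org:
        parts.append(f'org:"{org}"')
    return " ".join(parts)

print(shodan_query(product="apache", port=8080, country="US"))
# apache port:8080 country:"US"
```

The resulting string can be pasted into the Shodan web UI or passed to its API.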
Archived Information (Wayback Machine)
The Wayback Machine (Archive.org) stores historical snapshots of websites and can reveal content, pages, or documents that have since been removed or changed.
- Archived pages can provide older documents, comments, or technology references useful for reconnaissance without requiring paid services.
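Snapshot lookups can be automated through the Wayback Machine's availability API, which returns JSON describing the closest archived capture; the helper below only constructs the request URL (timestamp format YYYYMMDD narrows to the nearest snapshot).

```python
from urllib.parse import urlencode

def wayback_available_url(url: str, timestamp: str = "") -> str:
    """URL for the Wayback Machine availability API (JSON response)."""
    params = {"url": url}
    if timestamp:  # YYYYMMDD; omitting it returns the most recent snapshot
        params["timestamp"] = timestamp
    return "https://archive.org/wayback/available?" + urlencode(params)

print(wayback_available_url("example.com", "20150101"))
# https://archive.org/wayback/available?url=example.com&timestamp=20150101
```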
Summary
WHOIS data provides foundational domain registration info; follow up with geolocation and context tools.
Third-party sources (SEC filings, corporate reports) can yield executive names, technologies, and structure.
Social networks and organizational websites are rich OSINT sources for employee info and technologies.
Document and image metadata can leak sensitive info; use ExifTool for comprehensive extraction.
Combine multiple OSINT tools (Maltego, DomainTools, SpiderFoot, etc.) for deeper analysis and correlation of findings.
