Introduction
Today, I’m excited to announce the launch of infoanalyzer, an open-source project designed to revolutionize how security professionals gather and analyze information from web applications. Whether you’re a penetration tester, security researcher, or system administrator, infoanalyzer provides a suite of powerful tools to streamline reconnaissance and uncover valuable insights that might otherwise remain hidden.
The initial release of infoanalyzer includes two complementary tools:
- PHPInfo Grabber: Extracts and analyzes system information from phpinfo() pages
- URL Categorizer: Intelligently organizes and classifies discovered URLs
Together, these tools form the foundation of a comprehensive web reconnaissance methodology that transforms raw data into actionable intelligence.
PHPInfo Grabber: Unveiling Server Insights
The Power of phpinfo()
For those unfamiliar, phpinfo() is a PHP function that displays detailed information about the PHP environment, server configuration, and runtime settings; a phpinfo page is often nothing more than a file containing <?php phpinfo(); ?>. While this information is invaluable for debugging and configuration purposes, it can also expose sensitive details that may be leveraged during security assessments.
PHPInfo Grabber transforms the dense, table-heavy output of phpinfo() into structured, easily analyzable data formats, enabling you to quickly identify security misconfigurations, sensitive paths, and potential vulnerabilities.
Key Features
1. Robust Data Extraction
PHPInfo Grabber employs multiple parsing strategies to handle various phpinfo() page structures:
- Table-based parsing for standard phpinfo() layouts
- Alternative parsing for non-standard structures
- Regex-based extraction as a fallback mechanism
This ensures the tool works effectively across different PHP versions and configurations.
2. Intelligent Information Categorization
The tool automatically categorizes extracted information into meaningful sections:
- System: OS details, architecture, hostname
- PHP: Version, configuration, extensions, limits
- Server: Web server software, document root, request data
- Paths: File system locations, configuration paths
- Environment: Environment variables
- Database: Database connections and configurations
- Interesting Files: Automatically detected sensitive files and paths
3. Comprehensive Export Options
All findings can be exported in multiple formats:
- JSON: Full data export for programmatic analysis
- CSV: Categorized data for spreadsheet analysis
- TXT: Human-readable summary reports
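For example, the JSON export can be pulled straight into another script for further processing. A minimal sketch, assuming the tool's default phpinfo_output directory (the timestamped filename will vary per run):

import glob
import json

# Load the most recent full-data export; the tool timestamps each filename
latest = sorted(glob.glob("phpinfo_output/phpinfo_full_*.json"))[-1]
with open(latest, encoding="utf-8") as f:
    phpinfo_data = json.load(f)

print(list(phpinfo_data.keys()))  # top-level phpinfo section names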
4. Actionable Intelligence
Beyond simply extracting data, PHPInfo Grabber provides:
- Highlighted sensitive information
- Suggested next steps for further exploration
- Generated commands based on discovered information
- Potential security issues based on configuration values
Usage Example
Using PHPInfo Grabber is straightforward:

python phpinfo_grabber.py https://example.com/phpinfo.php

Here is the full source code:
#!/usr/bin/env python3
import requests
import re
import argparse
import os
import csv
import json
import sys
from bs4 import BeautifulSoup
from urllib.parse import urlparse
from datetime import datetime

# ANSI colors for better readability
class Colors:
    GREEN = '\033[92m'
    YELLOW = '\033[93m'
    RED = '\033[91m'
    BLUE = '\033[94m'
    MAGENTA = '\033[95m'
    CYAN = '\033[96m'
    ENDC = '\033[0m'
    BOLD = '\033[1m'

def print_banner():
    banner = f"""{Colors.BLUE}{Colors.BOLD}
╔═══════════════════════════════════════════════════╗
║               PHPINFO GRABBER TOOL                ║
║      Extract and analyze system information       ║
╚═══════════════════════════════════════════════════╝
{Colors.ENDC}"""
    print(banner)
def get_phpinfo(url, timeout=10, verify_ssl=False, user_agent=None, proxy=None):
    """Fetch phpinfo page content"""
    # Setup headers
    headers = {
        'User-Agent': user_agent or 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        'Accept-Language': 'en-US,en;q=0.5',
        'Connection': 'keep-alive',
    }

    # Setup proxy if provided
    proxies = None
    if proxy:
        proxies = {
            'http': proxy,
            'https': proxy
        }

    try:
        response = requests.get(
            url,
            headers=headers,
            timeout=timeout,
            verify=verify_ssl,
            proxies=proxies
        )
        if response.status_code == 200:
            print(f"{Colors.GREEN}[+] Successfully fetched phpinfo page from {url}{Colors.ENDC}")
            return response.text
        else:
            print(f"{Colors.RED}[!] Failed to fetch phpinfo page. Status code: {response.status_code}{Colors.ENDC}")
            return None
    except requests.exceptions.Timeout:
        print(f"{Colors.RED}[!] Request timed out for {url}{Colors.ENDC}")
        return None
    except requests.exceptions.SSLError:
        print(f"{Colors.YELLOW}[!] SSL verification failed. Try with --no-verify-ssl option.{Colors.ENDC}")
        return None
    except requests.exceptions.ConnectionError:
        print(f"{Colors.RED}[!] Connection error for {url}. Check if the URL is correct.{Colors.ENDC}")
        return None
    except Exception as e:
        print(f"{Colors.RED}[!] Error fetching {url}: {str(e)}{Colors.ENDC}")
        return None
def parse_phpinfo(html_content):
    """Parse phpinfo HTML and extract key-value pairs"""
    soup = BeautifulSoup(html_content, 'html.parser')

    # Different phpinfo styles and structures exist;
    # try to detect which one we are dealing with
    phpinfo_data = {}

    # Try to find all tables
    tables = soup.find_all('table')
    if tables:
        print(f"{Colors.GREEN}[+] Found {len(tables)} tables in phpinfo page{Colors.ENDC}")
        for table_index, table in enumerate(tables):
            # Check if this table has section headers (tr with th that spans multiple columns)
            section_header = None
            section_headers = table.find_all('tr', class_='h')
            if section_headers:
                for header in section_headers:
                    th = header.find('th')
                    if th:
                        section_header = th.text.strip()
                        break

            # If no section header found in the table classes, try to find it another way
            if not section_header:
                # Look for a preceding h2 element
                prev_h2 = table.find_previous('h2')
                if prev_h2:
                    section_header = prev_h2.text.strip()

            # Default section name if none found
            if not section_header:
                section_header = f"Section_{table_index}"

            # Add the section to our data structure
            if section_header not in phpinfo_data:
                phpinfo_data[section_header] = {}

            # Process rows
            rows = table.find_all('tr')
            for row in rows:
                # Skip header rows
                if row.get('class') and 'h' in row.get('class'):
                    continue
                # Extract key and value
                cells = row.find_all(['td', 'th'])
                if len(cells) >= 2:  # Key-value pair
                    key = cells[0].text.strip()
                    value = cells[1].text.strip()
                    # Skip empty keys
                    if key:
                        phpinfo_data[section_header][key] = value
    else:
        # Alternative approach for non-table structures
        print(f"{Colors.YELLOW}[!] No tables found. Trying alternative parsing method...{Colors.ENDC}")

        # Try to find divs with class 'center'
        center_divs = soup.find_all('div', class_='center')
        if center_divs:
            for div in center_divs:
                # Walk the children; h2 elements mark section headers
                current_section = "General"
                for element in div.children:
                    if element.name == 'h2':
                        current_section = element.text.strip()
                        if current_section not in phpinfo_data:
                            phpinfo_data[current_section] = {}
                    elif element.name == 'table':
                        # Process this table under the current section
                        # (ensure the section exists even if no h2 preceded this table)
                        phpinfo_data.setdefault(current_section, {})
                        rows = element.find_all('tr')
                        for row in rows:
                            cells = row.find_all(['td', 'th'])
                            if len(cells) >= 2:
                                key = cells[0].text.strip()
                                value = cells[1].text.strip()
                                if key:
                                    phpinfo_data[current_section][key] = value
        else:
            # Last resort: try to extract using regex
            print(f"{Colors.YELLOW}[!] No standard phpinfo structure found. Using regex patterns...{Colors.ENDC}")

            # Find variable-value pairs using regex
            pattern = r'<tr><td class="e">(.*?)</td><td class="v">(.*?)</td></tr>'
            matches = re.findall(pattern, html_content, re.DOTALL)
            if matches:
                phpinfo_data["General"] = {}
                for key, value in matches:
                    # Clean up HTML entities and tags
                    key = re.sub(r'<.*?>', '', key).strip()
                    value = re.sub(r'<.*?>', '', value).strip()
                    if key:
                        phpinfo_data["General"][key] = value
            else:
                print(f"{Colors.RED}[!] Failed to extract data using all methods.{Colors.ENDC}")

    # If we still have no data, try a very basic approach
    if not phpinfo_data:
        print(f"{Colors.YELLOW}[!] Trying basic key-value extraction...{Colors.ENDC}")
        # Very basic pattern to try to extract key-value pairs
        basic_pattern = r'<tr[^>]*>\s*<td[^>]*>(.*?)</td>\s*<td[^>]*>(.*?)</td>'
        basic_matches = re.findall(basic_pattern, html_content, re.DOTALL)
        if basic_matches:
            phpinfo_data["Basic_Extraction"] = {}
            for key, value in basic_matches:
                # Clean up HTML entities and tags
                key = re.sub(r'<.*?>', '', key).strip()
                value = re.sub(r'<.*?>', '', value).strip()
                if key:
                    phpinfo_data["Basic_Extraction"][key] = value

    return phpinfo_data
def extract_interesting_data(phpinfo_data):
    """Extract interesting information from phpinfo data"""
    interesting_data = {
        "System": {},
        "PHP": {},
        "Server": {},
        "Paths": {},
        "Environment": {},
        "Database": {},
        "Interesting Files": [],
    }

    # Regular expressions for interesting file paths
    file_patterns = [
        r'(?i)(?:^|[\/\\])(?:etc|usr|var|opt|tmp|home|root|www|web|public_html|app|config|database)(?:[\/\\][^\/\\]+)+',
        r'(?i)(?:\.php|\.ini|\.conf|\.xml|\.json|\.yml|\.yaml|\.log|\.txt|\.sql|\.db|\.sqlite|\.htaccess)$',
        r'(?i)(?:password|passwd|key|secret|token|credential|auth|api_key)(?:\.txt|\.ini|\.conf|\.json|\.xml|\.yml|\.yaml)$'
    ]

    # Interesting keys to look for
    interesting_keys = {
        "System": [
            "System", "PHP Version", "Server API", "Server Name", "Server Addr", "Server Port",
            "User/Group", "Server Software", "Server OS", "PHP OS", "OS", "Architecture",
            "Hostname", "DOCUMENT_ROOT", "SERVER_NAME", "SERVER_ADDR", "SERVER_PORT", "REMOTE_ADDR"
        ],
        "PHP": [
            "Configure Command", "Loaded Configuration File", "Additional .ini files parsed",
            "extension_dir", "disable_functions", "allow_url_fopen", "allow_url_include",
            "upload_max_filesize", "post_max_size", "memory_limit", "max_execution_time",
            "include_path", "open_basedir", "display_errors", "error_reporting", "log_errors",
            "error_log", "opcache", "xdebug"
        ],
        "Server": [
            "DOCUMENT_ROOT", "SERVER_SOFTWARE", "SERVER_NAME", "SERVER_ADDR", "SERVER_PORT",
            "REMOTE_ADDR", "REMOTE_PORT", "HTTP_HOST", "HTTP_USER_AGENT", "HTTP_ACCEPT",
            "HTTP_ACCEPT_LANGUAGE", "HTTP_ACCEPT_ENCODING", "HTTP_CONNECTION", "HTTP_REFERER",
            "REQUEST_TIME", "REQUEST_TIME_FLOAT", "QUERY_STRING", "REQUEST_URI", "SCRIPT_NAME",
            "SCRIPT_FILENAME", "PATH_INFO", "PATH_TRANSLATED", "PHP_SELF", "HTTPS"
        ],
        "Paths": [
            "PATH", "DOCUMENT_ROOT", "SCRIPT_FILENAME", "Loaded Configuration File",
            "Additional .ini files parsed", "extension_dir", "include_path", "open_basedir",
            "error_log", "upload_tmp_dir", "session.save_path", "sys_temp_dir", "doc_root"
        ],
        "Environment": [
            "PATH", "HOME", "USER", "HOSTNAME", "PWD", "SHELL", "LANG", "REMOTE_ADDR",
            "HTTP_USER_AGENT", "SERVER_SOFTWARE", "SERVER_NAME", "SERVER_ADDR"
        ],
        "Database": [
            "PDO", "mysqli", "mysql", "pgsql", "sqlite", "oci", "dbx", "odbc", "mssql",
            "db2", "mongodb", "redis", "memcached", "memcache"
        ]
    }

    # Extract interesting data
    for section_name, section_data in phpinfo_data.items():
        for key, value in section_data.items():
            # Look for file paths in values
            for pattern in file_patterns:
                file_matches = re.findall(pattern, value)
                for file_match in file_matches:
                    if file_match not in interesting_data["Interesting Files"]:
                        interesting_data["Interesting Files"].append(file_match)

            # Categorize based on interesting keys
            for category, keys in interesting_keys.items():
                for interesting_key in keys:
                    if interesting_key.lower() in key.lower() or key.lower() in interesting_key.lower():
                        interesting_data[category][key] = value
                        break

    # If certain categories are empty, delete them
    for category in list(interesting_data.keys()):
        if not interesting_data[category]:
            del interesting_data[category]

    return interesting_data
def export_data(phpinfo_data, interesting_data, output_dir):
    """Export the extracted data to various formats"""
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)
        print(f"{Colors.GREEN}[+] Created output directory: {output_dir}{Colors.ENDC}")

    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")

    # Export full phpinfo data to JSON
    json_file = os.path.join(output_dir, f"phpinfo_full_{timestamp}.json")
    with open(json_file, 'w', encoding='utf-8') as f:
        json.dump(phpinfo_data, f, indent=2)
    print(f"{Colors.GREEN}[+] Full phpinfo data exported to: {json_file}{Colors.ENDC}")

    # Export interesting data to JSON
    interesting_json = os.path.join(output_dir, f"phpinfo_interesting_{timestamp}.json")
    with open(interesting_json, 'w', encoding='utf-8') as f:
        json.dump(interesting_data, f, indent=2)
    print(f"{Colors.GREEN}[+] Interesting data exported to: {interesting_json}{Colors.ENDC}")

    # Export interesting data to CSV
    for category, data in interesting_data.items():
        if isinstance(data, dict) and data:
            csv_file = os.path.join(output_dir, f"phpinfo_{category.lower()}_{timestamp}.csv")
            with open(csv_file, 'w', newline='', encoding='utf-8') as f:
                writer = csv.writer(f)
                writer.writerow(["Key", "Value"])
                for key, value in data.items():
                    writer.writerow([key, value])
            print(f"{Colors.GREEN}[+] {category} data exported to: {csv_file}{Colors.ENDC}")
        elif isinstance(data, list) and data:
            csv_file = os.path.join(output_dir, f"phpinfo_{category.lower()}_{timestamp}.csv")
            with open(csv_file, 'w', newline='', encoding='utf-8') as f:
                writer = csv.writer(f)
                writer.writerow(["Value"])
                for item in data:
                    writer.writerow([item])
            print(f"{Colors.GREEN}[+] {category} data exported to: {csv_file}{Colors.ENDC}")

    # Create a summary text file
    summary_file = os.path.join(output_dir, f"phpinfo_summary_{timestamp}.txt")
    with open(summary_file, 'w', encoding='utf-8') as f:
        f.write("PHPINFO SUMMARY\n")
        f.write("===============\n\n")
        for category, data in interesting_data.items():
            f.write(f"{category}:\n")
            f.write(f"{'-' * (len(category) + 1)}\n")
            if isinstance(data, dict):
                for key, value in data.items():
                    f.write(f"{key}: {value}\n")
            elif isinstance(data, list):
                for item in data:
                    f.write(f"- {item}\n")
            f.write("\n")
    print(f"{Colors.GREEN}[+] Summary report exported to: {summary_file}{Colors.ENDC}")

    return json_file, interesting_json, summary_file
def generate_commands(interesting_data):
    """Generate commands for further exploration based on interesting data"""
    commands = []

    # Paths to check
    if "Paths" in interesting_data:
        for path in interesting_data["Paths"].values():
            if path and os.path.sep in path:
                commands.append(f"ls -la {path}")
                commands.append(f"find {path} -type f -name \"*.php\" | head -20")
                commands.append(f"find {path} -type f -name \"*.conf\" -o -name \"*.ini\" | head -20")

    # Interesting files to check
    if "Interesting Files" in interesting_data:
        for file_path in interesting_data["Interesting Files"]:
            commands.append(f"cat {file_path}")
            parent_dir = os.path.dirname(file_path)
            if parent_dir:
                commands.append(f"ls -la {parent_dir}")

    # Database checks
    if "Database" in interesting_data:
        commands.append("grep -r \"DB_\" /var/www/ --include=\"*.php\" --include=\"*.ini\" --include=\"*.conf\"")
        commands.append("find /var/www/ -name \"*.sql\" -o -name \"*.db\" -o -name \"*.sqlite\"")

    # Web server checks
    if "Server" in interesting_data:
        if "DOCUMENT_ROOT" in interesting_data["Server"]:
            doc_root = interesting_data["Server"]["DOCUMENT_ROOT"]
            commands.append(f"ls -la {doc_root}")
            commands.append(f"find {doc_root} -type f -name \"*.php\" | grep -i admin")
            commands.append(f"find {doc_root} -type f -name \"*.php\" | grep -i login")
            commands.append(f"find {doc_root} -type f -name \"*.php\" | grep -i config")

    return commands
def display_interesting_data(interesting_data):
    """Display interesting data in a readable format"""
    print(f"\n{Colors.GREEN}{Colors.BOLD}[+] INTERESTING INFORMATION FROM PHPINFO:{Colors.ENDC}\n")
    for category, data in interesting_data.items():
        print(f"{Colors.CYAN}{Colors.BOLD}{category}:{Colors.ENDC}")
        if isinstance(data, dict):
            for key, value in data.items():
                # Highlight potentially sensitive information
                if any(s in key.lower() for s in ['password', 'secret', 'key', 'token', 'api', 'credential']):
                    print(f" {Colors.RED}{key}:{Colors.ENDC} {value}")
                else:
                    print(f" {Colors.YELLOW}{key}:{Colors.ENDC} {value}")
        elif isinstance(data, list):
            for item in data:
                print(f" - {item}")
        print("")
def suggest_next_steps(interesting_data, url):
    """Suggest next steps for further exploration"""
    print(f"\n{Colors.GREEN}{Colors.BOLD}[+] SUGGESTED NEXT STEPS:{Colors.ENDC}\n")

    # Generate commands
    commands = generate_commands(interesting_data)

    # URL parsing for suggestions
    parsed_url = urlparse(url)
    base_url = f"{parsed_url.scheme}://{parsed_url.netloc}"

    print(f"{Colors.YELLOW}1. Further URL Exploration:{Colors.ENDC}")
    print(f" - Check for common PHP applications at: {base_url}/")
    print(f" - Look for admin interfaces: {base_url}/admin/, {base_url}/administrator/, {base_url}/wp-admin/")
    print(f" - Check for other info disclosure: {base_url}/info.php, {base_url}/server-status, {base_url}/server-info")

    if "Paths" in interesting_data:
        print(f"\n{Colors.YELLOW}2. File System Exploration:{Colors.ENDC}")
        print(" Based on the paths found, you might want to check:")
        for path in list(interesting_data["Paths"].values())[:5]:
            print(f" - {path}")

    print(f"\n{Colors.YELLOW}3. Useful Commands:{Colors.ENDC}")
    for i, cmd in enumerate(commands[:10]):
        print(f" {i+1}. {cmd}")
    if len(commands) > 10:
        print(f" ... and {len(commands) - 10} more commands")

    if "Database" in interesting_data:
        print(f"\n{Colors.YELLOW}4. Database Investigation:{Colors.ENDC}")
        print(" Look for database connection strings in configuration files")

    print(f"\n{Colors.YELLOW}5. Further Scanning:{Colors.ENDC}")
    print(f" - Run directory brute force: gobuster dir -u {base_url} -w /usr/share/SecLists/Discovery/Web-Content/raft-medium-directories.txt")
    print(f" - Scan for vulnerabilities: nikto -h {base_url}")
    print(f" - Check for PHP vulnerabilities: wpscan --url {base_url} (if WordPress)")
def main():
    print_banner()

    parser = argparse.ArgumentParser(description="PHPInfo Grabber - Extract and analyze system information from phpinfo pages")
    parser.add_argument("url", help="URL to the phpinfo page (e.g., https://example.com/phpinfo.php)")
    parser.add_argument("-o", "--output", default="phpinfo_output", help="Output directory for extracted data")
    parser.add_argument("-t", "--timeout", type=int, default=10, help="Request timeout in seconds")
    parser.add_argument("--no-verify-ssl", action="store_false", dest="verify_ssl", help="Disable SSL certificate verification")
    parser.add_argument("-u", "--user-agent", help="Custom User-Agent string")
    parser.add_argument("-p", "--proxy", help="Proxy URL (e.g., http://127.0.0.1:8080)")
    args = parser.parse_args()

    # Fetch phpinfo page
    html_content = get_phpinfo(
        args.url,
        timeout=args.timeout,
        verify_ssl=args.verify_ssl,
        user_agent=args.user_agent,
        proxy=args.proxy
    )
    if not html_content:
        print(f"{Colors.RED}[!] Failed to retrieve phpinfo page. Exiting.{Colors.ENDC}")
        return

    # Parse phpinfo
    phpinfo_data = parse_phpinfo(html_content)
    if not phpinfo_data:
        print(f"{Colors.RED}[!] Failed to parse phpinfo data. Exiting.{Colors.ENDC}")
        return
    print(f"{Colors.GREEN}[+] Successfully parsed phpinfo data with {len(phpinfo_data)} sections{Colors.ENDC}")

    # Extract interesting information
    interesting_data = extract_interesting_data(phpinfo_data)
    print(f"{Colors.GREEN}[+] Extracted interesting information from phpinfo data{Colors.ENDC}")

    # Display interesting data
    display_interesting_data(interesting_data)

    # Export data
    export_data(phpinfo_data, interesting_data, args.output)

    # Suggest next steps
    suggest_next_steps(interesting_data, args.url)

    print(f"\n{Colors.GREEN}[+] Analysis complete! See {args.output} directory for full results.{Colors.ENDC}")
if __name__ == "__main__":
    try:
        main()
    except KeyboardInterrupt:
        print(f"\n{Colors.YELLOW}[!] Process interrupted by user{Colors.ENDC}")
        sys.exit(0)
    except Exception as e:
        print(f"{Colors.RED}[!] An error occurred: {str(e)}{Colors.ENDC}")
        sys.exit(1)
The tool fetches the phpinfo page, parses its content, extracts key information, and presents the findings in a readable format with color-coded output. It also generates various export files and suggests next steps for further investigation.
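For hosts with self-signed certificates, or when routing traffic through an intercepting proxy such as Burp, the relevant flags can be combined (the output directory name here is illustrative):

python phpinfo_grabber.py https://dev.example.com/info.php --no-verify-ssl -p http://127.0.0.1:8080 -o dev_phpinfo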
URL Categorizer: Making Sense of Web Reconnaissance
During security assessments or when exploring websites, you often end up with large lists of URLs from crawlers, directory brute-forcing tools, or web application scanners. Manually sorting through these can be time-consuming and error-prone.
Our URL Categorizer tool solves this problem by automatically organizing URLs into meaningful categories, identifying potential security issues, and suggesting targeted next steps.
Key Features
1. Intelligent Classification
The URL Categorizer can automatically classify URLs into numerous categories, including:
- File types (PHP, JavaScript, CSS, images, documents)
- CMS-specific resources (WordPress, Joomla, Drupal)
- Administrative interfaces
- API endpoints
- Sensitive files (configuration files, backups, databases)
- And many more
2. Security Analysis
Beyond simple categorization, the tool performs security analysis to identify:
- Exposed sensitive files (phpinfo, configuration files)
- Backup files that might contain source code
- Database files accessible via the web
- Administrative interfaces that should be secured
- Version information that could reveal vulnerabilities
3. Actionable Recommendations
Based on the categorized URLs and security analysis, the tool suggests next steps, such as:
- Running specific security tools based on detected technologies
- Examining potentially sensitive files
- Testing discovered administrative interfaces
- Parameter fuzzing on PHP files
4. Customizable Classification
The tool allows you to define custom patterns for categorization, making it adaptable to your specific needs and target environments.
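For example, to tag JSP files as their own category (the category name and regex here are purely illustrative):

python url_categorizer.py discovered_urls.txt -p jsp_files '\.jsp(\?.*)?$'

Custom patterns are merged with the built-in ones, so the rest of the classification still applies.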
Usage Example
Here is the full source code:
#!/usr/bin/env python3
import re
import os
import sys
import argparse
from collections import defaultdict

# ANSI colors for better readability
class Colors:
    GREEN = '\033[92m'
    YELLOW = '\033[93m'
    RED = '\033[91m'
    BLUE = '\033[94m'
    MAGENTA = '\033[95m'
    CYAN = '\033[96m'
    ENDC = '\033[0m'
    BOLD = '\033[1m'

def print_banner():
    banner = f"""{Colors.BLUE}{Colors.BOLD}
╔═══════════════════════════════════════════════════╗
║               URL CATEGORIZER TOOL                ║
║       Organize and classify discovered URLs       ║
╚═══════════════════════════════════════════════════╝
{Colors.ENDC}"""
    print(banner)
def categorize_urls(urls_file, custom_patterns=None):
    """
    Categorize URLs from a file based on extensions and path patterns.

    Args:
        urls_file (str): Path to the file containing URLs (one per line)
        custom_patterns (dict, optional): Custom regex patterns for additional categories

    Returns:
        dict: Categories with their respective URLs
    """
    categories = defaultdict(list)

    # Check if file exists
    if not os.path.isfile(urls_file):
        print(f"{Colors.RED}[!] Error: File '{urls_file}' not found{Colors.ENDC}")
        return categories

    # Read URLs from file
    try:
        with open(urls_file, 'r') as f:
            urls = [line.strip() for line in f if line.strip()]
        print(f"{Colors.GREEN}[+] Successfully loaded {len(urls)} URLs from {urls_file}{Colors.ENDC}")
    except Exception as e:
        print(f"{Colors.RED}[!] Error reading file: {str(e)}{Colors.ENDC}")
        return categories

    # Define default patterns
    patterns = {
        'php_files': r'\.php(\?.*)?$',
        'javascript_files': r'\.js(\?.*)?$',
        'css_files': r'\.css(\?.*)?$',
        'images': r'\.(png|gif|jpg|jpeg|svg|ico|webp)(\?.*)?$',
        'documents': r'\.(pdf|doc|docx|xls|xlsx|ppt|pptx|txt|rtf|csv|xml|json)(\?.*)?$',
        'archives': r'\.(zip|rar|tar|gz|7z)(\?.*)?$',
        'audio_video': r'\.(mp3|mp4|avi|mov|wmv|flv|wav|ogg)(\?.*)?$',
        'api_endpoints': r'/(api|rest|graphql|wp-json)/',
        'admin_resources': r'/(admin|administrator|wp-admin|dashboard|control|cp)/',
        'login_pages': r'/(login|signin|log-in|sign-in|auth|authenticate)\.php',
        'backup_files': r'\.(bak|backup|old|temp|tmp)$',
        'config_files': r'/(config|configuration|settings|setup|install)\.php',
        'databases': r'\.(sql|sqlite|db)$',
        'sensitive_files': r'/(phpinfo|info)\.php',
        'hidden_files': r'/\.[^/]+$',  # Files starting with dot
    }

    # WordPress specific patterns
    wp_patterns = {
        'wp_includes': r'/wp-includes/',
        'wp_content': r'/wp-content/',
        'wp_plugins': r'/wp-content/plugins/',
        'wp_themes': r'/wp-content/themes/',
        'wp_uploads': r'/wp-content/uploads/',
    }
    patterns.update(wp_patterns)

    # CMS detection patterns
    cms_patterns = {
        'wordpress': r'/(wp-|wordpress)',
        'joomla': r'/(joomla|administrator/index\.php)',
        'drupal': r'/(drupal|sites/default|misc/drupal\.js)',
        'magento': r'/(magento|skin/frontend|app/design/frontend)',
        'shopify': r'/(shopify|cdn\.shopify\.com)',
    }
    patterns.update(cms_patterns)

    # Merge with custom patterns last so they take precedence on name clashes
    if custom_patterns:
        patterns.update(custom_patterns)

    # Process each URL
    for url in urls:
        categorized = False
        # Check against each pattern (a URL may match several categories)
        for category, pattern in patterns.items():
            if re.search(pattern, url, re.IGNORECASE):
                categories[category].append(url)
                categorized = True
        # If not categorized by any pattern, add to 'other'
        if not categorized:
            categories['other'].append(url)

    return categories
def print_and_save_categories(categories, output_file='categorized_urls.txt'):
    """
    Print categorized URLs to console and save them to a file.

    Args:
        categories (dict): Categories with their respective URLs
        output_file (str): Path to output file
    """
    try:
        with open(output_file, 'w') as out:
            out.write("# CATEGORIZED URLS REPORT\n")
            out.write("=" * 50 + "\n\n")

            # Sort categories by number of URLs (descending)
            sorted_categories = sorted(categories.items(),
                                       key=lambda x: len(x[1]),
                                       reverse=True)

            # Print summary
            print(f"\n{Colors.GREEN}{Colors.BOLD}[+] URL CATEGORIZATION SUMMARY:{Colors.ENDC}")
            for category, urls in sorted_categories:
                if urls:  # Only print non-empty categories
                    print(f"{Colors.CYAN} {category.replace('_', ' ').title()}: {Colors.YELLOW}{len(urls)}{Colors.ENDC}")

            # Print detailed results and write to file
            print(f"\n{Colors.GREEN}{Colors.BOLD}[+] DETAILED RESULTS:{Colors.ENDC}")
            for category, urls in sorted_categories:
                if urls:  # Only process non-empty categories
                    header = f"\n## {category.replace('_', ' ').title()} ({len(urls)})"
                    print(f"{Colors.MAGENTA}{header}{Colors.ENDC}")
                    out.write(header + '\n')
                    for url in urls:
                        print(f"- {url}")
                        out.write(f"- {url}\n")

        print(f"\n{Colors.GREEN}[+] Categorized URLs saved to '{output_file}'{Colors.ENDC}")
        return True
    except Exception as e:
        print(f"{Colors.RED}[!] Error saving categories: {str(e)}{Colors.ENDC}")
        return False
def analyze_interesting_findings(categories):
    """
    Analyze categorized URLs for interesting security findings.

    Args:
        categories (dict): Categories with their respective URLs

    Returns:
        list: Interesting findings with descriptions
    """
    findings = []

    # Check for sensitive files
    if categories.get('sensitive_files'):
        findings.append({
            'title': 'Sensitive Information Disclosure',
            'description': 'Found phpinfo or info files that may disclose sensitive server information',
            'urls': categories['sensitive_files'],
            'severity': 'High'
        })

    # Check for backup files
    if categories.get('backup_files'):
        findings.append({
            'title': 'Backup Files Exposed',
            'description': 'Found backup files that might contain sensitive information or source code',
            'urls': categories['backup_files'],
            'severity': 'Medium'
        })

    # Check for config files
    if categories.get('config_files'):
        findings.append({
            'title': 'Configuration Files Exposed',
            'description': 'Found configuration files that might contain database credentials or other sensitive information',
            'urls': categories['config_files'],
            'severity': 'High'
        })

    # Check for database files
    if categories.get('databases'):
        findings.append({
            'title': 'Database Files Exposed',
            'description': 'Found database files that might be downloadable and contain sensitive data',
            'urls': categories['databases'],
            'severity': 'Critical'
        })

    # Check for admin interfaces
    if categories.get('admin_resources'):
        findings.append({
            'title': 'Admin Interfaces Discovered',
            'description': 'Found admin interfaces that should be properly secured',
            'urls': categories['admin_resources'][:5] + (['...'] if len(categories['admin_resources']) > 5 else []),
            'severity': 'Medium'
        })

    # WordPress version detection
    wp_includes = categories.get('wp_includes', [])
    if wp_includes:
        # Look for version in readme.html or other version indicators
        version_files = [url for url in wp_includes if 'version' in url.lower()]
        if version_files:
            findings.append({
                'title': 'WordPress Version Information',
                'description': 'Found files that may reveal WordPress version information',
                'urls': version_files,
                'severity': 'Low'
            })

    return findings
def suggest_next_steps(categories):
    """
    Suggest next steps based on categorized URLs.

    Args:
        categories (dict): Categories with their respective URLs

    Returns:
        list: Suggested next steps
    """
    suggestions = []

    # WordPress specific suggestions
    if any(key in categories for key in ['wp_includes', 'wp_content', 'wp_plugins']):
        suggestions.append("Run WPScan to identify WordPress vulnerabilities: wpscan --url [target]")
        suggestions.append("Check exposed WordPress plugins for known vulnerabilities")

    # If sensitive files found
    if categories.get('sensitive_files'):
        suggestions.append("Examine phpinfo files for sensitive information using PHPInfo Grabber")

    # If admin resources found
    if categories.get('admin_resources'):
        suggestions.append("Check admin interfaces for weak credentials or authentication bypass vulnerabilities")

    # If backup or config files found
    if categories.get('backup_files') or categories.get('config_files'):
        suggestions.append("Download and analyze backup/config files for sensitive information")

    # General suggestions
    suggestions.append("Run directory brute-forcing with additional wordlists to discover more resources")
    suggestions.append("Perform parameter fuzzing on discovered PHP files to identify potential vulnerabilities")

    return suggestions
def main():
    print_banner()

    parser = argparse.ArgumentParser(description="URL Categorizer - Organize and classify discovered URLs")
    parser.add_argument("urls_file", help="File containing URLs (one per line)")
    parser.add_argument("-o", "--output", default="categorized_urls.txt", help="Output file for categorized URLs")
    parser.add_argument("-a", "--analysis", action="store_true", help="Perform security analysis on categorized URLs")
    parser.add_argument("-p", "--pattern", action='append', nargs=2, metavar=('CATEGORY', 'REGEX'),
                        help="Add custom pattern: -p category_name 'regex_pattern'")
    args = parser.parse_args()

    # Process custom patterns if provided
    custom_patterns = {}
    if args.pattern:
        for category, pattern in args.pattern:
            custom_patterns[category] = pattern

    # Categorize URLs
    categories = categorize_urls(args.urls_file, custom_patterns)
    if not categories:
        print(f"{Colors.RED}[!] No URLs categorized. Exiting.{Colors.ENDC}")
        return

    # Print and save categorized URLs
    print_and_save_categories(categories, args.output)

    # Perform security analysis if requested
    if args.analysis:
        print(f"\n{Colors.GREEN}{Colors.BOLD}[+] SECURITY ANALYSIS:{Colors.ENDC}\n")
        findings = analyze_interesting_findings(categories)
        if findings:
            for finding in findings:
                print(f"{Colors.RED if finding['severity'] == 'Critical' else Colors.YELLOW}" +
                      f"[{finding['severity']}] {finding['title']}{Colors.ENDC}")
                print(f" {finding['description']}")
                print(" URLs:")
                for url in finding['urls']:
                    print(f" - {url}")
                print("")
        else:
            print(f"{Colors.YELLOW}[!] No significant security findings detected.{Colors.ENDC}")

    # Suggest next steps
    print(f"\n{Colors.GREEN}{Colors.BOLD}[+] SUGGESTED NEXT STEPS:{Colors.ENDC}\n")
    suggestions = suggest_next_steps(categories)
    for i, suggestion in enumerate(suggestions, 1):
        print(f"{Colors.CYAN}{i}. {suggestion}{Colors.ENDC}")

    print(f"\n{Colors.GREEN}[+] URL categorization complete! See {args.output} for full results.{Colors.ENDC}")
if __name__ == "__main__":
    try:
        main()
    except KeyboardInterrupt:
        print(f"\n{Colors.YELLOW}[!] Process interrupted by user{Colors.ENDC}")
        sys.exit(0)
    except Exception as e:
        print(f"{Colors.RED}[!] An error occurred: {str(e)}{Colors.ENDC}")
        sys.exit(1)
Using the URL Categorizer is straightforward:
python url_categorizer.py discovered_urls.txt -a
This command categorizes all URLs from the file and performs security analysis, providing a clear overview of the website structure and potential security issues.
Integration and Workflow
What makes the infoanalyzer suite particularly powerful is how the tools work together. Here’s a typical workflow:
1. Reconnaissance Phase:
- Use web crawlers or directory brute-forcing tools to discover URLs
- Run url_categorizer.py to organize and classify the discovered URLs
- Identify sensitive pages and potential entry points
2. Information Gathering Phase:
- For any phpinfo pages found, use phpinfo_grabber.py to extract and analyze system information
- Follow the suggested next steps from both tools
3. Analysis Phase:
- Review the exported data and findings
- Correlate information between tools
- Develop targeted strategies based on discovered information
This integrated approach ensures you don’t miss critical information and provides a systematic methodology for web application reconnaissance.
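To make the hand-off between the two tools concrete, here is a minimal glue sketch, assuming both scripts sit in the same directory and are importable; the sensitive_files category is where phpinfo-style URLs land:

from phpinfo_grabber import extract_interesting_data, get_phpinfo, parse_phpinfo
from url_categorizer import categorize_urls

# Phase 1: organize the raw URL list
categories = categorize_urls("discovered_urls.txt")

# Phase 2: feed any discovered phpinfo-style pages straight into the grabber
for url in categories.get("sensitive_files", []):
    html = get_phpinfo(url, verify_ssl=False)
    if html:
        interesting = extract_interesting_data(parse_phpinfo(html))
        print(url, "->", ", ".join(interesting.keys()))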
Real-World Case Study
To illustrate the power of infoanalyzer, let me share a recent security assessment I conducted for a mid-sized e-commerce company.
Initial Reconnaissance
The assessment began with standard reconnaissance techniques, resulting in a list of over 1,000 URLs. Rather than manually reviewing each URL, I used the URL Categorizer:
python url_categorizer.py discovered_urls.txt -a
The tool quickly organized the URLs into categories, revealing:
- 427 PHP files
- 156 JavaScript files
- 87 WordPress plugin files
- 3 potentially sensitive configuration files
- 1 phpinfo page on a development subdomain
The security analysis highlighted several concerning findings, including exposed backup files and administrative interfaces.
Deeper Investigation
With the phpinfo page identified, I used PHPInfo Grabber to extract detailed system information:
python phpinfo_grabber.py https://dev.example.com/info.php
The tool revealed critical information:
- PHP configuration with allow_url_include enabled (a significant security risk)
- Database connection details exposed in environment variables
- Sensitive file paths that weren’t directly accessible via the web
- Outdated PHP extensions with known vulnerabilities
Findings and Impact
By combining the insights from both tools, I was able to:
- Identify an SQL injection vulnerability in an admin page discovered by URL Categorizer
- Access database backups using path information from PHPInfo Grabber
- Exploit the allow_url_include misconfiguration to achieve remote code execution
- Discover hardcoded API credentials in configuration files
The client was impressed with how quickly and systematically these vulnerabilities were identified. The structured output from the infoanalyzer tools made documentation straightforward, and the clear categorization helped prioritize remediation efforts.
Technical Implementation
Both tools are written in Python and share a similar design philosophy:
- Modular Structure: Each tool is divided into discrete functions for fetching, parsing, analyzing, and reporting.
- Progressive Enhancement: The tools attempt multiple strategies when processing data, from standard parsing to regex-based fallbacks.
- Rich Output: Color-coded terminal output makes it easy to identify important information.
- Multiple Export Formats: Data is exported in various formats (JSON, CSV, TXT) for further analysis.
- Actionable Intelligence: Each tool goes beyond raw data to provide insights and next steps.
PHPInfo Grabber Code Highlights
The PHPInfo Grabber uses BeautifulSoup for HTML parsing and implements multiple strategies to handle different phpinfo() layouts:
def parse_phpinfo(html_content):
    """Parse phpinfo HTML and extract key-value pairs"""
    soup = BeautifulSoup(html_content, 'html.parser')

    # Different phpinfo styles and structures exist;
    # try to detect which one we are dealing with
    phpinfo_data = {}

    # Try to find all tables
    tables = soup.find_all('table')
    if tables:
        # Standard table-based parsing
        ...
    else:
        # Alternative approaches for non-table structures
        ...
URL Categorizer Code Highlights
The URL Categorizer uses regular expressions to classify URLs based on patterns:
def categorize_urls(urls_file, custom_patterns=None):
    """Categorize URLs from a file based on extensions and path patterns"""
    categories = defaultdict(list)

    # Define default patterns
    patterns = {
        'php_files': r'\.php(\?.*)?$',
        'javascript_files': r'\.js(\?.*)?$',
        # Many more patterns...
    }

    # Process each URL (urls is read from urls_file; elided here)
    for url in urls:
        for category, pattern in patterns.items():
            if re.search(pattern, url, re.IGNORECASE):
                categories[category].append(url)
    # ...
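One deliberate design choice worth noting: the matching loop does not stop at the first hit, so a single URL can land in several categories. A path like /wp-admin/install.php would be listed under php_files, admin_resources, config_files, and wordpress at once, which is usually what you want when triaging a large list.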
The Road Ahead
infoanalyzer is just getting started. We have plans to expand the toolkit with additional tools:
- ServerInfo Grabber: For Apache/Nginx server-status and server-info pages
- WordPressInfo Grabber: For WordPress configuration and plugin analysis
- Headers Analyzer: For analyzing HTTP response headers and security configurations
- Content Fingerprinter: For identifying technologies and frameworks based on content patterns
- SSL/TLS Analyzer: For evaluating SSL/TLS configurations and identifying weaknesses
Our vision is to create a comprehensive suite of tools that work together to provide deep insights into web application environments, with each tool designed to extract specific types of information while maintaining a consistent interface and data format for easy integration.
Conclusion
infoanalyzer represents our commitment to creating powerful, user-friendly tools for information gathering and analysis. By transforming raw data into structured, actionable intelligence, we hope to empower security professionals to work more efficiently and effectively.
The combination of PHPInfo Grabber and URL Categorizer provides a solid foundation for systematic web reconnaissance, and we’re excited to see how the community uses and extends these tools. Whether you’re conducting security assessments, managing web applications, or learning about web security, infoanalyzer can help you discover and understand the digital landscape more clearly.
Download infoanalyzer today and start uncovering the wealth of information hidden in plain sight on your web servers!
Tags: Security, PHP, Web Reconnaissance, Information Gathering, Penetration Testing, Open Source