Bluto was originally written to save myself time during external pentests by needing only one tool rather than an arsenal. It also reduced the time taken to report on findings, because I no longer had to massage output from several tools into a format I could use in my reports; Bluto's output can be copied straight into a report. Its original features included DNS Enumeration, DNS Zone Transfers, Passive Subdomain Harvesting and Subdomain Brute Forcing.
DNS Enumeration would enumerate all NS (name server) records and MX (mail server) records. It would then attempt a zone transfer against each name server, identifying any configured insecurely and dumping the entire zone file. I was originally using an unofficial NetCraft API (by PaulSec) to gather subdomains passively, however I have since written my own function to achieve this.
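The zone-transfer check described above can be sketched roughly as follows. The actual AXFR attempt is injected as a callable here (in practice it would be performed with a DNS library such as dnspython); the function and variable names are illustrative, not Bluto's real internals.

```python
def find_insecure_nameservers(nameservers, axfr):
    """Return the name servers that allow a full zone transfer.

    `axfr` is a callable taking a name-server hostname and returning the
    zone records on success, or None when the transfer is refused.
    """
    insecure = {}
    for ns in nameservers:
        records = axfr(ns)
        if records:
            # The server handed over its zone file: flag it as insecure
            # and keep the dumped records for the report.
            insecure[ns] = records
    return insecure
```

Any name server that answers the AXFR is recorded together with the dumped zone, which is exactly the evidence you want in the report.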
The Subdomain Brute Forcing is carried out using parallel subprocessing across the top 20,000 entries of 'The Alexa Top 1 Million subdomains'. This gives Bluto a great source of the most likely targets in a rather fast time frame, and as the list was gathered using Google's public DNS servers, building it had no impact on the target domain.
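A minimal sketch of the parallel brute force, assuming a wordlist like the Alexa-derived top-20,000 list mentioned above. A thread pool stands in for Bluto's subprocessing here, and the resolver is injectable so the real lookup (`socket.gethostbyname`) can be swapped out; the names are illustrative.

```python
import socket
from concurrent.futures import ThreadPoolExecutor

def brute_force(domain, words, resolve=None, workers=20):
    """Try each candidate subdomain in parallel, returning those that resolve."""
    resolve = resolve or socket.gethostbyname
    found = {}

    def check(word):
        host = "%s.%s" % (word, domain)
        try:
            found[host] = resolve(host)
        except OSError:  # socket.gaierror (NXDOMAIN etc.) is a subclass
            pass

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(check, words))  # force all lookups to complete
    return found
```

Because each lookup is independent, the pool size can be tuned to trade speed against resolver load on whichever DNS servers are being queried.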
After a week or so I decided that I needed to move away from theHarvester (a great tool, I would like to add, by Christian Martorella @ Edge-Security Research) and wanted some functionality that better suited my needs. I introduced Email Harvesting, carried out against Google and Bing, which initially worked great. However, my man @securityshell raised concerns that it was very slow and kept getting caught by Google's captchas.
This proved to be a nice challenge. I fired up my VPNs and started testing from all over the world. This wasn't ideal, as I had a higher hop count than a native client in the egress country, but it was good enough to reproduce the issue and test 'fixes'. After some thought about how Google would identify agents from various countries, I came to the conclusion, as I'm sure anyone else reading this blog would, that it would be based on a GEO lookup of the client's public-facing address. A quick look at the site shows it redirects to the relevant country regardless of the Google URL you enter; for example, going to www.google.co.uk from a US agent address redirects to the .com address, and so on. On closer inspection of the traffic between my web agent and Google, it was clear that Google sets the relevant country cookie value on the redirect. After some googling around (ironic, yes) I was able to find a list of country-specific Google server host names. Great; all I needed now was the ability to look up the client's country and assign them their Google server to search on. Ideally I would be utilising my own GEO IP service, but as I don't currently have that functionality I found some free services instead. Currently Bluto uses 'https://freegeoip.net', which returns various data in a lovely JSON response. Bluto consumes the country value and uses it to select the relevant Google server.
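The country-to-server selection can be sketched like this. The freegeoip response is JSON containing a country code, which is mapped onto a country-specific Google host; the mapping table and helper names below are illustrative assumptions, not Bluto's actual lookup.

```python
import json
from urllib.request import urlopen

# A tiny illustrative slice of a country-specific Google server table.
GOOGLE_SERVERS = {
    "GB": "www.google.co.uk",
    "DE": "www.google.de",
    "FR": "www.google.fr",
}

def google_server_for(country_code, default="www.google.com"):
    """Map a two-letter country code onto its Google search host."""
    return GOOGLE_SERVERS.get(country_code.upper(), default)

def detect_google_server():
    """Ask the GEO IP service for our country, then pick the matching server."""
    # freegeoip returns JSON along the lines of {"country_code": "GB", ...}
    with urlopen("https://freegeoip.net/json/") as resp:
        geo = json.load(resp)
    return google_server_for(geo.get("country_code", ""))
```

Unknown or missing country codes fall back to the .com address, which mirrors what a US-based agent sees anyway.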
To move away from other tools completely (a goal I now wanted to achieve) I needed to include potential Staff Member Harvesting. The most fruitful place, in my opinion, to gather this information is LinkedIn. I was originally proxying Bing and Google searches through DuckDuckGo, but by the time I attempted to include Staff Harvesting they had added captchas for anything slightly resembling automated requests, which forced me to migrate to direct queries against Bing and Google.
Direct queries proved quite difficult to achieve, as Google fires up a redirect to captchas for anything suspiciously automated (in a similar fashion to DuckDuckGo). I played around with the queries and eventually found a way to avoid the captchas, to a degree: a large list of User Agents with one randomly selected on each request, mixed with delayed queries, query attempts limited to specific quantities, and making sure each request closes its connection, all of which seems to suggest reasonable usage. Bing is far more lenient when it comes to queries; we don't need to do anything particularly clever to get good results. Bing is used to identify LinkedIn results and email addresses. The LinkedIn queries were originally based on the domain name presented to Bluto; this has since evolved into a value initially gathered from a Whois request on the registrant value. If this result looks reasonably usable, Bluto asks the user for confirmation, and if the user disagrees they can supply the company name themselves (the registered company name seems to yield the best results).
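The throttling measures described above can be sketched as follows. The agent strings, delay values and function names are illustrative placeholders, not Bluto's actual list or timings.

```python
import random
import time
from urllib.request import Request, urlopen

# A few example agents; the real list would be much larger.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11)",
    "Mozilla/5.0 (X11; Linux x86_64)",
]

def polite_get(url, min_delay=2.0, max_delay=5.0):
    """Fetch a search URL while trying to look like reasonable usage."""
    time.sleep(random.uniform(min_delay, max_delay))  # stagger the queries
    req = Request(url, headers={
        "User-Agent": random.choice(USER_AGENTS),  # rotate the agent
        "Connection": "close",                     # close each connection
    })
    with urlopen(req) as resp:
        return resp.read()
```

Capping how many of these requests are issued per run is the remaining piece: a simple counter around `polite_get` limits the query attempts to a specific quantity.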
Suggestions from the community started coming in; the very first, if I recall correctly, was from @erwan_lr, a colleague at the time who is also one of the developers of WPScan. He suggested the addition of subdomain wild-card checks. This was of course a great idea, and it would eventually prove to be a stepping stone to the Wild Card Brute Forcing functionality in the latest version. It was achieved by simply generating a pseudo-random string of A-Z|a-z|0-9 and checking the response value. If a response was identified, it was safe to say that wild cards were in place and the usual approach to subdomain brute forcing would be pointless. Originally Bluto would halt any brute forcing and carry out all other OSINT as normal. The latest version now includes wild-card brute forcing, initially achieved by noting the random host's response and comparing it to each response from the brute forcing; matching responses would be discarded and non-matching responses noted as valid targets. This technique worked well for quite some time, however some target domains showed oddities in the form of multiple fake responses with individual IPs. It seems that hostname queries that resemble 'normal' words respond with different addresses compared to those that look more random, e.g. awindwuvyw26.target.domain, 1738w1jf72hf6.target.domain. To work around these types of checks, a larger sample of random host queries is used. This has proved to be a good technique so far, working for all target domains that I have had to assess. The Wild Card Brute Forcing, amongst other functionality, is as far as I am aware quite unique to Bluto, and although it takes roughly 4x longer than the standard brute-force technique, it does yield great results.
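The wild-card logic above boils down to two small steps, sketched here: generate a pseudo-random hostname that should not exist, and later filter brute-force hits against the set of addresses those random names returned. The helper names are illustrative.

```python
import random
import string

def random_host(domain, length=12):
    """Build a pseudo-random A-Z|a-z|0-9 hostname that should not exist."""
    label = "".join(random.choice(string.ascii_letters + string.digits)
                    for _ in range(length))
    return "%s.%s" % (label, domain)

def filter_wildcards(hits, wildcard_ips):
    """Drop brute-force results whose address matches a wild-card answer.

    `hits` maps hostname -> resolved IP; `wildcard_ips` is the set of
    addresses returned for the random (nonexistent) hostnames.
    """
    return {host: ip for host, ip in hits.items() if ip not in wildcard_ips}
```

Resolving several `random_host` names rather than just one is what defeats the domains that hand back different fake addresses for 'normal'-looking versus random-looking labels: the larger sample catches more of the fake answer set before filtering.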
Email Hunter support was next to be introduced; it complements the findings from the search engines and gives us a wider footprint in most cases (I really suggest getting a free API key). Alongside the newly found API support came haveibeenpwned.com support, the service created and hosted by Troy Hunt. It consumes email addresses and cross-references them with details acquired from various data breaches. This gives us the extra knowledge that the account password was most likely compromised and is potentially available online, which could be used to form a potential password list for brute-force attacks on other services identified during the assessment. Identification of the actual data from the breaches is not currently supported; however, a good deal of information relating to the breach is, such as the breach domain, the date of the breach, what data was compromised, and when the data was originally added to the service.
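A lookup against haveibeenpwned.com can be sketched as below. The endpoint and field names come from the public HIBP docs for the current (v3) API, which requires an API key header; treat the whole thing as an assumption rather than Bluto's actual code.

```python
import json
from urllib.request import Request, urlopen

def breaches_for(email, api_key):
    """Query HIBP for the breaches an email address appears in."""
    url = ("https://haveibeenpwned.com/api/v3/breachedaccount/"
           "%s?truncateResponse=false" % email)
    req = Request(url, headers={
        "hibp-api-key": api_key,        # v3 requires a key
        "User-Agent": "bluto-example",  # HIBP rejects requests without a UA
    })
    with urlopen(req) as resp:
        return summarise(json.load(resp))

def summarise(breaches):
    """Pull out the breach details the report cares about."""
    return [{"domain": b.get("Domain"),
             "breached": b.get("BreachDate"),
             "added": b.get("AddedDate"),
             "data": b.get("DataClasses", [])} for b in breaches]
```

The summarised fields line up with what the report presents: the breach domain, the breach date, the data classes compromised, and when the breach was added to the service.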
The latest addition to Bluto truly makes it an all-rounder with regards to external assessment: MetaData Harvesting. It currently supports PDF, DOC, XLS and PPT documents. The identification of the potential documents is currently passive, by means of using search engines to identify documents linked to the target domain. The document list is then split into PDF and MS document lists and parsed in parallel for potential usernames, staff names, software version numbers, passwords, and pretty much anything else placed into the meta fields.
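As a rough illustration of what metadata parsing involves, the sketch below pulls simple key/value pairs out of an uncompressed PDF's Info dictionary with a regex. Real tools (Bluto included) use proper parsing libraries; this only handles the simplest case and the key list is an illustrative assumption.

```python
import re

# Metadata keys that commonly leak usernames and software versions.
META_KEYS = ("Author", "Creator", "Producer", "Title")

def pdf_metadata(raw):
    """Extract simple /Key (value) pairs from raw, uncompressed PDF bytes."""
    found = {}
    for key in META_KEYS:
        # Matches e.g. b"/Author (Alice Example)" in the Info dictionary.
        m = re.search(rb"/%b\s*\(([^)]*)\)" % key.encode(), raw)
        if m:
            found[key] = m.group(1).decode("latin-1")
    return found
```

The `Author` field typically yields staff names or usernames, while `Producer`/`Creator` give away the software and version used to create the document.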
All the evidence gathered by Bluto was originally presented in the terminal, which at first was fine considering the output was relatively minimal (comparatively). The latest release moves all of this data into an HTML report. The report gives some general remediation advice around each area of enumeration, as well as the relevant information to locate where the data was gathered from in the first place:
- Email Addresses and the URLs they were identified from.
- Staff Names and the URLs they were identified from.
- Compromised Email Accounts and relevant data from the compromises.
- Potential Staff Names and the URLs of the documents the data was extracted from.
- Potential software and versions used within the organisation and the URLs of the documents the data was extracted from.
Bluto carries out quite comprehensive local logging. It lets you know where the logs are kept and, once its run is complete, asks whether you would like to keep the logs and the evidence report.
I plan on continuing to improve Bluto's capabilities and ask the community to have their say. If you have any feedback, feature requests or the like, please feel free to leave a comment here or raise an issue on the GitHub page.
Feel free to give Bluto your vote