For the past several months, StopBadware's research team has been paying special attention to ways we can differentiate and track certain categories of infected websites. Thousands of website review requests are submitted to us every month; most of these are for hacked legitimate sites whose owners are concerned with cleaning up malware infections and protecting visitors. Some, however, are maliciously registered sites, or sites whose owners are abusing free hosting or dynamic DNS services to spread malware. When we encounter malicious sites like this, we want to make sure they stay on blacklists, and we want to be able to report them to people who can help take them down.
One of the first steps in doing that is developing a big-picture understanding of the kinds of sites we encounter over time. Our team tracked the sites we tested manually (a relatively small percentage of the total number of review requests submitted to us) from late March to mid-July 2014.
Because of the nature of infection chains, we differentiate between several types of sites when determining intention. Unsurprisingly, most of the sites we see are legitimate sites that have been hacked for use as landing pages (e.g., compromised with a malicious iframe, script, or HTTP redirect). Exploit pages, of course, are almost always malicious by design, as they contain the malicious executable that infects the target machine with badware. StopBadware sees very few exploit pages; this is largely a result of our testing IPs being blocked by malware distributors.
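To make the landing-page pattern concrete, here is a minimal sketch of the kind of heuristic that can flag one common injection technique: an iframe styled to be invisible to visitors. The regex, function name, and thresholds are illustrative assumptions for this post, not StopBadware's actual detection logic.

```python
import re

# Hypothetical heuristic: hacked landing pages often carry an injected
# iframe sized to zero or hidden via CSS so visitors never see it.
HIDDEN_IFRAME = re.compile(
    r'<iframe[^>]*(?:width\s*=\s*["\']?0\b|height\s*=\s*["\']?0\b|'
    r'display\s*:\s*none|visibility\s*:\s*hidden)[^>]*>',
    re.IGNORECASE,
)

def looks_like_injected_iframe(html: str) -> bool:
    """Return True if the page source contains an iframe hidden from view."""
    return bool(HIDDEN_IFRAME.search(html))

# An invisible 0x0 iframe is a classic injection pattern...
print(looks_like_injected_iframe(
    '<iframe src="http://example.com/x" width="0" height="0"></iframe>'))
# ...while an ordinary, visible iframe is not flagged.
print(looks_like_injected_iframe(
    '<iframe src="http://example.com/video" width="640"></iframe>'))
```

In practice a single signal like this is far too noisy on its own; it is the sort of feature that feeds into manual review rather than a verdict by itself.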
Note: Generally speaking, we consider sites that fall into the "free host" category to be malicious. This is not necessarily a comment on the practices or intentions of free hosting (or other free service) providers—many of whom are operating in good faith and some of whom have worked with us for years to curb abuse on their platforms—but rather a result of the fact that bad actors routinely abuse free services to spread malware.
The most interesting category we examined was intermediary pages. Our researchers classified intermediary sites as hacked or malicious by looking at a number of factors, including WHOIS data, the page's accessibility, and whether the site has legitimate content. This type of analysis is a common practice in the security industry, but it's also rather resource-intensive—especially for a small nonprofit.
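The factors above lend themselves to a simple scoring rule. Below is a toy sketch of how such signals might be combined; the field names, weights, and threshold are assumptions made for illustration and do not reflect StopBadware's actual classification criteria.

```python
from dataclasses import dataclass

@dataclass
class SiteSignals:
    """Illustrative signals of the kind a reviewer might collect."""
    domain_age_days: int          # derived from WHOIS registration data
    page_reachable: bool          # does the page load for an ordinary visitor?
    has_legitimate_content: bool  # is there real, non-malicious content on the site?

def classify_intermediary(s: SiteSignals) -> str:
    """Score a site and label it 'malicious' or 'hacked' (toy heuristic)."""
    score = 0
    if s.domain_age_days < 30:        # freshly registered domains are suspicious
        score += 2
    if not s.page_reachable:          # cloaked or inaccessible pages are suspicious
        score += 1
    if not s.has_legitimate_content:  # no real content suggests pure malice
        score += 2
    return "malicious" if score >= 3 else "hacked"

# A week-old, unreachable domain with no real content scores as malicious;
# an established, reachable site with genuine content scores as hacked.
print(classify_intermediary(SiteSignals(5, False, False)))
print(classify_intermediary(SiteSignals(2000, True, True)))
```

Even a crude rule like this hints at why manual review is resource-intensive: each signal (WHOIS lookups, reachability checks, content inspection) has to be gathered per site before any decision can be made.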
Ideally, we'd like to be able to automatically classify malicious websites so we can make the Web safer and minimize abuse of our processes at the same time. Over the next few weeks, our team will be using our data and a third-party service to come up with an experimental classifier for malicious vs. hacked sites. We look forward to sharing additional data and results once the project is finished; in the meantime, advice from those with experience in this arena is welcome!
*Special thanks to our outstanding research and testing intern, Luke Oglesbee, for his work on this!*