Malware reporting study: more information leads to higher cleanup rates

Posted on March 21, 2012 - 10:22 by mvasek

I’m Marie Vasek, a computer science and mathematics student at Wellesley College and the resident testing intern at StopBadware. When a website is on one of our data providers’ malware blacklists and a person responsible for the site asks StopBadware for an independent review, I test the website to see if it is actively delivering badware. This past fall, I completed a study in conjunction with StopBadware and Tyler Moore of Wellesley College. We found that following StopBadware’s Best Practices for Reporting Badware URLs helped get badware sites cleaned up or taken down.

At StopBadware, we have a list of URLs that community members have reported to us as containing badware. We manually test all URLs from this feed to see if they contain badware, and when badware is present, we report the URLs to appropriate parties. In July, I started reporting URLs from the community feed in accordance with StopBadware’s Best Practices for Reporting Badware URLs; I tracked responses and regularly checked back to see if the sites had been cleaned up or taken down.

In October 2011 I began an academic study based on StopBadware’s pilot reporting project. My methodology was as follows: On day 0, I manually tested a URL taken from StopBadware’s community feed. If it was actively delivering badware, I randomly assigned the URL to one of three groups: control, minimal, and full. For the control group of URLs, no reports were sent out. For the URLs assigned to the minimal group, I sent out badware reports to the appropriate parties, but the reports contained only a minimal amount of information*. For the URLs assigned to the full group, I sent out minimal reports with additional detailed information* at the end. After the reports were sent out, I followed up on each of the URLs 1, 2, 4, 8, and 16 days after the day that I first found badware (day 0) to see if that badware had been removed.
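The assignment and follow-up procedure above can be sketched in a few lines of code. This is purely illustrative: the study's actual scripts aren't published, and the function and field names here are my own invention.

```python
import random

# Follow-up schedule from the study: days after first detection (day 0)
FOLLOW_UP_DAYS = (1, 2, 4, 8, 16)
GROUPS = ("control", "minimal", "full")

def start_trial(url, rng=random):
    """Assign a badware-confirmed URL to one of the three study groups
    and return a record for tracking follow-up observations."""
    return {
        "url": url,
        "group": rng.choice(GROUPS),  # uniform random assignment (assumed)
        "observations": {},           # follow-up day -> True if clean that day
    }
```

A control-group record simply never has a report sent; the follow-up checks on days 1, 2, 4, 8, and 16 fill in `observations` either way.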

The table below shows the probability that a URL will be “permanently” cleaned up after a given number of days. For the purposes of this study, I considered a URL “permanently” cleaned up on a given day if the URL was clean on that day and on every subsequent follow-up day.

                 1 day   2 days   4 days   8 days   16 days
Full report      32.1%   43.4%    45.3%    49.1%    62.3%
Minimal report   23.6%   25.5%    27.3%    36.4%    49.1%
No report        13.5%   17.3%    32.7%    38.4%    46.2%

Percentages represent the probability that a URL is “permanently” clean after x days with the specified level of reporting.
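The “permanently clean” rule can be made precise with a short function. Again, this is a sketch under my own naming; `observations` maps each follow-up day to whether the URL tested clean on that day.

```python
def permanently_clean_from(observations, day):
    """Return True if the URL was clean on `day` and on every later
    follow-up day. `observations` maps follow-up day -> bool (True =
    clean); follow-up days in the study were 1, 2, 4, 8, and 16."""
    later = [clean for d, clean in observations.items() if d >= day]
    return bool(later) and all(later)

# Example: infected on days 1 and 2, clean on days 4, 8, and 16 -->
# counted as "permanently" clean from day 4 onward, not from day 2.
obs = {1: False, 2: False, 4: True, 8: True, 16: True}
```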

As you can see, sending a full report substantially improved the likelihood that an infected URL would be cleaned up. Full reports were also observed to be significantly more effective than minimal reports and no reports on every follow-up day.

But what does this all mean? It means that sending a detailed badware report appears to be an effective measure for getting a badware URL cleaned up. Furthermore, providing more details seemed to be helpful to the site owners and abuse teams who had the ability to clean up the badware.

We’re currently working on ascertaining whether other forms of notification sent in the same time frame (e.g., malware notifications from Google Webmaster Tools) could have prompted some of the badware URL clean-up we observed. Tyler Moore and I are in the process of writing an academic paper with the complete methodology and full results of this study; the paper will be published later this year.

*For examples of minimal reports and additional information, please see pages B-2 to B-4 of StopBadware’s reporting best practices.
