The challenges of counting badware sites

Like many folks who are trying to fight badware, we often find ourselves trying to quantify the problem. How many badware websites are there? How many are hosted by a particular IP address, or a specific AS, or a given hosting provider? And once we have these numbers, how do we judge how big a problem they represent?

Counting may sound simple, but it poses many challenges:

  • It's hard to know which units to count. Individual URLs? Fully qualified domain names? Base domain names? None of these consistently corresponds to what a human would consider "a website," and counting any one of them skews the data.
  • What's the denominator? Suppose hosting provider A has more infected sites than hosting provider B. Is this a result of negligence on the part of A, or is A simply much larger than B? It's difficult, if not impossible, to find accurate data about the number of unique domain names or websites hosted by a particular provider.
  • How do you count a URL or domain name over time if its IP address changes? Suppose you're reporting weekly stats; if a URL moves from hosting provider A to B within that week, how do you account for that in reporting the numbers? What if the site was "bad" when it was at provider A but is clean now that it's at B? Do you still report A as hosting a bad site for the week?
  • How do you count a URL that resolves to more than one IP address? Do you double count it?
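To make the unit problem concrete, here is a minimal sketch that counts the same list of URLs three ways. The URLs are made-up examples, and `naive_base_domain` is a hypothetical helper that simply keeps the last two labels of a hostname; that shortcut is wrong for suffixes such as co.uk, where a real implementation would need something like the Public Suffix List.

```python
from urllib.parse import urlsplit


def fqdn(url: str) -> str:
    """Return the lowercased hostname of a URL (empty string if none)."""
    return (urlsplit(url).hostname or "").lower()


def naive_base_domain(host: str) -> str:
    """ASSUMPTION: the base domain is the last two labels of the hostname.

    This is wrong for multi-label suffixes such as co.uk; it is only
    here to illustrate how the choice of unit changes the count.
    """
    return ".".join(host.split(".")[-2:])


def count_units(urls):
    """Count the same set of reports by URL, by FQDN, and by base domain."""
    hosts = {fqdn(u) for u in urls}
    return {
        "urls": len(set(urls)),
        "fqdns": len(hosts),
        "base_domains": len({naive_base_domain(h) for h in hosts}),
    }


# Hypothetical reports, not real data.
reported = [
    "http://a.example.com/x",
    "http://a.example.com/y",
    "http://b.example.com/",
    "http://example.org/z",
]

counts = count_units(reported)
print(counts)
```

The same four reports come out as four, three, or two "sites" depending on the unit chosen, and none of those numbers is obviously the right one.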

These are questions with no easy answers. Yet, as Brian Krebs and various government officials have pointed out, it's difficult to know what action to take, and against whom, if you don't have a good way of measuring the problem. At StopBadware, we're gradually trying to work on answers to these questions. We've learned a few lessons from our successes and our mistakes, but we can also use more input. If you have ideas, or would like to talk with us further about the measurement challenge, please let us know in the comments or at contact <at> stopbadware <dot> org.

1 response to "The challenges of counting badware sites"

MalwareGroup says:

IMHO, there will always be one-offs. The aim here should be to minimize them, since we cannot bring them to zero.

* It's hard to know which units to count.

You missed IP addresses. Badware sites (actual C&C servers, not hacked sites that happen to host malware) are rarely deployed in a shared hosting environment.

The next best option is to count base domain names, because some new malware can generate dynamic URLs and subdomains for targeted attacks, which would skew counts based on URLs or fully qualified domain names.

* What's the denominator?

Something like this...

Hosting Provider Rating = ( active_malicious_ipaddresses * active_malicious_base_domains ) / total_active_ipaddresses_with_provider

IMHO, "negligence" is the outcome of this equation, so it cannot be used as the denominator.
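As an illustration only, the rating above could be sketched as a small function. The parameter names are shortened from the formula, the input numbers are hypothetical, and the zero-denominator check is an added assumption rather than part of the original suggestion.

```python
def provider_rating(malicious_ips: int,
                    malicious_base_domains: int,
                    total_active_ips: int) -> float:
    """Rating as written in the comment:

    (active malicious IPs * active malicious base domains)
        / total active IPs with the provider
    """
    if total_active_ips <= 0:
        # ASSUMPTION: a provider with no known active IPs cannot be rated.
        raise ValueError("total_active_ips must be positive")
    return (malicious_ips * malicious_base_domains) / total_active_ips


# Hypothetical numbers: same amount of badware, very different provider sizes.
small_provider = provider_rating(2, 3, 100)
large_provider = provider_rating(2, 3, 10_000)
print(small_provider, large_provider)
```

With identical badware counts, the smaller provider gets the higher rating, which is the size normalization the denominator is meant to provide. Note that because the numerator is a product, the rating is zero whenever either malicious count is zero.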

* How do you count a URL or domain name over time if its IP address changes?

An example of this is fast-flux domains. IMO, if a domain name was bad on IP-1, it is more likely than not to be bad on IP-2 as well.

Therefore, just because the IP address changed, I wouldn't consider the domain to be new or clean, but I would queue it for reassessment (while keeping the old status).
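A minimal sketch of that policy, assuming a simple in-memory tracker: the class name `DomainTracker`, its methods, and the domain and IP values are all hypothetical.

```python
from collections import deque


class DomainTracker:
    """Keeps the last known verdict per domain and queues re-checks on IP change."""

    def __init__(self):
        self.status = {}               # domain -> last verdict, e.g. "bad" or "clean"
        self.last_ip = {}              # domain -> most recently observed IP
        self.reassess_queue = deque()  # domains waiting for a fresh assessment

    def record_verdict(self, domain: str, ip: str, status: str) -> None:
        """Store the result of an assessment made while the domain was at `ip`."""
        self.status[domain] = status
        self.last_ip[domain] = ip

    def observe_ip(self, domain: str, ip: str) -> None:
        """Note a new resolution; an IP change keeps the old status but queues a re-check."""
        previous = self.last_ip.get(domain)
        self.last_ip[domain] = ip
        changed = previous is not None and previous != ip
        if changed and domain not in self.reassess_queue:
            self.reassess_queue.append(domain)


tracker = DomainTracker()
tracker.record_verdict("bad.example", "192.0.2.1", "bad")
tracker.observe_ip("bad.example", "198.51.100.7")  # domain moved to a new IP

print(tracker.status["bad.example"])  # still the old verdict
print(list(tracker.reassess_queue))   # domain is waiting for reassessment
```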

* How do you count a URL that resolves to more than one IP address? Do you double count it?

No. If a domain resolves to 3 IP addresses, I will count it as 1 but flag all 3 IP addresses as suspicious and queue them for assessment.
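Roughly, and assuming the resolutions are already available as a mapping from domain to IP addresses (the function name `tally` and the data are hypothetical), that rule might look like this:

```python
def tally(resolutions: dict[str, list[str]]) -> tuple[int, set[str]]:
    """Count each domain once; collect every IP it resolves to for assessment."""
    domain_count = len(resolutions)
    suspicious_ips: set[str] = set()
    for ips in resolutions.values():
        suspicious_ips.update(ips)
    return domain_count, suspicious_ips


# Hypothetical data: one bad domain resolving to three IP addresses.
bad_domains = {
    "bad.example": ["192.0.2.1", "192.0.2.2", "198.51.100.7"],
}

domain_count, ips_to_assess = tally(bad_domains)
print(domain_count, sorted(ips_to_assess))
```

The domain contributes 1 to the badware count, while all three addresses end up in the set queued for assessment.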
