Building a better Clearinghouse, part 2

A few months ago, I blogged about our foray into building a new, improved Badware URL Clearinghouse. At the time, we were starting a three-month pilot project. That pilot has since concluded, and I'm back to share what we learned and accomplished during that time.

On the technical side, our developer, Matthew, built a production-ready platform to store badware URLs and associated data. He stuck with his original plan to use MongoDB and Java, and it seems to have worked well. He had to perform some multithreading magic to resolve large numbers of domain names efficiently within Java, but he pulled it off. We look forward to migrating the data we currently collect from our data providers and our review process onto the new platform in the coming months.
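For the curious, here is a rough idea of what multithreaded domain resolution can look like in Java. This is only a minimal sketch, not Matthew's actual code: the class name `BulkResolver`, its method names, and the idea of passing in the lookup function are all assumptions made for illustration. It hands each blocking DNS lookup to a fixed-size thread pool so that many lookups are in flight at once.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper for illustration; not the Clearinghouse's real code.
class BulkResolver {

    // Default lookup: a blocking DNS query via the JDK. Returns null if the
    // name does not resolve.
    static String dnsLookup(String domain) {
        try {
            return InetAddress.getByName(domain).getHostAddress();
        } catch (UnknownHostException e) {
            return null;
        }
    }

    // Runs the lookup for every domain on a fixed-size thread pool and
    // returns domain -> address (null when unresolved), in input order.
    static Map<String, String> resolveAll(List<String> domains, int threads,
                                          Function<String, String> lookup) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit everything first so the lookups overlap.
            List<Future<String>> pending = new ArrayList<>();
            for (String domain : domains) {
                Callable<String> task = () -> lookup.apply(domain);
                pending.add(pool.submit(task));
            }
            // Then collect the results in the original order.
            Map<String, String> results = new LinkedHashMap<>();
            for (int i = 0; i < domains.size(); i++) {
                try {
                    results.put(domains.get(i), pending.get(i).get());
                } catch (ExecutionException e) {
                    // The lookup itself threw; record the domain as unresolved.
                    results.put(domains.get(i), null);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while resolving", e);
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling `BulkResolver.resolveAll(domains, 50, BulkResolver::dnsLookup)` would resolve a list of domains with up to 50 lookups running concurrently. The pool size here is an arbitrary example, and a real system would also need timeouts and rate limiting.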

I'm an executive, not a developer, so for me, the more interesting part of the pilot was talking with current and potential Partners about their interest in data sharing. Nearly every company we talked to craves data, whether to help clean up their own environments (in the case of hosting providers and registrars, for example) or to better protect their customers (in the case of security vendors). But would they be willing to share data to get data? We heard several reservations about sharing data:

  • Revealing proprietary methods or information.
  • Losing competitive advantage.
  • Violating legal restrictions on sharing.
  • Helping freeloaders.
  • Giving away data that could be marketable.
  • Exposing themselves to liability or negative PR.

Despite these concerns, though, several Partners are still interested. Why? Well, the aforementioned demand for data is one reason. Another is the opportunity to help shape a new effort with great potential for helping the Web: by sharing data, our Partners will help each other protect users and help StopBadware report on badware trends and facilitate cleanup efforts. Some Partners also recognized that a data sharing program is a vehicle for demonstrating their expertise to, and learning from, industry peers.

In that spirit, we're putting together plans to try out a data sharing program with a handful of our Partners. We'll use the new platform that Matthew built, and Partners will be required to contribute substantive data of their own if they want to see others' data. Eventually, we plan to build an API and a Web interface, though we'll likely start with a much more basic daily data feed. Meanwhile, we'll continue looking for opportunities to learn from, and perhaps even combine efforts with, other data sharing initiatives already underway.

Building a better Clearinghouse

This month, StopBadware started a pilot project to explore what a new, expanded Badware Website Clearinghouse might look like. Our idea is to create a collaborative platform that aggregates and makes available extensive data and metadata about badware URLs and domains. That might include information from malicious URL feeds, reports from our community, results of scans against some of our partners' analysis tools, DNS and AS data drawn from public sources, and so on. The platform would power tools, services, and data reports designed to benefit our partners, website owners, and the broader Web ecosystem.

We're in the early stages of what we expect to be a three-month pilot. So far, there are a lot of unanswered questions. Here are a few of the big ones:

  • What will the inaugural set of tools/services look like? So far, we're thinking of a data exchange API and a basic Web interface for searching the data.
  • Who will have access to the data? Those with the best data often have valid (and occasionally not-so-valid) reasons for not wanting to share their data openly. We want to offer flexibility that encourages broad sharing but allows more limited sharing where appropriate. So, we're imagining some sort of tiered permissions model.
  • What incentives will there be to contribute data? Two models I've seen used before are quid pro quo—you earn access equivalent to what you contribute—and "minimum threshold," in which you must contribute a certain amount, after which you get full access. Both of these could have value, but it would be nice to provide access to a broader audience than just those who have substantial data to contribute.
  • Which database platform should we use? Right now, our developer, Matthew, is experimenting with MongoDB (using Java for the middleware layer that will manage the data).
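
To make the two incentive models mentioned above a bit more concrete, here is a minimal Java sketch. Everything in it is hypothetical: the class and method names are made up for illustration, and counting contributions in "records" is an assumption. We haven't decided how contributions would actually be measured.

```java
// Hypothetical sketch of the two incentive models; all names are illustrative.
class AccessPolicy {

    // Quid pro quo: a contributor may read at most as many records as it
    // has contributed.
    static long quidProQuoAllowance(long contributed) {
        return Math.max(0, contributed);
    }

    // Minimum threshold: no access below the threshold, full access at or
    // above it.
    static boolean hasFullAccess(long contributed, long threshold) {
        return contributed >= threshold;
    }
}
```

Under quid pro quo, access grows smoothly with contribution. Under the threshold model, it flips from nothing to everything at a single cutoff, which is simpler to explain but shuts out smaller contributors entirely.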

We'll do our best to blog periodically throughout the pilot as we refine our answers to these and many other questions. Meanwhile, we'd love to hear your suggestions and other feedback in the comments or via email (contact <at> ourdomain).

Cybersecurity data sharing: you're doing it wrong

Posted on December 9, 2011 - 11:06 by imeister

One aspect of cybersecurity that StopBadware routinely emphasizes as essential to collective defense against malware is data sharing. As we've pointed out in the past, there are few incentives favoring, and many opposing, the sharing of malware attack-related data among private ecosystem participants like ISPs and web hosting providers, which makes tackling malware threats collaboratively prohibitively difficult. Apparently, data sharing problems are on Congress's mind as well. Last week, the House Intelligence Committee considered and passed HR 3523, the Cyber Intelligence Sharing and Protection Act of 2011, one of Congress's most visible efforts to confront computer security issues, which specifically addresses the sharing of "cyber threat intelligence". Unfortunately, the bill's sponsors appear to perceive all forms of cyber threat intelligence -- everything from an RSA-style infiltration to a blind SQL injection -- as (a) presumptively classified and in desperate need of control and (b) something from which private companies like ISPs and web hosting providers need protection.

First off, it seems the height of ridiculousness to assert that the intelligence community requires Congress's special permission to share information with important private sector infrastructure companies (like telcos and ISPs) if it possesses information that demands action. Federal intelligence and law enforcement agencies already share, and are certainly not statutorily barred from sharing, malware- and cyberattack-related information with private parties; specifying a system of temporary security clearances presumes that the disclosure of much of such information will place the national security of the United States in jeopardy. So either the status quo is somehow a dangerous threat to our nation, or the bill's solution is in search of a problem.

Moreover, the bill fails to address the actual collective action problem at the core of malware data sharing. Because the bill lets companies specify how malware data is shared with other private parties, the broader cybersecurity community, whose operations dwarf those of the federal government, need not be materially enriched in any way. In essence, the government seeks to establish a cybersecurity clearinghouse that need enrich only itself. The government should provide additional resources and tools to companies willing to make common cause with one another in the cybersecurity fight, not reward companies that share data with it and it alone. That data is very loosely defined by the bill, is exempt from the Privacy and Freedom of Information Acts, and may include PII and customer-created content.

Second, the bill takes the extraordinary step of immunizing all participating companies from any criminal or civil liability as a result of sharing information or failing to act on information they receive. Content providers like ISPs and hosting providers are already immunized for failure to take action on malware reports under section 230 of the CDA, as courts have interpreted the law; further grants of immunity should be conditioned on at least a minimal standard of accountability for gross negligence in the handling of such data.

The Washington Post has reported that in response to objections from privacy advocates and concerns from the White House, the bill has been amended to include protections against coercive data sharing practices and oversight by the intelligence community Inspector General. It's a step in the right direction, but it does little to cure the bill's other flaws, including facilitating sharing of irrelevant, private information and use of data submitted for purposes other than cybersecurity defense. While increased sharing of cybersecurity information within the Internet ecosystem is a laudable goal, Congress should seriously consider an approach that emphasizes data sharing within the private sector and better protects the general public from abuse.