MIT Internet Traffic Analysis Study (MITAS)

Project Description and Research Goals

MITAS is a research project at the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) under the direction of David Clark, with the collaboration and financial support of a number of Internet Service Providers (ISPs) serving customers in the United States and abroad.

The goal of this project is to undertake novel empirical research on ISP traffic data collected from participating ISPs serving a cross-section of geographically dispersed markets using a variety of network architectures. Detailed non-personally identifiable traffic data will be collected from operators over time to enable empirically valid characterizations of both aggregate and subscriber-level traffic patterns. A related goal is to demonstrate the viability of collecting and sharing this ISP traffic data in a secure way that respects concerns about ISP confidentiality and end-user privacy. No personally identifiable information about any participating ISPs' users will be used in this project.

The data and analysis will enhance the collective understanding of broadband trends and facilitate better forecasting and scenario analyses by ISPs and other stakeholders in the Internet value chain. Better data and collection methodologies are needed to inform the industry, the network research community, and policy discussions about appropriate technical and business approaches to traffic management. This is particularly significant in light of the pressures imposed on last-mile access providers from the growth in broadband traffic and concerns from end-users and upstream application providers that resources be managed fairly and efficiently.

The project is organized around a series of one-year, bilateral research agreements negotiated between MIT and participating ISPs. This structure was adopted to provide a scalable model for funding the research and for managing the collection and handling of confidential, non-personally identifiable ISP traffic data. The first such agreement was signed in December 2008, formally launching the project. The hope is that the ISP sponsors will find the initial work valuable enough to continue the project beyond the expiration of the initial one-year agreements.

Key deliverables of the project include:

  • White papers and research publications in peer-reviewed journals documenting the research results and progress. This will include both theoretical and empirical research. Papers will include traffic modeling, documentation of process learning, and empirical analysis of aggregate and anonymous per-user traffic data, as well as work on identifying appropriate traffic metrics and measurement strategies.
  • A traffic data repository, including electronic datasets that may be provided to ISP partners for subsequent analysis. We also hope to enable sharing of suitably anonymized data among ISP partners and, potentially, with unaffiliated researchers.
  • Workshops and related outreach to enhance multidisciplinary research and understanding of broadband traffic growth and its implications for such issues as traffic management, network investment, and architecture.

Frequently Asked Questions

Questions

  1. What sorts of data will this project collect from operators?
  2. How does this project plan to use this data? How does it plan to share this data with the research community?
  3. How does this project preserve user privacy? Confidentiality? Proprietary interests?
  4. Why are we encouraging network operators to make data available to the research community?
  5. Which ISPs are participating in this research project?
  6. Whom can I contact for more information?

Responses

  1. What sorts of data will this project collect from operators?

    We will be collecting traffic data over time from a set of broadband access ISPs that operate across a diverse mix of geographically distributed markets in the US and abroad and that use a variety of access platform technologies (including DSL and cable modem systems). The data will include per-subscriber-line and aggregate traffic data, which in some cases will allow us to stratify traffic by application type and to observe per-subscriber and aggregate traffic patterns over differing time scales. To preserve subscriber and provider privacy, no personally identifiable information will be included in the data, and ISP data will be coded so as to prevent mapping data back to particular ISPs.

  2. How does this project plan to use this data? How does it plan to share this data with the research community?

    The goal of this research is to enhance our understanding of broadband traffic. While individual ISPs have varying levels of insight into the traffic on their own networks, there is only limited opportunity to observe traffic across multiple ISPs, and even more limited availability of traffic data among non-ISP analysts. A better understanding of broadband traffic will enhance our collective ability to model traffic, provision and manage networks, assess traffic trends, and investigate market and network phenomena. The principal vehicle for sharing this data with researchers who are not part of this project will be via published research in white papers and peer-reviewed academic and industry publications, via the public website (http://mitas.csail.mit.edu), and via presentations and participation in workshops and conferences. Eventually, we hope to share representative data sets via the web. The design of these data sets (what to collect, what to report, and how to organize the data) and how best to share the data (to minimize collection and maintenance costs and to protect user privacy and ISP confidentiality) are anticipated research products of this effort.

    The data set design aspects represent a significant challenge since the potential data that could be collected and maintained is vast and there is no general consensus on what data are most important or how best to construct summary metrics.

  3. How does this project preserve user privacy? Confidentiality? Proprietary interests?

    A key design feature of this research project is to protect user privacy and ISP confidentiality. Concerns regarding these matters have posed a significant impediment to collecting relevant data in the past. The organization of this project since its inception has reflected our concern to confront these issues head-on. First, we are not planning to collect any personally identifiable information (PII). For example, the per-subscriber data we are collecting will be scrubbed so that no PII or data that would make it feasible to identify an individual subscriber (e.g., the MAC address) will be provided to MIT researchers. In the case of the MAC address, the data is anonymized so that patterns may be analyzed over time, but the subscriber's MAC address is not observable. Second, each of the participating ISPs is contributing the raw data to MIT under the protection of a bilateral contract and a Non-Disclosure Agreement that provides an added layer of protection for both user privacy and ISP confidentiality. Third, the project researchers are committed to using best-practice data management procedures when handling the data and will operate in accordance with MIT's guidelines for research involving human subjects as set forth by the Institutional Review Board (IRB). At each stage of the research, the need to protect user privacy and ISP confidentiality is considered in order to minimize the risk that confidentiality will be breached.
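
    As an illustration of how an identifier such as a MAC address can be anonymized while still allowing patterns to be followed over time, the sketch below uses a keyed hash (HMAC-SHA256). This is not a description of the project's actual procedure, which is not specified here; the function name pseudonymize_mac, the example key, and the 16-character truncation are all assumptions made for illustration.

```python
import hashlib
import hmac


def pseudonymize_mac(mac: str, secret_key: bytes) -> str:
    """Return a stable pseudonym for a MAC address (hypothetical helper).

    The same MAC and key always yield the same pseudonym, so traffic can be
    linked over time, but the MAC cannot be recovered without the key.
    """
    # Normalize common notations so "00:1A:.." and "00-1a-.." map to one value.
    normalized = mac.lower().replace(":", "").replace("-", "").replace(".", "")
    digest = hmac.new(secret_key, normalized.encode("ascii"), hashlib.sha256)
    # Truncation to 16 hex characters is an illustrative choice, not a requirement.
    return digest.hexdigest()[:16]


if __name__ == "__main__":
    # Assumption: the key is held by the data provider and never shared.
    key = b"example-secret-held-by-the-isp"
    print(pseudonymize_mac("00:1A:2B:3C:4D:5E", key))
    print(pseudonymize_mac("00-1a-2b-3c-4d-5e", key))  # same pseudonym as above
```

    In a scheme of this kind, whoever holds the key can reproduce the mapping, so the key would have to stay with the data provider rather than with the researchers. A plain unkeyed hash would be weaker, because the space of MAC addresses is small enough to enumerate.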

  4. Why are we encouraging network operators to make data available to the research community?

    The data collection effort anticipated here addresses uncharted territory. It is best pursued via a process that is flexible and able to evolve as our understanding of the data and appropriate metrics evolves. By relying on voluntary participation, we ensure cooperative engagement in this process and avoid the incentive problems and costs associated with a regulatory proceeding. Regulation imposes direct and indirect costs and distorts behavior in markets; consequently, most communications policy analysts favor light-handed regulation when it suffices. Seeking to collect an equivalent data set under regulatory mandate would impose additional delays and disclosure burdens, and would result in a less flexible, less responsive, and potentially much less detailed set of data being made available for analysis. The voluntary process being employed in this project allows us to maximize opportunities to collect more detailed data and to explore cost-benefit trade-offs from alternative strategies. Finally, this project does not preclude collecting data under a regulatory mandate if that is deemed to be desirable.

  5. Which ISPs are participating in this research project?

    At the time of the writing of this FAQ, there are six ISPs participating: Clearwire, Comcast, Liberty Global, Rogers Communications, Telus, and Time Warner Cable. Additional ISPs and other industry partners may participate in the future.

  6. Whom can I contact for more information?

    For more information, contact Bill Lehr (wlehr@mit.edu).

Frequently asked questions as a printable PDF

Related Papers

MIT Project Members

Member Resources

For project member working documents, see http://documents.csail.mit.edu/broadband (members only).