Anomaly Detection in Graph Streams


The main objective of this project is to develop scalable algorithms for learning normative patterns and detecting anomalies in graph streams, where the patterns are known, unknown but fixed, or changing over time. The project team is pursuing several techniques, including partitioning the graph over time, processing only the changes to the graph over time, and parallel implementations on high-performance computing platforms. The team is evaluating the effectiveness and efficiency of these algorithms in terms of expected data sizes, data rates, and recall/precision, using several large, dynamic, real-world datasets as well as synthetic data. The team is also evaluating the discovered patterns and anomalies for their significance in the target domains. This research advances the understanding of how to efficiently process large, high-rate data streams represented as graphs in order to learn structural patterns and detect structural anomalies in real time. The algorithms developed under this project represent a new level of scalability that is necessary to address today's massive, dynamic data environments, as well as users' need to quickly discover actionable intelligence in the form of trends and anomalies. This project impacts the scientific research community by advancing the state of the art in mining large, dynamic data streams for graph patterns and anomalies, and by disseminating the research results via publications, software tools, and data provided on the project website.
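To make the idea of processing only the changes to the graph concrete, the following is a minimal illustrative sketch and not the project's actual algorithm. It assumes a stream of labeled edges, treats the most frequent edge signature as the normative pattern, and flags an arriving edge whose signature is rare by comparison. The names Edge and flag_rare_edges and the parameters warmup and rare_fraction are hypothetical, chosen only for this example.

```python
from collections import Counter
from typing import Iterable, Iterator, NamedTuple, Tuple


class Edge(NamedTuple):
    """Hypothetical edge signature: labels only, no vertex identities."""
    src_label: str
    edge_label: str
    dst_label: str


def flag_rare_edges(
    stream: Iterable[Edge],
    warmup: int = 10,
    rare_fraction: float = 0.1,
) -> Iterator[Tuple[int, Edge]]:
    """Yield (position, edge) for edges whose signature is rare so far.

    Only the arriving edge is processed: each update touches one counter,
    so the graph seen so far is never rescanned. This is an assumption-laden
    toy heuristic, not a substitute for structural pattern learning.
    """
    counts: Counter = Counter()
    top_count = 0  # count of the most frequent (normative) signature
    seen = 0
    for edge in stream:
        seen += 1
        counts[edge] += 1
        # Counts only grow, so the maximum can be tracked incrementally.
        top_count = max(top_count, counts[edge])
        if seen <= warmup:
            continue  # not enough evidence yet to call anything rare
        if counts[edge] < rare_fraction * top_count:
            yield seen, edge
```

Feeding this function twenty edges with the same signature followed by one edge with a different signature would flag only the last edge. A realistic detector would compare larger substructures rather than single edges and would let old counts decay so that the normative pattern can change over time.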

This project also impacts education through the inclusion of research results in existing courses at the team's institutions and the dissemination of these curricular materials via the project website. The project supports the research training of two graduate students, with recruiting efforts directed at underrepresented groups informing the selection of these students. The project benefits society by providing efficient and effective tools for detecting patterns and anomalies in data, which can lead to new discoveries in a variety of domains where large amounts of dynamic data are available, including national security, cyber-security, and social media.

This material is based upon work supported by the National Science Foundation under Grant Nos. IIS-1318913 and IIS-1318957.