Proc. of the 3rd Int. Congress on Big Data (IEEE BigData 2014)
Performing predictive modelling, such as anomaly detection, in Big Data is a difficult task. This problem is compounded as more and more sources of Big Data are generated from environmental sensors, logging applications, and the Internet of Things. Further, most current techniques for anomaly detection only consider the content of the data source, i.e. the data itself, without concern for the context of the data. As data becomes more complex it is increasingly important to bias anomaly detection techniques for the context, whether it is spatial, temporal, or semantic. The work proposed in this paper outlines a contextual anomaly detection technique for use in streaming sensor networks. The technique uses a well-defined content anomaly detection algorithm for real-time point anomaly detection. Additionally, we present a post-processing context aware anomaly detection algorithm based on sensor profiles, which are groups of contextually similar sensors generated by a multivariate clustering algorithm. Our proposed research has been implemented and evaluated with real-world data provided by Powersmiths, located in Brampton, Ontario, Canada.