Future Generation Computer Systems
URL with Digital Object Identifier
The emergence of Big Data has had profound impacts on how data are stored and processed. As technologies created to process continuous streams of data with low latency, Complex Event Processing (CEP) and Stream Processing (SP) have often been related to the Big Data velocity dimension and used in this context. Many modern CEP and SP systems leverage cloud environments to provide the low latency and scalability required by Big Data applications, yet validating these systems at the required scale is a research problem per se. Cloud computing simulators have been used as a tool to facilitate reproducible and repeatable experiments in clouds. Nevertheless, existing simulators are mostly based on simple application and simulation models that are not appropriate for CEP or for SP. This article presents CEPSim, a simulator for CEP and SP systems in cloud environments. CEPSim proposes a query model based on Directed Acyclic Graphs (DAGs) and introduces a simulation algorithm based on a novel abstraction called event sets. CEPSim is highly customizable and can be used to analyze the performance and scalability of user-defined queries and to evaluate the effects of various query processing strategies. Experimental results show that CEPSim can simulate existing systems in large Big Data scenarios with accuracy and precision.