Electronic Thesis and Dissertation Repository

Degree

Doctor of Philosophy

Program

Electrical and Computer Engineering

Supervisor

Dr. Abdelkader Ouda

Abstract

The main concept behind business intelligence (BI) is how to use integrated data across different business systems within an enterprise to make strategic decisions. It is difficult to map internal and external BI’s users to subsets of the enterprise’s data warehouse (DW), resulting that protecting the privacy of this data while maintaining its utility is a challenging task. Today, such DW systems constitute one of the most serious privacy breach threats that an enterprise might face when many internal users of different security levels have access to BI components. This thesis proposes a data masking framework (iMaskU: Identify, Map, Apply, Sign, Keep testing, Utilize) for a BI platform to protect the data at rest, preserve the data format, and maintain the data utility on-the-fly querying level. A new reversible data masking technique (COntent BAsed Data masking - COBAD) is developed as an implementation of iMaskU. The masking algorithm in COBAD is based on the statistical content of the extracted dataset, so that, the masked data cannot be linked with specific individuals or be re-identified by any means.

The strength of the re-identification risk factor for the COBAD technique has been computed using a supercomputer where, three security scheme/attacking methods are considered, a) the brute force attack, needs, on average, 55 years to crack the key of each record; b) the dictionary attack, needs 231 days to crack the same key for the entire extracted dataset (containing 50,000 records), c) a data linkage attack, the re-identification risk is very low when the common linked attributes are used. The performance validation of COBAD masking technique has been conducted. A database schema of 1GB is used in TPC-H decision support benchmark. The performance evaluation for the execution time of the selected TPC-H queries presented that the COBAD speed results are much better than AES128 and 3DES encryption. Theoretical and experimental results show that the proposed solution provides a reasonable trade-off between data security and the utility of re-identified data.

Share

COinS