Electronic Thesis and Dissertation Repository

Degree

Master of Engineering Science

Program

Electrical and Computer Engineering

Supervisor

Dr. Aleksander Essex

Abstract

Record Linkage is a process of combining records representing same entity spread across multiple and different data sources, primarily for data analytics. Traditionally, this could be performed with comparing personal identifiers present in data (e.g., given name, surname, social security number etc.). However, sharing information across databases maintained by disparate organizations leads to exchange of personal information pertaining to an individual. In practice, various statutory regulations and policies prohibit the disclosure of such identifiers. Private record linkage (PRL) techniques have been implemented to execute record linkage without disclosing any information about other dissimilar records.

Various techniques have been proposed to implement PRL, including cryptographically secure multi-party computational protocols. However, these protocols have been debated over the scalability factors as they are computationally extensive by nature. Bloom filter encoding (BFE) for private record linkage has become a topic of recent interest in the medical informatics community due to their versatility and ability to match records approximately in a manner that is (ostensibly) privacy-preserving. It also has the advantage of computing matches directly in plaintext space making them much faster than their secure mutli-party computation counterparts. The trouble with BFEs lies in their security guarantees: by their very nature BFEs leak information to assist in the matching process. Despite this known shortcoming, BFEs continue to be studied in the context of new heuristically designed countermeasures to address known attacks.

A new class of set-intersection attack is proposed in this thesis which re-examines the security of BFEs by conducting experiments, demonstrating an inverse relationship between security and accuracy.

With real-world deployment of BFEs in the health information sector approaching, the results from this work will generate renewed discussion around the security of BFEs as well as motivate research into new, more efficient multi-party protocols for private approximate matching.


Share

COinS