Electronic Thesis and Dissertation Repository

Leveraging Homomorphic Encryption for Privacy-Preserving Data Analysis

Mounika Pratapa, The University of Western Ontario

Abstract

The reduced cost of genome sequencing has opened up vast potential for genetic research. However, ethical and privacy concerns prevent the free sharing of genomic data across institutions. Homomorphic encryption (HE) enables computations to be performed directly on encrypted data. Our research aims to develop better protocols for performing genomic data analysis while keeping sensitive information secure, focusing on improving their security, communication overhead, and computational complexity. We divide our research into three areas. (1) Secure Function Extensions to Additively HE Cryptosystems (SFE): we present a novel approach that extends the functionality of additively HE schemes by letting us securely compute functions with a finite integer domain mapped to a binary range. Our results indicate that this extension makes linear HE schemes practical for secure database query and privacy-preserving machine learning (PPML) applications, offering a less computationally intensive alternative to fully homomorphic encryption (FHE). (2) Secure database querying: we perform secure searches in encrypted genomic databases while protecting the privacy of both the querier and the database owner. In contrast to previous work, our application achieves the required functionality in a single communication round, searching 100,000 records in under 35 seconds. (3) PPML in a two-party setting: (a) autoencoders for secure genotype imputation that use FHE for security and quantization-aware training for optimization, achieving better accuracy than related work; (b) TransPHErmer, the first secure transformer inference protocol built entirely on additively HE, which ensures that no intermediate results are exposed to either party. We introduce a novel thresholded softmax attention mechanism, which eliminates the need for approximations when working with encrypted data and achieves ideal accuracy levels with significantly reduced communication overhead.
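To make the notion of an additively HE scheme concrete, the following is a minimal sketch in Python using the textbook Paillier cryptosystem. Paillier is chosen here only as a well-known example of additive HE; the abstract does not state which cryptosystem the thesis uses. The function names (`keygen`, `encrypt`, `decrypt`, `add_encrypted`, `mul_plain`) are hypothetical, and the tiny primes are for illustration only and offer no security.

```python
import math
import secrets


def keygen(p=1009, q=1013):
    """Toy Paillier key generation (assumed tiny primes, NOT secure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) under public key n with g = n + 1."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, sk, c):
    """Recover m as L(c^lam mod n^2) * mu mod n, with L(x) = (x - 1) // n."""
    lam, mu = sk
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    return (c1 * c2) % (n * n)


def mul_plain(n, c, k):
    """Raising a ciphertext to a plaintext k multiplies the plaintext by k."""
    return pow(c, k, n * n)


if __name__ == "__main__":
    n, sk = keygen()
    c_sum = add_encrypted(n, encrypt(n, 20), encrypt(n, 22))
    c_scaled = mul_plain(n, encrypt(n, 7), 6)
    print(decrypt(n, sk, c_sum), decrypt(n, sk, c_scaled))
```

The party holding only the public key `n` can add ciphertexts and scale them by known constants without ever seeing the plaintexts. Such a scheme natively supports only these linear operations, which is why extending additively HE schemes to richer functions is of interest.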