On Evolution of Ransomware & Malware Analysis in Cryptography


Abstract

In collaborative learning (CL), multiple parties jointly train a machine learning model on their private datasets. However, data cannot be shared directly due to privacy concerns. To ensure input confidentiality, cryptographic techniques, e.g., multi-party computation (MPC), enable training on encrypted data. Yet even securely trained models are vulnerable to inference attacks that aim to extract memorized data from model outputs. To ensure output privacy and mitigate inference attacks, differential privacy (DP) injects calibrated noise during training. While cryptography and DP offer complementary guarantees, combining them efficiently for cryptographic and differentially private CL (CPCL) is challenging: cryptography incurs performance overheads, while DP degrades accuracy, creating a privacy-accuracy-performance trade-off that requires careful design. This work systematizes the CPCL landscape. We introduce a unified framework that generalizes common phases across CPCL paradigms and identify secure noise sampling as the foundational phase for achieving CPCL. We analyze the trade-offs of different secure noise sampling techniques, noise types, and DP mechanisms, discussing their implementation challenges and evaluating their accuracy and cryptographic overhead across CPCL paradigms. Additionally, we implement the identified secure noise sampling options in MPC and evaluate their computation and communication costs in WAN and LAN settings. Finally, we propose future research directions based on key observations, gaps, and possible enhancements identified in the literature.

Index Terms: Differential privacy, cryptography, collaborative machine learning

* Work done while he was at SAP SE.

In the central model of DP (CDP), users send raw data to a trusted third party (TTP), which adds noise to the computation output. To avoid sharing raw data, in the local model (LDP) each user adds noise to its own data. However, LDP yields lower accuracy than CDP.
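The accuracy gap between the central and local models can be made concrete with a small numerical sketch. The following toy example is not from this work: it uses the standard Gaussian mechanism with illustrative parameters (eps = 1, delta = 1e-5) on a mean query over values in [0, 1]. The point it demonstrates is that CDP adds a single noise draw to the aggregate, so the noise shrinks like sigma/n, while LDP adds one draw per user, so it shrinks only like sigma/sqrt(n).

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_sigma(sensitivity, eps, delta):
    # Classic analytic calibration for the Gaussian mechanism
    # (a sufficient, not tight, bound).
    return sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / eps

def central_dp_mean(values, eps=1.0, delta=1e-5):
    # CDP: a trusted aggregator sums the raw values and adds ONE noise draw.
    # Each value lies in [0, 1], so the sum has sensitivity 1.
    sigma = gaussian_sigma(1.0, eps, delta)
    noisy_sum = np.sum(values) + rng.normal(0.0, sigma)
    return noisy_sum / len(values)

def local_dp_mean(values, eps=1.0, delta=1e-5):
    # LDP: every user perturbs its own value before sharing, so the
    # aggregate carries n independent noise draws instead of one.
    sigma = gaussian_sigma(1.0, eps, delta)
    noisy = values + rng.normal(0.0, sigma, size=len(values))
    return np.mean(noisy)

values = rng.uniform(0.0, 1.0, size=10_000)
true_mean = values.mean()
# Error of central_dp_mean(values) is on the order of sigma/n,
# error of local_dp_mean(values) on the order of sigma/sqrt(n).
```

Secure sampling of CDP noise without a TTP, as surveyed in this work, aims to keep the first error behavior while removing the trusted aggregator.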
For example, Google's LDP telemetry system [6] failed to detect a common signal among 1 million users, despite billions of user reports. Enhancing cryptographic collaborative learning with DP to realize cryptographic and differentially private collaborative learning (CPCL) provides: (I) input confidentiality via cryptography, (II) output privacy via DP, and (III) high accuracy via secure sampling of CDP noise without a TTP.

CPCL is an emerging topic with growing interest from academia [7]-[11] and industry [12]-[14]. For example, Google's large-scale deployment [13] enables next-word prediction for Gboard keyboards by aggregating masked noisy information from clients to satisfy DP. Apple uses DP and secure aggregation to learn which scenes iOS users photograph most often in order to create personalized Memories.

While various SoKs [16]-[18] cover cryptographic CL, the integration of DP into cryptographic CL lacks a comprehensive systematization of key techniques, design considerations, and trade-offs. This work bridges the gap by introducing a comprehensive framework for CPCL, analyzing secure noise sampling techniques, and evaluating performance-accuracy trade-offs across learning paradigms.

From our systematization, we identify two main learning paradigms: federated learning (FL) and outsourced learning (OL). In FL, data-holding clients iteratively encrypt and send local model updates to servers that aggregate them into a global update (Sec. IV-C). In contrast, in OL clients send their encrypted data to servers, which train on the pooled encrypted data (Sec. IV-D).

While cryptography and DP offer complementary guarantees, their integration is challenging, since both introduce performance overhead and accuracy trade-offs:

Cryptography performance overhead: cryptographic techniques incur high communication/computation costs, e.g., computation-intensive HE or communication-intensive MPC. Overhead also depends on the learning paradigm: Sec. VI shows that OL can be 10³× slower than plaintext training, while FL is 10× faster than OL but leaks intermediate model updates.

Cryptography performance-accuracy trade-off: cryptographic techniques rely on fixed-point arithmetic, trading accuracy for efficiency [2] and introducing numerical errors. OL approximates non-linear operations, e.g., Softmax, affect-

This work has been accepted for publication at the IEEE Conference on Secure and Trustworthy Machine Learning (SaTML 2026). The final version will be available on IEEE Xplore.
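The fixed-point numerical error mentioned above can be illustrated outside any cryptographic framework. The sketch below is a toy under assumed parameters (16 fractional bits, arithmetic modulo 2^64, sign-aware truncation); real MPC/HE frameworks perform this truncation on secret-shared or encrypted values, often probabilistically, but the rounding loss is of the same nature.

```python
# Fixed-point encoding in the spirit of MPC/HE frameworks: reals are scaled
# by 2^F and stored as integers modulo 2^K. Multiplying two encodings doubles
# the scale, so the product must be truncated back by F bits -- this drops
# low-order bits and is one source of the numerical error discussed above.
F = 16          # fractional bits (assumed precision)
K = 64          # ring size: all arithmetic is modulo 2^K
MOD = 1 << K

def encode(x):
    return int(round(x * (1 << F))) % MOD

def to_signed(v):
    # Interpret the top half of the ring as negative values.
    return v - MOD if v >= MOD // 2 else v

def decode(v):
    return to_signed(v) / (1 << F)

def fp_mul(a, b):
    # The signed product has scale 2^(2F); an arithmetic right shift by F
    # bits returns it to scale 2^F, discarding the low-order bits.
    prod = to_signed(a) * to_signed(b)
    return (prod >> F) % MOD

x, y = 3.14159, -2.71828
approx = decode(fp_mul(encode(x), encode(y)))
err = abs(approx - x * y)   # on the order of 2^-F, i.e. roughly 1e-5..1e-4
```

Larger F reduces this rounding error but leaves fewer integer bits before values wrap around the ring, which is one concrete face of the accuracy-performance trade-off analyzed in this work.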