Sarus Technologies has made significant strides in developing a privacy-preserving backbone for dataset manipulation: a core infrastructure designed to run remote data-analysis jobs securely. The fundamental principle driving this innovation is the "computation-to-data" paradigm, which ensures that sensitive information remains within its controlled environment while analyses are brought to the data. This approach aims to resolve the "innovation-privacy dilemma," enabling robust analytics and AI model training without compromising individual privacy, a critical shift from traditional data-sharing models that proved insufficient against re-identification risks.
A foundational achievement of this backbone is its sophisticated mechanism for tracing individual "Privacy Units" (PUs) across complex data transformations, including those spanning multiple rows and tables in relational databases. This capability is crucial for providing meaningful user-level Differential Privacy (DP) guarantees, which protect all data associated with a single individual as a unified entity. Furthermore, the system supports recursive DP compilation and incorporates a meticulous privacy accountant that tracks cumulative privacy loss (ϵ,δ) across iterative analytical workflows, ensuring that the predefined total privacy budget is never exceeded, even through repeated queries.
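The budget-tracking behavior of such a privacy accountant can be sketched in a few lines. This is a minimal illustration using basic sequential composition of (ε, δ) costs; the class and method names are hypothetical and do not reflect the Sarus implementation, which uses tighter composition accounting.

```python
from dataclasses import dataclass

@dataclass
class PrivacyAccountant:
    """Illustrative accountant: tracks cumulative (epsilon, delta) loss
    under basic sequential composition and refuses queries that would
    exceed the predefined total budget."""
    epsilon_budget: float
    delta_budget: float
    epsilon_spent: float = 0.0
    delta_spent: float = 0.0

    def can_run(self, epsilon: float, delta: float) -> bool:
        # A query is admissible only if charging it keeps both
        # cumulative totals within budget.
        return (self.epsilon_spent + epsilon <= self.epsilon_budget
                and self.delta_spent + delta <= self.delta_budget)

    def charge(self, epsilon: float, delta: float) -> None:
        if not self.can_run(epsilon, delta):
            raise RuntimeError("privacy budget exhausted")
        self.epsilon_spent += epsilon
        self.delta_spent += delta

acct = PrivacyAccountant(epsilon_budget=1.0, delta_budget=1e-5)
acct.charge(0.4, 0.0)          # first query
acct.charge(0.4, 0.0)          # second query
print(acct.can_run(0.4, 0.0))  # a third query of the same cost would overspend
```

In practice, production accountants use tighter composition theorems (e.g. Rényi-DP accounting) rather than naive summation, but the enforcement logic, charge before execution and refuse when the budget would be exceeded, is the same.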
In the realm of Differentially Private SQL (DP-SQL), Sarus introduced Qrlew, an open-source library that functions as a SQL-to-SQL rewriter. This innovation allows a standard SQL query to be intercepted, transformed into a semantically faithful but differentially private version, and then compiled back into standard SQL for execution on any existing SQL datastore. Qrlew employs a dedicated intermediate representation (IR) called "Relation" and utilizes advanced range propagation techniques such as k-Intervals and piecewise-monotonic functions to tightly bound query sensitivity, thereby minimizing the noise added and maximizing utility. It also provides a flexible, declarative language for data owners to specify how individuals are identified across complex relational schemas, enabling true user-level privacy.
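The idea behind range propagation can be illustrated with a small sketch: track [lo, hi] bounds for a column through monotonic transformations, then use the resulting bounds to calibrate Laplace noise for a SUM query. This mirrors the interval-propagation concept only; the function names are illustrative and not Qrlew's actual API.

```python
def prop_monotonic(interval, f, increasing=True):
    """Propagate an interval through a monotonic function: the image of
    [lo, hi] under an increasing f is [f(lo), f(hi)]."""
    lo, hi = interval
    return (f(lo), f(hi)) if increasing else (f(hi), f(lo))

def prop_sum(a, b):
    """Interval arithmetic for element-wise addition of two columns."""
    return (a[0] + b[0], a[1] + b[1])

# A column known (or clipped) to lie in [0, 100]; the query computes
# SUM(col / 10) over a privacy unit contributing at most one row.
col = (0.0, 100.0)
scaled = prop_monotonic(col, lambda x: x / 10)        # bounds become (0.0, 10.0)

# With one row per privacy unit, the L1 sensitivity of the SUM is
# bounded by the largest absolute value the expression can take.
sensitivity = max(abs(scaled[0]), abs(scaled[1]))     # 10.0

# Laplace noise is then calibrated to sensitivity / epsilon.
epsilon = 1.0
noise_scale = sensitivity / epsilon
print(noise_scale)
```

Tighter bounds propagate directly into less noise: if the rewriter can prove the expression lies in a narrower interval, the calibrated noise scale shrinks proportionally, which is exactly why precise range analysis improves utility.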
For Differentially Private Artificial Intelligence (DP-AI), Sarus has addressed the formidable challenges of applying DP-SGD (Differentially Private Stochastic Gradient Descent) to large-scale models, particularly Large Language Models (LLMs). The core innovation lies in a technology stack that combines DP-SGD with Parameter-Efficient Fine-Tuning (PEFT) methods, such as Low-Rank Adaptation (LoRA) and QLoRA, alongside optimizations like quantization and gradient checkpointing. This makes DP fine-tuning practical and efficient for enterprise applications, with empirical validation demonstrating a successful utility-privacy trade-off, including experimental support for models like Mistral 7B and Llama2 7B. Additionally, Sarus developed DP-RAG, a novel framework for Differentially Private Retrieval-Augmented Generation, applying privacy mechanisms to both document retrieval and response generation to mitigate data leakage risks.
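The mechanics of a single DP-SGD update, per-example gradient clipping followed by calibrated Gaussian noise, can be sketched in plain Python. This is a didactic toy, not the Sarus stack: real fine-tuning would apply this logic only to the small set of LoRA adapter parameters via a library such as Opacus, which is where the efficiency gain of combining DP-SGD with PEFT comes from.

```python
import math
import random

def dp_sgd_step(per_example_grads, clip_norm, noise_multiplier, lr, params):
    """One DP-SGD step on flat parameter vectors:
    1. clip each example's gradient to L2 norm <= clip_norm,
    2. average the clipped gradients,
    3. add Gaussian noise with std clip_norm * noise_multiplier / n,
    4. take a gradient-descent step."""
    clipped = []
    for g in per_example_grads:
        norm = math.sqrt(sum(x * x for x in g))
        scale = min(1.0, clip_norm / (norm + 1e-12))
        clipped.append([x * scale for x in g])
    n = len(clipped)
    noisy_mean = [
        sum(g[i] for g in clipped) / n
        + random.gauss(0.0, clip_norm * noise_multiplier) / n
        for i in range(len(params))
    ]
    return [p - lr * g for p, g in zip(params, noisy_mean)]

params = [0.5, -0.2]
grads = [[3.0, 4.0], [0.1, 0.1]]  # first example's gradient has norm 5 -> clipped
new_params = dp_sgd_step(grads, clip_norm=1.0, noise_multiplier=1.0, lr=0.1,
                         params=params)
```

Clipping bounds any single individual's influence on the update, which is what lets the accountant convert the noise multiplier and number of steps into a concrete (ε, δ) guarantee; applying it per adapter parameter rather than per full-model parameter is what keeps memory and compute tractable at 7B scale.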
The enterprise readiness of Sarus's privacy backbone is demonstrated through its integration with Azure Confidential Clean Rooms. This creates a two-layer defense-in-depth architecture: Azure Confidential Clean Rooms provide hardware-level protection using confidential computing, while the Sarus backbone acts as an automated, dynamic application-level privacy enforcement layer within this enclave. This collaboration, publicly announced with Microsoft and EY, replaces slow, manual pre-approval processes with real-time, automated privacy enforcement, enabling agile multi-party collaboration on highly sensitive data, such as financial crime detection with Canadian banks.
Overall, these achievements deliver a robust, practical system that bridges the gap between theoretical Differential Privacy and the complex demands of enterprise data science. Sarus has also released open-source Python libraries, such as arena-ai and structured-logprobs, to foster trust and broader adoption of privacy-enhancing technologies. Future work includes expanding DP mechanisms in Qrlew, improving the efficiency and utility of private LLM training and inference, and exploring new applications for this composite architecture with confidential computing.