Cassandra: Efficient Enforcement of Sequential Execution for Cryptographic Programs

2025-08-04 19:31:55.396Z

Link: https://dl.acm.org/doi/10.1145/3695053.3731048
Abstract: Constant-time programming is a widely deployed approach to harden cryptographic programs against side channel attacks. However, modern processors often violate the underlying assumptions of standard constant-time policies by transiently executing unintended paths of the program. Despite many solutions proposed, addressing control flow misspeculations in an efficient way without losing performance is an open problem.
In this work, we propose Cassandra, a novel hardware/software mechanism to enforce sequential execution for constant-time cryptographic code in a highly efficient manner. Cassandra explores the radical design point of disabling the branch predictor and recording-and-replaying sequential control flow of the program. Two key insights that enable our design are that (1) the sequential control flow of a constant-time program is mostly static over different runs, and (2) cryptographic programs are loop-intensive and their control flow patterns repeat in a highly compressible way. These insights allow us to perform an upfront branch analysis that significantly compresses control flow traces. We add a small component to a typical processor design, the Branch Trace Unit, to store compressed traces and determine fetch redirections according to the sequential model of the program. Despite providing a strong security guarantee, Cassandra counterintuitively provides an average speedup compared to an unsafe baseline processor, mainly due to enforcing near-perfect fetch redirections.

Karu Sankaralingam @karu
2025-08-04 19:32:06.706Z
Review of the paper "Cassandra: Efficient Enforcement of Sequential Execution for Cryptographic Programs," written from the perspective of "The Guardian."

Review Form

Summary

This paper identifies the security gap created by speculative execution in processors running constant-time cryptographic code. To address this, the authors propose CASSANDRA, a hardware/software co-designed mechanism that aims to enforce strictly sequential execution for these sensitive code regions. The approach involves disabling the branch predictor for cryptographic code and instead using a pre-recorded, compressed trace of the program's sequential control flow to guide instruction fetch. This is managed by a new hardware component, the Branch Trace Unit (BTU). The authors claim this method not only provides strong security against control-flow speculation attacks but also results in a minor performance improvement due to near-perfect fetch redirection.
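
To make the record-and-replay idea concrete, here is a minimal sketch, assuming a toy loop whose branch outcomes are recorded once and then prescribed on later runs. Run-length encoding is swapped in as a simple stand-in for the paper's compression scheme, which this review does not describe. The names `record_trace`, `compress`, and `BranchTraceReplayer` are hypothetical and are not taken from the paper.

```python
from itertools import groupby


def record_trace(n_rounds):
    """Toy loop-intensive routine: record each branch outcome (True = taken)."""
    trace = []
    for _ in range(n_rounds):
        trace.append(True)   # loop back-edge is taken on every round
    trace.append(False)      # final check falls through and exits the loop
    return trace


def compress(trace):
    """Run-length encode the trace as (outcome, count) pairs."""
    return [(outcome, len(list(run))) for outcome, run in groupby(trace)]


class BranchTraceReplayer:
    """Hypothetical stand-in for the BTU: prescribes outcomes instead of predicting them."""

    def __init__(self, compressed):
        self.runs = list(compressed)
        self.index = 0
        self.remaining = self.runs[0][1] if self.runs else 0

    def next_outcome(self):
        if self.index >= len(self.runs):
            raise RuntimeError("trace exhausted")
        outcome = self.runs[self.index][0]
        self.remaining -= 1
        if self.remaining == 0:
            # Current run is used up: advance to the next (outcome, count) pair.
            self.index += 1
            if self.index < len(self.runs):
                self.remaining = self.runs[self.index][1]
        return outcome


trace = record_trace(10)
compressed = compress(trace)
btu = BranchTraceReplayer(compressed)
replayed = [btu.next_outcome() for _ in trace]
```

In this toy case the eleven recorded outcomes collapse into two run-length pairs, and replaying them reproduces the recorded sequence exactly, which is the property the fetch stage would rely on.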

Strengths

The paper is well written and founded on a correct and important premise.

Clear Problem Identification: The paper correctly identifies a critical flaw in the threat model of modern processors: standard constant-time programming principles, which assume sequential execution, are fundamentally broken by speculative and transient execution, as demonstrated by a long line of Spectre-style attacks (Section 1, Page 1).

Logical High-Level Approach: The idea of isolating cryptographic code and treating it with a separate, more secure execution policy is a sound and pragmatic approach. Rather than attempting to secure the entire processor for all applications, CASSANDRA focuses on the small, critical code sections that require the highest level of assurance.

Weaknesses

Despite its sound motivation, the paper's claims of security and performance are built upon a foundation of questionable assumptions, an incomplete threat model, and flawed baseline comparisons.

Unrealistic Security Assumption: The entire security premise of CASSANDRA rests on the integrity of a single, pre-recorded "golden" control flow trace. The paper assumes this trace is generated in a secure environment, but fails to adequately address how to guarantee this. If an attacker can influence the program's inputs or execution environment during this initial trace-generation run (e.g., by manipulating public parameters or system state), they could potentially "poison" the trace. A poisoned trace would cause the hardware to enforce a malicious control flow path, turning the defense mechanism itself into an attack vector. The paper's assertion that the trace is independent of secrets (Section 8, Page 11) is true for constant-time code, but it ignores the possibility of malicious influence via public inputs during the critical tracing phase.
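
The distinction drawn here between secrets and public inputs can be illustrated with a small sketch, assuming a toy constant-time loop whose iteration count is a public parameter. It is not an attack and not the paper's tracing tool. `toy_ladder`, `ct_select`, and `branch_trace` are hypothetical names introduced only for illustration.

```python
def ct_select(bit, a, b):
    """Branch-free select for non-negative ints: returns a if bit == 1, else b."""
    mask = -bit                      # 0 stays 0, 1 becomes -1 (all ones)
    return (a & mask) | (b & ~mask)


def toy_ladder(secret_bits, public_len, trace):
    """Hypothetical constant-time loop that records its branch outcomes in `trace`."""
    acc = 1
    for i in range(public_len):
        trace.append(True)           # loop branch depends only on public_len
        acc = ct_select(secret_bits[i], acc * 3 % 251, acc * 5 % 251)
    trace.append(False)              # loop exit
    return acc


def branch_trace(secret_bits, public_len):
    trace = []
    toy_ladder(secret_bits, public_len, trace)
    return trace


t1 = branch_trace([0, 1, 0, 1], 4)   # same public length, different secrets
t2 = branch_trace([1, 1, 0, 0], 4)
t3 = branch_trace([0, 1, 0, 1], 2)   # same secret, different public length
```

Here `t1` and `t2` are identical while `t3` differs, so the recorded trace is independent of the secret but is determined by whoever chooses the public parameter during the tracing run.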

Incomplete Threat Model: CASSANDRA's protection is explicitly limited to control-flow speculation. It does nothing to mitigate side channels that arise from data-flow speculation or other microarchitectural timing variations. Even with perfectly sequential control flow, a transient execution could still perform a speculative load from a secret-dependent address, and the timing variations from the resulting cache hit or miss could be used to leak information. The paper dismisses data-flow speculation as having "negligible performance impact" (Section 1, Page 1) but fails to analyze the residual security risk, which is a significant omission.

Flawed Performance Evaluation: The headline claim of a 1.85% average speedup is highly misleading because it is based on a comparison to a suboptimal baseline. The baseline processor uses a standard branch predictor that is known to perform poorly on the irregular, data-dependent control flow patterns common in cryptographic code. CASSANDRA's "speedup" is merely an artifact of replacing a poorly performing branch predictor with a near-perfect one (the trace). A rigorous evaluation would compare CASSANDRA against a processor with a state-of-the-art branch predictor designed for this domain (e.g., a large TAGE predictor or a neural predictor) or against other software and hardware defenses that also aim to control speculation. The performance gain is not evidence of CASSANDRA's superiority, but rather of the baseline's inadequacy.
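
The baseline-sensitivity argument can be sketched with a first-order cycle model, assuming misprediction stalls are the only difference between the two designs. Every parameter value below is made up for illustration and is not taken from the paper's evaluation.

```python
def cycles(instructions, base_cpi, branch_frac, mispredict_rate, penalty):
    """First-order model: base work plus stall cycles caused by mispredictions."""
    mispredicts = instructions * branch_frac * mispredict_rate
    return instructions * base_cpi + mispredicts * penalty


def speedup(baseline_rate, trace_rate=0.0, **kw):
    """Speedup of a trace-driven front end (rate ~ 0) over a predicting baseline."""
    return cycles(mispredict_rate=baseline_rate, **kw) / cycles(mispredict_rate=trace_rate, **kw)


# Hypothetical workload and core parameters.
params = dict(instructions=1_000_000, base_cpi=0.5, branch_frac=0.1, penalty=15)

weak_baseline = speedup(0.01, **params)     # baseline mispredicts 1% of branches
strong_baseline = speedup(0.001, **params)  # baseline mispredicts 0.1% of branches
```

Under these assumed parameters the modeled gain shrinks from about 3% to about 0.3% as the baseline predictor improves, which is the sensitivity a stronger baseline comparison would expose.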

Practicality and Scalability Concerns: The paper's evaluation is limited to a selection of cryptographic primitives (Table 1, Page 4). It is unclear how this approach scales to a large, real-world cryptographic library like OpenSSL, which contains hundreds of algorithms, modes, key sizes, and platform-specific implementations. The software burden of generating, compressing, verifying, distributing, and securely loading the correct trace for every possible cryptographic operation is non-trivial and is not addressed. The proposed mechanism of embedding hints in the binary (Section 4.3, Page 5) adds significant complexity to the compiler and linker toolchain, which may not be practical for all development environments.

Questions to Address In Rebuttal

What specific mechanisms prevent an attacker from influencing the initial trace-generation run? How do you guarantee the integrity of the "golden" trace against an attacker who can control public inputs and observe system behavior during this critical phase?

Given that CASSANDRA only enforces sequential control flow, please provide a security analysis of the remaining attack surface. Specifically, how does your design protect against transient data-dependent loads that can leak information through cache timing channels, even if the control flow is correct?

To provide a fair performance comparison, please evaluate CASSANDRA against a more robust baseline. This baseline should either be a processor with a state-of-the-art branch predictor (e.g., TAGE) or a system implementing alternative mitigations like speculative load hardening (SLH).

How does the CASSANDRA framework propose to manage the trace ecosystem for a full deployment of a library like OpenSSL? What is the total storage footprint for all required traces, and what is the proposed OS-level mechanism to securely manage and load the correct trace for any given cryptographic function call at runtime?
In reply to karu:
Karu Sankaralingam @karu
2025-08-04 19:32:32.307Z
Review of the paper "Cassandra: Efficient Enforcement of Sequential Execution for Cryptographic Programs," written from the perspective of "The Synthesizer."

Review Form

Summary

This paper introduces CASSANDRA, a hardware-software co-design that enforces strictly sequential execution for sensitive cryptographic code, thereby closing the security gap opened by speculative and transient execution. The core idea is to bypass the processor's general-purpose branch predictor for designated cryptographic functions. Instead, a "golden" control flow trace is recorded during a single, trusted execution of the function. This compressed trace is then stored and used by a new, small hardware unit—the Branch Trace Unit (BTU)—to perfectly guide the instruction fetch stage during subsequent executions of that function. This approach aims to restore the simple, sequential execution model that cryptographic developers assume, effectively nullifying control-flow-based transient execution attacks with minimal performance overhead.

Strengths

This paper presents an elegant and conceptually powerful solution to a fundamental problem at the intersection of computer architecture and cryptography. Its true strength lies in its ability to synthesize ideas from different domains into a coherent and pragmatic system.

Bridges the Gap Between Software Model and Hardware Reality: The most significant contribution of this work is that it directly addresses the dangerous disconnect between the software developer's mental model and the physical reality of the hardware. Cryptographers have long relied on "constant-time" programming, which implicitly assumes a sequential execution model. CASSANDRA recognizes that this assumption has been violated by modern processors and proposes to restore it in hardware for the code that needs it most (Section 1, Page 1). This is a much cleaner and more direct solution than the current ad-hoc patchwork of compiler fences and manual code hardening. 💡

A Pragmatic Application of Hardware/Software Co-design: CASSANDRA is an excellent example of a well-balanced hardware/software co-design. It doesn't attempt the impossible task of securing the entire processor from all speculation; instead, it provides a specialized execution mode for critical code. By combining lightweight hardware additions (the BTU) with a software-driven process (trace generation and binary annotation via cass-cc), it provides a targeted security enhancement that is far more practical than a wholesale redesign of the processor core (Section 4, Page 4). This approach is reminiscent of other successful co-design security features like Intel's Control-flow Enforcement Technology (CET).

Re-purposing a Performance Primitive for Security: The idea of using a trace to guide the fetch engine is not entirely new; trace caches have been used for decades as a performance optimization. The beauty of CASSANDRA is in its re-purposing of this concept for security. Instead of predicting branches, it prescribes them based on a trusted trace. This not only solves the security problem of mis-prediction but, as a side effect, also improves performance for cryptographic code whose branch patterns are inherently hard to predict (Table 3, Page 8). This dual benefit of enhanced security and slight performance improvement makes the proposal particularly compelling.

Weaknesses

While the core idea is strong, the paper could be strengthened by broadening its scope and exploring the ecosystem-level challenges that CASSANDRA would introduce.

The "Trace Oracle" Problem: The security of the entire system is predicated on the integrity of the initial "golden" trace. The paper assumes a secure environment for this trace generation, but this introduces a new, critical step into the secure software development lifecycle. This "trace oracle" becomes a high-value target. The paper would be more complete if it discussed the practical challenges of securing this process in a real-world development and deployment pipeline. What happens if a developer's machine is compromised? How are traces for different library versions managed and verified?
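
One way to approach the verification question is to bind each trace to the function and library version it was recorded for, and to check a digest before loading. This is a minimal sketch, assuming a manifest of expected digests that arrives over some trusted channel (for example, a signed package). The paper does not describe such a manifest; `trace_digest`, `load_trace`, and the manifest layout are hypothetical.

```python
import hashlib
import hmac


def trace_digest(function_name, library_version, trace_bytes):
    """Bind a trace to the function and library version it was recorded for."""
    h = hashlib.sha256()
    for field in (function_name.encode(), library_version.encode(), trace_bytes):
        h.update(len(field).to_bytes(4, "big"))  # length prefix avoids ambiguous concatenation
        h.update(field)
    return h.hexdigest()


def load_trace(function_name, library_version, trace_bytes, manifest):
    """Return the trace only if its digest matches the trusted manifest entry."""
    expected = manifest.get((function_name, library_version))
    actual = trace_digest(function_name, library_version, trace_bytes)
    if expected is None or not hmac.compare_digest(expected, actual):
        raise ValueError("trace rejected: missing or mismatched digest")
    return trace_bytes


# Hypothetical manifest entry for one function at one library version.
good_trace = b"\x01\x0a\x00\x01"
manifest = {("aes_encrypt", "1.0"): trace_digest("aes_encrypt", "1.0", good_trace)}
loaded = load_trace("aes_encrypt", "1.0", good_trace, manifest)
```

A check like this only detects tampering after the trace was recorded. It does not address a trace that was already wrong when generated on a compromised machine, which is the harder part of the concern raised above.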

A Narrow Focus on Control Flow: CASSANDRA provides a strong defense against attacks that exploit control-flow speculation. However, the broader class of transient execution attacks also includes those that exploit data-flow speculation (e.g., speculative loads from secret-dependent addresses). While the paper focuses on the more common control-flow vector, a discussion of how CASSANDRA could be composed with other defenses (e.g., speculative load hardening) to provide a more comprehensive security solution would be valuable.

Ecosystem and Scalability Challenges: The paper demonstrates CASSANDRA's effectiveness on a set of cryptographic primitives (Table 1, Page 4). The next logical step is to consider a full, complex library like OpenSSL or BoringSSL. This raises significant ecosystem challenges: How would the build system manage trace generation for hundreds of functions across multiple architectures? How would the OS securely manage and load these traces at runtime? Exploring these software and ecosystem challenges is critical for the path to real-world adoption.

Questions to Address In Rebuttal

Your work introduces the concept of a "golden" trace. Looking forward, how do you envision the "secure supply chain" for these traces? For example, should they be generated by the original software vendor, or re-generated by the end-user upon installation, and what are the security trade-offs of each approach?

CASSANDRA solves the control-flow speculation problem elegantly. How do you see it fitting into a larger suite of defenses? Could the BTU be extended to also help mitigate other microarchitectural side channels, such as those related to the memory system?

The paper focuses on cryptographic code. Do you see this "trace-enforced sequential execution" model being beneficial for other domains where predictable, secure execution is critical, such as in trusted execution environments (TEEs) or for safety-critical real-time systems? 🤔

The performance gain from CASSANDRA comes from replacing a poor branch predictor with a perfect one. How does this benefit change if the baseline processor has a more advanced, domain-specialized branch predictor? Does the value proposition of CASSANDRA then become purely security, or is there still a performance argument to be made?
In reply to karu:
Karu Sankaralingam @karu
2025-08-04 19:33:23.460Z
Review of the paper "Cassandra: Efficient Enforcement of Sequential Execution for Cryptographic Programs," written from the perspective of "The Innovator."

Review Form

Summary

This paper introduces CASSANDRA, a hardware/software co-design whose primary novel claim is a new mechanism for enforcing strict, sequential execution for sensitive code regions to defeat control-flow-based transient execution attacks. The core of this proposed mechanism is a new hardware component, the Branch Trace Unit (BTU), which completely replaces the processor's branch predictor for designated functions. The BTU is driven by a pre-recorded, compressed trace of the function's "golden" sequential control flow, which is generated offline and embedded in the binary. The novelty lies in using a prescriptive trace to deterministically guide instruction fetch, rather than relying on predictive or reactive security measures.

Strengths

From a novelty perspective, the core strength of this paper is its unique and elegant mechanism for achieving a well-understood security goal.

Novel Mechanism for Enforcing Sequentiality: The central innovative idea in CASSANDRA is the shift from prediction to prescription. While the goal of mitigating speculation attacks is not new, prior art has focused on either fencing speculation, attempting to improve prediction accuracy, or disabling it entirely at a high performance cost. CASSANDRA proposes a fundamentally different approach: it prescribes the correct sequential path to the fetch engine using a hardware-enforced trace (Section 3, Page 3). This re-purposing of trace-based concepts—historically used for performance in trace caches—for deterministic security enforcement is a significant and novel contribution to the field of microarchitectural security. It creates a new design point between costly, overly broad defenses and incomplete, ad-hoc software mitigations. 💡

Weaknesses

While the core mechanism is novel, its building blocks are adaptations of existing concepts, and the paper's claims of novelty in other areas are not as strong.

Component Technologies are Not Fundamentally New: The work combines several known concepts. The use of program traces to record control flow is a standard technique in performance analysis and debugging. The idea of a small, specialized hardware unit to manage a security-critical task is common. The cass-cc toolchain leverages existing compiler infrastructure and compression algorithms (Section 4, Page 4). The novelty is not in these individual components, but purely in their synthesis into a new security mechanism.

Performance Benefit is a Side Effect, Not a Novel Discovery: The paper reports a modest performance improvement (Table 3, Page 8). However, this is not a novel performance technique. It is a side effect of applying a perfect, oracular branch predictor (the trace) to a class of code (cryptography) that is notoriously difficult for general-purpose predictors to handle due to its data-dependent control flow. The novelty is that a security feature happens to have a positive performance impact, not that using a perfect trace is faster than prediction—that is an expected outcome.

Limited Scope of Novelty: The novelty of CASSANDRA is sharply defined and limited to defeating control-flow speculation. The paper's contribution does not extend to other forms of transient execution, such as those exploiting data-flow speculation (e.g., speculative loads). While this focus is acceptable, it means the novelty must be evaluated within this specific sub-domain, where it represents a new point in the design space but does not solve the entire transient execution problem.

Questions to Address In Rebuttal

The concept of a trace-driven execution has appeared in other contexts (e.g., trace caches for performance, replay-based debugging). What is the fundamental, non-obvious "delta" in the CASSANDRA mechanism that distinguishes it from these prior uses of execution traces in computer architecture?

Can you contrast the novelty of the Branch Trace Unit (BTU) with a simple micro-sequencer? Is the BTU fundamentally different from a small, programmable state machine that is pre-loaded with a sequence of branch targets and directions?

The software toolchain for generating and managing traces is a key part of the co-design (Section 4, Page 4). What is the novel contribution within this toolchain itself, beyond the application of standard tracing, binary analysis, and compression techniques?

If a competitor were to propose a system that used a highly-specialized but still predictive mechanism (e.g., a dedicated neural branch predictor trained only on cryptographic code), CASSANDRA's performance benefit might disappear. In that scenario, does the novelty of your work rest exclusively on the security guarantee of being deterministic rather than probabilistic?