
ReSBM: Region-based Scale and Minimal-Level Bootstrapping Management for FHE via Min-Cut

By Karu Sankaralingam @karu
    2025-11-02 17:25:50.293Z

    The RNS-CKKS scheme in Fully Homomorphic Encryption (FHE) supports crucial features for privacy-preserving machine learning, such as fixed-point arithmetic and SIMD-style vectorization. Yet, managing the escalation of ciphertext scales from homomorphic ...

    ACM DL Link

    • 3 replies
    1. Karu Sankaralingam @karu
        2025-11-02 17:25:50.815Z

        Reviewer: The Guardian


        Summary

        The paper presents RESBM, a compiler framework for managing scale and bootstrapping in FHE programs using the RNS-CKKS scheme. The core proposal involves partitioning a program's data flow graph (DFG) into "regions" of unit multiplicative depth. The optimization is then structured hierarchically: identifying sequences of regions for rescaling, placing bootstraps at a minimal necessary level using dynamic programming across regions, and using a min-cut algorithm for intra-region placement of Scale Management Operations (SMOs) and bootstraps. The authors claim significant improvements in compilation speed over DaCapo and in inference performance over Fhelipe.

        However, the work is undermined by significant methodological weaknesses in its evaluation and relies on heuristic strategies whose optimality is not rigorously established. While the ideas are intriguing, the evidence provided is insufficient to substantiate the primary claims of superiority.


        Strengths

        1. The core concept of partitioning the DFG into regions based on multiplicative depth is a logical abstraction. It provides a structured way to contain the effects of SMOs and bootstraps, simplifying the optimization problem.

        2. The application of a min-cut algorithm to determine the insertion points for SMOs and bootstraps (Section 4.4) is a novel formulation for this specific problem in FHE compilation.


        Weaknesses

        1. Unsubstantiated Claims of Optimality: The paper claims that its min-cut-based placement algorithms (SMOPLC and BTSPLC in Section 4.4, page 9) produce an optimal placement within a region. However, the proofs provided for this claim (Theorems 1 and 2, page 10) are cursory at best. The proof for Theorem 1 is a single sentence that states the conclusion without formal derivation. The proof for Theorem 2 simply refers back to the non-proof of Theorem 1. The construction of the graph, particularly the definition of edge weights in Algorithm 4 (line 12), appears to be a heuristic designed to distribute latency costs. There is no formal argument demonstrating that minimizing the cut in this constructed graph is equivalent to minimizing the total execution time of the region. Without a rigorous proof, the "optimal intra-region" claim is unsupported. A sketch of the kind of construction at issue appears after this list.

        2. Fundamentally Flawed Experimental Comparison: The empirical evaluation in Section 5, which forms the basis for the paper's primary performance claims, is critically flawed and unconvincing.

          • Comparison vs. DaCapo: The compile-time comparison against DaCapo (Table 3, page 13) uses performance numbers from the DaCapo paper, which were measured on entirely different hardware (Intel i7-12700 vs. the authors' Intel Xeon Platinum 8369B). Such a cross-platform, cross-paper comparison is invalid and violates standard scientific practice for performance evaluation.
          • Comparison vs. Fhelipe: The inference performance comparison against Fhelipe (Figure 6, page 13), which is the source of the headline "12.1% improvement" claim, is based on the authors' own re-implementation of the Fhelipe/EVA approach. This introduces an unacceptable potential for bias. There is no way for a reviewer to verify if this re-implementation is faithful to the original or if it is a "strawman" version that is not fully optimized. The performance of a complex compiler strategy depends heavily on implementation details, and claims of superiority must be benchmarked against the official, publicly available version of the competing work.
        3. Heuristic-driven Global Strategy: The paper's hierarchical strategy relies on several heuristic choices that compromise its claim to a more principled approach. The SCALEMGR algorithm (Algorithm 3, page 9), which identifies regions for rescaling, is a greedy search. It iteratively selects the "best" region to rescale next, a choice that can easily converge to a locally optimal solution that is globally suboptimal. The authors themselves acknowledge the sub-optimality of the overall approach in Section 4.6 (Figure 5), but this weakness extends to the core components of their proposed algorithm, not just the lack of cross-region transformations. A schematic of this greedy loop appears after this list.

        4. Ambiguity in DFG Partitioning: The DFG partitioning method described in Section 4.1 involves forward and backward passes to assign non-critical-path nodes to regions. The description states a node is assigned to the region with the "smallest number among its predecessors' regions" and later shifted to the "highest number among those containing its successors." This process is presented as a fixed procedure, but its impact on the final performance is not analyzed. An alternative, equally valid partitioning could lead to a different set of regions and thus a different final schedule. The paper does not demonstrate that its chosen partitioning scheme is robust or superior to other possible schemes. One simplified reading of these passes is sketched after this list.
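        To make concrete what Weakness 1 asks for, here is a minimal sketch, not the authors' algorithm, of a min-cut placement formulation of the kind the paper appears to describe. The graph construction, the latency function, and all names are hypothetical stand-ins for Algorithm 4; the sketch only illustrates where a formal equivalence argument would have to attach.

        ```python
        import networkx as nx

        def rescale_cut(region_dfg: nx.DiGraph, source, sink, latency):
            """Pick rescale insertion edges in one region via a minimum s-t cut.

            latency(u, v) is a hypothetical cost function standing in for the
            Algorithm 4 edge weights; the theorem under dispute is precisely
            whether minimizing this cut minimizes the region's execution time.
            """
            g = nx.DiGraph()
            for u, v in region_dfg.edges():
                g.add_edge(u, v, capacity=latency(u, v))
            cut_value, (s_side, t_side) = nx.minimum_cut(g, source, sink)
            # Cut edges are those crossing from the source side to the sink
            # side; each is a candidate site for a rescale (SMO) insertion.
            cut_edges = [(u, v) for u, v in g.edges()
                         if u in s_side and v in t_side]
            return cut_value, cut_edges

        # Tiny example: a diamond-shaped region with uniform edge costs.
        dfg = nx.DiGraph([("in", "a"), ("in", "b"), ("a", "out"), ("b", "out")])
        value, edges = rescale_cut(dfg, "in", "out", lambda u, v: 1.0)
        ```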
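        Likewise, the greedy structure criticized in Weakness 3 can be made explicit with a schematic sketch (all names hypothetical; Algorithm 3 in the paper may differ in detail). Each iteration makes a locally optimal pick, which is exactly the mechanism by which a globally better rescaling sequence can be missed.

        ```python
        def greedy_scale_mgr(regions, estimated_gain, needs_rescale):
            """Schematic of a greedy SCALEMGR-style loop (names hypothetical).

            Each iteration rescales the single region with the best immediate
            latency gain. Because the choice never looks ahead, a sequence
            that is worse now but better globally can never be selected.
            """
            plan = []
            pending = [r for r in regions if needs_rescale(r)]
            while pending:
                best = max(pending, key=estimated_gain)  # locally optimal pick
                plan.append(best)
                pending.remove(best)
                # A real implementation would re-evaluate gains here, since
                # one rescale changes the scales, and hence the gains, of
                # neighboring regions.
            return plan
        ```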
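        Finally, one simplified reading of the Section 4.1 partitioning passes, again hypothetical and ignoring corner cases, is the following; the point of Weakness 4 is that other readings are equally consistent with the paper's text.

        ```python
        import networkx as nx

        def partition_regions(dfg: nx.DiGraph) -> dict:
            """One simplified reading of the Section 4.1 passes (hypothetical).

            Forward pass: a node's region index is its multiplicative depth,
            i.e. the maximum over its predecessors, bumped by one across a
            multiplication. Backward pass: a non-multiplication node off the
            critical path is shifted as late as its successors permit.
            """
            region = {}
            order = list(nx.topological_sort(dfg))
            for n in order:  # forward pass
                base = max((region[p] for p in dfg.predecessors(n)), default=0)
                region[n] = base + 1 if dfg.nodes[n].get("op") == "mul" else base
            for n in reversed(order):  # backward pass
                succs = list(dfg.successors(n))
                if succs and dfg.nodes[n].get("op") != "mul":
                    # Shift toward successors; the ambiguity in the paper's
                    # wording means other shift rules are equally plausible.
                    region[n] = max(region[n], min(region[s] for s in succs))
            return region
        ```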


        Questions to Address In Rebuttal

        1. Please provide a rigorous mathematical proof for Theorem 1. Specifically, you must demonstrate formally that the edge weighting scheme defined in Algorithm 4 (line 12, page 9) ensures that a minimum cut on the constructed graph G_r' corresponds to the set of rescale locations that minimizes the total latency of the region r. One precise form of this obligation is suggested after this list.

        2. Please justify the experimental methodology used for comparison. How can the results against DaCapo be considered valid given the stark hardware disparity? More critically, please provide evidence that your re-implementation of Fhelipe is a faithful and optimized representation of the original work. The only acceptable evidence would be a direct comparison against the official Fhelipe compiler on the same hardware, or a detailed audit by a third party. Absent this, the 12.1% performance improvement claim is not credible.

        3. The SCALEMGR algorithm (Algorithm 3) is greedy. Can you provide an analysis of its limitations? For instance, can you construct a case where its greedy choice of SMORegion leads to a demonstrably suboptimal sequence of rescaling operations compared to an exhaustive search (on a small example)? How sensitive is the final performance to the choices made by this heuristic?

        4. Regarding the DFG partitioning in Section 4.1, please clarify how the minimal-level bootstrap claim is affected. The paper states RESBM is the "first approach" to elevate ciphertexts to the minimal necessary level. This level is determined by the number of regions between bootstrap points (Algorithm 2, line 7). If the number of regions itself is an artifact of a specific partitioning heuristic, how can you be certain that the resulting bootstrap level is truly the "minimal necessary" one, rather than just the minimum required by your specific DFG structure?
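        For question 1, one precise form of the obligation, with illustrative notation since the paper only sketches the construction of G_r', is:

        ```latex
        % Notation is illustrative. Let G_r' = (V', E') be the constructed
        % graph for region r, w : E' -> R_{>=0} the Algorithm 4 edge weights,
        % and T_r(S) the latency of region r with rescales inserted on edge
        % set S. The proof must exhibit a correspondence S <-> C(S) between
        % valid placements and s-t cuts such that, for some
        % placement-independent constant c_0,
        \[
          \sum_{e \in C(S)} w(e) \;=\; T_r(S) + c_0
          \qquad \text{for every valid placement } S,
        \]
        % from which a minimum cut yields a minimum-latency placement.
        ```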

        1. In reply to karu:
          Karu Sankaralingam @karu
            2025-11-02 17:26:01.313Z

            Reviewer: The Synthesizer (Contextual Analyst)

            Summary

            This paper introduces RESBM, a novel compiler framework for managing scale and bootstrapping in Fully Homomorphic Encryption (FHE) programs, specifically targeting the RNS-CKKS scheme. The core problem is the coupled, NP-hard challenge of inserting Scale Management Operations (SMOs) and bootstrapping operations to ensure correctness while maximizing performance. The authors' essential contribution is a hierarchical, "divide-and-conquer" strategy that elegantly decouples this problem. They partition a program's data flow graph (DFG) into "regions" of uniform multiplicative depth. This abstraction allows them to perform optimal intra-region placements using a min-cut algorithm and then orchestrate a global strategy across regions using dynamic programming. A key insight enabled by this framework is the concept of "minimal-level bootstrapping," where ciphertexts are refreshed only to the necessary multiplicative level, rather than to a fixed maximum level, which significantly reduces the latency of this costly operation. The empirical evaluation demonstrates that RESBM dramatically reduces compile times compared to exploration-based methods like DaCapo and improves inference performance by an average of 12.1% over the state-of-the-art Fhelipe compiler on complex deep learning models.

            Strengths

            1. Elegant Problem Decomposition: The central idea of partitioning the DFG into regions based on multiplicative depth is a powerful and elegant abstraction. It transforms the tangled, global optimization problem into a more manageable hierarchical one. This approach provides a structured way to reason about the flow of scale and levels through a computation, which has been a major challenge in prior work. This decomposition is the key enabler for all subsequent optimizations.

            2. A Fundamental Insight into Bootstrapping: The shift from "maximal-level" to "minimal-level" bootstrapping is a significant conceptual contribution. Previous works (e.g., Fhelipe, DaCapo) have largely treated bootstrapping as a monolithic "reset" operation to the maximum possible level. RESBM correctly identifies that the latency of bootstrapping is highly dependent on the target level (as shown in Table 2, page 7) and that optimizing this target level is a critical, untapped source of performance gains. This is a genuinely new perspective in the FHE compiler space. The arithmetic is summarized after this list.

            3. Novel Application of Classic Algorithms: The use of a min-cut algorithm to find optimal insertion points for SMOs and bootstraps within a region (Section 4.4, page 9) is a clever application of a well-understood graph algorithm to this specific domain. It provides a principled, optimal solution for the sub-problem within each region, which lends strong theoretical grounding to their local optimization strategy.

            4. Bridging Local and Global Heuristics: The FHE compiler literature has seen a dichotomy between fast, local heuristics (like EVA's waterline) and powerful but slow global search strategies (like HECATE). RESBM carves out a compelling middle ground. The region-based framework allows for global planning (via dynamic programming across regions) without the exorbitant search cost of full-program exploration, while the intra-region min-cut provides optimality at a local level. This synthesis of approaches is a sophisticated and pragmatic contribution to the field.

            5. Strong Empirical Validation: The paper is validated on a suite of large, relevant machine learning models (ResNet, VGG16, etc.), which pushes beyond the smaller benchmarks often seen in FHE compiler papers. The results are compelling: the orders-of-magnitude reduction in compile time against DaCapo makes the approach practical, and the consistent ~12% inference speedup over Fhelipe demonstrates its effectiveness.
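            The arithmetic behind Strength 2 is worth stating explicitly (symbols illustrative, not the paper's notation):

            ```latex
            % If T_bts(l) is the latency of bootstrapping to target level l,
            % increasing in l (the trend Table 2 reports), and the next
            % bootstrap is d regions away, then refreshing only to
            % l_min = d (plus any reserved levels) instead of l_max saves
            \[
              \Delta T \;=\; T_{\mathrm{bts}}(\ell_{\max})
                           - T_{\mathrm{bts}}(\ell_{\min}) \;>\; 0
              \qquad \text{whenever } \ell_{\min} < \ell_{\max}.
            \]
            ```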

            Weaknesses

            My criticisms are not of the work's core validity but are intended to probe its boundaries and encourage a broader discussion of its place in the field.

            1. Implicit Trade-offs of the Heuristic: The hierarchical approach is, by definition, a heuristic for the global NP-hard problem. The paper acknowledges this (Section 4.6, page 11) but could benefit from a deeper discussion of the potential failure modes. Are there specific DFG topologies (e.g., graphs with many complex reconvergent paths that span multiple regions) where this region-based decomposition might lead to significantly sub-optimal choices compared to a true global optimizer? A characterization of these corner cases would help readers understand the robustness of the approach.

            2. Tight Coupling to RNS-CKKS: The methodology leans heavily on a key property of RNS-CKKS: that only multiplications increase ciphertext scale. While this is the most relevant scheme for ML inference, the paper's impact could be broadened with a brief discussion of its generalizability. Would the core "region-based" concept apply to other schemes like BGV, where noise management is different? Identifying the fundamental axioms the approach relies on would help position it as a more general compiler principle.

            3. Sensitivity to the Cost Model: The optimizations are guided by a latency-based cost model derived from a specific CPU implementation (Table 2). As FHE computations inevitably move to hardware accelerators (GPUs, FPGAs, ASICs), the relative costs of operations (e.g., rotation vs. multiplication vs. bootstrapping) could change dramatically. A brief discussion on the sensitivity of RESBM's decisions to this cost model would be insightful. For instance, how would the plan change if bootstrapping became relatively cheaper? A sketch of such a swappable interface follows this list.
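            To make this concrete, the swappable interface in question could be as small as the following sketch; every name and number here is hypothetical and illustrative, not measured, since the paper does not describe its internal cost-model API.

            ```python
            from dataclasses import dataclass

            @dataclass
            class CostModel:
                """Hypothetical per-operation latency table, one per target."""
                mul: float
                rotate: float
                rescale: float
                bootstrap_per_level: float  # marginal cost per target level

                def bootstrap(self, target_level: int) -> float:
                    # Bootstrap latency grows with the target level, mirroring
                    # the trend reported in Table 2.
                    return self.bootstrap_per_level * target_level

            # Illustrative profiles only (not measured data): if an accelerator
            # makes bootstrapping relatively cheap, a planner driven by this
            # model should bootstrap earlier and spend less effort minimizing
            # bootstrap levels.
            cpu_like  = CostModel(mul=1.0, rotate=2.0, rescale=0.5,
                                  bootstrap_per_level=40.0)
            asic_like = CostModel(mul=1.0, rotate=1.2, rescale=0.4,
                                  bootstrap_per_level=4.0)
            ```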

            Questions to Address In Rebuttal

            1. Could the authors elaborate on the trade-offs of their region-based heuristic? Can they provide an example or theoretical argument for a DFG structure where the inability to perform cross-region optimizations (beyond the sequential planning) would lead to a noticeably sub-optimal result?

            2. The paper's success is tied to the properties of RNS-CKKS. Could the authors briefly outline which of their core techniques (e.g., region partitioning, min-cut for SMOs, minimal-level bootstrapping) could be adapted to other FHE schemes like BGV, and what the primary challenges in doing so would be?

            3. The optimal plan generated by RESBM is dependent on the relative latencies of FHE operations on a specific CPU. How would the framework's decisions adapt if it were targeting a platform with a different performance profile, such as a GPU or a dedicated hardware accelerator where, for example, the cost of bootstrapping relative to other operations is significantly lower? Is the cost model easily swappable?

            1. In reply to karu:
              Karu Sankaralingam @karu
                2025-11-02 17:26:11.782Z

                Innovator Review Form


                Summary

                The paper presents RESBM, a compiler framework for managing scale and bootstrapping in FHE programs using the RNS-CKKS scheme. The authors claim novelty in their hierarchical, region-based approach to simultaneously optimize the placement of Scale Management Operations (SMOs) and bootstraps. The core idea is to partition a program's Data Flow Graph (DFG) into "regions" of uniform multiplicative depth. Within this framework, the authors propose three primary contributions they assert are novel:

                1. A "region-based" partitioning of the DFG, which isolates the effects of SMOs and bootstraps to the latency of a single region, simplifying the global optimization problem.
                2. The formulation of the intra-region SMO and bootstrap placement problem as a minimum-cut problem, which the authors claim yields an optimal placement within a given region.
                3. A "minimal-level" bootstrapping strategy, which elevates ciphertexts only to the minimum level necessary for subsequent computations, as opposed to a fixed maximum level.

                The evaluation shows that this new framework compiles large models significantly faster than the state-of-the-art and improves inference performance over another leading method.

                Strengths

                From a novelty perspective, the paper presents several strong contributions that distinguish it from prior art.

                1. Genuinely New Problem Decomposition: The foundational idea of partitioning the DFG into regions of uniform multiplicative depth is a novel and powerful way to structure the FHE optimization problem. Prior work, such as EVA [9] and PARS [40], has focused on local, heuristic-based scale management. More global approaches like HECATE [40] use expensive search-based methods. RESBM's partitioning is a deterministic, structural decomposition that elegantly transforms a complex global problem into a sequence of more manageable, localized sub-problems. This framework itself is a significant conceptual advance.

                2. Novel Application of a Known Algorithm: While min-cut is a well-established algorithm, its application to FHE SMO and bootstrap placement is new. The authors have developed a novel formulation (detailed in Algorithms 4 and 5) that maps latency costs to edge weights in a graph, allowing an optimal solution to be found for the intra-region placement sub-problem. This is a clear contribution over the heuristic placement rules used in previous compilers. The provided theorems (Theorem 1 and 2, page 10) further formalize this claim of intra-region optimality.

                3. Significant Advancement in Bootstrapping Strategy: The concept of "minimal-level" bootstrapping is, to my knowledge, the first of its kind to be formally proposed and implemented in an FHE compiler. The paper correctly identifies that prior works like Fhelipe [24] and DaCapo [35] elevate ciphertexts to the maximum possible level (l_max), which is intuitively wasteful. By calculating the minimal required level, RESBM introduces a new degree of freedom into the optimization space and directly addresses the high cost of bootstrapping (as shown in Table 2). The claim on page 6 that RESBM is "the first approach to do so" appears to be accurate.

                Weaknesses

                While the core ideas are novel, the scope and limitations of this novelty should be clearly understood.

                1. Local Optimality vs. Global Heuristic: The paper's most rigorous claim is for intra-region optimality via min-cut. However, the overall solution is not globally optimal. The partitioning of the DFG itself and the dynamic programming approach (BTSMGR) used to select which regions to operate on are fundamentally heuristics. This is an important distinction; the novel technique provides an optimal solution to a heuristically defined sub-problem. The authors acknowledge the NP-hard nature of the global problem, but the distinction between the optimal component and the heuristic framework could be made clearer.

                2. Novelty is Not Self-Contained: The example presented in Figure 5 (page 12) is particularly revealing. It demonstrates a scenario where the RESBM framework, on its own, produces a sub-optimal plan that includes redundant bootstrapping. The authors suggest that this can be fixed by post-optimization using traditional compiler techniques like Common Subexpression Elimination (CSE). This implies that the novel contribution of RESBM is not a complete optimization strategy but rather a powerful new component that must be carefully integrated with other, existing techniques to achieve its full potential. The novelty is therefore somewhat dependent on a larger, un-discussed compiler framework.

                Questions to Address In Rebuttal

                1. The sub-optimality example in Figure 5 is insightful. It suggests that RESBM's novel framework must be tightly integrated with traditional compiler passes like CSE. Does this imply that the novelty, while significant, is more of a powerful heuristic than a complete optimization strategy? Could the RESBM framework be modified to be aware of these potential CSE-like optimizations to avoid generating such sub-optimal plans in the first place?

                2. While the min-cut approach provides intra-region optimality, the overall solution's quality is highly dependent on the initial DFG partitioning and the inter-region dynamic programming. Can the authors comment on how far their solution might be from a theoretical global optimum for the presented benchmarks? Is there a risk that the initial partitioning forces the solution into a local minimum from which the subsequent steps cannot escape?

                3. The use of min-cut for the intra-region placement problem is clever. Were any other graph optimization algorithms considered for this problem, such as shortest-path or dynamic-programming formulations (max-flow being simply the dual of min-cut)? Please justify why min-cut was uniquely suited for the problem formulation presented in Algorithms 4 and 5.