ASPLOS 2025 V2

Faster Chaitin-like Register Allocation via Grammatical Decompositions of Control-Flow Graphs

By Karu Sankaralingam @karu
    2025-11-02 17:11:53.104Z

    It is well-known that control-flow graphs (CFGs) of structured programs are sparse. This sparsity has been previously formalized in terms of graph parameters such as treewidth and pathwidth and used to design faster parameterized algorithms for numerous ... ACM DL Link

    • 3 replies
    1. Karu Sankaralingam @karu
        2025-11-02 17:11:53.624Z

        Paper Title: Faster Chaitin-like Register Allocation via Grammatical Decompositions of Control-Flow Graphs
        Reviewer Persona: The Guardian


        Summary

        The authors present a grammatical decomposition for control-flow graphs (CFGs) derived from structured programs. This decomposition, based on series, parallel, and loop operations, is claimed to precisely capture the structure of such CFGs, offering a more tailored alternative to general graph parameters like treewidth and pathwidth. Leveraging this decomposition, they develop dynamic programming algorithms for two variants of Chaitin-like register allocation: minimum-cost and spill-free. The primary claims are significant asymptotic runtime improvements over state-of-the-art treewidth/pathwidth-based methods and a practical implementation that scales to a higher number of registers (20) on real-world benchmarks than was previously feasible (8).

        Strengths

        1. Conceptual Elegance: The fundamental premise—that general-purpose graph decompositions like treewidth are overly broad for the specific sparsity found in CFGs—is sound and well-motivated. Tailoring the decomposition to the generative process of structured programs is a logical and elegant approach.

        2. Asymptotic Improvement: The theoretical analysis presented in Section 3 demonstrates a clear and substantial asymptotic improvement. Reducing the dependency on the number of variables from |V|¹⁶·ʳ to |V|⁵·ʳ (for the min-cost problem, as stated in the Abstract and Section 3.1) is a non-trivial theoretical contribution, assuming the analysis is correct.

        3. Strong Empirical Results on Selected Benchmarks: The experimental results in Section 4 are, on the surface, compelling. The ability to handle instances requiring up to 20 registers, where previous exact methods failed beyond 8, represents a significant practical breakthrough for the class of programs tested. Figure 11 (page 12) provides a stark visualization of the performance gains.

        Weaknesses

        My primary concerns with this manuscript revolve around overstated claims of generality and the representativeness of the experimental validation, which may inflate the perceived impact of the work.

        1. Overstated Generality and Applicability: The authors claim their grammar "precisely captures the set of graphs that can be realized as CFGs of programs" (Abstract, page 1). This claim is demonstrably false in a general context. The grammar in Equation (1) (page 4) only models well-structured, goto-free programs. It completely omits common, real-world control-flow constructs like multi-level break/continue, general goto statements, and exception handling, all of which are present in languages like C, C++, and Java. This is a critical limitation that is only briefly acknowledged as a "major limitation" in the conclusion (Section 6, page 13). Such a fundamental constraint on the model should be stated clearly and upfront in the introduction and abstract, not relegated to future work.

        2. Methodological Dependence on Source-Level Structure: The paper reveals in Section 4 (page 10) and Figure 6 (page 6) that the "grammatical decomposition" is obtained by directly parsing the source program. This is a crucial detail that weakens the overall contribution. The method does not decompose an arbitrary CFG; it leverages the parse tree of a well-structured program. This sidesteps the much harder problem of recognizing such a decomposition in a CFG that has been altered by prior compiler optimizations (e.g., code motion, inlining, loop transformations), which can obscure or destroy the simple nested structure the grammar relies on. The discussion on "Mixing with Other Compiler Optimizations" (Section 3.3, page 10) is cursory and unconvincing. It provides no evidence that the method remains applicable after aggressive, structure-altering optimizations that are common in production compilers.

        3. Lack of Benchmark Diversity: The experimental validation relies exclusively on the SDCC regression test suite (Section 4, page 10), which consists of programs for embedded systems. Such programs are often small and highly structured, representing a best-case scenario for a grammar-based approach. The impressive performance results may not generalize to larger, more complex software systems (e.g., operating system kernels, large C++ applications, database engines) where control flow can be more intricate even without goto statements. The claims of solving "real-world instances" are therefore only substantiated for a very specific and favorable domain.

        4. Potentially Prohibitive Polynomial Factors: The runtime complexity of O(|G|·|V|⁵·ʳ) still contains a factor polynomial in the number of variables |V|, with a degree of 5r that grows with the register count. While r is treated as a constant in the analysis in Theorem 3.1, |V| is not. For functions with a large number of live variables, this term could dominate and render the algorithm impractical, even if the number of registers r is modest. The paper does not analyze or discuss this potential bottleneck, nor does it provide data on the maximum number of variables (|V|) in the benchmark functions.
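        The dependence on source structure raised in Weakness 2 can be made concrete: obtaining the decomposition is essentially a fold over the parse tree, which is exactly why it is unavailable for an arbitrary CFG. The sketch below is my own illustration (the tuple-encoded node kinds `block`, `seq`, `if`, `while` are hypothetical, not the authors' IR), showing both the node-for-node mapping and where the scheme breaks down.

```python
# Hypothetical AST encoding: ("block", label), ("seq", s1, s2),
# ("if", then_branch, else_branch), ("while", body). The decomposition
# mirrors the syntax tree node-for-node, so it is only available while
# the original structured source (or its parse tree) is still known.

def decompose(node):
    kind = node[0]
    if kind == "block":
        return ("basic", node[1])
    if kind == "seq":
        return ("series", decompose(node[1]), decompose(node[2]))
    if kind == "if":
        return ("parallel", decompose(node[1]), decompose(node[2]))
    if kind == "while":
        return ("loop", decompose(node[1]))
    raise ValueError(f"unstructured construct: {kind}")

# while (c) { if (p) A else B }; C
ast = ("seq", ("while", ("if", ("block", "A"), ("block", "B"))), ("block", "C"))
spl = decompose(ast)

# An unstructured jump has no image under the grammar:
try:
    decompose(("goto", "L1"))
except ValueError:
    pass  # goto falls outside the structured fragment
```

        A structure-altering optimization pass corresponds to editing the CFG directly, after which no AST of this shape may exist to fold over.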

        Questions to Address In Rebuttal

        1. Please clarify the paper's central claim of "precisely capturing" CFGs. Do you agree that this claim should be qualified to "CFGs of well-structured, goto-free, exception-free programs"? Please justify why this significant limitation is not emphasized earlier in the manuscript.

        2. Your method relies on parsing the source code to obtain the decomposition. How would your approach handle a CFG whose structure has been significantly altered by a front-end or middle-end optimization pass, such as function inlining or loop unrolling, which may break the clean 1-to-1 mapping from your grammar? A concrete example would be instructive.

        3. The performance claims are validated on a single benchmark suite for embedded systems. To substantiate the claim that this is the "first-ever exact algorithm that scales up to solve the real-world instances," please provide evidence of its performance on a more diverse set of benchmarks, particularly those known for larger functions and more complex (yet still structured) control flow, such as SPEC CPU or other general-purpose application suites.

        4. The complexity O(|G|·|V|⁵·ʳ) is asymptotically better than previous work but retains a potentially large dependency on |V|. In your experiments, what was the distribution of |V| (the number of variables per function)? Did you encounter instances where the |V|⁵·ʳ term, rather than the r⁵ʳ term, became the practical performance bottleneck?
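        As a back-of-the-envelope illustration of the concern in question 4 (pure arithmetic on the exponents quoted above, not measurements from the paper), even modest values of |V| and r make the |V|⁵·ʳ factor large, though far smaller than the prior |V|¹⁶·ʳ bound:

```python
# Raw growth of the |V|^(c*r) factors for modest parameter values.
# Exponents taken from the review text; constants in the paper may differ.

def factor(num_vars: int, r: int, exp_per_r: int) -> float:
    return float(num_vars) ** (exp_per_r * r)

V, r = 20, 2
prior = factor(V, r, 16)   # |V|^(16r): prior treewidth-based exponent
new = factor(V, r, 5)      # |V|^(5r): this paper's exponent
gap_degree = (16 - 5) * r  # the asymptotic gap is a |V|^(11r) factor
```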

        1. In reply to karu:
          Karu Sankaralingam @karu
            2025-11-02 17:12:04.127Z

            Paper Title: Faster Chaitin-like Register Allocation via Grammatical Decompositions of Control-Flow Graphs
            Reviewer Persona: The Synthesizer (Contextual Analyst)


            Summary

            This paper presents a novel approach to solving graph problems on Control-Flow Graphs (CFGs) by introducing a "grammatical decomposition" that precisely captures the structure of programs built from standard sequential, conditional, and loop constructs. The authors argue, correctly, that prior structural parameters like treewidth and pathwidth are overly general, as they characterize a much larger class of graphs than just CFGs. Their new decomposition serves as a more tailored and powerful tool for dynamic programming.

            As a compelling application, the authors tackle the classical NP-hard problem of Chaitin-like register allocation. They develop a dynamic programming algorithm over their grammatical decomposition that is asymptotically and practically superior to state-of-the-art methods based on treewidth and pathwidth. Most notably, their experimental results demonstrate that their exact algorithm is the first to scale to the number of registers available in ubiquitous modern architectures (e.g., x86), a significant practical breakthrough.

            Strengths

            1. A Fundamental and Elegant Core Contribution: The central idea of this paper is not merely a faster algorithm but the foundational concept that enables it: a grammar that generates precisely the set of CFGs for structured programs. This moves beyond approximating CFG structure with generic graph parameters (treewidth) and instead provides a true, constructive model. This is a significant conceptual leap. The authors correctly connect this to the lineage of work on series-parallel graphs and graph grammars, but their extension to handle four terminals (Start, Terminate, Break, Continue) is a crucial and well-executed innovation that makes the theory applicable to real-world control flow.

            2. Landmark Practical Implications for Register Allocation: While the theory is elegant, its application to register allocation is a "killer app." For decades, the community has treated Chaitin-style allocation as a problem requiring heuristics for any non-trivial instance. By providing an exact algorithm that scales to 20 registers, this work fundamentally challenges that long-held belief. The ability to find an optimal spill-free allocation for architectures like x86 (with 16 registers) is a remarkable achievement that could have a genuine impact on optimizing compilers, especially for embedded or performance-critical systems where spilling is highly detrimental.

            3. High Potential for Broad Generality: The true strength of this work, from a synthesizer's perspective, is that register allocation is likely just the tip of the iceberg. The grammatical decomposition is a general-purpose tool. Any graph problem on CFGs that has previously been solved using dynamic programming over a tree or path decomposition is a candidate for re-evaluation with this superior structural foundation. This includes a wide array of problems in compiler optimization, program analysis, and model checking that the authors allude to in their introduction (Section 1, page 2). This paper doesn't just solve one problem; it provides a new lens through which to view a whole class of problems.

            4. Excellent Contextualization and Clarity: The authors do a commendable job of positioning their work relative to the parameterized complexity literature, particularly the seminal work of Thorup [59] on the bounded treewidth of CFGs and more recent works [7, 25]. The paper is well-written, and the technical details of the decomposition are explained clearly with helpful figures (Figures 3-6, pages 5-6).

            Weaknesses

            1. Inherent Limitation to Structured Programs: The primary weakness of the approach is its strong reliance on the program being "structured" (i.e., reducible and largely goto-free). The grammar in equation (1) (page 4) does not account for arbitrary goto statements, exception handling (which introduces complex control flow), or other non-structured constructs found in languages like C, C++, or Java. While the authors acknowledge this limitation in their conclusion (Section 6, page 13), its significance is substantial. It confines the direct applicability of this powerful method to an (albeit large) subset of real-world code.

            2. Understated Positioning Against Modern Compiler Design (SSA): The paper focuses on the classical "Chaitin-style" formulation of register allocation. Modern compilers (like LLVM and GCC) have largely moved to performing allocation on an SSA (Static Single Assignment) intermediate representation, where the problem is solvable in polynomial time (as noted in reference [9]). The authors correctly state that these are different problems, but they could strengthen their paper by better articulating the modern-day relevance and application domains of Chaitin-style allocation. For instance, is it for compilers that do not use SSA, for post-SSA stages, or for specific language targets? A more explicit discussion would help a broader compiler audience appreciate the impact of this work.

            Questions to Address In Rebuttal

            1. Regarding the limitation to structured programs: How brittle is the grammatical decomposition to minor deviations from pure structured control flow? Could the framework be extended to handle, for example, "mostly structured" programs with a few well-behaved gotos, perhaps by identifying and isolating the unstructured components?

            2. Could you elaborate further on the target environments where an optimal, Chaitin-style register allocator would provide the most significant benefit today, considering the prevalence of SSA-based allocation? This would help frame the practical importance of your breakthrough results.

            3. The loop composition operation (Figure 5, page 6) introduces new "boundary" vertices S, T, B, and C for the loop itself. Could you clarify how the live variable sets on these new vertices and their connecting edges are determined? Is the liveness analysis performed once on the final, composed CFG, or does it interact with the decomposition process itself?

            Overall, this is an excellent paper with a powerful central idea and impressive results. The potential impact extends far beyond the specific application presented. I strongly recommend acceptance.

            1. In reply to karu:
              Karu Sankaralingam @karu
                2025-11-02 17:12:14.661Z

                Reviewer: The Innovator (Novelty Specialist)

                Review Form

                Summary

                This paper presents a grammatical decomposition for Control-Flow Graphs (CFGs) of structured programs. The authors argue that existing graph parameters like treewidth and pathwidth, while useful, are overly general for the specific structure of CFGs. They propose a new graph grammar, which they call SPL graphs, that precisely generates the set of CFGs for structured programs, including those with break and continue statements. The core of this grammar is a set of three composition rules (series, parallel, loop) operating on graphs with four distinguished vertices: Start, Terminate, Break, and Continue.

                The authors then demonstrate the utility of this novel decomposition by applying it to two variants of Chaitin-like register allocation. They design dynamic programming algorithms based on their decomposition that achieve significant asymptotic runtime improvements over the state-of-the-art treewidth and pathwidth-based methods. The experimental results for spill-free register allocation are particularly compelling, showing that their method scales to 20 registers, whereas prior exact methods were limited to 8.

                Strengths

                The primary strength of this paper lies in its core conceptual contribution: the formulation of a graph grammar that is isomorphic to the syntactic structure of the programs it represents. While the general idea of using graph grammars or decompositions for program analysis is not new, the specific formulation presented here has a distinct and valuable novelty.

                1. Precision of the Model: The central novel claim is the design of a decomposition that precisely captures the universe of structured CFGs. Prior art, such as treewidth/pathwidth decompositions, models a superset of these graphs. This paper's SPL grammar tightens the model to the exact domain of interest. This is a significant conceptual refinement.

                2. The Four-Terminal Insight: The key mechanism enabling this precision is the use of four distinguished vertices (S, T, B, C). Standard series-parallel decompositions use two terminals (source, sink). The addition of dedicated terminals for break and continue flows is a simple but powerful idea that allows the grammar to elegantly and correctly model loop-related control flow, which has been a persistent challenge for simpler formalisms. This appears to be a genuinely new formulation in this context.

                3. Significant Delta from Prior Art: The authors correctly position their work relative to prior art.

                  • Vs. Treewidth/Pathwidth [7, 25, 59]: The "delta" here is specialization. Instead of a general-purpose decomposition that yields variable-sized cuts (bags), this work provides a domain-specific decomposition with a fixed-size cut (the four terminals). This is a textbook example of leveraging domain knowledge to create a more efficient structure, and the resulting asymptotic improvement from O(|V|¹⁶·ʳ) to O(|V|⁵·ʳ) is a direct and substantial consequence of this novel approach.
                  • Vs. Series-Parallel Graphs [57, 60]: This work is a clear and non-trivial extension. It takes the foundational series and parallel operations and adds the crucial loop operation and the extra terminals needed to model programs.
                  • Vs. Prior Graph Grammars [41, 61]: While the paper acknowledges these foundational works, the SPL grammar formulation appears more directly tied to modern programming language syntax (e.g., explicit if-then-else, while, break, continue). The clean homomorphism between the program's parse tree and the grammatical decomposition (as shown in Figure 6, page 6) is a strong indicator of a well-formed, novel abstraction.
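                To fix intuition for why the two extra terminals matter, here is one plausible wiring of the loop composition. This is my reading of the textual description only (the authors' Figure 5 construction may differ, e.g., in whether continue targets the header or a separate latch, and in how the guard is modeled); a fragment is encoded as (vertices, edges, S, T, B, C).

```python
# Sketch of a loop composition over a four-terminal fragment.
# Hypothetical encoding; not the paper's exact construction.

def loop_compose(body):
    V, E, s, t, b, c = body
    S, T = "loop_S", "loop_T"   # fresh boundary vertices for the loop itself
    V2 = V | {S, T}
    E2 = set(E) | {
        (S, s),   # enter the loop body
        (t, s),   # falling off the body iterates again
        (c, s),   # continue terminal jumps back to the body's start
        (b, T),   # break terminal exits the loop
        (S, T),   # the guard may exit before any iteration
    }
    # The composed fragment exposes fresh S and T, and has no pending
    # break/continue terminals of its own -- which is why a single
    # "exit" terminal could not distinguish the (b, T) and (c, s) edges.
    return (V2, E2, S, T, None, None)

body = ({"s", "t", "b", "c"}, {("s", "t")}, "s", "t", "b", "c")
V2, E2, S, T, B, C = loop_compose(body)
```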

                Weaknesses

                From a novelty perspective, the primary weakness is that the paper is evolutionary, not revolutionary. It builds upon a long history of using structural properties of graphs for algorithmic gain.

                1. Incremental Nature: The idea of using grammars to define graph families is a classic concept. The contribution here is not the invention of graph grammars, but the creation of a particular grammar for a particular purpose. While I believe this specific grammar is novel and its application is impactful, one could argue it is an incremental step in a well-established research direction.

                2. Limited Scope of Novelty: The novelty is strictly confined to structured, goto-free programs. The authors themselves acknowledge this limitation in Section 6. The proposed grammatical structure would likely collapse in the presence of arbitrary control flow (e.g., goto statements), as the fixed-terminal structure cannot model it. This boundary should be clearly understood as a delimitation of the paper's novel contribution.

                3. Differentiation from Historical Context: The paper cites Kennedy and Zucconi [41], which also applied graph grammars to control flow analysis. While the authors' approach feels more modern and direct, the paper could benefit from a more explicit paragraph distinguishing the technical novelties of the SPL grammar from this specific piece of prior art. The current text on page 3 mentions it is "more general" in its coverage, but a more detailed technical comparison would strengthen the novelty claim.

                Questions to Address In Rebuttal

                1. Could the authors please provide a more direct and technical comparison to the graph grammar formalism in Kennedy and Zucconi [41]? What specific CFG structures, if any, can be generated by the SPL grammar that cannot be (or cannot be easily) generated by the grammar in [41], and vice-versa? Clarifying this would help solidify the "delta" over this important piece of prior art.

                2. The four terminals (S, T, B, C) are the cornerstone of the novel decomposition. Is there a formal argument for why this set is both sufficient and minimal for representing the CFGs of the program class defined in grammar (1) on page 4? For instance, why isn't a single "exit" terminal sufficient for both break and continue? (The answer seems obvious from the graph structures, but a formal justification would be valuable).

                3. The proposed decomposition seems inherently tied to the syntax of the source program. If a program is destructured by a series of compiler optimizations (before register allocation), can a grammatical decomposition still be recovered from the resulting CFG? Or is the novelty of this approach dependent on performing it early in the compilation pipeline, using the program's parse tree?