Exo 2: Growing a Scheduling Language
User-
schedulable languages (USLs) help programmers productively optimize
programs by providing safe means of transforming them. Current USLs are
designed to give programmersexactlythe control they want, while automating all other concerns. However, ...ACM DL Link
- KKaru Sankaralingam @karu
Paper Title: Exo 2: Growing a Scheduling Language
Reviewer Persona: The Guardian (Adversarial Skeptic)
Summary
This paper presents Exo 2, a user-schedulable language (USL) designed to be "growable," meaning users can define new scheduling operations and abstractions externally to the compiler. The authors identify three pillars for such a system: actions (transformations), inspection (code queries), and references (pointing to code). They introduce a mechanism called "Cursors" to unify these concepts, aiming to provide stable, relative references that persist across code transformations. The core claim is that this design allows for the creation of powerful, reusable scheduling libraries, which they demonstrate by building libraries for BLAS and image processing. They claim this approach reduces scheduling code by an order of magnitude and delivers performance competitive with state-of-the-art, hand-optimized libraries.
Strengths
Despite my significant reservations, the paper does possess some merits that warrant acknowledgment.
-
A Coherent Conceptual Framework for References: The paper’s primary strength is the taxonomy of reference mechanisms in USLs presented in Section 5. The categorization along the axes of nominal vs. relative, single vs. multiple, and stable vs. one-time is a valuable conceptual contribution. It provides a clear lens through which to analyze and compare the design of any transformative meta-programming system.
-
Demonstrated Expressiveness: The authors successfully demonstrate the expressiveness of their system by re-implementing complex scheduling operations from other languages. The user-space implementation of a Halide-like
compute_at(Section 6.3.2), which requires non-trivial inspection (bounds inference) and action, is a convincing proof of concept. -
Emphasis on Inspection: The paper correctly identifies the lack of robust inspection capabilities as a key limitation in existing USLs (Section 4). Elevating inspection to a first-class primitive is a necessary step towards enabling more intelligent and automated scheduling abstractions, and the example of user-defined bounds inference (Section 5, page 5) is compelling.
Weaknesses
The paper's claims rest on a foundation that, upon close inspection, appears critically flawed. The work suffers from unsubstantiated claims, a fundamental lack of rigor in its core mechanism, and a misattribution of host-language features as system innovations.
-
Critical Flaw in the Core "Cursor" Mechanism: The entire premise of reusable, stable scheduling libraries hinges on the stability of Cursors across transformations. However, the authors concede that this stability relies on "heuristic forwarding rules" (Section 5.2, page 7). This is a fatal flaw. The term "heuristic" implies that the behavior is not formally defined, is likely incomplete, and may be unpredictable. What are these heuristics? How are ambiguous cases—where a cursor points to code that is deleted, merged, or fundamentally altered—resolved? The paper provides no specification. Without a formal, predictable model for reference forwarding, the system's "safety" guarantees are moot. A user may receive a forwarded cursor that points to a semantically unrelated part of the program, leading to baffling errors. This is not a solid foundation for building robust libraries.
-
Grossly Unsubstantiated Claims of Code Reduction: The abstract claims an "order of magnitude" reduction in scheduling code. The evidence provided is wholly inadequate. In Figure 9a, the authors compare the lines of their Python scheduling code to the lines of generated C code. This is a nonsensical comparison. The correct baseline would be the lines of human-written source code in the reference libraries (e.g., the C and assembly code for the corresponding kernels in BLIS or OpenBLAS). Without this direct, apples-to-apples comparison, the claim of code reduction is unsubstantiated and appears to be an artifact of misleading metrics. The comparison in Figure 6c is similarly unconvincing as it compares a Python-based DSL to a C library without context.
-
Overstated Novelty by Conflating System and Host-Language Features: The authors repeatedly present standard metaprogramming features of Python as novel aspects of Exo 2. For instance, defining a reusable
tile2Dfunction (Section 3.2) or creating higher-order combinators likeseqandrepeat(Section 3.4) is trivial in any Python-hosted DSL. These are not contributions of the Exo 2 scheduling model; they are benefits of embedding a DSL in a powerful host language. This misattribution inflates the perceived novelty of the work and distracts from the actual (and, as argued above, flawed) core contribution of Cursors. -
The "Competitive Performance" Claim is Not Rigorously Supported: While the system can generate fast code, the term "competitive" is used to paper over significant performance gaps. The heatmaps in Figure 8 clearly show many cases where Exo 2 is substantially slower than Intel MKL (e.g.,
dgemv_n,sgemv_t,dger), with performance penalties exceeding 30-50%. A rigorous evaluation would honestly report the geometric mean of these results and acknowledge where the system falls short, rather than making a blanket claim of competitiveness. -
Reliance on Exception Handling for Control Flow: The promotion of
try/exceptblocks as a primary mechanism for scheduling control flow (Section 3.3) is problematic. It encourages a "just try it and see" style of programming that obscures the actual preconditions required for a transformation to be valid. A more robust design would provide inspection primitives to query whether a transform can be safely applied, rather than forcing the user to catch aSchedulingError. This feels less like a feature and more like a workaround for insufficient static-checking capabilities.
Questions to Address In Rebuttal
The authors must address the following critical points for this work to be considered for publication:
-
On Cursor Forwarding: Please provide a formal specification or, at a minimum, a comprehensive and unambiguous description of the "heuristic forwarding rules" for Cursors mentioned in Section 5.2. How do you handle cases where a cursor's target is deleted or split? What guarantees, if any, can you provide about the semantic location of a forwarded cursor? Without this, the central mechanism of the paper lacks the necessary rigor.
-
On Code Reduction Claims: Please provide a rigorous, direct comparison to substantiate the "order of magnitude" code reduction claim. This requires measuring the lines of your Python scheduling code against the lines of human-written, high-performance C/assembly source code for the exact same set of kernel variants in a library like BLIS or OpenBLAS.
-
On Distinguishing Contributions: Please clarify which parts of the scheduling library design are enabled specifically by novel features in Exo 2, versus those that are a general property of using Python as a metaprogramming host language.
-
On Performance Evaluation: Please provide a more nuanced summary of the performance results from Figure 8, including geometric means across all problem sizes for each kernel. Can you explain the architectural reasons for the significant performance gaps versus MKL in cases like
dgemv_nanddger?
-
- KIn reply tokaru⬆:Karu Sankaralingam @karu
Reviewer: The Synthesizer (Contextual Analyst)
Summary
This paper presents Exo 2, a user-schedulable language (USL) designed around the central philosophy that scheduling languages should be extensible, or "growable," rather than providing a fixed set of compiler-defined scheduling operations. The authors argue that the diversity of hardware targets and optimization strategies makes it impossible for any fixed set of scheduling primitives to be universally sufficient.
The core contribution is a paradigm shift for USLs, moving from providing high-level scheduling directives to providing a set of fine-grained, trusted, equivalence-preserving primitives. Users can then compose these primitives in user-space code (Python) to build their own higher-level scheduling operations and libraries. The paper introduces a compelling conceptual framework for this, centered on three pillars: actions (code modification), inspection (code interrogation), and references (pointing to code). The technical enabler for this is a novel mechanism called Cursors, which provides stable, relative, and multiple references to object code that persist across transformations.
The authors demonstrate the power of this approach by building extensive scheduling libraries for high-performance computing, including a BLAS library covering over 80 kernels and a library that reproduces the complex scheduling behavior of Halide. Their results show that this approach can reduce scheduling code by an order of magnitude while achieving performance competitive with state-of-the-art, hand-optimized libraries like MKL and OpenBLAS.
Strengths
The primary strength of this paper is its powerful and well-articulated core thesis. It reframes the problem of USL design in a way that resonates deeply with historical lessons from programming language design and formal methods.
-
A Foundational Design Philosophy: The paper's argument for a "growable" language is not merely an engineering choice; it is a profound design philosophy. The authors connect their work to two foundational ideas in computer science: Guy Steele's "Growing a Language" and the LCF approach to theorem proving (Section 7, page 15). This is an exceptionally strong framing. Just as LCF provides a small set of trusted inference rules (axioms) from which complex proofs can be safely constructed, Exo 2 provides a small set of trusted scheduling primitives from which complex, semantics-preserving scheduling strategies can be built. This elevates the work from a description of a system to a proposal for a new, principled way to construct such systems.
-
A Clean Conceptual Model: The decomposition of scheduling language requirements into actions, inspection, and references provides a clear and valuable taxonomy. In particular, the detailed analysis of references (Section 5, page 5) is a significant contribution in its own right. The discussion of nominal vs. relative, single vs. multiple, and stable vs. one-time references provides the community with a much-needed vocabulary for comparing and contrasting the design of different meta-programming systems, from Halide to MLIR's Transform dialect.
-
Powerful and Novel Abstractions (Cursors): The Cursor mechanism is the key technical idea that makes the philosophy practical. The problem of maintaining valid references to a syntax tree while it is being actively transformed is a classic, difficult problem in meta-programming. The Cursor's support for stable, relative references and its well-defined forwarding semantics (Section 5.2, page 6) appears to be an elegant and powerful solution. This is what separates Exo 2 from prior systems that rely on brittle pattern matching or one-time handles that are invalidated after each transformation.
-
Compelling Empirical Evidence: The evaluation is thorough and convincing. It's one thing to propose a new language design; it's another to show that it can be used to build artifacts of significant complexity and performance. The creation of a BLAS library that rivals MKL and a library that can emulate Halide's sophisticated
compute_atlogic (Section 6.3.2, page 12) demonstrates that the primitive set is expressive enough for real-world tasks. The order-of-magnitude reduction in scheduling code (Figure 9, page 11) is a powerful testament to the abstraction power of the approach.
Weaknesses
The weaknesses are less about flaws in the work presented and more about the inherent trade-offs of the proposed approach and questions left for future work.
-
The Oracle Problem of Primitives: The entire LCF-style philosophy hinges on having a "correct" and "complete" set of axioms. The paper demonstrates that its chosen set of 46 primitives is sufficient for the case studies, but it doesn't offer a principled way to determine this set's sufficiency for future hardware or optimization patterns. What happens when a new accelerator requires a transformation that cannot be composed from the existing primitives? The user is then forced to modify the compiler, which is precisely the problem the system aims to avoid. The work moves the design burden from "what are the right high-level operations?" to "what are the right low-level primitives?", but the fundamental challenge of foresight remains.
-
The Meta-Programming Usability Gap: Debugging transformative meta-programs is notoriously difficult. When a user-defined scheduling function—a composition of dozens of fine-grained primitive calls—produces incorrect or suboptimal code, the debugging story is unclear. Tracing the state of the object program and the forwarding of multiple Cursors through a complex Python meta-program could be a significant challenge for users, potentially creating a new kind of productivity bottleneck.
-
Compositionality of Libraries: The paper masterfully demonstrates how to build powerful, domain-specific libraries (e.g., for BLAS). However, the vision of "growing a language" also implies the ability to compose abstractions from different libraries. What happens when a user wants to apply a scheduling function from a "tensor-core library" and another from a "memory-hierarchy library"? The paper does not explore the potential for interference or unexpected interactions between scheduling libraries developed by different parties.
Questions to Address In Rebuttal
-
Could the authors elaborate on the process used to arrive at the current set of 46 primitives? Was this set discovered reactively while building the case studies, or is there a more formal basis for its selection? What gives the authors confidence that this set provides a stable foundation for a wide range of future scheduling challenges?
-
Could the authors comment on the debugging experience for library authors in Exo 2? When a composite scheduling function like
opt_skinny(Figure 7b, page 11) behaves unexpectedly, what tools or methodologies are available to inspect the intermediate states of the program and the validity of the Cursors? -
The paper focuses on building self-contained scheduling libraries. What is the vision for how a typical user might compose or layer scheduling abstractions from different, independently-developed libraries? Does the Cursor mechanism and its forwarding semantics create any challenges for interoperability between libraries?
-
- KIn reply tokaru⬆:Karu Sankaralingam @karu
Review Form: The Innovator (Novelty Specialist)
Summary
The authors present Exo 2, a User-Schedulable Language (USL) designed around the principle of "growth." The central claim is that USLs should not be limited to a fixed set of scheduling operations provided by the compiler. Instead, they should provide a core set of trusted, fine-grained primitives that allow performance engineers to define their own, more powerful scheduling operations in user-level libraries.
To enable this, the authors identify three essential components for an extensible USL: actions (code modifications), inspection (code interrogation), and references (ways of pointing to code). They introduce a novel mechanism called Cursors to unify these concepts. Cursors are presented as a stable, multiple, and relative referencing system that, crucially, persists across code transformations via a "forwarding" mechanism. The paper demonstrates the power of this approach by building a scheduling library for linear algebra that significantly reduces the amount of scheduling code required for dozens of high-performance kernels.
Strengths
The primary strength of this paper lies in its novel and well-executed vision for how USLs should evolve.
-
A Novel Philosophy for USLs: The core idea of designing a USL to be "grown" by its users is a significant departure from the status quo. Prior USLs like Halide, TVM, and TACO focus on providing a powerful but ultimately fixed set of scheduling directives. Exo 2's philosophy of providing minimal, safe primitives to let users build their own directives is a novel and compelling paradigm shift. It directly addresses the scaling problem that arises when trying to schedule large libraries of related kernels.
-
The
CursorMechanism as the Key Technical Novelty: The concept of a stable reference is the linchpin of the paper, and theCursormechanism is a genuinely new contribution in the context of USLs. It successfully synthesizes and improves upon ideas from prior art:- It allows for multiple references, like Halide's nominal system.
- It allows for relative navigation (
.parent(),.next(), etc.), drawing a clever analogy to DOM manipulation in systems like jQuery. - Most importantly, it provides temporal stability across transformations via the forwarding mechanism (Section 5.2). This is the critical delta compared to the one-time, invalidating
handlesof the MLIR Transform dialect or the single, one-time references in rewrite systems like ELEVATE. This stability is precisely what makes building reusable, composable scheduling libraries feasible.
-
Elevating
Inspectionto a First-Class Concept: While other systems may permit limited queries of the program structure, Exo 2 is the first USL I am aware of that explicitly framesinspectionas a co-equal pillar alongside action and reference. By providing primitives to interrogate the object code (e.g., to implement bounds inference as a library function, as described in Section 4), the authors enable a more declarative and intelligent style of meta-programming, where scheduling libraries can make decisions based on the properties of the code they are transforming. This is a subtle but profound advancement.
Weaknesses
My critique is not that the work lacks novelty, but that its novelty could be framed more sharply against the most relevant prior art.
-
Novelty is in Synthesis, not de novo Invention: The individual ideas that compose Exo 2 have conceptual parallels elsewhere. Transformative meta-programming is the foundation of Lisp/Racket macros; tree traversal APIs are central to tools like jQuery and XPath/XSLT; and term rewriting with strategies is the core of languages like Stratego. The paper's novelty lies in the synthesis of these ideas and their specific, careful application to the domain of safe, performance-oriented USLs. The authors should be more explicit that the contribution is this novel synthesis, particularly the design of a reference system that satisfies the unique constraints of this domain.
-
Insufficient Differentiation from MLIR Transform Dialect in Core Sections: The MLIR Transform Dialect is arguably the closest prior art in a production compiler framework. While it is correctly identified and distinguished in the Related Work section (Section 7), the core technical sections (particularly Section 5 on Reference) would be much stronger if they used the MLIR
handlemechanism as a direct foil to motivate the design ofCursors. This would make the novelty of the forwarding mechanism more immediately apparent to readers familiar with MLIR.
Questions to Address In Rebuttal
-
Robustness of Forwarding: The
Cursorforwarding mechanism is the core of the stability claim. The paper mentions that heuristics are used for ambiguous cases (Section 5.2). Could the authors elaborate on the classes of transformations where forwarding might produce unintuitive or ambiguous results? For a library author, how predictable is the location of a cursor after a complex, user-defined scheduling operation is applied? -
Completeness of Primitives: The authors have implemented a set of 46 scheduling primitives. Is there a principled basis for this set? How does one reason about its "completeness" for a given domain? For example, if a user wants to implement a novel data layout transformation, what is the process for determining if the existing primitives are sufficient, or if a new primitive must be added to the trusted compiler core?
-
Comparison to Traversal Strategies: The higher-order functions presented in Section 3.4 (e.g.,
repeat,reduce) bear a resemblance to the traversal strategies found in term rewriting systems (e.g.,topdown,tryin Stratego or ELEVATE). Could the authors clarify the relationship? Is the primary innovation theCursormechanism that provides a stable object on which to apply these strategies, or are there fundamental differences in the strategy combinators themselves?
-