| ISCA 2025 | A | 3 | 2025-11-04 04:37:45.176Z |
Rethinking Prefetching for Intermittent Computing. Prefetching improves performance by reducing cache misses. However, conventional prefetchers are too aggressive to serve batteryless energy harvesting systems (EHSs) where energy efficiency is the utmost design priority due to weak input energy and t... | ISCA 2025 | A | 3 | 2025-11-04 04:37:13.135Z |
Precise exceptions in relaxed architectures. To manage exceptions, software relies on a key architectural guarantee, precision: that exceptions appear to execute between instructions. However, this definition, dating back over 60 years, fundamentally assumes a sequential programmers model. Moder... | ISCA 2025 | A | 3 | 2025-11-04 04:36:41.080Z |
The XOR Cache: A Catalyst for Compression. Modern computing systems allocate significant amounts of resources for caching, especially for the last level cache (LLC). We observe that there is untapped potential for compression by leveraging redundancy due to private caching and inclusion that ... | ISCA 2025 | A | 3 | 2025-11-04 04:36:08.736Z |
Avant-Garde: Empowering GPUs with Scaled Numeric Formats. The escalating computational and memory demands of deep neural networks have outpaced chip density improvements, making arithmetic density a key bottleneck for GPUs. Scaled numeric formats, such as FP8 and Microscaling (MX), improve arithmetic densit... | ISCA 2025 | A | 3 | 2025-11-04 04:35:36.684Z |
Forest: Access-aware GPU UVM Management. With GPU unified virtual memory (UVM), CPU and GPU can share a flat virtual address space. UVM enables the GPUs to utilize the larger CPU system memory as an expanded memory space. However, UVM’s on-demand page migration is accompanied by expensive p... | ISCA 2025 | A | 3 | 2025-11-04 04:35:04.613Z |
| ISCA 2025 | A | 3 | 2025-11-04 04:34:32.571Z |
| ISCA 2025 | A | 3 | 2025-11-04 04:34:00.443Z |
| ISCA 2025 | A | 3 | 2025-11-04 04:33:28.387Z |
| ISCA 2025 | A | 3 | 2025-11-04 04:32:56.338Z |
| ISCA 2025 | A | 3 | 2025-11-04 04:32:24.137Z |
FRED: A Wafer-scale Fabric for 3D Parallel DNN Training. Wafer-scale systems are an emerging technology that tightly integrates high-end accelerator chiplets with high-speed wafer-scale interconnects, enabling low-latency and high-bandwidth connectivity. This makes them a promising platform for deep neura... | ISCA 2025 | A | 3 | 2025-11-04 04:31:52.070Z |
| ISCA 2025 | A | 3 | 2025-11-04 04:31:19.770Z |
| ISCA 2025 | A | 3 | 2025-11-04 04:30:47.372Z |
| ASPLOS 2025 V2 | A | 3 | 2025-11-02 17:33:41.915Z |
| ASPLOS 2025 V2 | A | 3 | 2025-11-02 17:33:09.885Z |
| ASPLOS 2025 V2 | A | 3 | 2025-11-02 17:32:37.789Z |
| ASPLOS 2025 V2 | A | 3 | 2025-11-02 17:32:05.295Z |
Tela: A Temporal Load-Aware Cloud Virtual Disk Placement Scheme. Cloud Block Storage (CBS) relies on Cloud Virtual Disks (CVDs) to provide block interfaces to Cloud Virtual Machines. The process of allocating user-subscribed CVDs to physical storage warehouses in cloud data centers, known as CVD placement, ...ACM ... | ASPLOS 2025 V2 | A | 3 | 2025-11-02 17:31:33.140Z |
Target-Aware Implementation of Real Expressions. New low-precision accelerators, vector instruction sets, and library functions make maximizing accuracy and performance of numerical code increasingly challenging. Two lines of work---traditional compilers and numerical compilers---attack this proble... | ASPLOS 2025 V2 | A | 3 | 2025-11-02 17:31:01.109Z |
| ASPLOS 2025 V2 | A | 3 | 2025-11-02 17:30:29.062Z |
SuperNoVA: Algorithm-Hardware Co-Design for Resource-Aware SLAM. Simultaneous Localization and Mapping (SLAM) plays a crucial role in robotics, autonomous systems, and augmented and virtual reality (AR/VR) applications by enabling devices to understand and map unknown environments. However, deploying SLAM in AR/VR... | ASPLOS 2025 V2 | A | 3 | 2025-11-02 17:29:56.842Z |
SmoothE: Differentiable E-Graph Extraction. E-graphs have gained increasing popularity in compiler optimization, program synthesis, and theorem proving tasks. They enable compact representation of many equivalent expressions and facilitate transformations via rewrite rules without phase order... | ASPLOS 2025 V2 | A | 3 | 2025-11-02 17:29:24.614Z |
Selectively Uniform Concurrency Testing. Buggy behaviors in concurrent programs are notoriously elusive, as they may manifest only in few of exponentially many possible thread interleavings. Randomized concurrency testing techniques probabilistically sample from (instead of enumerating) the... | ASPLOS 2025 V2 | A | 3 | 2025-11-02 17:28:52.552Z |
| ASPLOS 2025 V2 | A | 3 | 2025-11-02 17:28:20.216Z |
| ASPLOS 2025 V2 | A | 3 | 2025-11-02 17:27:48.033Z |
| ASPLOS 2025 V2 | A | 3 | 2025-11-02 17:27:15.972Z |
Rethinking Java Performance Analysis. Representative workloads and principled methodologies are the foundation of performance analysis, which in turn provides the empirical grounding for much of the innovation in systems research. However, benchmarks are hard to maintain, methodologies a... | ASPLOS 2025 V2 | A | 3 | 2025-11-02 17:26:43.954Z |
| ASPLOS 2025 V2 | A | 3 | 2025-11-02 17:26:11.782Z |
| ASPLOS 2025 V2 | A | 3 | 2025-11-02 17:25:39.793Z |
| ASPLOS 2025 V2 | A | 3 | 2025-11-02 17:25:07.361Z |
| ASPLOS 2025 V2 | A | 3 | 2025-11-02 17:24:35.110Z |
| ASPLOS 2025 V2 | A | 3 | 2025-11-02 17:24:02.866Z |
| ASPLOS 2025 V2 | A | 3 | 2025-11-02 17:23:30.716Z |
PCcheck: Persistent Concurrent Checkpointing for ML. Training large-scale machine learning (ML) models is expensive and time-intensive, consuming many hardware accelerators for days or weeks. As the scale of hardware deployments and training time continue to grow, the probability of failures also ...AC... | ASPLOS 2025 V2 | A | 3 | 2025-11-02 17:22:58.429Z |
| ASPLOS 2025 V2 | A | 3 | 2025-11-02 17:22:26.402Z |
Optimizing Quantum Circuits, Fast and Slow. Optimizing quantum circuits is critical: the number of quantum operations needs to be minimized for a successful evaluation of a circuit on a quantum processor. In this paper we unify two disparate ideas for optimizing quantum circuits, rewrite rules,... | ASPLOS 2025 V2 | A | 3 | 2025-11-02 17:21:53.958Z |
Optimizing Datalog for the GPU. Modern Datalog engines (e.g., LogicBlox, Soufflé, ddlog) enable their users to write declarative queries which compute recursive deductions over extensional facts, leaving high-performance operationalization (query planning, semi-naïve evaluation, an... | ASPLOS 2025 V2 | A | 3 | 2025-11-02 17:21:21.630Z |
| ASPLOS 2025 V2 | A | 3 | 2025-11-02 17:20:49.378Z |
| ASPLOS 2025 V2 | A | 3 | 2025-11-02 17:20:17.365Z |