Instead, we propose addressing the root cause of the heuristics problem by allowing software to explicitly specify to the device if submitted requests are latency-sensitive. While several new GNN architectures have been proposed, the scale of real-world graphsin many cases billions of nodes and edgesposes challenges during model training. She also invented the spanning tree algorithm, which transformed Ethernet from a technology that supported a few hundred nodes, to something that can support large networks. Many application domains can benefit from hybrid transaction/analytical processing (HTAP) by executing queries on real-time datasets produced by concurrent transactions. For example, talks may be shorter than in prior years, or some parts of the conference may be multi-tracked. Even the little publishable OS work that is not based on Linux still assumes the same simplistic hardware model (essentially a multiprocessor VAX) that bears little resemblance to modern reality. This paper presents the design and implementation of CLP, a tool capable of losslessly compressing unstructured text logs while enabling fast searches directly on the compressed data. Second, GNNAdvisor implements a novel and highly-efficient 2D workload management tailored for GNN computation to improve GPU utilization and performance under different application settings. Used Zotero to organize papers about the stress and diffusion between anode and electrolyte and made a summary . SC is being increasingly adopted by industry for a variety of applications. Horcrux-compliant web servers perform offline analysis of all the JavaScript code on any frame they serve to conservatively identify, for every JavaScript function, the union of the page state that the function could access across all loads of that page. We discuss the design and implementation of TEMERAIRE including strategies for hugepage-aware memory layouts to maximize hugepage coverage and to minimize fragmentation overheads. The device then "calibrates" its interrupts to completions of latency-sensitive requests. Unfortunately, because devices lack the semantic information about which I/O requests are latency-sensitive, these heuristics can sometimes lead to disastrous results. Camera-ready submission (all accepted papers): 15 Mars 2022. We focus on NVMe storage devices and show that it is natural to express these semantics in the kernel and the application and only requires a modest two-bit change to the device interface. Shaghayegh Mardani, UCLA; Ayush Goel, University of Michigan; Ronny Ko, Harvard University; Harsha V. Madhyastha, University of Michigan; Ravi Netravali, Princeton University. One important reason for the high cost is, as we observe in this paper, that many sanitizer checks are redundant the same safety property is repeatedly checked leading to unnecessarily wasted computing resources. We propose PET, the first DNN framework that optimizes tensor programs with partially equivalent transformations and automated corrections. OSDI '21 Technical Sessions | USENIX The NAL eliminates remote PM accesses to hot items without inducing extra local PM accesses. To enable FL developers to interpret their results in model testing, Oort enforces their requirements on the distribution of participant data while improving the duration of federated testing by cherry-picking clients. Widely used log-search tools like Elasticsearch and Splunk Enterprise index the logs to provide fast search performance, yet the size of the index is within the same order of magnitude as the raw log size. As a member of ACCT, I have served two years on the bylaws and governance committee and two years on the finance and audit committee. The full program will be available in May 2021. Software Systems Laboratory Wins Best Paper Award at OSDI 2022 blk-switch evaluation over a variety of scenarios shows that it consistently achieves s-scale average and tail latency (at both 99th and 99.9th percentiles), while allowing applications to near-perfectly utilize the hardware capacity. Second, it innovates on the underlying cryptographic machinery and constructs a new private information retrieval scheme, FastPIR, that reduces the time to process oblivious access requests for mailboxes. A.H. Hunter, Jane Street Capital; Chris Kennelly, Paul Turner, Darryl Gove, Tipp Moseley, and Parthasarathy Ranganathan, Google. For example, traditional compute resources are replenishable while privacy is not: a CPU can be regained after a model finishes execution while privacy budget cannot. Our evaluation on the SPEC benchmarks shows that SanRazor can reduce the overhead of sanitizers significantly, from 73.8% to 28.062.0% for AddressSanitizer, and from 160.1% to 36.6124.4% for UndefinedBehaviorSanitizer (depending on the applied reduction scheme). Sponsored by USENIX in cooperation with ACM SIGOPS. These are hard deadlines, and no extensions will be given. The OSDI Symposium emphasizes innovative research as well as quantified or insightful experiences in systems design and implementation. . Web pages today commonly include large amounts of JavaScript code in order to offer users a dynamic experience. Camera-ready submission (all accepted papers): 2 April 2021; Main conference program: 27-28 April 2021; All deadline times are . Penglai also reduces the latency of secure memory initialization by three orders of magnitude and gains 3.6x speedup for real-world applications (e.g., MapReduce). Secure Computation (SC) is a family of cryptographic primitives for computing on encrypted data in single-party and multi-party settings. We develop a prototype of Zeph on Apache Kafka to demonstrate that Zeph can perform large-scale privacy transformations with low overhead. OSDI brings together professionals from academic and industrial backgrounds in a premier forum for discussing the design, implementation, and implications of systems software. USENIX, like other scientific and technical conferences and journals, prohibits these practices and may, on the recommendation of a program chair, take action against authors who have committed them. Moreover, as of October 2020, a review of the 50 most cited empirical papers that list personality as a keyword indicates that all 50 papers were authored by people with insti tutional affiliations in the United States, Canada, Germany, the UK, and New Zealand, and only three papers included samples outside of these regions (see Supplementary Abstract registrations that do not provide sufficient information to understand the topic and contribution (e.g., empty abstracts, placeholder abstracts, or trivial abstracts) will be rejected, thereby precluding paper submission. Researchers from the Software Systems Laboratory bagged Best Paper Awards at the 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2021) and the 2021 USENIX Annual Technical Conference (USENIX ATC 2021).. Jay Lepreau Best Paper Award, OSDI'21. We present application studies for 8 applications, improving requests-per-second (RPS) by 7.7% and reducing RAM usage 2.4%. Tej Chajed, MIT CSAIL; Joseph Tassarotti, Boston College; Mark Theng, MIT CSAIL; Ralf Jung, MPI-SWS; M. Frans Kaashoek and Nickolai Zeldovich, MIT CSAIL. Reviews will be available for response on Wednesday, March 3, 2021. Authors may upload supplementary material in files separate from their submissions. DistAI generates data by simulating the distributed protocol at different instance sizes and recording states as samples. The 20th ACM Workshop on Hot Topics in Networks (HotNets 2021) will bring together researchers in computer networks and systems to engage in a lively debate on the theory and practice of computer networking. Fortunately, we observe that the backups for high availability in modern distributed OLTP systems can be retrofitted to bridge the analytical queries and transactions in HTAP workloads. Session Chairs: Dushyanth Narayanan, Microsoft Research, and Gala Yadgar, TechnionIsrael Institute of Technology, Jinhyung Koo, Junsu Im, Jooyoung Song, and Juhyung Park, DGIST; Eunji Lee, Soongsil University; Bryan S. Kim, Syracuse University; Sungjin Lee, DGIST. Instead of choosing among a small number of known algorithms, our approach searches in a "policy space" of fine-grained actions, resulting in novel algorithms that can outperform existing algorithms by specializing to a given workload. Professor Veloso has been recognized with a multiple honors, including being a Fellow of the ACM, IEEE, AAAS, and AAAI. The conference papers and full proceedings are available to registered attendees now and will be available to everyone beginning Wednesday, July 14, 2021. Log search and log archiving, despite being critical problems, are mutually exclusive. Despite their extensive use for debugging and vulnerability discovery, sanitizer checks often induce a high runtime cost. She is the author of the textbook Interconnections (about network layers 2 and 3) and coauthor of Network Security. The 15th USENIX Symposium on Operating Systems Design and Implementation seeks to present innovative, exciting research in computer systems. Distributed Trust: Is Blockchain the answer? We introduce a hybrid cryptographic protocol for privacy-adhering transformations of encrypted data. OSDI will provide an opportunity for authors to respond to reviews prior to final consideration of the papers at the program committee meeting. Perennial 2.0 makes this possible by introducing several techniques to formalize GoJournals specification and to manage the complexity in the proof of GoJournals implementation. Amy Tai, VMware Research; Igor Smolyar, Technion Israel Institute of Technology; Michael Wei, VMware Research; Dan Tsafrir, Technion Israel Institute of Technology and VMware Research. OSDI brings together professionals from academic and industrial backgrounds in what has become a premier forum for discussing the design, implementation, and implications of systems software. See www.cs.cmu.edu/~mmv/Veloso.html for her scientific publications. J.P. Morgan AI Research partners with applied data analytics teams across the firm as well as with leading academic institutions globally. Jiachen Wang, Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University; Shanghai AI Laboratory; Engineering Research Center for Domain-specific Operating Systems, Ministry of Education, China; Ding Ding, Department of Computer Science, New York University; Huan Wang, Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University; Shanghai AI Laboratory; Engineering Research Center for Domain-specific Operating Systems, Ministry of Education, China; Conrad Christensen, Department of Computer Science, New York University; Zhaoguo Wang and Haibo Chen, Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University; Shanghai AI Laboratory; Engineering Research Center for Domain-specific Operating Systems, Ministry of Education, China; Jinyang Li, Department of Computer Science, New York University. Fluffy found two new consensus bugs in the most popular Geth Ethereum client which were exploitable on the live Ethereum mainnet. In this paper, we propose Oort to improve the performance of federated training and testing with guided participant selection. First, it enables a caller to push a message to a callee in two hops, using a new way of assigning mailboxes to users that resembles how a post office assigns PO boxes to its customers. In some cases, the quality of these artifacts is as important as that of the document itself. Erhu Feng, Xu Lu, Dong Du, Bicheng Yang, and Xueqiang Jiang, Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University; Engineering Research Center for Domain-specific Operating Systems, Ministry of Education, China; Yubin Xia, Binyu Zang, and Haibo Chen, Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University; Shanghai AI Laboratory; Engineering Research Center for Domain-specific Operating Systems, Ministry of Education, China. Submissions violating the detailed formatting and anonymization rules will not be considered for review. Attaching supplementary material is optional; if your paper says that you have source code or formal proofs, you need not attach them to convince the PC of their existence. The 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI '21) will take place as a virtual event on July 1416, 2021. Devices employ adaptive interrupt coalescing heuristics that try to balance between these opposing goals. Concretely, Dorylus is 1.22 faster and 4.83 cheaper than GPU servers for massive sparse graphs. 23 artifacts received the Artifacts Functional badge (88%). Please identify yourself as a presenter and include your mailing address in your email. Petuum Awarded OSDI 2021 Best Paper for Goodput-Optimized Deep Learning To help more profitably utilize sanitizers, we introduce SanRazor, a practical tool aiming to effectively detect and remove redundant sanitizer checks. Storm ensures security using a Security Typed ORM that refines the (type) abstractions of each layer of the MVC API with logical assertions that describe the data produced and consumed by the underlying operation and the users allowed access to that data. We have made Fluffy publicly available at https://github.com/snuspl/fluffy to contribute to the security of Ethereum. In this paper, we propose a software-hardware co-design to support dynamic, fine-grained, large-scale secure memory as well as fast-initialization. We demonstrate that the hardware thread scheduler is able to lower RPC tail response time by about 5 while enabling the system to sustain 20% higher load, relative to traditional thread scheduling techniques. Paper abstracts and proceedings front matter are available to everyone now. VLDB 2021 - 47th International Conference on Very Large Data Bases Distributed systems are notoriously hard to implement correctly due to non-determinism. We present DPF (Dominant Private Block Fairness) a variant of the popular Dominant Resource Fairness (DRF) algorithmthat is geared toward the non-replenishable privacy resource but enjoys similar theoretical properties as DRF. Han Meng - Research Assistant - Michigan State University | LinkedIn Pollux is implemented and publicly available as part of an open-source project at https://github.com/petuum/adaptdl. HotNets provides a venue for discussing innovative ideas and for debating future research agendas in networking. We have implemented a prototype of our design based on Penglai, an open-sourced enclave system for RISC-V. Based on the observation that real-world workloads always feature skewed access patterns, Nap introduces a NUMA-aware layer (NAL) on the top of existing concurrent PM indexes, and steers accesses to hot items to this layer. We observe that scalability challenges in training GNNs are fundamentally different from that in training classical deep neural networks and distributed graph processing; and that commonly used techniques, such as intelligent partitioning of the graph do not yield desired results. Kirk Rodrigues, Yu Luo, and Ding Yuan, University of Toronto and YScope Inc. The symposium emphasizes innovative research as well as quantified or insightful experiences in systems design and implementation. In 2023 I started another two-year term on the . We present the results of a 1% experiment at fleet scale as well as the longitudinal rollout in Googles warehouse scale computers. Our evaluation shows that PET outperforms existing systems by up to 2.5, by unlocking previously missed opportunities from partially equivalent transformations. We also welcome work that explores the interface to related areas such as computer architecture, networking, programming languages, analytics, and databases. OSDI is "a premier forum for discussing the design, implementation, and implications of systems software." A total of six research papers from the department were accepted to the . Yet, existing efforts randomly select FL participants, which leads to poor model and system efficiency. My paper has accepted to appear in the EuroSys2020; I will have a talk at the Hotstorage'19; The Paper about GCMA Accepted to TC; We identify that current systems for learning the embeddings of large-scale graphs are bottlenecked by data movement, which results in poor resource utilization and inefficient training. In particular, I'll argue for re-engaging with what computer hardware really is today and give two suggestions (among many) about how the OS research community can usefully do this, and exploit what is actually a tremendous opportunity. The chairs may reject abstracts or papers on the basis of egregious missing or extraneous conflicts. AI enables principled representation of knowledge, complex strategy optimization, learning from data, and support to human decision making. For example, optimistic concurrency control (OCC) is better than two-phase-locking (2PL) under low contention, while the converse is true under high contention. PET discovers and applies program transformations that improve computation efficiency but only maintain partial functional equivalence. The paper review process is double-blind. For more details on the submission process, and for templates to use with LaTeX, Word, etc., authors should consult the detailed submission requirements. This paper describes the design, implementation, and evaluation of Addra, the first system for voice communication that hides metadata over fully untrusted infrastructure and scales to tens of thousands of users. Just using Lambdas on top of CPU servers offers up to 2.75 more performance-per-dollar than training only with CPU servers. We demonstrate that Marius achieves the same level of accuracy but is up to one order of magnitude faster. We conclude with a discussion of additional techniques for improving the allocator development process and potential optimization strategies for future memory allocators. Leveraging these information, Pollux dynamically (re-)assigns resources to improve cluster-wide goodput, while respecting fairness and continually optimizing each DL job to better utilize those resources. JEL codes: Q18, Q28, Q57 . How can we design systems that will be reliable despite misbehaving participants? blk-switch uses this insight to adapt techniques from the computer networking literature (e.g., multiple egress queues, prioritized processing of individual requests, load balancing, and switch scheduling) to the Linux kernel storage stack. For conference information, . These scripts often make pages slow to load, partly due to a fundamental inefficiency in how browsers process JavaScript content: browsers make it easy for web developers to reason about page state by serially executing all scripts on any frame in a page, but as a result, fail to leverage the multiple CPU cores that are readily available even on low-end phones. PC members are not required to read supplementary material when reviewing the paper, so each paper should stand alone without it. Cores can safely and concurrently read from their local kernel replica, eliminating remote NUMA accesses. All the times listed below are in Pacific Daylight Time (PDT). This fast path contains programmable hardware support for low latency transport and congestion control as well as hardware support for efficient load balancing of RPCs to cores. This motivates the need for a new approach to data privacy that can provide strong assurance and control to users. Jaehyun Hwang and Midhul Vuppalapati, Cornell University; Simon Peter, UT Austin; Rachit Agarwal, Cornell University. Responses should be limited to clarifying the submitted work. This formulation of memory management, which we call memory programming, is a generalization of paging that allows MAGE to provide a highly efficient virtual memory abstraction for SC. We will look at various problems and approaches, and for each, see if blockchain would help. Under different configurations of TPC-C and TPC-E, Polyjuice can achieve throughput numbers higher than the best of existing algorithms by 15% to 56%. This change is receiving considerable attention in the architecture and security communities, for example, but in contrast, so-called OS researchers are mostly in denial. Existing frameworks optimize tensor programs by applying fully equivalent transformations, which maintain equivalence on every element of output tensors. In contrast, CLP achieves significantly higher compression ratio than all commonly used compressors, yet delivers fast search performance that is comparable or even better than Elasticsearch and Splunk Enterprise. Message from the Program Co-Chairs. Pages should be numbered, and figures and tables should be legible in black and white, without requiring magnification. As a result, the design of a file system with respect to space management and crash consistency is simplified, requiring only 10.8K LOC for full functionality. In particular, responses must not include new experiments or data, describe additional work completed since submission, or promise additional work to follow. Today, privacy controls are enforced by data curators with full access to data in the clear. Thanks to selective profiling, DMons profiling overhead is 1.36% on average, making it feasible for production use. An evaluation of Addra on a cluster of 80 machines on AWS demonstrates that it can serve 32K users with a 99-th percentile message latency of 726 msa 7 improvement over a prior system for text messaging in the same threat model. However, existing enclave designs fail to meet the requirements of scalability demanded by new scenarios like serverless computing, mainly due to the limitations in their secure memory protection mechanisms, including static allocation, restricted capacity and high-cost initialization. This approach misses possible optimization opportunities as transformations that only preserve equivalence on subsets of the output tensors are excluded. We present TEMERAIRE, a hugepage-aware enhancement of TCMALLOC to reduce CPU overheads in the applications code. The wire-to-wire RPC response time through the nanoPU is just 69ns, an order of magnitude quicker than the best-of-breed, low latency, commercial NICs. Session Chairs: Moshe Gabel, University of Toronto, and Joseph Gonzalez, University of California, Berkeley, John Thorpe, Yifan Qiao, Jonathan Eyolfson, and Shen Teng, UCLA; Guanzhou Hu, UCLA and University of Wisconsin, Madison; Zhihao Jia, CMU; Jinliang Wei, Google Brain; Keval Vora, Simon Fraser; Ravi Netravali, Princeton University; Miryung Kim and Guoqing Harry Xu, UCLA. For any further information, please contact the PC chairs: pc-chairs-2022@eurosys.org. OSDI '21 Call for Papers | USENIX We demonstrate that KEVIN reduces the amount of I/O traffic between the host and the device, and remains particularly robust as the system ages and the data become fragmented. Submission of a response is optional. If the conference registration fee will pose a hardship for the presenter of the accepted paper, please contact conference@usenix.org. Swapnil Gandhi and Anand Padmanabha Iyer, Microsoft Research. With her students, she had led research in AI, with a focus on robotics and machine learning, having concretely researched and developed a variety of autonomous robots, including teams of soccer robots, and mobile service robots. Simultaneous submission of the same work to multiple venues, submission of previously published work, or plagiarism constitutes dishonesty or fraud. Important Dates Abstract registrations due: Thursday, December 3, 2020, 3:00 pm PST Complete paper submissions due: Thursday, December 10, 2020, 3:00pm PST Author Response Period Papers not meeting these criteria will be rejected without review, and no deadline extensions will be granted for reformatting. And yet, they continue to rely on centralized search engines and indexers to help users access the content they seek and navigate the apps. We also show that Marius can scale training to datasets an order of magnitude beyond a single machine's GPU and CPU memory capacity, enabling training of configurations with more than a billion edges and 550 GB of total parameters on a single machine with 16 GB of GPU memory and 64 GB of CPU memory. Ankit Bhardwaj and Chinmay Kulkarni, University of Utah; Reto Achermann, University of British Columbia; Irina Calciu, VMware Research; Sanidhya Kashyap, EPFL; Ryan Stutsman, University of Utah; Amy Tai and Gerd Zellweger, VMware Research. People often assume that blockchain has Byzantine robustness, so adding it to any system will make that system super robust against any calamity. The OSDI '21 program co-chairs have agreed not to submit their work to OSDI '21. Author Response Period VLDB 2021: Venue Tivoli Hotel & Congress Center Arni Magnussons Gade 2 1577 Copenhagen, Denmark +45 3268 4300 In-person attendees can purchase tickets for the park / gardens with a 15% discount, which is a special offer by Tivoli Hotel & Congress Center to VLDB 2021 attendees. We particularly encourage contributions containing highly original ideas, new approaches, and/or groundbreaking results. Authors may use this for content that may be of interest to some readers but is peripheral to the main technical contributions of the paper. Petuum Awarded OSDI 2021 Best Paper for Goodput-Optimized Deep Learning Research Petuum CASL research and engineering team's Pollux technical paper on adaptive scheduling for optimized. The chairs will review paper conflicts to ensure the integrity of the reviewing process, adding or removing conflicts if necessary.