Five papers of the Programming Language Lab was Accepted at OOPSLA'24

Title: Newtonian Program Analysis of Probabilistic Programs

Authors: Di Wang, Thomas Reps

Abstract: Due to their quantitative nature, probabilistic programs pose non-trivial challenges for designing compositional and efficient program analyses. Many analyses for probabilistic programs rely on iterative approximation. This article presents an interprocedural dataflow-analysis framework, called NPA-PMA, for designing and implementing (partially) non-iterative program analyses of probabilistic programs with unstructured control-flow, nondeterminism, and general recursion. NPA-PMA is based on Newtonian Program Analysis (NPA), a generalization of Newton's method to solve equation systems over semirings. The key challenge for developing NPA-PMA is to handle multiple kinds of confluences in both the algebraic structures that specify analyses and the equation systems that encode control flow: semirings support a single confluence operation, whereas NPA-PMA involves three confluence operations (conditional, probabilistic, and nondeterministic).

Our work introduces ω-continuous pre-Markov algebras (ωPMAs) to factor out common parts of different analyses; adopts regular infinite-tree expressions to encode probabilistic programs with unstructured control-flow; and presents a linearization method that makes Newton's method applicable to the setting of regular-infinite-tree equations over ωPMAs. NPA-PMA allows analyses to supply a non-iterative strategy to solve linearized equations. Our experimental evaluation demonstrates that (i) NPA-PMA holds considerable promise for outperforming Kleene iteration, and (ii) provides great generality for designing program analyses.


Title: Programmable MCMC with Soundly Composed Guide Programs

Authors: Long Pham, Di Wang*, Feras A. Saad, Jan Hoffmann

Abstract:

Probabilistic programming languages (PPLs) provide language support for expressing flexible probabilistic models and solving Bayesian inference problems. PPLs with programmable inference make it possible for users to obtain improved results by customizing inference engines using guide programs that are tailored to a corresponding model program. However, errors in guide programs can compromise the statistical soundness of the inference. This article introduces a novel coroutine-based framework for verifying the correctness of user-written guide programs for a broad class of Markov chain Monte Carlo (MCMC) inference algorithms. Our approach rests on a novel type system for describing communication protocols between a model program and a sequence of guides that each update only a subset of random variables. We prove that, by translating guide types to context-free processes with finite norms, it is possible to check structural type equality between models and guides in polynomial time. This connection gives rise to an efficient type-inference algorithm for probabilistic programs with flexible constructs such as general recursion and branching. We also contribute a coverage-checking algorithm that verifies the support of sequentially composed guide programs agrees with that of the model program, which is a key soundness condition for MCMC inference with multiple guides. Evaluations on diverse benchmarks show that our type-inference and coverage-checking algorithms efficiently infer types and detect sound and unsound guides for programs that existing static analyses cannot handle.


Title:Semantics Lifting for Syntactic Sugar

Authors:Zhichao Guan、Yiyuan Cao、Tailai Yu、Ziheng Wang、Di Wang*、Zhenjiang Hu

Abstract:Syntactic sugar plays a crucial role in engineering programming languages. It offers convenient syntax and higher-level of abstractions, as witnessed by its pervasive use in both general-purpose and domain-specific contexts. Unfortunately, the traditional approach of translating programs containing syntactic sugars into the host language can lead to abstraction leakage, breaking the promise of convenience and hindering program comprehension. To address this challenge, we introduce the idea of semantics lifting that aims to statically derive self-contained evaluation rules for syntactic sugars. More specifically, we propose a semantics-lifting framework that consists of (i) a general algorithm for deriving host-independent semantics of syntactic sugars from the semantics of the host language and the desugaring rules, (ii) a formulation of the correctness and abstraction properties for a lifted semantics, and (iii) a systematic investigation of sufficient conditions that ensure a lifted semantics is provably correct and abstract. To evaluate our semantics-lifting framework, we have implemented a system named Osazone and conducted several case studies, demonstrating that our approach is flexible, effective, and practical for implementing domain-specific languages.


Title:Scaling Abstraction Refinement for Program Analyses in Datalog using Graph Neural Networks

Authors:Zhenyu Yan、Xin Zhang*、Peng Di

Abstract:Counterexample-guided abstraction refinement (CEGAR) is a popular approach for automatically selecting abstractions with high precision and low time costs. Existing works cast abstraction refinements as constraint-solving problems. Due to the complexity of these problems, they cannot be scaled to large programs or complex analyses. We propose a novel approach that applies graph neural networks to improve the scalability of CEGAR for Datalog-based program analyses. By constructing graphs directly from the Datalog solver’s calculations, our method then uses a neural network to score abstraction parameters based on the information in these graphs. Then we reform the constraint problems such that the constraint solver ignores parameters with low scores. This in turn reduces the solution space and the size of the constraint problems. Since our graphs are directly constructed from Datalog computation without human effort, our approach can be applied to a broad range of parametric static analyses implemented in Datalog. We evaluate our approach on a pointer analysis and a typestate analysis and our approach can answer 2.83× and 1.5× as many queries as the baseline approach on large programs for the pointer analysis and the typestate analysis, respectively.


Title:Learning Abstraction Selection for Bayesian Program Analysis

Authors:Yifan Zhang、Yuanfeng Shi、Xin Zhang*

Abstract:We propose a learning-based approach to select abstractions for Bayesian program analysis. Bayesian program analysis converts a program analysis into a Bayesian model by attaching probabilities to analysis rules. It computes probabilities of analysis results and can update them by learning from user feedback, test runs, and other information. Its abstraction heavily affects how well it learns from such information. There exists a long line of works in selecting abstractions for conventional program analysis but they are not effective for Bayesian program analysis. This is because they do not optimize for generalization ability. We propose a data-driven framework to solve this problem by learning from labeled programs. Starting from an abstraction, it decides how to change the abstraction based on analysis derivations. To be general, it considers graph properties of analysis derivations; to be effective, it considers the derivations before and after changing the abstraction. We demonstrate the effectiveness of our approach using a datarace analysis and a thread-escape analysis.


Previous Article:Ph.D. Thesis of Qihao Zhu won the Distinguished Thesis Award by TCSE, CCF
Next Article:One Paper was Accepted as the Best Paper at ECOOP'24