Three Papers of the Programming Language Research Laboratory were Accepted by ICSE’22-Programming Languages Lab, Peking University

Date: March 4, 2022

ICSE'22, a well-known conference in the field of software engineering, recently announced its list of accepted papers. Three papers by the Programming Language Lab were accepted: "Fault Localization via Efficient Probabilistic Modeling of Program Semantics", "Improving Machine Translation Systems via Isotopic Replacement", and "Towards Bidirectional Live Programming for Incomplete Programs". The details of the papers are listed below.

Title: Fault Localization via Efficient Probabilistic Modeling of Program Semantics

Authors: Muhan Zeng#, Yiqian Wu#, Zhentao Ye, Yingfei Xiong*, Xin Zhang, Lu Zhang

Abstract:

Testing-based fault localization has been a significant topic in software engineering in the past decades. It localizes a faulty program element based on a set of passing and failing test executions. Since whether a fault could be triggered and detected by a test is related to program semantics, it is crucial to model program semantics in fault localization approaches. Existing approaches either consider the full semantics of the program (e.g., mutation-based fault localization and angelic debugging), leading to scalability issues, or ignore the semantics of the program (e.g., spectrum-based fault localization), leading to imprecise localization results.

Our key idea is: by modeling only the correctness of program values but not their full semantics, a balance could be reached between effectiveness and scalability. To realize this idea, we introduce a probabilistic approach to model program semantics and utilize information from static analysis and dynamic execution traces in our modeling. Our approach, SmartFL, is evaluated on a real-world dataset, Defects4J. The top-1 statement-level accuracy of our approach is 21%, which is the best among state-of-the-art methods. The average time cost is 210 seconds per fault, while existing methods that capture full semantics are often 10x slower or more.
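As background for the comparison in this abstract, the following is a minimal sketch of spectrum-based fault localization, the semantics-free baseline the abstract mentions. It uses the Ochiai formula, a common spectrum-based metric, and is not SmartFL. The function name, test names, statement ids, and coverage data are all made up for illustration.

```python
import math


def ochiai_scores(coverage, outcomes):
    """Rank statements by suspiciousness using the Ochiai formula.

    coverage: dict mapping test name -> set of covered statement ids
    outcomes: dict mapping test name -> True if the test passed
    """
    total_failed = sum(1 for passed in outcomes.values() if not passed)
    statements = set().union(*coverage.values())
    scores = {}
    for stmt in statements:
        # ef / ep: failing / passing tests that execute this statement
        ef = sum(1 for t, cov in coverage.items() if stmt in cov and not outcomes[t])
        ep = sum(1 for t, cov in coverage.items() if stmt in cov and outcomes[t])
        denom = math.sqrt(total_failed * (ef + ep))
        scores[stmt] = ef / denom if denom else 0.0
    return scores


# Hypothetical data: statement 2 is executed only by the failing test.
coverage = {"t1": {1, 2, 3}, "t2": {1, 3}, "t3": {1}}
outcomes = {"t1": False, "t2": True, "t3": True}

scores = ochiai_scores(coverage, outcomes)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

The score depends only on which tests cover a statement, not on what the statement computes. This is why such methods scale well but can be imprecise.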

Title: Improving Machine Translation Systems via Isotopic Replacement

Authors: Zeyu Sun#, Jie M. Zhang, Yingfei Xiong*, Mark Harman, Mike Papadakis, Lu Zhang

Abstract:

Machine translation plays an essential role in people’s daily international communication. However, machine translation systems are far from perfect. To tackle this problem, researchers have proposed several approaches to testing machine translation. A promising trend among these approaches is to use word replacement, where only one word in the original sentence is replaced with another word to form a sentence pair. However, precise control of the impact of word replacement remains an outstanding issue in these approaches.

To address this issue, we propose CAT, a novel word-replacement-based approach, whose basic idea is to identify word replacement with controlled impact (referred to as isotopic replacement). To achieve this purpose, we use a neural-based language model to encode the sentence context and design a neural network-based algorithm to evaluate the context-aware semantic similarity between two words. Furthermore, similar to TransRepair, a state-of-the-art word-replacement-based approach, CAT also provides automatic fixing of revealed bugs without model retraining.

Our evaluation on Google Translate and Transformer indicates that CAT achieves significant improvements over TransRepair. In particular, 1) CAT detects seven more types of bugs than TransRepair; 2) CAT detects 129% more translation bugs than TransRepair; 3) CAT repairs twice more bugs than TransRepair, many of which may bring serious consequences if left unfixed; and 4) CAT has better efficiency than TransRepair in input generation (0.01s vs. 0.41s) and comparable efficiency to TransRepair in bug repair (1.92s vs. 1.34s).
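The general word-replacement idea described in this abstract can be sketched as follows: replace one word, translate both sentences, and flag the pair if the two translations differ by much more than the replaced word. This is only a simplified illustration, not CAT. The similarity measure (token-level `difflib` ratio), the 0.7 threshold, and the stand-in `toy_translate` function are all assumptions. CAT's neural, context-aware selection of replacement words is not modeled here.

```python
from difflib import SequenceMatcher


def make_pair(sentence, index, replacement):
    """Build a sentence pair that differs in exactly one word."""
    words = sentence.split()
    mutated = words[:index] + [replacement] + words[index + 1:]
    return sentence, " ".join(mutated)


def consistency(translation_a, translation_b):
    """Token-level similarity of two translations, between 0 and 1."""
    return SequenceMatcher(None, translation_a.split(), translation_b.split()).ratio()


def is_suspicious(translate, sentence, index, replacement, threshold=0.7):
    """Flag the pair when one replaced word changes the translation too much."""
    original, mutated = make_pair(sentence, index, replacement)
    return consistency(translate(original), translate(mutated)) < threshold


def toy_translate(sentence):
    """Hypothetical stand-in for a translation system.

    It "translates" by upper-casing, and deliberately scrambles any
    sentence containing "seven" to simulate a translation bug.
    """
    words = sentence.upper().split()
    if "SEVEN" in words:
        words.reverse()
    return " ".join(words)


source = "I bought six red apples today"
print(is_suspicious(toy_translate, source, 2, "five"))
print(is_suspicious(toy_translate, source, 2, "seven"))
```

In a real setting, `translate` would call the system under test, and the choice of replacement word matters a great deal. Controlling that choice is the issue the abstract says CAT addresses.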

Title: Towards Bidirectional Live Programming for Incomplete Programs

Authors: Zhang Xing#, Zhenjiang Hu*

Abstract:

Bidirectional live programming not only allows software developers to see continuous feedback on the output as they write the program, but also allows them to modify the program by directly manipulating the output, so that the modified program produces the manipulated output. Despite the appeal of existing bidirectional live programming systems, they share a major limitation: they cannot deal with incomplete programs, where code blanks exist in the source programs.

In this paper, we propose a framework to support bidirectional live programming for incomplete programs, by extending the output value structure, introducing hole binding, and formally defining bidirectional evaluators that are well-behaved. To illustrate the usefulness of the framework, we realize the core bidirectional evaluations of incomplete programs in a tool called Bidirectional Preview. Our experimental results show that our extended backward evaluation for incomplete programs is as efficient as that for complete programs, and our extended forward evaluation makes no difference. Furthermore, we use quick sort and student grades, two nontrivial examples of incomplete programs, to demonstrate its usefulness in algorithm teaching and program debugging.
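To give a feel for forward and backward evaluation over a program with a hole, here is a toy sketch on a tiny expression language with only numbers, holes, and addition. It is not the paper's framework or the Bidirectional Preview tool. The `Num`, `Hole`, and `Add` types and the update policy (pushing changes into the right operand) are assumptions chosen for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Hole:
    name: str


@dataclass(frozen=True)
class Add:
    left: object
    right: object


def forward(expr):
    """Evaluate as far as possible; holes stay in the (partial) output."""
    if isinstance(expr, (Num, Hole)):
        return expr
    left, right = forward(expr.left), forward(expr.right)
    if isinstance(left, Num) and isinstance(right, Num):
        return Num(left.value + right.value)
    return Add(left, right)


def backward(expr, new_output):
    """Return an updated program whose forward result is new_output."""
    if isinstance(expr, Num):
        if not isinstance(new_output, Num):
            raise ValueError("expected a number in the edited output")
        return new_output
    if isinstance(expr, Hole):
        if new_output != expr:
            raise ValueError("this sketch cannot edit a hole through the output")
        return expr
    left, right = forward(expr.left), forward(expr.right)
    if isinstance(left, Num) and isinstance(right, Num):
        if not isinstance(new_output, Num):
            raise ValueError("expected a number in the edited output")
        # Policy assumption: keep the left operand, adjust the right one.
        return Add(expr.left, backward(expr.right, Num(new_output.value - left.value)))
    if not isinstance(new_output, Add):
        raise ValueError("edited output must keep the partial structure")
    return Add(backward(expr.left, new_output.left), backward(expr.right, new_output.right))


# Incomplete program: ?h + (1 + 2); the output is the partial value ?h + 3.
program = Add(Hole("h"), Add(Num(1), Num(2)))
output = forward(program)

# The user edits the 3 in the output to 5; the program is updated to match.
edited = Add(Hole("h"), Num(5))
updated = backward(program, edited)
print(updated)
```

The two round-trip checks one would expect of a well-behaved pair hold in this toy: evaluating the updated program yields the edited output, and pushing back an unedited output leaves the program unchanged.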

#: (Co-)First author

*: Corresponding author
