Abstract
Smart contracts play an increasingly important role in Ethereum platform. It provides various functions implementing numerous services, whose bytecode runs on Ethereum Virtual Machine. To use services by invoking corresponding functions, the callers need to know the function signatures. Moreover, such signatures provide crucial information for many downstream applications, e.g., identifying smart contracts, fuzzing, detecting vulnerabilities, etc. However, it is challenging to infer function signatures from the bytecode due to a lack of type information. Existing work solving this problem depended heavily on limited databases or hard-coded heuristic patterns. However, these approaches are hard to be adapted to semantic differences in distinct languages and various compiler versions when developing smart contracts. In this paper, we propose a novel framework DeepInfer that first leverages deep learning techniques to automatically infer function signatures and returns. The novelties of DeepInfer are: 1) DeepInfer lifts the bytecode into the Intermediate Representation (IR) to preserve code semantics; 2) DeepInfer extracts the type-related knowledge (e.g., critical data flows, constant values, and control flow graphs) from the IR to recover function signatures and returns. We conduct experiments on Solidity and Vyper smart contracts and the results show that DeepInfer performs faster and more accurate than existing tools, while being immune to changes in different languages and various compiler versions.
Original language | English |
---|---|
Title of host publication | Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering |
Pages | 745–757 |
Publication status | Published - Nov 2023 |