From 76a64079b6c7b706400ed78b0ce9ae01f3482a86 Mon Sep 17 00:00:00 2001
From: hustcw
Date: Wed, 6 Mar 2024 11:50:49 +0800
Subject: [PATCH 1/2] Adding CEBin (2024) for binary code clone detection and CLAP (2024) for binary code representation

---
 README.md | 53 +++++++++++++++++++++++++++++++----------------------
 1 file changed, 31 insertions(+), 22 deletions(-)

diff --git a/README.md b/README.md
index e859d69..86a5f74 100644
--- a/README.md
+++ b/README.md
@@ -9,33 +9,42 @@ Please feel free to send a pull request to add papers and relevant content that
 > Note: to quickly access this page, use [ml4se.dev](https://ml4se.dev/)
 
 ## Content
+- [Machine Learning for Software Engineering](#machine-learning-for-software-engineering)
+  - [Content](#content)
 - [Papers](#papers)
-    - [Type Inference](#type-inference)
-    - [Code Completion](#code-completion)
-    - [Code Generation](#code-generation)
-    - [Code Summarization](#code-summarization)
-    - [Code Embeddings/Representation](#code-embeddingsrepresentation)
-    - [Code Changes/Editing](#code-changesediting)
-    - [Code Comments](#code-comments)
-    - [Bug/Vulnerability Detection](#bugvulnerability-detection)
-    - [Source Code Modeling](#source-code-modeling)
-    - [Program Repair](#program-repair)
-    - [Program Translation](#program-translation)
-    - [Program Analysis](#program-analysis)
-    - [Software Testing](#software-testing)
-    - [Code Clone Detection](#code-clone-detection)
-    - [Code Language Models](#code-language-models)
-    - [Code Review](#code-review)
-    - [Code Documentation](#code-documentation)
-    - [Empirical Studies](#empirical-studies)
-    - [Surveys](#surveys)
-    - [Misc](#misc)
+  - [Type Inference](#type-inference)
+  - [Code Completion](#code-completion)
+  - [Code Generation](#code-generation)
+  - [Code Summarization](#code-summarization)
+  - [Code Embeddings/Representation](#code-embeddingsrepresentation)
+  - [Code Changes/Editing](#code-changesediting)
+  - [Code Comments](#code-comments)
+  - [Bug/Vulnerability Detection](#bugvulnerability-detection)
+  - [Source Code Modeling](#source-code-modeling)
+  - [Program Repair](#program-repair)
+  - [Program Translation](#program-translation)
+  - [Program Analysis](#program-analysis)
+  - [Software Testing](#software-testing)
+  - [Code Clone Detection](#code-clone-detection)
+  - [Code Search](#code-search)
+  - [Code Language Models](#code-language-models)
+  - [Code Review](#code-review)
+  - [Code Documentation](#code-documentation)
+  - [Empirical Studies](#empirical-studies)
+  - [Surveys](#surveys)
+  - [Misc](#misc)
 - [PhD Theses](#phd-theses)
 - [Talks](#talks)
 - [Datasets](#datasets)
 - [Tools](#tools)
+  - [Source Code Analysis \& Processing](#source-code-analysis--processing)
+  - [Machine Learning](#machine-learning)
+  - [Code de-duplication](#code-de-duplication)
+  - [Misc](#misc-1)
 - [Research Groups](#research-groups)
 - [Venues](#venues)
+  - [Conferences](#conferences)
+  - [Journals](#journals)
 
 # Papers
 
@@ -197,7 +206,7 @@ Please feel free to send a pull request to add papers and relevant content that
 - **A Convolutional Attention Network for Extreme Summarization of Source Code** (2016), ICML 2016, Allamanis, Miltiadis, et al. [[pdf]](http://www.jmlr.org/proceedings/papers/v48/allamanis16.pdf)
 
 ## Code Embeddings/Representation
-
+- **CLAP: Learning Transferable Binary Code Representations with Natural Language Supervision** (2024),ISSTA'24, Wang, Hao, et al. [[pdf]](https://arxiv.org/pdf/2402.16928.pdf) [[code]](https://github.com/Hustcw/CLAP)
 - **kTrans: Knowledge-Aware Transformer for Binary Code Embedding** (2023), arxiv, Wenyu, Zhu, et al. [[pdf]](https://arxiv.org/pdf/2308.12659.pdf)[[code]](https://github.com/Learner0x5a/kTrans-release)
 - **TransCoder: Towards Unified Transferable Code Representation Learning Inspired by Human Skills** (2023), arxiv, Sun, Qiushi, et al. [[pdf]](https://arxiv.org/pdf/2306.07285)
 - **CodeGrid: A Grid Representation of Code** (2023), ISSTA'23, Kaboré, Abdoul Kader, et al.
@@ -431,7 +440,7 @@ Please feel free to send a pull request to add papers and relevant content that
 - **On Learning Meaningful Assert Statements for Unit Test Cases** (2020), ICSE'20, Watson, Cody, et al.
 
 ## Code Clone Detection
-
+- **CEBin: A Cost-Effective Framework for Large-Scale Binary Code Similarity Detection** (2024),ISSTA'24, Wang, Hao, et al. [[pdf]](https://arxiv.org/pdf/2402.18818.pdf) [[code]](https://github.com/Hustcw/CEBin)
 - **ZC3: Zero-Shot Cross-Language Code Clone Detection** (2023), arxiv, Li, Jia, et al. [[pdf]](https://arxiv.org/pdf/2308.13754)
 - **Towards Understanding the Capability of Large Language Models on Code Clone Detection: A Survey** (2023), arxiv, Dou, Shihan, et al. [[pdf]](https://arxiv.org/pdf/2308.01191)
 - **Comparison and Evaluation of Clone Detection Techniques with Different Code Representations** (2023), ICSE'23, Wang, Yuekun, et al. [[pdf]](https://wu-yueming.github.io/Files/ICSE2023_TACC.pdf)
From 087c6e477aa69366a108b8bcd2ac3c0c12af02a9 Mon Sep 17 00:00:00 2001
From: hustcw
Date: Wed, 6 Mar 2024 11:58:20 +0800
Subject: [PATCH 2/2] Adding CEBin (2024) for binary code clone detection and CLAP (2024) for binary code representation

---
 README.md | 2 --
 1 file changed, 2 deletions(-)

diff --git a/README.md b/README.md
index 86a5f74..40e5f1a 100644
--- a/README.md
+++ b/README.md
@@ -9,8 +9,6 @@ Please feel free to send a pull request to add papers and relevant content that
 > Note: to quickly access this page, use [ml4se.dev](https://ml4se.dev/)
 
 ## Content
-- [Machine Learning for Software Engineering](#machine-learning-for-software-engineering)
-  - [Content](#content)
 - [Papers](#papers)
   - [Type Inference](#type-inference)
   - [Code Completion](#code-completion)