EvaClone

Overview

Clone detection has applications in many software engineering activities, such as comprehension and refactoring. However, the confounding configuration choice problem poses a widely acknowledged threat to the validity of previous empirical analyses. We introduce a search-based solution that finds suitable configurations for empirical studies. We provide both a desktop implementation and a parallelised, cloud-deployed implementation, and use them to evaluate our approach on 6 popular clone detection tools applied to the Bellon suite of 8 subject systems. Our evaluation reports the results of 9.3 million total executions of a clone tool, which required a total of 15 CPU-years of computation time. It is the largest empirical study of clone detection to date. Our approach finds configurations that are significantly better (p < 0.05) than the defaults currently used in clone detection experiments, thereby providing evidence that our approach can ameliorate the confounding configuration choice problem.

Additional Results

This website was created to accompany our FSE 2013 submission, which is currently under review. In its present form it provides only the results from our analysis that could not be fitted within the page limits of the paper. The following figures show the agreement levels achieved by the Default, General and Individual configurations for all the subject systems in Bellon's benchmark.

Tools

We implemented our clone evaluation approach as a desktop application, EvaClone. We also implemented a cloud-based, parallelised 'sister version' of EvaClone, called CloudEvaClone. The architecture designs of both are shown below.

Eva Clone

Cloud Eva Clone

Download EvaClone

About GCF

General Clone Format (GCF) was designed to cater to anticipated developments in the clone community, which is currently focussing on so-called 'gapped clones'. We also developed a GCF Converter to convert the output of other clone detection tools to GCF files.

GCF file converter inputs:

subject_name: The name of the subject program
subject_path: The path of the subject program
clone_file_path: The path of the input clone file
min_line: The minimum number of lines in each clone

Examples (the first argument is a numeric code that selects the source tool format):

RCF files (IClone): java -jar GCF_Fileconverters.jar 5 clone_file_path min_line
CCFinder: java -jar GCF_Fileconverters.jar 1 clone_file_path min_line
PMD/CPD: java -jar GCF_Fileconverters.jar 3 subject_path clone_file_path min_line
ConQAT: java -jar GCF_Fileconverters.jar 4 subject_name clone_file_path min_line
Simian: java -jar GCF_Fileconverters.jar 6 subject_path clone_file_path min_line
NiCAD: java -jar GCF_Fileconverters.jar 7 subject_name clone_file_path min_line
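
Because the argument list differs from tool to tool, it can be convenient to assemble the command line programmatically when converting many result files. The following Python sketch is not part of the GCF Converter; it only mirrors the tool codes and argument order shown in the examples above. The dictionary keys, the function name build_command and the file names in the usage example are hypothetical, chosen for illustration.

```python
import shlex

# Tool codes and argument order, copied from the examples above.
# The dictionary keys are hypothetical labels chosen for this sketch.
CONVERTER_ARGS = {
    "ccfinder": (1, ("clone_file_path", "min_line")),
    "pmd_cpd": (3, ("subject_path", "clone_file_path", "min_line")),
    "conqat": (4, ("subject_name", "clone_file_path", "min_line")),
    "iclone_rcf": (5, ("clone_file_path", "min_line")),
    "simian": (6, ("subject_path", "clone_file_path", "min_line")),
    "nicad": (7, ("subject_name", "clone_file_path", "min_line")),
}


def build_command(tool, jar="GCF_Fileconverters.jar", **inputs):
    """Return the converter command for `tool` as an argument list.

    Inputs the chosen tool does not use are ignored; inputs it needs
    but that were not supplied raise a ValueError.
    """
    if tool not in CONVERTER_ARGS:
        raise ValueError(f"unknown tool label: {tool!r}")
    code, names = CONVERTER_ARGS[tool]
    missing = [name for name in names if name not in inputs]
    if missing:
        raise ValueError(f"{tool} requires: {', '.join(missing)}")
    return ["java", "-jar", jar, str(code)] + [str(inputs[n]) for n in names]


if __name__ == "__main__":
    # Placeholder paths; replace them with real locations.
    cmd = build_command(
        "simian",
        subject_path="/path/to/subject",
        clone_file_path="simian_output.xml",
        min_line=6,
    )
    print(shlex.join(cmd))
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting problems when paths contain spaces.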

Download GCF-Converter

Fitness Database

Download Fitness evaluation database

Contact

yue.jia@ucl.ac.uk