Clone detection has applications in many software engineering activities, such as comprehension and refactoring. However, the confounding configuration choice problem poses a widely acknowledged threat to the validity of previous empirical analyses. We introduce a search-based solution that finds suitable configurations for empirical studies. We present both a desktop implementation and a parallelised cloud-deployed implementation, and use them to evaluate our approach on 6 popular clone detection tools applied to the Bellon suite of 8 subject systems. Our evaluation reports the results of 9.3 million total executions of a clone tool, which required a total of 15 CPU-years of computation time. It is the largest empirical study of clone detection to date. Our approach finds configurations that are significantly better (p < 0.05) than the defaults currently used in clone detection experiments, thereby providing evidence that our approach can ameliorate the confounding configuration choice problem.
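To give a feel for what searching for a configuration can look like, here is a minimal sketch in Python. It is not the search algorithm used in our study: it swaps in plain hill climbing, and the parameter names (`min_tokens`, `min_lines`), their ranges, the default values and the `toy_agreement` fitness function are all hypothetical stand-ins. A real run would execute the clone tools and measure their agreement instead.

```python
import random

# Hypothetical configuration space: parameter name -> allowed integer range.
# Real clone detectors expose different parameters; these are illustrative only.
SPACE = {"min_tokens": (10, 100), "min_lines": (2, 20)}
DEFAULT = {"min_tokens": 50, "min_lines": 6}


def toy_agreement(config):
    """Stand-in fitness in [0, 1]; a real study would run the clone tools
    and measure how much their reported clones agree."""
    dt = (config["min_tokens"] - 30) / 90.0
    dl = (config["min_lines"] - 4) / 18.0
    return max(0.0, 1.0 - (dt * dt + dl * dl))


def neighbour(config, rng):
    """Return a copy of config with one parameter nudged by a small step."""
    new = dict(config)
    name = rng.choice(sorted(SPACE))
    low, high = SPACE[name]
    step = rng.choice([-3, -2, -1, 1, 2, 3])
    new[name] = min(high, max(low, new[name] + step))
    return new


def hill_climb(fitness, start, steps=500, seed=0):
    """Hill climbing: accept a neighbour only if it strictly improves fitness."""
    rng = random.Random(seed)
    best, best_fit = dict(start), fitness(start)
    for _ in range(steps):
        candidate = neighbour(best, rng)
        cand_fit = fitness(candidate)
        if cand_fit > best_fit:
            best, best_fit = candidate, cand_fit
    return best, best_fit


best_config, best_score = hill_climb(toy_agreement, DEFAULT)
print(best_config, round(best_score, 3))
```

Because the climber starts from the default configuration and only accepts improvements, the configuration it returns never scores below the default on the chosen fitness function.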
This website was created to accompany our FSE 2013 submission, which is currently under review. In its present form it provides only the results from our analysis that could not be fitted within the page limits of the paper. The following figures show the agreement levels achieved by the Default, General and Individual configurations for all the subject systems in Bellon's benchmarks.
We implemented our clone evaluation approach as a desktop application, CloneEva. We also implemented a cloud-based, parallelised 'sister version' of CloneEva, called CloudEvaClone. The architecture designs are shown below.
GCF file converter inputs:
Examples:
yue.jia@ucl.ac.uk
2012-2013 © Tiantian Wang, Mark Harman, Yue Jia and Jens Krinke