Network Embedding Techniques for Predicting Software Defects: A Review
Ensuring software quality is essential throughout the software development process. Software defect prediction (SDP) plays a significant role in identifying the software modules most likely to contain defects. Several machine learning-based defect prediction models have been developed and implemented in recent years. Researchers have also applied network embedding to SDP, demonstrating that Natural Language Processing techniques can be adapted to defect prediction. This study reviews, investigates, and discusses the use of network embedding in SDP. We examined defect prediction articles from the previous 15 years that use network embedding, the majority of which were published in notable conferences and software engineering journals. Each network embedding technique, its findings, and its particular role in SDP are described in detail. The reviewed papers are listed in order of publication along with a comparative assessment. We formulated three research questions that emphasize the significance of analyzing network representations, particularly network embedding, for identifying potential software defects. To the best of our knowledge, this review is the first to include a thorough analysis of both the transductive and inductive variants of network embedding, along with their potential in machine learning (ML) for predicting software defects. The article also explores open challenges in depth and proposes research directions to address them, with the aim of guiding future research by academics and practitioners in the field of SDP.
Copyright (c) 2025 Sweta Mehta, Pankaj K. Goswami, K. Sridhar Patnaik

This work is licensed under a Creative Commons Attribution 4.0 International License.