ISSN (Online): 2321-3418
Engineering and Computer Science
Open Access

Network Embedding Techniques for Predicting Software Defects: A Review

Sweta Mehta, Pankaj K. Goswami, K. Sridhar Patnaik
DOI: 10.18535/ijsrm/v13i06.ec05 · Pages: 2254-2275 · Vol. 13, No. 06 (2025) · Published: June 9, 2025

Abstract

In software development, ensuring software quality is essential. Software defect prediction (SDP) helps identify the modules most likely to contain defects. Several machine learning-based defect prediction models have been developed and implemented in recent years. Researchers have also applied network embedding to SDP, demonstrating that techniques originating in Natural Language Processing can be adapted to defect prediction. This study reviews, investigates, and discusses the use of network embedding in SDP. We examined defect prediction articles from the past 15 years that use network embedding, most of which were published in notable conferences and software engineering journals. Each network embedding technique, its findings, and its particular role in SDP are described in detail. The reviewed papers are listed in order of publication along with a comparative assessment. We formulated three research questions that emphasize the significance of analyzing network representations, particularly network embedding, for identifying potential software defects. To our knowledge, this review is the first to include a thorough analysis of both the transductive and inductive variants of network embedding, along with their potential in machine learning (ML) for predicting software defects. The article also explores open challenges and proposes research directions to address them, with the aim of guiding future research by academics and practitioners in the field of SDP.

Keywords

Software Defect Prediction; Network Embedding; Machine Learning; Software Dependency

Author details
Sweta Mehta
Department of CSE, Sarala Birla University, Ranchi
✉ Corresponding Author
Pankaj K. Goswami
Department of CSE, Sarala Birla University, Ranchi
K. Sridhar Patnaik
Department of CSE, Birla Institute of Technology, Mesra, Ranchi