Implementing Apache Spark GraphX in Big Data Using Breadth-First Search
Keywords:
Breadth-first search, Apache Spark, GraphX, big data analytics, graph traversal, distributed processing
Abstract
This work applies Breadth-First Search (BFS) to big data with Apache Spark GraphX, using Apache Spark's distributed processing capabilities to traverse graphs over massive datasets efficiently. Focusing on scalability and speedup when processing very large graph structures, the aim is to show that running BFS on Spark GraphX is feasible and offers performance advantages. Drawing on the fault tolerance and parallel processing built into the Spark environment, we present a thorough technique for implementing BFS in Spark GraphX. Applications of this method might include bioinformatics, recommendation systems, and social networks, all of which deal with large graphs. Integrating BFS with Spark GraphX makes it possible to explore handling complex graph queries with substantial gains in processing speed and resource usage, which advances the capabilities of big data analytics in graph processing. In the Sample Graph Data, Node Distance from Source ranges from 32 to 93 kilometers in one example and from 14 to 78 kilometers in another case with 5 linked nodes; when the two sets of data are compared, the values range from 23 to 98 kilometers.
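As a rough illustration of the traversal described in the abstract, the sketch below runs a level-synchronous BFS in plain Python, organized as the send/receive supersteps that a vertex-centric engine would execute in parallel. It is not the authors' implementation and does not use the GraphX API; the function name `bfs_supersteps`, the 5-vertex edge list, and the choice of hop counts (rather than the kilometer distances reported above) are all assumptions made for the example.

```python
from collections import defaultdict
import math


def bfs_supersteps(edges, source):
    """Level-synchronous BFS: hop distance from `source` to every vertex."""
    # Build adjacency lists for a directed graph.
    neighbors = defaultdict(list)
    vertices = {source}
    for src, dst in edges:
        neighbors[src].append(dst)
        vertices.update((src, dst))

    # Every vertex starts at infinity except the source.
    dist = {v: math.inf for v in vertices}
    dist[source] = 0

    frontier = {source}
    while frontier:
        # "Send" phase: each frontier vertex proposes dist + 1 to its
        # out-neighbors; competing proposals are merged with min().
        messages = {}
        for v in frontier:
            for n in neighbors[v]:
                candidate = dist[v] + 1
                messages[n] = min(messages.get(n, math.inf), candidate)

        # "Receive" phase: a vertex accepts a message only if it improves
        # its current distance, and then joins the next frontier.
        frontier = set()
        for v, d in messages.items():
            if d < dist[v]:
                dist[v] = d
                frontier.add(v)
    return dist


# Hypothetical 5-vertex sample graph (not the paper's Sample Graph Data).
sample_edges = [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)]
distances = bfs_supersteps(sample_edges, source=1)
print(distances)
```

In a distributed setting each superstep's send and receive phases would be spread across partitions, and the loop ends when no vertex receives an improving message. Vertices that cannot be reached from the source keep the distance `math.inf`.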
License
Copyright (c) 2025 Palaniraj Rajidurai Parvathy, J Lenin, Suman Mishra

This work is licensed under a Creative Commons Attribution 4.0 International License.