publications
Publications by category in reverse chronological order. Generated by jekyll-scholar.
2024
- PANDORA: A Parallel Dendrogram Construction Algorithm for Single Linkage Clustering on GPU. arXiv preprint arXiv:2401.06089, 2024
2023
- FUNNL: Fast Nonlinear Nonnegative Unmixing for Alternate Energy Systems. In Knowledge-Guided Machine Learning, 2023
- Unified Communication Optimization Strategies for Sparse Triangular Solver on CPU and GPU Clusters. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2023
- Optimizing Communication in 2D Grid-Based MPI Applications at Exascale. In Proceedings of the 30th European MPI Users’ Group Meeting, 2023
- Brief Announcement: Communication Optimal Sparse LU Factorization for Planar Matrices. In Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures, 2023
2022
- A single-tree algorithm to compute the Euclidean minimum spanning tree on GPUs. In Proceedings of the 51st International Conference on Parallel Processing, 2022
- Exaflops biomedical knowledge graph analytics. In 2022 SC22: International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2022
- Newly Released Capabilities in Distributed-memory SuperLU Sparse Direct Solver. ACM Transactions on Mathematical Software, 2022
- Resilience Design Patterns: A Structured Approach to Resilience at Extreme Scale (Version 2.0). 2022
2021
- Sparse Binary Matrix-Vector Multiplication on Neuromorphic Computers. In 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2021
2020
- Scalable All-pairs Shortest Paths for Huge Graphs on Multi-GPU Clusters. In Proceedings of the 30th International Symposium on High-Performance Parallel and Distributed Computing, 2020
- Scalable knowledge graph analytics at 136 petaflop/s. In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, 2020
- A supernodal all-pairs shortest path algorithm. In Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2020
- Traversing large graphs on GPUs with unified memory. Proceedings of the VLDB Endowment, 2020
2019
- Multifrontal Non-negative Matrix Factorization. In International Conference on Parallel Processing and Applied Mathematics, 2019
- Self-stabilizing Connected Components. In 2019 IEEE/ACM 9th Workshop on Fault Tolerance for HPC at eXtreme Scale (FTXS), 2019
- A Communication-avoiding 3D Sparse Triangular Solve Algorithm. In International Conference on Supercomputing, Jun 2019
- A communication-avoiding 3D algorithm for sparse LU factorization on heterogeneous systems. Journal of Parallel and Distributed Computing, Jun 2019
2018
- A communication-avoiding 3D LU factorization algorithm for sparse matrices. In Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2018
- Scalable and Resilient Sparse Linear Solvers. Georgia Institute of Technology, Aug 2018
2016
- A Self-Correcting Connected Components Algorithm. In Proceedings of the ACM Workshop on Fault-Tolerance for HPC at Extreme Scale, Aug 2016
- SuperLU Users’ Guide. Aug 2016
2015
- A Sparse Direct Solver for Distributed Memory Xeon Phi-accelerated Systems. In Parallel and Distributed Processing Symposium (IPDPS), 2015 IEEE International, Aug 2015
2014
- A distributed CPU-GPU sparse direct solver. In European Conference on Parallel Processing, Aug 2014
- A distributed kernel summation framework for general-dimension machine learning. Statistical Analysis and Data Mining: The ASA Data Science Journal, Aug 2014
2013
- Self-stabilizing iterative solvers. In Proceedings of the Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, Aug 2013
2011
- Model Order Reduction Techniques for VLSI Circuit Simulation. IIT Madras, May 2011