Publications by category in reverse chronological order. Generated by jekyll-scholar.
- PANDORA: A Parallel Dendrogram Construction Algorithm for Single Linkage Clustering on GPU. arXiv preprint arXiv:2401.06089, 2024
- FUNNL: Fast Nonlinear Nonnegative Unmixing for Alternate Energy Systems. In Knowledge-Guided Machine Learning, 2023
- Unified Communication Optimization Strategies for Sparse Triangular Solver on CPU and GPU Clusters. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2023
- Optimizing Communication in 2D Grid-Based MPI Applications at Exascale. In Proceedings of the 30th European MPI Users’ Group Meeting, 2023
- Brief Announcement: Communication Optimal Sparse LU Factorization for Planar Matrices. In Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures, 2023
- A single-tree algorithm to compute the Euclidean minimum spanning tree on GPUs. In Proceedings of the 51st International Conference on Parallel Processing, 2022
- Exaflops biomedical knowledge graph analytics. In 2022 SC22: International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2022
- Newly Released Capabilities in Distributed-memory SuperLU Sparse Direct Solver. ACM Transactions on Mathematical Software, 2022
- Resilience Design Patterns: A Structured Approach to Resilience at Extreme Scale (Version 2.0). 2022
- Sparse Binary Matrix-Vector Multiplication on Neuromorphic Computers. In 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2021
- Scalable All-pairs Shortest Paths for Huge Graphs on Multi-GPU Clusters. In Proceedings of the 30th International Symposium on High-Performance Parallel and Distributed Computing, 2020
- Scalable knowledge graph analytics at 136 petaflop/s. In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, 2020
- A supernodal all-pairs shortest path algorithm. In Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2020
- Traversing large graphs on GPUs with unified memory. Proceedings of the VLDB Endowment, 2020
- Multifrontal Non-negative Matrix Factorization. In International Conference on Parallel Processing and Applied Mathematics, 2019
- Self-stabilizing Connected Components. In 2019 IEEE/ACM 9th Workshop on Fault Tolerance for HPC at eXtreme Scale (FTXS), 2019
- A Communication-avoiding 3D Sparse Triangular Solve Algorithm. In International Conference on Supercomputing, Jun 2019
- A communication-avoiding 3D algorithm for sparse LU factorization on heterogeneous systems. Journal of Parallel and Distributed Computing, Jun 2019
- A communication-avoiding 3D LU factorization algorithm for sparse matrices. In Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2018
- Scalable and Resilient Sparse Linear Solvers. Georgia Institute of Technology, Aug 2018
- A Self-Correcting Connected Components Algorithm. In Proceedings of the ACM Workshop on Fault-Tolerance for HPC at Extreme Scale, Aug 2016
- SuperLU Users’ Guide. Aug 2016
- A Sparse Direct Solver for Distributed Memory Xeon Phi-accelerated Systems. In Parallel and Distributed Processing Symposium (IPDPS), 2015 IEEE International, Aug 2015
- A distributed CPU-GPU sparse direct solver. In European Conference on Parallel Processing, Aug 2014
- A distributed kernel summation framework for general-dimension machine learning. Statistical Analysis and Data Mining: The ASA Data Science Journal, Aug 2014
- Self-stabilizing iterative solvers. In Proceedings of the Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, Aug 2013
- Model Order Reduction Techniques for VLSI Circuit Simulation. IIT Madras, May 2011