publications
Publications by category in reverse chronological order. Generated by jekyll-scholar.
2026
2025
-
-
Forward Error Bounds and Efficient Algorithms for Computing a Tensor Times Matrix Chain in Low Precision on GPUs. SC25 poster, 2025
2024
-
PANDORA: A Parallel Dendrogram Construction Algorithm for Single Linkage Clustering on GPU. arXiv preprint arXiv:2401.06089, 2024
-
Accelerated Constrained Sparse Tensor Factorization on Massively Parallel Architectures. In Proceedings of the 53rd International Conference on Parallel Processing, 2024
2023
-
FUNNL: Fast Nonlinear Nonnegative Unmixing for Alternate Energy Systems. In Knowledge-Guided Machine Learning, 2023
-
Unified Communication Optimization Strategies for Sparse Triangular Solver on CPU and GPU Clusters. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2023
-
Optimizing Communication in 2D Grid-Based MPI Applications at Exascale. In Proceedings of the 30th European MPI Users’ Group Meeting, 2023
-
Brief Announcement: Communication Optimal Sparse LU Factorization for Planar Matrices. In Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures, 2023
2022
-
A single-tree algorithm to compute the Euclidean minimum spanning tree on GPUs. In Proceedings of the 51st International Conference on Parallel Processing, 2022
-
Exaflops biomedical knowledge graph analytics. In 2022 SC22: International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2022
-
Newly Released Capabilities in Distributed-memory SuperLU Sparse Direct Solver. ACM Transactions on Mathematical Software, 2022
-
Resilience Design Patterns: A Structured Approach to Resilience at Extreme Scale (Version 2.0). 2022
2021
-
Sparse Binary Matrix-Vector Multiplication on Neuromorphic Computers. In 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2021
-
Dense semiring linear algebra on modern CUDA hardware. SIAM Computational Sciences and Engineering, SIAM, 2021
2020
-
Scalable All-pairs Shortest Paths for Huge Graphs on Multi-GPU Clusters. In Proceedings of the 30th International Symposium on High-Performance Parallel and Distributed Computing, 2020
-
Scalable knowledge graph analytics at 136 petaflop/s. In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, 2020
-
A supernodal all-pairs shortest path algorithm. In Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2020
-
Traversing large graphs on GPUs with unified memory. Proceedings of the VLDB Endowment, 2020
2019
-
Multifrontal Non-negative Matrix Factorization. In International Conference on Parallel Processing and Applied Mathematics, 2019
-
Self-stabilizing Connected Components. In 2019 IEEE/ACM 9th Workshop on Fault Tolerance for HPC at eXtreme Scale (FTXS), 2019
-
A Communication-avoiding 3D Sparse Triangular Solve Algorithm. In International Conference on Supercomputing, Jun 2019
-
A communication-avoiding 3D algorithm for sparse LU factorization on heterogeneous systems. Journal of Parallel and Distributed Computing, Jun 2019
2018
-
A communication-avoiding 3D LU factorization algorithm for sparse matrices. In Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2018
-
Scalable and Resilient Sparse Linear Solvers. Georgia Institute of Technology, Aug 2018
2016
-
A Self-Correcting Connected Components Algorithm. In Proceedings of the ACM Workshop on Fault-Tolerance for HPC at Extreme Scale, Aug 2016
-
SuperLU Users’ Guide. Aug 2016
2015
-
A Sparse Direct Solver for Distributed Memory Xeon Phi-accelerated Systems. In Parallel and Distributed Processing Symposium (IPDPS), 2015 IEEE International, Aug 2015
2014
-
A distributed CPU-GPU sparse direct solver. In European Conference on Parallel Processing, Aug 2014
-
A distributed kernel summation framework for general-dimension machine learning. Statistical Analysis and Data Mining: The ASA Data Science Journal, Aug 2014
2013
-
Self-stabilizing iterative solvers. In Proceedings of the Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, Aug 2013
2011
-
Model Order Reduction Techniques for VLSI Circuit Simulation. IIT Madras, May 2011