publications

Publications by category in reverse chronological order. Generated by jekyll-scholar.

2026

  1. What Trace Powers Reveal About Log-Determinants: Closed-Form Estimators, Certificates, and Failure Modes
    Piyush Sao
    arXiv preprint arXiv:2601.12612, 2026
  2. Fast Evaluation of Truncated Neumann Series by Low-Product Radix Kernels
    Piyush Sao
    arXiv preprint arXiv:2602.11843, 2026

2025

  1. Fast Active-Set Thresholding Method for Nonnegative Least Squares
    Benjamin Cobb, Ramakrishnan Kannan, Konstantin Pieper, and 5 more authors
    In 2025 IEEE International Conference on Big Data (BigData), 2025
  2. Knowledge graph analytics kernels in high performance computing
    Ramakrishnan Kannan, Piyush K Sao, Hao Lu, and 5 more authors
    2025
    US Patent 12,417,246
  3. Forward Error Bounds and Efficient Algorithms for Computing a Tensor Times Matrix Chain in Low Precision on GPUs
    Julian Bellavita, Piyush Sao, and Ramakrishnan Kannan
    2025
    SC25 poster

2024

  1. PANDORA: A Parallel Dendrogram Construction Algorithm for Single Linkage Clustering on GPU
    Piyush Sao, Andrey Prokopenko, and Damien Lebrun-Grandié
    arXiv preprint arXiv:2401.06089, 2024
  2. Interface for sparse linear algebra operations
    Ahmad Abdelfattah, Willow Ahrens, Hartwig Anzt, and 32 more authors
    arXiv preprint arXiv:2411.13259, 2024
  3. Accelerated Constrained Sparse Tensor Factorization on Massively Parallel Architectures
    Yongseok Soh, Ramakrishnan Kannan, Piyush Sao, and 1 more author
    In Proceedings of the 53rd International Conference on Parallel Processing, 2024

2023

  1. FUNNL: Fast Nonlinear Nonnegative Unmixing for Alternate Energy Systems
    Jeffrey A Graves, Thomas F Blum, Piyush Sao, and 2 more authors
    In Knowledge-Guided Machine Learning, 2023
  2. Unified Communication Optimization Strategies for Sparse Triangular Solver on CPU and GPU Clusters
    Piyush Sao, Yang Liu, Nan Ding, and 2 more authors
    In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2023
  3. Optimizing Communication in 2D Grid-Based MPI Applications at Exascale
    Hao Lu, Piyush Sao, Michael Matheson, and 3 more authors
    In Proceedings of the 30th European MPI Users’ Group Meeting, 2023
  4. Brief Announcement: Communication Optimal Sparse LU Factorization for Planar Matrices
    Piyush Sao and Xiaoye Sherry Li
    In Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures, 2023

2022

  1. A single-tree algorithm to compute the Euclidean minimum spanning tree on GPUs
    Andrey Prokopenko, Piyush Sao, and Damien Lebrun-Grandié
    In Proceedings of the 51st International Conference on Parallel Processing, 2022
  2. Exaflops biomedical knowledge graph analytics
    Ramakrishnan Kannan, Piyush Sao, Hao Lu, and 8 more authors
    In 2022 SC22: International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2022
  3. Newly Released Capabilities in Distributed-memory SuperLU Sparse Direct Solver
    Xiaoye S Li, Paul Lin, Yang Liu, and 1 more author
    ACM Transactions on Mathematical Software, 2022
  4. Resilience Design Patterns: A Structured Approach to Resilience at Extreme Scale (Version 2.0)
    Christian Engelmann, Rizwan Ashraf, Saurabh Hukerikar, and 2 more authors
    2022

2021

  1. Sparse Binary Matrix-Vector Multiplication on Neuromorphic Computers
    Catherine D Schuman, Bill Kay, Prasanna Date, and 3 more authors
    In 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2021
  2. Dense semiring linear algebra on modern CUDA hardware
    Vijay Thakkar, Ramakrishnan Kannan, Piyush Sao, and 5 more authors
    2021
    SIAM Computational Sciences and Engineering. SIAM

2020

  1. Scalable All-pairs Shortest Paths for Huge Graphs on Multi-GPU Clusters
    Piyush Sao, Hao Lu, Ramakrishnan Kannan, and 3 more authors
    In Proceedings of the 30th International Symposium on High-Performance Parallel and Distributed Computing, 2020
  2. Scalable knowledge graph analytics at 136 petaflop/s
    Ramakrishnan Kannan, Piyush Sao, Hao Lu, and 5 more authors
    In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, 2020
  3. A supernodal all-pairs shortest path algorithm
    Piyush Sao, Ramakrishnan Kannan, Prasun Gera, and 1 more author
    In Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2020
  4. Traversing large graphs on GPUs with unified memory
    Prasun Gera, Hyojong Kim, Piyush Sao, and 2 more authors
    Proceedings of the VLDB Endowment, 2020

2019

  1. Multifrontal Non-negative Matrix Factorization
    Piyush Sao and Ramakrishnan Kannan
    In International Conference on Parallel Processing and Applied Mathematics, 2019
  2. Self-stabilizing Connected Components
    Piyush Sao, Christian Engelmann, Srinivas Eswar, and 2 more authors
    In 2019 IEEE/ACM 9th Workshop on Fault Tolerance for HPC at eXtreme Scale (FTXS), 2019
  3. A Communication-avoiding 3D Sparse Triangular Solve Algorithm
    Piyush Sao, Ramakrishnan Kannan, Xiaoye Li, and 1 more author
    In International Conference on Supercomputing, Jun 2019
  4. A communication-avoiding 3D algorithm for sparse LU factorization on heterogeneous systems
    Piyush Sao, Xiaoye S Li, and Richard Vuduc
    Journal of Parallel and Distributed Computing, Jun 2019

2018

  1. A communication-avoiding 3D LU factorization algorithm for sparse matrices
    Piyush Sao, Xiaoye S. Li, and Richard Vuduc
    In Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2018
  2. Scalable and Resilient Sparse Linear Solvers
    Piyush Sao
    Georgia Institute of Technology, Aug 2018

2016

  1. A Self-Correcting Connected Components Algorithm
    Piyush Sao, Oded Green, Chirag Jain, and 1 more author
    In Proceedings of the ACM Workshop on Fault-Tolerance for HPC at Extreme Scale, Aug 2016
  2. SuperLU Users’ Guide
    Xiaoye S Li, James W Demmel, John R Gilbert, and 4 more authors
    Aug 2016

2015

  1. A Sparse Direct Solver for Distributed Memory Xeon Phi-accelerated Systems
    Piyush Sao, Xing Liu, Richard Vuduc, and 1 more author
    In Parallel and Distributed Processing Symposium (IPDPS), 2015 IEEE International, Aug 2015

2014

  1. A distributed CPU-GPU sparse direct solver
    Piyush Sao, Richard Vuduc, and Xiaoye Sherry Li
    In European Conference on Parallel Processing, Aug 2014
  2. A distributed kernel summation framework for general-dimension machine learning
    Dongryeol Lee, Piyush Sao, Richard Vuduc, and 1 more author
    Statistical Analysis and Data Mining: The ASA Data Science Journal, Aug 2014

2013

  1. Self-stabilizing iterative solvers
    Piyush Sao and Richard Vuduc
    In Proceedings of the Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, Aug 2013

2011

  1. Model Order Reduction Techniques for VLSI Circuit Simulation
    Piyush Sao
    IIT Madras, May 2011