Hi, I am a fourth year PhD student in the CS department at UW-Madison.

I'm honored to be advised by Prof. Po-Ling Loh and Prof. Varun Jog. Before coming to the lovely town of Madison, I spent five memorable years at IIT Kanpur.

Please feel free to contact me.


My research interests include -

  1. Robust Statistics

  2. Theoretical Machine Learning

  3. High-Dimensional Statistics

Publications (Google Scholar, dblp, Semantic Scholar)

  • Statistical Query Lower Bounds for List-Decodable Linear Regression
    I. Diakonikolas, D. M. Kane, A. Pensia, T. Pittas, and A. Stewart
    Manuscript, 2021
    [Abstract] [arXiv]

    We study the problem of list-decodable linear regression, where an adversary can corrupt a majority of the examples. Specifically, we are given a set \(T\) of labeled examples \((x,y) \in \mathbb{R}^d\times \mathbb{R}\) and a parameter \(0<\alpha<1/2\) such that an \(\alpha\)-fraction of the points in \(T\) are i.i.d. samples from a linear regression model with Gaussian covariates, and the remaining \((1−\alpha)\)-fraction of the points are drawn from an arbitrary noise distribution. The goal is to output a small list of hypothesis vectors such that at least one of them is close to the target regression vector. Our main result is a Statistical Query (SQ) lower bound of \(d^{\textrm{poly}(1/\alpha)}\) for this problem. Our SQ lower bound qualitatively matches the performance of previously developed algorithms, providing evidence that current upper bounds for this task are nearly best possible.

  • Estimating location parameters in sample-heterogeneous distributions
    A. Pensia, V. Jog, and P. Loh
    Information and Inference: a Journal of the IMA, 2021
    [Abstract] [arXiv] [Journal version] [PDF]

    Estimating the mean of a probability distribution using i.i.d. samples is a classical problem in statistics, wherein finite-sample optimal estimators are sought under various distributional assumptions. In this paper, we consider the problem of mean estimation when independent samples are drawn from \(d\)-dimensional non-identical distributions possessing a common mean. When the distributions are radially symmetric and unimodal, we propose a novel estimator, which is a hybrid of the modal interval, shorth, and median estimators, and whose performance adapts to the level of heterogeneity in the data. We show that our estimator is near-optimal when data are i.i.d. and when the fraction of ‘‘low-noise’’ distributions is as small as \(\Omega\left(\frac{d \log n}{n}\right)\), where \(n\) is the number of samples. We also derive minimax lower bounds on the expected error of any estimator that is agnostic to the scales of individual data points. Finally, we extend our theory to linear regression. In both the mean estimation and regression settings, we present computationally feasible versions of our estimators that run in time polynomial in the number of data points.

  • Robust regression with covariate filtering: Heavy tails and adversarial contamination
    A. Pensia, V. Jog, and P. Loh
    Manuscript, 2020
    [Abstract] [arXiv]

    We study the problem of linear regression where both covariates and responses are potentially (i) heavy-tailed and (ii) adversarially contaminated. Several computationally efficient estimators have been proposed for the simpler setting where the covariates are sub-Gaussian and uncontaminated; however, these estimators may fail when the covariates are either heavy-tailed or contain outliers. In this work, we show how to modify the Huber regression, least trimmed squares, and least absolute deviation estimators to obtain estimators which are simultaneously computationally and statistically efficient in the stronger contamination model. Our approach is quite simple, and consists of applying a filtering algorithm to the covariates, and then applying the classical robust regression estimators to the remaining data. We show that the Huber regression estimator achieves near-optimal error rates in this setting, whereas the least trimmed squares and least absolute deviation estimators can be made to achieve near-optimal error after applying a postprocessing step.

  • Outlier Robust Mean Estimation with Subgaussian Rates via Stability
    I. Diakonikolas, D. M. Kane, and A. Pensia
    Advances in Neural Information Processing Systems (NeurIPS), 2020
    [Abstract] [arXiv] [Conference version]

    We study the problem of outlier robust high-dimensional mean estimation under a finite covariance assumption, and more broadly under finite low-degree moment assumptions. We consider a standard stability condition from the recent robust statistics literature and prove that, except with exponentially small failure probability, there exists a large fraction of the inliers satisfying this condition. As a corollary, it follows that a number of recently developed algorithms for robust mean estimation, including iterative filtering and non-convex gradient descent, give optimal error estimators with (near-)subgaussian rates. Previous analyses of these algorithms gave significantly suboptimal rates. As a corollary of our approach, we obtain the first computationally efficient algorithm with subgaussian rate for outlier-robust mean estimation in the strong contamination model under a finite covariance assumption.

  • Optimal Lottery Tickets via SubsetSum: Logarithmic Over-Parameterization is Sufficient
    A. Pensia, S. Rajput, A. Nagle, H. Vishwakarma, and D. Papailiopoulos
    Advances in Neural Information Processing Systems (NeurIPS), 2020
    [Abstract] [arXiv] [Conference version]

    The strong lottery ticket hypothesis (LTH) postulates that one can approximate any target neural network by only pruning the weights of a sufficiently over-parameterized random network. A recent work by Malach et al. (2020) establishes the first theoretical analysis for the strong LTH: one can provably approximate a neural network of width \(d\) and depth \(l\), by pruning a random one that is a factor \(O(d^4l^2)\) wider and twice as deep. This polynomial over-parameterization requirement is at odds with recent experimental research that achieves good approximation with networks that are a small factor wider than the target. In this work, we close the gap and offer an exponential improvement to the over-parameterization requirement for the existence of lottery tickets. We show that any target network of width d and depth l can be approximated by pruning a random network that is a factor \(O(\log(dl))\) wider and twice as deep. Our analysis heavily relies on connecting pruning random ReLU networks to random instances of the SubsetSum problem. We then show that this logarithmic over-parameterization is essentially optimal for constant depth networks. Finally, we verify several of our theoretical insights with experiments.

  • Extracting robust and accurate features via a robust information bottleneck
    A. Pensia, V. Jog, and P. Loh
    IEEE Journal on Selected Areas in Information Theory, 2020
    [Abstract] [Journal version] [PDF]

    We propose a novel strategy for extracting features in supervised learning that can be used to construct a classifier which is more robust to small perturbations in the input space. Our method builds upon the idea of the information bottleneck, by introducing an additional penalty term that encourages the Fisher information of the extracted features to be small when parametrized by the inputs. We present two formulations where the relevance of the features to output labels is measured using either mutual information or MMSE. By tuning the regularization parameter, we can explicitly trade off the opposing desiderata of robustness and accuracy when constructing a classifier. We derive optimal solutions to both robust information bottleneck formulations when the inputs and outputs are jointly Gaussian, proving that the optimally robust features are also jointly Gaussian in this setting. We also propose methods for optimizing variational bounds on the robust information bottleneck objectives in general settings using stochastic gradient descent, which may be implemented efficiently in neural networks. Our experimental results for synthetic and real data sets show that the proposed feature extraction methods indeed produce classifiers with increased robustness to perturbations.

  • Mean estimation for entangled single-sample distributions
    A. Pensia, V. Jog, and P. Loh
    IEEE International Symposium on Information Theory (ISIT), 2019
    [Abstract] [Conference version][PDF]

    We consider the problem of estimating the common mean of univariate data, when independent samples are drawn from non-identical symmetric, unimodal distributions. This captures the setting where all samples are Gaussian with different unknown variances. We propose an estimator that adapts to the level of heterogeneity in the data, achieving near-optimality in both the i.i.d. setting and some heterogeneous settings, where the fraction of 'low-noise’ points is as small as \(\frac{\log n} {n}\). Our estimator is a hybrid of the modal interval, shorth, and median estimators from classical statistics. The rates depend on the percentile of the mixture distribution, making our estimators useful even for distributions with infinite variance.

  • Deep Topic Models for Multi-label Learning
    R. Panda, A. Pensia, N. Mehta, M. Zhou, and P. Rai
    International Conference on Artificial Intelligence and Statistics (AISTATS), 2019
    [Abstract] [Conference version]

    We present a probabilistic framework for multi-label learning based on a deep generative model for the binary label vector associated with each observation. Our generative model learns deep multi-layer latent embeddings of the binary label vector, which are conditioned on the input features of the observation. The model also has an interesting interpretation in terms of a deep topic model, with each label vector representing a bag-of-words document, with the input features being its meta-data. In addition to capturing the structural properties of the label space (e.g., a near-low-rank label matrix), the model also offers a clean, geometric interpretation. In particular, the nonlinear classification boundaries learned by the model can be seen as the union of multiple convex polytopes. Our model admits a simple and scalable inference via efficient Gibbs sampling or EM algorithm. We compare our model with state-of-the-art baselines for multi-label learning on benchmark data sets, and also report some interesting qualitative results.

  • Generalization Error Bounds for Noisy, Iterative Algorithms
    A. Pensia, V. Jog, and P. Loh
    IEEE International Symposium on Information Theory (ISIT), 2018
    [Abstract] [arXiv] [Conference version]

    In statistical learning theory, generalization error is used to quantify the degree to which a supervised machine learning algorithm may overfit to training data. Recent work [Xu and Raginsky (2017)] has established a bound on the generalization error of empirical risk minimization based on the mutual information \(I(S;W)\) between the algorithm input \(S\) and the algorithm output \(W\), when the loss function is sub-Gaussian. We leverage these results to derive generalization error bounds for a broad class of iterative algorithms that are characterized by bounded, noisy updates with Markovian structure. Our bounds are very general and are applicable to numerous settings of interest, including stochastic gradient Langevin dynamics (SGLD) and variants of the stochastic gradient Hamiltonian Monte Carlo (SGHMC) algorithm. Furthermore, our error bounds hold for any output function computed over the path of iterates, including the last iterate of the algorithm or the average of subsets of iterates, and also allow for non-uniform sampling of data in successive updates of the algorithm.

  • Fast Localization of Autonomous Vehicles Using Discriminative Metric Learning
    A. Pensia, G. Sharma, J. R. Mcbride, and G. Pandey
    14th Conference on Computer and Robot Vision (CRV), 2017
    [Abstract] [Conference version]

    In this paper, we report a novel algorithm for localization of autonomous vehicles in an urban environment using orthographic ground reflectivity map created with a three-dimensional (3D) laser scanner. It should be noted that the road paint (lane markings, zebra crossing, traffic signs etc.) constitute the distinctive features in the surface reflectivity map which are generally sparse as compared to the non-interesting asphalt and the off-road portion of the map. Therefore, we propose to project the reflectivity map to a lower dimensional space, that captures the useful features of the map, and then use these projected feature maps for localization. We use discriminative metric learning technique to obtain this lower dimensional space of feature maps. Experimental evaluation of the proposed method on real data shows that it is better than the standard image matching techniques in terms of accuracy. Moreover, the proposed method is computationally fast and can be executed at real-time (10 Hz) on a standard CPU.

Teaching Experience

  1. Graduate Teaching Assistant

    1. Mathematical Foundations of Machine Learning, UW-Madison, Fall 2018

    2. Introduction to Bioinformatics, UW-Madison, Fall 2017

    3. Probabilistic Mobile Robotics, IIT Kanpur, Fall 2016, Spring 2016

    4. Communication Systems, IIT Kanpur, Summer 2016

  2. Academic Mentor

    1. Representation and Analysis of Random Signals, IIT Kanpur, Fall 2016