Research Program

My research interests lie broadly at the intersection of optimization and machine learning. Specifically, I am interested in going beyond accuracy (on which, thanks to deep learning, we have achieved near-human performance) to other desiderata such as compute and memory efficiency, human interaction, label efficiency, robustness, and fairness. I'm interested in efficiency on multiple fronts: label efficiency (how can we learn with less labeled data), model efficiency (reducing model complexity for resource-constrained environments), and time and resource efficiency (how do we reduce the end-to-end running time of training, and train models in resource-constrained environments). I am also interested in building intelligent systems that organize, analyze, and summarize massive amounts of data, and automatically learn from it. Below are some of the concrete applications my group is currently working on.

  • Data Subset Selection/Coresets for Efficient Learning: How do we select the right subsets of data to make training, inference, hyperparameter tuning, etc. more efficient, particularly for deep learning? I'm interested in speeding up deep learning by an order of magnitude (e.g., 5x to 10x) by training on much smaller subsets without significant loss in accuracy or other evaluation metrics.

  • Active Learning for Deep Models: How do we trade off uncertainty and diversity in a principled manner for active learning (i.e., iteratively selecting data points to label) in deep learning? This is particularly important since labeled data is very time-consuming and expensive to obtain for real-world problems. We study techniques that can achieve 2x to 5x reductions in labeling cost for a wide range of applications.

  • Data Programming and Weak Supervision: Using weak supervision to automatically create noisy labeled data, thereby reducing labeling costs.

  • Robust Learning: How do we learn machine learning models robustly in the presence of noisy labels, outliers, distribution shift, and class imbalance?

  • Fair & Continuous Learning: Learning deep and other ML models while ensuring fairness to under-represented and minority classes and attributes, and continuously learning from data in a resource- and compute-efficient manner.

  • Feature Selection: What are principled ways of selecting the right sets of features, in model-dependent or model-independent ways? How do we do this when eliciting a feature has an associated cost (e.g., in medical domains, each additional medical test might have a cost), and in an online manner?

  • Neural Network Compression and Architecture Search under Resource Constraints: How do we compress neural networks (top-down) or search for resource-constrained architectures (bottom-up) in an efficient manner?

  • Data Summarization: What makes a good summary of data, and how do we consume such summaries?

  • Data Partitioning: Efficient partitioning of data for clustering and distributed training.

I'm also interested in achieving multiple desiderata simultaneously, i.e., approaches that are efficient (label- or compute-efficient) while also being robust, fair, etc.

Motivating Applications

Below are more details on the applications listed above. We study each of them in a broad range of domains, including computer vision, video analytics, speech recognition, and natural language processing/text classification.

Data Subset Selection/Coresets for Efficient Learning

Given today's massive datasets, selecting the right data for training is a critical problem, both for training efficiency and for labeling cost. The setting could be unsupervised, where we don't have labels (select a subset of unlabeled data points for labeling), or supervised, where we do (for faster training or hyperparameter tuning). In either case, we are interested in obtaining a representative subset of instances for training machine learning models. We show that, for several classifiers, selecting a subset of data with maximum likelihood on the training set is a submodular optimization problem. We also show that by learning the right data subsets, we can achieve significant speedups in training time (5x to 10x) with minimal loss in accuracy.
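As a minimal illustration of submodular subset selection (not our published method), the sketch below greedily maximizes a facility-location objective, a classic submodular proxy for how well a subset represents the full dataset; all names and the toy similarity matrix are illustrative.

```python
import numpy as np

def facility_location_greedy(similarity, budget):
    """Greedily select `budget` points maximizing the facility-location
    objective f(S) = sum_i max_{j in S} sim[i, j]."""
    n = similarity.shape[0]
    selected = []
    coverage = np.zeros(n)  # best similarity of each point to the current subset
    for _ in range(budget):
        # marginal gain of adding candidate j: total improvement in coverage
        gains = np.maximum(similarity, coverage[:, None]).sum(axis=0) - coverage.sum()
        gains[selected] = -np.inf      # never reselect a chosen point
        j = int(np.argmax(gains))
        selected.append(j)
        coverage = np.maximum(coverage, similarity[:, j])
    return selected

# toy usage: similarities from random features, shifted to be nonnegative
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
sim = X @ X.T
sim = sim - sim.min()
subset = facility_location_greedy(sim, budget=10)
```

The greedy rule enjoys the classic 1 - 1/e approximation guarantee for monotone submodular objectives, which is why it is the standard starting point for coreset-style selection.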

Active Learning in the Real World

Another very important application we study in our group is active learning: how to reduce labeling costs by selecting the right subset/batch of examples to label. Active learning approaches can reduce the amount of labeled data required significantly (by almost 5x to 20x) without significantly reducing accuracy. I am also interested in active learning in realistic settings, i.e., with out-of-distribution (OOD) data, rare classes, imbalance, etc. We have applied active learning to a number of application domains, including computer vision, text classification, and speech recognition.
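To make the uncertainty/diversity tradeoff concrete, here is a simple two-stage batch-selection sketch (illustrative only, not our published algorithm): shortlist the most uncertain pool points by predictive entropy, then greedily pick a diverse batch from the shortlist via farthest-point selection. All function names and the `pool_factor` knob are assumptions for this toy.

```python
import numpy as np

def select_batch(probs, embeddings, batch_size, pool_factor=5):
    """Pick a batch of unlabeled points to query: entropy shortlist,
    then greedy max-min (farthest-point) diversification."""
    # predictive entropy of each pool point
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    shortlist = np.argsort(-entropy)[: batch_size * pool_factor]

    chosen = [shortlist[0]]                     # seed with most uncertain point
    for _ in range(batch_size - 1):
        # distance from each shortlisted point to its nearest chosen point
        d = np.min(
            np.linalg.norm(
                embeddings[shortlist][:, None] - embeddings[chosen][None], axis=-1
            ),
            axis=1,
        )
        chosen.append(shortlist[int(np.argmax(d))])
    return chosen

# toy usage with random softmax outputs and embeddings
rng = np.random.default_rng(0)
logits = rng.normal(size=(200, 5))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
emb = rng.normal(size=(200, 16))
batch = select_batch(probs, emb, batch_size=8)
```

Pure uncertainty sampling tends to pick near-duplicate points in a batch; the diversification step is one simple way to avoid that.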

Semi-Supervised Learning in the Real World

In our group, we are also developing semi-supervised learning for real-world situations such as out-of-distribution data and class imbalance. We have also designed efficient approaches for semi-supervised learning. Finally, we have studied semi-supervised few-shot and meta-learning.

Few-Shot and Meta-Learning

Another application in the data-efficient learning space is meta-learning and few-shot learning. We have studied meta-learning in real-world scenarios involving label noise and OOD data, as well as semi-supervised meta-learning.

Data Programming & Weak Supervision

Getting high-quality labeled data is very expensive, yet machine learning models require massive amounts of it. I am studying weak-supervision approaches for effectively learning machine learning models with very few labeled instances and a large number of unlabeled instances, using noisy labels from multiple sources (semi-supervised data programming). I'm also interested in subset selection problems in this space (e.g., how do we select a subset of labeling functions for robustness, and how do we select a subset of labeled instances to complement the weak labels?).
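The core data-programming idea can be sketched in a few lines: several noisy, hand-written labeling functions vote on each unlabeled example, and the votes are aggregated into a (noisy) training label. The toy below uses simple majority vote with abstentions; the labeling functions and the spam-detection task are purely hypothetical, and real systems learn per-source accuracies rather than voting uniformly.

```python
# Three hypothetical keyword-style labeling functions for spam detection.
# Each returns +1 (spam), 0 (not spam), or -1 (abstain).
def lf_money(text):    return 1 if "money" in text else -1
def lf_winner(text):   return 1 if "winner" in text else -1
def lf_greeting(text): return 0 if text.startswith("hi") else -1

def majority_vote(text, lfs):
    """Aggregate labeling-function outputs by majority vote, ignoring
    abstains. Returns -1 (no label) when all LFs abstain or the vote ties."""
    votes = [v for v in (lf(text) for lf in lfs) if v != -1]
    if not votes:
        return -1
    ones = sum(votes)                  # number of +1 votes
    if ones * 2 == len(votes):
        return -1                      # tie: leave unlabeled
    return 1 if ones * 2 > len(votes) else 0

lfs = [lf_money, lf_winner, lf_greeting]
labels = [majority_vote(t, lfs) for t in
          ["win money now", "hi there friend", "winner winner", "hello"]]
# labels -> [1, 0, 1, -1]
```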

Robust Learning

Can we make machine learning algorithms robust to noisy labels, out-of-distribution samples, distribution shift, and imbalance? We study this problem in various settings (supervised, semi-supervised, and few-shot learning) and also study the impact of robustness in these settings. We pose this problem as a bi-level optimization and study algorithms for solving it.
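Schematically, such robust-learning problems can be written as a bi-level program (the notation here is mine, not a formulation from a specific paper): per-example weights $w$ on a noisy training set $T$ are tuned so that the resulting model performs well on a small trusted validation set $V$:

```latex
\min_{w \in \mathcal{W}} \; \sum_{i \in V} \ell\big(x_i, y_i;\, \theta^*(w)\big)
\quad \text{s.t.} \quad
\theta^*(w) \in \operatorname*{argmin}_{\theta} \; \sum_{j \in T} w_j \, \ell\big(x_j, y_j;\, \theta\big).
```

The inner problem fits the model under the current weights; the outer problem adjusts the weights, e.g., downweighting mislabeled or out-of-distribution points.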

Data Summarization

I am interested in several applications of data summarization, including video summarization, image collection summarization, document/text summarization, and summarization of topic hierarchies. We study questions such as what natural models for summarization look like, how we choose the right models for different problems/domains, and how we learn the right combinations of functions for various tasks. A lot of effort also goes into interpretability of models, evaluation and loss functions for summarization, and, at the core of it, understanding what makes a good summary for the problem at hand. We have also created new datasets for domain-specific video summarization and image collection summarization, and recently released VISIOCITY, a dataset of large videos for video summarization and video understanding.

Data Partitioning

We seek to intelligently partition data for large scale distributed training, so that we can achieve superior results compared to simple random partitioning and other baselines. We demonstrate that diversified partitioning via submodular functions can achieve significant improvements on several distributed deep learning and general machine learning tasks.

Feature Selection

Feature selection is a very important preprocessing step for machine learning and data science applications; it is used mostly to reduce prediction time, memory usage, and feature-acquisition cost, and to remove noisy and irrelevant features. We study a parameterized feature selection framework using submodular functions, particularly a family of mutual-information-based models. We show how this framework can be extended to cost-aware feature elicitation.
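As a small, self-contained illustration of the mutual-information flavor of feature selection (a filter-style baseline, not our parameterized framework), the sketch below ranks discrete features by their empirical mutual information with the label; the synthetic data and function names are assumptions for this example.

```python
import numpy as np

def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete variables."""
    mi = 0.0
    for a in np.unique(x):
        for b in np.unique(y):
            p_ab = np.mean((x == a) & (y == b))
            if p_ab > 0:
                p_a, p_b = np.mean(x == a), np.mean(y == b)
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

def rank_features(X, y, k):
    """Keep the top-k discrete features by MI with the label
    (submodular variants additionally penalize redundancy among features)."""
    scores = [mutual_information(X[:, j], y) for j in range(X.shape[1])]
    return list(np.argsort(scores)[::-1][:k])

# synthetic data: four irrelevant binary features plus one (feature 4)
# that equals the label with ~5% of entries flipped
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=500)
noise = rng.integers(0, 2, size=(500, 4))
informative = (y ^ rng.integers(0, 2, size=500) * (rng.random(500) < 0.1)).reshape(-1, 1)
X = np.hstack([noise, informative])
top = rank_features(X, y, k=2)
```

The informative column dominates the ranking because its MI with the label is orders of magnitude larger than the spurious MI of the noise columns.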

  • Rishabh Iyer, Jeff Bilmes, Algorithms for approximate minimization of the difference between submodular functions, with applications, Uncertainty in Artificial Intelligence (UAI) 2012

  • Srijita Das, Rishabh Iyer, Sriraam Natarajan, A Clustering based Selection Framework for Cost Aware and Test-time Feature Elicitation, In CODS-COMAD 2021 Research Track (Honorable Mention)

  • Srijita Das, Rishabh Iyer, Sriraam Natarajan, A Parameterized Information-theoretic Feature Selection Framework for Test-time Feature Elicitation, In Review 2021

Other Applications

In addition to the applications above, our group has also applied combinatorial optimization (subset selection) and bi-level optimization to other areas such as fair learning, continual learning, and social networks.

  • Rishabh Tiwari, Krishnateja Killamsetty, Rishabh Iyer, and Pradeep Shenoy, GCR: Gradient Coreset based Replay Buffer Selection for Continual Learning, In Computer Vision and Pattern Recognition, CVPR 2022

  • MS Ozdayi, M Kantarcioglu, R Iyer, Fair Machine Learning under Limited Demographically Labeled Data, ICLR Workshop on Socially Responsible Machine Learning

  • Ping Zhang, Rishabh K Iyer, Ashish V. Tendulkar, Gaurav Aggarwal, Abir De, Learning to Select Exogenous Events for Marked Temporal Point Process, In Neural Information Processing Systems, NeurIPS 2021

Theoretical Advances

To solve the motivating applications listed above, below are some of the theoretical directions I'm pursuing.


Scalable Submodular Optimization

Submodular optimization is a rich and expressive class of non-linear discrete optimization problems that generalizes important combinatorial functions such as set cover, facility location, and log-determinants. A number of applications, including data subset selection, data summarization, data partitioning, and active learning, naturally involve flavors of submodular optimization. In this thread, we develop fast and scalable algorithms for a number of problems that occur in practice, including submodular minimization, submodular maximization, difference-of-submodular optimization, submodular optimization subject to submodular constraints, and ratio-of-submodular optimization. These algorithms achieve (near-)optimal approximation guarantees while being easy to implement and scaling to massive datasets. Empirically, we have demonstrated orders-of-magnitude speedups, and our algorithms have been used in several real-world applications, including cooperative cuts for image segmentation and cooperative matching, diffusion-aware optimization, path planning, mobile crowd-sensing, trajectory optimization for aerial 3D scanning, sensor placement under cooperative costs, and limited-vocabulary speech data selection.
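One standard workhorse for scaling these algorithms is Minoux's lazy greedy trick: by diminishing returns, a stale marginal gain is always an upper bound on the current one, so most gain recomputations can be skipped. Below is a minimal sketch (illustrative names, with a toy set-cover objective), not an implementation from a specific paper.

```python
import heapq

def lazy_greedy(gain, ground_set, budget):
    """Minoux's lazy greedy for monotone submodular maximization.
    `gain(item, selected)` returns the marginal gain of `item` given the
    current `selected` list. Stale heap entries are safe upper bounds."""
    selected = []
    # max-heap of (-gain, item, round-when-gain-was-computed)
    heap = [(-gain(v, []), v, 0) for v in ground_set]
    heapq.heapify(heap)
    while len(selected) < budget and heap:
        neg_g, v, stamp = heapq.heappop(heap)
        if stamp == len(selected):          # gain is up to date: take it
            selected.append(v)
        else:                                # stale: recompute and push back
            heapq.heappush(heap, (-gain(v, selected), v, len(selected)))
    return selected

# toy coverage objective: each item covers a set of elements and
# f(S) = |union of covered sets|, a classic monotone submodular function
covers = {0: {1, 2, 3}, 1: {3, 4}, 2: {5}, 3: {1, 2}}

def gain(v, selected):
    covered = set().union(*(covers[s] for s in selected)) if selected else set()
    return len(covers[v] - covered)

picked = lazy_greedy(gain, list(covers), budget=2)
# picked -> [0, 1], covering {1, 2, 3, 4}
```

In practice this evaluates far fewer marginal gains than naive greedy while returning the same solution.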


Learning Submodular Functions

While submodular optimization arises at inference time, a critical component of fitting submodular functions to machine learning applications is learning the right submodular functions. In this thread, we study a rich class of models associated with submodular functions and the associated learning problems.


Submodular Information Measures

This thread studies the intersection of submodular/combinatorial optimization and information theory via submodular information measures. We study the properties, modeling and representational power, instantiations, and applications of such measures; examples include submodular mutual information, submodular distance metrics, divergences, and multi-set submodular information measures. We have successfully applied submodular information functions to a number of application domains, including active learning with rare classes and out-of-distribution data, targeted learning, personalized speech recognition, and semi-supervised meta-learning.
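The central object here is the submodular mutual information, which for a submodular function $f$ and sets $A, B$ is defined as

```latex
I_f(A; B) \;=\; f(A) + f(B) - f(A \cup B),
```

mirroring Shannon mutual information with $f$ playing the role of entropy; for monotone submodular $f$ it is nonnegative, and it quantifies how much the two sets "share" under $f$, e.g., how similar a candidate subset is to a small targeted query set.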


Bi-level Optimization

A growing number of applications in efficient and robust learning involve bi-level optimization. In this thread, we study approaches for solving such bi-level optimization problems efficiently, particularly for deep learning models. I'm particularly interested in bi-level problems that have a discrete component (i.e., mixed discrete/continuous bi-level optimization).

  • Krishnateja Killamsetty, Changbin Li, Chen Zhou, Feng Chen, Rishabh Iyer, A Nested Bi-level Optimization Framework for Robust Few Shot Learning, To Appear In 36th AAAI Conference on Artificial Intelligence, AAAI 2022 (15% Acceptance Rate)

  • Xujiang Zhao, Krishnateja Killamsetty, Rishabh Iyer, Feng Chen, Robust Semi-Supervised Learning with Out of Distribution Data, arXiv:2010.03658

  • Krishnateja Killamsetty, S Durga, Ganesh Ramakrishnan, and Rishabh Iyer, GLISTER: Generalization based Data Subset Selection for Efficient and Robust Learning, 35th AAAI Conference on Artificial Intelligence (AAAI) 2021

  • Ayush Maheshwari, Krishnateja Killamsetty, Ganesh Ramakrishnan, Rishabh Iyer, Marina Danilevsky, Lucian Popa, Learning to Robustly Aggregate Labeling Functions for Semi-supervised Data Programming, Findings of ACL, 2022 (Long Paper)