Code & Software

  • CORDS: COResets and Data Subset selection

    • Github Link:

    • Achieves 3x to 30x speedups across a number of ML tasks and domains by training on an informative data subset in each epoch

    • Algorithms implemented: GLISTER, CRAIG, Grad-Match, Submodular Selection (Facility Location, Feature-Based Functions, Coverage, Diversity, etc.), and Random Selection.

    • Scenarios: supervised learning, semi-supervised learning, and AutoML. Domains: image classification, NLP, speech recognition, and tabular data.

172 ⭐ | 25 Forks
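
The idea of training on a selected subset in each epoch can be sketched in plain Python as follows. This is an illustrative toy, not the CORDS API: the names `select_random_subset` and `train_with_subsets`, the one-parameter least-squares model, and the re-selection interval are all assumptions made for the example, and Random Selection stands in for the more informative strategies listed above.

```python
import random

def select_random_subset(num_examples, budget, rng):
    """Random Selection baseline: pick `budget` distinct indices uniformly."""
    return rng.sample(range(num_examples), budget)

def train_with_subsets(data, epochs, budget, reselect_every, lr=0.5, seed=0):
    """Fit y ~ w * x by SGD, visiting only the current subset in each epoch."""
    rng = random.Random(seed)
    w = 0.0
    subset = []
    examples_seen = 0
    for epoch in range(epochs):
        # Re-select the subset periodically rather than every epoch (assumption).
        if epoch % reselect_every == 0:
            subset = select_random_subset(len(data), budget, rng)
        for i in subset:
            x, y = data[i]
            w -= lr * 2.0 * (w * x - y) * x  # gradient step on (w*x - y)**2
            examples_seen += 1
    return w, examples_seen

# Noiseless toy data with y = 3x and x in (0, 1].
data = [(i / 100.0, 3.0 * i / 100.0) for i in range(1, 101)]
w, seen = train_with_subsets(data, epochs=20, budget=10, reselect_every=5)
full_pass_cost = 20 * len(data)  # examples visited by full-data training
```

In this toy run the subset loop visits 200 examples where full-data training over the same 20 epochs would visit 2000. Whether accuracy holds up on a real task depends on how the subset is chosen, which is the problem the selection algorithms listed above address.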

  • DISTIL: Deep dIverSified inTeractIve Learning

    • Github Link:

    • DISTIL implements a number of state-of-the-art active learning algorithms.

    • Algorithms currently implemented in DISTIL include Uncertainty Sampling, Margin Sampling, Least Confidence Sampling, FASS, BADGE, GLISTER-Active, CoreSetAL, Random Sampling, and Submodular Sampling.

83 ⭐ | 14 Forks
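
Two of the strategies named above, Least Confidence Sampling and Margin Sampling, can be sketched in a few lines of plain Python. This is a minimal illustration under the assumption that a model has already produced class probabilities for each unlabeled point; the function names and the `pool_probs` data are hypothetical and are not taken from DISTIL.

```python
def least_confidence(probs):
    """Uncertainty score: 1 minus the probability of the predicted class."""
    return 1.0 - max(probs)

def margin(probs):
    """Gap between the two largest class probabilities (small gap = uncertain)."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second

def select_for_labeling(pool_probs, budget, strategy="least_confidence"):
    """Return the indices of the `budget` most uncertain unlabeled points."""
    if strategy == "least_confidence":
        key = lambda i: -least_confidence(pool_probs[i])  # highest score first
    elif strategy == "margin":
        key = lambda i: margin(pool_probs[i])  # smallest margin first
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return sorted(range(len(pool_probs)), key=key)[:budget]

# Hypothetical predicted class probabilities for four unlabeled points.
pool_probs = [
    [0.90, 0.05, 0.05],  # confident prediction
    [0.40, 0.35, 0.25],  # low top probability
    [0.50, 0.49, 0.01],  # two classes nearly tied
    [0.70, 0.20, 0.10],
]
by_confidence = select_for_labeling(pool_probs, budget=2)
by_margin = select_for_labeling(pool_probs, budget=2, strategy="margin")
```

The two criteria rank the same pool differently: least confidence favors point 1, whose top probability is lowest, while margin sampling favors point 2, whose top two classes are nearly tied.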

  • SPEAR: Semi-supervised Data Programming Based Weak Supervision

    • Github Link:

    • SPEAR implements a number of state-of-the-art data programming approaches, including SPEAR, Snorkel, Imply Loss, and Learning to Reweight.

    • Enables writing labeling functions for programmatic data labeling and semi-supervision.

72 ⭐ | 7 Forks
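
To make the notion of a labeling function concrete, here is a minimal sketch in plain Python. It does not use the SPEAR API: the label constants, the three `lf_*` functions, and the spam/ham task are hypothetical, and a simple majority vote stands in for the learned aggregation models that the approaches listed above provide.

```python
from collections import Counter

ABSTAIN, HAM, SPAM = -1, 0, 1

# Each labeling function encodes one heuristic and may abstain.
def lf_contains_free(text):
    return SPAM if "free" in text.lower() else ABSTAIN

def lf_contains_link(text):
    return SPAM if "http" in text.lower() else ABSTAIN

def lf_greeting(text):
    return HAM if text.lower().startswith(("hi", "hello")) else ABSTAIN

LABELING_FUNCTIONS = [lf_contains_free, lf_contains_link, lf_greeting]

def majority_vote(text):
    """Combine labeling-function votes; abstain when none fire or on a tie."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN
    return counts[0][0]

labels = [
    majority_vote("Claim your FREE prize at http://example.com"),
    majority_vote("Hello, are we still meeting tomorrow?"),
    majority_vote("See you at noon"),
    majority_vote("Hi, get a free ticket"),
]
```

Majority voting treats every labeling function as equally reliable, which is the main limitation that learned data programming models are designed to address.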

  • SubModLib: A Submodular Optimization Toolkit in C++

    • Github Link:

    • A general-purpose C++ toolkit for large-scale submodular function optimization, which includes a large class of algorithms and commonly used submodular functions, with a Python API

    • Uses several memoization and implementation tricks to speed up the algorithms, including implementations of Lazy Greedy, Lazier than Lazy Greedy, etc.

    • Algorithms scale to massive datasets involving ground set sizes of several million instances.

    • Enables building summarization (document/image/video) and data selection applications in a few lines of code!

21 ⭐ | 8 Forks
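
The Lazy Greedy idea mentioned above can be sketched in plain Python. This is not SubModLib's API or its implementation: the function names are hypothetical, the facility location objective is chosen here as one commonly used submodular function, and similarities are assumed to be non-negative. The sketch relies on the fact that, for a submodular function, a marginal gain computed earlier is an upper bound on the current gain, so most candidates never need to be re-evaluated.

```python
import heapq

def facility_location_gain(sim, best_cover, j):
    """Marginal gain of adding j: total improvement in each point's best similarity."""
    return sum(max(sim[i][j] - best_cover[i], 0.0) for i in range(len(sim)))

def lazy_greedy(sim, budget):
    """Greedy maximization of facility location using stale gains as upper bounds."""
    n = len(sim)
    best_cover = [0.0] * n  # best similarity of each point to the selected set
    # heapq is a min-heap, so gains are stored negated.
    heap = [(-facility_location_gain(sim, best_cover, j), j) for j in range(n)]
    heapq.heapify(heap)
    selected = []
    while heap and len(selected) < budget:
        _, j = heapq.heappop(heap)
        gain = facility_location_gain(sim, best_cover, j)  # refresh the stale bound
        if not heap or gain >= -heap[0][0]:
            # The fresh gain beats every other upper bound, so j is the greedy choice.
            selected.append(j)
            best_cover = [max(best_cover[i], sim[i][j]) for i in range(n)]
        else:
            heapq.heappush(heap, (-gain, j))
    return selected

# Two well-separated clusters of points on a line (hypothetical data).
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
sim = [[1.0 / (1.0 + abs(a - b)) for b in points] for a in points]
selected = lazy_greedy(sim, budget=2)
```

On this toy data the two selected indices are the middle points of the two clusters, which is the kind of representative subset a summarization application would want.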

  • Jensen: An Easily-Extensible C++ Toolkit for Production-Level Machine Learning and Convex Optimization

    • Github Link:

    • A modular framework for convex optimization, including several common convex functions and algorithms used in machine learning

    • Implements several convex functions, such as Logistic Loss and Hinge Loss, and most convex optimization algorithms, including LBFGS, Trust Region Newton, LBFGS-Owl, Stochastic Gradient Descent, Nesterov’s optimal algorithm, Gradient Descent with various update rules, Conjugate Gradient Descent, etc.

    • Implements several basic machine learning classifiers, such as L1/L2-regularized Logistic Regression, SVMs, Probit Regression, etc.

42 ⭐ | 20 Forks
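
As a concrete instance of pairing a convex function with an optimization algorithm, the sketch below minimizes L2-regularized Logistic Loss with plain Gradient Descent. It is written in Python for brevity and does not reflect Jensen's C++ interfaces: the function names, step size, regularization weight, and toy data are all assumptions made for the example.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def loss_and_grad(w, X, y, lam):
    """Mean logistic loss plus (lam / 2) * ||w||^2, with labels in {-1, +1}."""
    n = len(X)
    loss = 0.5 * lam * sum(wj * wj for wj in w)
    grad = [lam * wj for wj in w]
    for xi, yi in zip(X, y):
        m = yi * sum(wj * xj for wj, xj in zip(w, xi))
        # log(1 + exp(-m)) evaluated without overflow
        loss += (max(-m, 0.0) + math.log1p(math.exp(-abs(m)))) / n
        coeff = -yi * sigmoid(-m) / n
        for j, xj in enumerate(xi):
            grad[j] += coeff * xj
    return loss, grad

def gradient_descent(X, y, lam=0.1, lr=0.5, steps=200):
    """Fixed-step gradient descent starting from the zero vector."""
    w = [0.0] * len(X[0])
    for _ in range(steps):
        _, grad = loss_and_grad(w, X, y, lam)
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

# Toy 1-D problem; the constant second feature acts as a bias term.
X = [[-2.0, 1.0], [-1.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [-1, -1, 1, 1]
w = gradient_descent(X, y)
initial_loss, _ = loss_and_grad([0.0, 0.0], X, y, lam=0.1)
final_loss, _ = loss_and_grad(w, X, y, lam=0.1)
predictions = [1 if sum(wj * xj for wj, xj in zip(w, xi)) > 0 else -1 for xi in X]
```

Keeping the objective (`loss_and_grad`) separate from the optimizer (`gradient_descent`) mirrors the modular split described above: any routine that returns a value and a gradient could be passed to a different solver without changing the loss code.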

  • Sanjaya: A Scalable C++ Deep Video Analytics Engine (See this link)

    • Implements a scalable real-time and post-mortem video analytics engine with functionality including object detection, face detection and recognition, human detection and attribute recognition, vehicle detection and attribute recognition, face age/gender recognition, and video summarization.

    • Integrates several open-source libraries, including OpenCV, Caffe, DarkNet, DLib, and LibCCV, all in a single engine!

    • Supports training customized object detection and image classification models

    • Enables model fine-tuning and transfer learning

    • Supports live streams from surveillance cameras and several video file formats

    • Enables creating video analytics applications with a few lines of code!

    • Led to the development of two products: Surakshavyuh (real-time analytics and alerting) and Jigyasa (video search analytics). For more details, please visit this link.