Code & Software

  • CORDS: COResets and Data Subset selection

    • GitHub link: https://github.com/decile-team/cords

    • Open-source toolkit implementing state-of-the-art algorithms (many of them from our group) for coresets and data subset selection.

    • Goal: Reduce end-to-end training time from days to hours, and from hours to minutes, using coresets and data subset selection.

    • Algorithms implemented: GLISTER, CRAIG, Grad-Match, Submodular Selection (Facility Location, Feature-Based Functions, Coverage, Diversity, etc.), and Random Selection.
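    As a rough illustration of the submodular selection family listed above, the sketch below greedily chooses a subset that maximizes the facility location function f(S) = Σ_i max_{j∈S} sim(i, j), which rewards subsets that are representative of the whole dataset. It is a minimal plain-Python sketch that assumes a precomputed, non-negative similarity matrix; the function names are hypothetical and this is not CORDS's API.

```python
from typing import List


def facility_location_value(sim: List[List[float]], subset: List[int]) -> float:
    """f(S) = sum over ground-set items i of max_{j in S} sim[i][j] (0 if S is empty)."""
    if not subset:
        return 0.0
    return sum(max(row[j] for j in subset) for row in sim)


def facility_location_greedy(sim: List[List[float]], budget: int) -> List[int]:
    """Hypothetical helper: repeatedly add the item with the largest marginal gain.

    Assumes sim[i][j] is a precomputed, non-negative similarity between items i and j.
    """
    n = len(sim)
    selected: List[int] = []
    # best[i] caches max_{j in S} sim[i][j] for the current subset S
    best = [0.0] * n
    for _ in range(min(budget, n)):
        best_gain, best_j = -1.0, -1
        for j in range(n):
            if j in selected:
                continue
            # Marginal gain of adding j: sum_i max(0, sim[i][j] - best[i])
            gain = sum(max(0.0, sim[i][j] - best[i]) for i in range(n))
            if gain > best_gain:
                best_gain, best_j = gain, j
        selected.append(best_j)
        best = [max(best[i], sim[i][best_j]) for i in range(n)]
    return selected


if __name__ == "__main__":
    # Toy data with two clusters of similar items: {0, 1} and {2, 3}
    sim = [
        [1.0, 0.9, 0.1, 0.0],
        [0.9, 1.0, 0.0, 0.1],
        [0.1, 0.0, 1.0, 0.8],
        [0.0, 0.1, 0.8, 1.0],
    ]
    # Expect one index from each cluster
    print(facility_location_greedy(sim, budget=2))
```

    With a budget of 2, the greedy rule picks one point from each cluster, since a second point from an already-represented cluster adds little to f(S).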


  • DISTIL: Deep dIverSified inTeractIve Learning

    • GitHub link: https://github.com/decile-team/distil

    • DISTIL implements a number of state-of-the-art active learning algorithms.

    • Some of the algorithms currently implemented in DISTIL include: Uncertainty Sampling, Margin Sampling, Least Confidence Sampling, FASS, BADGE, GLISTER-Active, CoreSetAL, Random Sampling, and Submodular Sampling.
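    The uncertainty-based strategies above rank unlabeled points by how unsure the current model is about them. A minimal sketch, assuming each point comes with a list of predicted class probabilities: least confidence scores a point by 1 - max p, margin by the gap between the two largest probabilities, and entropy is one common score for generic uncertainty sampling. The function names are hypothetical and do not reflect DISTIL's API.

```python
import math
from typing import Callable, List


def least_confidence(probs: List[float]) -> float:
    # Higher score = less confident: 1 - (largest class probability)
    return 1.0 - max(probs)


def margin_score(probs: List[float]) -> float:
    # A small gap between the top two classes means high uncertainty.
    # Negated so that a higher score means more uncertain (needs >= 2 classes).
    top1, top2 = sorted(probs, reverse=True)[:2]
    return -(top1 - top2)


def entropy_score(probs: List[float]) -> float:
    # Shannon entropy (natural log) of the predictive distribution
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_top_k(pool_probs: List[List[float]], k: int,
                 score: Callable[[List[float]], float] = entropy_score) -> List[int]:
    """Return indices of the k unlabeled points with the highest uncertainty score."""
    ranked = sorted(range(len(pool_probs)), key=lambda i: score(pool_probs[i]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    pool = [[0.98, 0.01, 0.01], [0.4, 0.35, 0.25], [0.6, 0.39, 0.01]]
    print(select_top_k(pool, 2, margin_score))  # [1, 2]
```

    The confidently classified first point is never queried; the labeling budget goes to the points the model is least sure about.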


  • SMTK: A Submodular Optimization Toolkit in C++

    • Joint work with Jeff Bilmes, Kai Wei, Yuzong Liu, and several others (currently maintained by the Melodi Lab, University of Washington).

    • Provided the first general-purpose C++ toolkit for large-scale submodular function optimization, including a large class of algorithms and commonly used submodular functions.

    • Uses several memoization and implementation tricks to speed up the algorithms (e.g., implementations of Lazy Greedy and Lazier than Lazy Greedy).

    • Algorithms scale to massive datasets with ground sets of several million instances.

    • Enables building summarization (document/image/video) and data selection applications in a few lines of code!
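    The Lazy Greedy trick mentioned above relies on submodularity: marginal gains can only shrink as the selected set grows, so a stale gain kept in a max-heap is an upper bound, and an element whose refreshed gain still beats the top of the heap can be selected without re-evaluating the rest. Below is a minimal Python sketch of this idea (Minoux's accelerated greedy) for a set-coverage objective; it only illustrates the technique and is not SMTK's C++ implementation or API, and all names are hypothetical.

```python
import heapq
from typing import Callable, List, Set


def lazy_greedy(ground: List[int], f: Callable[[Set[int]], float], budget: int) -> List[int]:
    """Maximize a monotone submodular f under a cardinality budget using lazy evaluations."""
    selected: List[int] = []
    current: Set[int] = set()
    f_current = f(current)
    # Max-heap via negated values: entries are (-upper_bound_on_gain, element)
    heap = [(-(f({e}) - f_current), e) for e in ground]
    heapq.heapify(heap)
    while heap and len(selected) < budget:
        _, e = heapq.heappop(heap)
        # Refresh this element's (possibly stale) marginal gain w.r.t. the current set
        gain = f(current | {e}) - f_current
        if not heap or gain >= -heap[0][0]:
            # Refreshed gain still beats every other upper bound: select e
            selected.append(e)
            current.add(e)
            f_current += gain
        else:
            # Otherwise reinsert with the tighter bound and look at the next candidate
            heapq.heappush(heap, (-gain, e))
    return selected


def make_coverage(sets: List[Set[str]]) -> Callable[[Set[int]], float]:
    """f(S) = number of distinct items covered by the chosen sets (monotone submodular)."""
    def f(indices: Set[int]) -> float:
        covered: Set[str] = set()
        for i in indices:
            covered |= sets[i]
        return float(len(covered))
    return f


if __name__ == "__main__":
    sets = [{"a", "b", "c"}, {"c", "d"}, {"d", "e", "f", "g"}, {"a"}]
    print(lazy_greedy([0, 1, 2, 3], make_coverage(sets), budget=2))  # [2, 0]
```

    Here the two chosen sets cover all seven items, and each pick needed only one fresh evaluation because the refreshed gain already topped the next stale bound; on large ground sets, skipping those re-evaluations is where the savings come from.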


  • Jensen: An Easily-Extensible C++ Toolkit for Production-Level Machine Learning and Convex Optimization

    • GitHub link: https://github.com/decile-team/jensen

    • A modular framework for convex optimization, including several common convex functions and algorithms used in machine learning.

    • Implements several convex functions, such as Logistic Loss and Hinge Loss, and many standard convex optimization algorithms, including LBFGS, Trust Region Newton, LBFGS-Owl, Stochastic Gradient Descent, Nesterov’s optimal algorithm, Gradient Descent with various update rules, and Conjugate Gradient Descent.

    • Implements several basic machine learning classifiers, such as L1/L2-regularized Logistic Regression, SVMs, and Probit Regression.
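    To show how a convex loss and an optimizer fit together in a toolkit of this kind, here is a minimal plain-Python sketch (not Jensen's C++ API) of L2-regularized logistic regression trained with batch gradient descent on J(w) = (1/n) Σ_i log(1 + exp(-y_i w·x_i)) + (λ/2)‖w‖², with labels y_i ∈ {-1, +1}. The step size, iteration count, and λ are illustrative assumptions, and the bias is handled by a constant feature.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softplus(z: float) -> float:
    # Numerically stable log(1 + exp(z))
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def loss_and_grad(w: List[float], X: Sequence[Sequence[float]], y: Sequence[int],
                  lam: float) -> Tuple[float, List[float]]:
    """Regularized logistic loss J(w) and its gradient; labels must be -1 or +1."""
    n = len(X)
    loss = 0.5 * lam * sum(wj * wj for wj in w)
    grad = [lam * wj for wj in w]
    for xi, yi in zip(X, y):
        margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
        loss += softplus(-margin) / n
        # d/dw log(1 + exp(-y w.x)) = -y * x * sigmoid(-y w.x)
        coef = -yi * sigmoid(-margin) / n
        for j, xj in enumerate(xi):
            grad[j] += coef * xj
    return loss, grad


def train(X: Sequence[Sequence[float]], y: Sequence[int], lam: float = 0.01,
          lr: float = 0.5, iters: int = 500) -> List[float]:
    """Plain batch gradient descent with a fixed step size."""
    w = [0.0] * len(X[0])
    for _ in range(iters):
        _, g = loss_and_grad(w, X, y, lam)
        w = [wj - lr * gj for wj, gj in zip(w, g)]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


if __name__ == "__main__":
    # The first feature is a constant 1.0 that acts as the bias term
    X = [[1.0, 2.0], [1.0, 1.5], [1.0, -1.0], [1.0, -2.0]]
    y = [1, 1, -1, -1]
    w = train(X, y)
    print([predict(w, x) for x in X])  # [1, 1, -1, -1]
```

    Swapping in a different loss (e.g., hinge) or optimizer (e.g., L-BFGS) only changes `loss_and_grad` or `train`, which is the kind of separation a modular framework aims for.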


  • Sanjaya: A Scalable C++ deep video analytics engine

    • Implements a scalable engine for real-time and post-mortem video analytics, with functionality including object detection; face detection and recognition; human detection and human attribute recognition; vehicle detection and vehicle attribute recognition; face age/gender recognition; and video summarization.

    • Integrates several open-source libraries, including OpenCV, Caffe, DarkNet, DLib, and LibCCV, into a single engine!

    • Supports training custom object detection and image classification models.

    • Enables model fine-tuning and transfer learning.

    • Supports live streams from surveillance cameras and several video file formats.

    • Enables creating video analytics applications with a few lines of code!

    • Led to the development of two products: Surakshavyuh (real-time analytics and alerting) and Jigyasa (video search analytics).