Trevor Campbell

  • Post Doctoral

MIT Unit Affiliation: 

  • Electrical Engineering & Computer Science

Lab Affiliation(s): 


Post Doc Sponsor / Advisor: 

Tamara Broderick

Date PhD Completed: 

September 2016

Top 3 Areas of Expertise: 

Machine Learning
Probability and Statistics
Optimization

Expected End Date of Post Doctoral Position: 

September 12, 2017


Thesis Title: 

Truncated Bayesian Nonparametrics

Thesis Abstract: 

Many datasets can be thought of as expressing a collection of underlying traits with unknown cardinality.  Moreover, these datasets are often persistently growing, and we expect the number of expressed traits to likewise increase over time.  Priors from Bayesian nonparametrics are well-suited to this modeling challenge: they generate a countably infinite number of underlying traits, which allows the number of expressed traits both to be random and to grow with the dataset size.  We also require corresponding streaming, distributed inference algorithms that handle persistently growing datasets without slowing down over time.  However, a key ingredient in streaming, distributed inference---an explicit representation of the latent variables used to statistically decouple the data---is not available for nonparametric priors, as we cannot simulate or store infinitely many random variables in practice.

One approach is to approximate the nonparametric prior by developing a sequential representation---such that the traits are generated by a sequence of finite-dimensional distributions---and subsequently truncating it at some finite level, thus allowing explicit representation.  However, truncated sequential representations have been developed only for a small number of priors in Bayesian nonparametrics, and the order they impose on the traits creates identifiability issues in the streaming, distributed setting.
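For concreteness, one well-known sequential representation is stick-breaking for the Dirichlet process, in which trait weights are generated one at a time and can be truncated at a finite level K. The sketch below is purely illustrative (it is not code from the thesis, and the function name is invented); the leftover stick mass quantifies the truncation error.

```python
import random

def truncated_stick_breaking(alpha, K, seed=0):
    """First K stick-breaking weights of a Dirichlet process with
    concentration alpha: v_k ~ Beta(1, alpha), w_k = v_k * prod_{j<k}(1 - v_j)."""
    rng = random.Random(seed)
    weights, remaining = [], 1.0
    for _ in range(K):
        v = rng.betavariate(1.0, alpha)   # break off a fraction of the stick
        weights.append(v * remaining)
        remaining *= (1.0 - v)            # mass left after this break
    return weights, remaining             # remaining = mass lost to truncation

weights, err = truncated_stick_breaking(alpha=2.0, K=50)
```

The K weights sum to 1 - err, so err is the (random) probability mass assigned to the discarded tail; for the Dirichlet process it shrinks geometrically in K, the kind of truncation-error behavior the thesis analyzes for a much larger class of priors.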

This thesis provides a comprehensive theoretical treatment of sequential representations and truncation in Bayesian nonparametrics.  It details three sequential representations of a large class of nonparametric priors, and analyzes their truncation error and computational complexity. The results generalize and improve upon those existing in the literature.  Next, the truncated explicit representations are used to develop the first streaming, distributed, asynchronous inference procedures for models from Bayesian nonparametrics.  The combinatorial issues associated with trait identifiability in such models are resolved via a novel matching optimization.  The resulting algorithms are fast, learning rate-free, and truncation-free.
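The matching step can be pictured as a label-alignment (assignment) problem between the traits inferred by different workers. The brute-force sketch below is a hypothetical stand-in for the thesis's matching optimization, which handles the combinatorics far more efficiently:

```python
from itertools import permutations

def best_matching(cost):
    """Return the column permutation minimizing total assignment cost.
    Brute force for clarity; practical solvers use e.g. the Hungarian
    algorithm instead."""
    n = len(cost)
    best_total, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_total, best_perm = total, perm
    return best_perm, best_total

# Toy cost matrix: dissimilarity between worker A's traits (rows)
# and worker B's traits (columns).
cost = [[0.1, 2.0, 3.0],
        [2.5, 0.2, 1.9],
        [3.1, 2.2, 0.3]]
perm, total = best_matching(cost)  # perm relabels B's traits to match A's
```

Once each worker's trait labels are aligned to a common reference in this way, updates from asynchronous workers can be merged without conflating distinct traits.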

Taken together, these contributions provide the practitioner with the means to (1) develop multiple finite approximations for a given nonparametric prior; (2) determine which is the best for their application; and (3) use that approximation in the development of efficient streaming, distributed, asynchronous inference algorithms.

Top 5 Awards and honors (name of award, date received): 

2016 SBSS Student Paper Award
2012 GNC Best Presentation in Session
2011 NSERC PGS-M Research Fellowship
2011 James D. Todd Memorial Scholarship
2010 Kenneth Sullivan / Pratt & Whitney Canada Scholarship

5 Recent Papers: 

T. Campbell, D. Cai, T. Broderick (2016), "Exchangeable trait allocations," arXiv:1609.09147 (submitted).

T. Campbell, J. Huggins, J. How, T. Broderick (2016), "Truncated random measures," arXiv:1603.00861 (submitted).

T. Campbell, J. Straub, J. Fisher, J. How (2015), "Streaming, massively parallel variational inference for Bayesian nonparametrics," NIPS.

J. Huggins, T. Campbell, T. Broderick (2016), "Coresets for scalable Bayesian logistic regression," NIPS.

D. Cai, T. Campbell, T. Broderick (2016), "Edge-exchangeable graphs and sparsity," NIPS.

Contact Information:
32 Vassar St