The next webinar organised by BNP-ISBA:
DATE & TIME: 17:00 UTC on December 4th, 2024.
SPEAKER: Mario Beraha (Polytechnic of Milan)
TITLE: Bayesian analysis of (extended) feature allocation models: predictions, sufficientness, and applications
bnp-isba.github.io/webinars.…
📢I’m really proud of “Bayesian clustering of high-dimensional data via latent repulsive mixtures”, which has just appeared in Biometrika (Advance Articles), doi.org/10.1093/biomet/asae0…. Thank you to my terrific coauthors Lorenzo Ghilotti and @mberaha2.
📢The submission of contributed talks for the 14th International Conference on Bayesian Nonparametrics, UCLA (Los Angeles, US), June 23-27, 2025, is now OPEN! Deadline for submission: Dec 15, 2024.
Together with Stefano Favaro and Matteo Sesia, we just arXiv'd our latest take on frequency and cardinality estimation from compressed (sketched) data:
arxiv.org/abs/2309.15408
Hence, we adopt a pragmatic approach and propose a class of "smoothed" estimators, which work very well in practice and are easy to compute! We extend the analysis to sketches obtained with multiple hash functions by drawing on the "multi-view" literature.
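For readers less familiar with sketches built from multiple hash functions, here is a minimal sketch of the classical count-min sketch and its textbook "min" estimator, the kind of baseline that model-based estimators are usually compared against. It assumes integer items and a simple universal hash family; the class and method names are hypothetical, and this is not the "smoothed" estimator from the paper.

```python
import random


class CountMinSketch:
    """Minimal count-min sketch: one row of `width` counters per hash function."""

    def __init__(self, num_hashes, width, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.p = 2_147_483_647  # Mersenne prime 2**31 - 1 for the hash family
        # One (a, b) pair per row: h_j(x) = ((a * x + b) mod p) mod width
        self.params = [(rng.randrange(1, self.p), rng.randrange(0, self.p))
                       for _ in range(num_hashes)]
        self.table = [[0] * width for _ in range(num_hashes)]

    def _bucket(self, j, item):
        a, b = self.params[j]
        return ((a * item + b) % self.p) % self.width

    def add(self, item):
        # Every row increments exactly one counter for each incoming item.
        for j in range(len(self.params)):
            self.table[j][self._bucket(j, item)] += 1

    def query_min(self, item):
        # Each counter over-counts because of hash collisions, so the minimum
        # across rows is the tightest of the available upper bounds.
        return min(self.table[j][self._bucket(j, item)]
                   for j in range(len(self.params)))


rng = random.Random(1)
stream = [rng.randrange(1000) for _ in range(5000)]
cms = CountMinSketch(num_hashes=4, width=200)
for x in stream:
    cms.add(x)
print(stream.count(42), cms.query_min(42))
```

The min estimator never underestimates a frequency but can be badly biased upwards when the sketch is small, which is one motivation for model-based alternatives.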
As a bonus, we show that our model-based approach can also be used to infer the cardinality (number of distinct items) of the dataset. This is another classical problem in computer science, one typically solved with a different data structure!
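To see why a frequency sketch carries information about cardinality at all, consider the classical linear-counting estimator (Whang et al., 1990), swapped in here purely for illustration: it is not the model-based approach from the paper. Assuming each distinct item lands in a uniformly random bucket, the expected fraction of empty buckets after d distinct items is about exp(-d / M), and inverting this gives an estimate of d. The helper `hashed_row` is hypothetical and uses a random lookup table as an idealized hash function.

```python
import math
import random


def linear_counting(row):
    """Estimate the number of distinct items hashed into one row of counters.

    Assumes uniform hashing, so E[#empty buckets / M] is about exp(-d / M);
    solving for d gives d_hat = -M * log(zeros / M).
    """
    m = len(row)
    zeros = sum(1 for c in row if c == 0)
    if zeros == 0:
        raise ValueError("row is saturated; linear counting is undefined")
    return -m * math.log(zeros / m)


def hashed_row(stream, width, seed=0):
    """Build one row of a count-min-style sketch.

    Hypothetical stand-in for a real hash: each distinct item is assigned a
    uniformly random bucket once, and repeats go to the same bucket.
    """
    rng = random.Random(seed)
    buckets = {}
    row = [0] * width
    for x in stream:
        if x not in buckets:
            buckets[x] = rng.randrange(width)
        row[buckets[x]] += 1
    return row


# 300 distinct items, each repeated 5 times, hashed into 1000 buckets.
row = hashed_row(list(range(300)) * 5, width=1000, seed=3)
print(linear_counting(row))
```

Note that the estimate depends only on which buckets are empty, not on how large the non-empty counters are.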
@mberaha2 and I have just published the first project we started collaborating on 5 years ago! In the meantime, we have taken on board 4 more coauthors, and I thank them all. See "Childhood obesity in Singapore: A Bayesian nonparametric approach", SMJ, OnlineFirst.
Personal update. As of last week, I'm officially a Ph.D. in data science and computation. My thesis was on the statistical learning of RPMs, under the supervision of @AlessandraGugl9!
We have also arXived the first two papers from my postdoc, both joint work with Stefano Favaro.
Indeed, under a CRM prior, the predictive distribution of the number of "new" traits in an additional sample depends only on the sample size!
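A concrete instance of this limitation, assuming the two-parameter beta process as the CRM prior (the Indian buffet process when the concentration is 1): the (i+1)-th sample displays Poisson(mass * c / (c + i)) new traits, independently across samples, so the number of new traits in m extra samples is Poisson with the summed mean. The function names below are hypothetical; note that they take the sample size n but no argument for the traits actually observed.

```python
import math


def new_traits_mean(n, m, mass, concentration=1.0):
    """Poisson mean of the number of new traits in m extra samples, given n
    observed samples, under a two-parameter beta process prior.

    Sample i + 1 shows Poisson(mass * c / (c + i)) new traits, independently,
    so the total over samples n + 1, ..., n + m is Poisson with summed mean.
    """
    c = concentration
    return sum(mass * c / (c + i) for i in range(n, n + m))


def new_traits_pmf(k, n, m, mass, concentration=1.0):
    """Probability of exactly k new traits; depends on the data only via n."""
    lam = new_traits_mean(n, m, mass, concentration)
    return math.exp(-lam) * lam ** k / math.factorial(k)


# Two datasets of size n = 3 with very different numbers of observed traits
# yield exactly the same predictive, since the observed traits never enter.
print(new_traits_mean(3, 2, mass=1.0))
```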
We propose a new class of priors derived from the scaled subordinators of James et al. (2015).
In particular, we show that, in some cases, this leads to a predictive distribution that depends on the sample size and the number of unique traits in the sample, similarly to what happens under the Pitman-Yor process prior!
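For comparison, here is the Pitman-Yor behaviour the post refers to, in the species-sampling setting rather than the feature setting. Under a Pitman-Yor(sigma, theta) prior, given n samples with k distinct species, the next sample is a new species with probability (theta + sigma * k) / (theta + n), so the predictive depends on both n and k; with sigma = 0 (the Dirichlet process) the dependence on k disappears. This is a sketch of the classical Pitman-Yor predictive only, not of the scaled-subordinator priors proposed in the papers, and the function names are hypothetical.

```python
def py_new_species_prob(n, k, theta, sigma):
    """P(sample n + 1 is a new species | n samples, k distinct species)
    under a Pitman-Yor(sigma, theta) prior, with 0 <= sigma < 1, theta > -sigma."""
    return (theta + sigma * k) / (theta + n)


def py_expected_new_species(n, k, m, theta, sigma):
    """Expected number of new species among m further samples.

    The one-step expected increment is linear in the current number of species,
    so the expectation can be propagated exactly one step at a time.
    """
    expected_k = float(k)
    for i in range(m):
        expected_k += (theta + sigma * expected_k) / (theta + n + i)
    return expected_k - k


# Same sample size, different numbers of distinct species:
print(py_new_species_prob(10, 2, theta=1.0, sigma=0.5))
print(py_new_species_prob(10, 7, theta=1.0, sigma=0.5))
```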