ICBINB promotes “slow science”, pushes back against “leaderboard-ism”, revalues unexpected negative results, and helps people push their research when “stuck”. More broadly, our moon-shot is to transform how we do research by cracking open the research process and inviting meta-dialog. Starting in February 2022, we will be hosting a monthly virtual seminar series to promote these values and share stories from or about research.
The ICBINB Monthly Seminar Series seeks to shine a light on the “stuck” phases of research. Speakers will tell us about their most beautiful ideas that didn’t “work”, about when theory didn’t match practice, or perhaps just when the going got tough. These talks will let us peek inside the file drawer of unexpected negative results and peer behind the curtain to see the real story of how real researchers do real research.
Do you have somebody in mind who has something to say on these topics? You are very welcome to nominate them here.
Join us again for our seminar series in 2023. Join us on June 1st for a lecture by Matt Hoffman!
Title: How (Not) to Be Bayesian in an Age of Giant Models
Date: June 1st, 2023 at 10am EDT/4pm CEST
Abstract: Giant neural networks have taken the world by storm. These neural networks are first pretrained using self-supervision strategies to soak up knowledge about common patterns in huge unlabeled datasets, and can then be fine-tuned to solve specific problems using small amounts of task-specific supervision. We might say that the pretrained model implicitly expresses a strong prior on what kinds of patterns are relevant, which implies that the ideal fine-tuning process must be some form of Bayesian inference! But it turns out that operationalizing this insight is harder than it sounds. I’ll discuss one of our explorations in this direction in which, with a fair amount of thought, work, specialized expertise, and extra computation, we were able to use Bayesian inference to get results that were…almost as good as just fine-tuning using gradient descent. Along the way, we (re)learned some lessons about prior specification, sparsity, random-matrix theory, and the naive genius of gradient descent.
Biography: Matt Hoffman is a Research Scientist at Google. His main research focus is in probabilistic modeling and approximate inference algorithms. He has worked on various applications including music information retrieval, speech enhancement, topic modeling, learning to rank, computer vision, user interfaces, user behavior modeling, social network analysis, digital imaging, and astronomy. He is a co-creator of the widely used statistical modeling package Stan, and a contributor to the TensorFlow Probability library.
Title: What’s in a Graph?
Date: May 4th, 2023 at 10am EDT/4pm CEST
Abstract: Graph learning is one of the most rapidly-growing subfields of machine learning research. With a deluge of different architectures available, one may get the impression that *anything* can be modelled as a graph. However, for some data sets, it turns out that structural features are not driving predictive performance, and the existence of edges may not even be beneficial for generalisation. These puzzling findings lead me to ponder new directions for our field and raise awareness about *how* we work with data.
Biography: Bastian Rieck is the Principal Investigator of the AIDOS Lab at the Institute of AI for Health and the Helmholtz Pioneer Campus of Helmholtz Munich, focusing on topology-driven machine learning methods in biomedicine. He also has the honour to be a TUM Junior Fellow and a member of ELLIS. Previously, Rieck was a senior assistant in the Machine Learning & Computational Biology Lab of Prof. Dr. Karsten Borgwardt at ETH Zürich. He obtained his Ph.D. in computer science from Heidelberg University, which is also where he got his master’s degree in mathematics.
Title: Why Neural Compression Has Not Taken Off (Yet)
Date: February 2nd, 2023 at 12pm EST/6pm CET
Abstract: Despite recent advancements in neural data compression, classical codecs such as JPEG and BPG have remained industry standards to date. The talk will provide an introduction to the promising field of neural compression, focusing on why these new compression technologies have not seen the 10X performance boosts that deep learning has already achieved in other fields, such as NLP or vision. The talk will also present new avenues for neural compression research that provide novel directions for probabilistic modeling and show promise to make neural compression more practical and widely applicable across industries.
Biography: Stephan Mandt is an Associate Professor of Computer Science and Statistics at the University of California, Irvine. From 2016 until 2018, he was a Senior Researcher and Head of the statistical machine learning group at Disney Research in Pittsburgh and Los Angeles. He held previous postdoctoral positions at Columbia University and Princeton University. Stephan holds a Ph.D. in Theoretical Physics from the University of Cologne, where he received the German National Merit Scholarship. He is furthermore a recipient of the NSF CAREER Award, the UCI ICS Mid-Career Excellence in Research Award, and the German Research Foundation’s Mercator Fellowship, as well as a Kavli Fellow of the U.S. National Academy of Sciences, a member of the ELLIS Society, and a former visiting researcher at Google Brain. Stephan is an Action Editor of the Journal of Machine Learning Research and Transactions on Machine Learning Research and regularly serves as an Area Chair for NeurIPS, ICML, AAAI, and ICLR. His research is currently supported by NSF, DARPA, IARPA, DOE, Disney, Intel, and Qualcomm.
We will be hosting a seminar talk on the first Thursday of each month, so save the date! Times may vary based on the time zone and availability of the speaker.
Title: Metrics Reloaded
Date: November 10th, 2022 at 7am EST/1pm CET
Abstract: Increasing evidence shows that flaws in machine learning (ML) algorithm validation are an underestimated global problem. Specifically, chosen performance metrics do not necessarily reflect the domain interest, thus failing to adequately measure scientific progress and hindering translation of ML techniques into practice. To break such historically grown poor practices, we followed a multidisciplinary cross-domain approach that enabled us to critically question common practice in different communities and integrate distributed knowledge in one common framework.
This talk will comprise two parts. The first part will cover common and rare pitfalls of metrics in the field of image analysis, which have been compiled by a large multidisciplinary international consortium in a community-powered process. The second part will present Metrics Reloaded, a comprehensive framework guiding researchers towards choosing metrics in a problem-aware manner. A specific focus will be put on recommendations that go beyond the current state of the art.
Biography: Lena Maier-Hein is head of the division Intelligent Medical Systems at the German Cancer Research Center (DKFZ) and serves as managing director of the DKFZ Data Science and Digital Oncology cross-topic program. Her research concentrates on machine learning-based biomedical image analysis with a specific focus on surgical data science, computational biophotonics and validation of machine learning algorithms. During her academic career, Lena Maier-Hein has been distinguished with several science awards including the 2013 Heinz Maier Leibnitz Award of the German Research Foundation (DFG) and the 2017/18 Berlin-Brandenburg Academy Prize. She is further a fellow of the Medical Image Computing and Computer Assisted Intervention (MICCAI) society and of the European Laboratory for Learning and Intelligent Systems (ELLIS). Further international recognitions include a European Research Council (ERC) starting grant (2015-2020) and consolidator grant (2021-2026).
Title: Heavy tails in Bayesian neural networks: expectation and reality
Date: October 20th, 2022 at 10am EDT/4pm CEST/7am PDT
Abstract: The discovery of the connection between Gaussian processes and deep Bayesian neural networks in the wide limit increased interest in research on Bayesian neural networks. On the one hand, it helped to reason about existing works and their assumptions, such as the Gaussian activations assumption in the Edge of Chaos effect, or tuning priors over functions to get closer to some GP. On the other hand, it gave a new perspective on Bayesian neural networks that led to the study of the training dynamics through the neural tangent kernel, improvements in variational inference, uncertainty quantification, and others.
However, empirically, the distance between a hidden unit distribution and a Gaussian process increased with depth for the same number of hidden units per layer. So one of the main directions became the study of the difference between finite and infinite width neural networks.
We showed the sub-Weibull and Weibull-tail properties of hidden units, conjecturing that hidden units become heavier-tailed deeper in the network. This tail description reveals the difference between hidden units in finite- and infinite-width networks. There are also parallel works that give a full description of hidden units’ distributions through Meijer G-functions, which is consistent with our heavy-tailed result.
We found theoretically that the tail parameter increases linearly with depth. However, we could not observe the theoretical tail parameter empirically, at least not that precisely. In this talk, I give a retrospective on this line of work on Bayesian neural networks. Further, I give details and possible explanations of our empirical results.
Biography: Mariia Vladimirova is a postdoctoral researcher at Inria Grenoble Rhône-Alpes in the Statify team. Her research mostly focuses on exploring distributional properties of Bayesian neural networks. More specifically, she is interested in explaining the difference between deep learning models of wide and shallow regimes in order to improve the interpretability and efficiency of the models.
Mariia Vladimirova did her graduate studies in the Statify and Thoth teams under the supervision of Julyan Arbel and Jakob Verbeek. From November 2019 to January 2020, she visited Duke University and worked on prior predictive distributions in BNNs under the supervision of David Dunson. Prior to that, she obtained her Bachelor’s degree at the Moscow Institute of Physics and Technology (MIPT) and did the second year of her Master’s program at the Grenoble Institute of Technology (Grenoble – INP, Ensimag).
Title: I can’t believe my machine learning system is not better
Date: September 15th, 2022 at 10am EDT/4pm CEST/7am PDT
Abstract: Deploying and maintaining machine learning models in real-world scenarios is hard. But why? In this talk I will share several real anecdotes in which a good model (or what was supposed to be a good model) failed to have an impact in the real world. We will explore the bitter lesson that having a good machine learning model does not necessarily imply having a good machine learning system. We will visit examples that cover the design of microfluidic chips to study aging in yeast cells, the use of electronic health records to predict the effects of medical interventions, and the design of large decision-making systems with many interconnected nodes. Although every application is different, I will share the lessons learned in these areas, which I hope will be useful for the audience when addressing real-world questions in the future.
Biography: Javier González is a Principal Researcher in the Biological NLP/Real World Evidence group at Health Futures, Microsoft. Javier works on machine learning methods for healthcare with a special focus on uncertainty quantification and causal inference for precision medicine. Before joining Microsoft, Javier led a team at Amazon that developed and deployed machine learning methods for Prime Air, Alexa, and Amazon’s supply chain. Before that, he was a research associate in the machine learning group of the University of Sheffield, where he worked on Bayesian optimization methods to scale the production of drug compounds using hamster cells. He was also the main developer of GPyOpt, a widely used library for Bayesian optimization in the community. Javier co-founded Inferentia Ltd. together with Andreas Damianou, Zhenwen Dai, and Neil Lawrence, a machine learning start-up that was acquired by Amazon in 2016. Between 2011 and 2013, Javier was a postdoc at the University of Groningen, where he worked on machine learning approaches to understand the dynamics of biological systems, in particular the causes of aging in yeast. Javier got his PhD in 2010 at Carlos III University of Madrid.
Title: From Identifiability to Structured Representation Spaces, and a Case for (Precise) Pragmatism in Machine Learning
Date: August 4th, 2022 at 11am EDT / 5pm CEST / 8am PDT
Abstract: There has been a recent surge in research activity related to identifiability in generative models involving latent variables. Why should we care whether a latent variable model is identifiable? I will give some pragmatic reasons, which differ philosophically from, and have different practical and theoretical implications than, classical views on identifiability, which usually relate to recovering the “true” distribution or “true” latent factors of variation. In particular, a pragmatic approach requires us to consider how the structure we are imposing (or not imposing) on the latent space relates to the problems we’re trying to solve. I will highlight how I think a lack of precise pragmatism is holding back modern methods in challenging settings, including how aspects of my own research on identifiability have gotten stuck without problem-specific constraints. Elaborating on methods for representation learning more generally, I will discuss some ways we can (and are beginning to) structure our latent spaces to achieve specific goals other than vague appeals to general AI.
Biography: Benjamin Bloem-Reddy is an Assistant Professor of Statistics at the University of British Columbia. He works on problems in statistics and machine learning, with an emphasis on probabilistic approaches. He has a growing interest in causality and its interplay with knowledge and inference and also collaborates with researchers in the sciences on statistical problems arising in their research.
Bloem-Reddy was a PhD student with Peter Orbanz at Columbia and a postdoc with Yee Whye Teh in the CSML group at the University of Oxford. Before moving to statistics and machine learning, he studied physics at Stanford University and Northwestern University.
Title: Struggling to Achieve Novelty Detection in Deep Learning
Date: July 7th, 2022 at 11am EDT / 5pm CEST / 8am PDT
Abstract: In 2005, motivated by an open world computer vision application, I became interested in novelty detection. However, there were few methods available in computer vision at that time, and my research turned to studying anomaly detection in standard feature vector data. In that arena, many good algorithms were being published. Fundamentally, these methods rely on a notion of distance or density in feature space and detect anomalies as outliers in that space.
Returning to deep learning 10 years later, my students and I attempted, without much success, to apply these methods to the latent representations in deep learning. Other groups attempted to apply deep density models, again with limited success. Summary: I couldn’t believe it was not better. In the meantime, simple anomaly scores such as the maximum softmax probability or the max logit score were shown to be doing very well.
We decided that we had reached the limits of what macro-level analysis (error rates, AUC scores) could tell us about these techniques. It was time to look closely at the actual feature values. In this talk, I’ll show our analysis of feature activations and introduce the Familiarity Hypothesis, which states that the max logit/max softmax score is measuring the amount of familiarity in an image rather than the amount of novelty. This is a direct consequence of the fact that the only features that are learned are ones that capture variability in the training data. Hence, deep nets can only represent images that fall within this variability. Novel images are mapped into this representation, and hence cannot be detected as outliers.
I’ll close with some potential directions to overcome this limitation.
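As a rough illustration of the two simple scores mentioned in the abstract, the following minimal sketch computes the maximum softmax probability and the max logit for a single example. The function names and the logit values are hypothetical and chosen only for illustration; they are not taken from the talk. A low score would be read as a sign that the input is unfamiliar.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def max_softmax_score(logits):
    """Maximum softmax probability; lower values suggest a less familiar input."""
    return max(softmax(logits))

def max_logit_score(logits):
    """Largest raw logit; lower values suggest a less familiar input."""
    return max(logits)

# Hypothetical logits: one confident prediction, one with no strongly activated class.
familiar = [8.0, 0.5, -1.0]
unfamiliar = [0.6, 0.5, 0.4]

for name, logits in [("familiar", familiar), ("unfamiliar", unfamiliar)]:
    print(name, max_softmax_score(logits), max_logit_score(logits))
```

Both scores are high only when some learned feature fires strongly, which is in line with the reading that they measure familiarity rather than novelty.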
Biography: Dr. Dietterich (AB Oberlin College 1977; MS University of Illinois 1979; PhD Stanford University 1984) is Distinguished Professor Emeritus in the School of Electrical Engineering and Computer Science at Oregon State University. Dietterich is one of the pioneers of the field of Machine Learning and has authored more than 225 refereed publications and two books. His current research topics include robust artificial intelligence, robust human-AI systems, and applications in sustainability.
Dietterich has devoted many years of service to the research community and was recently given the ACML and AAAI distinguished service awards. He is a former President of the Association for the Advancement of Artificial Intelligence and the founding president of the International Machine Learning Society. Other major roles include Executive Editor of the journal Machine Learning, co-founder of the Journal of Machine Learning Research, and program chair of AAAI 1990 and NIPS 2000. He currently serves as one of the moderators for the cs.LG category on arXiv.
Title: Research Process for Interpretable Machine Learning
Date: June 16th, 2022 at 8:30am EDT / 2:30pm CEST
Abstract: There has been much interest in interpretable machine learning (and/or explainable AI) as a way to allow domain experts to vet machine learning systems as well as a way to assist in human+AI teaming. In this “chalk” talk, I’ll briefly provide a framework for thinking about the interdisciplinary ecosystem that interpretable machine learning provides and then dive into the process of doing high-quality, impactful machine learning research. Specifically, I’ll talk about:
In the spirit of ICBINB, I’ll draw on my own experience, including examples of times when I think we got things right, and when we could have done better.
Biography: Finale Doshi-Velez is a Gordon McKay Professor in Computer Science at the Harvard Paulson School of Engineering and Applied Sciences. She completed her MSc from the University of Cambridge as a Marshall Scholar, her PhD from MIT, and her postdoc at Harvard Medical School. Her interests lie at the intersection of machine learning, healthcare, and interpretability.
Title: Limitations of the theory for sampling with kernelised Wasserstein gradient flows
Date: May 5th, 2022 at 10am EDT / 4pm CEST
Abstract: Sampling from a probability distribution whose density is only known up to a normalisation constant is a fundamental problem in statistics and machine learning. Recently, several algorithms based on interactive particle systems were proposed for this task, as an alternative to Markov Chain Monte Carlo methods or Variational Inference.
These particle systems can be designed by adopting an optimisation point of view for the sampling problem: an optimisation objective is chosen (which typically measures the dissimilarity to the target distribution), and its Wasserstein gradient flow is approximated by an interacting particle system, which can involve kernels. At stationarity, the stationary states of these particle systems define an empirical measure approximating the target distribution.
In this talk I will present recent work on such algorithms, such as Stein Variational Gradient Descent [1] or Kernel Stein Discrepancy Descent [2]. I will discuss some recent results that highlight bottlenecks and open questions: on the empirical side, these particle systems may suffer from convergence issues, while on the theoretical side, optimisation tools may not be sufficient to analyse these algorithms. Still, I will also discuss recent empirical results that show that there is hope in demonstrating nice approximation properties of these particle systems.
[1] A Non-Asymptotic Analysis of Stein Variational Gradient Descent. Korba, A., Salim, A., Arbel, M., Luise, G., Gretton, A. NeurIPS, 2020.
[2] Kernel Stein Discrepancy Descent. Korba, A., Aubin-Frankowski, P.C., Majewski, S., Ablin, P. ICML, 2021.
Biography: Anna Korba has been an assistant professor at ENSAE/CREST in the Statistics Department since September 2020. Her main line of research is in statistical machine learning. She has been working on kernel methods, optimal transport and ranking data. Currently, she is particularly interested in dynamical particle systems for ML and kernel-based methods for causal inference.
Title: Applications Really Matter (And Publishing Them Is Essential For AI & Data Science)
Date: April 7th, 2022 at 10am EDT / 4pm CEST
Abstract: Many of us want to work on real-world machine learning problems that matter. However, it’s really hard for us to focus on such problems because it is extremely difficult to publish applied machine learning papers in top venues. I will argue that the lack of respect for applied papers has several wide-ranging implications:
1) Benefits to Science: We are unable to leverage scientific lessons learned through applications if we cannot publish them. Applications should actually be driving ML methods development. It is important to point out that applied papers are scientific. A boring bake-off or technical report is not a scientific applied paper. An applied scientific paper provides knowledge that is systematized and generalizes, just like any good scientific paper in any area of science.
2) Benefits to the Real World: We publish overly complicated methods when simpler ones would suffice. If we could focus on solving problems rather than developing methods, this issue could vanish. Much more importantly, if we actually focus on problems that benefit humanity, we might actually solve them.
3) Broadening our Community: By limiting our top venues mainly to methodology papers, we limit our community to those who care primarily about methods development. This further limits our community to those who come from narrow training pipelines. It also limits our field to exclude those whose primary goal is to directly improve the world. A really good applied data scientist from any country should be able to publish in a top tier venue in data science or AI.
4) Freeing our Top Scientists: By tying promotions of our top data scientists to publication venues that accept (essentially only) methodology, it means our top scientists cannot focus on real-world problems. This is particularly problematic if one wants to publish a data science paper in an area for which a specialized journal does not exist.
My proposed fix is to have tracks in major ML conferences and journals that focus on applications.
Biography: Cynthia Rudin is a professor at Duke University. Her goal is to design predictive models that are understandable to humans. She applies machine learning in many areas, such as healthcare, criminal justice, and energy reliability. She is the recipient of the 2022 Squirrel AI Award for Artificial Intelligence for the Benefit of Humanity from AAAI (the “Nobel Prize of AI”). She is a fellow of the American Statistical Association, the Institute of Mathematical Statistics, and AAAI. She is a three-time winner of the INFORMS Innovative Applications in Analytics Award. Her work has been featured in news outlets including the NY Times, Washington Post, Wall Street Journal, and Boston Globe.
Title: I Can’t Believe Bayesian Deep Learning is not Better
Date: March 3rd, 2022 at 10am EST / 4pm CET
Abstract: Bayesian deep learning is seductive: it combines the simplicity, coherence, and beauty of the Bayesian approach to problem solving together with the expressivity and compositional flexibility of deep neural networks. Yes, inference can be challenging, but the promises of improved uncertainty quantification, better out-of-distribution behaviour, and improved sample efficiency are worth it. Or is it? In this talk I will tell a personal story of being seduced by, then frustrated with, and now recovering from Bayesian deep learning. I will present the context of our work on the cold posterior effect (Wenzel et al., 2020) and its main findings, as well as some more recent work that tries to explain the effect. I will also offer some personal reflections on research practice and narratives that contributed to the lack of progress in Bayesian deep learning.
Biography: Sebastian Nowozin is a deep learning researcher at Microsoft Research Cambridge, UK, where he currently leads the Machine Intelligence research theme. His research interests are in probabilistic deep learning and applications of machine learning models to real-world problems. He completed his PhD in 2009 at the Max Planck Institute in Tübingen, and has since worked on domains as varied as computer vision, computational imaging, cloud-based machine learning, and approximate inference.
Title: An Automatic Finite-Sample Robustness Metric: Can Dropping a Little Data Change Conclusions?
Date: February 3rd, 2022 at 10am EST / 4pm CET
Abstract: Imagine you’ve got a bold new idea for ending poverty. To check your intervention, you run a gold-standard randomized controlled trial; that is, you randomly assign individuals in the trial to either receive your intervention or to not receive it. You recruit tens of thousands of participants. You run an entirely standard and well-vetted statistical analysis; you conclude that your intervention works with a p-value < 0.01. You publish your paper in a top venue, and your research makes it into the news! Excited to make the world a better place, you apply your intervention to a new set of people and… it fails to reduce poverty. How can this possibly happen? There seems to be some important disconnect between theory and practice, but what is it? And is there any way you could have been tipped off about the issue when running your original data analysis? In the present work, we observe that if a very small percentage of the original data was instrumental in determining the original conclusion, we might worry that the conclusion could be unstable under new conditions. So we propose a method to assess the sensitivity of data analyses to the removal of a very small fraction of the data set. Analyzing all possible data subsets of a certain size is computationally prohibitive, so we provide an approximation. We call our resulting method the Approximate Maximum Influence Perturbation. Empirics demonstrate that while some (real-life) applications are robust, in others the sign of a treatment effect can be changed by dropping less than 0.1% of the data — even in simple models and even when p-values are small.
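To make the question in the abstract concrete, here is a minimal sketch for the simplest possible estimator, a sample mean. It is not the Approximate Maximum Influence Perturbation itself: for a mean, the most influential points can be found exactly by sorting, so no approximation is needed. The function name and the data are hypothetical and chosen only for illustration.

```python
def min_drops_to_flip_sign(values):
    """Smallest number of points whose removal turns a positive sample mean negative.

    The sign of a mean equals the sign of the sum, and for any fixed number of
    removals the remaining sum is smallest when the largest values are removed.
    Dropping points in descending order therefore finds the exact minimum.
    Returns None if the sign cannot be flipped while keeping at least one point.
    """
    total = sum(values)
    if total <= 0:
        raise ValueError("expected a positive sample mean")
    n = len(values)
    for dropped, v in enumerate(sorted(values, reverse=True), start=1):
        total -= v
        if dropped == n:
            return None  # no data left, so no estimate to report
        if total < 0:
            return dropped
    return None

# Hypothetical per-participant effects: two large positive values among
# many slightly negative ones.
effects = [30.0, 30.0] + [-0.25] * 98
k = min_drops_to_flip_sign(effects)
print(k, "of", len(effects), "points flip the sign of the mean")
```

In this toy data set the positive average effect is driven entirely by two observations, which is the kind of fragility the abstract describes. For estimators more complex than a mean, checking all subsets is infeasible, which is why the authors propose an approximation.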
Biography: Tamara Broderick is an Associate Professor in the Department of Electrical Engineering and Computer Science at MIT. She is a member of the MIT Laboratory for Information and Decision Systems (LIDS), the MIT Statistics and Data Science Center, and the Institute for Data, Systems, and Society (IDSS). She completed her Ph.D. in Statistics at the University of California, Berkeley in 2014. Previously, she received an AB in Mathematics from Princeton University (2007), a Master of Advanced Study for completion of Part III of the Mathematical Tripos from the University of Cambridge (2008), an MPhil by research in Physics from the University of Cambridge (2009), and an MS in Computer Science from the University of California, Berkeley (2013). Her recent research has focused on developing and analyzing models for scalable Bayesian machine learning. She has been awarded selection to the COPSS Leadership Academy (2021), an Early Career Grant (ECG) from the Office of Naval Research (2020), an AISTATS Notable Paper Award (2019), an NSF CAREER Award (2018), a Sloan Research Fellowship (2018), an Army Research Office Young Investigator Program (YIP) award (2017), Google Faculty Research Awards, an Amazon Research Award, the ISBA Lifetime Members Junior Researcher Award, the Savage Award (for an outstanding doctoral dissertation in Bayesian theory and methods), the Evelyn Fix Memorial Medal and Citation (for the Ph.D. student on the Berkeley campus showing the greatest promise in statistical research), the Berkeley Fellowship, an NSF Graduate Research Fellowship, a Marshall Scholarship, and the Phi Beta Kappa Prize (for the graduating Princeton senior with the highest academic average).