ICBINB promotes “slow science”, pushes back against “leaderboard-ism”, revalues unexpected negative results, and helps people push their research forward when “stuck”. More broadly, our moon-shot is to transform how we do research by cracking open the research process and inviting meta-dialog. Starting in February 2022, we will be hosting a monthly virtual seminar series to promote these values and share stories from and about research.

The ICBINB Monthly Seminar Series seeks to shine a light on the “stuck” phases of research. Speakers will tell us about their most beautiful ideas that didn’t “work”, about when theory didn’t match practice, or perhaps just when the going got tough. These talks will let us peek inside the file drawer of unexpected negative results and peer behind the curtain to see the real story of how real researchers do real research.
Do you have somebody in mind who has something to say on these topics? You are very welcome to nominate them here.

Join us again for our next ICBINB seminar series season starting in January!


November 10th, 2022

Lena Maier-Hein

Title: Metrics Reloaded

Date: November 10th, 2022 at 7am EST / 1pm CET

[Recording]

Abstract: Increasing evidence shows that flaws in machine learning (ML) algorithm validation are an underestimated global problem. Specifically, chosen performance metrics do not necessarily reflect the domain interest, thus failing to adequately measure scientific progress and hindering translation of ML techniques into practice. To break with these historically grown poor practices, we followed a multidisciplinary, cross-domain approach that enabled us to critically question common practice in different communities and integrate distributed knowledge into one common framework.
This talk will comprise two parts. The first part will cover common and rare pitfalls of metrics in the field of image analysis, which have been compiled by a large multidisciplinary international consortium in a community-powered process. The second part will present Metrics Reloaded, a comprehensive framework guiding researchers towards choosing metrics in a problem-aware manner. A specific focus will be put on recommendations that go beyond the current state of the art.

Biography: Lena Maier-Hein is head of the division Intelligent Medical Systems at the German Cancer Research Center (DKFZ) and serves as managing director of the DKFZ Data Science and Digital Oncology cross-topic program. Her research concentrates on machine learning-based biomedical image analysis with a specific focus on surgical data science, computational biophotonics and validation of machine learning algorithms. During her academic career, Lena Maier-Hein has been distinguished with several science awards, including the 2013 Heinz Maier-Leibnitz Prize of the German Research Foundation (DFG) and the 2017/18 Berlin-Brandenburg Academy Prize. She is also a fellow of the Medical Image Computing and Computer Assisted Intervention (MICCAI) society and of the European Laboratory for Learning and Intelligent Systems (ELLIS). Further international recognitions include a European Research Council (ERC) starting grant (2015-2020) and consolidator grant (2021-2026).


October 20th, 2022

Mariia Vladimirova

Title: Heavy tails in Bayesian neural networks: expectation and reality

Date: October 20th, 2022 at 10am EDT / 4pm CEST / 7am PDT

[Recording]

Abstract: The discovery of the connection between Gaussian processes and deep Bayesian neural networks in the infinite-width limit increased interest in research on Bayesian neural networks. On one hand, it helped to reason about existing works and their assumptions, such as the Gaussian activations assumption in the Edge of Chaos effect, or tuning priors over functions to get closer to some GP. On the other hand, it gave a new perspective on Bayesian neural networks that led to the study of training dynamics through the neural tangent kernel, improvements in variational inference, uncertainty quantification, and more.

Empirically, however, the distance between a hidden unit's distribution and a Gaussian process increases with depth for a fixed number of hidden units per layer. The difference between finite- and infinite-width neural networks thus became one of the main directions of study.

We showed sub-Weibull and Weibull-tail properties of hidden units, supporting the conjecture that hidden units become heavier-tailed as one goes deeper in the network. This tail description reveals the difference between hidden units in finite- and infinite-width networks. Parallel works give a full description of hidden units' distributions through Meijer G-functions, consistent with our heavy-tailed results.

We found theoretically that the tail parameter increases linearly with depth. However, we could not observe the theoretical tail parameter empirically; at least, not that precisely. In this talk, I give a retrospective on this line of work on Bayesian neural networks, along with details and possible explanations of our empirical results.
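The tail-heaviness claim above can be probed empirically. The sketch below is not the analysis from the talk, just a minimal illustration under simple assumptions of our own (a ReLU network, iid Gaussian weights, a fixed input): it estimates the excess kurtosis of one hidden unit per layer across independent weight draws, which should grow with depth if deeper units are indeed heavier-tailed.

```python
import numpy as np

def hidden_unit_kurtosis(depth=4, width=30, n_draws=5000, seed=0):
    """Estimate the excess kurtosis of one pre-activation unit per layer,
    marginally over independent draws of iid Gaussian weights (fixed input)."""
    rng = np.random.default_rng(seed)
    h = np.ones((n_draws, width))                    # fixed, deterministic input
    kurtoses = []
    for _ in range(depth):
        # One independent weight matrix per draw, scaled for unit variance.
        W = rng.standard_normal((n_draws, width, width)) / np.sqrt(width)
        z = np.einsum('bij,bj->bi', W, h)            # pre-activations
        u = z[:, 0]                                  # marginal of a single unit
        kurtoses.append(np.mean((u - u.mean())**4) / np.var(u)**2 - 3.0)
        h = np.maximum(z, 0.0)                       # ReLU
    return kurtoses

excess_kurtosis_by_depth = hidden_unit_kurtosis()
# Layer 1 is exactly Gaussian here (excess kurtosis near 0);
# deeper layers show increasingly positive excess kurtosis.
print(excess_kurtosis_by_depth)
```

With a fixed input, the first-layer unit is a Gaussian in the weights, so its excess kurtosis hovers near zero; deeper layers are Gaussian scale mixtures and come out visibly heavier-tailed, in line with the sub-Weibull picture.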

Biography: Mariia Vladimirova is a postdoctoral researcher at Inria Grenoble Rhone-Alpes in the Statify team. Her research mostly focuses on exploring distributional properties of Bayesian neural networks. More specifically, she is interested in explaining the differences between deep learning models in wide and shallow regimes in order to improve the interpretability and efficiency of the models.

Mariia Vladimirova did her graduate studies in the Statify and Thoth teams under the supervision of Julyan Arbel and Jakob Verbeek. From November 2019 to January 2020, she visited Duke University, working on prior predictive distributions in BNNs under the supervision of David Dunson. Prior to that, she obtained her Bachelor's degree at the Moscow Institute of Physics and Technology (MIPT) and completed the second year of a Master's program at the Grenoble Institute of Technology (Grenoble INP, Ensimag).


September 15th, 2022

Javier Gonzalez

Title: I can’t believe my machine learning system is not better

Date: September 15th, 2022 at 10am EDT/4pm CEST/7am PDT

[Recording]

Abstract: Deploying and maintaining machine learning models in real-world scenarios is hard. But why? In this talk I will share several real anecdotes in which a good model (or what was supposed to be a good model) failed to have an impact in the real world. We will explore the bitter lesson that having a good machine learning model does not necessarily imply having a good machine learning system. We will visit examples that cover the design of microfluidic chips to study aging in yeast cells, the use of electronic health records to predict the effects of medical interventions, and the design of large decision-making systems with many interconnected nodes. Although every application is different, I will share the lessons learned in these areas, which I hope will be useful to the audience when addressing real-world questions in the future.

Biography: Javier González is a Principal Researcher in the Biological NLP/Real World Evidence group at Health Futures, Microsoft. Javier works on machine learning methods for healthcare, with a special focus on uncertainty quantification and causal inference for precision medicine. Before joining Microsoft, Javier led a team at Amazon that developed and deployed machine learning methods for Prime Air, Alexa and the Amazon supply chain. Before that, he was a research associate in the machine learning group at the University of Sheffield, where he worked on Bayesian optimization methods to scale the production of drug compounds using hamster cells. He was also the main developer of GPyOpt, a widely used library for Bayesian optimization in the community. Javier co-founded Inferentia Ltd. together with Andreas Damianou, Zhenwen Dai and Neil Lawrence, a machine learning start-up that was acquired by Amazon in 2016. Between 2011 and 2013, Javier was a postdoc at the University of Groningen, where he worked on machine learning approaches to understand the dynamics of biological systems, in particular the causes of aging in yeast. Javier got his PhD in 2010 at Carlos III University of Madrid.


August 4th, 2022

Benjamin Bloem-Reddy

Title: From Identifiability to Structured Representation Spaces, and a Case for (Precise) Pragmatism in Machine Learning

Date: August 4th, 2022 at 11am EDT / 5pm CEST / 8am PDT

[Recording]

Abstract: There has been a recent surge in research activity related to identifiability in generative models involving latent variables. Why should we care whether a latent variable model is identifiable? I will give some pragmatic reasons, which differ philosophically from, and have different practical and theoretical implications than, classical views on identifiability, which usually relate to recovering the “true” distribution or “true” latent factors of variation. In particular, a pragmatic approach requires us to consider how the structure we are imposing (or not imposing) on the latent space relates to the problems we’re trying to solve. I will highlight how I think a lack of precise pragmatism is holding back modern methods in challenging settings, including how aspects of my own research on identifiability have gotten stuck without problem-specific constraints. Elaborating on methods for representation learning more generally, I will discuss some ways we can (and are beginning to) structure our latent spaces to achieve specific goals other than vague appeals to general AI.

Biography: Benjamin Bloem-Reddy is an Assistant Professor of Statistics at the University of British Columbia. He works on problems in statistics and machine learning, with an emphasis on probabilistic approaches. He has a growing interest in causality and its interplay with knowledge and inference and also collaborates with researchers in the sciences on statistical problems arising in their research.
Bloem-Reddy was a PhD student with Peter Orbanz at Columbia and a postdoc with Yee Whye Teh in the CSML group at the University of Oxford. Before moving to statistics and machine learning, he studied physics at Stanford University and Northwestern University.


July 7th, 2022

Thomas Dietterich

Title: Struggling to Achieve Novelty Detection in Deep Learning

Date: July 7th, 2022 at 11am EDT / 5pm CEST / 8am PDT

[Recording]

Abstract: In 2005, motivated by an open world computer vision application, I became interested in novelty detection. However, there were few methods available in computer vision at that time, and my research turned to studying anomaly detection in standard feature vector data. In that arena, many good algorithms were being published. Fundamentally, these methods rely on a notion of distance or density in feature space and detect anomalies as outliers in that space.

Returning to deep learning 10 years later, my students and I attempted, without much success, to apply these methods to the latent representations in deep learning. Other groups attempted to apply deep density models, again with limited success. Summary: I couldn’t believe it was not better. In the meantime, simple anomaly scores such as the maximum softmax probability or the max logit score were shown to be doing very well.

We decided that we had reached the limits of what macro-level analysis (error rates, AUC scores) could tell us about these techniques. It was time to look closely at the actual feature values. In this talk, I’ll show our analysis of feature activations and introduce the Familiarity Hypothesis, which states that the max logit/max softmax score is measuring the amount of familiarity in an image rather than the amount of novelty. This is a direct consequence of the fact that the only features that are learned are ones that capture variability in the training data. Hence, deep nets can only represent images that fall within this variability. Novel images are mapped into this representation, and hence cannot be detected as outliers.

I’ll close with some potential directions to overcome this limitation.
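For reference, the familiarity scores discussed above are simple to compute. Here is a minimal sketch with hypothetical logits (not taken from any particular model); the threshold is an arbitrary choice for illustration.

```python
import numpy as np

def max_softmax_score(logits):
    """Maximum softmax probability: high = familiar, low = possibly novel."""
    z = logits - logits.max(axis=-1, keepdims=True)   # shift for numerical stability
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return probs.max(axis=-1)

def max_logit_score(logits):
    """Maximum (unnormalized) logit: an alternative familiarity score."""
    return logits.max(axis=-1)

# Hypothetical logits for two inputs: one confidently classified, one not.
logits = np.array([[9.0, 1.0, 0.5],     # familiar-looking input
                   [1.2, 1.0, 1.1]])    # unfamiliar-looking input

msp = max_softmax_score(logits)
threshold = 0.5                          # illustrative cutoff, not a recommendation
is_novel = msp < threshold
print(msp, is_novel)                     # second input is flagged as novel
```

Per the Familiarity Hypothesis, a low score here indicates a lack of familiar features rather than the presence of novel ones, which is exactly why these scores can fail on genuinely novel inputs.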

Biography: Dr. Dietterich (AB Oberlin College 1977; MS University of Illinois 1979; PhD Stanford University 1984) is Distinguished Professor Emeritus in the School of Electrical Engineering and Computer Science at Oregon State University.  Dietterich is one of the pioneers of the field of Machine Learning and has authored more than 225 refereed publications and two books. His current research topics include robust artificial intelligence, robust human-AI systems, and applications in sustainability.

Dietterich has devoted many years of service to the research community and was recently given the ACML and AAAI distinguished service awards. He is a former President of the Association for the Advancement of Artificial Intelligence and the founding president of the International Machine Learning Society. Other major roles include Executive Editor of the journal Machine Learning, co-founder of the Journal for Machine Learning Research, and program chair of AAAI 1990 and NIPS 2000. He currently serves as one of the moderators for the cs.LG category on arXiv.


June 16th, 2022

Finale Doshi-Velez

Title: Research Process for Interpretable Machine Learning

Date: June 16th, 2022 at 8:30am EDT / 2:30pm CEST

[Recording]

Abstract: There has been much interest in interpretable machine learning (and/or explainable AI) as a way to allow domain experts to vet machine learning systems as well as a way to assist in human+AI teaming.  In this “chalk” talk, I’ll briefly provide a framework for thinking about the interdisciplinary ecosystem that interpretable machine learning provides and then dive into the process of doing high-quality, impactful machine learning research.  Specifically, I’ll talk about:

  • What are the kinds of interpretable machine learning questions that are computational and what are human factors?
  • How and when should we define abstractions between computational and human factor elements in interpretable machine learning?
  • When is a user study needed, and how should it be set up?

In the spirit of ICBINB, I’ll draw on my own experience, including examples of times when I think we got things right, and when we could have done better.

Biography: Finale Doshi-Velez is a Gordon McKay Professor in Computer Science at the Harvard Paulson School of Engineering and Applied Sciences. She completed her MSc from the University of Cambridge as a Marshall Scholar, her PhD from MIT, and her postdoc at Harvard Medical School. Her interests lie at the intersection of machine learning, healthcare, and interpretability.


May 5th, 2022

Anna Korba

Title: Limitations of the theory for sampling with kernelised Wasserstein gradient flows

Date: May 5th, 2022 at 10am EDT / 4pm CEST

[Recording] [Slides]

Abstract: Sampling from a probability distribution whose density is only known up to a normalisation constant is a fundamental problem in statistics and machine learning. Recently, several algorithms based on interactive particle systems were proposed for this task, as an alternative to Markov Chain Monte Carlo methods or Variational Inference.
These particle systems can be designed by adopting an optimisation point of view on the sampling problem: an optimisation objective is chosen (which typically measures the dissimilarity to the target distribution), and its Wasserstein gradient flow is approximated by an interacting particle system, which can involve kernels. The stationary states of these particle systems define an empirical measure approximating the target distribution.
In this talk I will present recent work on such algorithms, such as Stein Variational Gradient Descent [1] or Kernel Stein Discrepancy Descent [2]. I will discuss some recent results that highlight bottlenecks and open questions: on the empirical side, these particle systems may suffer from convergence issues, while on the theoretical side, optimisation tools may not be sufficient to analyse these algorithms. Still, I will also discuss recent empirical results that show that there is hope in demonstrating nice approximation properties of these particle systems.

[1] A Non-Asymptotic Analysis of Stein Variational Gradient Descent. Korba, A., Salim, A., Arbel, M., Luise, G., Gretton, A. NeurIPS, 2020.
[2] Kernel Stein Discrepancy Descent. Korba, A., Aubin-Frankowski, P.C., Majewski, S., Ablin, P. ICML, 2021.
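To make the kernelised particle systems above concrete, here is a minimal one-dimensional SVGD sketch on a toy setup of our own choosing (a N(2, 1) target and an RBF kernel with the median bandwidth heuristic). It is a sketch of the general algorithm, not code from the referenced papers.

```python
import numpy as np

def svgd_step(x, score, lr=0.1):
    """One SVGD update: kernel-smoothed score (drift) plus a repulsion term."""
    diff = x[:, None] - x[None, :]                    # pairwise differences x_i - x_j
    d2 = diff**2
    h2 = np.median(d2) / np.log(len(x) + 1) + 1e-8    # median bandwidth heuristic
    K = np.exp(-d2 / (2 * h2))                        # RBF kernel matrix
    # phi_i = (1/n) * sum_j [ K_ij * score(x_j) + grad_{x_j} K(x_j, x_i) ]
    repulsion = (K * diff / h2).sum(axis=1)           # keeps particles spread out
    phi = (K @ score(x) + repulsion) / len(x)
    return x + lr * phi

score = lambda x: -(x - 2.0)                          # grad log density of N(2, 1)
rng = np.random.default_rng(0)
x = rng.normal(-5.0, 1.0, size=50)                    # particles start far from target
for _ in range(1000):
    x = svgd_step(x, score)
print(x.mean(), x.std())                              # should approach 2 and roughly 1
```

The convergence issues mentioned in the abstract show up when one varies the kernel bandwidth, the target's geometry, or the number of particles; this toy Gaussian case is the easy regime.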

Biography: Since September 2020, Anna Korba has been an assistant professor at ENSAE/CREST in the Statistics Department. Her main line of research is in statistical machine learning. She has been working on kernel methods, optimal transport and ranking data. Currently, she is particularly interested in dynamical particle systems for ML and kernel-based methods for causal inference.


April 7th, 2022

Cynthia Rudin

Date: April 7th, 2022 at 10am EDT / 4pm CEST

Title: Applications Really Matter (And Publishing Them Is Essential For AI & Data Science)

[Recording] [Slides]

Abstract: Many of us want to work on real-world machine learning problems that matter. However, it’s really hard for us to focus on such problems because it is extremely difficult to publish applied machine learning papers in top venues. I will argue that properly valuing applied papers would have several wide-ranging benefits:

1) Benefits to Science: We are unable to leverage scientific lessons learned through applications if we cannot publish them. Applications should actually be driving ML methods development. It is important to point out that applied papers are scientific. A boring bake-off or technical report is not a scientific applied paper. An applied scientific paper provides knowledge that is systematized and generalizes, just like any good scientific paper in any area of science.

2) Benefits to the Real World: We publish overly complicated methods when simpler ones would suffice. If we could focus on solving problems rather than developing methods, this issue could vanish. Much more importantly, if we actually focus on problems that benefit humanity, we might actually solve them.

3) Broadening our Community: By limiting our top venues mainly to methodology papers, we limit our community to those who care primarily about methods development. This further limits our community to those who come from narrow training pipelines. It also limits our field to exclude those whose primary goal is to directly improve the world. A really good applied data scientist from any country should be able to publish in a top tier venue in data science or AI.

4) Freeing our Top Scientists: By tying promotions of our top data scientists to publication venues that accept (essentially only) methodology papers, we ensure that our top scientists cannot focus on real-world problems. This is particularly problematic if one wants to publish a data science paper in an area for which a specialized journal does not exist.

My proposed fix is to have tracks in major ML conferences and journals that focus on applications.

Biography: Cynthia Rudin is a professor at Duke University. Her goal is to design predictive models that are understandable to humans. She applies machine learning in many areas, such as healthcare, criminal justice, and energy reliability. She is the recipient of the 2022 Squirrel AI Award for Artificial Intelligence for the Benefit of Humanity from AAAI (the “Nobel Prize of AI”). She is a fellow of the American Statistical Association, the Institute of Mathematical Statistics, and AAAI. She is a three-time winner of the INFORMS Innovative Applications in Analytics Award. Her work has been featured in news outlets including the NY Times, Washington Post, Wall Street Journal, and Boston Globe.


March 3rd, 2022

Sebastian Nowozin

Date: March 3rd, 2022 at 10am EST / 4pm CET

Title: I Can’t Believe Bayesian Deep Learning is not Better

[Recording] [Slides]

Abstract: Bayesian deep learning is seductive: it combines the simplicity, coherence, and beauty of the Bayesian approach to problem solving with the expressivity and compositional flexibility of deep neural networks. Yes, inference can be challenging, but the promises of improved uncertainty quantification, better out-of-distribution behaviour, and improved sample efficiency are worth it. Or are they? In this talk I will tell a personal story of being seduced by, then frustrated with, and now recovering from Bayesian deep learning. I will present the context of our work on the cold posterior effect (Wenzel et al., 2020) and its main findings, as well as some more recent work that tries to explain the effect. I will also offer some personal reflections on research practices and narratives that contributed to the lack of progress in Bayesian deep learning.

Biography: Sebastian Nowozin is a deep learning researcher at Microsoft Research Cambridge, UK, where he currently leads the Machine Intelligence research theme.  His research interests are in probabilistic deep learning and applications of machine learning models to real-world problems. He completed his PhD in 2009 at the Max Planck Institute in Tübingen, and has since worked on domains as varied as computer vision, computational imaging, cloud-based machine learning, and approximate inference.


February 3rd, 2022

Tamara Broderick

Date: February 3rd, 2022 at 10am EST / 4pm CET

Title: An Automatic Finite-Sample Robustness Metric: Can Dropping a Little Data Change Conclusions?

[Recording] [Slides]

Abstract: Imagine you’ve got a bold new idea for ending poverty. To check your intervention, you run a gold-standard randomized controlled trial; that is, you randomly assign individuals in the trial to either receive your intervention or to not receive it. You recruit tens of thousands of participants. You run an entirely standard and well-vetted statistical analysis; you conclude that your intervention works with a p-value < 0.01. You publish your paper in a top venue, and your research makes it into the news! Excited to make the world a better place, you apply your intervention to a new set of people and… it fails to reduce poverty. How can this possibly happen? There seems to be some important disconnect between theory and practice, but what is it? And is there any way you could have been tipped off about the issue when running your original data analysis? In the present work, we observe that if a very small percentage of the original data was instrumental in determining the original conclusion, we might worry that the conclusion could be unstable under new conditions. So we propose a method to assess the sensitivity of data analyses to the removal of a very small fraction of the data set. Analyzing all possible data subsets of a certain size is computationally prohibitive, so we provide an approximation. We call our resulting method the Approximate Maximum Influence Perturbation. Empirics demonstrate that while some (real-life) applications are robust, in others the sign of a treatment effect can be changed by dropping less than 0.1% of the data — even in simple models and even when p-values are small.
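The core idea can be illustrated on a toy difference-in-means analysis with entirely synthetic data (a hand-rolled sketch, not the Approximate Maximum Influence Perturbation implementation): for a sample mean, the exact effect of dropping any single point is cheap to compute, and dropping the most influential 2% of points can flip the sign of the estimated treatment effect.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic trial: most treated outcomes are slightly *worse* than control,
# but a handful of extreme responders drag the average effect positive.
control = rng.normal(0.0, 0.5, size=500)
treated = np.concatenate([rng.normal(-0.2, 0.5, size=490),
                          np.full(10, 20.0)])        # 10 extreme outcomes

effect = treated.mean() - control.mean()             # full-data estimate (> 0)

# Exact change in the effect from dropping one treated point i:
# the new treated mean is (sum - x_i) / (n - 1), hence
influence = (treated.mean() - treated) / (len(treated) - 1)

# Drop the 2% of treated points whose removal most decreases the effect,
# then refit exactly on the remaining data.
k = int(0.02 * len(treated))
keep = np.argsort(influence)[k:]                     # discard k most negative-influence points
refit_effect = treated[keep].mean() - control.mean()

print(f"full-data effect: {effect:+.3f}")
print(f"effect after dropping {k} of {len(treated)} treated points: {refit_effect:+.3f}")
```

Here the per-point influence is exact because the estimator is a mean; for general estimators, AMIP replaces this with a first-order (influence-function) approximation so that scanning all small subsets is unnecessary.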

Biography: Tamara Broderick is an Associate Professor in the Department of Electrical Engineering and Computer Science at MIT. She is a member of the MIT Laboratory for Information and Decision Systems (LIDS), the MIT Statistics and Data Science Center, and the Institute for Data, Systems, and Society (IDSS). She completed her Ph.D. in Statistics at the University of California, Berkeley in 2014. Previously, she received an AB in Mathematics from Princeton University (2007), a Master of Advanced Study for completion of Part III of the Mathematical Tripos from the University of Cambridge (2008), an MPhil by research in Physics from the University of Cambridge (2009), and an MS in Computer Science from the University of California, Berkeley (2013). Her recent research has focused on developing and analyzing models for scalable Bayesian machine learning. She has been awarded selection to the COPSS Leadership Academy (2021), an Early Career Grant (ECG) from the Office of Naval Research (2020), an AISTATS Notable Paper Award (2019), an NSF CAREER Award (2018), a Sloan Research Fellowship (2018), an Army Research Office Young Investigator Program (YIP) award (2017), Google Faculty Research Awards, an Amazon Research Award, the ISBA Lifetime Members Junior Researcher Award, the Savage Award (for an outstanding doctoral dissertation in Bayesian theory and methods), the Evelyn Fix Memorial Medal and Citation (for the Ph.D. student on the Berkeley campus showing the greatest promise in statistical research), the Berkeley Fellowship, an NSF Graduate Research Fellowship, a Marshall Scholarship, and the Phi Beta Kappa Prize (for the graduating Princeton senior with the highest academic average).


Complete List: Monthly Seminar Series 2022

We will be hosting a seminar talk on the first Thursday of each month; save the date! Times may vary based on the time zone and availability of the speaker.

  • February 3rd: Tamara Broderick
  • March 3rd: Sebastian Nowozin
  • April 7th: Cynthia Rudin
  • May 5th: Anna Korba
  • June 16th: Finale Doshi-Velez (Please note the shift in the date!)
  • July 7th: Thomas Dietterich
  • August 4th: Benjamin Bloem-Reddy
  • September 15th: Javier Gonzalez
  • October 20th: Mariia Vladimirova
  • November 10th: Lena Maier-Hein
  • December 1st: There won’t be a seminar in December. See you instead at our NeurIPS workshop!