Great Twitter thread if you are interested in datasets in 3D:
The paper is here, and the open-sourced code is here.
Articles
Amazon wrote about how they are using Graph Neural Networks, building multi-step embedding representations between entities, in the following blog post.
There was a similar post from Twitter that talks about how to accommodate missing nodes in GNNs here.
Microsoft open-sourced a causal inference library called DoWhy and talked about some of the motivations and capabilities in this blog post.
It supports the following functionalities (a minimal code sketch follows the list):
Modeling: Causal reasoning begins with the creation of a clear model of the causal assumptions being made. This involves documenting what is known about the data generating process and mechanisms. To get a valid answer to our cause-and-effect questions, we must be explicit about what we already know.
Identification: Next, we use the model to decide whether the causal question can be answered, and we provide the required expression to be computed. Identification is the process of analyzing our model.
Estimation: Once we have a strategy for identifying the causal effect, we can choose from several different statistical and machine learning-based estimation methods to answer our causal question. Estimation is the process of analyzing our data.
Refutation: Once we have our answer, we must do everything we can to test our underlying assumptions. Is our model consistent with the data? How sensitive is the answer to the assumptions made? If the model missed an unobserved confounder, will that change our answer a little or a lot?
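As a rough sketch of how these four steps map to code, here is a minimal example using DoWhy's CausalModel on one of its built-in synthetic datasets; the dataset parameters and method names are illustrative and may differ slightly across versions.

```python
# A minimal sketch of the four steps with DoWhy's CausalModel on a synthetic dataset.
# Dataset parameters and method names are illustrative; exact names may vary by version.
import dowhy.datasets
from dowhy import CausalModel

data = dowhy.datasets.linear_dataset(
    beta=10, num_common_causes=3, num_samples=5000, treatment_is_binary=True
)

# 1) Modeling: encode the assumed causal graph.
model = CausalModel(
    data=data["df"],
    treatment=data["treatment_name"],
    outcome=data["outcome_name"],
    graph=data["gml_graph"],
)

# 2) Identification: check whether the effect is estimable from the graph.
estimand = model.identify_effect(proceed_when_unidentifiable=True)

# 3) Estimation: compute the effect with a chosen estimator.
estimate = model.estimate_effect(estimand, method_name="backdoor.linear_regression")

# 4) Refutation: stress-test the estimate, e.g. by adding a random common cause.
refutation = model.refute_estimate(estimand, estimate, method_name="random_common_cause")
print(estimate.value, refutation)
```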
Google wrote a blog post on how they built a version of Alternating Least Squares for solving matrix factorization. The paper is here. The code is written in JAX and is available on GitHub.
If you are using large-scale matrix factorization or non-negative matrix factorization variants and are on GCP, this might be a good library to try out.
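For intuition, here is a generic NumPy sketch of the textbook ALS update for a dense, fully observed matrix; it illustrates the underlying idea of alternating closed-form solves and is not the API of Google's JAX library.

```python
# A generic NumPy sketch of the textbook ALS update for a dense, fully observed matrix.
# This illustrates the underlying idea only; it is not the API of Google's JAX library.
import numpy as np

def als_sweep(R, U, V, reg=0.1):
    """R: (m, n) matrix to factorize as U @ V.T, with U: (m, k) and V: (n, k)."""
    k = U.shape[1]
    # Fix V and solve the regularized least-squares problem for U in closed form,
    # then do the same for V with U fixed.
    U = np.linalg.solve(V.T @ V + reg * np.eye(k), V.T @ R.T).T
    V = np.linalg.solve(U.T @ U + reg * np.eye(k), U.T @ R).T
    return U, V

rng = np.random.default_rng(0)
R = rng.random((100, 50))
U, V = rng.random((100, 8)), rng.random((50, 8))
for _ in range(20):
    U, V = als_sweep(R, U, V)
print(np.linalg.norm(R - U @ V.T))  # reconstruction error shrinks over sweeps
```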
Nikolas Adaloglou wrote about SimCLR in this blog post. It has excellent code walkthroughs and clear implementation details in PyTorch.
The full notebook, which includes all of the code from the post, is here.
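For flavor, here is a compact PyTorch sketch of the NT-Xent contrastive loss at the heart of SimCLR; it is not the post's exact code, and the temperature and tensor shapes are assumptions.

```python
# A compact PyTorch sketch of the NT-Xent contrastive loss used by SimCLR.
# Not the post's exact code; the temperature and tensor shapes are assumptions.
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """z1, z2: (batch, dim) projections of two augmented views of the same images."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2B, dim), unit norm
    sim = z @ z.T / temperature                          # pairwise cosine similarities
    n = sim.shape[0]
    sim.fill_diagonal_(float("-inf"))                    # exclude self-similarity
    # The positive for sample i is its other view, sitting B positions away.
    targets = torch.arange(n, device=z.device).roll(n // 2)
    return F.cross_entropy(sim, targets)

loss = nt_xent_loss(torch.randn(32, 128), torch.randn(32, 128))
```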
Libraries
Lux is a Python library that facilitates fast and easy data exploration by automating the visualization and data analysis process. By simply printing out a dataframe in a Jupyter notebook, Lux recommends a set of visualizations highlighting interesting trends and patterns in the dataset. Visualizations are displayed via an interactive widget that enables users to quickly browse through large collections of visualizations and make sense of their data.
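A minimal usage sketch might look like the following; the CSV path and column name are hypothetical, and behavior may vary by Lux version.

```python
# A minimal usage sketch; the CSV path and column name below are hypothetical.
import lux                      # importing lux attaches the recommendation widget to DataFrames
import pandas as pd

df = pd.read_csv("my_dataset.csv")
df                              # in a Jupyter notebook, this cell now renders Lux's widget
df.intent = ["some_column"]     # optionally steer recommendations toward a column of interest
df
```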
RQ-VAE Transformer is the official implementation of the paper Autoregressive Image Generation Using Residual Quantization. For autoregressive (AR) modeling of high-resolution images, the authors propose a two-stage framework consisting of RQ-VAE and RQ-Transformer. The framework can precisely approximate the feature map of an image and represent the image as a stack of discrete codes to effectively generate high-quality images.
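To make the residual-quantization idea concrete, here is a generic NumPy sketch (not the repo's implementation) in which each depth quantizes the residual left over by the previous depth; the codebook sizes are hypothetical.

```python
# A generic NumPy sketch of residual quantization (not the repo's implementation):
# each depth quantizes the residual left over by the previous depth, so a feature
# vector ends up represented as a stack of D discrete codes.
import numpy as np

def residual_quantize(z, codebooks):
    """z: (dim,) feature vector; codebooks: list of (num_codes, dim) arrays."""
    codes, residual = [], z.copy()
    for codebook in codebooks:
        idx = int(np.argmin(np.sum((codebook - residual) ** 2, axis=1)))
        codes.append(idx)
        residual = residual - codebook[idx]
    return codes  # the selected code vectors sum to an approximation of z

rng = np.random.default_rng(0)
codebooks = [rng.normal(size=(256, 64)) for _ in range(4)]  # D=4 depths (hypothetical sizes)
codes = residual_quantize(rng.normal(size=64), codebooks)
```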
Rubrix is a production-ready Python framework for exploring, annotating, and managing data in NLP projects.
Composer is a library written in PyTorch that enables you to train neural networks faster, at lower cost, and to higher accuracy. It implements more than two dozen speed-up methods that can be applied to your training loop in just a few lines of code or used with its built-in Trainer, and it continually integrates the latest state of the art in efficient neural network training.
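As a rough illustration, Composer exposes a functional API that applies some of these methods directly to an existing PyTorch model; the sketch below is hypothetical, and the set of available methods and their exact names can vary across Composer versions.

```python
# A hypothetical sketch of Composer's functional API applied to an existing PyTorch model.
# The set of available methods and their exact names can vary across Composer versions.
import torch
import torchvision
import composer.functional as cf

model = torchvision.models.resnet18(num_classes=10)

# Model-surgery algorithms rewrite the network in place before training.
cf.apply_blurpool(model)        # anti-aliased downsampling
cf.apply_squeeze_excite(model)  # inserts squeeze-and-excitation blocks

# The modified model then trains in an ordinary PyTorch loop (or via Composer's Trainer).
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
```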
Mctx is a library with a JAX-native implementation of Monte Carlo tree search (MCTS) algorithms such as AlphaZero, MuZero, and Gumbel MuZero. For computation speed-up, the implementation fully supports JIT compilation. Search algorithms in Mctx are defined for, and operate on, batches of inputs in parallel. This makes the most of accelerators and enables the algorithms to work with large learned environment models parameterized by deep neural networks.
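A toy, hypothetical example of a batched Gumbel MuZero search might look like the sketch below; the dummy recurrent_fn stands in for a learned environment model, and argument names follow the README but may differ by version.

```python
# A toy, hypothetical example of a batched Gumbel MuZero search with Mctx.
# The dummy recurrent_fn stands in for a learned model; names may differ by version.
import jax
import jax.numpy as jnp
import mctx

batch_size, num_actions = 4, 3

def recurrent_fn(params, rng_key, action, embedding):
    # Toy dynamics: the embedding is a running score and the reward equals the action id.
    new_embedding = embedding + action
    output = mctx.RecurrentFnOutput(
        reward=action.astype(jnp.float32),
        discount=jnp.ones(batch_size),
        prior_logits=jnp.zeros((batch_size, num_actions)),
        value=jnp.zeros(batch_size),
    )
    return output, new_embedding

root = mctx.RootFnOutput(
    prior_logits=jnp.zeros((batch_size, num_actions)),
    value=jnp.zeros(batch_size),
    embedding=jnp.zeros(batch_size, dtype=jnp.int32),
)

policy_output = mctx.gumbel_muzero_policy(
    params=None,
    rng_key=jax.random.PRNGKey(0),
    root=root,
    recurrent_fn=recurrent_fn,
    num_simulations=32,
)
print(policy_output.action)  # one chosen action per batch element
```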
DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.
Tutorials
TensorFlow has an Actor-Critic method tutorial on how to use this technique in an OpenAI Gym environment.
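As a generic reference (not the tutorial's exact code), the core advantage actor-critic losses can be sketched as follows; the function name and argument shapes are assumptions.

```python
# A generic sketch of the advantage actor-critic losses (not the tutorial's exact code).
import tensorflow as tf

def actor_critic_losses(action_log_probs, values, returns):
    """All arguments are 1-D float tensors collected over one episode."""
    advantages = returns - values
    # stop_gradient keeps the advantage from pushing actor gradients into the critic.
    actor_loss = -tf.reduce_sum(action_log_probs * tf.stop_gradient(advantages))
    critic_loss = tf.reduce_sum(tf.square(advantages))
    return actor_loss, critic_loss
```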