If you like soups, now you can make model soups: averaging the weights of multiple fine-tuned models to improve accuracy without increasing inference time.
The code is also available on GitHub!
Articles
In this article, OpenAI introduces a new DALL-E feature called outpainting.
Outpainting lets users extend and expand any image with DALL-E to incorporate more and different story lines.
Google introduces a new language model for audio generation in this post.
They introduce AudioLM, a framework for high-quality audio generation with long-term consistency. AudioLM maps the input audio to a sequence of discrete tokens and casts audio generation as a language modeling task in this representation space. They show how existing audio tokenizers provide different trade-offs between reconstruction quality and long-term structure, and they propose a hybrid tokenization scheme to achieve both objectives. Namely, they leverage the discretized activations of a masked language model pre-trained on audio to capture long-term structure and the discrete codes produced by a neural audio codec to achieve high-quality synthesis. By training on large corpora of raw audio waveforms, AudioLM learns to generate natural and coherent continuations given short prompts. When trained on speech, and without any transcript or annotation, AudioLM generates syntactically and semantically plausible speech continuations while also maintaining speaker identity and prosody for unseen speakers. Furthermore, they demonstrate how their approach extends beyond speech by generating coherent piano music continuations, despite being trained without any symbolic representation of music.
CMU publishes a blog post on a new method for tracking pixels in a video using a new MLP-Mixer-based model architecture:
They elected to use an MLP-Mixer, which they found to have a good trade-off between model capacity, training time, and generalization. They also tried convolutional models and transformers, but the convolutional models could not fit the data as well as the MLP-Mixer, and the transformers took too long to train.
They trained the model on synthetic data that they made (based on an existing optical flow dataset), where they could provide multi-frame ground truth for targets that undergo occlusions. The animation below shows the kind of data they trained on. You might say the data looks crazy, but that's the point! If you can't get real data, your best bet is synthetic data with extremely high diversity.
The code is available on GitHub, and more information can be found on the project page.
HuggingFace publishes a new license, OpenRAIL, that allows models and their artifacts to be distributed openly, announced in a blog post.
Open: these licenses allow royalty-free access, flexible downstream use and re-distribution of the licensed material, and distribution of any derivatives of it.
Responsible: OpenRAIL licenses embed a specific set of restrictions on the use of the licensed AI artifact in identified critical scenarios. Use-based restrictions are informed by an evidence-based approach to ML development and use limitations, which forces a line to be drawn between promoting wide access and use of ML and the potential social costs stemming from harmful uses of the openly licensed AI artifact. Therefore, while benefiting from open access to the ML model, the user will not be able to use the model in the specified restricted scenarios.
Libraries
ML-EXray is an open-source research library for ML execution monitoring and debugging on edge devices. It provides visibility into layer-level details of the ML execution, and helps developers analyze and debug cloud-to-edge deployment issues. It includes a suite of instrumentation APIs for ML execution logging and an end-to-end deployment validation library. Users and app developers can catch complicated deployment issues just by writing a few lines of instrumentation and assertion code.
Modin is a drop-in replacement for pandas. While pandas is single-threaded, Modin lets you instantly speed up your workflows by scaling pandas so it uses all of your cores. Modin works especially well on larger datasets, where pandas becomes painfully slow or runs out of memory.
Often data scientists have to switch between different tools for operating on datasets of different sizes. Processing large dataframes with pandas is slow, and pandas does not support working with dataframes that are too large to fit into the available memory. As a result, pandas workflows that work well for prototyping on a few MBs of data do not scale to tens or hundreds of GBs (depending on the size of your machine). Modin supports operating on data that does not fit in memory, so that you can comfortably work with hundreds of GBs without worrying about substantial slowdown or memory errors. With cluster and out-of-core support, Modin is a DataFrame library with both great single-node performance and high scalability in a cluster.
Hamilton is a general-purpose micro-framework for creating dataflows from Python functions. Specifically, Hamilton defines a novel paradigm that allows you to specify a flow of (delayed) execution that forms a Directed Acyclic Graph (DAG). It was originally built to solve the problem of creating wide (1000+ column) dataframes. Core to Hamilton's design is a clear mapping of function name to dataflow output. That is, Hamilton forces a certain paradigm for writing functions and aims for DAG clarity and easy modification, with code that is always unit-testable and naturally documentable.