Imagen becomes Imagen Video

LinkedIn makes Feathr part of LF AI & Data, Amazon uses GNNs for recs

Oct 23, 2022

Which model architecture is the best? The Transformer, maybe:

The Transformer is a magnificent neural network architecture because it is a general-purpose differentiable computer. It is simultaneously: 1) expressive (in the forward pass) 2) optimizable (via backpropagation+gradient descent) 3) efficient (high parallelism compute graph)

Articles

LinkedIn donated Feathr to the Linux Foundation (LF AI & Data) and announced this in the following blog post.
The main motivation is to grow the community, expand the user base, and make it easier for other tools and libraries to integrate with Feathr.
“We aim to support Feathr to expand its user base, grow its community of developers, become a leader within its own category, and enable collaboration and integration opportunities with other projects. We look forward to the project’s continued growth and success as part of LF AI & Data.”
The projects that Feathr may integrate with include Milvus, JanusGraph and OpenLineage.
Donating to the LF AI & Data will help ensure that Feathr continues to grow and evolve across various dimensions, including visibility, user base, and contributor base. Also, the Feathr development team will have more opportunities to collaborate with other member companies and projects, such as achieving richer online store support via integration with Milvus and JanusGraph, and adopting open data lineage standard from OpenLineage. As a result, we hope Feathr helps AI engineers build and scale feature pipelines and feature applications in ways that push MLOps tech stacks and the industry forward for years to come.

After PyTorch became a Linux Foundation project, companies like Google started providing more support, as announced in the following blog post.

Google adapted its popular Imagen diffusion model to the video domain, building a text-conditional generative model for video this time.

The model is a text-conditional video generation system based on a cascade of video diffusion models. Given a text prompt, Imagen Video generates high definition videos using a base video generation model and a sequence of interleaved spatial and temporal video super-resolution models. We describe how we scale up the system as a high definition text-to-video model including design decisions such as the choice of fully-convolutional temporal and spatial super-resolution models at certain resolutions, and the choice of the v-parameterization of diffusion models. In addition, we confirm and transfer findings from previous work on diffusion-based image generation to the video generation setting. Finally, we apply progressive distillation to our video models with classifier-free guidance for fast, high quality sampling. We find Imagen Video not only capable of generating videos of high fidelity, but also having a high degree of controllability and world knowledge, including the ability to generate diverse videos and text animations in various artistic styles and with 3D object understanding.
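Two of the ingredients named in the abstract, the v-parameterization and classifier-free guidance, are easy to illustrate on a single scalar. The sketch below is not Imagen Video's code. It assumes the variance-preserving convention from the progressive distillation work (alpha_t^2 + sigma_t^2 = 1 and v = alpha_t * eps - sigma_t * x) and a cosine schedule; the function names are made up for illustration.

```python
import math

def alpha_sigma(t):
    """Cosine schedule (an assumption): alpha^2 + sigma^2 = 1 for t in [0, 1]."""
    return math.cos(0.5 * math.pi * t), math.sin(0.5 * math.pi * t)

def to_v(x, eps, t):
    """Return the noisy sample z and the v-target for clean value x and noise eps."""
    alpha, sigma = alpha_sigma(t)
    z = alpha * x + sigma * eps
    v = alpha * eps - sigma * x
    return z, v

def from_v(z, v, t):
    """Recover the clean value and the noise from z and a (predicted) v."""
    alpha, sigma = alpha_sigma(t)
    x_hat = alpha * z - sigma * v
    eps_hat = sigma * z + alpha * v
    return x_hat, eps_hat

def guide(cond, uncond, w):
    """Classifier-free guidance: push the conditional prediction away from the unconditional one."""
    return (1.0 + w) * cond - w * uncond

z, v = to_v(x=0.7, eps=-1.3, t=0.4)
x_hat, eps_hat = from_v(z, v, t=0.4)
```

With a perfect v prediction, `from_v` returns the original `x` and `eps` up to floating-point error, which is why a network trained to predict v can still be used to produce a denoised sample.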

The paper goes into much more detail on the cascaded approach, which starts from a low-resolution video and upsamples it in space and time to reach a high-resolution result.

However, the model was not released because of safety and ethical concerns around the training dataset.

Amazon published a blog post on how they use Graph Neural Networks (GNNs) for product recommendations. To capture the directionality of product relationships, they train two embeddings for each product: one used when the product is the source of a recommendation (the main product being viewed) and one used when it is the target (the product being recommended).
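To see why two embeddings per product capture direction, here is a minimal sketch with hand-written vectors. It is not Amazon's model: the product names, the 2-d vectors, and the dot-product scoring are assumptions for illustration, and in a real system the embeddings would be learned by the GNN.

```python
# Hypothetical embeddings: each product has a "source" vector (used when it is
# the product being viewed) and a "target" vector (used when it is a candidate).
source_emb = {
    "phone": [1.0, 0.0],
    "phone_case": [0.0, 0.2],
    "charger": [0.0, 0.1],
}
target_emb = {
    "phone": [0.1, 0.0],
    "phone_case": [0.9, 0.1],
    "charger": [0.6, 0.3],
}

def score(src, tgt):
    """Score of recommending `tgt` to someone viewing `src`."""
    return sum(a * b for a, b in zip(source_emb[src], target_emb[tgt]))

def recommend(src, k=1):
    """Top-k candidates for `src`, excluding the product itself."""
    candidates = [p for p in target_emb if p != src]
    return sorted(candidates, key=lambda p: score(src, p), reverse=True)[:k]

phone_to_case = score("phone", "phone_case")  # high: a case is a good follow-up to a phone
case_to_phone = score("phone_case", "phone")  # low: the reverse direction is a poor suggestion
```

With a single shared embedding the two scores would be identical; splitting source and target roles is what lets the score depend on direction.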

They go into a lot more detail in the paper.

This type of training also helps to:

- Reduce selection bias
- Handle cold-start products (for which engagement data is not available)

PyTorch published a post on how to use Fully Sharded Data Parallel (FSDP) through XLA on TPUs. This FSDP support will be very important for researchers who use PyTorch on GPUs and want to try TPUs on Google Cloud.
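For readers new to FSDP, the core idea is that each device keeps only a shard of the parameters, gathers the full set just before it is needed, and reduce-scatters gradients so each device updates only its own shard. The sketch below simulates that in a single process with plain Python lists. It is not the PyTorch/XLA API, and the helper names are made up for illustration.

```python
def shard(flat_params, world_size):
    """Split a flat parameter list into equal shards, zero-padding the tail."""
    shard_len = -(-len(flat_params) // world_size)  # ceiling division
    padded = flat_params + [0.0] * (shard_len * world_size - len(flat_params))
    return [padded[r * shard_len:(r + 1) * shard_len] for r in range(world_size)]

def all_gather(shards, numel):
    """Rebuild the full parameter list from every rank's shard, dropping padding."""
    full = [p for s in shards for p in s]
    return full[:numel]

def reduce_scatter(per_rank_grads, world_size):
    """Sum full-size gradients across ranks, then give each rank only its shard."""
    summed = [sum(g) for g in zip(*per_rank_grads)]
    return shard(summed, world_size)

params = [0.1, 0.2, 0.3, 0.4, 0.5]
world_size = 2

shards = shard(params, world_size)          # what each rank stores at rest
full = all_gather(shards, len(params))      # materialized only around forward/backward
per_rank_grads = [[1.0] * 5, [2.0] * 5]     # pretend each rank computed a gradient
grad_shards = reduce_scatter(per_rank_grads, world_size)

# Each rank applies an SGD step to its own shard only.
lr = 0.1
shards = [[p - lr * g for p, g in zip(s, gs)] for s, gs in zip(shards, grad_shards)]
```

The memory saving comes from the first line of the usage: at rest, each rank holds roughly 1/world_size of the parameters (and, in a real implementation, of the gradients and optimizer state).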

Libraries

Feathr is the feature store that has been used in production at LinkedIn for many years; it was open sourced in April 2022. Feathr lets you:
- Define features based on raw data sources (batch and streaming) using pythonic APIs.
- Register and get features by names during model training and model inference.
- Share features across your team and company.
Feathr automatically computes your feature values and joins them to your training data, using point-in-time-correct semantics to avoid data leakage, and supports materializing and deploying your features for use online in production.
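The "point-in-time-correct" join mentioned above is the part that prevents data leakage: for each training example, only feature values observed at or before the example's timestamp may be used. Here is a minimal standard-library sketch of that rule. It does not use Feathr's API; the data, timestamps, and function name are hypothetical.

```python
from bisect import bisect_right

# Hypothetical feature history per entity: (timestamp, value), sorted by timestamp.
feature_history = {
    "user_1": [(10, 0.2), (20, 0.5), (30, 0.9)],
}

def point_in_time_value(entity, event_ts):
    """Latest feature value observed at or before event_ts, or None if there is none."""
    history = feature_history.get(entity, [])
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, event_ts)
    return history[idx - 1][1] if idx > 0 else None

# Hypothetical labelled observations: (entity, event timestamp, label).
labels = [("user_1", 25, 1), ("user_1", 5, 0)]

# Join each observation with the feature value that was known at that time.
training_rows = [(e, ts, point_in_time_value(e, ts), y) for e, ts, y in labels]
```

The observation at timestamp 25 gets the value written at 20, not the later value from 30, and the observation at timestamp 5 gets no value at all because nothing had been recorded yet.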
OpenLineage is an Open standard for metadata and lineage collection designed to instrument jobs as they are running. It defines a generic model of run, job, and dataset entities identified using consistent naming strategies. The core lineage model is extensible by defining specific facets to enrich those entities.
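To make the run, job, and dataset model concrete, here is a simplified sketch using dataclasses. It is not the OpenLineage client library, and it omits fields the spec requires (such as producer and schema URLs); the namespaces, names, and the facet content are hypothetical.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field

@dataclass
class Run:
    runId: str
    facets: dict = field(default_factory=dict)  # extensible metadata about this run

@dataclass
class Job:
    namespace: str
    name: str
    facets: dict = field(default_factory=dict)  # extensible metadata about the job

@dataclass
class Dataset:
    namespace: str
    name: str
    facets: dict = field(default_factory=dict)  # extensible metadata about the dataset

@dataclass
class RunEvent:
    eventType: str  # e.g. "START" or "COMPLETE"
    run: Run
    job: Job
    inputs: list
    outputs: list

event = RunEvent(
    eventType="COMPLETE",
    run=Run(runId=str(uuid.uuid4())),
    job=Job(
        namespace="example",
        name="daily_features",
        facets={"documentation": {"description": "Builds daily click features"}},
    ),
    inputs=[Dataset(namespace="warehouse", name="raw.clicks")],
    outputs=[Dataset(namespace="warehouse", name="features.daily_clicks")],
)

# Serialize to JSON, the form in which an event would be sent to a lineage backend.
payload = json.loads(json.dumps(asdict(event)))
```

The consistent `namespace` plus `name` pair is what lets a backend recognize that the output of one job is the input of another and stitch the lineage graph together.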
IRIS is a data-efficient agent trained over millions of imagined trajectories in a world model.
- The world model is composed of a discrete autoencoder and an autoregressive Transformer.
- Our approach casts dynamics learning as a sequence modeling problem, where the autoencoder builds a language of image tokens and the Transformer composes that language over time.
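The "language of image tokens" idea can be sketched in a few lines: a discrete autoencoder maps each image patch to the index of its nearest codebook vector, and the resulting token ids are laid out as one sequence for the Transformer. The code below is not IRIS's implementation. It assumes a VQ-VAE-style nearest-neighbour lookup and an interleaving of frame tokens with action tokens; the codebook, latents, and helper names are made up.

```python
# Hypothetical 3-entry codebook of 2-d vectors (learned in practice).
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

def quantize(vec):
    """Index of the codebook entry closest to vec (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(vec, c)) for c in codebook]
    return dists.index(min(dists))

def encode_frame(latents):
    """Turn a frame's patch latents into a list of discrete image tokens."""
    return [quantize(v) for v in latents]

def build_sequence(frames, actions, action_offset=len(codebook)):
    """Interleave frame tokens and action tokens into one token sequence.

    Action ids are shifted by action_offset so they do not collide with image tokens.
    """
    seq = []
    for latents, action in zip(frames, actions):
        seq.extend(encode_frame(latents))
        seq.append(action_offset + action)
    return seq

# Two hypothetical frames with two patch latents each, and the action taken after each.
frames = [
    [[0.1, 0.0], [0.9, 0.2]],
    [[0.1, 0.8], [0.0, 0.1]],
]
actions = [1, 0]

tokens = build_sequence(frames, actions)
```

Once the trajectory is a flat token sequence like this, learning the dynamics reduces to next-token prediction, which is the sequence modeling framing described above.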

MLOps Newsletter
