Graph Neural Networks in Tensorflow
sglang: Scaling Up Language Models with RadixAttention and a Domain-Specific Language
Articles
Google wrote an article announcing Graph Neural Networks (GNNs) in TensorFlow (the TF-GNN library).
They discuss the following problems to solve:
Data complexity: How to efficiently process and learn from the intricate relationships within graph-structured data.
Limited resources: Address memory and computational constraints for training and running GNNs on large-scale graphs.
Generalizability: Develop GNN models that can adapt to unseen graphs and tasks beyond the training data.
The technical approach and some of the things the library brings:
Custom layers: Design GNN architectures by building custom layers in TensorFlow, allowing flexibility and customization.
Message passing functions: Define how information flows between nodes using these functions within custom layers.
Graph data structures: Utilize TensorFlow's data structures like tf.SparseTensor to represent sparse graphs efficiently.
Automatic differentiation: Leverage TensorFlow's automatic differentiation capabilities for efficient gradient calculations during training.
Additionally, some of the out-of-the-box functionality:
Sparse tensors: Reduce memory usage and computation for graphs with many missing edges.
Custom operations (ops): Implement optimized message passing logic as custom ops for improved performance.
Keras integration: Build and train GNNs seamlessly within the familiar Keras interface of TensorFlow.
TensorBoard visualization: Visualize intermediate representations and model behavior for debugging and interpretation.
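The message-passing idea behind these custom layers can be illustrated with a minimal dependency-free sketch (plain Python rather than TF-GNN's actual API; identity weights and no nonlinearity, purely for readability):

```python
def message_passing_layer(h, edges):
    """One round of message passing: each node sums the features of its
    in-neighbors and adds them to its own state. A toy stand-in for a
    GNN layer; real layers apply learned weights and a nonlinearity.
    h: list of feature vectors; edges: list of (src, dst) pairs."""
    agg = [[0.0] * len(v) for v in h]
    for src, dst in edges:                      # accumulate incoming messages
        agg[dst] = [a + b for a, b in zip(agg[dst], h[src])]
    return [[x + m for x, m in zip(v, a)] for v, a in zip(h, agg)]

# Toy path graph 0 -> 1 -> 2 with one-hot initial features
h = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
out = message_passing_layer(h, [(0, 1), (1, 2)])
# After one round, node 1 has absorbed node 0's feature,
# and node 2 has absorbed node 1's.
```

In the library itself, this aggregation runs over sparse adjacency structures (hence tf.SparseTensor) so memory scales with the number of edges rather than the square of the number of nodes.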
It brings its own runner for training; the code is available on GitHub, and there is an excellent notebook that shows end-to-end training.
LMSys wrote an article on SGLang (Scaling Up Language Models with RadixAttention and a Domain-Specific Language) that solves three main problems in LLM serving:
High memory usage for storing intermediate results.
Repeated computations of similar tasks, wasting resources.
Difficulty for non-experts to effectively control LLMs.
In order to solve these problems, it comes up with:
RadixAttention: This technique automatically reuses the KV cache of shared prompt prefixes across multiple generation calls, reducing memory consumption and speeding up calculations.
Domain-Specific Language (DSL): This provides a user-friendly way to control LLMs, with chained generation calls and control flow, without requiring in-depth knowledge of their inner workings.
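The prefix-reuse idea can be sketched with a tiny per-token trie (a simplification; the real system maintains a radix tree over KV-cache blocks and handles eviction, which this toy ignores):

```python
class RadixCacheNode:
    """Toy illustration of the reuse idea behind RadixAttention: requests
    share a tree of token sequences, so a new request only has to compute
    the suffix that extends the longest cached prefix. Not sglang's
    implementation -- just the core concept."""
    def __init__(self):
        self.children = {}   # token -> RadixCacheNode

    def insert(self, tokens):
        """Insert a token sequence; return how many tokens were already cached."""
        node, hit = self, 0
        for t in tokens:
            if t in node.children:
                hit += 1
            else:
                node.children[t] = RadixCacheNode()
            node = node.children[t]
        return hit

cache = RadixCacheNode()
cache.insert([1, 2, 3, 4])        # first request: nothing reused
reused = cache.insert([1, 2, 9])  # shares the prefix [1, 2]
# reused == 2: only token 9 needs fresh computation
```

In the real system the payoff is that attention over the shared prefix is never recomputed, which is what cuts both memory and latency for workloads with overlapping prompts.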
Google wrote a blog post on VideoPoet, a large language model (LLM) capable of various video generation tasks.
Problem to Solve:
Existing video generation models often struggle with:
Limited motion complexity: Difficulty in creating realistic large-scale movements within videos.
Task specificity: Models trained for specific tasks like text-to-video might not generalize well to other tasks like image-to-video.
Paired data dependence: Requiring large amounts of paired data hinders flexibility and adaptation to new scenarios.
Approach:
VideoPoet uses a hierarchical LLM architecture with several key components:
Transformer encoder: Processes the input text or image, extracting semantic representations.
Hierarchical decoder: Generates video frames progressively, starting with low-resolution and refining details in higher resolutions.
Motion prediction module: Employs recurrent neural networks with attention mechanisms to capture temporal dependencies and generate coherent motion sequences.
Audio generation module (optional): Analyzes video content and generates corresponding audio using separate neural networks.
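The coarse-to-fine generation described above can be sketched schematically (a toy illustration of hierarchical decoding, not VideoPoet's actual architecture; the "refinement" step here is a hypothetical placeholder):

```python
def upsample(frame, factor=2):
    """Nearest-neighbor upsampling of a 2-D frame (list of rows)."""
    return [[px for px in row for _ in range(factor)]
            for row in frame for _ in range(factor)]

def generate_video(num_frames, refine):
    """Toy coarse-to-fine generation: produce a low-resolution frame per
    timestep, then upsample and apply a refinement function, mirroring
    the hierarchical decoder described above (purely illustrative)."""
    video = []
    for t in range(num_frames):
        coarse = [[float(t), float(t)], [float(t), float(t)]]  # 2x2 draft
        video.append(refine(upsample(coarse)))
    return video

# A trivial stand-in "refinement" that just shifts values by +1
video = generate_video(2, lambda f: [[px + 1 for px in row] for row in f])
# Each frame comes out at 4x4 resolution
```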
The approaches above enable the following capabilities:
Zero-shot learning: Achieved through transfer from a model pre-trained on a massive text corpus for language understanding tasks.
Multimodal capabilities: The model can handle various input modalities (text, image, video) and generate different output formats (video, audio).
Their website has a number of really good examples.
Libraries
SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with LLMs faster and more controllable by co-designing the frontend language and the runtime system.
The core features of SGLang include:
A Flexible Front-End Language: This allows for easy programming of LLM applications with multiple chained generation calls, advanced prompting techniques, control flow, multiple modalities, parallelism, and external interaction.
A High-Performance Runtime with RadixAttention: This feature significantly accelerates the execution of complex LLM programs by automatic KV cache reuse across multiple calls. It also supports other common techniques like continuous batching and tensor parallelism.
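The chained-generation pattern the frontend encourages can be sketched with a stub (hypothetical names throughout; not SGLang's actual API, and `fake_llm` stands in for a real model call):

```python
def fake_llm(prompt, stop):
    """Stand-in for an LLM call. In SGLang, such calls are dispatched to
    the runtime, which reuses shared KV-cache prefixes across them."""
    return "Paris" if "capital" in prompt else "unknown"

class Program:
    """Minimal sketch of chained-generation program state: each gen()
    appends to a growing prompt, so consecutive calls share an
    ever-longer prefix -- exactly the pattern RadixAttention exploits."""
    def __init__(self):
        self.text = ""
        self.vars = {}

    def append(self, s):
        self.text += s
        return self

    def gen(self, name, stop="\n"):
        out = fake_llm(self.text, stop)
        self.vars[name] = out
        self.text += out
        return self

s = Program()
s.append("Q: What is the capital of France?\nA: ").gen("answer")
s.append("\nQ: Are you sure?\nA: ").gen("confirmation")
# The second call's prompt fully contains the first call's prompt,
# so its prefix computation can be reused by the runtime.
```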
Holodeck can generate diverse types of 3D environments (arcade, spa, museum), customize for styles (victorian, bohemian), and understand fine-grained requirements ("has a cat", "fan of Star Wars").
3D simulated environments play a critical role in Embodied AI, but their creation requires expertise and extensive manual effort, restricting their diversity and scope. To mitigate this limitation, we present Holodeck, a system that generates 3D environments to match a user-supplied prompt fully automatically. Holodeck can generate diverse scenes, e.g., arcades, spas, and museums, adjust the designs for styles, and can capture the semantics of complex queries such as "apartment for a researcher with a cat" and "office of a professor who is a fan of Star Wars". Holodeck leverages a large language model (GPT-4) for common sense knowledge about what the scene might look like and uses a large collection of 3D assets from Objaverse to populate the scene with diverse objects. To address the challenge of positioning objects correctly, we prompt GPT-4 to generate spatial relational constraints between objects and then optimize the layout to satisfy those constraints.
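The "generate constraints, then optimize the layout" step can be sketched as a small constraint-satisfaction search (a hypothetical stand-in; Holodeck's actual solver, constraint vocabulary, and object names differ):

```python
import itertools
import random

def violations(pos, constraints):
    """Count unsatisfied 'near' constraints: a pair violates the
    constraint if the objects are more than 1 grid cell apart."""
    bad = 0
    for a, b in constraints:
        ax, ay = pos[a]
        bx, by = pos[b]
        if max(abs(ax - bx), abs(ay - by)) > 1:
            bad += 1
    return bad

def optimize_layout(objects, constraints, size=4, iters=2000, seed=0):
    """Random-restart search for a grid layout satisfying the
    constraints -- a toy optimizer, purely for illustration."""
    rng = random.Random(seed)
    cells = list(itertools.product(range(size), repeat=2))
    best = None
    for _ in range(iters):
        pos = dict(zip(objects, rng.sample(cells, len(objects))))
        if best is None or violations(pos, constraints) < violations(best, constraints):
            best = pos
        if violations(best, constraints) == 0:
            break
    return best

# Hypothetical constraints of the kind GPT-4 might emit for the prompt
layout = optimize_layout(
    ["desk", "chair", "cat_bed"],
    [("desk", "chair"), ("chair", "cat_bed")],
)
# layout places each object so that constrained pairs end up adjacent
```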
The code for Holodeck is available on GitHub.
PIA is a personalized image animation method that can generate videos with high motion controllability and strong text and image alignment.
Ragna is an open source RAG orchestration framework.
With an intuitive API for quick experimentation and built-in tools for creating production-ready applications, you can quickly leverage Large Language Models (LLMs) for your work. The code is available on GitHub.
⚔️ Chatbot Arena ⚔️ : Benchmarking LLMs in the Wild
FastChat is an open platform for training, serving, and evaluating large language model based chatbots.
FastChat powers Chatbot Arena (https://chat.lmsys.org/), serving over 6 million chat requests for 50+ LLMs.
Arena has collected over 100K human votes from side-by-side LLM battles to compile an online LLM Elo leaderboard.
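Turning side-by-side votes into a leaderboard uses the standard Elo update (shown below with textbook parameters; the Arena's exact rating method and constants may differ):

```python
def elo_update(ratings, winner, loser, k=32):
    """One Elo update from a single head-to-head vote: the winner gains
    in proportion to how unexpected the win was, and the loser loses
    the same amount (zero-sum)."""
    ra, rb = ratings[winner], ratings[loser]
    expected_win = 1 / (1 + 10 ** ((rb - ra) / 400))
    ratings[winner] = ra + k * (1 - expected_win)
    ratings[loser] = rb - k * (1 - expected_win)

ratings = {"model_a": 1000.0, "model_b": 1000.0}
elo_update(ratings, "model_a", "model_b")
# With equal starting ratings, the winner gains 16 and the loser drops 16
```

Replayed over 100K+ votes, these incremental updates converge to a ranking where the rating gap between two models predicts their head-to-head win rate.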
FastChat's core features include:
The training and evaluation code for state-of-the-art models (e.g., Vicuna, MT-Bench).
A distributed multi-model serving system with web UI and OpenAI-compatible RESTful APIs.
LobeChat is an open-source, high-performance chatbot framework that supports speech synthesis, multimodal interactions, and an extensible (Function Call) plugin system.