OpenAI announces ChatGPT
Netflix builds federated search; Microsoft publishes the Perception-Action Causal Transformer
If you have time to read only one paper this week:
OpenAI published ChatGPT this week, and I want to cover it outside of the articles that I share regularly.
The model itself is part of a family of language models that OpenAI published earlier, called InstructGPT.
An excellent example follows, where the chatbot helps an engineer debug a problem.
USER
this code is not working like i expect — how do i fix it?
resultWorkerErr := make(chan error)
defer close(resultWorkerErr)
go func() {
    defer cancel()
    resultWorkerErr <- b.resultWorker(ctx)
}()

err := b.worker(ctx)
cancel()
if err == nil {
    return <-resultWorkerErr
}
return multierror.Append(err, <-resultWorkerErr)
CHATGPT
It’s difficult to say what’s wrong with the code without more context. Can you provide more information about what the code is supposed to do and what isn’t working as expected? Also, is this the entire code or just a part of it?
USER
just part of it — the error never surfaces. I think there is something wrong with the channel
CHATGPT
It’s difficult to say without more information about what the code is supposed to do and what’s happening when it’s executed. One potential issue with the code you provided is that the resultWorkerErr channel is never closed, which means that the code could potentially hang if the resultWorkerErr channel is never written to. This could happen if b.resultWorker never returns an error or if it’s canceled before it has a chance to return an error.
To fix this issue, you could close the resultWorkerErr channel after writing to it. For example, you could add the following line of code after the line that sends the error on the channel:
close(resultworkerErr)
This would ensure that the resultWorkerErr channel is always closed, even if b.resultWorker never returns an error. Again, it’s difficult to say whether this is the issue you’re experiencing without more information about the problem you’re seeing. Can you provide more details about what’s happening when you run the code?
This level of detail and understanding of the original code is impressive on its own, but going one step further and suggesting different approaches the engineer could take to debug the issue, along with concrete suggestions on how to debug the code better, is even more impressive.
I am sure this is a relatively cherry-picked example, but it shows well how successful the training methodology and reinforcement learning are on a very large language model.
Even though there are some limitations outlined in the blog post, the bot is really good and can hold intelligible conversations across a large breadth of topics.
You can interact with the chatbot at the following website: https://chat.openai.com
Some of the limitations are the following:
ChatGPT sometimes writes plausible-sounding but incorrect or nonsensical answers. Fixing this issue is challenging, as: (1) during RL training, there’s currently no source of truth; (2) training the model to be more cautious causes it to decline questions that it can answer correctly; and (3) supervised training misleads the model because the ideal answer depends on what the model knows, rather than what the human demonstrator knows.
ChatGPT is sensitive to tweaks to the input phrasing or attempting the same prompt multiple times. For example, given one phrasing of a question, the model can claim to not know the answer, but given a slight rephrase, can answer correctly.
The model is often excessively verbose and overuses certain phrases, such as restating that it’s a language model trained by OpenAI. These issues arise from biases in the training data (trainers prefer longer answers that look more comprehensive) and well-known over-optimization issues.
Ideally, the model would ask clarifying questions when the user provided an ambiguous query. Instead, our current models usually guess what the user intended.
While we’ve made efforts to make the model refuse inappropriate requests, it will sometimes respond to harmful instructions or exhibit biased behavior. We’re using the Moderation API to warn or block certain types of unsafe content, but we expect it to have some false negatives and positives for now. We’re eager to collect user feedback to aid our ongoing work to improve this system.
I will cover some of the possible applications in much more depth in the next newsletter.
Articles
Netflix explained how they built federated search over the heterogeneous content handled by their Content Engineering organization.
Given the wide adoption of the federated gateway they adopted earlier within Content Engineering, they decided to implement the Studio Search service as a DGS (Domain Graph Service) that integrated with the federated gateway. The search APIs (besides search, we have other APIs to support faceted search, typeahead suggestions, etc) are exposed as GraphQL queries within the federated graph.
This integration with the federation gateway allows the search DGS to just return the matching entity keys from the search index instead of the whole matching document(s). Through the power of federation, users are then able to hydrate the search results with any data available in the federated graph. This allows the search indices to be lean by indexing only the fields necessary for the search experience and at the same time provides complete flexibility for the users to fetch any data available in the federated graph instead of being restricted to just the data available in the search index.
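To make the hydration idea more concrete, a query against such a federated graph could look roughly like the sketch below, issued here from Python as a plain HTTP POST. The gateway URL, the searchStudioEntities field, and the Movie type are illustrative placeholders, not Netflix's actual schema.

import requests  # any GraphQL client works; a plain HTTP POST keeps the sketch minimal

# Hypothetical federated query: the search DGS resolves only the matching entity
# keys, while fields such as title and productionCountry are hydrated by other
# DGSs through the federated gateway. All type and field names are illustrative.
QUERY = """
query SearchMovies($term: String!) {
  searchStudioEntities(query: $term) {
    results {
      ... on Movie {
        id                 # entity key, resolved from the lean search index
        title              # hydrated from the Movie DGS via federation
        productionCountry  # hydrated from yet another DGS
      }
    }
  }
}
"""

response = requests.post(
    "https://federated-gateway.example.com/graphql",  # placeholder gateway URL
    json={"query": QUERY, "variables": {"term": "stranger"}},
    timeout=10,
)
print(response.json())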
Microsoft publishes a blog post on the Perception-Action Causal Transformer (PACT), a generative transformer-based architecture that aims to build representations directly from robot data in a self-supervised fashion. Through autoregressive prediction of states and actions over time, the model implicitly encodes dynamics and behaviors for a particular robot. This representation can then function as a single starting point to achieve distinct tasks through fine-tuning with minimal data. The code is available on GitHub.
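As a rough intuition for the pretraining objective, here is a minimal sketch of autoregressive state-action modeling with a causal transformer. It is a conceptual illustration only; the module names, dimensions, and losses are my assumptions, not PACT's actual code.

import torch
import torch.nn as nn

class TinyStateActionTransformer(nn.Module):
    # Conceptual sketch of autoregressive state-action modeling; not PACT's real code.
    def __init__(self, state_dim=16, action_dim=4, d_model=64, n_layers=2, n_heads=4):
        super().__init__()
        self.state_embed = nn.Linear(state_dim, d_model)
        self.action_embed = nn.Linear(action_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.action_head = nn.Linear(d_model, action_dim)  # predict a_t from s_<=t
        self.state_head = nn.Linear(d_model, state_dim)    # predict s_{t+1} from history + a_t

    def forward(self, states, actions):
        # states: (B, T, state_dim), actions: (B, T, action_dim)
        # Interleave state and action embeddings into one token sequence
        # s_0, a_0, s_1, a_1, ... (positional embeddings omitted for brevity).
        tokens = torch.stack(
            (self.state_embed(states), self.action_embed(actions)), dim=2
        ).flatten(1, 2)                                   # (B, 2T, d_model)
        seq_len = tokens.size(1)
        causal_mask = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1
        )
        hidden = self.backbone(tokens, mask=causal_mask)
        pred_actions = self.action_head(hidden[:, 0::2])     # from state tokens
        pred_next_states = self.state_head(hidden[:, 1::2])  # from action tokens
        return pred_actions, pred_next_states

# One self-supervised pretraining step on a random batch standing in for robot trajectories.
model = TinyStateActionTransformer()
states, actions = torch.randn(8, 10, 16), torch.randn(8, 10, 4)
pred_actions, pred_next_states = model(states, actions)
loss = nn.functional.mse_loss(pred_actions, actions) + nn.functional.mse_loss(
    pred_next_states[:, :-1], states[:, 1:]
)
loss.backward()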
Libraries
Milvus is an open-source vector database built to power embedding similarity search and AI applications. Milvus makes unstructured data search more accessible, and provides a consistent user experience regardless of the deployment environment.
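A minimal sketch of what using it can look like with the pymilvus 2.x client, assuming a Milvus server is reachable on localhost; the collection name, dimensions, and index parameters are illustrative.

import numpy as np
from pymilvus import connections, Collection, CollectionSchema, FieldSchema, DataType

# Assumes a Milvus instance is reachable on localhost:19530 (pymilvus 2.x client).
connections.connect("default", host="localhost", port="19530")

dim = 128
schema = CollectionSchema([
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=False),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=dim),
], description="toy embedding collection")
collection = Collection("newsletter_demo", schema)

# Insert a handful of random embeddings standing in for real document vectors.
ids = list(range(100))
vectors = np.random.rand(100, dim).tolist()
collection.insert([ids, vectors])
collection.create_index(
    "embedding",
    {"index_type": "IVF_FLAT", "metric_type": "L2", "params": {"nlist": 64}},
)
collection.load()

# Nearest-neighbour search for a single query embedding.
hits = collection.search(
    data=[np.random.rand(dim).tolist()],
    anns_field="embedding",
    param={"metric_type": "L2", "params": {"nprobe": 8}},
    limit=5,
)
print(hits[0].ids, hits[0].distances)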
BEIR is a heterogeneous benchmark containing diverse IR tasks. It also provides a common and easy framework for evaluating your NLP-based retrieval models within the benchmark (a minimal usage sketch follows after the links below).
For an overview, check out their wiki page: https://github.com/beir-cellar/beir/wiki.
For models and datasets, check out their HuggingFace (HF) page: https://huggingface.co/BeIR.
For more information, check out their publication:
BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models (NeurIPS 2021, Datasets and Benchmarks Track)
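Roughly following the quickstart in their README, evaluating a dense retriever on one of the benchmark datasets looks like the sketch below; the dataset URL and model name are taken from their examples, so check the repository for current versions.

from beir import util
from beir.datasets.data_loader import GenericDataLoader
from beir.retrieval import models
from beir.retrieval.evaluation import EvaluateRetrieval
from beir.retrieval.search.dense import DenseRetrievalExactSearch as DRES

# Download and unpack the (small) SciFact dataset from the BEIR mirror.
url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/scifact.zip"
data_path = util.download_and_unzip(url, "datasets")
corpus, queries, qrels = GenericDataLoader(data_folder=data_path).load(split="test")

# Exact dense retrieval with a SentenceBERT model, evaluated with standard IR metrics.
model = DRES(models.SentenceBERT("msmarco-distilbert-base-tas-b"), batch_size=16)
retriever = EvaluateRetrieval(model, score_function="dot")
results = retriever.retrieve(corpus, queries)
ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)
print(ndcg)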
Da Vinci was a famous Renaissance painter who had the ability to turn decay into magic. The DaVinci Toolkit is a set of intelligent image/video enhancement tools that aims to address the pain points of existing video enhancement and restoration tools, take full advantage of AI technology, and lower the barrier for users who want to process video footage.
PyTerrier is a Python framework for performing information retrieval experiments, built on Terrier.
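A minimal sketch of a BM25 retrieval experiment, loosely based on their getting-started examples; the dataset choice and evaluation metrics are illustrative, and pt.init() assumes a working Java installation for Terrier.

import pyterrier as pt

if not pt.started():
    pt.init()

# The small "vaswani" test collection is available through PyTerrier's dataset helper.
dataset = pt.get_dataset("vaswani")
bm25 = pt.BatchRetrieve(dataset.get_index(), wmodel="BM25")

# Run a single ad-hoc query...
print(bm25.search("chemical reactions").head())

# ...or a full evaluation over the collection's topics and qrels.
results = pt.Experiment(
    [bm25],
    dataset.get_topics(),
    dataset.get_qrels(),
    eval_metrics=["map", "ndcg_cut_10"],
)
print(results)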
nnprofiler provides a utility class, LayerProf, to measure the forward and backward execution time of a PyTorch nn.Module, which could be a single layer, a complete model from transformers, torchvision, etc., or your own custom model. It captures and provides the timings for all the layers present in the model.
NannyML is an open-source Python library that allows you to estimate post-deployment model performance (without access to targets), detect data drift, and intelligently link data drift alerts back to changes in model performance. Built for data scientists, NannyML has an easy-to-use interface, interactive visualizations, is completely model-agnostic, and currently supports all tabular use cases, classification and regression.
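A rough sketch of estimating post-deployment performance with NannyML's CBPE (confidence-based performance estimation) estimator. The file paths and column names are placeholders for your own data, and the parameter names follow the quickstart at the time of writing, so check the documentation for the exact current API.

import nannyml as nml
import pandas as pd

# reference_df: a period where ground-truth labels were available;
# analysis_df: production data without targets. Paths and columns are placeholders.
reference_df = pd.read_parquet("reference.parquet")
analysis_df = pd.read_parquet("analysis.parquet")

# CBPE estimates classification performance without access to ground-truth labels.
# Newer releases may additionally require a problem_type argument.
estimator = nml.CBPE(
    y_pred_proba="predicted_probability",
    y_pred="prediction",
    y_true="label",
    timestamp_column_name="timestamp",
    metrics=["roc_auc"],
    chunk_size=5000,
)
estimator.fit(reference_df)
estimated_performance = estimator.estimate(analysis_df)
# The result object can be plotted or converted to a DataFrame; see the NannyML docs.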
Posterior predictive distributions quantify uncertainties ignored by point estimates. The Neural TestBed provides tools for the systematic evaluation of agents that generate such predictions. Crucially, these tools assess not only the quality of marginal predictions per input, but also joint predictions given many inputs. Joint distributions are often critical for useful uncertainty quantification, but they have been largely overlooked by the Bayesian deep learning community.
This library automates the evaluation and analysis of learning agents (a small, library-independent illustration of marginal versus joint evaluation follows after the list):
Synthetic neural-network-based generative model.
Evaluate predictions beyond marginal distributions.
Reference implementations of benchmark agents (with tuning).
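To see why joint predictions matter beyond marginals, here is a small, library-independent numerical illustration (plain NumPy, not the Neural Testbed API): two agents with identical per-input marginal predictions can assign very different probabilities to the joint outcome over several inputs.

import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_episodes = 10, 1000

# Environment with a shared latent factor: in each episode, all labels are equal.
labels = rng.integers(0, 2, size=n_episodes)[:, None] * np.ones((1, n_inputs), dtype=int)

def marginal_log_loss(p_one):
    # Per-input log loss of predicting Bernoulli(p_one) for every label.
    return -np.mean(labels * np.log(p_one) + (1 - labels) * np.log(1 - p_one))

def joint_log_loss_independent(p_one):
    # Joint log loss when the labels are modeled as independent coins.
    per_label = labels * np.log(p_one) + (1 - labels) * np.log(1 - p_one)
    return -np.mean(per_label.sum(axis=1))

def joint_log_loss_correlated():
    # Joint model: 50/50 mixture of "all zeros" and "all ones".
    prob = 0.5 * labels.all(axis=1) + 0.5 * (1 - labels).all(axis=1)
    return -np.mean(np.log(np.clip(prob, 1e-12, None)))

# Both agents have identical marginals (p=0.5 per input)...
print("marginal log loss:", marginal_log_loss(0.5))
# ...but very different joint likelihoods on correlated data.
print("joint log loss, independent agent:", joint_log_loss_independent(0.5))
print("joint log loss, correlated agent:", joint_log_loss_correlated())

On this toy environment both agents have the same marginal log loss (log 2), but the agent that models the correlation achieves a roughly ten times lower joint log loss, which is exactly the kind of gap the testbed is designed to surface.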
On Twitter
Random things I found on Twitter that are inspiring, funny, and insightful: