If you need to know 5 things about PyTorch, take a look at this thread:
Articles
Stateof.ai published its report on last year's trends in the following slides.
TLDR:
AI is stepping up in more concrete ways, including being applied to mission critical infrastructure like national electric grids and automated supermarket warehousing optimization during pandemics.
AI-first approaches have taken biology by storm with faster simulations of humans’ cellular machinery (proteins and RNA). This has the potential to transform drug discovery and healthcare.
Transformers have emerged as a general purpose architecture for machine learning, beating the state of the art in many domains including NLP, computer vision, and even protein structure prediction.
Investors have taken notice, with record funding this year into AI startups, and two first ever IPOs for AI-first drug discovery companies, as well as blockbuster IPOs for data infrastructure and cybersecurity companies that help enterprises retool for the AI-first era.
The under-resourced AI-alignment efforts of the key organisations advancing the field, along with concerns about the datasets used to train AI models and bias in model-evaluation benchmarks, raise important questions about how best to chart the progress of AI systems with rapidly advancing capabilities.
AI is now an actual arms race rather than a figurative one. AI researchers have traditionally seen the AI arms race as a figurative one -- simulated dogfights between competing AI systems carried out in labs -- but that is changing with reports of recent use of autonomous weapons by various militaries.
Within the US-China rivalry, China's ascension in research quality and talent training is notable, with Chinese institutions now beating the most prominent Western ones. The world’s dependence on Taiwan's semiconductor industry, which makes AI chips for global tech giants, is a central point of geopolitical tension.
As with other aspects of the so-called “splinternet”, there is an emergence and nationalisation of large language models.
HuggingFace wrote a blog post about collaborative training. The main idea is to enable distributed, decentralized training of large neural networks so that multiple participants can contribute compute to the same training run. This makes training very large models much more cost effective: "volunteers" provide computing power, so no central party has to shoulder the entire compute bill on their own. The article and the accompanying paper cite training costs as high as $12M for large models such as GPT-3, and propose this method to distribute both the training and its cost.
Google wrote a blog post about an end-to-end system for a visual language model.
The model is trained end to end with a single prefix language model objective: given a prefix, it generates the continuation. For images, the prefix can be as simple as "a picture of" or "an image of"; it can also take the form of a question, as in "what is the profession of this person?".
The model exhibits intriguing zero-shot behaviors on multimodal understanding tasks and is worth checking out if you are interested in visual language models.
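To make the prefix idea concrete, here is a toy illustration of prefix-conditioned greedy decoding. This is not Google's model or API; the hand-built bigram table is a hypothetical stand-in for a trained prefix language model, and the prefix (e.g. "a picture of") simply seeds the decoder.

```python
def greedy_decode(prefix_tokens, bigram, max_len=10, eos="<eos>"):
    """Greedily extend the prefix one token at a time using the bigram table."""
    out = list(prefix_tokens)
    for _ in range(max_len):
        nxt = bigram.get(out[-1])
        if nxt is None or nxt == eos:
            break
        out.append(nxt)
    return " ".join(out)

# Hypothetical bigram "model" standing in for a trained prefix LM.
BIGRAM = {"of": "a", "a": "dog", "dog": "<eos>"}

print(greedy_decode("a picture of".split(), BIGRAM))  # a picture of a dog
```

A real prefix LM replaces the table lookup with a learned next-token distribution, but the control flow, conditioning on a textual prefix and extending it token by token, is the same.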
Twitter wrote about algorithmic amplification of political content.
TLDR:
Tweets about political content from elected officials, regardless of party or whether the party is in power, do see algorithmic amplification when compared to political content on the reverse chronological timeline.
Group effects did not translate to individual effects. In other words, since party affiliation or ideology is not a factor our systems consider when recommending content, two individuals in the same political party would not necessarily see the same amplification.
In six out of seven countries — all but Germany — Tweets posted by accounts from the political right receive more algorithmic amplification than the political left when studied as a group.
Right-leaning news outlets, as defined by the independent organizations listed above, see greater algorithmic amplification on Twitter compared to left-leaning news outlets. However, as highlighted in the paper, these third-party ratings make their own, independent classifications and as such the results of analysis may vary depending on which source is used.
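The group-vs-individual distinction above can be sketched numerically. Below is a toy metric of my own (not Twitter's exact definition): amplification as the ratio of impressions in the algorithmic timeline to impressions in the reverse-chronological control, showing how a group-level average can mask widely varying individual ratios.

```python
def amplification_ratio(algo_impressions, chrono_impressions):
    """Ratio > 1.0 means content is amplified relative to the chronological control."""
    return algo_impressions / chrono_impressions

# Hypothetical per-account ratios within one party: the group looks amplified
# on average, but individual accounts range from damped (0.8) to strongly
# amplified (1.9) - group effects need not translate to individual effects.
per_account = [0.8, 1.0, 1.9, 1.1]
group_mean = sum(per_account) / len(per_account)
print(group_mean)  # 1.2
```

The study's point is exactly this: the recommendation system does not use party affiliation as a feature, so two accounts in the same party can sit at opposite ends of that per-account spread.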
Google published a post on dual deployments with TFX, Kubeflow, and Vertex AI in this blog post. Using a common set of TFX components, this approach makes it easy to deploy both to Google Cloud (Vertex AI) and to systems that use Kubernetes (Kubeflow). If you develop locally and use GCP for production systems, it is a good introductory post. The code with examples is also available here.
Twitter rearchitected their data processing pipeline around Kafka and Google Cloud, and described what the new architecture looks like in this post.
The new system can handle 4 million events per second, with a processing latency of around 10 seconds.
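As a minimal sketch of the kind of stream processing the post describes (plain Python, not Twitter's actual Kafka/Dataflow code), here is a tumbling-window aggregator that buckets events into fixed 10-second windows, the same shape of computation a real pipeline runs continuously at millions of events per second.

```python
from collections import Counter

def window_counts(events, window=10):
    """Count events per tumbling window of `window` seconds.

    `events` is an iterable of (timestamp_seconds, name) pairs; the key of
    each bucket is the window's start time.
    """
    counts = Counter()
    for ts, _name in events:
        counts[ts // window * window] += 1
    return dict(counts)

# Hypothetical interaction events with second-granularity timestamps.
events = [(0, "like"), (3, "retweet"), (9, "like"), (12, "reply"), (19, "like")]
print(window_counts(events))  # {0: 3, 10: 2}
```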
DeepMind published a post on the opportunities and challenges to detoxify large language models.
They found the following approaches work well for reducing the probability of harmful text generation by large language models:
Filtering out LM training data annotated as toxic by the Perspective API,
Filtering generated text for toxicity using a separate, fine-tuned BERT classifier trained to detect toxicity,
Steering the generation towards less toxic output, which is highly effective at reducing LM toxicity as measured by automatic toxicity metrics.
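The second approach above, rejection of generated samples by a toxicity classifier, can be sketched in a few lines. Note that `toxicity_score` here is a hypothetical word-list stand-in for the real thing (a fine-tuned BERT toxicity head), used only to make the filtering loop concrete.

```python
def toxicity_score(text):
    # Hypothetical scorer: a real system would call a trained classifier here.
    bad_words = {"hate", "stupid"}
    words = text.lower().split()
    return sum(w in bad_words for w in words) / max(len(words), 1)

def filter_generations(candidates, threshold=0.2):
    """Keep only candidate generations the (stand-in) classifier scores as non-toxic."""
    return [c for c in candidates if toxicity_score(c) < threshold]

samples = ["you are stupid", "have a nice day", "what a lovely model"]
print(filter_generations(samples))  # ['have a nice day', 'what a lovely model']
```

The DeepMind post's caveat applies here too: automatic metrics like this can over-filter benign text and under-catch subtle toxicity, which is part of why detoxification remains an open problem.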
I wrote about the unlearning process in this newsletter.
Libraries
Last week, I received some suggestions for AutoML libraries; this week, I am sharing some of the more widely used ones:
AutoGluon is an AutoML library from AWS that is easy to use and easy to extend, with a focus on automated stack ensembling, deep learning, and real-world applications spanning text, image, and tabular data. Intended for both ML beginners and experts, AutoGluon enables you to:
Quickly prototype deep learning and classical ML solutions for your raw data with a few lines of code.
Automatically utilize state-of-the-art techniques (where appropriate) without expert knowledge.
Leverage automatic hyperparameter tuning, model selection/ensembling, architecture search, and data processing.
Easily improve/tune your bespoke models and data pipelines, or customize AutoGluon for your use-case.
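To show what "model selection" means mechanically, here is a toy sketch of the loop that AutoML tools like AutoGluon automate. This is not AutoGluon's actual API (see its docs for the real `TabularPredictor` interface); each "model" here is just a function, and we pick the one with the best validation accuracy.

```python
def always_positive(x):
    """Baseline model: predict class 1 for every input."""
    return 1

def threshold_model(x):
    """Simple learned-looking rule: predict 1 when the feature exceeds 0.5."""
    return 1 if x > 0.5 else 0

def select_best(models, val_data):
    """Return the candidate model with the highest validation accuracy."""
    def accuracy(model):
        return sum(model(x) == y for x, y in val_data) / len(val_data)
    return max(models, key=accuracy)

# Hypothetical validation set of (feature, label) pairs.
val = [(0.9, 1), (0.2, 0), (0.7, 1), (0.1, 0)]
best = select_best([always_positive, threshold_model], val)
print(best.__name__)  # threshold_model
```

Real AutoML systems run this same evaluate-and-pick loop over trained models and hyperparameter configurations, and additionally ensemble the top candidates rather than keeping only one.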
EvalML is an AutoML library from Alteryx that builds, optimizes, and evaluates machine learning pipelines using domain-specific objective functions. If you are already using FeatureTools and/or Compose, it would be a perfect fit as it tightly integrates with these two libraries.
FLAML is another AutoML library, from Microsoft. If you or your company uses Azure as a cloud provider, you might consider it, as it integrates well with Azure and has a number of functions and modules for Azure infrastructure.
Hivemind is a library that allows you to train models in a decentralized fashion.
In a nutshell, you want to train a neural network, but all you have is a bunch of enthusiasts with unreliable computers that communicate over the internet. Any peer may fail or leave at any time, but the training must go on. To meet this objective, hivemind models use a specialized layer type: the Decentralized Mixture of Experts (DMoE).
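The core requirement, that training survive peers failing or leaving at any time, can be simulated in a few lines of plain Python. This is a toy sketch of fault-tolerant averaging, not hivemind's actual API: each round, only the peers that are still online contribute, and the average is taken over whoever showed up.

```python
def robust_average(peer_params, online):
    """Average one scalar "parameter" over online peers; offline peers are skipped."""
    live = [p for p, up in zip(peer_params, online) if up]
    if not live:
        raise RuntimeError("no peers online this round")
    return sum(live) / len(live)

params = [1.0, 3.0, 5.0]          # one value per volunteer peer
print(robust_average(params, [True, True, True]))   # 3.0 - everyone online
print(robust_average(params, [True, False, True]))  # 3.0 - peer 1 dropped out
```

Hivemind's DMoE applies the same principle at scale: the set of experts actually consulted adapts to which peers are reachable, so training continues even as volunteers come and go.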
Books
Mathematics for Machine Learning is a freely available online book that covers the various mathematical concepts used in machine learning.
Videos
The 4th Workshop on Closing the Loop Between Vision and Language can be watched on YouTube. Note that it is about 7.5 hours long; you may need a full week to go through it.
The MLIR Open Meeting is a weekly meeting; this session goes over the Torch-MLIR project, which aims to produce a Multi-Level Intermediate Representation for PyTorch. The slides are here.
The session covers the motivation behind MLIR and gives an introduction to PyTorch on a simple IR (Intermediate Representation) example.
The XMRec keynote covers fairness in machine learning, and more specifically fairness in recommender systems.