This week we have a few articles and libraries. I want to expand on Kaggle’s survey a bit more by including some of the screenshots from its slides. I recommend going through the slides, as they help build a point of view on what data science looks like (technical aspects, location and salaries).
Google published a blog post on AI Pathways. Similar to the Foundation Models research direction, Google proposes training a single large model for many tasks and activating only certain “pathways” in the model for each task.
TLDR for why this is a better approach:
Today's models mostly focus on one sense. Pathways will enable multiple senses.
Today's models are dense and inefficient. Pathways will make them sparse and efficient.
Today's AI models are typically trained to do only one thing. Pathways will enable us to train a single model to do thousands or millions of things.
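The post describes this idea at a high level rather than as an implementation. As a rough illustration of what “activating certain pathways per task” can mean, here is a minimal numpy sketch of task-conditioned sparse routing in the mixture-of-experts style. This is an assumption for illustration, not Google’s Pathways design: the expert count, sizes and the fixed per-task router (a stand-in for learned routing) are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 8   # hypothetical number of sub-networks ("pathways") in the shared model
D_MODEL = 16      # hypothetical hidden size
TOP_K = 2         # number of pathways activated per task

# Shared pool of expert weight matrices, reused by every task.
experts = rng.normal(size=(NUM_EXPERTS, D_MODEL, D_MODEL)) / np.sqrt(D_MODEL)

# Hypothetical per-task routing logits (a stand-in for a learned router).
task_router = {
    "translation": rng.normal(size=NUM_EXPERTS),
    "captioning": rng.normal(size=NUM_EXPERTS),
}

def forward(x, task):
    """Run x through only the TOP_K experts selected for this task."""
    logits = task_router[task]
    active = np.argsort(logits)[-TOP_K:]          # indices of the chosen pathways
    weights = np.exp(logits[active] - logits[active].max())
    weights /= weights.sum()                      # softmax over the active experts only
    out = np.zeros_like(x)
    for w, e in zip(weights, active):
        out += w * np.tanh(x @ experts[e])        # inactive experts are never computed
    return out, active

x = rng.normal(size=(4, D_MODEL))
y, active = forward(x, "translation")
print(y.shape, sorted(active.tolist()))
```

Only `TOP_K` of the `NUM_EXPERTS` weight matrices are touched per call, which is the intuition behind “sparse and efficient”: total parameters can grow while the compute spent on each task stays roughly constant.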
Kaggle published its survey on data science and machine learning. I especially recommend checking out the libraries and frameworks sections.
In the ongoing learning section, Coursera is the most common choice of learning resource.
Jupyter notebooks are by far the most popular IDE, followed by VSCode. This aligns well with my experience.
MATLAB and RStudio are losing market share, which I assume is due to the rise of Python.
One slide surprised me, not because of the distribution, but because Kaggle listed the various deep learning architectures separately.
Scikit-learn is the most commonly used library for machine learning methods outside deep learning.
TensorBoard leads among the tools data scientists use to track and manage their day-to-day workflow.
Google’s AutoML solution is the most widely used among data scientists. I was surprised to find that Azure’s solution is more commonly used than Amazon SageMaker.
Libraries
AIMET (AI Model Efficiency Toolkit) works with TensorFlow and PyTorch and provides various quantization and compression techniques for models. It was open-sourced by Qualcomm, so it works well with Qualcomm chips. It has a model zoo with a number of widely used models and their quantized equivalents. There are a number of tutorials available here.
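AIMET’s own API is not shown here. To give a feel for what a quantization technique does to a model’s weights, the sketch below implements generic per-tensor asymmetric (affine) 8-bit quantization in numpy: floats are mapped to uint8 codes through a scale and a zero point, then mapped back. The function names are hypothetical, and real toolkits add refinements (per-channel scales, calibration on data, quantization-aware training) that this sketch omits.

```python
import numpy as np

def quantize_uint8(w):
    """Asymmetric per-tensor quantization of a float array to uint8 codes."""
    # Widen the range to include 0.0 so that zero is represented exactly.
    w_min = min(float(w.min()), 0.0)
    w_max = max(float(w.max()), 0.0)
    scale = (w_max - w_min) / 255.0 if w_max > w_min else 1.0
    zero_point = int(round(-w_min / scale))
    q = np.clip(np.round(w / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map uint8 codes back to approximate float values."""
    return (q.astype(np.float32) - zero_point) * scale

rng = np.random.default_rng(0)
weights = rng.normal(scale=0.1, size=(64, 64)).astype(np.float32)
q, scale, zp = quantize_uint8(weights)
recovered = dequantize(q, scale, zp)
max_err = float(np.abs(weights - recovered).max())
print(f"scale={scale:.5f} zero_point={zp} max_abs_error={max_err:.5f}")
```

Storing uint8 codes instead of float32 weights cuts memory by a factor of four, at the cost of a rounding error of at most about half a quantization step (`scale / 2`) per weight.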
PythonOT (Python Optimal Transport) is a library that provides various advanced solvers for optimization problems related to optimal transport in machine learning and computer vision.
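The library’s name refers to optimal transport (OT), and one of the classic solvers in that area is the Sinkhorn algorithm for entropy-regularized OT. The numpy sketch below reimplements that algorithm from scratch to show what such a solver computes; it is not POT’s code, and the histograms, cost matrix and regularization value are made up for illustration.

```python
import numpy as np

def sinkhorn(a, b, M, reg, n_iter=1000, tol=1e-9):
    """Entropy-regularized optimal transport via Sinkhorn-Knopp scaling.

    a: source histogram (n,), b: target histogram (m,), each summing to 1.
    M: cost matrix (n, m); reg: entropic regularization strength (> 0).
    Returns a transport plan P whose row sums approximate a and column sums equal b.
    """
    K = np.exp(-M / reg)              # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u_prev = u
        u = a / (K @ v)               # rescale rows toward marginal a
        v = b / (K.T @ u)             # rescale columns to match marginal b
        if np.max(np.abs(u - u_prev)) < tol:
            break
    return u[:, None] * K * v[None, :]

# Two histograms on 5 grid points, with |x_i - x_j| as the transport cost.
x = np.arange(5, dtype=float)
a = np.full(5, 0.2)
b = np.array([0.1, 0.1, 0.2, 0.3, 0.3])
M = np.abs(x[:, None] - x[None, :])

P = sinkhorn(a, b, M, reg=1.0)
print("plan row sums:", P.sum(axis=1).round(4))
print("transport cost:", float(np.sum(P * M)))
```

In POT itself, the corresponding calls are `ot.sinkhorn(a, b, M, reg)` for the regularized problem and `ot.emd(a, b, M)` for exact (unregularized) optimal transport; both return a transport plan. Smaller `reg` values approach the exact solution but make the iterations converge more slowly.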
Notebooks
Hierarchical Transformers implements an hourglass-shaped transformer that shows impressive results. The paper is also available here.
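The hourglass idea is to run full-resolution transformer layers at the start and end while the middle layers operate on a shortened sequence, so attention there costs O((L/k)^2) instead of O(L^2) for sequence length L and shortening factor k. The numpy sketch below shows only that shape flow, under loudly stated simplifications: attention has no learned projections, shortening is plain average pooling and upsampling is plain repetition (simple choices picked for the sketch), and the causal masking and shifting an autoregressive language model needs to avoid leaking future tokens are omitted. All function names are hypothetical.

```python
import numpy as np

def attention(x):
    """Single-head self-attention without learned projections (illustration only)."""
    scores = x @ x.T / np.sqrt(x.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

def shorten(x, k):
    """Downsample an (L, d) sequence by averaging each group of k consecutive tokens."""
    L, d = x.shape
    if L % k != 0:
        raise ValueError("sequence length must be divisible by the shortening factor")
    return x.reshape(L // k, k, d).mean(axis=1)

def upsample(x_short, k):
    """Return to full length by repeating each shortened token k times."""
    return np.repeat(x_short, k, axis=0)

def hourglass_block(x, k=4):
    h = x + attention(x)          # full-resolution layer before shortening
    s = shorten(h, k)             # L -> L / k tokens
    s = s + attention(s)          # cheaper attention on the short sequence
    h = h + upsample(s, k)        # back to L tokens, residual from before shortening
    return h + attention(h)       # full-resolution layer after upsampling

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 8))
out = hourglass_block(x, k=4)
print(out.shape)
```

The residual connection around the shortened middle section lets fine-grained token information bypass the pooling, which is why the output can stay at full resolution even though most of the attention work happens on L/k tokens.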