The OpenAI keynote was one of the biggest pieces of news in the AI world, and unless you are living under a rock, you have probably heard about it or, hopefully, watched it. If you have not, I recommend watching it:
Axios mentioned that this event could be a mass-extinction event for a number of startups that either use an OpenAI model (ChatGPT) directly or use it indirectly to solve different business problems. Eh, I think this is an overstatement on many dimensions, as it would be a stretch to say that OpenAI will cover all of the jobs to be done in the Generative AI or Foundation Model landscape. Not only is that very unlikely, but we are also at a very early stage of this era. I do not think anyone can predict what this landscape will look like in two or three years.
OpenAI’s execution speed is amazing, FWIW. I am regularly amazed by how quickly they can update the model and put good control/safety layers on top of it.
The news was exciting to me for different reasons, but now we have a keynote that talks about fine-tuning the models in depth, and people are excited that everyone will have their own personalized ChatGPT (expensive, albeit!). Some, such as Stratechery, compared it to Apple’s keynotes. But I digress.
As part of the keynote, they announced the new GPT-4 Turbo, which adds a number of new features over GPT-4:
Increased context length: GPT-4 Turbo can now process up to 128k tokens of context, which is equivalent to 300 pages of a standard book.
More control: Developers have more control over the model's inputs and outputs.
Better knowledge: The model is up-to-date with knowledge of the world as of April 2023, and developers can easily add their own knowledge base.
New modalities: The API now supports DALL·E 3, GPT-4 Turbo with Vision, and TTS.
Safety features: OpenAI's safety systems will help developers protect their applications against misuse.
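To sanity-check the "128k tokens is equivalent to 300 pages" claim above, here is a back-of-envelope sketch. The conversion factors (about 0.75 English words per token, about 300 words per book page) are my own rough assumptions, not figures from the keynote:

```python
# Back-of-envelope check of the "128k tokens ~= 300 pages" claim.
# Assumptions (mine, not OpenAI's): ~0.75 English words per token,
# ~300 words per standard book page.
WORDS_PER_TOKEN = 0.75
WORDS_PER_PAGE = 300

def tokens_to_pages(tokens: int) -> float:
    """Rough conversion from a token count to standard book pages."""
    return tokens * WORDS_PER_TOKEN / WORDS_PER_PAGE

pages = tokens_to_pages(128_000)
print(round(pages))  # -> 320, in the ballpark of the quoted 300 pages
```

With these assumptions you get roughly 320 pages, so the "300 pages" framing checks out as an order-of-magnitude estimate.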
Beyond these features on top of their flagship product ChatGPT, they made a number of other announcements:
A new research lab focused on safe and beneficial AI. The lab will focus on developing new safety measures for AI, as well as on understanding the potential social and economic impacts of AI.
A new dataset of AI safety benchmarks. The dataset will be used to measure the progress of AI safety research.
A partnership with Microsoft to develop new AI tools and technologies. The partnership will focus on developing AI for a variety of applications, including healthcare, education, and environmental sustainability.
Moat of Foundation Model
Before getting into the moat of the foundation model, I will go into detail on what a moat (a term I really like!) is.
Moat
Moat is a term that Warren Buffett popularized in 1995:
"What we're trying to find is a business that, for one reason or another -- it can be because it's the low-cost producer in some area, it can be because it has a natural franchise because of surface capabilities, it could be because of its position in the consumers' mind, it can be because of a technological advantage, or any kind of reason at all, that it has this moat around it."
And the moat of course follows the castle analogy: how will the castle be protected against competitive pressures in the future?
"But we are trying to figure out what is keeping -- why is that castle still standing? And what's going to keep it standing or cause it not to be standing five, 10, 20 years from now. What are the key factors? And how permanent are they? How much do they depend on the genius of the lord in the castle?"
Roughly, we do not really want to depend on the management of companies, because people are mortal and they die. But if a business has a strong moat and keeps investing in it, it is possible that it will be a good investment that can defend itself through the ups and downs of the market. Charlie Munger reiterates how important this is:
"And then if we feel good about the moat, then we try to figure out whether, you know, the lord is going to try to take it all for himself, whether he's likely to do something stupid with the proceeds, et cetera."
As I mentioned above, I do not believe OpenAI will be the only company in this space, as it is still very early and I do not think anyone has figured out where the main “moat” is in the FM (foundation model) or the FM ecosystem. I think eventually there will be some “head” models that are very expensive, reserved for very specialized use cases where accuracy and precision really matter to the business problem.
Modeling Dimension
Some examples could be the Gemini model from Google and other models behind business-critical tasks, like search for Google. In other cases, anyone can take an open-source model, fine-tune it for their use case, and get good accuracy, and that might be where most of the business value is. Examples could be chatbots at e-commerce companies, or autocomplete for companies that build editor tools or email services. Because of that, I am still not sold on the premise that the model itself is the moat. On the contrary, I think the moat is anything but the model. Most of the modeling-architecture innovations are amazing and important, but eventually they are going to be commoditized, like all the rest of software.
HW Dimension
As the USA invests in, and puts a lot of restrictions on, the exportability of certain hardware, especially NVIDIA GPUs, I now feel that more and more value is actually captured in HW. The ability to iterate on models for fine-tuning, or to grow a model's capability, depends on GPUs and GPU capability. As NVIDIA rolls out better and better GPUs with larger memory and more processing power, this area might be where most of the business value lies, especially since model scaling can unlock the next step-function change in this domain. This could also be one of the reasons why AI startups can raise a lot of funding only to spend it on capex, mainly NVIDIA GPUs.
This is also one of the main reasons why “AI” is very different from the rest of software: it prefers a certain type of HW in order to run effectively and efficiently, whereas the rest of software can run on a commoditized HW stack, which makes the HW completely fungible and orders of magnitude less important. Sure, there are certain databases that work much more effectively on particular HW types, but that differentiation is nowhere near as significant as the HW differentiation here.
Startups do raise a lot of money, such as inflection.ai, which raised $1.3 billion, but most of that money will be spent building a cluster of 22,000 H100 GPUs. With VCs funding these startups and effectively subsidizing their capex (long live NVIDIA!), I cannot wrap my head around how these startups will return the 10x+ multiples that VCs look for.
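To make the capex point concrete, here is a hypothetical back-of-envelope calculation. The ~$30k unit price for an H100 is my own assumed 2023 street-price figure, not a disclosed number from Inflection or NVIDIA:

```python
# Hypothetical back-of-envelope: how much of a $1.3B raise could a
# 22,000-GPU H100 cluster consume? The ~$30k unit price is an assumed
# 2023 street price, not a disclosed figure.
RAISE_USD = 1_300_000_000
GPU_COUNT = 22_000
ASSUMED_H100_PRICE_USD = 30_000

cluster_cost = GPU_COUNT * ASSUMED_H100_PRICE_USD
print(cluster_cost)                        # 660000000
print(round(cluster_cost / RAISE_USD, 2))  # 0.51
```

Under these assumptions, GPUs alone eat roughly half of the raise before networking, power, or payroll, which is the crux of the returns problem.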
Unless, that is, the step-function change comes through bigger compute, i.e., through GPUs: you build very big models, create a breakthrough only through a large cluster of GPUs, and then, all of a sudden, the product you are building becomes significantly better and unlocks the next level of growth.
But this still begs the question: if the next step-function change will be unlocked through HW, is the main value still captured in HW? If the hypothesis above holds, HW could continue to be the main differentiator.
NVIDIA is reportedly building a public cloud ecosystem around its GPUs, where they want to offer GPUs more like a service in the cloud. By doing so, they can capture better value over the long term, as they can “lease” GPUs rather than selling them. This would reduce the supply to other vendors like AWS, GCP, and Azure (which, by the way, all have separate efforts to replace GPUs).
RISC-V and companies such as AMD are trying to at least dislodge NVIDIA from its monopoly status, but it is too early to predict what will happen in this dimension.
Data Dimension
However, one dimension I am very clear on is the data dimension being a moat for foundation models. This, unfortunately, is not very different from traditional software, where the “reads” of the database become much more important than the “writes”. Reads in this case, i.e., how users are using the app, can become more important than what the app is actually useful for. Here, I want to introduce the concept of the ML Data Growth Cycle.
ML Data Growth Cycle
This is not a Foundation Model advantage per se; any ML system can benefit from this closed-loop cycle. Tesla is an excellent example: by selling more cars, they collect more data; more data lets them build better self-driving capabilities; and better self-driving capabilities enable them to sell more cars, and so on.
For example, Quora- and Reddit-like online forums generally have a large number of questions and answers, and people generally come to the website to find answers. However, users can also vote on and like different answers, which can be used to train an ML model on what a good answer looks like. This makes the data collected from users much more useful, as users themselves make the app more useful. Another good example is Stack Overflow, where many software engineers ask questions and there are a good number of answers to those questions.
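The votes-as-labels idea above can be sketched in a few lines. This is a minimal, hypothetical example (the records and the score threshold are made up) of how a forum could turn vote signals into training labels for a "good answer" model:

```python
# Minimal sketch with made-up data: turning forum votes into binary
# training labels for a "good answer" model.
answers = [
    {"text": "Use a context manager to close the file.", "upvotes": 42, "downvotes": 1},
    {"text": "just google it", "upvotes": 0, "downvotes": 9},
    {"text": "O(n log n) via sorting; see the example.", "upvotes": 17, "downvotes": 2},
]

def vote_label(answer: dict, min_score: int = 5) -> int:
    """Label an answer good (1) or bad (0) from its net vote score."""
    return 1 if answer["upvotes"] - answer["downvotes"] >= min_score else 0

labeled = [(a["text"], vote_label(a)) for a in answers]
for text, label in labeled:
    print(label, text)
```

The point is that the labels come for free from normal product usage, which is exactly what makes this data loop a moat rather than a labeling cost.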
Getty Images is one of those examples: they build a model on top of their licensed images. I think this will continue to be a main moat, as companies can build different models on proprietary data that is not easy to commoditize.
For FMs, the first step is generally to train the model on public data that is available online, but for more granular use cases and cases that require fine-tuning, one needs to tailor the dataset. This is because a model can only be as useful as the dataset it was trained on, and, more importantly, its labels and how noise-free those labels are.
Because of that, I am very bullish on companies that have specific data, which they either collect through their customers or whose business model allows them to create a moat around the data they use. By doing so, they can build models with better labels and datasets, and those models can increase product usage, which results in more usage and more training data. Rinse and repeat.
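The rinse-and-repeat loop can be illustrated with a toy simulation. All coefficients here are illustrative assumptions of mine (data per user, diminishing returns on data, growth cap), chosen only to show the flywheel's shape:

```python
# Toy simulation of the ML Data Growth Cycle: usage produces data,
# data improves model quality (with diminishing returns), and quality
# drives more usage. All coefficients are illustrative assumptions.
import math

users = 1_000.0
data = 0.0
for step in range(5):
    data += users * 0.1                # each user contributes ~0.1 labeled examples
    quality = math.log1p(data) / 10.0  # diminishing returns on more data
    users *= 1.0 + min(quality, 0.5)   # better quality -> more users (growth capped)
    print(f"step={step} users={users:,.0f} data={data:,.0f}")
```

Even with the logarithmic damping, usage and data compound together for a while, which is why an early data advantage can snowball.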