It’s a Race to the Top. Claude 3.5 Sonnet Claims the Crown. For Now...

I believe that competition makes most things better, and my goodness, do we have competition in the LLM/GenAI space right now! And, most importantly, users are the big winners!

The AI arms race continues apace: Anthropic is launching its newest model, called Claude 3.5 Sonnet, which it says can equal or better OpenAI’s GPT-4o or Google’s Gemini across a wide variety of tasks. The new model is already available to Claude users on the web and on iOS, and Anthropic is making it available to developers as well. - https://www.theverge.com/2024/6/20/24181961/anthropic-claude-35-sonnet-model-ai-launch

Claude 3.5 Sonnet

Anthropic recently released a new version of their frontier large language model: Claude 3.5 Sonnet. This model has performed well in benchmarks, besting GPT-4o in many.

Benchmark table showing Claude 3.5 Sonnet outperforming (as indicated by green highlights) other AI models on graduate level reasoning, code, multilingual math, reasoning over text, and more evaluations. Models compared include Claude 3 Opus, GPT-4o, Gemini 1.5 Pro, and Llama-400b.
Via Claude.ai

A nice chart of benchmark scores over time.

Via Sam Mcallister of Anthropic

Artifacts

At the same time they introduced the new LLM, they also added a new interface component, something they call "artifacts". This allows you to have the things that Claude creates for you (the documents, the code, etc.) open up in the same interface, not just in the chat window, and makes it easier to visualize and iterate on those things. (As we know, it often takes quite a few prompts to get GenAI to create what we want it to, in part because we don't know exactly what we want until we get it. Iteration is key. But what does the thing we are actually iterating on look like?)

Artifacts is a simple-looking change, but I feel it's a powerful one. I should note that, for the most part, I've thought of this as a programming interface. I ask Claude to write some code, and it does. Some of that code can be "previewed" visually, as in executing JavaScript, in the artifacts window. This is a useful model.

However, from a programming perspective, the artifacts interface is just a different kind of IDE, and it is likely a stepping stone to something else, something that I don't think will happen in a browser but more likely in the Integrated Development Environment (IDE) itself. This raises the question of what programming is. Is it typing code in an IDE, or is it asking a GenAI to build something and iterating on it in some non-IDE interface? Hard to say at this point, as it depends on how well GenAI ends up programming; there will be limits to what it can do, and the question is where we will hit them. In some ways it is the old idea of a no-code graphical interface versus what we are used to in IDEs: text manipulation.

Below is the "artifact" link for what Claude created, the "</>" section, which when clicked opens up in the GUI on the right side of the page.

Claude Artifacts

Now that code is opened into a window on the right side of the GUI.

Claude Artifacts

Overall Innovation Race

That is what I love about competition. We have a massive, massive amount of competition in the LLM world. We have several "frontier" LLMs, and large, well-resourced companies building (semi-)open source models that create competitive pressure in other ways.

  1. OpenAI - openai.com - Known for developing ChatGPT and GPT-4.
  2. Anthropic - anthropic.com - Focuses on creating reliable and interpretable AI systems.
  3. Cohere - cohere.ai - Provides access to LLM and NLP tools through their API.
  4. Mistral AI - mistral.ai - Develops open-source LLMs like Mistral 7B and Mixtral 8x7B.
  5. Perplexity - perplexity.ai - Offers AI-powered search and chat with access to various LLMs.
  6. Meta - meta.com - Develops semi-open-source models like Llama.
  7. Google - Develops the Gemini models.

Writing Code with Claude 3.5 Sonnet

One thing that people, myself included, immediately did with Claude was to create little games with it. I made a snake game and a letter-based game. After a few iterations, I got the snake game to a place where it was kind of fun, even if the graphics are simplistic.

This was all I gave Claude to start:

"can you make a game of snake in react, tron style"

Initially it created a perfectly fine, working, if extremely simple, version of Snake.

I kept iterating.

  • A version where the computer opponent's trail is deleted when they "die."
  • A version where the trails stay behind.
  • Or here's a quick game, spent less than 5 minutes on it, called letterdrop, trying to combine something like Scrabble and Tetris. Fun!
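To make the difference between the first two variants concrete, here is a minimal sketch of the per-tick game logic in plain TypeScript. It is not the code Claude generated, and it leaves out the React rendering entirely; the names `Rider`, `stepRider`, and `clearTrailOnDeath` are hypothetical and chosen only for illustration. It assumes a square grid where hitting a wall or any existing trail cell is fatal.

```typescript
// Hypothetical types and names for illustration; not Claude's output.
type Point = { x: number; y: number };
type Direction = "up" | "down" | "left" | "right";

interface Rider {
  head: Point;
  direction: Direction;
  trail: Point[]; // every cell visited so far, including the head
  alive: boolean;
}

// Grid offsets for each direction (y grows downward, as on a canvas).
const DELTAS: Record<Direction, Point> = {
  up: { x: 0, y: -1 },
  down: { x: 0, y: 1 },
  left: { x: -1, y: 0 },
  right: { x: 1, y: 0 },
};

const samePoint = (a: Point, b: Point): boolean => a.x === b.x && a.y === b.y;

// Advance one rider by one cell on a size x size grid.
// `occupied` holds every trail cell currently on the board (all riders).
// `clearTrailOnDeath` switches between the two variants:
//   true  -> the trail is deleted when the rider dies
//   false -> the trail stays behind as an obstacle
function stepRider(
  rider: Rider,
  occupied: Point[],
  size: number,
  clearTrailOnDeath: boolean
): Rider {
  if (!rider.alive) return rider;

  const d = DELTAS[rider.direction];
  const next: Point = { x: rider.head.x + d.x, y: rider.head.y + d.y };

  const hitWall = next.x < 0 || next.y < 0 || next.x >= size || next.y >= size;
  const hitTrail = occupied.some((p) => samePoint(p, next));

  if (hitWall || hitTrail) {
    return {
      ...rider,
      alive: false,
      trail: clearTrailOnDeath ? [] : rider.trail,
    };
  }

  // Unlike classic Snake, the tail never shrinks: the trail only grows.
  return { ...rider, head: next, trail: [...rider.trail, next] };
}
```

In a React version, a function like this would presumably be called from a timer on every tick, with the returned rider stored in component state. Keeping it a pure function makes the "trail stays" versus "trail is deleted" choice a single flag rather than two separate games.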

Gotchas with Claude 3.5 Sonnet and Artifacts

  • I should note that at one point Claude's code used the three.js library, which is not available in the Claude interface, so it could not run this code.
👍
It makes sense that the GUI has limited libraries available for security reasons, but it still makes things less fun: there are powerful libraries out there, just not in Claude's GUI.

I have not tried this but here is a possible solution.

  • Even with a Pro subscription, you will run out of tokens, get the "You are out of messages" error, and have to wait. I rarely run into this with ChatGPT. But of course behind the scenes these LLMs are spending a lot of $$$ in terms of GPU access and electrical power. Free AI isn't free.
💡
I have purchased a subscription to Anthropic's Claude, but even then it says you only get ~5x what a free subscription has. It's not unlimited.
  • Like other models, Claude 3.5 Sonnet will forget what code you were working on.
  • It also tends not to show all the code, though it is better at showing all the code than ChatGPT is.
  • Long chats are a problem..."This conversation is getting a bit long."

Other Opinions

Here's a fun video from Kitze where he generates working Tailwind code from an image.

💡
There is some strong-ish language in the video from Kitze, but I like his reaction because it shows that some people are very excited for this kind of capability.

Here Rafal creates 3D physics simulation with a single request to Claude.


A user on Reddit:

This might be controversial, but my god. This thing is insane. I'm coding a browser in PyQt5, and, if there was an error, ChatGPT just couldn't fix it for some reason. Not only that, but if I wanted new features, I would have to hope that it actually ran. This is no longer a problem with Claude. If I ask it to add a new feature, it does so flawlessly, 90% of the time. If it does throw an error, Claude is able to fix it in 1 or 2 prompts max. If someone from Anthropic is reading this, you have absolutely outdone yourselves. This model is incredible, that is the best word I could come up with for this, I literally can't think of a better word. - https://www.reddit.com/r/ClaudeAI/comments/1dl43bs/claude_35_sonnet_absolutely_shits_on_gpt4o_in/?rdt=44307

A Long Way to Go Still, But Lots of Fun Right Now

I want to be very clear that I don't think GenAI can replace developers. I have more to come in this area on TIDAL SERIES, but I think it is easy to see that much of the work developers do is not related to actual programming; in many ways, programming is perhaps the smallest part of what developers do on a daily basis. Programming is hard, no doubt. But it is not all that developers do. Also, while GenAI can write code, it needs to know what to write; it doesn't have goals. It absolutely cannot do this on its own. It's a form of lossy compression/decompression or token prediction; it can't think for itself. Someone has to guide it. Someone has to understand what it is producing.

With that statement out of the way, I feel like this version of Claude really moves things forward in the so-called "arms race", or what might be better termed an "innovation race", in that 1) it's really good, 2) it's faster, and 3) we can see some of the results of the code we're asking the GenAI to produce right in the same interface. This makes it a lot of fun to use and build with. I have worked a lot with ChatGPT, writing code to see what it was capable of, and watching over time as its programming capabilities have seemed to degrade, potentially because it has been "nerfed". With Claude 3.5 Sonnet, I see a significant step forward, especially in combination with the Artifacts interface, and I will in fact be using Claude 3.5 Sonnet as my "daily driver" LLM, so to speak.

I love the competition we see in LLMs!

👊
Thanks for reading! Please forward on to your friends and colleagues.

Further Reading

Anthropic has a fast new AI model — and a clever new way to interact with chatbots
GPT-4o. Gemini 1.5. And now Claude 3.5 Sonnet.
Introducing Claude 3.5 Sonnet
Introducing Claude 3.5 Sonnet—our most intelligent model yet. Sonnet now outperforms competitor models and Claude 3 Opus on key evaluations, at twice the speed.
Why Anthropic’s Artifacts may be this year’s most important AI feature: Unveiling the interface battle
Anthropic’s Artifacts, a new AI workspace feature, may revolutionize human-AI interaction and reshape enterprise software, signaling a shift in focus from raw capabilities to user experience in the competitive AI landscape.

Subscribe to Tidal Series by Curtis Collicutt
