Claude Sonnet vs Opus for Vibe Coding (4.6): Real Test Results

Model releases are coming so fast that it’s hard to tell real progress from incremental gains.

With the release of Sonnet 4.6 this week, I decided to do some head-to-head testing, comparing it to Opus 4.6.

The benchmarks that AI labs release are becoming less and less reliable, so I ran a few simple test prompts in Converge instead.

The test(s)

My first prompt was to build a Tower Defense game. This is largely a frontend type of task that forces the model to juggle state, UI, rendering, and game logic all at once.

Here’s the prompt I used:

Build a complete tower defense game with a fixed path where enemies spawn in waves, you earn money per kill, and you lose lives when enemies reach the end. Include at least 3 tower types (different range/damage/attack speed) with upgrades, plus a simple UI to place/sell/upgrade towers and start the next wave; keep the code clean and modular and ship a playable, balanced MVP. Include basic polish: pause/restart + on-screen stats (wave, lives, money).
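For a sense of what that prompt asks a model to juggle, here is a minimal Python sketch of the core loop it describes: enemies walking a fixed path, towers with different range/damage/attack speed, money per kill, and lives lost on leaks. This is not the code any of the models produced. The tower stats, the straight-line path, and every name in it (`TowerType`, `Game`, `step`, and so on) are made up for illustration, and it leaves out rendering, waves, upgrades, and the UI entirely.

```python
import math
from dataclasses import dataclass, field


@dataclass
class TowerType:
    name: str
    range: float     # reach, in path units
    damage: float    # hit points removed per shot
    cooldown: float  # seconds between shots (lower means faster attacks)
    cost: int


# Hypothetical stats for the three tower types the prompt asks for.
TOWER_TYPES = {
    "basic": TowerType("basic", range=2.0, damage=5.0, cooldown=0.5, cost=50),
    "sniper": TowerType("sniper", range=5.0, damage=20.0, cooldown=2.0, cost=120),
    "rapid": TowerType("rapid", range=1.5, damage=2.0, cooldown=0.1, cost=80),
}

PATH_LENGTH = 10.0  # hypothetical fixed path: a straight line along y = 0


@dataclass
class Enemy:
    hp: float
    speed: float           # path units per second
    bounty: int            # money earned when this enemy dies
    progress: float = 0.0  # distance travelled along the path


@dataclass
class Tower:
    kind: TowerType
    x: float
    y: float
    timer: float = 0.0     # seconds until the tower can fire again


@dataclass
class Game:
    money: int = 100
    lives: int = 10
    towers: list = field(default_factory=list)
    enemies: list = field(default_factory=list)


def step(game: Game, dt: float) -> None:
    """Advance the simulation by dt seconds."""
    # 1. Move enemies along the fixed path.
    for enemy in game.enemies:
        enemy.progress += enemy.speed * dt

    # 2. Enemies that reach the end cost one life each.
    leaked = [e for e in game.enemies if e.progress >= PATH_LENGTH]
    game.lives -= len(leaked)
    game.enemies = [e for e in game.enemies if e.progress < PATH_LENGTH]

    # 3. Each ready tower shoots the in-range enemy furthest along the path.
    for tower in game.towers:
        tower.timer = max(0.0, tower.timer - dt)
        if tower.timer > 0:
            continue
        in_range = [
            e for e in game.enemies
            if e.hp > 0
            and math.dist((tower.x, tower.y), (e.progress, 0.0)) <= tower.kind.range
        ]
        if in_range:
            target = max(in_range, key=lambda e: e.progress)
            target.hp -= tower.kind.damage
            tower.timer = tower.kind.cooldown

    # 4. Dead enemies pay out their bounty and are removed.
    game.money += sum(e.bounty for e in game.enemies if e.hp <= 0)
    game.enemies = [e for e in game.enemies if e.hp > 0]
```

Even in this stripped-down form, a single `step` call has to keep movement, targeting, cooldowns, money, and lives in sync, which is roughly why the task makes a decent stress test.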

The rubric

I broke the prompt into a checklist so it wasn’t just “vibes”:

- Runs immediately (no missing pieces, all in 1-shot)
- Fixed path + enemies spawn in waves
- Money per kill + lives decrease on leaks
- 3 tower types (range/damage/speed)
- Upgrades work
- UI: place/sell/upgrade towers
- Start next wave control
- Basic polish: pause/restart + on-screen stats (wave/lives/money)
- Feels like a shippable MVP (not a broken demo)

Tower Defense test

Opus 4.6

Overall, Opus 4.6 did a really solid job.

What stood out:
- Solid baseline UI (not gorgeous, but functional)
- Core loop worked: waves, towers, kills → money
- Upgrades + basic UX touches were there (and even some hotkeys)

Scorecard

Runs immediately (no missing pieces, all in 1-shot) — ✅
Fixed path + enemies spawn in waves — ✅
Money per kill + lives decrease on leaks — ✅
3 tower types (range/damage/speed) — ✅
Upgrades work — ✅
UI: place/sell/upgrade towers — ✅
Start next wave control — ✅
Basic polish: pause/restart + on-screen stats (wave/lives/money) — ✅
Feels like a shippable MVP — ✅

Score: 9/9

Sonnet 4.5

I decided to test Sonnet 4.5 as well, both as an additional baseline and to better show the progress within the Sonnet family from 4.5 to 4.6. I was shocked at how much worse the generation was. Sonnet 4.5 was… clearly behind.

What I saw:
- The UI was way more basic
- Animation and overall “polish” lagged
- Towers and enemies didn’t always render in the UI, even though the enemy count was visibly decreasing

Scorecard

Runs immediately (no missing pieces, all in 1-shot) — ❌
Fixed path + enemies spawn in waves — ✅
Money per kill + lives decrease on leaks — ✅
3 tower types (range/damage/speed) — ✅
Upgrades work — ✅
UI: place/sell/upgrade towers — ❌
Start next wave control — ✅
Basic polish: pause/restart + on-screen stats (wave/lives/money) — ❌
Feels like a shippable MVP — ✅

Score: 6/9

Sonnet 4.6

Sonnet 4.6 is where things got interesting! It was my favorite generation, though not by a wide margin.

- UI felt better and closer to a typical game
- Gameplay and movement felt smoother and more coherent
- Overall it hit the checklist cleanly

On top of performing better in my test, Sonnet also comes in almost 50% cheaper than Opus.

Scorecard

Runs immediately (no missing pieces, all in 1-shot) — ✅
Fixed path + enemies spawn in waves — ✅
Money per kill + lives decrease on leaks — ✅
3 tower types (range/damage/speed) — ✅
Upgrades work — ✅
UI: place/sell/upgrade towers — ✅
Start next wave control — ✅
Basic polish: pause/restart + on-screen stats (wave/lives/money) — ✅
Feels like a shippable MVP — ✅

Score: 9/9
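The three scorecards above can be tallied mechanically, which also makes it easy to see which rubric items a model missed. A small sketch, with the pass/fail values copied from the scorecards in this post (the `RUBRIC`, `RESULTS`, `score`, and `failed_items` names are just illustrative):

```python
RUBRIC = [
    "Runs immediately (no missing pieces, all in 1-shot)",
    "Fixed path + enemies spawn in waves",
    "Money per kill + lives decrease on leaks",
    "3 tower types (range/damage/speed)",
    "Upgrades work",
    "UI: place/sell/upgrade towers",
    "Start next wave control",
    "Basic polish: pause/restart + on-screen stats (wave/lives/money)",
    "Feels like a shippable MVP",
]

# One pass/fail flag per rubric item, in the same order as RUBRIC.
RESULTS = {
    "Opus 4.6": [True] * 9,
    "Sonnet 4.5": [False, True, True, True, True, False, True, False, True],
    "Sonnet 4.6": [True] * 9,
}


def score(flags):
    """Return (items passed, items total) for one scorecard."""
    return sum(flags), len(flags)


def failed_items(flags):
    """Return the rubric items a model did not pass."""
    return [item for item, ok in zip(RUBRIC, flags) if not ok]


for model, flags in RESULTS.items():
    passed, total = score(flags)
    print(f"{model}: {passed}/{total}")
    for item in failed_items(flags):
        print(f"  missed: {item}")
```

A tie at 9/9 between Opus 4.6 and Sonnet 4.6 is also a reminder that a pass/fail rubric can't capture the "feel" differences described above.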

An additional test: Building a ChatGPT clone

Since Sonnet 4.6 handled the tower defense game pretty easily, I wanted to push it further by having it recreate ChatGPT. Again, this was done in Converge.

Here’s the prompt:

Create a full-featured AI chat application replicating ChatGPT with advanced functionalities, including:
Core Features:
Natural language conversation with context awareness and multi-turn dialogue
Support for text input and output with rich formatting (bold, italics, code blocks)
Real-time typing indicators and message delivery status
User authentication and profile management
Conversation history with search and export options
Customizable user settings (theme, font size, notification preferences)
Advanced Functionalities:
Ability to handle multimedia inputs (images, audio) and generate descriptive replies
Contextual memory allowing users to reference past conversations
Adaptive learning to personalize responses based on user interactions
User Interface Design:
Clean, modern, and minimalistic layout with a soothing color palette (e.g., deep navy #1A1F36, soft teal #4FB6AC, light gray #F5F7FA, and white)
Readable sans-serif typography with clear hierarchy and ample white space
Responsive design optimized for desktop, tablet, and mobile devices
Smooth animations for message transitions and interactive elements
Accessible design with keyboard navigation, screen reader support, and sufficient contrast
Interaction and Feedback:
Clear visual feedback for user actions (sending, receiving, errors)
Typing indicators and read receipts for enhanced communication flow
Quick reply suggestions and auto-complete for faster interactions
Ensure the application provides an intuitive, reliable, and engaging conversational AI experience that scales across devices and adapts to diverse user needs.

A lot worked in the first generation, with a few pieces of functionality missing.

The most impressive part was that it nailed cross-thread search. Below, you’ll see me mention that I’m a Lakers fan.

Then I started a new chat and asked it what team I like, and it remembered! Under the hood, this is all powered by the agent component in Converge.

Screenshot: chat window for the ChatGPT clone built with Sonnet 4.6

Turning the prompt into a checklist:

Multi-turn memory (per-thread + cross-thread context) — ✅
Authentication + user accounts (OAuth / SSO ready) — ✅
Persistent conversation history (search + export) — ✅
Streaming responses with delivery states — ✅
Cross-thread search — ✅
Rich text + code rendering — ❌ (kinda)
Multimodal input (image + audio upload) — ✅
Image understanding — ✅
Personalization layer (adaptive memory)
File upload handling — ✅
Responsive, accessible UI (desktop → mobile) — ✅
User settings (theme, notifications, preferences) — ❌

Verdict

Sonnet 4.6 is an awesome model. I’ve been testing it all week, and it is as good as, if not better than, Opus 4.6, while also being cheaper.

The rate of change in the AI world is relentless!