Discover which large language models our team relies on most, why GPT-4o, Claude 3.7 Sonnet & GPT-4 Turbo lead the pack, and what this means for developers.
Large-language-model (LLM) tooling has become the backbone of modern software development. From auto-completing code to generating entire modules, the right model can shave hours off a sprint and spark fresh ideas. At Ottia, our engineers interact with multiple models via the Continue plug-in, giving us a unique window into real-world usage patterns. Rather than guess which systems deliver the most value, we measured actual token consumption—a reliable proxy for how much each model is trusted day to day.
The findings are striking. Although more than twenty LLMs see at least occasional traffic, three models account for the lion’s share of tokens:
1. Anthropic Claude 3.7 Sonnet
2. OpenAI GPT-4o
3. OpenAI GPT-4 Turbo Preview
In the following sections we explore why these three dominate, highlight their standout features for software developers, and offer practical tips on picking the right model for each task.
To keep the comparison objective, we calculated what percentage of total tokens each LLM processed during the past quarter. A higher percentage signals that developers not only tested but continuously relied on a model in daily workflows. For clarity, we have converted absolute token counts into relative shares and sorted the models from least to most used.
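As a rough illustration of the method, the sketch below computes relative token shares from a usage log. The log entries, model names, and figures are hypothetical stand-ins, not Ottia's actual telemetry.

```python
from collections import Counter

# Hypothetical per-request usage log: (model name, tokens consumed).
usage_log = [
    ("claude-3-7-sonnet", 1_200_000),
    ("gpt-4o", 950_000),
    ("gpt-4-turbo-preview", 610_000),
    ("gpt-3.5-turbo-instruct", 85_000),
]

# Aggregate tokens per model across the quarter.
totals = Counter()
for model, tokens in usage_log:
    totals[model] += tokens

grand_total = sum(totals.values())

# Convert absolute counts into relative shares, sorted least to most used.
for model, tokens in sorted(totals.items(), key=lambda kv: kv[1]):
    print(f"{model:28s} {tokens / grand_total:6.1%}")
```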
While lightweight systems such as Claude 3 Sonnet (legacy version) and GPT-3.5-Turbo-Instruct register single-digit usage, heavyweight offerings tell a different story. Claude 3.7 Sonnet alone accounts for well over a third of all tokens, followed by GPT-4o and GPT-4 Turbo Preview. Together these three models cover the majority of total usage, confirming that our team gravitates toward premium, state-of-the-art capabilities when deadlines loom.
Claude 3.7 Sonnet is Anthropic’s latest mid-tier model, optimized for both reasoning depth and speed. Developers praise three capabilities in particular:
• Extended context window – Sonnet can comfortably juggle large codebases or multiple design documents without losing track, reducing the need for manual chunking.
• Strong code understanding – The model excels at explaining unfamiliar repositories, suggesting architectural patterns, and catching subtle logical errors.
• Built-in safety protocols – Anthropic’s Constitutional AI approach keeps outputs well-aligned, which matters when generating production-ready code snippets.
Because Ottia values responsible AI by design, Sonnet’s safety guardrails resonate with our engineering culture. The model’s popularity shows that speed does not have to come at the expense of compliance.
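As a minimal sketch of how that extended context window is used in practice, the snippet below sends an entire source file to Sonnet in one request via Anthropic's Python SDK. The file path and prompt are illustrative, and the model identifier reflects the version available at the time of writing.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Load a large source file; Sonnet's long context window means it can often
# be sent whole instead of being chunked manually.
with open("payments/ledger.py") as f:
    source = f.read()

response = client.messages.create(
    model="claude-3-7-sonnet-20250219",  # check current docs for the latest ID
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": f"Explain the architecture of this module and flag any "
                   f"subtle logical errors:\n\n{source}",
    }],
)
print(response.content[0].text)
```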
GPT-4o (“Omni”) burst onto the scene with native multimodal input—text, images, and soon audio—while retaining GPT-4-level reasoning. For developers, the gains are tangible:
• Visual debugging – By pasting screenshots of error dialogs, engineers receive immediate explanations and fixes, saving context-switching time.
• Data visualization insights – Feeding charts or architecture diagrams allows GPT-4o to propose optimizations that text-only models might miss.
• Reduced latency – Despite its broad skill set, GPT-4o responds faster than earlier GPT-4 versions, making it ideal for iterative coding cycles.
Usage metrics confirm that multimodality is not just a marketing term—it directly influences day-to-day productivity, especially during front-end or UI-heavy tasks.
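Here is a minimal sketch of the screenshot workflow using OpenAI's Python SDK. The image file and prompt are hypothetical; the multimodal message format follows the documented chat-completions API.

```python
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Encode a screenshot of the failing UI as a base64 data URL.
with open("error_dialog.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "This dialog appears when saving a draft. What is "
                     "likely misconfigured, and how do we fix it?"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```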
While GPT-4o dazzles with new modalities, GPT-4 Turbo Preview focuses on volume and affordability:
• Expanded context window – With up to 128K tokens per prompt, Turbo handles massive inputs, handy for refactoring entire projects or reviewing large pull requests.
• Streamlined pricing – Lower per-token costs encourage liberal use, so developers feel free to ask follow-up questions without budgeting anxiety.
• Plugin synergy – Inside Continue’s VS Code and JetBrains extensions, Turbo slots in seamlessly, shortening the gap between suggestion and implementation.
The model fills the sweet spot for backend microservice scaffolding and exhaustive unit-test generation, tasks that demand scale rather than vision.
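The sketch below shows what liberal, high-volume use might look like in practice: looping over a list of functions and asking GPT-4 Turbo Preview for a pytest suite for each. The function names and output paths are hypothetical.

```python
import os
from openai import OpenAI

client = OpenAI()

# Hypothetical list of functions to cover; in practice this could be
# collected by walking the repository's AST.
functions = ["parse_invoice", "apply_discount", "settle_balance"]

os.makedirs("tests", exist_ok=True)
for name in functions:
    response = client.chat.completions.create(
        model="gpt-4-turbo-preview",
        messages=[{
            "role": "user",
            "content": f"Write pytest unit tests for `{name}`, covering edge "
                       f"cases and failure modes. Return only the test code.",
        }],
    )
    # Write each batch of generated tests to its own file for human review.
    with open(f"tests/test_{name}.py", "w") as f:
        f.write(response.choices[0].message.content)
```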
1. Scoping a Greenfield Module? Start with Claude 3.7 Sonnet to brainstorm architecture while staying within safe, coherent boundaries.
2. Debugging a UI Glitch? GPT-4o’s image input can identify layout anomalies faster than describing them in text.
3. Generating Bulk Tests? Lean on GPT-4 Turbo Preview to pump out hundreds of cases without breaking the budget.
By mapping project phases to model strengths, Ottia engineers avoid one-size-fits-all pitfalls and extract the best from each LLM.
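One lightweight way to encode such a mapping is a simple lookup table, as in the sketch below. The task labels, model identifiers, and default choice are illustrative, not a prescription from Ottia's playbook.

```python
# Hypothetical mapping of project phases to the model that fits best,
# mirroring the guidelines above.
MODEL_BY_TASK = {
    "architecture": "claude-3-7-sonnet-20250219",  # greenfield scoping
    "ui_debugging": "gpt-4o",                      # screenshot-driven fixes
    "bulk_tests":   "gpt-4-turbo-preview",         # high-volume generation
}

def pick_model(task: str) -> str:
    """Return the preferred model for a task, defaulting to GPT-4o."""
    return MODEL_BY_TASK.get(task, "gpt-4o")

print(pick_model("bulk_tests"))  # -> gpt-4-turbo-preview
```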
Although the top three dominate, emerging contenders—such as Mistral Medium, Google Gemini 1.5 Pro Preview, and Codestral—retain niche appeal. Developers sometimes prefer these models for:
• Language diversity – Certain models excel in non-English locales, aligning with Ottia’s global footprint.
• Fine-tuning options – Some open-weights systems permit on-prem retraining, useful for proprietary codebases.
• Experimental features – Early adopters appreciate trying cutting-edge research, even if it hasn’t yet reached mass adoption.
Monitoring token share helps Ottia decide where to expand internal support and ensure the Continue plug-in remains a unified interface to the best large language models available.
The data underscores a broader trend: developers gravitate toward models that combine deep reasoning with practical ergonomics. Context size, cost efficiency, and multimodal input are not just nice-to-have features; they actively shape which LLM tops the usage charts.
As vendors roll out even larger context windows and more specialized coding modes, competition will intensify. Ottia’s neutral, model-agnostic environment—powered by Continue—places us in an ideal position to adopt new winners quickly, keeping our teams ahead in the race for productivity.
• Real usage data beats hype when assessing LLM performance.
• Claude 3.7 Sonnet, GPT-4o, and GPT-4 Turbo Preview dominate because they marry depth, speed, cost, and multimodal support.
• Mapping each development task to the appropriate model maximizes return on token spend.
• Staying model-agnostic via tools like Continue ensures engineers can pivot as better systems emerge.
Large language models have evolved from experimental toys to indispensable partners in modern software engineering. Ottia’s in-house analytics show that three frontrunners—Claude 3.7 Sonnet, GPT-4o, and GPT-4 Turbo Preview—currently set the standard for usability, accuracy, and developer satisfaction. By continually measuring token share and aligning model strengths with project needs, we empower our teams to deliver higher-quality software, faster. The landscape will undoubtedly shift again, but with a data-driven approach we are ready for whatever comes next.