GPT-5 Pro and o3 are tackling long-standing challenges in quantum gravity and theoretical physics, shifting the AI research paradigm toward a focus on "result verification." Simultaneously, Claude Code is signaling a transformation in development workflows via specialized project review agents, while SubCube has pushed technical boundaries by claiming speeds 52 times faster than FlashAttention. Alongside quality-optimization techniques for Google Pomelli and the expansion of practical projects at Krish Naik Academy, we examine how frontier AI labs are evolving into "deployment machines" tailored for Wall Street and government agencies—and how infrastructure demand is outrunning even the current scale of CapEx investment.

Krish Naik Academy Expands Practical Projects to 100

Krish Naik Academy is expanding its range of industry-level projects from 75 to 100 to enhance practical AI capabilities. Offered via annual subscription, these projects span a wide array of categories, including Generative AI, Agentic AI applications, computer vision, data engineering, DevOps, cloud, machine learning, NLP, and Python. To rapidly incorporate the latest industry trends, the academy launches live bootcamps every two months and focuses on deepening learner expertise through specialized tracks, such as the Agentic AI specialization course.

The core of this practical training lies in mastering the efficient use of modern tools. A prime example is the agent system in Claude Code. In this system, an agent operates as an independent Claude instance running in a separate context window to focus on a specific task. Each sub-agent maintains its own context window and memory, returning only a single summarized result to the main instance upon completion. This serves as a "context protection" mechanism, preventing the main context from being polluted by the massive output generated when searching through hundreds of files, thereby allowing complex tasks to be executed while keeping the main conversation flow clean.

The process of building these agents is also automated to maximize operational efficiency. Through the "Generate with Claude" feature, users can describe the agent's role and trigger conditions in natural language, and the system configures the optimal agent accordingly. When creating specialized agents, such as a code improvement advisor, the system can automatically generate detailed system prompts exceeding 10,000 characters that encode senior-level domain expertise. The resulting agent configurations, project recognition behaviors, and analysis methodologies are automatically saved and systematically managed as Markdown files within the `.claude` folder.
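As a rough illustration, a generated agent definition stored this way looks something like the file below. The location and frontmatter fields follow Claude Code's documented sub-agent format (`.claude/agents/<name>.md` with `name` and `description` fields), but treat the exact names as version-dependent; the prompt body here is a heavily abridged stand-in for the long generated prompt.

```markdown
---
name: code-improvement-advisor
description: Reviews the project and proposes prioritized improvements.
---
You are a senior software architect. When invoked, analyze the project's
structure and conventions, then report concrete, prioritized improvement
suggestions. (In practice the generated prompt runs to thousands of
characters of detailed methodology.)
```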

In the actual implementation phase, "Plan Mode" is utilized to ensure stability. In Plan Mode, plans proposed by the agent are not immediately reflected in the actual code; instead, users can configure a workflow where they select only the necessary parts of the proposal to write into a `plan.md` file for execution. This systematic approach goes beyond simple code generation by integrating the review and approval processes required in industry settings into the curriculum, helping learners experience professional-grade development processes.
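For instance, a distilled `plan.md` might capture only the approved slice of the agent's proposal. The format below is just one reasonable convention, with illustrative paths; it is not something Claude Code prescribes:

```markdown
# plan.md: approved work items only
1. Extract the duplicated validation logic in `api/handlers` into a shared helper.
2. Add error handling around the database retry loop.

Deferred (not approved): dependency upgrades, logging refactor.
```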

Google Pomelli: Optimizing Quality by Excluding Text

Google Pomelli is a powerful tool capable of generating a diverse range of brand-ready visual assets from a single product image. The 'photoshoot' feature, in particular, plays a critical role in rapidly expanding product concepts. When a user selects from provided templates—such as studio, ingredient, in use, or contextual backgrounds—the AI automatically generates various iterations. This allows brands to secure at least four different concept sets without the need for physical shoots, providing high efficiency in diversifying marketing materials.

However, moving beyond static images to animation requires a strategic approach to maintain quality. The most critical consideration when using Pomelli's animation feature is how text is handled. When animations are generated with text included, the AI often struggles to process the characters, frequently resulting in blurred text or unexpected visual artifacts. Such issues diminish the professionalism of the output and can negatively impact brand image.

To achieve high-quality results, it is recommended to use the 'animate without text' option. By intentionally excluding the text generation process, the AI can focus solely on the harmony between the product's movement and the background, fundamentally preventing text distortion. This approach improves the overall polish of the animation and provides the most effective path to ensuring both visual quality and messaging clarity by adding precise text during the subsequent editing stage.

Additionally, Pomelli provides fine-tuning capabilities to optimize the composition of the generated output. If the product's position appears unnatural or the overall layout feels awkward, the 'fix layout' feature can resolve these issues. This function allows the AI to re-analyze the image and readjust the product's orientation or position to create a more visually stable composition. Ultimately, Google Pomelli delivers maximum value as a brand asset creation tool when the concept expansion of the photoshoot feature, the compositional optimization of layout correction, and the quality maintenance of the text-exclusion option are combined.

SubCube Achieves 52x Speed Increase Over FlashAttention

SubCube, positioning itself as a next-generation LLM architecture, has presented performance metrics that could redefine existing standards for computational efficiency. It claims computation speeds 52 times faster than FlashAttention, which is currently widely used across the industry. Notably, the claimed ability to run the model at less than 5% of the operating cost of Claude Opus suggests new possibilities for reducing AI infrastructure expenses.

The core of this dramatic performance improvement lies in a fundamental refinement of the attention mechanism. Standard attention computes a score for every pair of tokens in a text, so its computational cost grows quadratically with sequence length. In reality, however, only a small fraction of these pairwise relationships carry meaningful information. SubCube reports drastically reduced computational costs by introducing a methodology that identifies and focuses on only the critical relationships rather than scanning every single one.
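SubCube's actual method is not detailed here, so the snippet below is only a generic sketch of the sparse-attention idea it describes: score the pairs, then attend over just the top-k keys per query. A real system would avoid materializing the full score matrix in the first place; names and shapes are illustrative.

```python
import numpy as np

def topk_sparse_attention(Q, K, V, k):
    """Toy sparse attention: each query attends only to its k highest-scoring
    keys instead of all of them. A sketch of the idea, not SubCube's algorithm."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)  # full (n_q, n_k) score matrix, toy only
    topk = np.argpartition(scores, -k, axis=-1)[:, -k:]  # top-k key indices
    out = np.zeros_like(Q)
    for i, idx in enumerate(topk):
        s = scores[i, idx]
        w = np.exp(s - s.max())
        w /= w.sum()               # softmax over only the k kept keys
        out[i] = w @ V[idx]
    return out

# Example: 8 tokens of dimension 4, each query keeps just 2 keys.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 4)) for _ in range(3))
print(topk_sparse_attention(Q, K, V, k=2).shape)  # (8, 4)
```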

This efficiency has the potential to shift the paradigm of model scaling. SubCube emphasizes that its new methodology allows for model scaling using 1,000 times fewer computing resources than previously required. Beyond mere speed increases, this means the context length an LLM can process can be expanded significantly; for example, it becomes possible to process an entire large Python codebase or library in a single input while maintaining high accuracy.

Consequently, SubCube's approach could serve as a critical turning point in lowering the barrier to entry for high-performance AI models and maximizing their utility. By drastically reducing the consumption of computational resources while simultaneously ensuring model performance and scalability, it is expected that new dimensions of LLM use cases will emerge, transcending existing hardware constraints.

GPT-5.2 Pro: Formulating and Proving Physical Formulas

GPT-5.2 Pro has demonstrated the ability to formulate hypotheses—a critical stage in scientific discovery—moving beyond simple information retrieval or computation. In deriving a specific physics formula, GPT-5.2 Pro first made a conjecture regarding the final formula, establishing the hypothesis that grounded the research. This illustrates that AI has evolved beyond merely combining or summarizing training data to possess the advanced cognitive ability to logically infer new physical relationships and propose them as hypotheses.

This study employed a systematic workflow that clearly decoupled the hypothesis-formulation and proof stages. The hypothesis derived by GPT-5.2 Pro was subsequently subjected to a rigorous proof process using a separate internal OpenAI model. By utilizing different models for generation and verification, the system implemented a scientific research methodology based on internal cross-verification.

Strict constraints were applied to ensure the objectivity and reliability of the proof process. The internal model tasked with the proof began its work in a completely fresh session, without access to the prior conversation context or to the specific limiting cases that could have served as hints. This strategic approach was designed to eliminate "spoon-feeding"—the provision of hints that guide the model toward the correct answer—to verify whether the AI could reach a logical conclusion independently and without external prompting.
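Under the stated setup, the protocol reduces to something like the sketch below: two separate models, single-turn calls, and a verifier that sees only the bare claim. The function and model names are placeholders, not OpenAI's actual tooling.

```python
# A sketch of the generate-then-verify protocol described above, assuming
# two distinct models and fresh single-turn calls.

def ask(model: str, prompt: str) -> str:
    """One fresh, single-turn call per invocation: no conversation state
    leaks from the generation stage into the verification stage."""
    raise NotImplementedError("wire this to a model API of your choice")

def generate_and_verify(problem: str) -> tuple[str, str]:
    # Stage 1: one model conjectures the final formula (the hypothesis).
    hypothesis = ask("generator-model", f"Conjecture the final formula for: {problem}")
    # Stage 2: a different model proves or refutes it from scratch. It sees
    # only the problem and the bare claim: no derivation, no limiting cases,
    # no hints ("no spoon-feeding").
    verdict = ask(
        "verifier-model",
        f"Problem: {problem}\nClaim: {hypothesis}\n"
        "Prove or refute this claim rigorously, from first principles.",
    )
    return hypothesis, verdict
```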

This independent execution provides strong evidence that the AI's results are not the product of mere pattern matching or data contamination. The fact that two distinct models reached the same physical conclusion in isolated environments suggests that AI-driven scientific reasoning has achieved a level of practical reliability. Ultimately, this case demonstrates that AI can successfully manage a sophisticated professional research workflow, spanning from the hypothesis of complex physical laws to their rigorous proof.

ChatGPT Pro Derives Quantum Gravity Research Results

Artificial intelligence has moved beyond simple text generation and entered a stage where it can produce tangible research outcomes in the domain of advanced theoretical physics. Recently, ChatGPT Pro demonstrated this potential by generating meaningful results in the field of quantum gravity. In this process, the human acted in a "steering" role—asking the right questions and setting the research direction—while ChatGPT Pro handled the entirety of the complex mathematical derivations. This case illustrates that AI has reached a level capable of performing professional scientific reasoning and computation.

Specifically, this achievement was realized through the ability to generalize an existing paper on gluons to a gravity-based case. After the user provided the original gluon paper along with two key modifications required for the transition to gravity and assigned the AI the persona of a "distinguished theoretical physicist," the AI entered a full reasoning phase. Following approximately 20 minutes of deep thinking, the AI performed mathematical calculations and conducted sanity checks, ultimately producing a draft very similar to a paper published on arXiv.
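The steering inputs described (persona, source paper, and the two required modifications) can be pictured as a single assembled prompt. Everything bracketed below is a placeholder; the exact wording used in the original experiment is not public.

```python
# Illustrative reconstruction of the steering prompt, not the actual one used.

persona = "You are a distinguished theoretical physicist."
modifications = [
    "1. <first change required to pass from gluon to graviton amplitudes>",
    "2. <second change required for the gravity case>",
]
paper = "<full text of the original gluon paper pasted here>"

prompt = (
    persona
    + "\n\nBelow is a paper on gluon amplitudes. Generalize its central "
    + "result to gravity, applying exactly these modifications:\n"
    + "\n".join(modifications)
    + "\n\n--- PAPER ---\n"
    + paper
)
print(prompt[:200])  # the assembled prompt is then sent as a single request
```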

These results suggest that AI is not merely recombining trained data, but possesses high-level capabilities to understand the logic of a specific physical system and extend it to another. While the human remained responsible for establishing the overall context and defining the physical significance of the results, the AI demonstrated high efficiency in the actual mathematical expansion and logical derivation. In particular, the attempt to understand how graviton amplitudes transform under physical symmetries shows that AI can be a core tool in identifying symmetries—the first step in discovering new theories.

Consequently, ChatGPT Pro has proven its ability to understand the structure of professional academic papers and derive precise mathematical outputs based on given constraints and modifications. This marks a shift in the methodology of theoretical physics research, suggesting that combining human intuitive guidance with AI's precise computational power can drastically increase the speed and accuracy of research.

GPT Shifts the AI Research Paradigm Toward Verification

While AI has drastically accelerated the pace of research, how researchers spend their time has fundamentally shifted. Where most energy was previously spent on the "generation" process of finding answers, "verification"—determining whether the results provided by AI are accurate—has now become the core of research. In one instance, although ChatGPT provided an answer in just 3 days, it took 3 weeks before the paper could finally be published, because the majority of the time was spent rigorously verifying the accuracy of the AI's output. In other words, a structural shift has occurred where verification now demands far more time than the actual writing process.

This paradigm shift underscores the importance of "harnesses" or "scaffolding" surrounding a model, rather than the performance of the AI model itself. Tools such as Claude Code, Codex, and Cursor serve as structural mechanisms and toolsets that enable models to perform actual tasks. NVIDIA's Voyager, an agent that perceived and manipulated its Minecraft environment through a purely text-based harness without any visual model, demonstrates that the system design used to control and verify output, rather than the model's raw generative capability, is the key driver of practical results.
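What a harness means in practice can be reduced to a small control loop: the model proposes, the scaffolding executes and checks, and only observed results flow back. The sketch below assumes a JSON tool-call convention and placeholder tools; it is not any specific product's API.

```python
import json

def call_model(messages: list[dict]) -> str:
    raise NotImplementedError("plug in any chat-completion API here")

TOOLS = {
    "run_tests": lambda args: "3 passed, 1 failed",   # placeholder executor
    "read_file": lambda args: "...file contents...",  # placeholder reader
}

def harness_loop(task: str, max_steps: int = 10) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_model(messages)
        try:
            action = json.loads(reply)  # expect {"tool": ..., "args": {...}}
        except json.JSONDecodeError:
            return reply                # plain text means a final answer
        # The harness, not the model, executes the tool and feeds the observed
        # result back, so raw model output never acts on the world directly.
        result = TOOLS[action["tool"]](action.get("args", {}))
        messages.append({"role": "user", "content": f"Tool result: {result}"})
    return "step budget exhausted"
```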

However, a new bottleneck known as the "Deployment Gap" is emerging during these verification and implementation stages. While AI model performance is already sufficiently high, there is a shortage of professional personnel capable of building harnesses and deploying them to fit real-world environments. To build a successful AI solution, the knowledge of domain experts who understand internal business intricacies must be combined with the expertise of engineers who understand the inner workings of models and harnesses.

This explains why Palantir achieved success with its "Forward Deployed Engineer (FDE)" model, where elite engineers are dispatched directly to client sites. Moving beyond simply selling a product and assisting with installation, this approach involves engineers residing at the client's office to write code and configure harnesses, taking responsibility for the entire process until the product is actually operational and verified. This is becoming the practical implementation strategy for the AI era.

GPT-5 and o3: Mastering Complex Physics Calculations

For a long time, AI capabilities have been centered on linguistic generation, such as writing. The prevailing perception was that AI faced clear limitations in mathematical reasoning and advanced scientific computation, but OpenAI's recent o3 and GPT-5 models are directly challenging this perception. In particular, o3—a model built around powerful reasoning capabilities—suggests that AI has entered a stage where it can perform complex logical thinking and precise calculations beyond simple text generation.

GPT-5 demonstrated its capabilities by successfully reproducing high-difficulty physics calculations that only a handful of experts worldwide can perform. A representative example is GPT-5's accurate reproduction of the highly challenging calculation process included in the paper "Why is there no Love in black holes?" ("Love" referring to the tidal Love numbers, which vanish for black holes). This is a symbolic milestone, demonstrating that AI has begun to practically implement the advanced insight and computational power of human experts.

This shift has brought about a fundamental change in how we view AI. In the past, strict formal verification was considered essential because AI showed limitations in mathematical tasks; however, o3 and GPT-5 have raised questions about that necessity by accurately performing and reproducing extremely difficult calculations. Just as human experts discuss and understand proof processes in natural language, AI has now reached a level where it can directly perform complex mathematical and scientific reasoning based on advanced intelligence.

Ultimately, the achievements of o3 and GPT-5 hold the potential to shift the paradigm of scientific research. Mastering expert-level physics calculations is more than just a performance boost; it means AI has become a powerful tool capable of playing a central role in actual scientific discovery and research. The combination of accurate reproduction of high-difficulty calculations and reasoning capabilities is expected to drastically expand the scope of AI application in future scientific research.

GPT-5 Pro Demonstrates Superiority in Mathematical Physics

GPT-5 Pro has demonstrated performance that surpasses competing models in mathematical physics, a field requiring advanced reasoning capabilities. The principle of symmetry, central to the laws of physics, is a critical tool for explaining why certain physical quantities equal zero and is essential for understanding complex phenomena, such as the fact that black holes have vanishing tidal Love numbers (that is, they exhibit no static tidal deformation). Through sophisticated logical progression, GPT-5 Pro derived correct answers for these mathematical physics tasks, establishing itself as the highest-performing model in the domain.
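The underlying mechanism, a symmetry forcing a quantity to zero, can be seen in a textbook example rather than the actual test problem: if a Hamiltonian commutes with the parity operator, then any operator that is odd under parity has a vanishing expectation value in a parity eigenstate.

```latex
% Assumptions: [H, P] = 0, P unitary with P|\psi\rangle = \pm|\psi\rangle,
% and the operator is parity-odd: P \hat{O} P^{-1} = -\hat{O}.
\langle\psi|\hat{O}|\psi\rangle
  = \langle\psi|P^{-1}\,\bigl(P\hat{O}P^{-1}\bigr)\,P|\psi\rangle
  = -\langle\psi|\hat{O}|\psi\rangle
\quad\Longrightarrow\quad
\langle\psi|\hat{O}|\psi\rangle = 0 .
```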

The model's capabilities were particularly evident in equation tests addressing symmetry problems in flat spacetime. After approximately 9 minutes of deep reasoning, GPT-5 Pro produced a highly elegant and perfectly structured solution. This suggests the model possesses the ability to mathematically reconstruct and solve complex physical concepts rather than simply synthesizing existing information. Other competing models tested simultaneously failed to provide the correct answer, confirming a significant performance gap.

GPT-5 Pro also achieved meaningful results with more complex black hole-related problems. In one specific instance, a "warm-up" problem was used as a priming strategy to allow the model to grasp and adapt to the context. Following this preliminary step, GPT-5 Pro derived the correct answer in under 30 minutes. Once again, other competing models failed to find a solution, further validating GPT-5 Pro's relative advantage in high-difficulty mathematical physics tasks.

However, the model revealed clear limitations regarding recent research findings not included in its training data. When presented with a symmetry problem involving black hole perturbation equations from a paper published in June, GPT-5 Pro reasoned for approximately 5 minutes before incorrectly concluding that no symmetry existed. This indicates that reasoning errors can still occur when dealing with recent academic data that the model has not yet learned. Consequently, while GPT-5 Pro is dominant in solving established complex physics problems, its performance varies depending on the inclusion of the most current knowledge.

AI Solving Unsolved Problems in Theoretical Physics

Artificial intelligence is shifting the paradigm of theoretical physics research by solving physical challenges that have long eluded human experts. Recently, AI provided clear solutions to specific physics problems that experts such as Andy, Alfredo, and David had spent a year attempting to solve without success. While AI has not yet fully conquered the grand challenges that have stumped the entire academic community for decades, it has demonstrated that it has passed the threshold for solving specific problems that require expert-level analysis.

This capability is leading to a dramatic increase in research productivity. For problems similar in nature to previously performed calculations, AI can find a solution in just 30 minutes and has reached a level where it can draft a paper for submission to arXiv. Given the right guidelines and direction, it is theoretically possible to produce one paper per day, establishing AI as a new reality in physics research.

More notably, the scope of AI's capabilities is expanding beyond simple computational assistance to address "open questions" in theoretical physics. Within the past month, AI models have begun tackling unsolved problems, particularly in the fields of quantum gravity and quantum field theory, which are considered the pinnacles of modern physics. This suggests that AI can deliver substantive results even in cutting-edge theoretical domains that require highly abstract thinking.

The way AI reasons is also evolving. In the past, formal verification using languages such as Lean was considered essential; however, as model intelligence has surged, AI is now performing mathematical proofs in a manner similar to how human researchers discuss proofs using natural language. By handling complex physical proofs through advanced natural-language reasoning rather than relying on rigid symbolic logic, AI is approaching theoretical challenges in a way that more closely mirrors the thought processes of human experts.
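For contrast, the formal route that natural-language reasoning is displacing looks like this: a machine-checked proof in Lean 4, shown with a deliberately trivial statement. Real physics arguments formalized this way would be vastly longer, which is exactly why natural-language proof has become attractive.

```lean
-- The Lean kernel mechanically verifies every step of this proof.
example (a b : Nat) : a + b = b + a := Nat.add_comm a b
```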

AI Research: Result Verification Emerges as the New Bottleneck

The adoption of AI is fundamentally shifting the paradigm of scientific research. Its efficiency is particularly striking in the early stages of manuscript preparation. While researchers previously spent significant time and effort analyzing vast datasets and drafting logical narratives, AI has now drastically shortened this process. However, increased writing speed does not automatically translate into a proportional reduction in the overall research timeline.

The 'graviton' paper project serves as a clear example of how research processes are evolving in the AI era. In this project, AI was used to complete the initial draft with remarkable speed. However, the researchers spent the majority of their time not on drafting, but on the meticulous process of verifying the accuracy of the AI-generated results. This suggests that while AI has boosted productivity, the verification required to ensure reliability has emerged as a new bottleneck, slowing the overall research workflow.

This phenomenon occurs because the speed at which AI models perform complex calculations and data processing far exceeds the pace of human review. While a model can instantaneously produce sophisticated-looking outputs, identifying latent errors and confirming the final accuracy still requires intense focus and significant time from the researcher. Consequently, the central challenge of research has shifted from a matter of productivity—"how to write quickly"—to a matter of reliability—"how to verify accurately."
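One modest way to chip away at this bottleneck is to automate the cheap half of verification: numerically spot-checking an AI-derived identity before committing researcher time to a full review. A minimal sketch, with a textbook identity standing in for an AI-claimed formula:

```python
import math
import random

def lhs(x):
    return math.sin(2 * x)                 # stand-in for an AI-claimed closed form

def rhs(x):
    return 2 * math.sin(x) * math.cos(x)   # independent expression it should equal

random.seed(0)
for _ in range(1000):
    x = random.uniform(-10, 10)
    assert math.isclose(lhs(x), rhs(x), rel_tol=1e-9, abs_tol=1e-12), x
print("identity survived 1000 random spot checks")
```

A passing spot check proves nothing by itself, but a failing one saves weeks of manual derivation review.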

Consequently, for AI models in scientific research to truly evolve, they must move beyond simple text generation and strengthen their verification capabilities. To overcome current limitations and optimize the overall research workflow, models need integrated features that either ensure accuracy autonomously or enable researchers to verify results more efficiently. Resolving this verification bottleneck will be the critical factor determining the success of future AI-driven scientific research.

Building a Dedicated Project Review Agent with Claude Code

Claude Code enables the construction and operation of dedicated agents capable of overseeing an entire project and suggesting directions for improvement. By creating a specialized agent, such as a 'Code Improvement Advisor,' developers can move beyond file-level analysis to conduct deep dives into the entire codebase, receiving specific recommendations that include high-priority, critical issues. This approach is highly efficient, as it automates the identification of structural flaws and potential errors that are often difficult for developers to detect manually.

The configuration process allows for precise tool settings to ensure security and stability. Although the agent requires access to the entire project folder to read and analyze code, 'read-only tools' can be configured to prevent the risk of unauthorized code modifications. This ensures the agent remains focused on its role as an analyst—safely scanning the project and delivering analysis results—allowing users to obtain reliable reviews without concerns over code tampering.
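In the agent-definition format sketched earlier, this restriction amounts to listing only read and search tools in the frontmatter. The tool names below (`Read`, `Grep`, `Glob`) match Claude Code's documented built-in tools, but verify them against your installed version:

```markdown
---
name: code-improvement-advisor
description: Reviews the entire codebase and suggests prioritized improvements.
tools: Read, Grep, Glob   # no Edit, Write, or Bash: analysis only
---
```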

Once established, these agents are stored in a library, becoming reusable assets that can be summoned on demand. Users trigger the automated review process by selecting the pre-configured 'Code Improvement Advisor' from the library and assigning a specific task to review the full codebase and suggest improvements. Throughout this process, the agent systematically performs its analysis while maintaining a comprehensive understanding of the project's context.
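Invocation can then be as simple as naming the stored agent in a prompt; Claude Code supports explicitly requesting a sub-agent by mentioning it, though the phrasing here is only illustrative:

```text
> Use the code-improvement-advisor agent to review the full codebase and
  suggest improvements, ordered by severity.
```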

After the analysis is complete, the agent highlights the identified issues, categorized by severity as High, Medium, or Low. This provides users with a clear priority list for remediation. By addressing high-severity items first, teams can rapidly stabilize the project and adopt a strategic approach to incrementally enhancing overall code quality.

AI Infrastructure Demand Outpaces CapEx Investment

While capital expenditure (CapEx) for AI infrastructure is increasing astronomically, actual market demand is expanding at an even faster rate. In the first quarter of this year, CapEx from the Magnificent 7 (Mag 7) companies exceeded $400 billion. However, the demand backlog—combining reported figures and estimates—has reached approximately $1.3 trillion. This indicates that the volume of infrastructure required by the market overwhelmingly exceeds the scale of current investments, and the gap between the two continues to widen.

This surge in demand is clearly evident in the token-serving market. Inference providers are currently unable to serve tokens quickly enough to keep up with the influx of requests. This serves as tangible evidence that demand at the actual service stage is exceeding supply capacity, rather than simply reflecting large investment volumes. Essentially, a bottleneck is intensifying as the pace of infrastructure expansion fails to keep up with the growth of demand.

The growth of individual companies is even more dramatic. According to analysis by SemiAnalysis, Anthropic's annual recurring revenue (ARR) has grown explosively from $9 billion to over $44 billion recently. This demonstrates that demand for AI models from enterprises and users is rapidly converting into actual revenue, moving beyond mere anticipation. Anthropic's case is a definitive indicator of the current state of demand overheating in the AI infrastructure market.

The velocity of this growth is remarkable. Anthropic's ARR has been doubling every six weeks, and according to calculations by analyst Ming Li, approximately $96 million in ARR is being added daily. Such a steep growth curve suggests that existing CapEx investment plans may be insufficient to fully meet future demand. Consequently, the AI infrastructure market has entered an unprecedented demand-driven growth phase where the pace of demand expansion is outstripping the scale of investment.

Frontier AI Labs Transform Wall Street and Government into Deployment Engines

Frontier AI labs are now moving beyond mere technical development, transforming massive organizations like Wall Street and government agencies into AI deployment systems. This signifies that AI technology has reached a critical inflection point, moving past theoretical possibilities to penetrate deeply into actual societal systems. While previous AI efforts were limited to laboratory achievements or improving convenience in a few services, the financial sector—the core of capital markets—and government agencies—the backbone of public administration—are now evolving into deployment engines that fully embrace AI.

Until recently, market sentiment remained skeptical. There was a lack of concrete use cases for AI and no clear model for how it would generate revenue. These doubts naturally fueled theories of an AI bubble. Many media outlets and experts offered pessimistic forecasts, arguing that the current AI craze was based on baseless expectations and that the entire system would collapse once the bubble burst.

However, recent trends are unfolding in a markedly different direction. While concerns about an AI bubble have not entirely vanished, the center of the discussion has shifted from the possibility of a collapse to the stage of actual deployment. As the technical maturity provided by frontier AI labs increases, vague expectations are being implemented as functioning systems. This is more than just the adoption of tools; it is a process in which the operational frameworks of massive institutions are being reorganized around AI.

Consequently, Wall Street and government agencies have become powerful deployment engines that spread AI technology most rapidly and extensively. As questions regarding the efficacy of the technology are replaced by actual implementation cases, the pace of AI deployment is accelerating to an unstoppable level. This shift is expected to be the decisive moment when AI establishes itself not as a supplementary tool for specific industries, but as the core infrastructure driving national systems and the global financial framework.

AI Deployment Machines: Solving the Challenges of Enterprise AI Implementation

While the potential of AI is immense, deploying and implementing it within actual business environments is far more difficult than anticipated. Many are impressed by AI's powerful performance and expect immediate adoption across all corporate processes, but real-world implementation is a different matter. Unlike the initial excitement felt during the testing phase, building and deploying AI agents in a production environment entails significant technical and operational challenges, which serve as primary bottlenecks slowing the diffusion of the technology.

A particular issue is the vast gap between research achievements and practical application. Deep research labs often employ various "hacks" to find ways to make AI operate efficiently within simple applications. However, adapting these ideas to fit the complex business configurations of an actual enterprise can often take 12 months or more. There is a structural limitation where proven technical feasibility does not immediately translate into business productivity.

Consequently, skepticism has emerged regarding the feasibility or profitability of enterprise-level AI implementation, with some studies reporting failure cases. To confront these obstacles, AI companies are strategizing by aligning with major industrial players. Their core objective is to build what is known as an "AI deployment machine"—a systematic approach designed to minimize the trial and error experienced by individual companies and rapidly integrate AI technology across corporate processes.

Ultimately, the creation of an AI deployment machine is a strategic move to rapidly convert laboratory breakthroughs into business value, moving beyond simple technical support. By uniting AI firms with industry practitioners, the goal is to eliminate implementation bottlenecks and accelerate the practical application of AI. This is expected to be the decisive driver in narrowing the gap between the pace of AI development and its industrial adoption, allowing AI to quickly establish itself as a core corporate competency.

The FDE Model: Targeting High-Value Industries

The Forward Deployed Engineer (FDE) model is more than simple technical support; it is a strategy optimized for penetrating high-value industries characterized by complex requirements and significant risk. Its true value is realized in high-stakes environments—such as hospitals, banks, and government agencies—where operational specificity is high and minor errors can lead to catastrophic outcomes. These sectors face unique and challenging problems that differ from general enterprises, requiring a sophisticated approach that goes beyond the mere delivery of a product.

Because standard Software as a Service (SaaS) products are designed for versatility, they often struggle to meet the highly specific demands of particular industries. Many of the challenges faced by high-value sectors fall outside the capabilities of off-the-shelf products, inevitably necessitating customized implementations. Ultimately, the ability to accurately diagnose a client's complex problems and provide tailored solutions becomes the primary driver of high revenue generation.

Palantir serves as a pioneering example of this approach. Even amidst the current wave of Large Language Models, Palantir identified a way to achieve tangible results early on through the FDE model. This structure involves deploying engineers with advanced technical expertise directly into the field to solve customer problems collaboratively. They recognized that real value—extending beyond the technology itself—is created only when personnel with expertise on par with OpenAI engineers are integrated with the complex mechanisms of actual industrial operations.

Consequently, the FDE model provides a stronger competitive advantage in markets with higher technical complexity and risk. The process of having expert engineers deeply embed themselves in a client's specific environment to resolve issues allows a company to establish an irreplaceable position in high-barrier, high-value markets. This is a quintessential example of how specificity and customized solutions can be converted into business value in an era of commoditized technology.