
What is the Future of Programming with Large Language Models?

The New Programming Paradigm

Weeks after ChatGPT's launch, renowned computer scientist Andrej Karpathy, a founding member of OpenAI and former director of AI at Tesla, tweeted: "The most popular new programming language is English." The statement captures the ability of ChatGPT and other Large Language Models (LLMs) to autocomplete source code and generate programs from human instructions, marking another example of AI's advance into fields that require cognitive and specialized skills.

The question "Will programming disappear?" is likely an oversimplification. A serious discussion of LLMs' impact on programming is complex, multifaceted, and nuanced, and it is often clouded by considerable background noise in the ongoing debate.

Current Consensus and Open Questions

There is clear evidence that coding assistants like GitHub Copilot improve the productivity of competent programmers, who often perceive themselves as "augmented" rather than replaced. It is also accepted that LLMs can create software applications for tasks that are not overly open-ended.

However, a different question remains open: will users of all levels of programming experience be able to easily create complex, specific applications without interacting directly with the code? This uncertainty prompted an innovative classroom experiment in Esade's MSc in Business Analytics program.

The Classroom Experiment

Instead of organizing a theoretical debate, the "Data-Based Prototypes" elective course ran a hands-on experiment in which students gathered first-hand evidence through practical challenges. Students were assigned three brief programming tasks:

  1. Create a website with specific functionality using an unfamiliar programming language
  2. Build a web application with the same functionality using a familiar programming language
  3. Develop a simple game using a known language but with complex operational flow

The key constraint: students could only use ChatGPT prompts to generate code.

While students already had experience with ChatGPT and coding assistants like GitHub Copilot for development, debugging, and explanations, this exercise placed them in unfamiliar situations under competitive pressure across various scenarios (known vs. unknown languages, easy vs. difficult logic).

Key Findings and Insights

LLMs as "Code Interpreters"

LLMs function like "code interpreters" that take natural language as input and generate code fragments as output. Users describe their programming needs, the LLM responds with code, and users iterate until they achieve the desired result. This familiar workflow underlies innovations such as OpenAI's Data Analyst or Anthropic's Artifacts.
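
As a rough illustration of this loop, the sketch below describes a need in natural language, receives a code draft, and then iterates with a refinement request. It assumes the openai Python package with an API key in the environment; the model name and prompts are illustrative, not taken from the experiment.

    # Describe -> generate -> refine loop (sketch; assumes the openai package
    # and an OPENAI_API_KEY environment variable; model name is illustrative).
    from openai import OpenAI

    client = OpenAI()
    history = [{"role": "user",
                "content": "Write a Python function that validates an email address."}]

    reply = client.chat.completions.create(model="gpt-4o", messages=history)
    draft = reply.choices[0].message.content          # first code draft

    # The user inspects the draft and iterates with a follow-up request.
    history += [{"role": "assistant", "content": draft},
                {"role": "user",
                 "content": "Also reject addresses longer than 254 characters."}]
    reply = client.chat.completions.create(model="gpt-4o", messages=history)
    print(reply.choices[0].message.content)           # refined code draft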

The 80/20 Rule of Code Automation

Code automation should be understood through an 80/20 lens. While generating basic designs and functionality through LLM prompts is relatively quick, students observed that achieving detailed, specific functionality can be challenging. Experienced users view LLMs not as omnipotent tools, but as instruments capable of completing 80% of the work in just 20% of the time. The utility of this approach depends on the importance and cost of the remaining 20%.

Programming Knowledge Remains Valuable

Understanding technical concepts and writing specific prompts (e.g., "use HTML's Bootstrap library") yields better LLM results and faster progress. Human-AI collaboration produces the greatest efficiency, a point confirmed by the prompt data: while 48% of prompts specified characteristics of the functional prototype, 19% included technical concepts, instructions for updating the code, or code itself.

Effective Patterns Emerge

API Integration: LLMs excel at generating "bridge code" that connects calls to different APIs, a good fit for a rapidly evolving programming landscape in which ever more functionality is exposed through APIs.
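
A minimal sketch of such bridge code, using the requests library and two hypothetical endpoints (none of this comes from the students' prompts): fetch records from one service, reshape the payload, and forward it to another.

    # Bridge code sketch: pull data from one API, reshape it, push it to another.
    # Both URLs are hypothetical placeholders; the requests library is assumed.
    import requests

    def sync_open_orders(source_url: str, target_url: str) -> int:
        """Copy open orders from a source API to a target API; return the count."""
        orders = requests.get(source_url, params={"status": "open"}, timeout=10).json()

        payload = [{"id": o["id"],
                    "total": o["amount"],
                    "customer": o["customer_name"]}   # rename fields for the target
                   for o in orders]

        resp = requests.post(target_url, json=payload, timeout=10)
        resp.raise_for_status()
        return len(payload)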

Human Adaptation: Students quickly discovered that translating code between languages worked more effectively than requesting instructions for new languages. As text-generating algorithms, LLMs perform more reliably on deterministic tasks (code translation) while producing more variability in open-ended tasks.
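
In practice, this pattern amounts to pasting working code from the familiar language into the prompt and asking for a translation, rather than asking how to build the feature in the unfamiliar language from scratch. A sketch of such a prompt (the function and target language are illustrative):

    # Translation-style prompt (sketch): supply known, working code and ask the
    # model to translate it, instead of requesting new-language instructions.
    known_code = '''
    def moving_average(values, window):
        return [sum(values[i:i + window]) / window
                for i in range(len(values) - window + 1)]
    '''

    prompt = ("Translate the following Python function to JavaScript, "
              "keeping the same behaviour:\n" + known_code)
    # `prompt` would then be sent to the model as in the earlier sketch.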

Planning Improves Performance: Prompts like "Make a plan first" not only inform users but also lead to better code generation. Because of the models' autoregressive nature, in which each text fragment is generated from the text that precedes it, maintaining coherence without an intermediate plan can be challenging.
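
A plan-first interaction can be sketched as two calls: the first asks only for a step-by-step plan, the second asks for code that follows that plan. The same assumptions as the earlier sketch apply (openai package, illustrative model name and task).

    # "Make a plan first" prompting (sketch): plan in one call, code in the next.
    from openai import OpenAI

    client = OpenAI()
    task = "build a command-line quiz game that reads questions from a CSV file"

    plan = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user",
                   "content": f"Make a plan first: list the steps needed to {task}."}],
    ).choices[0].message.content

    code = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user",
                   "content": f"Now write Python code that follows this plan:\n{plan}"}],
    ).choices[0].message.content
    print(code)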

Conclusions and Future Implications

These results highlighted points of the debate that students could have read about independently but instead experienced through hands-on practice. Many angles of the debate, however, remain outside the scope of this exercise: the programming challenges were kept simple to fit within a classroom activity, whereas developing larger applications involves far greater complexity.

The debate is also anchored in the current form of the tools; future innovations may introduce new interfaces or paradigms for code generation and assistance. What we can say now is that LLMs benefit both expert and novice programmers while blurring the line between coding and natural language for both groups.

Human-AI collaboration appears to produce the most effective results, underscoring the continued value of programming skills. We are navigating a dynamic landscape in which LLM paradigms and tools are still evolving, which makes it difficult to predict their future direction.

As Pixar co-founder and "Creativity, Inc." author Ed Catmull wisely said, sometimes the best way to explore a path is to walk it. Both students and instructors are encouraged to embrace this uncertainty by actively experimenting and exploring new approaches, as practical experience is crucial for discovering innovative solutions and gaining deeper understanding.