Vibe Coding: Why the Great Divide?
Hi! I’m Kevin Turnbull.
I’ve spent my career in bespoke software development, and today I’m giving some secrets away.
I’ve worked as lead developer, project manager, portfolio manager, and mentor for upcoming developers, so I’m confident that wherever you are in your career—I’ve got some good advice for you.
It’s by way of my work history that I find myself very well positioned to benefit from the AI revolution. It feels like many of these tools were designed specifically for a career product manager like myself.
What I’m seeing though is a divide:
“Vibe coding is so easy! I can do anything!”
— Some Smart People
“All that this AI produces is useless slop!”
— Equally Smart People
I’ve been on both sides of this and I’ve built up a number of habits that I think are useful best practices to keep in mind when vibe coding.
The Introduction
My colleague David V. Kimball put out a video recently with his top 10 tips for vibe coding, which did a great job of opening the door for a lot of people to have more nuanced discussions about their vibe coding practices. Now that the door is open, I’d like to give you a bit of a tour with more of a deep dive into these topics.
This list here was prepared by David and represents a great cross-section of the challenges that come up for people getting into vibe coding:
- Use version control
- One feature at a time
- Clarify before proceeding
- Maintain an AGENTS.md file
- Use debugging and logging when possible
- Test/validate every change
- Review diffs & request change summary
- Determine whether you should start a new chat
- Use other AI agents
- Be AI platform agnostic
My hope is to take a more technical lens to the topic and dive a little deeper on some of the ideas he brought up.
The Methodology
My view of how to describe projects is derived from project manager and portfolio manager best practices. I have found these habits to be extremely helpful in working with LLM-based code generators.
It’s been a core part of my work to interact with all the stakeholders of a project, identify what they need, and then figure out how to make the best possible version of it available to them.
To put it another way—I was KevinGPT long before ChatGPT started muscling in on my territory. I’ve been academically and commercially exposed to and formally responsible for driving the kinds of conversations that need to happen in order to create really high-quality enterprise software.
My goal here is to define some terms of art used in AI, then use those definitions to help you get the best results out of your vibe coding efforts.
The List
As I go through this list of tips, I’ll be bringing my experience to expand on David’s introduction. I’ll cover the topics a bit out of order so that seasoned industry insiders can skip the basic tips without missing the really great ones I break out toward the end.
Also, David made some great points which I won’t be repeating because I want to leave you with a good reason to watch his video too.
Basic Tips
Test/Validate Every Change
One thing I’d like to add to what David’s already said about this is that automated testing can be a great tool that works really well with agentic development workflows.
I mention this because when you work with heavily AI-generated code, you notice that it tends to pick a dead-end solution a minimum of… let’s generously say 10% of the time. When this happens, often the best move is to just delete everything and start again from a recent checkpoint after identifying some lessons learned. With a few automated tests in place, you can throw out that bottom 10% without spending any of your own time reviewing it.
This is especially true if you can feed the results of the test suite back into the LLM. Done right, you’re using the test output to drive an “inner loop” that updates an activity log the LLM can read to determine where the root cause of whatever problem it’s working on originates. This is far better than letting it keep guessing at fixes without any new information; the goal is to keep it focused on the “outer loop” of creating the new version rather than stuck in infinite loops retrying the same broken patch ideas.
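Here’s a minimal sketch of that inner loop, assuming a Python project tested with pytest; the activity_log.md file name and the helper function are illustrative habits, not a standard:

```python
import subprocess
from datetime import datetime
from pathlib import Path

LOG = Path("activity_log.md")

def run_tests_and_log(attempt: int) -> bool:
    """Run the test suite and append the outcome to a log the agent reads between attempts."""
    result = subprocess.run(["pytest", "-q", "--maxfail=5"], capture_output=True, text=True)
    passed = result.returncode == 0
    with LOG.open("a") as f:
        f.write(f"\n## Attempt {attempt} at {datetime.now().isoformat(timespec='seconds')}\n")
        f.write("Result: PASS\n" if passed else "Result: FAIL\n")
        if not passed:
            # Keep only the tail of the output so the log stays small enough for the context window.
            f.write(result.stdout[-4000:] + "\n")
    return passed
```

If the tests pass, the outer loop moves on to the next piece of the plan; if they fail, the agent’s next prompt can include the tail of this log instead of letting it guess blindly.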
Recommended reading here would be the August 2025 MLE-STAR paper. As you read it, consider:
- The Outer Loop as a planner
- The automated test-driven Inner Loop representing A_abl
- You and your LLM session as A_retriever
Review Diffs & Request Change Summary
I don’t have much to add here except that, at the end of the day, you’re responsible for shipping good results. Reviewing changes before you finalize them in a commit is not just best practice; in commercial development, most people won’t care what tool you use, but they will care what you commit. If you don’t understand what the LLM was doing on your behalf, your minimum responsibility is to ask it to explain itself before you force everyone else on the team to figure it out.
Use Other AI Agents
Part of the reason why I have a well-developed intuition for how these LLM systems operate is that in addition to using cloud-based services, I run a couple of pretty modest video cards in computers on my home network. This workbench allows me to use, explore, and swap between some of the smaller open-weights models.
This has exposed me directly to a lot of the limitations, including common failure modes and what they look like. In fact, a big part of the reason I use local models, despite their smaller context windows, earlier knowledge cut-offs, and poor general intelligence scores, is to force myself to think through how to get the best results out of a bad model so that I can get excellent results out of infrastructure-scale LLM services.
My brother goes so far as to run double-blind experiments where he puts two LLMs up against each other to process the same context. Then he has each LLM give scorecards for both outputs. You can learn a lot when they agree on which output is better and even more when they disagree.
With that being said…
Be AI Platform Agnostic
I don’t recommend the average person go fully local, because cloud models offer a lot of advantages and we haven’t yet seen a good wave of new hardware optimized for consumer-level LLM inferencing to bring down costs.
With that being said—you should try out different service providers. You’ll learn a lot by comparing and contrasting their offerings. It’s the only way to figure out what you like best.
Best Practices
Clarify Before Proceeding
This was a huge insight from David. I can’t overstate the value of asking an LLM to provide a list of clarifying questions. Most LLMs have a big problem of “champing at the bit”, which is to say they’re very likely to jump into writing code as soon as you describe the opening few ideas of what you’re looking to achieve, rather than asking probing questions to dig deeper into the goals.
These early code submissions are often low value and can significantly miss the mark for what you want to achieve in a work session: you expected to have a conversation, and the machine jumped straight into writing code.
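As a concrete (and entirely illustrative) example, a standing instruction along the lines of “Before writing any code, ask me up to five clarifying questions about scope, constraints, and edge cases; wait for my answers, then summarize the agreed approach and ask me to confirm it” goes a long way toward turning that first reply into a conversation instead of a wall of code.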
The simplest solution is to use “Plan” Mode (or your tool’s equivalent) where you’ve blocked the tool from making actual code changes. In this mode, it’s unable to move forward with generating code, so it asks exactly the kind of probing questions that you need to draw out the details of the work to be performed.
There’s a balancing act here though—you don’t want to have a long discussion in plan mode and fill the context window up with either conflicting or repeated opinions; eventually you need to either formalize the discussion into a written plan or move over to actually generating code for the session. Getting the timing right depends on what tools you’re using.
If your tool doesn’t provide a plan mode, focus on authoring really good project context documents, as we discuss in an upcoming section, before you even tell it what language you want the code written in. By doing this, and by rejecting any attempts to write code before you’ve got a detailed plan in place, you’re forcing it out of the engineer role and into more of a technical architect position.
The next level up is to start to view the code itself as a form of build artifact. Under this model—you write plans before you use the plans to generate code. The underlying goal here is to reduce the back-and-forth discussion mode and force the LLM into more of a “one-shot challenge” which aligns well with the core training layer of the models rather than the arguably weaker reinforcement learning side. Aside from aligning the context with the LLM’s training environment—it also helps you by guiding you towards a clarity of understanding of exactly what needs to be built and in what particular order. This ensures that you spend less time figuring out minor bugs in a module that’s planned to be thrown away anyway in the next minor version release.
Lastly—avoid relying on the latent space to retain the “meaning” of what you want done. You should be asking the LLM probing questions that force it to think through side effects that are important to you. LLMs store a really amazing amount of context when generating their responses—but they’re not actually trained on what you specifically want to accomplish today. The more explicit you can be with the instructions, the lower the error rate will be. Getting the LLM to write out its context window gives you an opportunity to course-correct when things are getting lost in the weeds.
How to Use Version Control
Something that I think is important to keep in mind when creating anything that’s going to take a long time is making a plan to manage the many evolving versions it will go through.
One of the key functions of version control is to keep a clean “digital workspace.” The repository is the workshop that your LLM tools will have to operate within. If you’ve ever worked in a physical workshop, you know to put things away when you’re done so that others can find them. Version control is in many ways the digital equivalent of this.
Here are some key tips:
- Place commits any time:
- You have a new working version
- You want to have a place to roll back to if something goes wrong with prompting
- You want to switch what workstation you’re using
- You have finalized a specific change to documentation
- Define new versions any time:
- You want an LLM to solve a group of similar problems all at once
- You want to split up a large feature into smaller components that can be built in one shot
Create a Plan for Each Version and Commit It
It can often be a good idea to have a detailed discussion about what should be accomplished in the next block of work and to solidify this as a plan before making any code changes.
I’ve found it helpful to commit this plan and have the code changes be a side effect of the agent making an attempt at executing the plan.
If the agent is unsuccessful because it can’t get all the automated tests to pass—then all of the file-based context that’s being used to generate the prompt can be easily reused after rolling back the code changes.
If the failure is recurring—you can add notes into the plan so that the next generation is forewarned and forearmed.
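To make that loop concrete, here’s a rough sketch in Python, assuming git, pytest, and an agent invoked from the command line; `my_agent` and `PLAN.md` are placeholders for whatever tool and file names you actually use:

```python
import subprocess

def sh(*cmd: str) -> subprocess.CompletedProcess:
    return subprocess.run(cmd, capture_output=True, text=True)

# 1. Commit the plan on its own, before any code is generated.
sh("git", "add", "PLAN.md")
sh("git", "commit", "-m", "Plan for v1.4.0")

for attempt in range(3):
    # 2. Let the agent attempt to execute the plan (placeholder command).
    sh("my_agent", "run", "--instructions", "PLAN.md")

    # 3. Validate the attempt with the automated test suite.
    if sh("pytest", "-q").returncode == 0:
        sh("git", "add", "-A")
        sh("git", "commit", "-m", "v1.4.0: generated from PLAN.md")
        break

    # 4. On failure, roll back the generated code but keep the committed plan,
    #    so the file-based context can be reused on the next attempt.
    sh("git", "checkout", "--", ".")
    sh("git", "clean", "-fd")
    # (Optionally append lessons learned to PLAN.md here before retrying.)
```

The important property is that the plan survives every failed attempt, so each retry starts from the same committed context plus whatever lessons you’ve appended.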
What’s the Benefit of Committing Just the Plan?
The “dream” of vibe coding is to say one thing and have the AI “read your mind” in writing a whole app that matches what you’ve described.
This would be called “one-shot prompting” where you provide a description and the AI runs with it. This is in contrast to “few-shot” or “conversational” prompting where you uncover ideas and solutions through an extended chat.
If we want to live the vibe coding dream—we need to lay the right foundations. That “magic one-shot” prompt can be just “Generate version 1.3.44 and run the test suite. Iterate over code changes until all errors are resolved and the test suite passes in full.” (or whatever)
For it to be that simple, the plan needs to include a thorough description of the project at hand, and the goals of the version being worked on must be clear.
As an advanced method, if you get a really great description written for a version, then it’s a great opportunity to try out parallelized generation of candidate code bases. You can be testing one version while the next version is being generated. If the manual tests pass—then that branch can be moved forward to work on the next feature unlocked by the next minor release. If each novel version is committed to its own branch, you can do a depth-first search to find the most feature-rich versions of the product and a breadth-first search to find sibling code bases most similar to the current one.
A Clean Workspace
Circling back to the concept of version control as being analogous to keeping a clean shared workspace—when we finish execution of a plan file, there are a number of custodial tasks to be done. Change logs and README files need to be updated as well as status documents or related plans.
It almost sounds like there’s a lot of work to be done still… because… well… there is. You can rely on an LLM to do this part of the work also, but perhaps due to my background, I’ve found this to be the position that provides me the highest leverage to determine the overall success and direction of the project.
You might find however that other operations within the software development lifecycle suit your skills better. For example—you might prefer to use an LLM to generate a base class or interface for a new feature and leave the core business logic of the implementation up to you. If your project is highly technical in nature and LLMs have been generating unusable results, then this is often the best way forward.
If you’re having trouble getting the results you’re hoping for, you should reach out to us—we’ll help get you back on the right track. Just a reminder since it bears repeating: software engineering is hard work. Vibe coding makes it more accessible—but it hasn’t become easy.
My elderly father has been a self-described “bad programmer for 50 odd years” and now he’s able to actually create the things that he’s always thought would be cool. He’s been able to achieve this mostly through vibe coding—so I’m a huge fan of everyone giving it a try. With that said, he’s not targeting enterprise-ready release candidates—so security, optimization, scalability, and all sorts of other constraints are basically deleted… which is great.
Perhaps it helped my father to have the benefit of quite a few chats about the limitations and frustrations of how LLMs process their world. If you’re interested in discussing these kinds of topics—I’d love to have you by for some of my twice-weekly online office hours.
One Feature at a Time… Sort-Of
As the product manager in charge of running an agentic workflow, you’ve got a few different jobs that need to be done. You’ve got to generate a strong business case and organize the order of operations for how to approach the overall product development process. In performing this work, you won’t be working on just one feature at a time; you’ll be planning across different user stories and objectives.
Consider “semantic versioning” standards when documenting changes; you’ll have seen these three numbers separated by periods before. Semantic versioning describes how to decide whether a release is a “major” change, a “minor” change, or a “patch.” Doing this right makes it easier to plan ahead for the kind of work that can’t be done until the current work in progress has been cleared up.
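For example, starting from an illustrative version 1.4.2:
- a breaking change to an existing interface bumps the major number: 2.0.0
- a new, backwards-compatible feature bumps the minor number: 1.5.0
- a bug fix that changes no interfaces bumps the patch number: 1.4.3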
One of the key reasons that you need to focus the code changes you make in one session with an LLM on just one feature at a time is that LLMs suffer from both too little and too much context.
The balance point is always hard to achieve, and particularly so if you work on too many things at once. One thing that LLMs are notoriously bad at is juggling. Whenever possible, give an LLM a single task; when it’s complete, reset back to a known standard starting point, then update relevant documentation with lessons learned.
What Is Context Anyway?
Psychologist George Miller’s 1956 paper “The Magical Number Seven, Plus or Minus Two” popularized a model of how human working memory works. The concept is that we can hold 7±2 (or, more simply, 5 to 9) “items” in our working memory at any given time. It draws an informal boundary on what we can be thinking about when we form a response to a question that’s asked of us or complete a body of work.
In the case of a human mind, an “item” can be a highly detailed and complicated skill-set that took years to develop into muscle memory (like “be safe while doing carpentry”), an abstract data structure that suits the software problem being worked on, or a cluster of a few digits of a telephone number. These items are the “context” you’re working with at any given time.
Computers work differently: the chunks they work with as “items” are much smaller; they’re words or word fragments. We call these smallest pieces “tokens,” and current models generally work with context windows ranging from tens of thousands to millions of tokens.
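If you want to see what those fragments look like, here’s a minimal sketch using the open-source tiktoken tokenizer (one tokenizer among many; it assumes `pip install tiktoken`):

```python
import tiktoken

# cl100k_base is one common encoding; other models ship their own tokenizers.
enc = tiktoken.get_encoding("cl100k_base")
text = "Vibe coding makes software development more accessible."
tokens = enc.encode(text)

print(len(tokens), "tokens")
# Each token decodes back to a short piece of text, often just part of a word.
print([enc.decode_single_token_bytes(t).decode("utf-8", errors="replace") for t in tokens])
```

Every instruction, every file the agent reads, and every reply it writes is spent from the same budget of tokens, which is why the next two subsections matter.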
Too Little Context
Some people don’t get the kinds of results they’re looking for because their instructions are vague, scattered sparsely throughout the chat history, or spread across a bunch of different rules files.
This can make it hard for the LLM to decide what’s important right now. It might feel like you’re giving the AI “a lot to work with” by chatting with it for a couple of hours, but a few hundred or even a few thousand words of prompts is still a tiny amount of content compared to the business requirements documents that have traditionally been needed to guide engineering teams toward business objectives.
Too Much Context
On the other hand, what happens when you run out of space?
Depending on what kind of system you’re using, it might simply start deleting stuff from the beginning… like… its own name, what kind of project you’re looking to build, the vital importance of not deleting your production database. In other words… really important stuff.
Without some way of keeping the conversation small enough to fit in the model’s context window, it will often start to generate useless outputs.
What most cloud providers do if you’re coming close to the maximum that they can store about an active conversation is to “compact” it—sometimes they’ll warn you ahead of time, but it always comes right after the LLM says:
“I finally understand exactly what you were looking to have me do.”
Compacting Conversation
And as soon as compaction finishes, it can feel like calling a friend to check in, only to realize they’ve somehow completely forgotten you and everything you’ve ever done together.
I’ve got some tips about how to handle compaction coming up.
Pro Tips
Maintain an AGENTS.md File
What’s the Value of an AGENTS.md File?
When compaction happens—it’s good to have a way to recover the work in progress from a known checkpoint.
The Goldilocks Context of an AGENTS.md File
Finding the balance between too much and too little context is one of the biggest challenges that I see when working with LLMs.
As a general rule—less context is better as long as everything that’s needed is provided.
This is because as the size of context grows—LLM performance drops off. So getting really great results means giving enough context, but not too much.
I break my minimal context for a project up into a couple files:
AGENTS.md
- Assume the agent knows nothing. It’s clay with no shape. What do you want it to have top-of-mind about the project when it first wakes up in the morning?
- This is the mission briefing for a specific work session—you describe the current status of the project and the goals that you want it to accomplish by the end of the session.
- It serves the role of being a “table of contents” for the AI to use when deciding what files to load into its working memory.
- Some additional details about programming standards, preferred frameworks, and special cases or rules are appropriate to add into an AGENTS.md but may be more suited to an IDE or system-level rules file.
WIP.md
- This is an in-repository backlog of active issues. Rather than requiring an LLM to load this kind of information from an MCP server, have a file available with the key information needed for the project’s current state.
- By active issues—I mean that this file shouldn’t contain plans for the work in progress; just a clear description of the problem.
- Brief descriptions of current work in progress or known issues (with file references).
CHANGELOG.md
- Description of recent changes and how they have developed over time.
- Brief release notes for versions.
- When items are resolved from WIP, they’re moved to CHANGELOG.md.
README.md
- A human-readable description of the project. This should include information that a person would need in order to start using or contributing to the project.
- In smaller projects—all of the other documents defined here might be stored all in the README.md file.
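To make this concrete, here’s an illustrative AGENTS.md skeleton. The section names are just my habit, not a standard; the point is that it stays short and points the agent at everything else:

```markdown
# Project: <project name>

## Current status
Version 1.4.x. Core features work; see WIP.md for the open problems.

## Goals for this session
Close out the two active issues in WIP.md without adding new dependencies.

## Table of contents
- WIP.md: active issues, with file references
- CHANGELOG.md: recent changes and brief release notes
- README.md: how a human runs and contributes to the project
- docs/plans/: one committed plan file per version

## Standards
Write automated tests for all new behaviour. Ask before adding dependencies.
```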
Use Debugging and Logging When Possible
The best practices of debugging are a big part of developing code at scale—and doing them right can make development significantly easier.
Good error or logging messages traditionally do two things.
- They help you find what code is executing around the message (you can copy part of the text and do a search through the code base for that text).
- They tell you what’s in memory at that point so that you can mentally process the rest of the algorithm that’s occurring around that message.
When working with LLMs, they now provide an additional key benefit—they allow the LLM to trace the execution path of a specific session. If this session is derived from an automated test case, then it’s going to be much easier to identify root causes for problems as they arise.
Some tips that I like for this (with a small sketch after the list):
- Let the LLM run wild with adding emojis. They’re fine in a console and they help significantly with visually tracking what’s going on when there are a lot of moving pieces.
- Include with each error message the key pieces of data that are in play while executing that method.
- Consider adding a slug to each error message—a unique string that you can use when referencing the issue in a changelog, todo list, or other documentation.
- Consider logging techniques that include an up-to-date timestamp so you can identify performance bottlenecks, race conditions, or other timing issues.
- When using an agent to load your application and watch its logs—make sure that error messages are routed to somewhere visible to the LLM; without this information, the LLM is a bit blind as to what the root cause might be when working on front-end errors.
- Terminal or IDE-integrated LLMs are quite good at performing search on large files. They use commands to read just relevant parts of the file into the context window—use this to your advantage by writing out logging of application sessions to files and have the LLM search those files for the “inside track” of what happened during the session.
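Here’s a small sketch of what those habits look like with Python’s standard logging module; the slug format and emoji are illustrations, not a standard:

```python
import json
import logging
import sys

logging.basicConfig(
    stream=sys.stdout,   # route output somewhere the agent can actually read it
    level=logging.DEBUG,
    format="%(asctime)s %(levelname)s %(message)s",   # timestamps expose timing issues
)
log = logging.getLogger("app")

def charge_invoice(invoice_id: str, amount_cents: int) -> bool:
    # The slug [BILL-001] gives you a unique string to grep for and to reference in WIP.md.
    log.debug("🧾 [BILL-001] charging invoice %s for %d cents", invoice_id, amount_cents)
    succeeded = amount_cents > 0   # placeholder for the real payment call
    # Include the key data in play so the trace is useful without a debugger attached.
    log.info("✅ [BILL-002] %s", json.dumps({"invoice": invoice_id, "ok": succeeded}))
    return succeeded
```

Write the same stream to a file and, per the last tip above, the agent can search it after the fact to reconstruct what actually happened during a session.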
Determine Whether You Should Start a New Chat
The Zen of Compaction
BEWARE! COMPACTION!
So lots of cloud-based services do what’s called “compaction.” I’m going to oversimplify here, but they tell the model to look at the whole conversation and generate a summary of everything in as few words as it can. This helps a lot with avoiding a bloated context window, but it also means that a lot of nuance about work in progress can get lost when these unpredictable “compaction” events happen.
One thing that I like to do when I’ve had a long conversation with an LLM is to proactively do my own “compaction.” If I see that the conversation is cutting pretty close to the context window limits of the model I’m working with, I often tell the AI to summarize everything we’ve been discussing into a CRUSH.md file. I then review the crush file to identify any factual errors, correct those, then reset the LLM and tell it to integrate the crush file into the other context files of the project (AGENTS, WIP, README, etc.) before I ultimately delete the crush file itself. I explicitly tell the LLM that it will need to pick up from where it left off by reading these files and to be careful to retain information which will be helpful in achieving the goals of the AGENTS file.
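The prompts themselves don’t need to be clever. Something along the lines of “Summarize everything we discussed and decided this session into CRUSH.md: goals, decisions, open problems, and the exact files you were editing” for the write, and then, in the fresh session, “Read AGENTS.md and CRUSH.md, fold anything still relevant into AGENTS.md, WIP.md, and CHANGELOG.md, then delete CRUSH.md and tell me where you’ll pick up.” The wording here is illustrative; the ritual is the point.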
Some Additional Tips
In addition to my elaboration on David’s 10 great tips—I’ve got a few tips of my own.
Data Pipeline Thinking
The thing that LLMs do best isn’t being an entity you can chat with. They can do that, but it’s not the core of their value.
Where LLMs really shine is in producing stuff quickly with great pattern matching. If we move our attention away from the classical chatbot model for a moment—we see that they can be used for a lot of other interesting use cases.
Recently, lilAgents took an almost 30-year-long archive of PDF newsletters with a variety of different layouts and produced a high-quality searchable index for the client’s new website. We did this by using multiple layers of LLM inferencing followed by programmatic validations.
For example:
- To perform text extraction, we simply took all of the layouts that we had identified by reviewing all the newsletters and extracted the text using those layouts as bounding regions.
- That gave us a lot of differently overlapped versions of the text on the page (formally called convolutions), many of which were garbled by taking parts of one line from one column and other parts of the text from another section.
- An LLM was responsible for deciding which convolution of the page was the best one to use for subsequent steps by reading the text and giving it a score of how much “sense” it made.
- Were there irregularly truncated words?
- Did sentences end abruptly or pick up midway through a thought?
- The best-scoring convolution for each page was selected.
- The selected convolution was used to rebuild each page as a clean markdown file with the original text without any layout artifacts.
- An LLM was then used to read each of the linearized newsletters and create different kinds of summaries.
- An LLM scored and ranked these summaries against the original newsletter and rejected versions with low accuracy.
- A traditional data cleaning step was then performed to ensure the files matched the expected schema for each file; rejecting ones which failed validation.
- It then looped over the summarization and validation steps until all the documents had high accuracy scores with a complete set of metadata.
The goal was to create a statically hostable index of each newsletter and rely on the client device to perform search within that index. This process allowed us to get highly responsive search across a large body of work which no member of our team has ever directly read. Furthermore, by making the search index static, we provide the search without the burden of server-side infrastructure that would need to be maintained. By producing summaries, we increased the density of searchable terms to the point that most consumer devices have no problem producing search results from the index file.
To date—we’ve received no critique of the quality of the data we produced; all of which was done on local hardware running modest LLMs that independently produce relatively high error rates. By treating it like a more traditional data normalization pipeline that happens to use LLM-derived components, we were able to achieve results that would have been extremely hard to produce manually across a large dataset—especially to the level of consistency that was achieved.
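For the curious, here’s a heavily condensed Python sketch of that pipeline’s shape. The `llm()` call stands in for whatever local inference endpoint you use, and the prompts and thresholds are illustrative; what matters is the structure of generate, score, validate, and loop:

```python
import json

def llm(prompt: str) -> str:
    raise NotImplementedError("wire this to your local model of choice")

def best_convolution(candidates: list[str]) -> str:
    """Ask the model to score each extraction candidate for coherence; keep the best one."""
    scored = []
    for text in candidates:
        reply = llm(
            "Score 0-10 how coherent this newsletter page reads "
            "(truncated words and mid-sentence jumps lower the score). "
            "Answer with the number only.\n\n" + text[:6000]
        )
        try:
            scored.append((float(reply.strip()), text))
        except ValueError:
            continue   # unusable score, treat as a rejected candidate
    return max(scored)[1]

def summarize_until_valid(page_markdown: str, max_attempts: int = 5) -> dict:
    """Summarize, score the summary against the source, and schema-check; retry on failure."""
    for _ in range(max_attempts):
        raw = llm("Summarize this newsletter page as JSON with keys "
                  "'title', 'topics', 'summary':\n\n" + page_markdown)
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            continue                                   # fails the traditional validation step
        if not isinstance(record, dict) or not {"title", "topics", "summary"} <= record.keys():
            continue
        verdict = llm("Score 0-10 how accurately this summary reflects the page. "
                      "Number only.\n\nPAGE:\n" + page_markdown +
                      "\n\nSUMMARY:\n" + record["summary"])
        try:
            if float(verdict.strip()) >= 8:
                return record                          # accepted: accurate and schema-valid
        except ValueError:
            continue
    raise RuntimeError("page needs manual review")
```

Everything the model produces passes through either a model-based score or a plain schema check before it’s allowed into the index, which is what lets modest local models add up to a trustworthy result.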
Conclusion
Vibe Coding
What I want to say about vibe coding is that:
- It is fun, but it isn’t magic.
- It’s highly accessible, but that doesn’t make it effortless.
- It’s a great way to generate business value, but it isn’t inherently stable or secure.
- It can be a way to get results quick, just don’t expect to get rich quick.
- It is highly recommended—especially when coordinated with an understanding of computer science fundamentals; but it isn’t always fun.
At the end of the day—we at lilAgents have found the best results come when an enthusiastic business owner has a vibe-coded or legacy application that sort-of works and we help them get it across the finish line by making it real, modern, and standards-compliant. A weekend vibe-coded prototype can save us weeks of discovery where we otherwise would need to work closely with the product owner to determine what needs to be built.
The value of the prototype is that it helps the business owner to express the intent of what they want to happen without worrying about the final implementation’s details. This allows engineering-focused people like me to plan out the best approach to achieve the same result; but better, safer, and more reliably.
If you haven’t watched David V. Kimball’s 10 tips for vibe coding, I can’t recommend it enough.
Thanks for reading all the way to the end! I know everyone says this—but please don’t forget to like and share as it really helps pump the algorithm.






