In today’s fast-paced development world, Large Language Models (LLMs) are becoming invaluable assistants. But what if you could build an AI agent that not only writes code but also plans its approach, asks for your approval, and even debugs its own work until it’s successful?
This tutorial will guide you through creating such an AI Coding Assistant using Python, the LangChain library for interacting with LLMs, and Ollama to run powerful open-source models locally. Our agent will take your request, propose a plan, get your green light, write the code, test it, debug it iteratively if needed, and finally, engage you with a thoughtful follow-up.
1. What You’ll Build:
An AI agent that can:
- Take your scripting request from a prompt file or interactive input.
- Propose a short, high-level plan and wait for your approval (or your edits).
- Generate the Python script, save it, run it, and scan the output for errors.
- Debug and regenerate the code iteratively until it runs cleanly or the attempt limit is reached.
- Follow up with a brief explanation and a relevant question once the script succeeds.
2. Prerequisites:
Python 3.10+: Ensure Python is installed on your system. The type hints used below (e.g., str | None) require Python 3.10 or newer.
Ollama: You need Ollama installed and running. Ollama allows you to run open-source LLMs like Llama 3, Mistral, Gemma, etc., locally.
Download Ollama: https://ollama.ai/
Pull a model: After installing Ollama, pull a model you want to use. For example, in your terminal:
ollama pull gemma3:12b
LangChain Libraries: Install the necessary Python packages:
pip install langchain langchain-community
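Once the packages are installed, it can help to verify that LangChain can reach your local Ollama server before running the full agent. Here is a minimal sanity check, assuming the default local Ollama setup and the gemma3:12b tag pulled above:

```python
from langchain_community.llms import Ollama

# Assumes the Ollama server is running locally and "gemma3:12b" has been pulled.
llm = Ollama(model="gemma3:12b")
print(llm.invoke("Reply with the single word: ready"))
```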
3. Code Deep Dive
Let’s break down the script’s components.
3.1. Configuration
import os, re, subprocess
import warnings

from langchain_community.llms import Ollama

warnings.filterwarnings(action="ignore")

# --- Configuration ---
MODEL_NAME = "gemma3:12b"       # Your Ollama model tag
MAX_ATTEMPTS = 5                # How many retry loops before giving up
PROMPT_FILE = "prompt.txt"      # Optional text file for your request
TEMP_SCRIPT = "temp_script.py"  # Where generated scripts get saved

# Patterns to catch errors even when exit code == 0
ERROR_PATTERNS = [
    r"Traceback \(most recent call last\):",
    r"Exception:", r"Error occurred", r"Error:",
    r"SyntaxError:", r"NameError:", r"TypeError:", r"AttributeError:",
    r"ImportError:", r"IndexError:", r"KeyError:", r"ValueError:", r"FileNotFoundError:"
]
MODEL_NAME: Specifies the Ollama model tag. Crucially, change this to a model you have downloaded.
MAX_ATTEMPTS: The maximum number of times the agent will try to generate and debug code for a single request after the plan is approved.
PROMPT_FILE: An optional text file (e.g., prompt.txt) where you can write your detailed script request. If this file isn’t found, the agent will ask for input directly.
TEMP_SCRIPT: The filename used to save and execute the LLM-generated Python code.
ERROR_PATTERNS: A list of regular expressions used to scan the output of the generated script for common error indicators.
3.2. Helper Functions
These functions perform essential tasks:
extract_code_block(text: str) -> str | None:

def extract_code_block(text: str) -> str | None:
    if not text:
        return None
    m = re.search(r"```(?:python)?\s*(.*?)\s*```", text, re.DOTALL)
    return m.group(1).strip() if m else None

Uses regular expressions to find and extract Python code enclosed in Markdown-style triple backticks (e.g., ```python ... ``` or ``` ... ```). The re.DOTALL flag is important for code blocks that span multiple lines.
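As a quick illustration of what this helper returns (the sample reply below is made up), feeding it a typical LLM response yields just the code body:

```python
fence = "`" * 3  # three backticks, built up so this example's own formatting stays readable
sample = f"Sure! Here is the script:\n{fence}python\nprint('hello')\n{fence}\nLet me know if it works."
print(extract_code_block(sample))  # -> print('hello')
```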
run_script(path: str, timeout: int = 180) -> tuple[int, str]:

def run_script(path: str, timeout: int = 180) -> tuple[int, str]:
    try:
        p = subprocess.run(
            ["python", path], capture_output=True, text=True,
            timeout=timeout, check=False
        )
        return p.returncode, (p.stdout or "") + (p.stderr or "")
    except subprocess.TimeoutExpired:
        return -1, f"⏰ Timeout after {timeout}s"
    except FileNotFoundError:
        return -1, f"❗ Script '{path}' not found."
    except Exception as e:
        return -1, f"❗ Error running script: {e}"

Executes the Python script saved at path using subprocess.run. It captures stdout and stderr, returns the script’s exit code, and handles potential timeouts or other execution errors.
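One detail worth knowing: the call invokes the bare "python" command, so whichever interpreter is first on your PATH runs the generated script. If you want to guarantee it is the same interpreter running the agent, a small optional variant (not part of the original script) is:

```python
import sys
import subprocess

# Run the generated script with the exact interpreter that is executing the agent.
p = subprocess.run(
    [sys.executable, "temp_script.py"],
    capture_output=True, text=True, timeout=180, check=False
)
```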
invoke_llm(llm_instance: Ollama, prompt: str, extract_code: bool = True) -> tuple[str | None, str]:

def invoke_llm(llm_instance: Ollama, prompt: str, extract_code: bool = True) -> tuple[str | None, str]:
    print("🧠 Thinking…")
    full = llm_instance.invoke(prompt)
    if extract_code:
        return extract_code_block(full), full
    return full, full

This is the gateway to your LLM. It sends a prompt, gets the full response text, and optionally tries to extract a code block. It prints a “Thinking…” message to let you know the LLM is working, keeping the actual prompt hidden for a cleaner interface.
save_code(code: str, path: str): A straightforward function to write the LLM-generated code to TEMP_SCRIPT.

output_has_errors(output: str) -> bool: Checks if the script’s captured output string contains any of the patterns listed in ERROR_PATTERNS. This helps detect failures even if the script exits with a return code of 0.
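The article doesn’t show these two helpers in full, but given the descriptions above, a minimal sketch of what they likely look like is:

```python
def save_code(code: str, path: str):
    # Write the LLM-generated code to disk so run_script() can execute it.
    with open(path, "w", encoding="utf-8") as f:
        f.write(code)

def output_has_errors(output: str) -> bool:
    # True if any known error pattern shows up in the captured output,
    # even when the script exited with return code 0.
    return any(re.search(pat, output or "") for pat in ERROR_PATTERNS)
```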
3.3. The main_interactive_loop() Function: Orchestrating the Agent
This is where the magic happens, following a clear, phased approach:
Phase 1 & 2: LLM Initialization and Loading User Request
def main_interactive_loop():
    print("\n🔮 AI Agent: Plan ▶ Confirm ▶ Generate ▶ Debug ▶ Follow-up 🔮\n")
    llm = None  # Initialize llm to None
    try:
        llm = Ollama(model=MODEL_NAME)
        print(f"🤖 LLM '{MODEL_NAME}' initialized.")
    except Exception as e:
        print(f"❌ Cannot start LLM '{MODEL_NAME}': {e}")
        print("   Ensure Ollama is running and the model name is correct (e.g., 'ollama list' to check).")
        return

    user_req_original = ""  # This will be updated in each iteration of the outer loop

    # Outer loop for continuous interaction
    while True:
        # 2) Load User Request (or get follow-up as new request)
        if not user_req_original:  # First time or after an explicit 'new'
            if os.path.isfile(PROMPT_FILE) and os.path.getsize(PROMPT_FILE) > 0:  # Check if prompt file exists and is not empty
                try:
                    with open(PROMPT_FILE, 'r+', encoding="utf-8") as f:  # Open in r+ to read and then truncate
                        user_req_original = f.read().strip()
                        f.seek(0)      # Go to the beginning of the file
                        f.truncate()   # Empty the file
                    if user_req_original:
                        print(f"📄 Loaded request from '{PROMPT_FILE}' (file will be cleared after use).")
                    else:  # File was empty
                        user_req_original = input("Enter your Python-script request (or type 'exit' to quit): ").strip()
                except Exception as e:
                    print(f"Error reading or clearing {PROMPT_FILE}: {e}")
                    user_req_original = input("Enter your Python-script request (or type 'exit' to quit): ").strip()
            else:
                user_req_original = input("Enter your Python-script request (or type 'exit' to quit): ").strip()

        if user_req_original.lower() == 'exit':
            print("👋 Exiting agent.")
            break
        if not user_req_original:
            print("❌ No request provided. Please enter a request or type 'exit'.")
            user_req_original = ""  # Reset to ensure it asks again
            continue

        current_contextual_request = user_req_original  # Initialize for the current task cycle
The LLM is initialized. Note the absence of StreamingStdOutCallbackHandler to prevent token-by-token printing of the LLM’s raw response. The user’s initial request for the script is loaded either from prompt.txt or by asking for input.
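If you do want to watch the raw generation stream by, LangChain can attach a streaming callback when the model is constructed. This is optional and not part of the agent as written; a sketch, assuming the MODEL_NAME constant from the configuration above:

```python
from langchain_community.llms import Ollama
from langchain_core.callbacks import StreamingStdOutCallbackHandler

# Optional: print tokens to stdout as the model generates them.
llm = Ollama(model=MODEL_NAME, callbacks=[StreamingStdOutCallbackHandler()])
```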
Phase 3: Planning and User Confirmation
        # 3) PLAN PHASE
        plan_approved = False
        plan_code = ""

        for plan_attempt in range(2):  # Allow one initial plan + one adjustment attempt
            print(f"\n🧠 Phase: Proposing Plan (Attempt {plan_attempt + 1}/2 for current request)")
            plan_prompt = (
                "You are an expert Python developer and system architect.\n"
                "Your task is to create a super short super high-level plan just in 3 to 5 sentences "
                "(in Python-style pseudocode with numbered comments) "
                "to implement the following user request. Do NOT write the full Python script yet, only the plan.\n\n"
                f"User Request:\n'''{current_contextual_request}'''\n\n"
                "Instructions for your plan:\n"
                "- Use numbered comments (e.g., # 1. Initialize variables).\n"
                "- Keep it high-level but clear enough to guide implementation.\n"
                "- Wrap ONLY the pseudocode plan in a ```python ... ``` block."
            )
            extracted_plan, plan_resp_full = invoke_llm(llm, plan_prompt)

            if not extracted_plan:
                print(f"❌ LLM did not return a plan in the expected format (attempt {plan_attempt + 1}).")
                if plan_attempt == 0:
                    retry_plan = input("Try generating plan again? (Y/n): ").strip().lower()
                    if retry_plan not in ("", "y", "yes"):
                        print("Aborting plan phase for current request.")
                        # Go to end of inner task cycle, which will then loop outer for new request
                        plan_code = None  # Signal plan failure
                        break
                else:  # Second attempt also failed
                    print("Aborting plan phase after adjustment attempt failed.")
                    plan_code = None  # Signal plan failure
                    break
                continue  # To next plan attempt

            plan_code = extracted_plan
            print("\n📝 Here’s the proposed plan:\n")
            print(plan_code)

            ok = input("\nIs this plan OK? (Y/n/edit) ").strip().lower()
            if ok in ("", "y", "yes"):
                plan_approved = True
                print("✅ Plan approved by user.")
                break
            elif ok == "edit":
                adjustment_notes = input("What should be adjusted in the plan or original request? (Your notes will be added to the request context): ").strip()
                if adjustment_notes:
                    current_contextual_request = f"{user_req_original}\n\nUser's Plan Adjustment Notes:\n'''{adjustment_notes}'''"
                    print("✅ Plan adjustment notes added. Regenerating plan...")
                else:
                    print("No adjustment notes provided. Assuming current plan is OK.")
                    plan_approved = True
                    break
            else:
                print("Plan not approved. This task will be skipped.")
                plan_code = None  # Signal plan rejection
                break  # Exit plan loop for this task

        if not plan_approved or not plan_code:
            print("❌ Plan not finalized or approved for the current request.")
            user_req_original = ""  # Reset to ask for a new request in the next outer loop iteration
            print("-" * 30)
            continue  # Go to next iteration of the outer while loop
This is a crucial interactive step:
- The plan_prompt asks the LLM for a short, high-level pseudocode plan (3 to 5 sentences of numbered pseudocode), not the full code.
- You answer Y (or just press Enter) to approve the plan, n to reject it (the task is skipped and the agent asks for a new request), or edit.
- If you choose edit, you can provide adjustment notes. These notes are appended to the original request to form current_contextual_request (see the sketch below), and the agent tries to generate an updated plan (one retry).
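To make the edit path concrete, here is roughly what the combined request looks like after adjustment notes are added (the request and notes below are illustrative values; the f-string format matches the one used in the plan phase):

```python
user_req_original = "get financial statements for tesla from yahoo finance and store them in csv files."
adjustment_notes = "Only keep the last four reporting periods."
current_contextual_request = f"{user_req_original}\n\nUser's Plan Adjustment Notes:\n'''{adjustment_notes}'''"
```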
Phase 4: Code Generation and Iterative Debugging

        # 4) GENERATE & DEBUG PHASE
        print("\n🧠 Phase: Generating and Debugging Code...")
        last_script_output = ""
        final_working_code = ""
        script_succeeded_this_cycle = False

        for attempt in range(1, MAX_ATTEMPTS + 1):
            print(f"🔄 Code Generation/Debug Attempt {attempt}/{MAX_ATTEMPTS}")
            gen_prompt = ""
            if attempt == 1:
                gen_prompt = (
                    "You are an expert Python programmer.\n"
                    "Based on the following **approved plan**:\n"
                    f"```python\n{plan_code}\n```\n\n"
                    "And the original user request (with any adjustment notes):\n"
                    f"'''{current_contextual_request}'''\n\n"
                    "Write a Python script as short and simple as possible. Ensure all necessary imports are included. "
                    "Focus on fulfilling the plan and request accurately.\n"
                    "Wrap your answer ONLY in a ```python ... ``` code block. No explanations outside the block."
                )
            else:  # Debugging
                gen_prompt = (
                    "You are an expert Python debugger.\n"
                    "The goal was to implement this plan:\n"
                    f"```python\n{plan_code}\n```\n"
                    "And this overall request:\n"
                    f"'''{current_contextual_request}'''\n\n"
                    "The previous attempt at the script was:\n"
                    f"```python\n{final_working_code}\n```\n"
                    "Which produced this output (indicating errors):\n"
                    f"```text\n{last_script_output}\n```\n\n"
                    "Please meticulously analyze the errors, the code's deviation from the plan, and the original request. "
                    "Provide a **fully corrected, complete Python script** that fixes the issues and aligns with the plan and request. "
                    "Wrap your answer ONLY in a ```python ... ``` code block."
                )

            code_block, code_resp_full = invoke_llm(llm, gen_prompt)
            if not code_block:
                print(f"❌ LLM did not return a code block in attempt {attempt}.")
                if attempt == MAX_ATTEMPTS:
                    break
                last_script_output = f"LLM failed to provide a code block. Response: {code_resp_full}"
                continue

            final_working_code = code_block
            save_code(final_working_code, TEMP_SCRIPT)
            print(f"💾 The following script was generated and saved to '{TEMP_SCRIPT}':\n\n{final_working_code}\n\nRunning…")

            rc, out = run_script(TEMP_SCRIPT)
            print(f"   ▶ Script Return code: {rc}")
            if len(out or "") < 600:
                print(f"   📋 Script Output:\n{out}")
            else:
                print(f"   📋 Script Output (last 500 chars):\n{(out or '')[-500:]}")
            last_script_output = out
Once the plan is approved:
- For attempt == 1, gen_prompt instructs the LLM to write the full Python script based on plan_code and current_contextual_request, and asks for a script that is "as short and simple as possible."
- If the run fails (a non-zero rc or error patterns found in out), subsequent attempts switch to a debugging gen_prompt that hands the LLM the code that just failed (final_working_code) and the error messages from the run (last_script_output), and explicitly asks it to analyze and correct the script.
- This generate, run, and debug cycle repeats for up to MAX_ATTEMPTS attempts.

Phase 5: Follow-up Question (After Success)
            if rc == 0 and not output_has_errors(out):
                print("\n✅🎉 Success! Script ran cleanly for the current request.")
                script_succeeded_this_cycle = True
                break  # Exit debug loop on success
            else:
                print("⚠️ Errors detected or non-zero return code; will attempt to debug...")

        if not script_succeeded_this_cycle:
            print(f"\n❌ All {MAX_ATTEMPTS} debug attempts exhausted for the current request. Last script is in '{TEMP_SCRIPT}'.")
            user_req_original = ""  # Reset to ask for new request
            print("-" * 30)
            continue  # Go to next iteration of the outer while loop

        # 5) FOLLOW-UP QUESTION PHASE (Only if script_succeeded_this_cycle is True)
        print("\n🧠 Phase: Follow-up")
        follow_up_context_prompt = (
            "You are a helpful AI assistant.\n"
            "The user had an initial request:\n"
            f"'''{user_req_original}'''\n"  # Use the original request for this specific cycle for context
            "An execution plan was approved:\n"
            f"```python\n{plan_code}\n```\n"
            "The following Python script was successfully generated and executed to fulfill this:\n"
            f"```python\n{final_working_code}\n```\n"
            "The script's output (last 500 chars) was:\n"
            f"```text\n{last_script_output[-500:]}\n```\n\n"
            "Now, explain the code first very shortly and then ask the user a concise and relevant follow-up question based on this success. "
            "For example, ask if they want to modify the script, save its output differently, "
            "run it with new parameters, or tackle a related task. Do not wrap your question in any special tags."
        )
        follow_up_question_text, _ = invoke_llm(llm, follow_up_context_prompt, extract_code=False)
        print(f"\n🤖 Assistant: {follow_up_question_text.strip()}")

        user_response_to_follow_up = input("Your response (or type 'new' for a new unrelated task, 'exit' to quit): ").strip()

        if user_response_to_follow_up.lower() == 'exit':
            print("👋 Exiting agent.")
            break  # Exit outer while loop
        elif user_response_to_follow_up.lower() == 'new':
            user_req_original = ""  # Clear it so it asks for a fresh prompt
        else:
            # Treat the response as a new request, potentially related to the last one.
            # The LLM doesn't have explicit memory of this Q&A for the *next* planning phase
            # unless we build that into the prompt. For now, it becomes the new user_req_original.
            user_req_original = (
                "The following Python script was successfully generated and executed:\n"
                f"```python\n{final_working_code}\n```\n"
                "The user had the following follow-up request: " + user_response_to_follow_up
            )

        print("-" * 30)  # Separator for the next cycle
If the script runs successfully, a follow_up_context_prompt is constructed, giving the LLM the full story: the initial request, the plan, the successful code, and a snippet of its output. The assistant then briefly explains the code and asks a follow-up question. Your reply either becomes the next request (with the working script included as context), starts a fresh task if you type 'new', or ends the session with 'exit'.

4. How to Use the AI Coding Assistant
Save the Code: Copy the entire Python script above and save it as a file, for example, ai_agent.py.
Set MODEL_NAME: Open ai_agent.py and change the MODEL_NAME variable to the exact tag of an LLM you have downloaded in Ollama (e.g., "llama3:8b", "mistral:latest", "gemma2:9b").
Run Ollama: Ensure your Ollama application is running and the chosen model is available.
Run the Agent: Open your terminal or command prompt, navigate to the directory where you saved ai_agent.py, and run:
python ai_agent.py
Interact: Enter your request (or place it in prompt.txt beforehand), review the proposed plan, and answer Y (or Enter) to approve, n to reject, or edit to provide adjustment notes. After a successful run, answer the follow-up question, type 'new' for an unrelated task, or 'exit' to quit.
Example Interaction:
🔮 AI Agent: Plan ▶ Confirm ▶ Generate ▶ Debug ▶ Follow-up 🔮
🤖 LLM 'gemma3:12b' initialized.
Enter your Python-script request (or type 'exit' to quit): get financial statements for tesla from yahoo finance and store them in csv files.
🧠 Phase: Proposing Plan (Attempt 1/2 for current request)
🧠 Thinking...
📝 Here’s the proposed plan:
# 1. Define functions: fetch_financial_data(ticker) to retrieve data from Yahoo Finance API, and save_to_csv(data, filename) to store it.
# 2. Initialize ticker symbol (e.g., "TSLA") and a list of financial statement types (e.g., ["income_stmt", "balance_sheet", "cash_flow"]).
# 3. Iterate through the list of financial statement types, calling fetch_financial_data() for each, and then save_to_csv() to store the retrieved data as CSV files.
# 4. Implement error handling within the loop to manage potential API issues or data retrieval failures (e.g., try-except blocks).
# 5. Add a main execution block to run the process only when the script is run directly, ensuring reusability.
Is this plan OK? (Y/n/edit) y
✅ Plan approved by user.
🧠 Phase: Generating and Debugging Code...
🔄 Code Generation/Debug Attempt 1/5
🧠 Thinking...
💾 The following script was generated and saved to 'temp_script.py':

import yfinance as yf
import pandas as pd

def fetch_financial_data(ticker):
    try:
        data = yf.Ticker(ticker).financials
        return data
    except Exception as e:
        print(f"Error fetching data for {ticker}: {e}")
        return None

def save_to_csv(data, filename):
    try:
        if data is not None:
            data.to_csv(filename)
            print(f"Data saved to {filename}")
        else:
            print(f"No data to save to {filename}")
    except Exception as e:
        print(f"Error saving to {filename}: {e}")

if __name__ == "__main__":
    ticker = "TSLA"
    financial_statements = ["income_stmt", "balance_sheet", "cash_flow"]
    for statement_type in financial_statements:
        data = fetch_financial_data(ticker)
        if data is not None:
            filename = f"{ticker}_{statement_type}.csv"
            save_to_csv(data, filename)

Running…
▶ Script Return code: 0
📋 Script Output:
Data saved to TSLA_income_stmt.csv
Data saved to TSLA_balance_sheet.csv
Data saved to TSLA_cash_flow.csv
✅🎉 Success! Script ran cleanly for the current request.
🧠 Phase: Follow-up
🧠 Thinking...
🤖 Assistant: The code retrieves financial statements (income statement, balance sheet, and cash flow) for Tesla (TSLA) from Yahoo Finance using the `yfinance` library and saves each statement as a separate CSV file. Error handling is included to manage potential issues during data fetching or saving.
Would you like to modify the script to retrieve data for a different ticker symbol?
Your response (or type 'new' for a new unrelated task, 'exit' to quit): exit
👋 Exiting agent.
5. Key Concepts Demonstrated
- Plan-then-execute prompting: the LLM first proposes a pseudocode plan and only writes code once that plan is approved.
- Human-in-the-loop control: the user confirms, rejects, or edits the plan before any code is generated or run.
- Iterative self-correction: the agent runs its own code, scans the output for error patterns, and feeds failures back into a debugging prompt.
- Local LLMs via Ollama and LangChain: everything runs on your machine with an open-source model.
- Code execution as a tool: generated scripts are run with subprocess and judged by their return code and output.
6. Potential Improvements & Customization
This agent is a strong foundation. Here are some ideas to extend it:
- Conversation memory: integrate LangChain's ConversationChain and memory modules if you want the follow-up interaction to be a longer, stateful conversation (a sketch follows this list).
- Safer execution: the agent runs LLM-generated code directly via temp_script.py; consider executing it in a sandbox (a container or a throwaway virtual environment) and cleaning the file up afterwards.
- Smarter error analysis: parse stderr more deeply to understand the root cause of errors during debugging.
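For the first idea, a minimal sketch of a stateful follow-up chat is shown below; it assumes the classic ConversationChain and ConversationBufferMemory APIs from the langchain package rather than the agent's raw invoke() calls:

```python
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
from langchain_community.llms import Ollama

# Hypothetical stateful wrapper the follow-up phase could use; not part of the original script.
llm = Ollama(model="gemma3:12b")
chat = ConversationChain(llm=llm, memory=ConversationBufferMemory())
print(chat.predict(input="Summarize what the last generated script did."))
print(chat.predict(input="Now adapt it for a different ticker."))  # earlier turns are remembered
```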
7. Conclusion
You’ve now explored the architecture of an AI Coding Assistant that goes beyond simple code generation. By incorporating planning, user confirmation, and robust iterative debugging, this agent provides a more intelligent and collaborative approach to leveraging LLMs for development tasks. The ability to run this locally with Ollama opens up many possibilities for customization and private, powerful AI assistance. Experiment with different models, refine the prompts, and happy coding!