AI Agent Suggestions For Training An AI Software Developer

This article is a personal R&D notebook on using AI agents to help refine a new process for training an AI software developer. The resulting agent will be used in a real-world environment, where I plan to implement a custom-tailored AI agent to assist with Store Locator Plus® coding and deployment. I am looking to learn about integrating both public AI tools and tools specific to the DGX Spark into a standard software development lifecycle. The bigger picture is to learn more about integrating customized AI into business processes so it acts as a force multiplier rather than a cool “parlor trick” of code suggestions, AI images, or AI slop cat videos.

This article is not meant to be analytical. It is meant to function as my long-term scratch pad that I can refer back to in the future.

AI Agent Notes

Of the agents tested (ChatGPT, Claude, Gemini, and Grok), Gemini had the weakest answer even though all agents were given the same prompt. It was interesting to see the overlap in the analysis and suggestions, but they did not all perform the task at the same level. My ranking for the quality and depth of response (given my limited knowledge of AI system development, and based purely on the plausibility and depth of the responses):

  1. Anthropic’s Claude Sonnet 4.5
  2. OpenAI ChatGPT 5
  3. xAI Grok 4
  4. Google Gemini 2.5

It will be interesting to see which of the AI LLM agents comes out on top in the coming years. They are all racing to the same “AGI Finish Line” (if they ever get there), but I personally think this will be a Highlander-like scenario where “there can only be one”, or maybe a couple. As with the browser wars and the search engine wars of the Internet days, we will likely see one emerge as the “go to”. Currently that is ChatGPT, but it is early in the game and first is not always best; ask Yahoo! how that works. There will be a couple of “don’t forget about us” stragglers that serve those humans who always refuse to be “in the mainstream”, but eventually this generation of dozens of AI agent services will atrophy to a few remaining contenders. It will be interesting to see which survive the AI gauntlet of the “mid 2020s”.

Asking The AI Agents For Help

The Prompt

I am looking to build an AI agent that can help refine an existing SaaS application. The agent needs to be proficient in updating existing code and following the architecture and design of the existing application.

The application is written primarily in PHP and JavaScript.
It is built on WordPress Multisite, using that as an application framework. The JavaScript is built primarily on the jQuery and Vue frameworks, which I would like to replace with React.

I am looking to build an AI agent using various technologies, including various public AI services. However, I do have an NVIDIA DGX Spark available and am considering training an open source model and refining the model using RAG, LoRA, or fine-tuning if viable.

My toolkit for coding is mostly designed around the JetBrains code editing tools, including phpStorm and the built-in AI agent that connects to various AI models including OpenAI GPTs, Anthropic Claude, or Gemini. I also have the NVIDIA AI Workbench installed but am not as familiar with that tool. I am not against moving to other IDEs such as VSCode or Cursor if they can accelerate the use of the new AI agent.

What technologies do you suggest given the above outline?

The Responses

Technology Comparison Matrix

  • Public Models/APIs
    • GPT-4 Turbo – GPT5, Claude 4.5, Gemini 2.5, Grok 4
    • Claude 3.5 – GPT5, Claude 4.5, Gemini 2.5, Grok 4
    • Gemini 1.5 – GPT5, Gemini 2.5, Grok 4
  • Local DGX Spark Models
    • DeepSeek-Coder 33B – GPT5, Claude 4.5
    • CodeLlama-34B – GPT5, Gemini 2.5
    • CodeLlama 70B – Claude 4.5, Gemini 2.5
    • StarCoder2 – GPT5, Gemini 2.5
    • Phind-CodeLlama – Gemini 2.5
    • Llama 3.1 (Meta) – Grok 4
    • Mistral 7B/8x7B – Grok 4
  • RAG Orchestration
    • LangChain – GPT5, Claude 4.5, Grok 4
    • LlamaIndex – GPT5, Claude 4.5
    • FAISS – Grok 4
  • Knowledge Indexing
    • ChromaDB – GPT5, Claude 4.5
    • Weaviate – GPT5, Claude 4.5
    • LlamaIndex – GPT5
  • Agent Workflow
    • OpenDevin – GPT5
    • AutoGPTQ – GPT5
  • Model Training
    • NeMo Framework – GPT5
  • Tools
    • Cursor w/ LangChain plugin – GPT5, Claude 4.5, Grok 4
    • JetBrains IDE w/ AI Agent – GPT5, Grok 4
    • VSCode w/ LangChain plugin – GPT5, Grok 4
    • NVIDIA AI Workbench – GPT5
    • GitHub Copilot – GPT5
    • Hugging Face – GPT5, Claude 4.5, Grok 4

OpenAI ChatGPT 5

1. Agent Framework

To orchestrate AI reasoning, file context, and tool execution:

  • LangChain or LlamaIndex – For managing context windows, RAG, and integrating code repositories.
  • OpenDevin or AutoGPTQ (emerging frameworks) – For supervised autonomous agent workflows.
  • NVIDIA NeMo Framework – Ideal for training and deploying open-source models locally on your DGX Spark; integrates with Workbench.

💡 Recommendation:
Use LangChain for RAG + NeMo for fine-tuning — a strong balance between flexibility and performance.

2. Foundation Models

You can combine public APIs for “reasoning” tasks with local models for code-specific tasks:

  • Public APIs (reasoning): GPT-4, Claude 3.5, Gemini 1.5 — best for architecture-level understanding, documentation generation, and reasoning.
  • Local models (code fine-tuned):
    • CodeLlama-34B / StarCoder2 / DeepSeek-Coder for PHP/JavaScript environments.
    • Fine-tune using LoRA adapters on your DGX Spark with NeMo or HuggingFace Transformers.
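
To get a feel for why LoRA adapters are attractive on a single machine, the sketch below counts trainable values for one weight matrix. It relies only on the standard LoRA formulation, where a frozen d_out × d_in weight is paired with two low-rank factors of rank r, adding r × (d_in + d_out) trainable values. The layer size and rank are illustrative assumptions, not figures taken from any of the models listed above.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable values LoRA adds to one d_out x d_in weight matrix.

    LoRA keeps the original weight frozen and learns two factors,
    A (rank x d_in) and B (d_out x rank).
    """
    return rank * (d_in + d_out)


def full_params(d_in: int, d_out: int) -> int:
    """Values updated when the same matrix is fully fine-tuned."""
    return d_in * d_out


# Hypothetical layer size and rank, chosen only for illustration.
D_IN, D_OUT, RANK = 4096, 4096, 16

lora = lora_trainable_params(D_IN, D_OUT, RANK)
full = full_params(D_IN, D_OUT)
print(f"LoRA: {lora:,} trainable values vs {full:,} for full fine-tuning")
print(f"Fraction trained: {lora / full:.4%}")
```

The same arithmetic repeats for every adapted matrix in the model, which is why the rank and the choice of which layers to adapt dominate the memory cost of a LoRA run.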

3. Retriever-Reader System

To make your AI agent “code-aware”:

  • Index your WordPress Multisite PHP codebase with LlamaIndex + ChromaDB or Weaviate.
  • Use semantic code embeddings (e.g., OpenAI text-embedding-3-large or local NVIDIA embedding models).
  • Insert inline architectural notes (custom metadata fields) for persistent context.
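
The indexing idea above can be sketched without committing to a particular library. The example below is a deliberately simplified stand-in: it splits PHP source into per-function chunks, attaches a metadata note to each chunk, and ranks chunks with a bag-of-words cosine similarity. A real setup would likely swap the scoring for an embedding model and a vector store such as ChromaDB or Weaviate. The file path, PHP function names, and the `note` field are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Chunk:
    path: str
    text: str
    metadata: dict = field(default_factory=dict)


def tokenize(text: str) -> Counter:
    """Lowercased word tokens (underscores split words); a crude stand-in for embeddings."""
    return Counter(re.findall(r"[a-z][a-z0-9]*", text.lower()))


def split_php_functions(path: str, source: str, note: str = "") -> list[Chunk]:
    """Split source before each `function name(` (naive, not a real PHP parser)."""
    parts = re.split(r"(?=\bfunction\s+\w+\s*\()", source)
    return [Chunk(path, p.strip(), {"note": note}) for p in parts if "function" in p]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(chunks: list[Chunk], query: str, k: int = 1) -> list[Chunk]:
    """Return the k chunks most similar to the query."""
    q = tokenize(query)
    return sorted(chunks, key=lambda c: cosine(tokenize(c.text), q), reverse=True)[:k]


# Hypothetical PHP source used only to exercise the sketch.
SAMPLE = """<?php
function slp_get_locations($site_id) { return query_locations($site_id); }
function slp_render_map($locations) { echo build_map_html($locations); }
"""

chunks = split_php_functions(
    "includes/locations.php", SAMPLE, note="Data layer; templates do not run SQL"
)
best = search(chunks, "render map html")[0]
print(best.path, "->", best.text.splitlines()[0])
print("note:", best.metadata["note"])
```

Keeping the architectural note in per-chunk metadata means it travels with the code whenever that chunk is retrieved, which is the "persistent context" the third bullet is after.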

4. Development Environment

| Tool | Function | Recommendation |
| --- | --- | --- |
| phpStorm / JetBrains AI | Code understanding, architecture alignment | Keep for PHP layer |
| Cursor / VSCode + LangChain plugin | Live AI agent building | Excellent for React modernization |
| NVIDIA AI Workbench | Model training + LoRA fine-tuning | Learn it; ideal for on-prem training |
| GitHub Copilot or JetBrains AI | Handle fast repetitive code conversions | Use in tandem for productivity |

5. Model Deployment & Inference

  • Triton Inference Server (bundled with DGX) → host fine-tuned or quantized models.
  • Expose as a private API for your IDE agents or plugins via REST/gRPC.
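
As a rough illustration of the REST option, the sketch below builds (but does not send) an inference request in the KServe v2 style that Triton's HTTP endpoint accepts. The host, port, model name `slp-coder`, and input tensor name `text_input` are assumptions; the tensor name, shape, and datatype would have to match whatever the deployed model's configuration actually declares.

```python
import json
import urllib.request


def build_infer_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a KServe v2 style inference request without sending it."""
    payload = {
        "inputs": [
            {
                # Assumed tensor name and shape; must match the model config.
                "name": "text_input",
                "shape": [1],
                "datatype": "BYTES",
                "data": [prompt],
            }
        ]
    }
    return urllib.request.Request(
        url=f"{base_url}/v2/models/{model}/infer",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical local endpoint and model name.
req = build_infer_request(
    "http://localhost:8000", "slp-coder", "Refactor this jQuery handler to React"
)
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the call against a running server; an IDE plugin or agent would wrap the same request shape behind its own client.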

🧩 Recommended Tech Stack Summary

| Category | Technology |
| --- | --- |
| Code Understanding | LangChain + LlamaIndex |
| Model Training | NVIDIA NeMo + HuggingFace |
| Embeddings | OpenAI or NVIDIA embeddings |
| Vector Store | ChromaDB, Weaviate, or PGVector |
| IDE Plugin Integration | VSCode LangChain, JetBrains AI |
| Inference Deployment | Triton Inference Server |
| Version Control | GitHub + AI Copilot PR suggestions |

Anthropic : Claude Sonnet 4.5

Primary Development Agent:
  • Cursor IDE (strongly recommended over JetBrains for this use case)

AI Model Strategy:
  1. Primary: Claude 3.5 Sonnet (via Cursor/API)
    • Excellent at understanding existing codebases and maintaining architectural patterns
    • Superior PHP and WordPress knowledge
    • Strong at incremental refactoring (jQuery/Vue → React)
  2. Secondary: GPT-4 Turbo
    • For specific React component generation
    • Alternative perspective on complex refactoring decisions
  3. Local Model (DGX Spark): CodeLlama 70B or DeepSeek Coder 33B
    • Fine-tune with RAG for your specific codebase patterns
    • Use for privacy-sensitive code analysis
    • Faster iteration for repetitive tasks
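
The three-tier strategy above amounts to a routing decision per task. A minimal sketch of that decision, assuming a task can be tagged with a kind and a privacy flag, might look like the following. The backend labels are placeholders rather than real API identifiers, and the rules are one possible reading of the list, not a prescribed policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    description: str
    kind: str  # e.g. "refactor", "react_component", "analysis"
    privacy_sensitive: bool = False


# Placeholder backend labels (assumptions, not real model identifiers).
PRIMARY = "hosted-primary"
SECONDARY = "hosted-secondary"
LOCAL = "local-dgx"


def choose_backend(task: Task) -> str:
    """Pick a backend following the primary/secondary/local split."""
    if task.privacy_sensitive:
        return LOCAL  # sensitive code stays on the local machine
    if task.kind == "react_component":
        return SECONDARY  # dedicated React component generation
    return PRIMARY  # default for refactoring and codebase questions


tasks = [
    Task("Convert the jQuery map widget to React", "refactor"),
    Task("Generate a React location card", "react_component"),
    Task("Audit license-key handling", "analysis", privacy_sensitive=True),
]
for t in tasks:
    print(f"{choose_backend(t):17} <- {t.description}")
```

Checking the privacy flag first is a deliberate choice in this sketch: a sensitive task goes to the local model even when its kind would otherwise send it to a hosted service.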

Knowledge Enhancement Strategy

RAG System (Recommended over LoRA/Fine-tuning initially): Your Codebase → Vector Database (Pinecone/Weaviate) → Context Injection

  • LangChain or LlamaIndex for RAG orchestration
  • ChromaDB or Pinecone for vector storage
  • Continue.dev (open-source Cursor alternative that integrates with JetBrains if you prefer staying there)
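
The last step of that pipeline, context injection, is mostly string assembly under a size budget. The sketch below assumes retrieval has already returned (path, code) pairs ordered by relevance and shows one way to fold them into a prompt. The character budget, the prompt wording, and the sample snippets are all illustrative assumptions; a real pipeline would more likely count tokens than characters.

```python
def build_prompt(question: str, snippets: list[tuple[str, str]], max_chars: int = 2000) -> str:
    """Assemble a prompt from (path, code) snippets, most relevant first.

    Snippets are added in order until the character budget would be exceeded.
    """
    sections = []
    used = 0
    for path, code in snippets:
        block = f"### {path}\n{code}\n"
        if used + len(block) > max_chars:
            break  # budget reached; drop the remaining, less relevant snippets
        sections.append(block)
        used += len(block)
    context = "\n".join(sections)
    return (
        "Follow the conventions shown in the existing code below.\n\n"
        f"{context}\n"
        f"Task: {question}\n"
    )


# Hypothetical retrieved snippets.
retrieved = [
    ("includes/locations.php", "function slp_get_locations($site_id) { /* ... */ }"),
    ("js/map.js", "jQuery('#map').on('click', handleClick);"),
]
print(build_prompt("Rewrite the map click handler as a React component", retrieved))
```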

Google : Gemini 2.5 Flash

Public AI Services (Initial Development & General Tasks): 

Continue leveraging services like OpenAI GPTs, Anthropic Claude, or Gemini through your JetBrains IDEs. These are excellent for general code generation, refactoring suggestions, documentation, and understanding existing codebases quickly.

Open Source Model (Specialized Refinement & Fine-tuning): 

Model Choice: Consider models like Code Llama, StarCoder, or Phind-CodeLlama.

Development Environment & Tooling
  • JetBrains IDEs (phpStorm)
  • VSCode / Cursor (for React Development)

xAI : Grok 4

  • Public AI APIs for Initial Development
    • OpenAI GPT-4o or GPT-4
    • Anthropic Claude 3.5 Sonnet
    • Google Gemini 1.5
  • Open-Source Models and Customization with NVIDIA DGX Spark
    • Base Models: Start with Llama 3.1 (Meta) or Mistral 7B/8x7B
  • Refinement
    • LoRA : PEFT from Hugging Face
    • RAG: LangChain, FAISS
  • Tools
    • JetBrains phpStorm
    • VSCode with Extensions
    • Cursor
