AI Agent Suggestions For Training An AI Software Developer

This article is a personal R&D notebook on using AI agents to help refine a new process for training an AI software developer. This new AI agent will be used in a real-world environment where I plan to implement a custom-tailored AI agent to assist with Store Locator Plus® coding and deployment. I am looking to learn about integrating both public AI tools and tools specific to the DGX Spark into a standard software development lifecycle. The bigger picture is to learn more about integrating customized AI into business processes as a force multiplier, rather than a cool “parlor trick” of code suggestions, AI images, or AI slop cat videos.

This article is not meant to be analytical. It is meant to function more as my long-term scratch pad that I can refer back to in the future.

AI Agent Notes

Of the agents tested (ChatGPT, Claude, Gemini, and Grok), Gemini had the weakest answer, even though all agents were given the same prompt. The analyses and suggestions overlapped considerably, but the agents did not all perform the task at the same level. My ranking for quality and depth of response (based purely on the plausibility and depth of the responses, given my limited knowledge of AI system development):

Anthropic’s Claude Sonnet 4.5
OpenAI ChatGPT 5
xAI Grok 4
Google Gemini 2.5

It will be interesting to see which of the AI LLM agents comes out on top in the coming years. They are all racing to the same “AGI Finish Line” (if they ever get there), but I personally think this will be a Highlander-like scenario where “there can only be one”... or maybe a couple. Like the browser wars and the search engine wars of the early Internet era, we will likely see one emerge as the “go to.” Currently that is ChatGPT, but it is early in the game, and first is not always best (ask Yahoo! how that works). There will be a couple of “don’t forget about us” stragglers serving the humans who always refuse to be “in the mainstream,” but eventually this generation of dozens of AI agent services will atrophy to a few remaining contenders. It will be interesting to see which survive the AI gauntlet of the “mid-2020s.”

Asking The AI Agents For Help

The Prompt

I am looking to build an AI agent that can help refine an existing SaaS application. The agent needs to be proficient in updating existing code and following the architecture and design of the existing application.

The application is written primarily in PHP and JavaScript.
It is built on WordPress Multisite, using that as an application framework. The JavaScript is built primarily on the jQuery and Vue frameworks, which I would like to replace with React.

I am looking to build an AI agent using various technologies, including various public AI services. However, I do have an NVIDIA DGX Spark available and am considering training an open source model and refining it using RAG, LoRA, or fine-tuning if viable.

My coding tool kit is mostly built around the JetBrains code editing tools, including phpStorm and the built-in AI agent that connects to various AI models including OpenAI GPTs, Anthropic Claude, or Gemini. I also have the NVIDIA AI Workbench installed but am not as familiar with that tool. I am not against moving to other IDEs such as VSCode or Cursor if they can accelerate the use of the new AI agent.

What technologies do you suggest given the above outline?

The Responses

Technology Comparison Matrix

Public Models/APIs
- GPT-4 Turbo – GPT5, Claude 4.5, Gemini 2.5, Grok 4
- Claude 3.5 – GPT5, Claude 4.5, Gemini 2.5, Grok 4
- Gemini 1.5 – GPT5, Gemini 2.5, Grok 4
Local DGX Spark Models
- DeepSeek-Coder 33B – GPT5, Claude 4.5
- CodeLlama-34B – GPT5, Gemini 2.5
- CodeLlama 70B – Claude 4.5, Gemini 2.5
- StarCoder2 – GPT5, Gemini 2.5
- Phind-CodeLlama – Gemini 2.5
- Llama 3.1 (Meta) – Grok 4
- Mistral 7B/8x7B – Grok 4
RAG Orchestration
- LangChain – GPT5, Claude 4.5, Grok 4
- LlamaIndex – GPT5, Claude 4.5
- FAISS – Grok 4
Knowledge Indexing
- ChromaDB – GPT5, Claude 4.5
- Weaviate – GPT5, Claude 4.5
- LlamaIndex – GPT5
Agent Workflow
- OpenDevin – GPT5
- AutoGPTQ – GPT5
Model Training
- NeMo Framework – GPT5
Tools
- Cursor w/ LangChain plugin – GPT5, Claude 4.5, Grok 4
- JetBrains IDE w/ AI Agent – GPT5, Grok 4
  - Continue.dev – Claude 4.5 if staying with JetBrains IDE
- VSCode w/ LangChain plugin – GPT5, Grok 4
- NVIDIA AI Workbench – GPT5
- GitHub CoPilot – GPT5
- Hugging Face – GPT5, Claude 4.5, Grok 4

OpenAI ChatGPT 5

1. Agent Framework

To orchestrate AI reasoning, file context, and tool execution:

LangChain or LlamaIndex – For managing context windows, RAG, and integrating code repositories.
OpenDevin or AutoGPTQ (emerging frameworks) – For supervised autonomous agent workflows. (AutoGPTQ is actually a GPTQ model-quantization library; the autonomous agent project is AutoGPT.)
NVIDIA NeMo Framework – Ideal for training and deploying open-source models locally on your DGX Spark; integrates with Workbench.

💡 Recommendation:
Use LangChain for RAG + NeMo for fine-tuning — a strong balance between flexibility and performance.

2. Foundation Models

You can combine public APIs for “reasoning” tasks with local models for code-specific tasks:

Public APIs (reasoning): GPT-4, Claude 3.5, Gemini 1.5 — best for architecture-level understanding, documentation generation, and reasoning.
Local models (code fine-tuned):
- CodeLlama-34B / StarCoder2 / DeepSeek-Coder for PHP/JavaScript environments.
- Fine-tune using LoRA adapters on your DGX Spark with NeMo or HuggingFace Transformers.
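
For a sense of why LoRA adapters are practical on a single DGX Spark, the arithmetic below compares trainable parameters for one weight matrix. LoRA keeps the pretrained matrix W (d × k) frozen and learns a low-rank update B·A, with B of shape d × r and A of shape r × k, scaled by alpha / r (the `lora_alpha / r` scaling Hugging Face PEFT uses). This is a plain-Python sketch, not NeMo or PEFT code; the 4096 × 4096 layer and rank 16 are illustrative assumptions, not values from any of the responses.

```python
def lora_param_counts(d: int, k: int, r: int) -> tuple[int, int]:
    """Return (full fine-tuning params, LoRA params) for one d x k weight matrix."""
    full = d * k        # every entry of W is trainable in full fine-tuning
    lora = r * (d + k)  # B is d x r and A is r x k; W itself stays frozen
    return full, lora


def lora_forward(W, A, B, x, alpha: float, r: int) -> list[float]:
    """Compute y = W x + (alpha / r) * B (A x) using plain Python lists."""
    scale = alpha / r
    Wx = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    Ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]
    BAx = [sum(b * ai for b, ai in zip(row, Ax)) for row in B]
    return [wx + scale * bax for wx, bax in zip(Wx, BAx)]


if __name__ == "__main__":
    full, lora = lora_param_counts(4096, 4096, 16)
    print(full, lora, f"{lora / full:.2%}")  # 16777216 131072 0.78%
```

For that layer, LoRA trains 131,072 parameters instead of 16,777,216 (about 0.78%). Because B is typically initialized to zero, the adapted layer starts out behaving exactly like the frozen base model.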

3. Retriever-Reader System

To make your AI agent “code-aware”:

Index your WordPress Multisite PHP codebase with LlamaIndex + ChromaDB or Weaviate.
Use semantic code embeddings (e.g., OpenAI text-embedding-3-large or local NVIDIA embedding models).
Insert inline architectural notes (custom metadata fields) for persistent context.
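
The indexing idea above, code chunks stored with embeddings plus custom metadata such as an architectural note, can be sketched in a few lines. This is a minimal in-memory stand-in under stated assumptions: `embed()` is a toy hashed bag-of-words function, not a real embedding model (text-embedding-3-large or an NVIDIA model would replace it), `CodeIndex` plays the role ChromaDB or Weaviate would, and the file paths, plugin names, and notes are hypothetical.

```python
import hashlib
import math
import re
from dataclasses import dataclass, field

DIM = 256  # toy embedding size (assumption; real models use far larger vectors)


def embed(text: str) -> list[float]:
    """Hashed bag-of-words vector, normalized to unit length (stand-in for a real model)."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


@dataclass
class CodeChunk:
    path: str
    text: str
    metadata: dict = field(default_factory=dict)  # e.g. plugin name, architecture note
    vector: list = field(default_factory=list)


class CodeIndex:
    """In-memory vector index; ChromaDB or Weaviate would play this role in practice."""

    def __init__(self) -> None:
        self.chunks: list[CodeChunk] = []

    def add(self, path: str, text: str, **metadata) -> None:
        self.chunks.append(CodeChunk(path, text, metadata, embed(text)))

    def search(self, query: str, k: int = 3, **filters) -> list[CodeChunk]:
        q = embed(query)
        # Metadata filters narrow the candidates before similarity ranking.
        candidates = [c for c in self.chunks
                      if all(c.metadata.get(key) == val for key, val in filters.items())]
        # Both vectors are unit length, so the dot product equals cosine similarity.
        candidates.sort(key=lambda c: sum(a * b for a, b in zip(q, c.vector)), reverse=True)
        return candidates[:k]


if __name__ == "__main__":
    index = CodeIndex()
    index.add("slp-example/include/class.locations.php",
              "function get_locations( $radius ) { /* query store locations */ }",
              plugin="slp-example", note="Data access layer; reuse the shared DB helper")
    index.add("myslp-example/js/admin.js",
              "jQuery('#save-settings').on('click', saveSettings);",
              plugin="myslp-example", note="Legacy jQuery UI; React migration target")
    best = index.search("store locations radius query")[0]
    print(best.path, "|", best.metadata["note"])
```

The metadata filter (`plugin=...`) and the attached `note` are what let the agent pull architectural guidance along with the matching code.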

4. Development Environment

Tool	Function	Recommendation
phpStorm / JetBrains AI	Code understanding, architecture alignment	Keep for PHP layer
Cursor / VSCode + LangChain plugin	Live AI agent building	Excellent for React modernization
NVIDIA AI Workbench	Model training + LoRA fine-tuning	Learn it; ideal for on-prem training
GitHub Copilot or JetBrains AI	Handle fast repetitive code conversions	Use in tandem for productivity

5. Model Deployment & Inference

Triton Inference Server (bundled with DGX) → host fine-tuned or quantized models.
Expose as a private API for your IDE agents or plugins via REST/gRPC.
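
Exposing a Triton-hosted model as a private REST API means clients send JSON to Triton's HTTP endpoint, which follows the KServe v2 inference protocol (`POST /v2/models/<name>/infer`). A minimal sketch using only the Python standard library; the host name, the model name `slp-coder`, and the input tensor names `text_input` and `max_tokens` are assumptions, since the real names come from the deployed model's configuration.

```python
import json
import urllib.request

TRITON_URL = "http://dgx-spark.local:8000"  # hypothetical host; 8000 is Triton's default HTTP port
MODEL_NAME = "slp-coder"                     # hypothetical name of the deployed model


def build_infer_request(prompt: str, max_tokens: int = 256) -> urllib.request.Request:
    """Build a KServe v2 /infer request; tensor names must match the model's config."""
    body = {
        "inputs": [
            {"name": "text_input", "shape": [1, 1], "datatype": "BYTES", "data": [prompt]},
            {"name": "max_tokens", "shape": [1, 1], "datatype": "INT32", "data": [max_tokens]},
        ]
    }
    return urllib.request.Request(
        url=f"{TRITON_URL}/v2/models/{MODEL_NAME}/infer",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def infer(prompt: str) -> dict:
    """Send the request and return the decoded JSON response (needs a running server)."""
    with urllib.request.urlopen(build_infer_request(prompt), timeout=60) as resp:
        return json.load(resp)
```

`infer()` only works against a running server; `build_infer_request()` can be inspected offline. Triton listens for HTTP on port 8000 and gRPC on 8001 by default.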

🧩 Recommended Tech Stack Summary

Category	Technology
Code Understanding	LangChain + LlamaIndex
Model Training	NVIDIA NeMo + HuggingFace
Embeddings	OpenAI or NVIDIA embeddings
Vector Store	ChromaDB, Weaviate, or PGVector
IDE Plugin Integration	VSCode LangChain, JetBrains AI
Inference Deployment	Triton Inference Server
Version Control	GitHub + AI Copilot PR suggestions

Anthropic: Claude Sonnet 4.5

Primary Development Agent:

Cursor IDE (strongly recommended over JetBrains for this use case)

AI Model Strategy:

Primary: Claude 3.5 Sonnet (via Cursor/API)
- Excellent at understanding existing codebases and maintaining architectural patterns
- Superior PHP and WordPress knowledge
- Strong at incremental refactoring (jQuery/Vue → React)
Secondary: GPT-4 Turbo
- For specific React component generation
- Alternative perspective on complex refactoring decisions
Local Model (DGX Spark): CodeLlama 70B or DeepSeek Coder 33B
- Fine-tune with RAG for your specific codebase patterns
- Use for privacy-sensitive code analysis
- Faster iteration for repetitive tasks

Knowledge Enhancement Strategy

RAG System (Recommended over LoRA/Fine-tuning initially): Your Codebase → Vector Database (Pinecone/Weaviate) → Context Injection

LangChain or LlamaIndex for RAG orchestration
ChromaDB or Pinecone for vector storage
Continue.dev (open-source Cursor alternative that integrates with JetBrains if you prefer staying there)
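
The “Context Injection” step in Claude's pipeline amounts to placing the best-ranked code excerpts in front of the task without overflowing the model's context window. A hedged sketch: characters stand in for tokens (a real pipeline would count tokens with the model's tokenizer), and the prompt wording and the `build_prompt` name are hypothetical.

```python
def build_prompt(question: str, chunks: list[tuple[str, str]], budget_chars: int = 4000) -> str:
    """Inject retrieved (path, code) chunks ahead of the task, within a size budget.

    Chunks are assumed to arrive best-first from the retriever; once the budget
    is used up, the remaining lower-ranked chunks are dropped whole.
    """
    header = "Follow the existing architecture shown in these excerpts.\n\n"
    footer = f"Task: {question}"
    used = len(header) + len(footer)
    parts: list[str] = []
    for path, code in chunks:
        block = f"### {path}\n{code}\n\n"
        if used + len(block) > budget_chars:
            break
        parts.append(block)
        used += len(block)
    return header + "".join(parts) + footer
```

Dropping whole lower-ranked chunks, rather than truncating one mid-function, keeps every excerpt syntactically complete, which matters when the goal is to imitate the existing architecture.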

Google: Gemini 2.5 Flash

Public AI Services (Initial Development & General Tasks):

Continue leveraging services like OpenAI GPTs, Anthropic Claude, or Gemini through your JetBrains IDEs. These are excellent for general code generation, refactoring suggestions, documentation, and understanding existing codebases quickly.

Open Source Model (Specialized Refinement & Fine-tuning):

Model Choice: Consider models like Code Llama, StarCoder, or Phind-CodeLlama.

Development Environment & Tooling

JetBrains IDEs (phpStorm)
VSCode / Cursor (for React Development)

xAI: Grok 4

Public AI APIs for Initial Development
- OpenAI GPT-4o or GPT-4
- Anthropic Claude 3.5 Sonnet
- Google Gemini 1.5
Open-Source Models and Customization with NVIDIA DGX Spark
- Base Models: Start with Llama 3.1 (Meta) or Mistral 7B/8x7B
Refinement
- LoRA: PEFT from Hugging Face
- RAG: LangChain, FAISS
Tools
- JetBrains phpStorm
- VSCode with Extensions
- Cursor

Claude Agent (Sonnet 4.5) via JetBrains AI Assistant

Recommended Architecture: MCP-based RAG with Local DGX Spark Backend

Modern Approach (Better than LangChain/LlamaIndex)

1. Model Context Protocol (MCP) Server

JetBrains already supports MCP (as you’re using it now)
Create a custom MCP server running on your DGX Spark
Direct integration with JetBrains AI Assistant – no middleware needed
Real-time code context without file exports

2. DGX Spark Backend Stack

- Embedding Model: nomic-embed-text v1.5 or BGE-M3 (multilingual, code-aware)
- Vector DB: Qdrant or Milvus (high performance, local deployment)
- Reranker: Cohere Rerank or BGE-reranker
- LLM: DeepSeek-Coder-V2 or Qwen2.5-Coder (superior to GPT-4 for code)

3. Key Advantages Over LangChain/LlamaIndex

Direct integration: MCP eliminates abstraction layers
Real-time awareness: Watches file changes via JetBrains APIs
Lower latency: Local LAN communication (~1ms vs cloud)
Context-aware: Accesses open files, git state, dependencies automatically

Implementation Plan

Phase 1: MCP Server on DGX Spark

# Custom MCP server with tools:
- code_search(query, file_patterns, semantic=True)
- get_architecture_context(component)
- suggest_refactor(file, line_range)
- find_dependencies(class_name)
- explain_code_flow(entry_point)
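
To show what those tool signatures might look like behind an MCP server, here is a stdlib-only sketch of the request handling. MCP speaks JSON-RPC 2.0 and exposes tools through the `tools/list` and `tools/call` methods; this sketch answers only those two (no `initialize` handshake or transport), implements just `code_search` and `find_dependencies` with simple string and regex matching, and runs over a hypothetical in-memory `REPO` with made-up class names. A production server would more likely be built on an MCP SDK, such as the official Python SDK.

```python
import fnmatch
import json
import re
from typing import List, Optional

# Hypothetical in-memory snapshot of two plugin files; a real server would read the repo.
REPO = {
    "slp-example/include/class.example-addon.php":
        "class SLP_Example_Addon extends SLP_Example_Base { }",
    "store-locator-plus/include/class.example-base.php":
        "class SLP_Example_Base { }",
}


def code_search(query: str, file_patterns: Optional[List[str]] = None, semantic: bool = True) -> str:
    """Return paths whose source contains the query (keyword match only)."""
    # semantic=True would route to the vector index in a real server; it is ignored here.
    patterns = file_patterns or ["*"]
    hits = [path for path, src in REPO.items()
            if any(fnmatch.fnmatch(path, p) for p in patterns) and query.lower() in src.lower()]
    return "\n".join(sorted(hits))


def find_dependencies(class_name: str) -> str:
    """Return the parent class named in the class's extends clause, if any."""
    for src in REPO.values():
        m = re.search(rf"class\s+{re.escape(class_name)}\s+extends\s+(\w+)", src)
        if m:
            return m.group(1)
    return ""


TOOLS = {"code_search": code_search, "find_dependencies": find_dependencies}


def _error(req_id, code: int, message: str) -> str:
    return json.dumps({"jsonrpc": "2.0", "id": req_id, "error": {"code": code, "message": message}})


def handle(message: str) -> str:
    """Answer one JSON-RPC 2.0 request using MCP's tools/list and tools/call shapes."""
    req = json.loads(message)
    method, params = req.get("method"), req.get("params", {})
    if method == "tools/list":
        result = {"tools": [{"name": name, "description": fn.__doc__,
                             "inputSchema": {"type": "object"}}
                            for name, fn in TOOLS.items()]}
    elif method == "tools/call":
        fn = TOOLS.get(params.get("name"))
        if fn is None:
            return _error(req.get("id"), -32602, "Unknown tool")
        text = fn(**params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": text}]}
    else:
        return _error(req.get("id"), -32601, "Method not found")
    return json.dumps({"jsonrpc": "2.0", "id": req.get("id"), "result": result})
```

Each remaining tool (`get_architecture_context`, `suggest_refactor`, `explain_code_flow`) would be another entry in `TOOLS`, most likely backed by the RAG index rather than by regexes.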

Phase 2: RAG Pipeline

1. Index all plugins incrementally (watch for changes)
2. Chunk by semantic units (classes, functions, not fixed sizes)
3. Store metadata: git history, dependencies, call graphs
4. Build knowledge graph of WordPress hooks/filters
5. Track architectural patterns per plugin
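
Step 2, chunking by semantic units, can be approximated by splitting PHP source at top-level declarations and keeping each class or function whole. A naive sketch: it matches braces by counting, so braces inside strings or comments will mislead it, and a real pipeline would use a proper PHP parser (for example nikic/PHP-Parser or tree-sitter's PHP grammar). The `Chunk` fields are an assumed shape for the metadata in step 3.

```python
import re
from dataclasses import dataclass

# Top-level declarations only; modifiers other than abstract/final are not handled.
DECL = re.compile(
    r"^[ \t]*(?:(?:abstract|final)[ \t]+)?(class|interface|trait|function)[ \t]+(\w+)",
    re.MULTILINE,
)


@dataclass
class Chunk:
    kind: str        # "class", "interface", "trait", or "function"
    name: str
    start_line: int  # 1-based line number, useful as retrieval metadata
    text: str


def chunk_php(source: str) -> list[Chunk]:
    """Split PHP source into one chunk per top-level class/interface/trait/function."""
    chunks: list[Chunk] = []
    pos = 0
    while (m := DECL.search(source, pos)) is not None:
        i = source.find("{", m.end())
        if i == -1:
            break
        depth = 0
        while i < len(source):  # walk forward to the matching closing brace
            if source[i] == "{":
                depth += 1
            elif source[i] == "}":
                depth -= 1
                if depth == 0:
                    break
            i += 1
        start_line = source.count("\n", 0, m.start()) + 1
        chunks.append(Chunk(m.group(1), m.group(2), start_line, source[m.start():i + 1]))
        pos = i + 1  # methods stay inside their class chunk
    return chunks
```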

Phase 3: JetBrains Integration

- Add MCP server config to JetBrains settings
- Create AI Assistant Rules that query your RAG
- Use "Starting Instructions" to load architecture context
- Expose tools via MCP for refactoring workflows
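
For the config step, many MCP clients accept a JSON block in the `mcpServers` format (a command plus arguments that launches a stdio server); whether the JetBrains AI Assistant settings accept it in exactly this form should be checked against its documentation. The sketch below generates such an entry that starts the server on the DGX Spark over SSH; the server name, SSH login, and script path are all hypothetical.

```python
import json
import shlex


def mcp_server_config(name: str, host: str, remote_cmd: str) -> dict:
    """Build an 'mcpServers' entry that launches a stdio MCP server on a remote box via SSH."""
    return {
        "mcpServers": {
            name: {
                "command": "ssh",
                # -T disables TTY allocation so the JSON-RPC stream on stdin/stdout passes through cleanly
                "args": ["-T", host] + shlex.split(remote_cmd),
            }
        }
    }


if __name__ == "__main__":
    cfg = mcp_server_config(
        name="slp-rag",                               # hypothetical server name
        host="dev@dgx-spark.local",                   # hypothetical SSH login for the DGX Spark
        remote_cmd="python3 /opt/slp-mcp/server.py",  # hypothetical path to the MCP server script
    )
    print(json.dumps(cfg, indent=2))
```

Tunneling stdio over SSH avoids opening a network port on the DGX Spark; an HTTP-based MCP transport is the alternative if several developers share one server.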

Specific for Your Environment

Based on the plugins I see (store-locator-plus, myslp-, slp-):

Index WordPress-specific patterns (hooks, filters, shortcodes)
Track inter-plugin dependencies
Map SaaS-specific architecture (payments, signup, dashboard)
Understand PHP namespace structures
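
Indexing WordPress-specific patterns and inter-plugin dependencies can start with static extraction of hook, filter, and shortcode calls plus each file's PHP namespace. The sketch below uses regexes, so it only sees hook names written as string literals (dynamic names built from variables are missed), and it assumes the first path segment is the plugin folder; the sample files and hook names are hypothetical.

```python
import re
from collections import defaultdict

# Only string-literal hook names are captured; dynamically built names are skipped.
HOOK_CALL = re.compile(
    r"\b(add_action|add_filter|do_action|apply_filters|add_shortcode)\s*\(\s*['\"]([^'\"]+)['\"]"
)
NAMESPACE = re.compile(r"^[ \t]*namespace[ \t]+([\w\\]+)[ \t]*;", re.MULTILINE)


def index_wordpress_patterns(files: dict) -> dict:
    """Map hook/shortcode name -> calling function -> set of files, plus file namespaces."""
    hooks = defaultdict(lambda: defaultdict(set))
    namespaces = {}
    for path, source in files.items():
        for func, name in HOOK_CALL.findall(source):
            hooks[name][func].add(path)
        match = NAMESPACE.search(source)
        if match:
            namespaces[path] = match.group(1)
    return {"hooks": hooks, "namespaces": namespaces}


def _plugin(path: str) -> str:
    return path.split("/")[0]  # assumes the first path segment is the plugin folder


def cross_plugin_links(hooks) -> list:
    """List (hook, firing plugin, listening plugin) where the two plugins differ."""
    links = set()
    for hook, by_func in hooks.items():
        fired = by_func.get("do_action", set()) | by_func.get("apply_filters", set())
        heard = by_func.get("add_action", set()) | by_func.get("add_filter", set())
        for f in fired:
            for h in heard:
                if _plugin(f) != _plugin(h):
                    links.add((hook, _plugin(f), _plugin(h)))
    return sorted(links)
```

A hook fired in `store-locator-plus` and listened to in an add-on is exactly the kind of cross-plugin edge the Phase 2 knowledge graph is meant to capture.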

Links

NVIDIA DGX Spark Build Examples

Feature Image
AI Generated by Nano Banana
via Galaxy.ai