Unlock the Full Potential
of a Complex SQL System
with AI-Assisted Intelligence
An AI-powered strategy to autonomously map, document, and navigate a 500+ stored procedure codebase — secured by NVIDIA NemoClaw, operated daily by Claude Code.
Complex systems deserve better tooling
A large SQL codebase built over years carries deep business logic and hard-won institutional knowledge. The opportunity is to surface that knowledge — making it accessible to AI tools and new team members alike, without losing any of what the team has built.
Systems evolve faster than documentation
This is true of every production system at scale. The stored procedure inventory grew organically to serve real business needs — the challenge now is giving AI tools the context map they need to work effectively with it.
Deep knowledge is an asset — and a risk
The team holds significant expertise about how this system works. That expertise is valuable. This strategy captures it in a structured, durable format so it's available to everyone — not just the people who have been here longest.
AI tools need structured context to be reliable
Without a system map, any AI tool will explore blindly — burning through its working memory before it can reason about dependencies. The .md strategy gives Claude Code the context it needs to operate correctly on complex systems.
Onboarding complexity grows with the system
The more interconnected the stored procedures, the longer it takes for a new team member to build a reliable mental model. Structured context files dramatically reduce that ramp-up time for anyone joining the project.
A three-layer intelligent stack
The solution combines an autonomous AI agent with a context management strategy and enterprise security — each layer doing exactly what it's best at, and working with the existing codebase rather than around it.
NVIDIA NemoClaw
The autonomous agent. Runs an OpenClaw agent inside a secure OpenShell sandbox. Reads 500+ SQL files in batches, analyzes stored procedures, and generates the documentation files — with zero risk of data exfiltration.
- One-command install and deploy
- Kernel-level security enforcement
- Runs once to generate the knowledge base
- Policy-based — auditable via YAML
.md Context Strategy
The knowledge base. A hierarchy of Markdown files that maps the entire SQL codebase into structured, navigable documentation. Not human docs — AI-optimized context designed to be loaded efficiently.
- CLAUDE.md: global rules, always loaded
- context-index.md: navigation routing
- context-sql-[domain].md: per-module inventory
- Lives in Git — versioned like code
Claude Code or Junie
The daily driver. Developers interact with the .md files as source of truth through whichever AI coding agent the team already uses. Both Claude Code and JetBrains Junie are fully compatible with this strategy — and we already hold Junie licenses.
- Reads context files at session start
- Loads only the relevant module per task
- Operates at 20–30% context — not 80%+
- Updates .md files after each task
The team already has Junie licenses —
and the strategy works natively
The .md context strategy is not tied to any specific AI coding tool. It is based on an open standard that both Claude Code and JetBrains Junie read natively — which means there is no additional tooling cost for the daily development layer.
Native context file. Auto-loaded at session start. Designed specifically for Claude Code workflows.
// CLAUDE.md loaded automatically
Equivalent open standard. Auto-loaded by Junie at session start. Recognized by multiple agents simultaneously.
// same strategy, same result
Junie navigates and reads files autonomously — the same way Claude Code does. It can load context-index.md, identify which domain file is relevant, and load it without manual intervention.
Junie supports Anthropic Claude, OpenAI, Google, and others via BYOK. The team can use the same Claude model through Junie — maintaining quality parity with Claude Code.
The team already holds Junie licenses. Using Junie as the daily driver means the development layer of this strategy has no incremental licensing cost — only NemoClaw and the LLM API are new.
The team's Mac setup is
already sufficient
The entire stack runs on Apple Silicon Macs — no additional hardware required. With 64 GB of unified memory, the team can run local LLM inference directly on their machines, keeping sensitive data entirely on-premise if needed.
One-command install. Metal GPU acceleration automatic. Exposes an OpenAI-compatible API on localhost:11434 — exactly what NemoClaw needs as inference backend.
brew install ollama
# Run Nemotron (18 GB, 98 tok/s)
ollama run nemotron3-nano-30b
Apple's own ML framework — built from scratch for unified memory architecture. Noticeably faster tokens/sec than Ollama on the same model. Developer-oriented, no GUI.
pip install mlx-lm
# Run Nemotron via mlx-community
mlx_lm.generate \
  --model mlx-community/Nemotron-3-Nano-30B-4bit
Desktop app with model browser, Metal acceleration built in, and a local server compatible with OpenAI API. Good for exploring models before committing to one for the NemoClaw workflow.
lmstudio.ai
# Search and download
Nemotron-3-Nano-30B
Why AI tools need structured context
to work with large codebases
Before the .md strategy makes sense, it helps to understand one fundamental constraint of every AI model: the context window. This is why the strategy is necessary — and why without it, any AI tool will struggle on a system of this size, regardless of how good the tool is.
As more SQL files are loaded into a session, response quality degrades sharply.

At 80%+ context usage:
- ✗ Forgets decisions made 20 messages ago
- ✗ Contradicts earlier analysis
- ✗ Misses SP dependencies
- ✗ Re-asks for files already read
- ✗ Proposes changes conflicting with system rules

At 20–30% context usage:
- ✓ Tracks cross-SP dependencies accurately
- ✓ Consistent reasoning throughout
- ✓ Respects rules set at session start
- ✓ Catches dangerous operations
- ✓ Reliable from first to last message
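A back-of-envelope calculation shows why. All figures below are assumptions for illustration (500 SPs, roughly 300 lines each, roughly 10 tokens per line), not measurements of the real codebase:

```shell
# Rough context math: illustrative assumptions only.
sp_count=500        # stored procedures in the codebase
lines_per_sp=300    # assumed average SP length
tokens_per_line=10  # assumed rough tokenizer rate

full_load=$((sp_count * lines_per_sp * tokens_per_line))
echo "Loading every SP verbatim: ~$full_load tokens"

# One domain file summarizing ~60 SPs at ~8 lines per entry:
domain_file=$((60 * 8 * tokens_per_line))
echo "Loading one domain context file instead: ~$domain_file tokens"
```

Even with generous error bars, the verbatim load lands in the millions of tokens, past any current context window, while a summarized domain file fits comfortably inside the 20–30% budget.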
A curated knowledge layer
that grows with the team
The goal isn't to give the AI more context — it's to give it the right context, on demand. The .md file hierarchy is a structured navigation system that maps the entire SQL codebase into something an AI agent can traverse purposefully rather than blindly.
The rules that can never be ignored. DB engine, naming conventions, schemas, and SPs that require approval before touching. Auto-loaded every session — the agent reads this before your first message. Keep it under 50 lines: every line loads every time.
The navigation map. One job: tell the agent which domain file to load for each type of task. Also holds the master WARNING list of SPs that require special care across the entire system. Without this file, the agent would have to guess or explore.
The real inventory — one per business domain. For each stored procedure: name, one-line purpose, tables it reads, tables it writes, other SPs it calls, execution order if sequenced, and warnings for destructive operations. This is what turns a folder of files into a navigable system.
Generated first by NemoClaw before any domain analysis. Clusters all 500+ SQL files into business domains by analyzing file names, prefixes and folder structure. It becomes the blueprint that guides the rest of the generation process.
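A hypothetical fragment of what the generated domain-map.md might contain (domain names, prefixes, and file counts here are illustrative, not taken from the actual codebase):

```markdown
## billing (112 files)
Prefix: usp_bill_*. Invoice generation, cancellation, notifications.

## payments (87 files)
Prefix: usp_pay_*. Charges, refunds, batch transfers.

## uncategorized (6 files)
No prefix match. Review manually before assigning a domain.
```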
The generation strategy scales with the size of the codebase. The number of SQL files determines how many context files you'll need and how to organize the generation sessions.
- →Single analysis session covers everything
- →One context-sql.md for the whole codebase
- →CLAUDE.md + index + one domain file
- →One focused session per business domain
- →One context-sql-[domain].md per domain
- →context-index.md as navigation hub
- →Script pre-catalogs files before agent runs
- →NemoClaw clusters into domains automatically
- →Hierarchical index for large domain groups
Create CLAUDE.md manually
This is the only file written by hand. Keep it under 50 lines: global conventions, DB engine, naming rules, and any SPs that must never be modified without approval.
## System
- DB Engine: SQL Server 2019
- Schema: dbo (default), finance, audit
- Naming: SPs prefixed usp_, tables UPPER_SNAKE_CASE

## Critical Rules
- Never modify usp_close_period without DBA approval
- All financial SPs must log to AUDIT_LOG table
Run a single analysis session
Point the agent at the SQL folder. It reads all files, identifies what each SP does, and generates the full inventory in one pass.
For each stored procedure, generate an entry in /docs/context/context-sql.md with:
- SP name and one-line description
- Tables it reads (SELECT/FROM/JOIN)
- Tables it writes (INSERT/UPDATE/DELETE)
- Other SPs it calls (EXEC)
- Any warnings (destructive ops, no rollback)
Group entries by apparent business domain.
Generate the navigation index
Ask the agent to build the routing layer from what it just cataloged.
For each domain: "To work on [domain] → read context-sql.md §[section]".
Add a WARNING section for any SPs flagged as destructive.
Result — your file structure
CLAUDE.md ← always loaded, ~30 lines
context-index.md ← navigation hub
context-sql.md ← full SP inventory
Map file prefixes to business domains
Before any agent session, identify domains by looking at filename patterns. This determines how many context files you'll generate.
usp_bill_* → billing
usp_pay_* → payments
usp_rep_* → reporting
usp_inv_* → inventory
usp_cl_* → period-close (critical)
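A quick way to sanity-check the mapping is to count files per prefix before deciding the split. A minimal sketch, assuming the prefix convention above; the sample directory and file names are stand-ins for the real SQL folder:

```shell
# Count SQL files per usp_ prefix to estimate domain sizes.
# SQLDIR here is a throwaway sample; point it at the real folder instead.
SQLDIR=$(mktemp -d)
touch "$SQLDIR/usp_bill_generate.sql" "$SQLDIR/usp_bill_cancel.sql" "$SQLDIR/usp_pay_refund.sql"

# Extract the prefix segment and tally, largest domain first.
ls "$SQLDIR"/*.sql \
  | sed -E 's#.*/usp_([a-z]+)_.*#\1#' \
  | sort | uniq -c | sort -rn
```

Prefixes with large counts are candidates for the sub-domain split suggested for groups of 60+ files.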
One focused session per domain
Never mix domains in a single session — context bloat causes missed dependencies. One domain = one session = one context file.
Generate /docs/context/context-sql-billing.md with:
- Full SP inventory (name, purpose, tables R/W, calls made)
- Dependency map: which SPs call which
- Recommended execution order if applicable
- Cross-domain calls flagged explicitly
- Destructive operations flagged with [WARN]
Build the master index from all domain files
Generate context-index.md with:
- Domain routing: "For [task] → read [file]"
- Cross-domain dependency summary
- Master WARNING list (all [WARN] flags across all domains)
- Table ownership: which domain writes to each key table
Result — your file structure
CLAUDE.md ← always loaded
context-index.md ← master navigation
context-sql-billing.md
context-sql-payments.md
context-sql-reporting.md
context-sql-inventory.md
context-sql-period-close.md ← [WARN] flags
Pre-catalog all files with a script
For 400+ files, start with a script that generates a raw file manifest. This gives the agent a structured list to work from instead of discovering the filesystem blindly — which would consume context before any analysis begins.
find ./sql -name "*.sql" -type f \
| awk -F/ '{print NF-1, $0}' \
| sort > docs/context/raw-manifest.txt
# Output: every .sql file listed with its path
# Agent uses this as its working index
NemoClaw agent clusters files into domains
Let the agent analyze file names and folder paths to propose the domain grouping. Don't do this manually — it takes seconds and the agent spots patterns humans miss.
Read docs/context/raw-manifest.txt. Based on folder paths and naming prefixes, propose a domain grouping.
Output domain-map.md with:
- Domain name and file count
- Files that don't fit any domain (flag as "uncategorized")
- Suggested sub-domains for any group with 60+ files
Process each domain in a dedicated session
If the team has multiple Junie licenses, different team members can run different domains in parallel — cutting generation time significantly.
// Team member A — billing + payments
→ generates context-sql-billing.md
→ generates context-sql-payments.md
// Team member B — reporting + period-close (~80 files)
→ generates context-sql-reporting.md
→ generates context-sql-period-close.md
// Team member C — inventory + shared utilities
→ generates context-sql-inventory.md
→ generates context-sql-shared.md
Build hierarchical index
For very large systems, the index may need two levels — a master router and sub-indexes per major area — so the navigation layer itself stays lean.
context-index-financial.md ← billing, payments, period-close
context-index-operations.md ← inventory, logistics, shared
Review, commit to Git, and you're done
The team reviews the generated files — domain experts validate that the SP descriptions and warnings are accurate — then commits everything to the repository like any other code change.
CLAUDE.md
raw-manifest.txt ← script output
domain-map.md ← agent-generated clustering
context-index.md ← master router
context-index-financial.md
context-index-operations.md
modules/
context-sql-billing.md
context-sql-payments.md
context-sql-reporting.md
context-sql-period-close.md
context-sql-inventory.md
context-sql-shared.md
## System
- DB Engine: SQL Server 2019, instance PROD-SQL-01
- Default schema: dbo · Financial data: finance
- Naming: SPs prefixed usp_, tables UPPER_SNAKE_CASE
- Context files: /docs/context/
## Critical Rules
- [NEVER] Modify usp_cl_period_close without DBA sign-off
- [NEVER] DELETE on MOVEMENTS table directly — use usp_void_movement
- All writes to finance schema must use explicit transactions
- Log table: AUDIT_LOG — all financial SPs must insert here on completion
## Navigation
Read context-index.md to find which file to load for any task.
## Task Routing
| Task | Load this file |
|---|---|
| Invoice generation / billing | context-sql-billing.md |
| Payment processing | context-sql-payments.md |
| Financial reports / exports | context-sql-reporting.md |
| Month-end / period close | context-sql-period-close.md ⚠ |
| Inventory / stock movements | context-sql-inventory.md |
| Shared utilities / helpers | context-sql-shared.md |
## Cross-Domain Dependencies
- usp_bill_generate calls usp_pay_validate_credit (billing → payments)
- usp_cl_period_close calls usp_rep_snapshot_balances (close → reporting)
- All domains call usp_util_log_audit from context-sql-shared.md
## ⚠ WARNING — Review Required Before Touching
- usp_cl_period_close: irreversible fiscal period seal, no rollback
- usp_pay_batch_transfer: moves funds, triggers external bank API
- usp_inv_purge_old: bulk DELETE with no undo
// Generated by NemoClaw analysis agent · review before committing
### usp_bill_generate_invoice
Purpose: Creates a new invoice record from an approved order
Reads: ORDERS, ORDER_ITEMS, CUSTOMERS, TAX_CONFIG
Writes: INVOICES, INVOICE_LINES, AUDIT_LOG
Calls: usp_pay_validate_credit, usp_util_log_audit
Params: @order_id INT, @created_by NVARCHAR(50)
[Safe to modify with tests]
### usp_bill_cancel_invoice
Purpose: Voids an invoice and triggers credit note generation
Reads: INVOICES, PAYMENTS
Writes: INVOICES (status), CREDIT_NOTES, AUDIT_LOG
Calls: usp_pay_reverse_charge, usp_util_log_audit
[WARN] Cannot cancel if payment already settled — check PAYMENTS.status first
## Table Ownership (this domain writes to)
INVOICES, INVOICE_LINES, CREDIT_NOTES
## Execution Order (full billing cycle)
1. usp_bill_validate_order
2. usp_bill_generate_invoice
3. usp_bill_send_notification
4. usp_bill_update_order_status
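As a purely illustrative sketch of that order (the parameters of usp_bill_generate_invoice come from its Params line above; parameters for the other three procedures are assumed, and all values are invented):

```sql
-- Illustrative only: drives the billing cycle in its documented order.
EXEC usp_bill_validate_order      @order_id = 1042;
EXEC usp_bill_generate_invoice    @order_id = 1042, @created_by = N'jdoe';
EXEC usp_bill_send_notification   @order_id = 1042;
EXEC usp_bill_update_order_status @order_id = 1042;
```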
Once the files are generated and committed, this is what a normal development session looks like:
Tell me which context file to load, then load it.
Summarize what you know about this domain before we start.
Do
- After modifying any SP, update the relevant context-sql-[module].md as part of the task
- Treat context file updates as part of "done" — not optional cleanup
- Keep CLAUDE.md under 50 lines — it loads every session, every token counts
- Re-run NemoClaw on a domain when that domain has significant new SPs
Don't
- Put credentials, connection strings or server IPs in any context file
- Try to generate all domain files in one giant session — it degrades quality
- Mix domain analysis with actual code changes in the same session
- Treat context files as documentation for humans — they are behavior instructions for the AI
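The 50-line budget on CLAUDE.md is easy to enforce mechanically. A minimal pre-commit sketch, where the temp file and its sample contents stand in for the real docs/context/CLAUDE.md:

```shell
# Fail the commit if CLAUDE.md grows past its 50-line budget.
# (It loads every session, so every line costs tokens every time.)
CLAUDE_FILE=$(mktemp)   # stand-in; use docs/context/CLAUDE.md in the repo
printf '%s\n' "## System" "- DB Engine: SQL Server 2019" > "$CLAUDE_FILE"

lines=$(wc -l < "$CLAUDE_FILE")
if [ "$lines" -gt 50 ]; then
  echo "CLAUDE.md is $lines lines (budget: 50). Trim it."
  exit 1
fi
echo "CLAUDE.md OK ($lines lines)"
```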
The SQL files tell the agent what the code does. But this system was built on top of a specific tool that has its own logic, rules, and concepts. Without understanding how that tool works, the agent can read the code but can't reason about why it does what it does. That's where the second knowledge base comes in.
Generated from the 500+ SQL files. Tells the agent which stored procedures exist, what tables they read and write, what dependencies they have, and what order they must run in.
Reads: PERIODS, ACCOUNTS
Writes: LEDGER_ENTRIES, AUDIT_LOG
Calls: usp_validate_period
Generated from the tool's documentation library. Tells the agent what each table represents in the system, what business rules the code is enforcing, and what constraints can never be violated.
Central transaction register of the accounting module. Period must be open before any write.
LEDGER_ENTRIES is the central transaction table of the financial module.
Calls usp_validate_period first — required by system rules: period must be open before any ledger write.
[WARN] Writing during a closed period will corrupt the audit trail — the system has no rollback for this.
context-index.md ← unified navigation map
modules/
context-sql-billing.md
context-sql-payments.md
context-sql-reporting.md
context-sql-period-close.md
context-sql-inventory.md
context-sql-shared.md
modules/
system-financial.md ← financial module rules
system-inventory.md ← inventory module rules
system-reporting.md ← reporting module rules
system-glossary.md ← table/entity definitions
system-rules.md ← critical business constraints
Script generates manifest of all SQL files and PDF documents. NemoClaw proposes domain clustering for both simultaneously.
NemoClaw reads the tool's documentation by module. Extracts entity definitions, business rules, system constraints, and critical flows. Generates system-[domain].md files.
NemoClaw analyzes SQL files with system-[domain].md already loaded. Annotates each SP with real semantic meaning, not just technical facts. Generates context-sql-[domain].md files.
context-index.md routes both tracks. Team reviews, commits to Git. Knowledge base is live and complete from day one.
Agent reads SQL files without context. Produces technical annotations only — table names, call chains, execution order. A second pass adds system meaning afterward.
Writes: LEDGER_ENTRIES
Calls: usp_validate_period
// why? unknown at this stage
Agent reads SQL files already knowing the system. Produces technically complete AND semantically rich annotations in a single pass — no second pass needed.
Writes: LEDGER_ENTRIES ← central accounting register
Calls: usp_validate_period ← period must be open
[WARN] closed period write corrupts audit trail
NVIDIA NemoClaw —
the autonomous generation engine
Announced at GTC 2026, NemoClaw is OpenClaw with enterprise-grade security baked in. It runs an autonomous agent inside an OpenShell security sandbox — the agent that will analyze all 500+ SQL files and write the .md knowledge base for us.
"Every company needs an OpenClaw strategy" — Jensen Huang
NemoClaw is OpenClaw + OpenShell: a secure runtime that enforces policy-based guardrails at the kernel level. The agent runs inside an isolated sandbox with network deny-by-default, filesystem isolation, and all LLM calls proxied through the OpenShell gateway. API keys never touch the agent — they're injected at runtime.
Enterprise security
by design, not by policy
Security isn't a configuration option here — it's the fundamental architecture. OpenShell enforces isolation at the kernel level. Policies can't be bypassed by the agent, even if the agent is compromised.
Network isolation
Egress is denied by default. The agent cannot reach any external host unless explicitly listed in the policy YAML. For our use case: only the Claude API endpoint is allowed. Any other outbound attempt is blocked instantly and logged.
Filesystem isolation
The agent can only write to /sandbox and /tmp. Your SQL files are mounted read-only at /sql-input — the agent can read them but cannot modify, delete, or copy them anywhere outside the defined output path.
Process isolation
Landlock, seccomp, and network namespaces enforce at the kernel level. The agent cannot escalate privileges, fork unauthorized processes, or modify its own security constraints. This is the same model used in browser tab isolation.
Inference isolation
All LLM API calls route through the OpenShell gateway proxy. The agent never holds the API key — it's injected at runtime. This means even if the agent were compromised, it couldn't exfiltrate credentials.
Audit trail
Every LLM call, every network request (allowed or blocked), every file write is logged. The audit_log: true configuration captures what was sent to the model and what came back — full compliance traceability.
GitOps policy management
Security policies live in a YAML file that can be version-controlled and reviewed via pull request. Hot-reloadable — changes take effect without restarting the sandbox. This means policy changes have the same review process as code changes.
network:
  default: deny                 # deny ALL outbound by default
  allow:
    - host: api.anthropic.com   # only Claude API
      port: 443
      proto: https
filesystem:
  mounts:
    - host: /path/to/sql/files
      sandbox: /sql-input
      readonly: true            # agent can read, never modify
    - host: /path/to/md-output
      sandbox: /md-output
      readonly: false
inference:
  provider: anthropic
  model: claude-sonnet-4-6
  audit_log: true               # every LLM call logged
From zero to operational
in 6 steps
The full implementation path — from installing NemoClaw to having Claude Code use the generated knowledge base in daily development.
Install NemoClaw
One command installs the CLI, OpenShell runtime, Docker sandbox, and the onboarding wizard. Requires Ubuntu 22.04+, Node.js 20+, Docker. Sandbox image is ~2.4 GB.
curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash
# Run the interactive setup wizard
nemoclaw onboard
# Set inference provider key
export ANTHROPIC_API_KEY="sk-ant-..."
Configure security policy
Write the policy YAML and apply it to the sandbox. Network deny-by-default, SQL files mounted read-only, audit logging enabled.
nemoclaw policy apply sql-agent policy.yaml
# Verify sandbox is configured correctly
nemoclaw sql-agent status
Write agent identity (SOUL.md)
Define who the agent is and what it must/must not do. This is the agent's persistent identity — a system prompt that survives across runs.
You are a SQL codebase analyst. Read SPs, generate docs.
You do not execute SQL. You do not connect to databases.
## Rules
- Process files in batches of max 25 per domain
- Flag DELETE/DROP/TRUNCATE with [WARN]
- Never include credentials, IPs, or data values in output
- Write all output to /md-output/
Run the analysis agent
Connect to the sandbox and give the agent its task. For 500+ files, expect 2–4 hours. The agent runs autonomously — you can disconnect and reconnect.
# Paste this task to the agent:
Analyze all SQL files in /sql-input/.
Step 1: Group files by business domain → /md-output/domain-map.md
Step 2: For each domain, generate context-sql-[domain].md
Step 3: Generate context-index.md with routing + warnings
Step 4: Generate CLAUDE.md with global conventions
Process one domain fully before starting the next.
openshell gateway status # view security events
Extract and commit to Git
Copy the generated .md files from the mounted output directory to your project repository. Review, then commit like any other code.
# Review and commit
git add docs/context/
git commit -m "feat: add AI-generated SQL context files"
CLAUDE.md ← always loaded by Claude Code
context-index.md
domain-map.md
context-sql-billing.md
context-sql-payments.md
context-sql-reporting.md
context-sql-period-close.md
Daily use with Claude Code
From this point, NemoClaw is no longer needed day-to-day. Developers use Claude Code with the generated .md files as source of truth. The standard workflow:
Read context-index.md. Today's task: [describe task].
Tell me which context files to load, then load them.
Summarize what you know before we start.
Add any new SPs to the inventory.
Flag any new [WARN] conditions you identified.
What this costs vs.
what it returns
The investment is small. The return starts in week one and compounds every time a developer doesn't have to rediscover the system from scratch.
- Install NemoClaw
- Configure security policy
- Write agent identity
- Agent runs autonomously
- Generates all .md files
- Review + commit to Git
- 2 devs using Claude Code
- Track exploration time saved
- Refine context files
- Roll out to full team
- Knowledge base compounds
- Onboarding drastically faster
- +A structured, navigable map of the system — generated automatically
- +Context-building overhead cut from hours to minutes per task
- +New team member ramp-up time significantly reduced
- +SP dependencies visible before any change is made
- →NemoClaw just launched — early adoption builds institutional advantage
- →Pilot cost is minimal — proves value before full commitment
- →Knowledge base compounds in value the longer it's maintained
- →Team expertise is preserved in a durable, versioned format