status: accepted
date: 2026-03-06
decision-makers: Tyler Granlund (Pack Leader), Planning Agent, Solutions Architect
consulted: Security Auditor, Experience Architect, Web Puppy (research)
informed: All Code Puppy agents, development team
Context and Problem Statement
The Azure Governance Platform requires an AI-powered development assistant to help with architecture decisions, code implementation, testing, security audits, and UX design. A key decision was whether to build this as a single monolithic agent or as a specialized multi-agent system. Given the diverse skill sets required (backend development, frontend design, security analysis, testing, documentation), how should we structure the AI assistant to maximize effectiveness while maintaining code quality and security?
Decision Drivers
- Specialization: Different tasks require different expertise (e.g., Python development vs accessibility testing)
- Parallel work: Need ability for multiple agents to work simultaneously on different issues
- Clear accountability: Each task should have a clearly responsible agent
- Security: Need to limit tool access per agent (least privilege principle)
- Code quality: Specialized reviewers for different domains (Python, security, UX)
- Scalability: System should scale to handle complex projects with many simultaneous tasks
- Traceability: Need clear audit trail of which agent made which decisions
- Context management: A single agent would have to context-switch constantly between vastly different tasks
Considered Options
- Monolithic agent - Single general-purpose agent handles all tasks
- Multi-agent system (29 specialized agents) - Each agent has specific expertise and limited tool access
- Hybrid approach - Small number of "specialist" agents (5-10) with broader capabilities
- Tool-based specialization - Single agent with different "modes" activated by tooling
Decision Outcome
Chosen option: "Multi-agent system (29 specialized agents)", because it provides the best balance of specialization, security, accountability, and parallel work capabilities.
Agent Roster
The system includes 29 specialized agents. The roster below lists 20 of them, organized by domain:
Planning & Coordination (3 agents):
- Pack Leader - Strategic oversight, prioritization, final approval
- Planning Agent - Traceability, requirements decomposition, roadmap management
- Bloodhound - Issue tracking (bd), backlog management
Architecture & Design (2 agents):
- Solutions Architect - Backend architecture, ADRs, technical decisions
- Experience Architect - UX/UI design, accessibility, privacy patterns
Development (5 agents):
- Husky - Main coding executor (pulls the sled!)
- Python Programmer - Python-specific implementation
- Terrier - Infrastructure, DevOps, deployment
- Agent Creator - Creates/modifies agent JSON files
- Web Puppy - Research, web scraping, evidence gathering
Quality Assurance (6 agents):
- QA Expert - Test strategy, 13-step methodology
- Watchdog - Automated testing, CI/CD gates
- QA Kitten - Manual web UI testing
- Terminal QA - CLI and smoke testing
- Shepherd - Code review coordinator, PR workflow
- Code Reviewer - Security-focused code review
Security & Compliance (2 agents):
- Security Auditor - STRIDE analysis, threat modeling
- Prompt Reviewer - Reviews agent prompts for security issues
Review Specialists (2 agents):
- Python Reviewer - Python code quality, PEP compliance
- Documentation Reviewer - Documentation quality
Consequences
- Good, because each agent can focus on its domain expertise without context switching
- Good, because multiple agents can work in parallel on different worktrees/issues
- Good, because tool access is limited per agent (security via least privilege)
- Good, because accountability is clear (TRACEABILITY_MATRIX.md tracks agent ownership)
- Good, because specialized reviewers catch domain-specific issues (Python reviewer for Python, Security Auditor for STRIDE)
- Good, because prompts are shorter and more focused (no "jack of all trades" mega-prompt)
- Bad, because coordination overhead requires Pack Leader and Planning Agent orchestration
- Bad, because agents must use `invoke_agent` for cross-domain collaboration (communication overhead)
- Neutral, because 29 agents seems like a lot, but each has a clear, specific role
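The parallel-work benefit listed above depends on each agent operating in its own git worktree. The following is a minimal sketch of how such worktrees might be planned, assuming one worktree and one branch per issue. The issue IDs, the `worktrees/` directory layout, and the `agent/<issue>` branch naming are hypothetical; only `git worktree add -b <branch> <path>` is standard git.

```python
from pathlib import Path


def worktree_command(repo_root: Path, issue_id: str) -> list[str]:
    """Build the git command that gives one issue its own worktree and branch."""
    # Hypothetical layout: <repo parent>/worktrees/<issue-id>, branch agent/<issue-id>.
    path = repo_root.parent / "worktrees" / issue_id
    branch = f"agent/{issue_id}"
    return ["git", "-C", str(repo_root), "worktree", "add", "-b", branch, str(path)]


def plan_worktrees(repo_root: Path, assignments: dict[str, str]) -> dict[str, list[str]]:
    """Map each agent name to the command that creates its isolated worktree.

    `assignments` maps an agent name to the issue it is working on.
    Two agents on the same issue would share a worktree, so that is rejected.
    """
    issues = list(assignments.values())
    if len(set(issues)) != len(issues):
        raise ValueError("each issue may be assigned to only one agent at a time")
    return {agent: worktree_command(repo_root, issue) for agent, issue in assignments.items()}


if __name__ == "__main__":
    # Illustrative agent-to-issue assignments; the IDs are made up.
    plan = plan_worktrees(Path("/repo/project"), {"husky": "bd-101", "terrier": "bd-102"})
    for agent, cmd in plan.items():
        print(agent, "->", " ".join(cmd))
```

The sketch only builds the commands; running them (for example with `subprocess.run(cmd, check=True)`) is left to the caller so the plan can be reviewed first.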
Confirmation
This decision is validated by:
- All 29 agent JSON files exist in `~/.code_puppy/agents/`
- Agent tool audit (docs/security/agent-tool-audit.md) confirms each agent has minimal necessary tools
- TRACEABILITY_MATRIX.md maps every requirement to a specific agent
- Parallel worktree workflow allows multiple agents to work simultaneously
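The first two confirmation checks above could be automated along these lines. This is a sketch under stated assumptions, not the project's actual audit script: it assumes each agent file is a JSON object with a `name` string and a `tools` list, and the `allowed_tools` mapping stands in for whatever the tool audit document records.

```python
import json
from pathlib import Path


def audit_agent_files(
    agents_dir: Path,
    allowed_tools: dict[str, set[str]],
    expected_count: int,
) -> list[str]:
    """Return human-readable problems; an empty list means the check passed."""
    problems: list[str] = []
    files = sorted(agents_dir.glob("*.json"))
    if len(files) != expected_count:
        problems.append(f"expected {expected_count} agent files, found {len(files)}")

    for path in files:
        try:
            agent = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"{path.name}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(agent, dict):
            problems.append(f"{path.name}: top-level JSON value is not an object")
            continue

        # Assumed schema: {"name": str, "tools": [str, ...]}.
        name = agent.get("name", path.stem)
        if name not in allowed_tools:
            problems.append(f"{path.name}: agent '{name}' has no audited allow-list")
            continue
        # Any tool beyond the audited set violates least privilege.
        extra = set(agent.get("tools", [])) - allowed_tools[name]
        if extra:
            problems.append(f"{path.name}: unaudited tools {sorted(extra)}")
    return problems
```

A call such as `audit_agent_files(Path.home() / ".code_puppy" / "agents", allowed, expected_count=29)` would then return an empty list only if the file count matches and no agent declares a tool outside its audited set.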
STRIDE Security Analysis
| Threat Category | Risk Level | Mitigation |
|---|---|---|
| Spoofing | Low | Each agent has unique identifier (e.g., husky-51d43f); all actions logged with agent ID |
| Tampering | Medium | Per-agent tool filtering prevents unauthorized file modifications; only Agent Creator can modify agents dir |
| Repudiation | Low | All agent actions logged to bd issues with timestamps; git commits show agent authorship |
| Information Disclosure | Medium | Agents only access files/tools necessary for their role; no shared secrets between agents |
| Denial of Service | Low | Agent resource limits prevent single agent from monopolizing system; timeout protections |
| Elevation of Privilege | Medium | Tool allow-lists enforce least privilege; agents cannot grant themselves new capabilities |
Overall Security Posture: Multi-agent architecture significantly improves security posture by:
- Reducing blast radius - A compromised agent only has access to its limited toolset
- Clear audit trail - Every action traceable to a specific agent
- Least privilege by default - Each agent only gets the tools it needs
- Defense in depth - Multiple review agents create security checkpoints
Compared to a monolithic agent with all tools, this architecture reduces risk from High to Medium for tampering and elevation of privilege.
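The allow-list and audit-trail mitigations in the STRIDE table can be illustrated with a small gate that every tool call passes through. This is a hedged sketch: `AgentPolicy`, `ToolGate`, and the tool names are hypothetical and do not describe how Code Puppy enforces tool filtering internally. The agent ID `husky-51d43f` is the example from the table.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable


@dataclass(frozen=True)
class AgentPolicy:
    """Immutable policy, so an agent cannot grant itself new tools at runtime."""
    agent_id: str
    allowed_tools: frozenset[str]


@dataclass
class ToolGate:
    """Checks every tool call against the caller's allow-list and logs the decision."""
    policies: dict[str, AgentPolicy]
    audit_log: list[dict[str, Any]] = field(default_factory=list)

    def authorize(self, agent_id: str, tool: str) -> bool:
        policy = self.policies.get(agent_id)
        # Unknown agents are denied (spoofing); unlisted tools are denied (elevation).
        allowed = policy is not None and tool in policy.allowed_tools
        # Both allowed and denied attempts are recorded (repudiation).
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "agent": agent_id,
            "tool": tool,
            "allowed": allowed,
        })
        return allowed

    def call(self, agent_id: str, tool: str, action: Callable[[], Any]) -> Any:
        """Run `action` only if `agent_id` is allowed to use `tool`."""
        if not self.authorize(agent_id, tool):
            raise PermissionError(f"{agent_id} may not use {tool}")
        return action()


if __name__ == "__main__":
    gate = ToolGate({
        "husky-51d43f": AgentPolicy("husky-51d43f", frozenset({"read_file", "edit_file"})),
    })
    print(gate.authorize("husky-51d43f", "edit_file"))
    print(gate.authorize("husky-51d43f", "delete_file"))
```

Denying by default keeps the blast radius of a compromised agent limited to the tools in its own policy, which is the property the comparison with a monolithic agent relies on.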
Pros and Cons of the Options
Monolithic agent (rejected)
Single general-purpose agent handles all tasks
- Good, because simpler coordination (no inter-agent communication)
- Good, because single prompt, easier to maintain initially
- Good, because no context handoff between agents
- Bad, because massive security risk - single compromised agent has access to all tools
- Bad, because context window pollution (must know Python + UX + security + testing simultaneously)
- Bad, because no specialization - "jack of all trades, master of none"
- Bad, because no parallel work - single agent can only do one thing at a time
- Bad, because unclear accountability - who reviews the reviewer?
- Bad, because prompt becomes unwieldy (100+ KB of "you are a Python expert AND a security expert AND...")
Multi-agent system (29 specialized agents) (accepted)
Current decision - see above for full analysis
- Good, because specialization improves quality
- Good, because parallel work scales better
- Good, because security via least privilege
- Good, because clear accountability and traceability
- Good, because focused prompts are easier to maintain
- Bad, because coordination overhead
- Bad, because requires orchestration layer (Pack Leader + Planning Agent)
- Neutral, because agent count seems high but each has distinct role
Hybrid approach (5-10 specialist agents) (rejected)
Fewer agents with broader responsibilities
- Good, because simpler coordination than 29 agents
- Good, because some specialization benefits
- Neutral, because moderate parallel work capability
- Bad, because still requires broad tool access per agent (higher security risk)
- Bad, because agents still have multiple responsibilities (e.g., "Backend Agent" does Python + infrastructure + security)
- Bad, because less clear accountability (who owns what?)
- Bad, because less effective specialization
Tool-based specialization (single agent with modes) (rejected)
Single agent switches "modes" based on task type
- Good, because single prompt to maintain
- Good, because no inter-agent communication
- Neutral, because could use tooling to enforce access control per mode
- Bad, because no parallel work - single agent can't work on multiple tasks simultaneously
- Bad, because mode switching adds complexity and potential errors
- Bad, because all tools must be accessible to single agent (security risk if mode system fails)
- Bad, because accountability is unclear (agent in "Python mode" vs "Security mode"?)
More Information
Related Requirements:
- REQ-101: Create Solutions Architect JSON agent
- REQ-102: Create Experience Architect JSON agent
- REQ-103: Audit all agent tool permissions
- REQ-601: STRIDE analysis for all agents
- REQ-604: Self-modification protections
Related Documents:
Validation:
- All 29 agents created and functional
- Agent tool audit completed (docs/security/agent-tool-audit.md)
- TRACEABILITY_MATRIX.md maps all requirements to agents
- Parallel worktree workflow tested and documented
Review History:
- 2026-03-06: Initial decision documented (retroactive ADR)
- Reviewed by: Security Auditor, Solutions Architect
- Signed off by: Pack Leader, Planning Agent
ADR Status: Accepted
Implementation Status: Complete
Last Updated: March 6, 2026