Product Requirements Document Sprint 13 — GUI Audit Remediation & Theming
Project Overview¶
Sprint 13 remediates findings from a comprehensive GUI audit of src/gui/ (Streamlit
app) across three dimensions: design patterns, usability, and WCAG 2.1 AA accessibility.
Additionally, it introduces a theme selection system with three curated themes (two dark,
one light) to improve visual identity, reduce eye strain, and support diverse user
preferences.
Audit sources: parallel agent team review (design-reviewer, usability-reviewer, accessibility-reviewer) conducted 2026-02-28.
Pre-Sprint Fixes (already implemented)¶
The following audit findings were resolved before sprint start as trivial changes:
- Sidebar
label_visibility="collapsed"→"hidden"(a11y fix,sidebar.py) - Prompts page text areas set to
disabled=Truewith read-only notice (prompts.py,components/prompts.py) - Engine radio label renamed from “MAS (PydanticAI)” to “Multi-Agent System (MAS)” (
run_app.py) help=tooltips added to CC Teams checkbox, sub-agent checkboxes, prompt editor (run_app.py,settings.py,components/prompts.py)- “Advanced Settings” changed from
st.headertost.subheader(settings.py) - Emoji removed from
PAGE_TITLE,st.warning(),st.subheader(), sidebar info (text.py,evaluation.py,sidebar.py) st.caption()added before evaluation bar chart (evaluation.py)- Dead commented-out
match/caseblock removed fromoutput.py
Functional Requirements¶
Feature 1: Fix Broken ARIA Live Regions in Execution Page¶
Description: The ARIA live region tags in run_app.py are split across separate
st.markdown() calls, creating malformed DOM. Screen readers never announce status
changes during pipeline execution. Consolidate opening/closing ARIA tags into single
st.markdown() calls or use st.empty() containers.
Acceptance Criteria:
- All
<div role="status" aria-live="polite">regions are emitted as singlest.markdown()calls - Screen reader announcement works for idle, running, completed, and error states
- No orphaned opening/closing ARIA tags across separate
st.markdown()calls
Technical Requirements:
- Refactor
_display_execution_resultto build complete ARIA-wrapped HTML strings before emitting - Use
st.empty()containers for in-place status updates where appropriate - Verify DOM structure with browser dev tools after changes
Files:
src/gui/pages/run_app.py(edit)
Feature 2: Add Accessible Alternative for Agent Graph Visualization¶
Description: The Pyvis network graph is injected as raw HTML via components.html()
with no title, no ARIA role, no alt text, and no text equivalent. Add an accessible
text summary and fix the keyboard trap risk from scrolling=False.
Acceptance Criteria:
- Text summary of graph (node count, edge count, agent names) rendered below the graph
-
st.caption()with descriptive text added beforecomponents.html()call -
<title>element added to generated Pyvis HTML before injection -
scrolling=Trueset oncomponents.html()to prevent keyboard trap - Pyvis
bgcolorreads from theme instead of hard-coded#ffffff
Technical Requirements:
- Insert
<title>Agent Interaction Graph</title>into the Pyvis HTML string - Add
st.caption("Agent interaction graph showing agent and tool relationships. See statistics below for details.")before the component - Change
scrolling=Falsetoscrolling=True - Replace
bgcolor="#ffffff"with theme-aware value fromstyling.py
Files:
src/gui/pages/agent_graph.py(edit)src/gui/config/styling.py(edit)
Feature 3: Add Debug Log Panel ARIA Landmark¶
Description: The debug log panel renders raw HTML via unsafe_allow_html=True
with no ARIA landmark. Add role="log" and aria-label to the outermost container.
Fix message span color for theme compatibility.
Acceptance Criteria:
- Debug log container has
role="log"andaria-label="Debug logs" - Message span uses
color: inheritfor theme compatibility - Inline
font-family: monospace; font-size: 12pxremoved (duplicates global theme font)
Technical Requirements:
- Modify
LogCapture.format_logs_as_html()to wrap output in<section role="log" aria-label="Debug logs"> - Add
color: inheritto message<span>elements - Remove redundant inline font declarations
Files:
src/gui/utils/log_capture.py(edit)src/gui/pages/run_app.py(edit)
Feature 4: Fix Validation Warning Placement on Run Page¶
Description: The validation warning for empty query/paper fires inside the async handler and disappears on Streamlit rerender. Render the warning adjacent to the Run button so users see it.
Acceptance Criteria:
- Validation warning renders directly above or adjacent to the Run button
- Warning persists on screen until user corrects the input
- Warning is not buried in an async handler that fires after rerender
Technical Requirements:
- Move validation check from
_handle_query_submissionto therender_app()scope - Use
st.session_stateto persist the warning state across rerenders - Render warning with
st.warning()in the same container as the Run button
Files:
src/gui/pages/run_app.py(edit)
Feature 5: Fix Report Generation Double-Render and Download Persistence¶
Description: Clicking “Generate Report” twice duplicates the rendered markdown. The Download button recreates on each render with no stable confirmation. Cache the report in session state.
Acceptance Criteria:
- Generated report cached in
st.session_state— no duplicate renders on re-click - Download button persists after first generation
- “Clear Results” button resets execution state to idle
Technical Requirements:
- Store generated markdown in
st.session_state["generated_report"] - Render from cache if report already exists
- Add “Clear Results” button that resets
execution_stateto idle and clears result keys
Files:
src/gui/pages/run_app.py(edit)
Feature 6: Selectable Streamlit Themes (3 themes)¶
Description: Add three curated, selectable Streamlit themes: “Expanse Dark” (current),
“Nord Light” (light, readability-first), and “Tokyo Night” (warm dark, recommended
upgrade). Theme definitions stored in styling.py, selectable via sidebar or settings.
Acceptance Criteria:
- Three theme dicts defined in
config/styling.pywith full color specs - Theme selector widget in sidebar or settings page
- Selected theme persists in session state across page navigations
- Pyvis graph colors update to match selected theme accent colors
-
.streamlit/config.tomldocuments the default theme choice
Technical Requirements:
- Define
THEMESdict instyling.pywith keys:expanse_dark,nord_light,tokyo_night - Each theme:
primaryColor,backgroundColor,secondaryBackgroundColor,textColor,accentColor - Expanse Dark:
#4A90E2,#0b0c10,#1f2833,#66fcf1,#50C878 - Nord Light:
#5E81AC,#ECEFF4,#E5E9F0,#2E3440,#88C0D0 - Tokyo Night:
#7AA2F7,#1A1B26,#24283B,#C0CAF5,#9ECE6A - Theme selector writes to
st.session_state["selected_theme"] - Agent graph reads node/edge colors from active theme
- Note: Streamlit theme switching at runtime requires
st.set_page_configworkaround or custom CSS injection
Files:
src/gui/config/styling.py(edit)src/gui/components/sidebar.py(edit)src/gui/pages/agent_graph.py(edit)src/gui/pages/settings.py(edit).streamlit/config.toml(edit)
Feature 7: Improve Home Page Onboarding¶
Description: Home page has minimal onboarding with no actionable first step. Add a checklist or step-by-step card guiding new users through setup.
Acceptance Criteria:
- Home page shows a step-by-step onboarding guide (configure provider, download dataset, run query)
- Each step links or navigates to the relevant page
- Onboarding content defined in
text.py(not inline strings)
Technical Requirements:
- Add onboarding constants to
text.py(step titles, descriptions) - Render as
st.info()or card-like layout with numbered steps - Link steps to Settings and Run pages
Files:
src/gui/pages/home.py(edit)src/gui/config/text.py(edit)
Feature 8: Consolidate UI String Constants in text.py¶
Description: Several pages use inline string literals for headers and labels instead
of importing from text.py. Consolidate for single-source-of-truth copy management.
Acceptance Criteria:
- All header/subheader strings in
evaluation.pymoved totext.py - All header/subheader strings in
agent_graph.pymoved totext.py - All inline label strings in
run_app.py(“Debug Log”, “Generate Report”) moved totext.py
Technical Requirements:
- Add constants to
text.py:EVALUATION_HEADER,AGENT_GRAPH_HEADER,DEBUG_LOG_LABEL,GENERATE_REPORT_LABEL, etc. - Import and use in respective pages
Files:
src/gui/config/text.py(edit)src/gui/pages/evaluation.py(edit)src/gui/pages/agent_graph.py(edit)src/gui/pages/run_app.py(edit)
Feature 9: Fix Navigation Consistency and Baseline Expander¶
Description: Sidebar labels don’t match page headers. Evaluation baseline expander hidden by default on first visit. Phoenix Trace Viewer always visible even when not configured.
Acceptance Criteria:
- Sidebar navigation labels align with page headers
- Baseline comparison expander expanded by default on first visit (no result available)
- Phoenix Trace Viewer moved to collapsed sidebar expander
Technical Requirements:
- Update
PAGESlist inconfig.pyto match page header text - Set
expanded=Trueon baseline comparison expander when no result exists - Wrap Phoenix link in
st.sidebar.expander("Tracing (optional)")
Files:
src/gui/config/config.py(edit)src/gui/pages/evaluation.py(edit)src/gui/components/sidebar.py(edit)
Feature 10: Fix Pyvis Graph Contrast and Color Theming¶
Description: Agent graph node colors have insufficient contrast for text labels. Graph background is hard-coded white, conflicting with dark theme. Make graph colors theme-aware.
Acceptance Criteria:
- Node label
font_colorexplicitly set (not Pyvis default) - Node colors provide >= 4.5:1 contrast ratio for labels
- Graph
bgcolorreads from active theme (not hard-coded#ffffff) - Agent and tool node colors update when theme changes
Technical Requirements:
- Set
font_color="#000000"for light themes,font_color="#ECEFF4"for dark themes - Read
bgcolorfrom theme dict instyling.py - Map agent node color to theme
primaryColor, tool node color to themeaccentColor
Files:
src/gui/pages/agent_graph.py(edit)src/gui/config/styling.py(edit)
Feature 11: Type-Aware Output Rendering¶
Description: render_output uses generic st.write() for all result types.
Implement type-aware rendering so Pydantic models and dicts render with navigable
structure instead of raw object dumps.
Acceptance Criteria:
-
render_output()type-checks result and uses structured rendering (e.g.,st.json()for dicts) -
CompositeResultand other Pydantic models render with navigable structure
Technical Requirements:
- Add type dispatch:
st.json()for dicts,st.markdown()for strings, structured rendering for Pydantic models - Import relevant result types from
src/app/data_models/
Files:
src/gui/components/output.py(edit)
Non-Functional Requirements¶
- All color combinations must pass WCAG 2.1 AA contrast ratios (4.5:1 for normal text, 3:1 for large text)
- Theme switching must not require app restart — session-state-driven
- No new Python dependencies — use only Streamlit built-in capabilities
- All user-visible strings must live in
src/gui/config/text.py(single source of truth) - Debug log panel must function correctly in all three themes
Out of Scope¶
- Custom CSS injection beyond Streamlit theming — Streamlit’s theme system is sufficient for Sprint 13
- Automated accessibility testing in CI (axe-core, pa11y) — deferred to Sprint 14+
- Responsive/mobile layout optimization — Streamlit’s responsive behavior is acceptable for desktop-first tool
- Internationalization (i18n) — text.py consolidation enables future i18n but translation is out of scope
- Phoenix availability detection — checking if Phoenix is running on startup is a nice-to-have deferred to Sprint 14
- Prompt editing with save — Sprint 13 makes prompts read-only; full CRUD editing deferred to Sprint 14+
- Sub-agent default enablement — changing checkbox defaults requires UX research on user expectations
Notes for Ralph Loop¶
Story Breakdown - Phase 1 (12 stories total):
-
Feature 1 → STORY-001: Fix broken ARIA live regions in run_app.py Consolidate split
st.markdown()ARIA tags into single calls. Refactor_display_execution_resultto build complete ARIA-wrapped HTML. Files:src/gui/pages/run_app.py. Tests:tests/gui/test_run_app.py. -
Feature 2 → STORY-002: Add accessible alternative for agent graph Add
<title>to Pyvis HTML,st.caption()before graph, text summary below,scrolling=True. Files:src/gui/pages/agent_graph.py. Tests:tests/gui/test_agent_graph_page.py. -
Feature 3 → STORY-003: Add debug log panel ARIA landmark Add
role="log"andaria-labelto log container HTML. Fix message span color. Remove redundant inline font. Files:src/gui/utils/log_capture.py,src/gui/pages/run_app.py. Tests:tests/gui/test_realtime_debug_log.py. -
Feature 4 → STORY-004: Fix validation warning placement on Run page Move validation from async handler to
render_app()scope. Persist warning in session state. Files:src/gui/pages/run_app.py. Tests:tests/gui/test_run_app.py. -
Feature 5 → STORY-005: Fix report generation and add clear results (depends: STORY-004) Cache generated report in session state. Add “Clear Results” button. Files:
src/gui/pages/run_app.py. Tests:tests/gui/test_run_app.py. -
Feature 6 → STORY-006: Define theme dicts in styling.py Create
THEMESdict with three themes (Expanse Dark, Nord Light, Tokyo Night). Full hex color specs. Files:src/gui/config/styling.py. Tests:tests/gui/test_styling.py(new). -
Feature 6 → STORY-007: Add theme selector widget (depends: STORY-006) Add theme selector to sidebar or settings. Persist in session state. Files:
src/gui/components/sidebar.py,src/gui/pages/settings.py,.streamlit/config.toml. Tests:tests/gui/test_sidebar_navigation.py,tests/gui/test_settings_integration.py. -
Feature 7 → STORY-008: Improve home page onboarding Add step-by-step onboarding guide with setup checklist. Define content in text.py. Files:
src/gui/pages/home.py,src/gui/config/text.py. Tests:tests/gui/test_home_page.py(new). -
Feature 8 → STORY-009: Consolidate UI string constants in text.py Move inline header/label strings from evaluation.py, agent_graph.py, run_app.py to text.py. Files:
src/gui/config/text.py,src/gui/pages/evaluation.py,src/gui/pages/agent_graph.py,src/gui/pages/run_app.py. Tests:tests/gui/test_text_constants.py(new). -
Feature 9 → STORY-010: Fix navigation consistency and baseline expander (depends: STORY-009) Align sidebar labels with page headers. Expand baseline expander on first visit. Move Phoenix link to collapsed expander. Files:
src/gui/config/config.py,src/gui/pages/evaluation.py,src/gui/components/sidebar.py. Tests:tests/gui/test_sidebar_navigation.py,tests/gui/test_evaluation_page.py. -
Feature 10 → STORY-011: Fix Pyvis graph contrast and color theming (depends: STORY-006, STORY-002) Set explicit font_color. Read bgcolor from theme. Map node colors to theme palette. Files:
src/gui/pages/agent_graph.py,src/gui/config/styling.py. Tests:tests/gui/test_agent_graph_page.py,tests/gui/test_styling.py(new). -
Feature 11 → STORY-012: Type-aware output rendering Add type dispatch for structured rendering of dicts, strings, and Pydantic models. Files:
src/gui/components/output.py. Tests:tests/gui/test_output_rendering.py(new).
TDD Workflow Per Story¶
Each story follows Red-Green-Refactor:
- RED: Write failing tests first in the test file(s) listed above. Test behavior, not implementation (e.g., “ARIA region contains status text” not “st.markdown called with specific args”).
- GREEN: Implement minimal code in source file(s) to pass tests.
- REFACTOR: Clean up, run
make quick_validate, thenmake test.
Testing rules:
- Mock
streamlitcalls withspec=constraints (e.g.,MagicMock(spec=st.delta_generator.DeltaGenerator)) - Test ARIA/HTML output as string assertions on the generated HTML content
- Theme tests: validate dict structure, color hex format, required keys
- Use
st.session_statefixtures from existingtests/gui/conftest.py - New test files marked
(new)above must be created by the implementing teammate
Notes for CC Agent Teams¶
File-Conflict Dependencies¶
| Story | Logical Dep | + File-Conflict Dep | Shared File | Resolution |
|---|---|---|---|---|
| STORY-001 | — | + STORY-003, STORY-004 | src/gui/pages/run_app.py |
Wave 1: different functions, low conflict risk |
| STORY-003 | — | + STORY-001, STORY-004 | src/gui/pages/run_app.py |
Wave 1: different functions, low conflict risk |
| STORY-004 | — | + STORY-001, STORY-003 | src/gui/pages/run_app.py |
Wave 1: different functions, low conflict risk |
| STORY-005 | STORY-004 | — | src/gui/pages/run_app.py |
Wave 2: sequential after STORY-004 |
| STORY-007 | STORY-006 | — | src/gui/components/sidebar.py |
Wave 2: sequential after STORY-006 |
| STORY-009 | — | + STORY-002 | src/gui/pages/agent_graph.py |
Wave 1: STORY-009 edits imports only, low conflict |
| STORY-010 | STORY-009 | — | src/gui/pages/evaluation.py |
Wave 2: sequential after STORY-009 |
| STORY-011 | STORY-006 | + STORY-002 | src/gui/pages/agent_graph.py |
Wave 3: sequential after both STORY-002 and STORY-006 |
Orchestration Waves¶
Wave 1 (parallel, 7 teammates):
STORY-001 run_app.py ARIA regions — no deps
STORY-002 agent_graph.py accessibility — no deps
STORY-003 log_capture.py ARIA landmark — no deps
STORY-004 run_app.py validation warning — no deps
STORY-006 styling.py theme dicts — no deps
STORY-008 home.py onboarding — no deps
STORY-012 output.py type-aware rendering — no deps
Wave 2 (parallel, 3 teammates — after Wave 1 deps resolve):
STORY-005 run_app.py report caching — depends: STORY-004
STORY-007 sidebar.py theme selector — depends: STORY-006
STORY-009 text.py string consolidation — no logical dep (Wave 2 for run_app.py contention)
Wave 3 (parallel, 2 teammates — after Wave 2 deps resolve):
STORY-010 config.py + evaluation.py nav — depends: STORY-009
STORY-011 agent_graph.py contrast/theme — depends: STORY-006, STORY-002
Teammate Spawn Strategy¶
Each wave spawns teammates in parallel using general-purpose subagent type with isolation: "worktree". Teammates receive:
- Story ID, title, description, acceptance criteria, source files, test files
MANDATORY: Read AGENTS.md first, then CONTRIBUTING.md. TDD workflow: write failing tests first, then implement.- Explicit
depends_onlist so teammates know which files are safe to edit
Wave transitions: Lead waits for all teammates in current wave to complete (TaskList polling) before spawning next wave. This prevents file conflicts on shared files like run_app.py and agent_graph.py.
run_app.py contention (Wave 1): STORY-001, STORY-003, and STORY-004 all edit run_app.py but touch different functions (_display_execution_result, log panel rendering, and render_app() validation respectively). Use worktree isolation — lead merges sequentially after Wave 1 completes. If merge conflicts arise, resolve by re-running the conflicting story on the merged base.