Troubleshooting Guide
This document provides guidance for common issues encountered during evaluation and development.
Tier 2 Authentication Failures
Symptoms
When running evaluations with Tier 2 (LLM-as-Judge) enabled, you may see:
- Warning logs: "Auth failure detected - using neutral fallback score"
- Tier 2 metrics return neutral scores (0.5)
- `Tier2Result.fallback_used` is `True`
- Lower composite scores due to neutral Tier 2 contributions
Causes
Authentication failures occur when:
- Missing API keys: the primary provider (`tier2_provider`) has no API key configured
- Invalid API keys: the configured API key is expired or incorrect
- No fallback provider: both primary and fallback providers lack valid API keys
Resolution
1. Check API Key Configuration
Verify environment variables are set correctly:

```bash
# For OpenAI (default primary provider)
echo $OPENAI_API_KEY

# For GitHub (common fallback)
echo $GITHUB_API_KEY

# For other providers (Cerebras, Groq, etc.)
echo $CEREBRAS_API_KEY
echo $GROQ_API_KEY
```
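The same check can also be scripted. The helper below is a hypothetical convenience (not part of the project) that reports which provider keys are present in the environment; the provider-to-variable mapping mirrors the commands above:

```python
import os

# Hypothetical mapping of providers to their API key env vars,
# following the variables checked in the shell commands above.
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "github": "GITHUB_API_KEY",
    "cerebras": "CEREBRAS_API_KEY",
    "groq": "GROQ_API_KEY",
}

def configured_providers() -> list[str]:
    """Return providers whose API key env var is set and non-empty."""
    return [name for name, var in PROVIDER_KEYS.items() if os.environ.get(var)]

print(configured_providers())
```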
2. Configure Fallback Provider
Update `JudgeSettings` to specify a fallback provider:

```python
from app.config.judge_settings import JudgeSettings

settings = JudgeSettings(
    tier2_provider="openai",
    tier2_model="gpt-4o-mini",
    tier2_fallback_provider="github",  # Fallback when primary fails
    tier2_fallback_model="gpt-4o-mini",
)
```
3. Provider Fallback Chain
The evaluation engine follows this fallback chain:
1. Primary provider (`tier2_provider`) - checked first
2. Fallback provider (`tier2_fallback_provider`) - used if primary is unavailable
3. Neutral scores (0.5) - returned when all providers are unavailable
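The chain above can be sketched in a few lines. This is illustrative logic under the assumption that a provider is "available" when it has a non-empty API key; it is not the actual `LLMJudgeEngine` implementation:

```python
from typing import Optional, Tuple

NEUTRAL_SCORE = 0.5  # returned for Tier 2 metrics when no provider is usable

def choose_provider(
    primary: Tuple[str, str],
    fallback: Tuple[str, str],
    api_keys: dict,
) -> Optional[Tuple[str, str]]:
    """Return (provider, model) following the fallback chain, or None."""
    for provider, model in (primary, fallback):
        if api_keys.get(provider):  # key present and non-empty
            return provider, model
    return None  # caller falls back to NEUTRAL_SCORE

# Example: primary has no key, fallback does -> fallback wins.
keys = {"openai": "", "github": "ghp-example"}
print(choose_provider(("openai", "gpt-4o-mini"), ("github", "gpt-4o-mini"), keys))
# → ('github', 'gpt-4o-mini')
```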
4. Verify Provider Selection
Use the `select_available_provider()` method to check which provider will be used:

```python
from app.config.app_env import AppEnv
from app.judge.llm_evaluation_managers import LLMJudgeEngine

engine = LLMJudgeEngine(settings)
env_config = AppEnv()  # Loads from environment

selected = engine.select_available_provider(env_config)
if selected is None:
    print("No providers available - Tier 2 will use neutral fallback scores")
else:
    provider, model = selected
    print(f"Using provider: {provider}/{model}")
```
Expected Behavior
When Auth Fails
- Individual assessments return the neutral score (0.5):
  - `technical_accuracy`: 0.5
  - `constructiveness`: 0.5
  - `planning_rationality`: 0.5
- `fallback_used` flag is set to `True`
- `model_used` field shows the configured provider (not "fallback_traditional")
- Composite scoring redistributes weights to Tier 1 + Tier 3
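The weight redistribution can be illustrated with a small sketch. The weight values below are assumptions for illustration only, not the engine's actual defaults:

```python
# Illustrative sketch: drop the Tier 2 weight and renormalize the
# remaining tiers so the composite weights still sum to 1.
def redistribute(weights: dict, dropped: str) -> dict:
    """Remove one tier's weight and renormalize the rest to sum to 1."""
    kept = {tier: w for tier, w in weights.items() if tier != dropped}
    total = sum(kept.values())
    return {tier: w / total for tier, w in kept.items()}

weights = {"tier1": 0.3, "tier2": 0.4, "tier3": 0.3}  # assumed values
print(redistribute(weights, "tier2"))  # → {'tier1': 0.5, 'tier3': 0.5}
```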
When Auth Succeeds
- Full LLM-based scores (0.0-1.0 range, based on the assessment)
- `fallback_used` flag is set to `False`
- Normal composite scoring with all three tiers
Disabling Tier 2
If you don't have access to LLM providers, disable Tier 2 entirely:

```python
settings = JudgeSettings(
    tier1_enabled=True,
    tier2_enabled=False,  # Skip LLM-as-Judge
    tier3_enabled=True,
)
```
This avoids auth failure warnings and redistributes weights to Tier 1 + Tier 3 automatically.
Logging
Enable debug logging to see provider selection details.
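A minimal way to do this, assuming the engine logs through Python's stdlib `logging` module (the `app.judge` logger name is an assumption; adjust it to your application's namespace):

```python
import logging

# Turn on debug output globally, then make sure the judge engine's
# logger (assumed name) emits at DEBUG level as well.
logging.basicConfig(level=logging.DEBUG)
logging.getLogger("app.judge").setLevel(logging.DEBUG)
```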
You'll see logs like:
- "Using primary provider: openai/gpt-4o-mini"
- "Primary provider unavailable, using fallback: github/gpt-4o-mini"
- "Neither primary nor fallback providers have valid API keys"