T1 Research Findings: Asset Extraction Readiness

Date: 2026-05-11

Agent: researcher

Task: 20260511-T001 - Asset Extraction Readiness

Executive Summary

This T1 research phase analyzes the current asset extraction workflow, identifies historical patterns, documents tooling capabilities, and pinpoints the specific gaps that must be closed before title-menu assets can be extracted.

1. Historical Asset Evidence Analysis

Stage Briefs Review

Finding: No dedicated asset stage brief exists. Asset handling is distributed across:

- SB-003 Render System: Mentions texture loading but no D2-specific formats
- SB-004 Font System: Focuses on font assets, not PVR/PVM textures
- SB-010 Build System: References asset pipeline but lacks D2-specific details

Gap: No comprehensive asset extraction workflow documentation exists.

Historical Manifest Analysis

File: /work/repo/asset-staging/raw/d2-title/extract-manifest.json

Key Findings:

1. Early Speculative Manifest: Contains 8 assets with placeholder entry names
2. Different Symbol Naming: Uses d2_asset_title_snow vs current d2_asset_snow
3. Partial Coverage: Missing menu, copyright, and some background layers
4. Notes Field: Provides useful context about asset roles

Speculative Entry Names Found:

- Q_TITLEBGMT0.PVM:BGMT0 (background)
- Q_DJSNOW.PVM:DJSNOW (snow particles)
- Q_TITLE2D.PVM:TITLE2D (logo)
- Q_TITLEMENU.PVM:TITLEMENU (menu)
- Q_TITLEMENU.PVM:COPYRIGHT (copyright)
- SAKA_MNSNOW1A.PVM:MNSNOW1A (snow variant A)
- SAKA_MNSNOW1B.PVM:MNSNOW1B (snow variant B)
- SAKA_MNSNOW1C.PVM:MNSNOW1C (snow variant C)

Current Manifest Analysis

File: /work/repo/tools/title_menu_manifest.json

Key Findings:

1. Expanded Coverage: 11 assets covering all required roles
2. Consistent Naming: Uses d2_asset_* prefix consistently
3. Role Classification: Explicit role field (background, logo, menu, overlay, particles, copyright)
4. Speculative Entry Names: Still uses placeholder entry names that need validation

Current Speculative Entry Names:

- Q_TITLEBGMT0.PVM:TITLEBGMT0
- Q_TITLEBGMT1.PVM:TITLEBGMT1
- Q_TITLEBGMT2.PVM:TITLEBGMT2
- Q_DMTITLE.PVM:DMTITLE
- Q_TITLEMENU.PVM:TITLEMENU
- Q_TITLE2D.PVM:TITLE2D
- Q_DJSNOW.PVM:DJSNOW
- SAKA_MNSNOW1A.PVM:SAKA_MNSNOW1A
- SAKA_MNSNOW1B.PVM:SAKA_MNSNOW1B
- SAKA_MNSNOW1C.PVM:SAKA_MNSNOW1C
- P_COMTIT.PVR: (no entry, standalone PVR)

Gap Analysis Document

File: /work/repo/notes/2026-05-10-d2-title-menu-gap.md

Key Findings:

1. Current State: Renders old decoder-evidence screen (W/A/R/P + 0GDTEX)
2. Target State: D2 title menu with 5 components (background, snow, logo, menu, copyright)
3. Pipeline Status:
   - ✅ PVM parser implemented (supports multiple formats)
   - ✅ Manifest-driven extraction available
   - ❌ Title-menu PVMs not extracted from disc
   - ❌ Entry names not validated against real files
   - ❌ .CTS animation files not handled
4. Build Gap: No target for title menu validation image

2. Extraction Tool Capabilities Analysis

Supported Formats

Pixel Formats (PF_*):

- 0x01 PF_RGB565: RGB 5-6-5 format
- 0x02 PF_ARGB4444: ARGB 4-4-4-4 format
- 0x03 PF_ARGB1555: ARGB 1-5-5-5 format
- 0x04 PF_YUV422: YUV 4:2:2 format (not implemented)
- 0x05 PF_BUMP: Bump map format (not implemented)
- 0x06 PF_PAL4: 4-bit palettized (not implemented)
- 0x07 PF_PAL8: 8-bit palettized (not implemented)

Data Formats (DF_*):

- 0x01 DF_SQUARE_TWIDDLED: Square twiddled pixel data
- 0x03 DF_VQ: Vector quantization (256-entry codebook)
- 0x10 DF_SMALLVQ: Small VQ variant
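The two tables above, combined with the decoder list in this section, determine which pixel/data format pairs the tool can actually decode. As a rough sketch, the lookup could be expressed like this; the numeric codes are copied from the tables as this report states them, the supported pairs are inferred from the implemented decoder names, and `describe_format` is a hypothetical helper.

```python
# Codes as listed in this report (not independently verified against files).
PIXEL_FORMATS = {
    0x01: "PF_RGB565",
    0x02: "PF_ARGB4444",
    0x03: "PF_ARGB1555",
    0x04: "PF_YUV422",
    0x05: "PF_BUMP",
    0x06: "PF_PAL4",
    0x07: "PF_PAL8",
}
DATA_FORMATS = {
    0x01: "DF_SQUARE_TWIDDLED",
    0x03: "DF_VQ",
    0x10: "DF_SMALLVQ",
}

# Assumption: one supported pair per implemented decoder named in the report
# (square twiddled for three pixel formats, VQ/SMALLVQ for RGB565 only).
SUPPORTED = {
    (0x01, 0x01),
    (0x02, 0x01),
    (0x03, 0x01),
    (0x01, 0x03),
    (0x01, 0x10),
}


def describe_format(pf: int, df: int) -> tuple[str, bool]:
    """Return a 'PF/DF' label and whether a decoder exists for that pair."""
    pf_name = PIXEL_FORMATS.get(pf, f"PF_UNKNOWN_0x{pf:02x}")
    df_name = DATA_FORMATS.get(df, f"DF_UNKNOWN_0x{df:02x}")
    return f"{pf_name}/{df_name}", (pf, df) in SUPPORTED


if __name__ == "__main__":
    for pair in [(0x01, 0x03), (0x02, 0x03), (0x07, 0x01)]:
        label, ok = describe_format(*pair)
        print(f"{label}: {'supported' if ok else 'not supported'}")
```

A table like `SUPPORTED` would also be a natural data source for the proposed --list-formats option and for format warnings in discovery output.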

Decoder Coverage

Implemented Decoders:

1. ✅ _decode_square_twiddled_rgb565() - RGB565 square twiddled
2. ✅ _decode_square_twiddled_argb4444() - ARGB4444 square twiddled
3. ✅ _decode_square_twiddled_argb1555() - ARGB1555 square twiddled
4. ✅ _decode_vq_rgb565() - VQ/SMALLVQ RGB565
5. ✅ decode_pvrt_auto() - Auto-detection dispatcher
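For readers unfamiliar with what a square twiddled decoder does, here is an illustrative sketch of the general technique. It is not the tool's implementation. It assumes little-endian 16-bit texels and a Morton (Z-order) layout in which y occupies the even bits of the texel index and x the odd bits, which is the commonly described PVR twiddling order. All function names are hypothetical.

```python
import struct


def _spread_bits(value: int) -> int:
    """Move bit i of value to bit 2*i (the even positions of a Morton code)."""
    out, bit = 0, 0
    while value:
        out |= (value & 1) << (2 * bit)
        value >>= 1
        bit += 1
    return out


def twiddled_index(x: int, y: int) -> int:
    """Texel index for (x, y). Assumption: y in even bits, x in odd bits."""
    return _spread_bits(y) | (_spread_bits(x) << 1)


def rgb565_to_rgba(pixel: int) -> tuple[int, int, int, int]:
    """Expand a 5-6-5 texel to 8-bit RGBA, replicating high bits into low bits."""
    r5 = (pixel >> 11) & 0x1F
    g6 = (pixel >> 5) & 0x3F
    b5 = pixel & 0x1F
    return (
        (r5 << 3) | (r5 >> 2),
        (g6 << 2) | (g6 >> 4),
        (b5 << 3) | (b5 >> 2),
        255,  # RGB565 carries no alpha, so treat every texel as opaque
    )


def sketch_decode_twiddled_rgb565(
    data: bytes, size: int
) -> list[list[tuple[int, int, int, int]]]:
    """Decode a size x size twiddled RGB565 texture into rows of RGBA tuples."""
    texels = struct.unpack_from(f"<{size * size}H", data)
    return [
        [rgb565_to_rgba(texels[twiddled_index(x, y)]) for x in range(size)]
        for y in range(size)
    ]
```

The bit replication in `rgb565_to_rgba` is one common way to map the 5- and 6-bit channel maxima to 255; a plain left shift would leave full-intensity channels slightly dark.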

Missing Decoders:

1. ❌ YUV422 formats
2. ❌ Bump map formats
3. ❌ Palettized formats (PAL4, PAL8)
4. ❌ .CTS animation sequence parsing

Error Handling Analysis

Current Error Messages:

```python
# Missing PVMH magic
raise ValueError("PVMH header missing")

# Missing PVRT magic
raise ValueError(f"PVRT header missing at 0x{offset:x}")

# Wrong format for decoder
raise ValueError(f"{source_name} is not RGB565 square twiddled")

# Unsupported format
raise ValueError(f"unsupported PVRT format pixel_format={pf_name}...")

# File not found
print(f"SKIP {symbol}: {source_path} not found ({notes})")

# PVM entry not found
print(f"SKIP {symbol}: entry '{entry}' not in {source}. Available: {available}")

# Decode errors
print(f"SKIP {symbol}: decode error: {e}")
```

Improvement Opportunities:

1. Actionable Guidance: Errors don't explain how to fix issues
2. Path Suggestions: Errors don't indicate where to place assets
3. Format Documentation: Errors don't explain which formats are supported
4. Discovery Hint: Errors don't suggest using --discover mode
5. Manifest Validation: No pre-flight validation of manifest schema

Discovery Mode Analysis

Current --discover Output:

- ✅ Lists all PVM/PVR/CTS files recursively
- ✅ Shows PVM entry names and offsets
- ✅ Displays PVRT format info (PF/DF labels, dimensions)
- ✅ Indicates CTS companions
- ❌ No comparison with manifest expectations
- ❌ No indication of missing expected files
- ❌ No format validation warnings
- ❌ Output is text-only, not machine-readable

Improvement Opportunities:

1. Add manifest comparison mode
2. Highlight missing expected files
3. Warn about unsupported formats
4. Add JSON output option for tooling integration
5. Show expected vs. actual asset coverage
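To make the JSON output idea concrete, here is one possible shape for a machine-readable discovery report. The record types and field names (`DiscoveredFile`, `DiscoveredEntry`, `has_cts_companion`, and so on) are assumptions chosen to mirror what the text output is described as showing; the real tool's internal structures may differ.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DiscoveredEntry:
    """One texture inside a PVM (hypothetical record layout)."""
    name: str
    offset: int
    pixel_format: str
    data_format: str
    width: int
    height: int


@dataclass
class DiscoveredFile:
    """One PVM/PVR file found during discovery (hypothetical record layout)."""
    path: str
    has_cts_companion: bool = False
    entries: list[DiscoveredEntry] = field(default_factory=list)


def discovery_to_json(files: list[DiscoveredFile], expected_sources: list[str]) -> str:
    """Serialize discovery results plus the manifest sources that were not found."""
    found = {f.path for f in files}
    report = {
        "files": [asdict(f) for f in files],  # asdict recurses into entries
        "missing": sorted(set(expected_sources) - found),
    }
    return json.dumps(report, indent=2, sort_keys=True)
```

Keeping the missing-file list in the same document as the discovered files lets downstream tooling compute coverage without re-reading the manifest.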

3. Manifest Validation Analysis

Current Manifest Schema

Required Fields:

- symbol: C identifier for generated asset
- source_file: Source filename
- source_entry: Entry name (for PVM) or null (for PVR)
- decoder: "auto" (only supported value)
- role: Asset role (background, logo, menu, overlay, particles, copyright)
- description: Human-readable description

Validation Rules:

- ✅ Symbol must start with d2_asset_ prefix
- ✅ Source file must exist
- ✅ PVM files require entry name
- ✅ PVR files require null entry
- ❌ No validation of role values
- ❌ No validation of symbol uniqueness
- ❌ No validation of source file extensions
- ❌ No pre-flight validation without actual extraction
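Several of the missing checks above need only the manifest itself, not the asset files. As a sketch under that assumption, a pre-flight pass over the required fields and role values might look like the following; `preflight` is a hypothetical name, and the field and role lists are taken from the schema description in this section.

```python
REQUIRED_FIELDS = ("symbol", "source_file", "source_entry", "decoder", "role", "description")
ALLOWED_ROLES = {"background", "logo", "menu", "overlay", "particles", "copyright"}


def preflight(assets: list[dict]) -> list[str]:
    """Return human-readable problems found without touching any asset file."""
    problems = []
    for index, asset in enumerate(assets):
        # Fall back to the list position when the symbol itself is missing.
        label = asset.get("symbol") or f"assets[{index}]"

        # source_entry may legitimately be null, so test key presence, not truthiness.
        for name in REQUIRED_FIELDS:
            if name not in asset:
                problems.append(f"{label}: missing field '{name}'")

        if "decoder" in asset and asset["decoder"] != "auto":
            problems.append(f"{label}: unsupported decoder '{asset['decoder']}'")

        if "role" in asset and asset["role"] not in ALLOWED_ROLES:
            allowed = ", ".join(sorted(ALLOWED_ROLES))
            problems.append(f"{label}: unknown role '{asset['role']}' (expected one of: {allowed})")
    return problems
```

Returning a list instead of raising on the first problem lets a single run report every issue in the manifest at once.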

Placeholder Clarity Analysis

Current Placeholder Issues:

1. Speculative Entry Names: All PVM entry names are guesses
2. No Confidence Indicators: No way to mark speculative vs. confirmed
3. No Validation Status: No field to track validation state
4. No Historical Context: No reference to where names came from

Suggested Improvements:

1. Add validation_status field: "speculative", "confirmed", "extracted"
2. Add source field: "historical_manifest", "discovery", "extracted"
3. Add confidence field: 0-100% confidence score
4. Add warnings in generated headers for speculative assets
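The last suggestion, warnings in generated headers, could be driven directly by the proposed fields. The sketch below emits a C comment for any asset still marked speculative. The function name is hypothetical, and treating a missing validation_status as speculative is an assumption made here on the grounds that all current entry names are guesses.

```python
def header_warning_lines(asset: dict) -> list[str]:
    """Return C comment lines to place above a speculative asset's declaration."""
    # Assumption: entries without a status are treated as speculative.
    status = asset.get("validation_status", "speculative")
    if status != "speculative":
        return []

    symbol = asset.get("symbol", "<unnamed>")
    details = []
    if "confidence" in asset:
        details.append(f"confidence {asset['confidence']}%")
    if "source" in asset:
        details.append(f"source: {asset['source']}")
    suffix = f" ({', '.join(details)})" if details else ""

    entry = asset.get("source_entry")
    source_file = asset.get("source_file", "<unknown file>")
    target = f"Entry '{entry}' in {source_file}" if entry else source_file
    return [
        f"/* WARNING: {symbol} is speculative{suffix}.",
        f" * {target} has not been validated against real files. */",
    ]
```

A comment rather than a preprocessor diagnostic keeps the generated header portable; whether speculative assets should also produce build-time warnings is a separate design decision.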

4. Workflow Documentation Gaps

Missing Documentation

Critical Missing Documents:

1. Asset Extraction Workflow: Step-by-step guide from disc to C code
2. Private Asset Setup: Where to place assets, directory structure
3. Discovery Process: How to use --discover to validate entry names
4. Manifest Editing: How to update manifest with confirmed names
5. Troubleshooting Guide: Common issues and solutions

Existing Documentation Review

BUILD.md:

- ✅ Explains build targets
- ❌ No asset extraction section
- ❌ No private asset setup instructions
- ❌ No troubleshooting for missing assets

README.md:

- ✅ Project overview
- ❌ No asset pipeline mention
- ❌ No setup prerequisites
- ❌ No quickstart for asset extraction

docs/ASSET_EXTRACTION_MAP.md:

- ✅ Lists candidate files
- ✅ Explains extraction path priority
- ❌ No workflow steps
- ❌ No tool usage examples
- ❌ No error handling guidance

5. Tooling Enhancement Requirements

High-Priority Enhancements

  1. --dry-run Mode
     - Validate manifest schema without requiring files
     - Check symbol naming conventions
     - Verify role values
     - Report potential issues before extraction

  2. Improved Error Messages
     - Explain how to resolve each error type
     - Provide path suggestions for missing assets
     - Reference documentation sections
     - Suggest --discover for validation

  3. Manifest Comparison in Discovery
     - Show which manifest assets are missing
     - Highlight assets with wrong formats
     - Indicate speculative vs. confirmed entries
     - Provide coverage percentage

  4. Better Discovery Output
     - Add JSON output option
     - Show expected manifest assets
     - Highlight format compatibility issues
     - Provide actionable next steps

Medium-Priority Enhancements

  1. Placeholder Management
     - Add placeholder validation status
     - Generate warnings for speculative assets
     - Track confidence levels
     - Provide upgrade path from placeholders to real assets

  2. Format Documentation
     - Add --list-formats option
     - Explain supported PVRT formats
     - Show format compatibility matrix
     - Provide examples of each format

  3. Asset Preview
     - Generate PNG previews of extracted assets
     - Show asset dimensions and format
     - Provide visual validation
     - Help with debugging
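The command-line surface implied by the enhancements above could be wired up roughly as follows with the standard `argparse` module. Only --discover is described as existing today; --dry-run and --list-formats are the proposals from this section. The `--json` and `--manifest` flag names, the program name, and the choice to make the three modes mutually exclusive are assumptions for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the proposed extraction tool options (sketch)."""
    parser = argparse.ArgumentParser(
        prog="extract_assets",  # hypothetical program name
        description="Extract D2 title-menu assets described by a manifest.",
    )
    parser.add_argument(
        "--manifest",  # hypothetical flag; the path is the current manifest location
        default="tools/title_menu_manifest.json",
        help="manifest to read (default: %(default)s)",
    )

    # Assumption: the inspection modes exclude each other; with none given,
    # the tool performs a normal extraction.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--discover", action="store_true",
                      help="list PVM/PVR/CTS files and their entries")
    mode.add_argument("--dry-run", action="store_true",
                      help="validate the manifest without requiring asset files")
    mode.add_argument("--list-formats", action="store_true",
                      help="print supported PVRT formats and exit")

    parser.add_argument("--json", action="store_true",  # hypothetical flag name
                        help="emit machine-readable JSON instead of text")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Making the modes mutually exclusive means a contradictory invocation such as --discover --dry-run fails immediately with a usage error rather than silently picking one behavior.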

6. Current Build System Analysis

Build Targets

Current Targets:

- make d2-assets: Runs extraction tool with manifest
- make elf: Builds ELF executable
- make flycast-image: Builds Flycast-compatible ELF
- make verify-flycast: Launches Flycast for validation

Build System Status:

- ✅ KOS toolchain detection works
- ✅ ELF validation (entry point check)
- ✅ Manifest generation
- ❌ No asset validation before build
- ❌ No warning if using placeholder assets
- ❌ No title-menu specific target

Asset Integration

Current Integration:

- Generated d2_menu_assets.c included in build
- Assets accessible via d2_menu_assets.h header
- Asset metadata in asset_metadata.c
- No runtime asset validation
- No missing asset warnings

7. Specific Recommendations

Documentation Improvements

  1. Create docs/ASSET_EXTRACTION_WORKFLOW.md:

```markdown
1. Overview of asset extraction process
2. Prerequisites (Python, private asset location)
3. Private asset setup (directory structure)
4. Discovery phase (--discover usage)
5. Manifest validation and editing
6. Extraction execution
7. Build integration
8. Troubleshooting common issues
```

  2. Update README.md:
     - Add Asset Extraction section
     - List prerequisites clearly
     - Provide quickstart guide
     - Link to detailed workflow

  3. Update BUILD.md:
     - Add Asset Extraction prerequisites
     - Explain placeholder vs. real assets
     - Document troubleshooting steps
     - Add common error resolutions

  4. Update docs/ASSET_EXTRACTION_MAP.md:
     - Add workflow steps
     - Include tool usage examples
     - Explain error handling
     - Provide discovery guidance

Tooling Improvements

  1. Add --dry-run Mode:

```python
from pathlib import Path


def validate_manifest_only(d2_dir: Path, manifest: dict) -> list[str]:
    """Validate manifest schema and naming without requiring files.

    d2_dir is kept for parity with the extraction path but is never read,
    so no asset files need to exist.
    """
    warnings = []
    seen_symbols = set()

    for item in manifest.get("assets", []):
        symbol = item.get("symbol", "")

        # Validate symbol naming
        if not symbol.startswith("d2_asset_"):
            warnings.append(f"Symbol '{symbol}' should start with 'd2_asset_'")

        # Check for duplicates
        if symbol in seen_symbols:
            warnings.append(f"Duplicate symbol: '{symbol}'")
        seen_symbols.add(symbol)

        # Validate source file extension.
        # NOTE: the "source" fallback would collide with the provenance field
        # of the same name proposed under Manifest Improvements.
        source = item.get("source_file") or item.get("source")
        if not source:
            warnings.append(f"Missing source file for '{symbol}'")
            continue  # the remaining checks need a source name
        if not source.upper().endswith((".PVM", ".PVR")):
            warnings.append(f"Invalid source extension for '{symbol}': {source}")

        # Validate PVM/PVR requirements
        if source.upper().endswith(".PVM"):
            if not item.get("source_entry"):
                warnings.append(f"PVM source requires entry name: '{symbol}'")
        elif source.upper().endswith(".PVR"):
            if item.get("source_entry"):
                warnings.append(f"PVR source should not have entry name: '{symbol}'")

    return warnings
```

  2. Enhance Error Messages:

```python
# Before
print(f"SKIP {symbol}: {source_path} not found ({notes})")

# After
print(
    f"ERROR: Asset '{symbol}' not found at {source_path}\n"
    f"Expected location: {source_path}\n"
    f"Solution: Place private D2 assets at {d2_dir}/\n"
    f"See docs/ASSET_EXTRACTION_WORKFLOW.md for setup instructions\n"
    f"Use --discover to see available assets"
)
```

  3. Improve Discovery Output:

```python
from pathlib import Path


def discover_with_manifest(d2_dir: Path, manifest: dict) -> None:
    """Enhanced discovery that compares with manifest expectations."""
    # Get manifest expectations, keyed by source filename
    expected = {item["source_file"]: item for item in manifest.get("assets", [])}

    # Track coverage
    found_assets = set()
    missing_assets = set(expected.keys())

    # Stand-in for the existing discovery walk (PVM/PVR/CTS files, recursive)
    all_paths = sorted(
        p for p in d2_dir.rglob("*") if p.suffix.upper() in {".PVM", ".PVR", ".CTS"}
    )
    for path in all_paths:
        # Manifest keys are strings, so compare as a string rather than a Path
        rel = path.relative_to(d2_dir).as_posix()
        if rel in expected:
            found_assets.add(rel)
            missing_assets.discard(rel)

        # Existing discovery output...

    # Add summary
    print("\n" + "=" * 60)
    print("MANIFEST COVERAGE SUMMARY")
    print("=" * 60)
    print(f"Expected assets: {len(expected)}")
    print(f"Found assets: {len(found_assets)}")
    print(f"Missing assets: {len(missing_assets)}")
    if missing_assets:
        print("\nMissing assets:")
        for asset in sorted(missing_assets):
            print(f"  - {asset}")
    if expected:  # avoid dividing by zero on an empty manifest
        percent = 100 * len(found_assets) // len(expected)
        print(f"\nCoverage: {len(found_assets)}/{len(expected)} ({percent}%)")
```

Manifest Improvements

  1. Add Validation Status Field:

```json
{
  "symbol": "d2_asset_title_bg",
  "source_file": "Q_TITLEBGMT0.PVM",
  "source_entry": "TITLEBGMT0",
  "validation_status": "speculative",
  "confidence": 50,
  "source": "historical_manifest",
  "role": "background",
  "description": "Snowy mountain background layer 0"
}
```

  2. Add Schema Validation:

```python
MANIFEST_SCHEMA = {
    "type": "object",
    "properties": {
        "version": {"type": "string"},
        "target": {"type": "string"},
        "description": {"type": "string"},
        "source_dir": {"type": "string"},
        "assets": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "symbol": {"type": "string", "pattern": "^d2_asset_"},
                    "source_file": {"type": "string"},
                    "source_entry": {"anyOf": [{"type": "string"}, {"type": "null"}]},
                    "validation_status": {
                        "type": "string",
                        "enum": ["speculative", "confirmed", "extracted"],
                    },
                    "confidence": {"type": "integer", "minimum": 0, "maximum": 100},
                    "source": {"type": "string"},
                    "role": {
                        "type": "string",
                        "enum": ["background", "logo", "menu", "overlay", "particles", "copyright"],
                    },
                    "description": {"type": "string"},
                },
                "required": ["symbol", "source_file", "role"],
            },
        },
    },
    "required": ["version", "assets"],
}
```
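A schema in this shape would normally be applied with a full JSON Schema validator. To show what the per-asset rules mean without adding a dependency, here is a deliberately small, standard-library-only sketch that understands just the keywords used in the asset item schema above (type, anyOf of types, enum, pattern, minimum, maximum, required). The names `ITEM_SCHEMA` and `check_asset` are hypothetical, and this is not a general JSON Schema implementation.

```python
import re

# The per-asset portion of the proposed schema, restated so the sketch runs alone.
ITEM_SCHEMA = {
    "properties": {
        "symbol": {"type": "string", "pattern": "^d2_asset_"},
        "source_file": {"type": "string"},
        "source_entry": {"anyOf": [{"type": "string"}, {"type": "null"}]},
        "validation_status": {"type": "string",
                              "enum": ["speculative", "confirmed", "extracted"]},
        "confidence": {"type": "integer", "minimum": 0, "maximum": 100},
        "source": {"type": "string"},
        "role": {"type": "string",
                 "enum": ["background", "logo", "menu", "overlay", "particles", "copyright"]},
        "description": {"type": "string"},
    },
    "required": ["symbol", "source_file", "role"],
}

_TYPES = {"string": str, "integer": int, "null": type(None)}


def check_asset(asset: dict, item_schema: dict = ITEM_SCHEMA) -> list[str]:
    """Check one asset against the small keyword subset described above."""
    errors = [f"missing required field '{name}'"
              for name in item_schema.get("required", []) if name not in asset]

    for name, rule in item_schema.get("properties", {}).items():
        if name not in asset:
            continue
        value = asset[name]

        # Collect allowed Python types from either "type" or "anyOf".
        allowed = tuple(_TYPES[alt["type"]]
                        for alt in rule.get("anyOf", [rule]) if "type" in alt)
        # bool is a subclass of int in Python, so reject it explicitly.
        if allowed and (isinstance(value, bool) or not isinstance(value, allowed)):
            errors.append(f"'{name}' has the wrong type")
            continue

        if "enum" in rule and value not in rule["enum"]:
            errors.append(f"'{name}' must be one of {rule['enum']}")
        if "pattern" in rule and not re.search(rule["pattern"], value):
            errors.append(f"'{name}' does not match {rule['pattern']}")
        if "minimum" in rule and value < rule["minimum"]:
            errors.append(f"'{name}' is below {rule['minimum']}")
        if "maximum" in rule and value > rule["maximum"]:
            errors.append(f"'{name}' is above {rule['maximum']}")
    return errors
```

For anything beyond these few keywords, delegating to an established validator is likely the better route; the sketch is only meant to make the proposed rules tangible.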

8. Implementation Roadmap

Phase 1: Documentation (T2)

  1. Draft docs/ASSET_EXTRACTION_WORKFLOW.md
  2. Update README.md with asset requirements
  3. Update BUILD.md with troubleshooting
  4. Update docs/ASSET_EXTRACTION_MAP.md with workflow

Phase 2: Tooling Enhancements (T3)

  1. Implement --dry-run mode
  2. Enhance error messages with actionable guidance
  3. Add manifest comparison to discovery
  4. Improve discovery output formatting

Phase 3: Manifest Improvements (T3)

  1. Add validation status field
  2. Add confidence indicators
  3. Add source tracking
  4. Update placeholders with historical context

Phase 4: Testing (T5)

  1. Test --dry-run with current manifest
  2. Test discovery with manifest comparison
  3. Verify error messages are clear
  4. Test existing build still works

9. Risk Assessment

Low Risk

Medium Risk

High Risk (Avoided)

10. Conclusion

The T1 research phase has identified clear, actionable improvements that can be made to the asset extraction workflow without requiring access to private D2 assets. The proposed enhancements focus on:

  1. Documentation: Creating comprehensive workflow guides
  2. Tooling: Improving error handling and discovery capabilities
  3. Manifest: Adding validation status and clarity
  4. User Experience: Making the workflow fail gracefully and provide clear guidance

These improvements will prepare the project for successful asset extraction when private D2 assets become available, while maintaining the current build functionality with placeholder assets.

11. References

12. Next Steps

Proceed to T2 (Design) phase to:

1. Create detailed documentation outlines
2. Design tooling enhancements
3. Plan manifest improvements
4. Develop testing strategy