Feature Specification: Visionary Tarot (Card Recognition)
This document specifies the technical and user-experience requirements for the "Visionary Tarot" feature, which bridges the gap between physical tarot decks and AI-powered digital interpretation.
🎯 Objective
Enable users to take a photo of a physical tarot card spread (e.g., a 3-card spread on a table) and have the app automatically identify the cards, recognize the positions, and provide an AI-powered interpretation.
🛠️ Technical Requirements
1. Computer Vision Pipeline (On-Device)
To ensure low latency and privacy, the initial detection should happen on the user's device.
- Step 1: Object Detection (Segmentation)
- Model: YOLOv8-Nano or MobileNetV3-SSD optimized for mobile (CoreML/TFLite).
- Task: Detect rectangular regions representing individual tarot cards within the camera frame.
- Constraint: Must work in varied lighting (candlelight, bright sun) and against different backgrounds (cloth, wood, carpets).
- Step 2: Card Classification
- Method: Feature Embedding (CLIP-based).
- Mechanism: Instead of a fixed 78-class classifier (which is brittle to varying deck art styles), the app will use a lightweight encoder to generate an embedding of the detected card segment and compare it against a pre-computed database of standard Rider-Waite embeddings in the Vector DB.
- Benefit: High accuracy across different deck styles when paired with a robust vision-language encoder.
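The classification step above amounts to a nearest-neighbour lookup over embeddings. The sketch below is illustrative only: `identify_card`, the similarity threshold, and the toy 3-dimensional vectors are hypothetical stand-ins for real CLIP-style embeddings (typically 512-d) and the actual reference database.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def identify_card(card_embedding, reference_embeddings, threshold=0.75):
    """Match a detected card segment's embedding against the pre-computed
    Rider-Waite reference set; return (card_name, score) or None if no
    reference clears the confidence threshold."""
    best_name, best_score = None, -1.0
    for name, ref in reference_embeddings.items():
        score = cosine_similarity(card_embedding, ref)
        if score > best_score:
            best_name, best_score = name, score
    return (best_name, best_score) if best_score >= threshold else None

# Toy 3-d "embeddings" for illustration only.
references = {
    "The Fool": [0.9, 0.1, 0.1],
    "The Magician": [0.1, 0.9, 0.1],
}
match = identify_card([0.85, 0.15, 0.05], references)
```

Returning `None` below the threshold lets the UI keep the reticle "searching" instead of locking in a wrong card, which matters for unfamiliar deck styles.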
2. The "Logic" Layer (Cloud-Based)
Once the cards and positions are identified, the heavy lifting is done in the cloud.
- Feature Mapping: The system must map the detected cards' bounding-box coordinates to a "Spread Template" (e.g., "Past, Present, Future" or "Celtic Cross").
- Context Injection: The identified cards are sent to the LLM (Claude/GPT-4o) along with the user's current astrological transits.
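The spread-template mapping can be sketched as a geometric assignment. This is a minimal sketch for a horizontal three-card layout; `assign_spread_positions` and the `(x, y, w, h)` box format are illustrative assumptions, not the app's actual API, and a Celtic Cross template would need a full 2-D matching step rather than a left-to-right sort.

```python
def assign_spread_positions(detections, template=("Past", "Present", "Future")):
    """Map detected card bounding boxes (x, y, w, h) to spread slots.
    Assumes a horizontal layout: cards are sorted by centre x-coordinate
    and assigned to template slots left to right."""
    if len(detections) != len(template):
        raise ValueError(f"expected {len(template)} cards, got {len(detections)}")
    by_centre_x = sorted(detections, key=lambda box: box[0] + box[2] / 2)
    return {slot: box for slot, box in zip(template, by_centre_x)}

# Detections arrive in arbitrary order from the object detector.
boxes = [(420, 110, 90, 150), (60, 100, 90, 150), (240, 105, 90, 150)]
layout = assign_spread_positions(boxes)
```

The resulting slot-to-card mapping is what gets serialized into the LLM prompt alongside the user's transits.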
🎨 User Experience (UX) Design
1. The "Scan" Interface
- AR Overlay: When the camera is active, the app displays a "Live Guide" (e.g., a glowing rectangular reticle) that follows the user's focus as they point at a card.
- Haptic Feedback: A subtle "click" (haptic vibration) when a card is successfully "locked in" and identified.
- Visual Confirmation: As cards are detected, they appear as "Digital Ghosts" (semi-transparent overlays) on the screen, confirming to the user that the app "sees" them.
2. The "Magic Transition"
- Animation: Once the user taps "Interpret," the camera view dissolves into a beautiful, full-screen digital version of the detected cards, transitioning from the "Physical World" to the "Digital Oracle."
⚠️ Challenges & Mitigations
| Challenge | Mitigation Strategy |
|---|---|
| Multiple Card Styles | Use a multi-modal embedding approach (CLIP) that focuses on structural features and iconography rather than specific pixel-perfect matching. |
| Poor Lighting/Blur | Implement an "Auto-Focus & Flash" prompt within the UI and use image pre-processing (histogram equalization) before inference. |
| Complex Backgrounds | Use Instance Segmentation to separate the card from the tablecloth or hands. |
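The histogram-equalization pre-processing cited in the table can be sketched in pure Python over a flat 8-bit grayscale pixel list. A production build would use a vision library (e.g., OpenCV's equalization routines); `equalize_histogram` here is an illustrative stand-in showing how a dim, low-contrast capture is stretched across the full intensity range before inference.

```python
def equalize_histogram(pixels, levels=256):
    """Histogram equalization for an 8-bit grayscale image given as a
    flat list of pixel values. Spreads a narrow intensity range (e.g. a
    candlelit capture) across the full 0-255 scale."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative distribution function over intensity bins.
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)
    # Anchor the lowest occupied bin to 0 so the darkest pixel maps to black.
    cdf_min = next(c for c in cdf if c > 0)
    n = len(pixels)
    scale = (levels - 1) / (n - cdf_min) if n > cdf_min else 1
    return [round((cdf[p] - cdf_min) * scale) for p in pixels]

# A dim patch: values clustered in the 50-60 band.
dim = [50, 52, 54, 56, 58, 60]
stretched = equalize_histogram(dim)
```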
📊 Success Metrics
- Detection Accuracy: >95% for standard Rider-Waite decks in optimal lighting.
- Inference Latency: <2 seconds from "Capture" to "Identification."
- User Retention: Measure the increase in session length for users who engage with the "Visionary" feature vs. those who use only digital pulls.