Feature Specification: Visionary Tarot (Card Recognition)
This document specifies the technical and user-experience requirements for the "Visionary Tarot" feature, which bridges the gap between physical tarot decks and AI-powered digital interpretation.
🎯 Objective
Enable users to take a photo of a physical tarot card spread (e.g., a 3-card spread on a table) and have the app automatically identify the cards, recognize the positions, and provide an AI-powered interpretation.
🛠️ Technical Requirements
1. Computer Vision Pipeline (On-Device)
To ensure low latency and privacy, the initial detection should happen on the user's device.
- Step 1: Object Detection (Segmentation)
- Model: YOLOv8-Nano or MobileNetV3-SSD optimized for mobile (CoreML/TFLite).
- Task: Detect rectangular regions representing individual tarot cards within the camera frame.
- Constraint: Must work in varied lighting (candlelight, bright sun) and against different backgrounds (cloth, wood, carpets).
- Step 2: Card Classification
- Method: Feature Embedding (CLIP-based).
- Mechanism: Instead of a fixed 78-class classifier (which is brittle to varying deck art styles), the app will use a lightweight encoder to generate an embedding of the detected card segment and compare it against a pre-computed database of standard Rider-Waite embeddings in the Vector DB.
- Benefit: High accuracy across different deck styles when paired with a robust vision-language encoder.
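The classification step above amounts to a nearest-neighbour lookup over embeddings. The sketch below is illustrative only: `identify_card`, the similarity threshold, and the toy 3-dimensional vectors are hypothetical stand-ins for real CLIP-style embeddings (typically 512-d) and the actual reference database.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def identify_card(card_embedding, reference_embeddings, threshold=0.75):
    """Match a detected card segment's embedding against the pre-computed
    Rider-Waite reference set; return (card_name, score) or None if no
    reference clears the confidence threshold."""
    best_name, best_score = None, -1.0
    for name, ref in reference_embeddings.items():
        score = cosine_similarity(card_embedding, ref)
        if score > best_score:
            best_name, best_score = name, score
    return (best_name, best_score) if best_score >= threshold else None

# Toy 3-d "embeddings" for illustration only.
references = {
    "The Fool": [0.9, 0.1, 0.1],
    "The Magician": [0.1, 0.9, 0.1],
}
match = identify_card([0.85, 0.15, 0.05], references)
```

Returning `None` below the threshold lets the UI keep the reticle "searching" instead of locking in a wrong card, which matters for unfamiliar deck styles.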
2. The "Logic" Layer (Cloud-Based)
Once the cards and positions are identified, the heavy lifting is done in the cloud.
- Feature Mapping: The system must map the detected cards' bounding-box coordinates to a "Spread Template" (e.g., "Past, Present, Future" or "Celtic Cross").
- Context Injection: The identified cards are sent to the LLM (Claude/GPT-4o) along with the user's current astrological transits.
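The spread-template mapping can be sketched as a geometric assignment. This is a minimal sketch for a horizontal three-card layout; `assign_spread_positions` and the `(x, y, w, h)` box format are illustrative assumptions, not the app's actual API, and a Celtic Cross template would need a full 2-D matching step rather than a left-to-right sort.

```python
def assign_spread_positions(detections, template=("Past", "Present", "Future")):
    """Map detected card bounding boxes (x, y, w, h) to spread slots.
    Assumes a horizontal layout: cards are sorted by centre x-coordinate
    and assigned to template slots left to right."""
    if len(detections) != len(template):
        raise ValueError(f"expected {len(template)} cards, got {len(detections)}")
    by_centre_x = sorted(detections, key=lambda box: box[0] + box[2] / 2)
    return {slot: box for slot, box in zip(template, by_centre_x)}

# Detections arrive in arbitrary order from the object detector.
boxes = [(420, 110, 90, 150), (60, 100, 90, 150), (240, 105, 90, 150)]
layout = assign_spread_positions(boxes)
```

The resulting slot-to-card mapping is what gets serialized into the LLM prompt alongside the user's transits.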
🎨 User Experience (UX) Design
1. The "Scan" Interface
- AR Overlay: When the camera is active, the app displays a "Live Guide" (e.g., a glowing rectangular reticle) that follows the user's focus as they point at a card.
- Haptic Feedback: A subtle "click" (haptic vibration) when a card is successfully "locked in" and identified.
- Visual Confirmation: As cards are detected, they appear as "Digital Ghosts" (semi-transparent overlays) on the screen, confirming to the user that the app "sees" them.
2. The "Magic Transition"
- Animation: Once the user taps "Interpret," the camera view dissolves into a beautiful, full-screen digital version of the detected cards, transitioning from the "Physical World" to the "Digital Oracle."
⚠️ Challenges & Mitigations
| Challenge | Mitigation Strategy |
|---|---|
| Multiple Card Styles | Use a multi-modal embedding approach (CLIP) that focuses on structural features and iconography rather than specific pixel-perfect matching. |
| Poor Lighting/Blur | Implement an "Auto-Focus & Flash" prompt within the UI and use image pre-processing (histogram equalization) before inference. |
| Complex Backgrounds | Use Instance Segmentation to separate the card from the tablecloth or hands. |
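The histogram-equalization pre-processing cited in the table can be sketched in pure Python over a flat 8-bit grayscale pixel list. A production build would use a vision library (e.g., OpenCV's equalization routines); `equalize_histogram` here is an illustrative stand-in showing how a dim, low-contrast capture is stretched across the full intensity range before inference.

```python
def equalize_histogram(pixels, levels=256):
    """Histogram equalization for an 8-bit grayscale image given as a
    flat list of pixel values. Spreads a narrow intensity range (e.g. a
    candlelit capture) across the full 0-255 scale."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative distribution function over intensity bins.
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)
    # Anchor the lowest occupied bin to 0 so the darkest pixel maps to black.
    cdf_min = next(c for c in cdf if c > 0)
    n = len(pixels)
    scale = (levels - 1) / (n - cdf_min) if n > cdf_min else 1
    return [round((cdf[p] - cdf_min) * scale) for p in pixels]

# A dim patch: values clustered in the 50-60 band.
dim = [50, 52, 54, 56, 58, 60]
stretched = equalize_histogram(dim)
```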
📊 Success Metrics
- Detection Accuracy: >95% for standard Rider-Waite decks in optimal lighting.
- Inference Latency: <2 seconds from "Capture" to "Identification."
- User Retention: Measure the increase in session length for users who engage with the "Visionary" feature vs. those who use only digital pulls.