Implement intelligent `scan` command #1069

Open

opened 2025-10-14 16:46:24 -06:00 by navan · 0 comments

navan commented

2025-10-14 16:46:24 -06:00

Owner

Originally created by @eyaltoledano on 4/1/2025

"When installing Task Master on existing projects, I struggle to quickly generate meaningful tasks because there's no efficient way to summarize and describe the existing codebase accurately. A command that intelligently scans and summarizes the existing project structure and code would significantly streamline initial setup."

Motivation

New users adopting Task Master on existing projects face difficulties accurately capturing the current state and complexity of their projects. Without a precise understanding and summary of the existing codebase, generating meaningful and accurate tasks from scratch becomes challenging.

A recursive, intelligent, and AI-driven scan command will empower users to quickly and reliably produce comprehensive project summaries, facilitating accurate task and PRD generation, reducing initial friction, and significantly improving user onboarding.

Proposed Solution

Implement an intelligent CLI command:

task-master scan [--output=<file_path>]

This command performs a recursive, AI-driven analysis of the project's codebase in iterative, transparent steps, resulting in a structured JSON summary with clear file-level and directory-level details.
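The command surface above can be sketched as follows. This is a Python illustration only, not Task Master's actual CLI code; treating an omitted `--output` as `None` is an assumption, since the issue does not say what happens when the flag is left out.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for `task-master scan [--output=<file_path>]`."""
    parser = argparse.ArgumentParser(prog="task-master")
    subcommands = parser.add_subparsers(dest="command", required=True)

    scan = subcommands.add_parser("scan", help="Scan and summarize the project")
    # Assumption: when --output is omitted the value is None and the caller
    # decides where the JSON summary goes.
    scan.add_argument(
        "--output",
        metavar="file_path",
        default=None,
        help="Where to write the JSON summary",
    )
    return parser


args = build_parser().parse_args(["scan", "--output=project_scan.json"])
print(args.command, args.output)
```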

Detailed Scanning Steps and AI Prompts

Scan 1: Root Directory Scan (Project Type Identification)

CLI introspection: List root-level files and directories (using ls).
Prompt to LLM: "Given this root directory and files (package.json, Dockerfile, index.js, .gitignore, etc.), identify the type of project (e.g., Node.js, React, Laravel, Python). Clearly specify which files or folders should be excluded from further analysis due to irrelevance (e.g., logs, binaries)."
LLM identifies project type and irrelevant files/directories.
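A minimal sketch of Scan 1, assuming the root listing is gathered in-process rather than by shelling out to `ls`. The function names and the `{entries}` placeholder are hypothetical; the prompt text is the one quoted above with the example file list replaced by the real listing.

```python
import os

# Prompt from the issue, with the example file list swapped for a placeholder.
ROOT_SCAN_PROMPT = (
    "Given this root directory and files ({entries}), identify the type of "
    "project (e.g., Node.js, React, Laravel, Python). Clearly specify which "
    "files or folders should be excluded from further analysis due to "
    "irrelevance (e.g., logs, binaries)."
)


def list_root_entries(root: str) -> list[str]:
    """Return sorted root-level names; directories get a trailing slash."""
    entries = []
    with os.scandir(root) as iterator:
        for entry in iterator:
            entries.append(entry.name + "/" if entry.is_dir() else entry.name)
    return sorted(entries)


def build_root_prompt(root: str) -> str:
    """Fill the Scan 1 prompt with the actual root listing."""
    return ROOT_SCAN_PROMPT.format(entries=", ".join(list_root_entries(root)))
```

Marking directories with a trailing slash is one way to let the model tell files from folders without a second call.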

Scan 2: Entry Point and Core Files Identification

CLI introspection: Inspect top-level files using cat or head to determine key entry points.
Prompt to LLM: "Based on the project type identified ({projectType}), identify the main entry points and core files critical for understanding the structure and functionality of the application."
LLM provides a structured list of critical entry points and core files.
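Scan 2 could look roughly like this. Appending a short preview of each top-level file to the prompt is an assumption about how the `cat`/`head` output would reach the model; `head`, `build_entry_point_prompt`, and the `excluded` set (the files ruled out in Scan 1) are hypothetical names.

```python
from itertools import islice
from pathlib import Path

# Prompt from the issue; {projectType} is filled from the Scan 1 result.
ENTRY_POINT_PROMPT = (
    "Based on the project type identified ({projectType}), identify the main "
    "entry points and core files critical for understanding the structure and "
    "functionality of the application."
)


def head(path: Path, max_lines: int = 20) -> str:
    """Return the first max_lines lines of a text file, similar to `head -n`."""
    with path.open("r", encoding="utf-8", errors="replace") as handle:
        return "".join(islice(handle, max_lines))


def build_entry_point_prompt(project_type: str, root: Path, excluded: set[str]) -> str:
    """Combine the Scan 2 prompt with previews of the top-level files."""
    sections = []
    for path in sorted(root.iterdir()):
        # Skip directories and anything Scan 1 marked as irrelevant.
        if path.is_file() and path.name not in excluded:
            sections.append(f"--- {path.name} ---\n{head(path)}")
    prompt = ENTRY_POINT_PROMPT.format(projectType=project_type)
    return prompt + "\n\n" + "\n".join(sections)
```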

Scan 3: Core Structure and Relevant Directory Identification

CLI introspection: List directories at the first level, inspect directory names and structures.
Prompt to LLM: "Given the identified entry points and core files ({entryPoints}), suggest key directories that likely contain significant business logic, controllers, views, services, or components essential to the project."
LLM returns structured recommendations of directories for deeper analysis.
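Because the model's directory recommendations drive every later scan, it may be worth checking them against the file system before going deeper. The sketch below assumes the reply is JSON of the form `{"directories": [...]}`; the issue does not define the reply format, so that shape and the function name are hypothetical.

```python
import json
from pathlib import Path


def parse_directory_recommendations(response_text: str, root: Path) -> list[str]:
    """Keep only recommended directories that really exist inside root."""
    data = json.loads(response_text)
    root = root.resolve()
    kept = []
    for rel in data.get("directories", []):
        if not isinstance(rel, str):
            continue
        # Treat "/src/controllers" as relative to the project root.
        candidate = (root / rel.lstrip("/")).resolve()
        # Drop paths that do not exist or that escape the project root.
        if candidate.is_dir() and root in candidate.parents:
            kept.append(candidate.relative_to(root).as_posix())
    return kept
```

Filtering here means a mistyped or invented path costs nothing more than a skipped entry, instead of a failed read in a later scan.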

Scans 4 and beyond: Recursive Deepening Scans (Iterative Analysis)

CLI introspection: Use commands such as grep and cat, together with line-number extraction, to inspect specific directories and files at a granular level.
Prompt to LLM (per iteration): "Based on previously identified critical directories ({directories}), select relevant files or additional subdirectories for detailed summarization. Provide detailed file-level and directory-level summaries for each selection. Use function names, their line numbers, and contextual code snippets to generate accurate summaries."
LLM iteratively deepens the analysis, providing structured and detailed summaries.
Each scan clearly logs progress in CLI output for transparency.
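The "function names, their line numbers" part of the deeper scans can be approximated in-process, much like `grep -n`. The regular expression below is a rough, hypothetical pattern for common JavaScript declarations, not a full parser, and would need counterparts for other project types.

```python
import re

# Assumption: matches `function name(`, `async function name(` and
# `const name = (` / `const name = async (` style declarations only.
FUNCTION_PATTERN = re.compile(
    r"^\s*(?:export\s+)?(?:async\s+)?function\s+(\w+)"
    r"|^\s*(?:export\s+)?(?:const|let|var)\s+(\w+)\s*=\s*(?:async\s*)?\("
)


def find_functions(source: str) -> list[tuple[int, str]]:
    """Return (line_number, function_name) pairs, with 1-based line numbers."""
    results = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        match = FUNCTION_PATTERN.match(line)
        if match:
            results.append((line_number, match.group(1) or match.group(2)))
    return results
```

The resulting pairs can be attached to the per-file prompt so the model's summary can cite concrete locations.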

High-Level Workflow

Scan 1: Perform initial root scan, identify project type, and filter irrelevant files.
Scan 2: Identify and summarize main entry points and critical core files.
Scan 3: Construct an initial core structure summary, highlighting key directories and files.
Scans 4+: Perform iterative, asynchronous deeper scans based on previous results, dynamically refining file and directory selections.
JSON Output: Generate a comprehensive JSON summary with detailed file- and directory-level descriptions.
Output structured JSON to the specified file location.
Clearly log each scan step in the CLI for transparency.
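The iterative part of the workflow (Scans 4+) can be sketched as a bounded loop. The `llm` callable, the reply shape `{"summaries": {...}, "next": [...]}`, and the `max_scans` cap are all assumptions made for illustration; the issue specifies neither a reply format nor a stopping rule.

```python
from typing import Callable

# Hypothetical signature: takes a prompt, returns the model's parsed JSON reply.
LLM = Callable[[str], dict]

# Prompt from the issue; {directories} is filled with the paths still pending.
DEEPEN_PROMPT = (
    "Based on previously identified critical directories ({directories}), "
    "select relevant files or additional subdirectories for detailed "
    "summarization. Provide detailed file-level and directory-level summaries "
    "for each selection. Use function names, their line numbers, and "
    "contextual code snippets to generate accurate summaries."
)


def deepen(llm: LLM, directories: list[str], max_scans: int = 10) -> dict[str, str]:
    """Run deepening scans until nothing new is suggested or the cap is hit."""
    summaries: dict[str, str] = {}
    pending = list(directories)
    scan_number = 4
    while pending and scan_number < 4 + max_scans:
        reply = llm(DEEPEN_PROMPT.format(directories=", ".join(pending)))
        summaries.update(reply.get("summaries", {}))
        # Follow only paths that have no summary yet, so finished work is not repeated.
        pending = [path for path in reply.get("next", []) if path not in summaries]
        print(f"[Scan #{scan_number}]: {len(summaries)} item(s) summarized so far")
        scan_number += 1
    return summaries
```

The cap guards against a model that keeps proposing new paths indefinitely; a real implementation might also budget by tokens or elapsed time.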

Logs would clearly show the recursive LLM-based file discovery:

[Scan #1]: Detected Node.js project. Entry points identified: index.js, App.jsx
[Scan #2]: Controllers and services directories selected for further inspection...
[Scan #3]: authController.js, userController.js summarized successfully.
[Scan #4]: authService.js identified as critical for authentication flow...
...
[Complete]: Project successfully scanned and summarized.
Output saved to project_scan.json
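One way to produce log lines in the format shown above, sketched with Python's standard `logging` module. The logger name `task_master.scan` and both helper names are hypothetical.

```python
import logging


def make_scan_logger(stream=None) -> logging.Logger:
    """Create a logger that prints bare messages to the given stream."""
    logger = logging.getLogger("task_master.scan")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate lines if called more than once
    handler = logging.StreamHandler(stream)  # defaults to stderr when stream is None
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
    logger.propagate = False
    return logger


def log_scan(logger: logging.Logger, scan_number: int, message: str) -> None:
    """Emit one progress line in the `[Scan #N]: message` format."""
    logger.info("[Scan #%d]: %s", scan_number, message)
```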

Key Elements

Command syntax:

task-master scan --output=project_scan.json

Example comprehensive JSON summary format:

{
  "projectType": "Node.js + React",
  "entryPoints": {
    "index.js": {"path": "/index.js", "summary": "Main backend entry point initializing Express server, middleware, and routes."},
    "App.jsx": {"path": "/src/App.jsx", "summary": "Frontend root component handling React Router and authentication checks."}
  },
  "directories": {
    "controllers": {
      "path": "/src/controllers",
      "summary": "Handles incoming API requests and delegates to service layer.",
      "files": {
        "authController.js": {"path": "/src/controllers/authController.js", "summary": "Manages user authentication, JWT issuance, and verification."},
        "userController.js": {"path": "/src/controllers/userController.js", "summary": "Manages CRUD operations for user profiles and permissions."}
      }
    },
    "services": {
      "path": "/src/services",
      "summary": "Business logic layer, handling interactions between controllers and data persistence.",
      "files": {
        "authService.js": {"path": "/src/services/authService.js", "summary": "Encapsulates authentication logic, including token creation and validation."},
        "userService.js": {"path": "/src/services/userService.js", "summary": "Manages business rules for user data management."}
      }
    },
    "views": {
      "path": "/src/views",
      "summary": "React components responsible for rendering main user interfaces.",
      "files": {
        "loginView.jsx": {"path": "/src/views/loginView.jsx", "summary": "Login interface handling user authentication and input validation."},
        "dashboardView.jsx": {"path": "/src/views/dashboardView.jsx", "summary": "Displays user-specific dashboard, activity logs, and statistics."}
      }
    }
  },
  "description": "Comprehensive project summary detailing frontend and backend architecture, data flow, and main functional components."
}
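The example implies a nested shape: every entry point, directory, and file carries a `path` and a `summary`. A small check like the one below could catch malformed model output before it is written to disk. Treating the example as a schema is an assumption, as the issue does not define one, and `validate_summary` is a hypothetical name.

```python
def _check_entry(label: str, entry: object) -> list[str]:
    """Report missing or non-string 'path' and 'summary' fields for one entry."""
    if not isinstance(entry, dict):
        return [f"{label}: must be an object"]
    return [
        f"{label}: missing or non-string '{key}'"
        for key in ("path", "summary")
        if not isinstance(entry.get(key), str)
    ]


def validate_summary(doc: dict) -> list[str]:
    """Return a list of problems; an empty list means the shape matches the example."""
    problems = []
    if not isinstance(doc.get("projectType"), str):
        problems.append("projectType: must be a string")
    for name, entry in doc.get("entryPoints", {}).items():
        problems += _check_entry(f"entryPoints.{name}", entry)
    for dir_name, directory in doc.get("directories", {}).items():
        problems += _check_entry(f"directories.{dir_name}", directory)
        files = directory.get("files", {}) if isinstance(directory, dict) else {}
        for file_name, file_entry in files.items():
            problems += _check_entry(f"directories.{dir_name}.files.{file_name}", file_entry)
    return problems
```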

Implementation Considerations

Leverage iterative, context-aware LLM prompts for intelligent file and directory selection.
Ensure asynchronous operation for efficient handling of large codebases.
Provide robust logging for transparency and user feedback.
Handle diverse project types and structures thoroughly.
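For the asynchronous-operation point, a bounded-concurrency pattern keeps large codebases from triggering hundreds of simultaneous model calls. This is a sketch using `asyncio`; the `summarize` coroutine stands in for whatever LLM call is used, and the default limit of 4 is an arbitrary assumption.

```python
import asyncio
from typing import Awaitable, Callable


async def summarize_all(
    paths: list[str],
    summarize: Callable[[str], Awaitable[str]],
    max_concurrency: int = 4,
) -> dict[str, str]:
    """Summarize every path, running at most max_concurrency calls at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(path: str) -> tuple[str, str]:
        # The semaphore blocks here once max_concurrency workers are active.
        async with semaphore:
            return path, await summarize(path)

    pairs = await asyncio.gather(*(worker(path) for path in paths))
    return dict(pairs)
```

A caller would run it with `asyncio.run(summarize_all(paths, summarize))` and tune the limit to the provider's rate limits.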

Out of Scope (Future Considerations)

GUI visualization of the scanning process.
Direct integration with PRD/task generation commands (handled separately).
