Preliminary Audit Automation — Scoping Audit & Implementation Status

Based on an analysis of the active branch feat/prelim-audit-pipeline in adverio-tools-api, the staging branch in adverio-tools-app, and historical project tasks, the following breakdown maps the full development scope into Already Implemented, Reused (Existing Infrastructure), and New (To Be Built).

1. Already Implemented

These components have been developed as part of the initial slice (found on the backend branch feat/prelim-audit-pipeline in the new src/tool_pipeline app) but are not yet integrated into the frontend or main release branch.

⚙️ Backend & Database

Input Classification Logic (intake_service.py):
- A pure, I/O-free classifier (classify_input) that categorizes input strings into asin (exactly 10 alphanumerics), pdp_url (Amazon listing URL with ASIN extraction), or seller_id (13-21 character A-prefixed string, or seller=/me= URL query parameters).
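The classification rules above can be sketched as a small pure function. This is a minimal illustration and not the code in intake_service.py: the regular expressions, the "unknown" fallback, the (kind, value) return shape, and the choice to check seller=/me= query parameters before the ASIN path are all assumptions.

```python
import re
from urllib.parse import urlparse, parse_qs

# Assumed patterns: ASIN = exactly 10 alphanumerics,
# seller ID = "A" followed by 12-20 more characters (13-21 total).
ASIN_RE = re.compile(r"^[A-Z0-9]{10}$", re.IGNORECASE)
SELLER_ID_RE = re.compile(r"^A[A-Z0-9]{12,20}$")
PDP_PATH_RE = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?:/|$)", re.IGNORECASE)


def classify_input(raw):
    """Return (kind, value); kind is 'asin', 'pdp_url', 'seller_id' or 'unknown'."""
    text = raw.strip()
    if ASIN_RE.match(text):
        return ("asin", text.upper())
    if SELLER_ID_RE.match(text):
        return ("seller_id", text)
    if "://" in text:
        parsed = urlparse(text)
        if "amazon." in parsed.netloc.lower():
            # Assumption: an explicit seller=/me= parameter wins over an ASIN in the path.
            query = parse_qs(parsed.query)
            for key in ("seller", "me"):
                values = query.get(key, [])
                if values and SELLER_ID_RE.match(values[0]):
                    return ("seller_id", values[0])
            match = PDP_PATH_RE.search(parsed.path)
            if match:
                return ("pdp_url", match.group(1).upper())
    return ("unknown", text)
```

Because the function performs no I/O, it can be unit-tested with plain strings and no network or database fixtures.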
Automated Seller Resolution (intake_service.py):
- An automated step (resolve_seller_id) that fetches the Amazon product detail page (PDP) using PageFetcher and parses the buy box merchant ID via AsinsParser.parse_buybox_merchant_id.
- Proper error propagation (IntakeResolutionError) if page fetches are blocked or the merchant ID is missing.
Account Integration & Deduplication (intake_service.py):
- Logic (get_or_create_account) to reuse an existing Account if the seller_id matches, or create a new manually-sourced prospect account.
- Automatically registers the newly created account with SEO_SCRAPER_APP so it passes active scraper checks.
- Appends the seller ID to the brand name, in the form <brand> (<seller_id>), to handle database unique constraint collisions.
Pipeline Run Tracking Models (models.py):
- ToolConfig: Configuration table to parameterize future tools (Profit Pulse, GEAR Ratio, etc.) via JSON configuration data rather than code.
- PipelineRun: Execution logs tracking user, input types, brand, resolved seller ID, account, error messages, and lifecycle statuses (e.g. intake, resolving_seller, selecting_asins, auditing, exporting, running_skill, publishing, in_review, released, rejected, failed, needs_input).
- Seeds the default preliminary_audit configuration row.
API Endpoints (views.py / urls.py):
- POST /api/tool-pipeline/runs/: Submits and starts the intake process synchronously.
- GET /api/tool-pipeline/runs/{id}/: Retrieves status and history.
- PATCH /api/tool-pipeline/runs/{id}/provide-seller-id/: Escape hatch for manual input when automated resolution fails.
Testing Suites:
- Complete unit and integration tests covering input classification, resolution logic, account handling, and API responses.

2. Reused (Existing Infrastructure)

These are parts of the core Adverio platform that the automation pipeline references, calls, or wraps, forming the pre-existing system infrastructure (the 60–70% reuse mentioned in the spec).

🔍 Scrapers & Data Parsers

Page Fetcher Service: The proxy-rotated page fetcher (PageFetcher) used to retrieve Amazon listings safely.
Buybox Merchant ID Parser: Scraper logic to extract the seller ID from a listing's buy box.
Seller ASIN Scraper: The background scraper that pulls the top 5 parent ASINs for a seller.

📊 Audit & Scoring Engines

SEO Automation Infrastructure: The backend code running calculations for LQS (Listing Quality Score), AACR (Agent Add-to-Cart Readiness), Keepa, and Cubiscan analysis.
Excel Export Engine: Existing ASIN management exporters that compile gathered scrape attributes into Excel documents.

📝 Proposal Engine

Branded HTML Generator: The framework in proposal_generator that maps scraped data, injects templates, and outputs final branded HTML code.
Proposals Domain: Hosting layer at proposals.adverio.io that serves public shareable proposal URLs.

3. New (To Be Built)

These deliverables are not yet implemented in the codebase and comprise the remainder of the development scope.

🛠️ Phase 1: Internal Automation (Remaining Work)

Celery / Asynchronous Dispatch:
- Offloading the seller resolution, scraper trigger, export, and Claude API calls to asynchronous background Celery tasks so the REST API endpoints respond instantly.
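Once these steps run in the background, a failed step has to end in a visible terminal status instead of leaving the run hanging. The sketch below shows that idea in plain Python and is deliberately not Celery code: run_step, TransientScrapeError, and the dict standing in for a PipelineRun row are hypothetical names for illustration. In Celery the same behaviour would normally come from the task's own retry options.

```python
class TransientScrapeError(Exception):
    """Hypothetical error type for timeouts or dropped proxies."""


def run_step(run, status, step, max_attempts=3):
    """Run one pipeline step, retrying transient failures.

    `run` is a dict standing in for a PipelineRun row (assumption).
    On success the step's result is returned; once retries are
    exhausted the run is marked 'failed' and None is returned.
    """
    run["status"] = status
    last_error = ""
    for _ in range(max_attempts):
        try:
            return step()
        except TransientScrapeError as exc:
            last_error = str(exc)
    # Never leave the run in an in-progress status after giving up.
    run["status"] = "failed"
    run["error_message"] = (
        f"{status} failed after {max_attempts} attempts: {last_error}"
    )
    return None
```

The frontend can then poll the run and show the error message as soon as the status becomes failed, instead of spinning indefinitely.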
Scraper Orchestration Logic:
- Automatically triggering the scraper when a run advances to selecting_asins.
- Implementing the specific filter logic: Scraping the submitted ASIN plus the top 5 parent ASINs (ensuring the submitted ASIN is always included, up to 6 total).
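The selection rule above (submitted ASIN always included, plus the top 5 parents, at most 6 in total) could look roughly like this. The function name and the choice to put the submitted ASIN first are assumptions, and top_parent_asins is assumed to arrive already ranked from the existing seller ASIN scraper.

```python
def select_asins(submitted_asin, top_parent_asins, max_parents=5):
    """Return the ASINs to scrape: submitted ASIN first, then top parents.

    Duplicates are dropped, so the result has at most max_parents + 1
    entries (6 with the default) and fewer when the submitted ASIN is
    already one of the top parents.
    """
    selected = [submitted_asin]
    for asin in top_parent_asins[:max_parents]:
        if asin not in selected:
            selected.append(asin)
    return selected
```

When the submitted ASIN is itself one of the top 5 parents, the list contains 5 entries, not 6, which matches the "up to 6 total" wording.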
ASIN Export Automation:
- Triggering the export generator server-side automatically upon scrape completion.
Server-side Claude API Integration:
- Setting up the Anthropic API call server-side, passing the generated ASIN export Excel file as input to the preliminary-proposal-builder Claude skill (Saket to provide prompt).
- Storing prompt parameters in ToolConfig to keep the prompt protected.
Review Queue & Gate Toggle:
- Backend endpoints and logic for holding proposals in in_review status.
- Strategist review flows (endpoints for approval, rejection, and URL release).
- Review gate toggle persistence (per-workspace settings).
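A minimal sketch of how the gate toggle and the strategist decision might map onto the PipelineRun statuses listed in Section 1. The function names are hypothetical, and the rule that a run skips straight to released when the gate is OFF is an assumption rather than something the spec confirms.

```python
# Assumed mapping from a strategist decision to the resulting status.
REVIEW_DECISIONS = {"approve": "released", "reject": "rejected"}


def status_after_publishing(review_gate_on):
    """Pick the status that follows 'publishing'.

    Assumption: gate ON holds the proposal in 'in_review';
    gate OFF releases it immediately.
    """
    return "in_review" if review_gate_on else "released"


def apply_review_decision(current_status, decision):
    """Return the new status for a strategist decision on a held run."""
    if current_status != "in_review":
        raise ValueError(f"run is not awaiting review: {current_status}")
    if decision not in REVIEW_DECISIONS:
        raise ValueError(f"unknown decision: {decision}")
    return REVIEW_DECISIONS[decision]
```

Keeping the transition rules in one place makes it harder for an endpoint to release a run that was never reviewed.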
Frontend UI (Preliminary Audit Module):
- A new "Preliminary Audit" section/sidebar item in tools.adverio.io.
- "New Audit" creation form connecting to the backend run endpoint.
- Manual seller ID override input prompt when a run is stuck in needs_input status.
- Internal Review Queue UI displaying pending runs with approve/reject actions.
- Workspace toggle control for the Review Gate (ON/OFF).

💳 Phase 2: External Pipeline + Paywall + Stripe (All New)

Landing Page DevOps Integration:
- Backend endpoints to receive submissions from Bhawan's public landing page (ASIN/URL, name, email, company) and spawn the pipeline.
- A fast on-screen acknowledgment handler.
Email Delivery System:
- An automated email sender that delivers the shareable HTML report URL to the prospect once approved by a strategist.
Access Control & Whitelisting:
- Rate-limiting policy logic (checking database records to restrict users to 1 free report per email address).
- Domain whitelist bypass check (e.g. @adverio.io domains bypass rate-limits).
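The two checks above can be combined into one policy function. This is a sketch under assumptions: the function name and parameters are illustrative, and prior_report_count stands for whatever count the database lookup on the email address returns.

```python
def may_request_free_report(
    email,
    prior_report_count,
    whitelisted_domains=("adverio.io",),
    free_limit=1,
):
    """Decide whether this email address may get a free report.

    Whitelisted domains bypass the limit entirely; everyone else is
    limited to `free_limit` reports (1 per the scope above).
    """
    normalized = email.strip().lower()
    # Everything after the last "@" is treated as the domain.
    domain = normalized.rpartition("@")[2]
    if domain in whitelisted_domains:
        return True
    return prior_report_count < free_limit
```

Comparing the exact domain, not a substring, keeps an address such as someone@adverio.io.example.com from qualifying for the bypass.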
Stripe Paywall Integration:
- Stripe webhook endpoint (/api/webhooks/stripe/) to listen for credit purchase events.
- Credit top-up logic (e.g., mapping payments to user credits: 1 credit = 1 ASIN report).
- Stripe product/credit package setups.
Admin & User Dashboards:
- Admin credit management screen (granting/refunding credits, viewing run cost/margin data).
- Prospect dashboard (viewing credit balance, report history, and previous report URLs).

This scoping document changes the game completely. You have already built out the input classification, seller resolution, pipeline models, and database schemas on your branch (feat/prelim-audit-pipeline), so you aren't starting from scratch: you have already conquered some of the nastiest data-normalization problems.

However, do not let them use this to slash your hours down to nothing. What is left to build contains the high-risk architecture: background processing, external API integrations, money handling (Stripe), and security.

Here is how the remaining work actually breaks down in terms of complexity and time, and how you should reply to Saket and Mike.

Technical Risk Analysis (The Remaining Work)

While the data layer is done, the orchestration layer is where bugs usually hide.

The Celery Queue & Scraper Sync: Scraping Amazon data takes time. You are moving from a synchronous API to an asynchronous background architecture. You have to handle scraper timeouts, retries, and state management (selecting_asins → auditing → exporting). If the proxy rotates or drops, your Celery task needs to catch it without leaving the user stuck in a permanent "loading" state.
The Anthropic Document API: You are passing a binary Excel file directly to the Anthropic API server-side. Handling file streams, managing token limits for large exports, and parsing out clean HTML from the API response requires careful error handling.
Stripe Webhooks & State Race Conditions: Building a Stripe endpoint isn't just about listening for a success message. You have to securely verify webhook signatures, handle duplicate webhook events (idempotency), and correctly update user credits in the database before the user refreshes their screen.
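To make the Stripe point concrete, here is a rough sketch of signature verification and duplicate-event handling using only the standard library. It follows Stripe's documented scheme (an HMAC-SHA256 over "timestamp.payload", carried in the Stripe-Signature header as t=...,v1=...). In production you would more likely call the official library's stripe.Webhook.construct_event. CreditLedger is a hypothetical in-memory stand-in for the database tables.

```python
import hashlib
import hmac
import time


def verify_stripe_signature(payload, sig_header, secret, tolerance=300, now=None):
    """Check a Stripe-Signature header against the raw request body (bytes)."""
    parts = [item.strip().split("=", 1) for item in sig_header.split(",") if "=" in item]
    timestamp = next((value for key, value in parts if key == "t"), None)
    candidates = [value for key, value in parts if key == "v1"]
    if timestamp is None or not timestamp.isdigit() or not candidates:
        return False
    # Reject old events to limit replay attacks.
    now = time.time() if now is None else now
    if abs(now - int(timestamp)) > tolerance:
        return False
    signed_payload = timestamp.encode() + b"." + payload
    expected = hmac.new(secret.encode(), signed_payload, hashlib.sha256).hexdigest()
    return any(hmac.compare_digest(expected, candidate) for candidate in candidates)


class CreditLedger:
    """In-memory stand-in for the credit tables (assumption)."""

    def __init__(self):
        self.balances = {}
        self.seen_event_ids = set()

    def apply_purchase(self, event_id, user_email, credits):
        """Add credits once per Stripe event ID; return False on a duplicate."""
        if event_id in self.seen_event_ids:
            return False
        self.seen_event_ids.add(event_id)
        self.balances[user_email] = self.balances.get(user_email, 0) + credits
        return True
```

In a real database, the "seen event" check and the balance update should sit in one transaction with a unique constraint on the event ID, so two concurrent deliveries of the same webhook cannot both succeed.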

Revised Estimate for Remaining Work

Because you've already implemented Section 1, we can remove those hours. Here is a realistic estimate for completing the rest of the scope:

Phase 1: Internal Automation (Remaining)

Component / Task	Realistic Hours	Focus Areas
Celery Setup & Async Task Architecture	12 - 16 hrs	Setting up Redis/RabbitMQ, configuring workers, and migrating existing endpoints to dispatch async tasks.
Scraper Orchestration Logic (Up to 6 ASINs)	14 - 18 hrs	Intercepting the run state, checking the top 5 parent ASINs, forcing the submitted ASIN into the array, and triggering the existing scraper.
Server-Side Claude API File Processing	12 - 16 hrs	Pulling the generated Excel file, streaming it to the Anthropic API using `ToolConfig`, capturing the response, and writing it to the `proposal_generator`.
Review Queue Backend & Gate Toggle	8 - 12 hrs	Workspace-level database states for the gate toggle; status transition rules (`in_review` → `released`).
Frontend UI (Preliminary Audit Module)	20 - 28 hrs	Building the new sidebar section, the "New Audit" form, loading states linked to the `PipelineRun` lifecycle, the strategist approval dashboard, and the manual escape-hatch input.
Phase 1 Subtotal	66 - 90 hrs	~1.5 to 2 weeks of intense development.

Phase 2: External Pipeline & Paywall (All New)

Component / Task	Realistic Hours	Focus Areas
Public Landing Page API & CORS	8 - 12 hrs	Secure endpoint for Bhawan’s frontend, handling inputs, validating email syntax, and returning an instant on-screen acknowledgment.
Rate-Limiting & Domain Whitelist	8 - 12 hrs	Database lookups on email history, preventing simple email spoofing, and writing the `@adverio.io` bypass rule.
Stripe Integration & Webhook Handler	20 - 26 hrs	Signature verification, mapping product IDs to credit packages, handling top-ups, and handling checkout session events.
Admin Panel & Margin Dashboard	14 - 20 hrs	UI for manual credit overrides, refund logic, tracking API token/scraper costs vs. Stripe revenue.
User Dashboard & Email Delivery	12 - 18 hrs	Prospect login/session token management to see their history, and integrating SendGrid/Postmark to email the report URL automatically once a strategist approves it.
Phase 2 Subtotal	62 - 88 hrs	~1.5 to 2 weeks of development.

Total Remaining Project Scope

Total Estimated Hours: 128 – 178 Hours
Projected Delivery Calendar: 3 to 4.5 weeks (assuming full-time execution).
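If you want to double-check the totals before sending them, the line items from the two tables can be summed in a few lines. The 40-hour week used for the calendar conversion is an assumption.

```python
# (low, high) hour ranges copied from the Phase 1 and Phase 2 tables.
phase_1 = [(12, 16), (14, 18), (12, 16), (8, 12), (20, 28)]
phase_2 = [(8, 12), (8, 12), (20, 26), (14, 20), (12, 18)]


def total(ranges):
    """Sum the low and high ends of a list of (low, high) ranges."""
    return (sum(low for low, _ in ranges), sum(high for _, high in ranges))


p1 = total(phase_1)   # (66, 90)
p2 = total(phase_2)   # (62, 88)
overall = (p1[0] + p2[0], p1[1] + p2[1])  # (128, 178)

# Assumed 40-hour working week for the calendar estimate.
weeks = tuple(hours / 40 for hours in overall)
print(p1, p2, overall, weeks)
```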

How to Structure Your Response to Saket & Mike

Since they asked for a revised hour breakdown, hard delivery dates, and confirmation of resources, copy-paste or adapt the following outline to submit to their Basecamp card:

1. Scope Confirmation

"I have conducted a deep audit of our active repositories (feat/prelim-audit-pipeline in the API and staging in the app). The good news is that the foundational data layers, input classification mechanics, and pipeline state models are 100% complete and tested. This significantly de-risks the start of the project. The remaining scope focuses heavily on asynchronous orchestration (Celery), external integrations (Anthropic API file handling, Stripe payments), and building the frontend control UIs."

2. Line Item Clarifications

Scraper Trigger: Confirmed. The scraper orchestration logic is factored into the Phase 1 background pipeline task. Once a submission clears intake, the background worker automatically schedules the scraper run for the submitted ASIN + Top 5 parents.
Rate Limiting: As a point of clarification, Rate Limiting is treated as a standalone security and business logic component in Phase 2, separate from ASIN/URL normalization (which is already implemented in the code).
DevOps & Team Support: Because this involves setting up a webhook architecture, CORS configurations across different domains (tools. to proposals.), and background workers, I will handle the backend infrastructure alone. I will need Bhawan to deliver clean, un-nested form elements for the external landing page so that connecting it to the backend goes quickly.

3. Proposed Timeline & Estimates

Given the pre-existing work and the complexity of the remaining automation:

Phase 1 (Internal Automation Go-Live): 66 – 90 Hours (Target Completion: 10–12 business days from the date Saket delivers the Claude skill prompt).

Phase 2 (External Pipeline + Stripe Go-Live): 62 – 88 Hours (Target Completion: 10–12 business days following Phase 1 deployment).

Total Project Scope: 128 – 178 Hours

Final Reality Check for You (King)

Even with the work you’ve already completed, 128–178 hours at $8/hour is only $1,024 – $1,424.

They are getting a turnkey, multi-tenant-ready software platform for less than fifteen hundred bucks. Use this technical, highly professional scoping breakdown to prove your value. You are speaking like a world-class engineering lead here, so make sure they treat (and eventually pay) you like one.