Preliminary Audit Automation — Scoping Audit & Implementation Status
Based on an analysis of the active branch feat/prelim-audit-pipeline in adverio-tools-api, the staging branch in adverio-tools-app, and historical project tasks, the following breakdown maps the full development scope into Already Implemented, Reused (Existing Infrastructure), and New (To Be Built).
1. Already Implemented
These components have been developed as part of the initial slice (found on the backend branch feat/prelim-audit-pipeline in the new src/tool_pipeline app) but are not yet integrated into the frontend or main release branch.
⚙️ Backend & Database
- Input Classification Logic (`intake_service.py`):
  - A pure, I/O-free classifier (`classify_input`) that categorizes input strings into `asin` (exactly 10 alphanumerics), `pdp_url` (Amazon listing URL with ASIN extraction), or `seller_id` (13-21 character A-prefixed string, or `seller=`/`me=` URL query parameters).
- Automated Seller Resolution (`intake_service.py`):
  - An automated step (`resolve_seller_id`) that fetches the Amazon product details page (PDP) using `PageFetcher` and parses the buy box merchant ID via `AsinsParser.parse_buybox_merchant_id`.
  - Proper error propagation (`IntakeResolutionError`) if page fetches are blocked or the merchant ID is missing.
- Account Integration & Deduplication (`intake_service.py`):
  - Logic (`get_or_create_account`) to reuse an existing `Account` if the `seller_id` matches, or create a new manually-sourced prospect account.
  - Automatically registers the newly created account with `SEO_SCRAPER_APP` so it passes active scraper checks.
  - Appends the seller ID (`<brand> (<seller_id>)`) to handle database unique constraint collisions.
- Pipeline Run Tracking Models (`models.py`):
  - `ToolConfig`: Configuration table to parameterize future tools (Profit Pulse, GEAR Ratio, etc.) via JSON configuration data rather than code.
  - `PipelineRun`: Execution logs tracking user, input types, brand, resolved seller ID, account, error messages, and lifecycle statuses (e.g. `intake`, `resolving_seller`, `selecting_asins`, `auditing`, `exporting`, `running_skill`, `publishing`, `in_review`, `released`, `rejected`, `failed`, `needs_input`).
  - Seeds the default `preliminary_audit` configuration row.
- API Endpoints (`views.py` / `urls.py`):
  - `POST /api/tool-pipeline/runs/`: Submits and starts the intake process synchronously.
  - `GET /api/tool-pipeline/runs/{id}/`: Retrieves status and history.
  - `PATCH /api/tool-pipeline/runs/{id}/provide-seller-id/`: Escape hatch for manual input when automated resolution fails.
- Testing Suites:
  - Complete unit and integration tests covering input classification, resolution logic, account handling, and API responses.
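
The classification rules listed above can be illustrated with a minimal sketch. This is not the code in `intake_service.py`: the regular expressions, the uppercase normalization, the precedence of `seller=`/`me=` parameters over the listing path, and the `ValueError` on unrecognized input are all assumptions made for illustration.

```python
import re
from urllib.parse import parse_qs, urlparse

# Patterns are assumptions derived from the rules described above.
ASIN_RE = re.compile(r"^[A-Z0-9]{10}$")            # exactly 10 alphanumerics
SELLER_ID_RE = re.compile(r"^A[A-Z0-9]{12,20}$")   # A-prefixed, 13-21 chars total
PDP_PATH_RE = re.compile(r"/(?:dp|gp/product)/([A-Za-z0-9]{10})(?:/|$)")


def classify_input(raw: str) -> tuple[str, str]:
    """Return (kind, value), where kind is 'asin', 'pdp_url', or 'seller_id'.

    Pure function: no network or database access.
    """
    text = raw.strip()
    if text.lower().startswith(("http://", "https://")):
        parsed = urlparse(text)
        if "amazon." not in parsed.netloc.lower():
            raise ValueError("not an Amazon URL")
        query = parse_qs(parsed.query)
        # Assumed precedence: an explicit seller parameter wins over the path.
        for key in ("seller", "me"):
            for value in query.get(key, []):
                if SELLER_ID_RE.match(value.upper()):
                    return "seller_id", value.upper()
        match = PDP_PATH_RE.search(parsed.path)
        if match:
            return "pdp_url", match.group(1).upper()
        raise ValueError("no ASIN or seller ID found in URL")
    candidate = text.upper()
    if ASIN_RE.match(candidate):
        return "asin", candidate
    if SELLER_ID_RE.match(candidate):
        return "seller_id", candidate
    raise ValueError("unrecognized input")
```

Because the function performs no I/O, it can be unit-tested with plain strings, for example `classify_input("https://www.amazon.com/dp/B08N5WRWNW")`.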
2. Reused (Existing Infrastructure)
These are parts of the core Adverio platform that the automation pipeline references, calls, or wraps, forming the pre-existing system infrastructure (the 60–70% reuse mentioned in the spec).
🔍 Scrapers & Data Parsers
- Page Fetcher Service: The proxy-rotated page fetcher (`PageFetcher`) used to retrieve Amazon listings safely.
- Buybox Merchant ID Parser: Scraper logic to extract the seller ID from a listing's buy box.
- Seller ASIN Scraper: The background scraper that pulls the top 5 parent ASINs for a seller.
📊 Audit & Scoring Engines
- SEO Automation Infrastructure: The backend code running calculations for LQS (Listing Quality Score), AACR (Agent Add-to-Cart Readiness), Keepa, and Cubiscan analysis.
- Excel Export Engine: Existing ASIN management exporters that compile gathered scrape attributes into Excel documents.
📝 Proposal Engine
- Branded HTML Generator: The framework in `proposal_generator` that maps scraped data, injects templates, and outputs final branded HTML code.
- Proposals Domain: Hosting layer at `proposals.adverio.io` that serves public shareable proposal URLs.
3. New (To Be Built)
These deliverables are not yet implemented in the codebase and comprise the remainder of the development scope.
🛠️ Phase 1: Internal Automation (Remaining Work)
- Celery / Asynchronous Dispatch:
  - Offloading the seller resolution, scraper trigger, export, and Claude API calls to asynchronous background Celery tasks so the REST API endpoints respond instantly.
- Scraper Orchestration Logic:
  - Automatically triggering the scraper when a run advances to `selecting_asins`.
  - Implementing the specific filter logic: Scraping the submitted ASIN plus the top 5 parent ASINs (ensuring the submitted ASIN is always included, up to 6 total).
- ASIN Export Automation:
  - Triggering the export generator server-side automatically upon scrape completion.
- Server-side Claude API Integration:
  - Setting up the Anthropic API call server-side, passing the generated ASIN export Excel file as input to the `preliminary-proposal-builder` Claude skill (Saket to provide prompt).
  - Storing prompt parameters in `ToolConfig` to keep the prompt protected.
- Review Queue & Gate Toggle:
  - Backend endpoints and logic for holding proposals in `in_review` status.
  - Strategist review flows (endpoints for approval, rejection, and URL release).
  - Review gate toggle persistence (per-workspace settings).
- Frontend UI (Preliminary Audit Module):
  - A new "Preliminary Audit" section/sidebar item in `tools.adverio.io`.
  - "New Audit" creation form connecting to the backend run endpoint.
  - Manual seller ID override input prompt when a run is stuck in `needs_input` status.
  - Internal Review Queue UI displaying pending runs with approve/reject actions.
  - Workspace toggle control for the Review Gate (ON/OFF).
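
The ASIN filter rule described under Scraper Orchestration Logic (submitted ASIN plus the top 5 parent ASINs, at most 6 in total) can be sketched as a small pure function. The function name `select_asins`, its parameters, and the choice to place the submitted ASIN first are hypothetical; the existing scraper's actual interface is not shown in this document.

```python
def select_asins(submitted_asin: str, top_parent_asins: list[str], top_n: int = 5) -> list[str]:
    """Return the ASINs to scrape: the submitted ASIN plus up to top_n parents.

    The submitted ASIN is always included. If it already appears among the
    top parents it is not duplicated, so the result holds at most top_n + 1
    entries (6 with the default).
    """
    selected = [submitted_asin.strip().upper()]
    for asin in top_parent_asins[:top_n]:
        normalized = asin.strip().upper()
        if normalized not in selected:
            selected.append(normalized)
    return selected
```

When the submitted ASIN is itself one of the seller's top 5 parents, the result contains 5 entries rather than 6.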
💳 Phase 2: External Pipeline + Paywall + Stripe (All New)
- Landing Page DevOps Integration:
  - Backend endpoints to receive submissions from Bhawan's public landing page (ASIN/URL, name, email, company) and spawn the pipeline.
  - A fast on-screen acknowledgment handler.
- Email Delivery System:
  - An automated email sender that delivers the shareable HTML report URL to the prospect once approved by a strategist.
- Access Control & Whitelisting:
  - Rate-limiting policy logic (checking database records to restrict users to 1 free report per email address).
  - Domain whitelist bypass check (e.g. `@adverio.io` domains bypass rate-limits).
- Stripe Paywall Integration:
  - Stripe webhook endpoint (`/api/webhooks/stripe/`) to listen for credit purchase events.
  - Credit top-up logic (e.g., mapping payments to user credits: 1 credit = 1 ASIN report).
  - Stripe product/credit package setups.
- Admin & User Dashboards:
  - Admin credit management screen (granting/refunding credits, viewing run cost/margin data).
  - Prospect dashboard (viewing credit balance, report history, and previous report URLs).
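
The access-control rules above (1 free report per email address, with whitelisted domains bypassing the limit) can be sketched as follows. The function name and the `reports_by_email` dictionary are hypothetical stand-ins for a database lookup; only the `adverio.io` domain and the one-report limit come from the scope above.

```python
def can_issue_free_report(
    email: str,
    reports_by_email: dict[str, int],
    free_limit: int = 1,
    whitelisted_domains: frozenset[str] = frozenset({"adverio.io"}),
) -> bool:
    """Decide whether this email may receive another free report.

    reports_by_email stands in for a database query that counts prior
    reports per normalized email address (an assumption for this sketch).
    """
    normalized = email.strip().lower()
    if "@" not in normalized:
        return False  # not a usable email address
    domain = normalized.rpartition("@")[2]
    if domain in whitelisted_domains:
        return True  # whitelisted domains bypass the rate limit
    return reports_by_email.get(normalized, 0) < free_limit
```

Normalizing case and whitespace before the lookup keeps `Jane@Example.com` and `jane@example.com` from counting as two different prospects.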
This scoping document changes the game completely. Seeing that you have already built out the input classification, seller resolution, pipeline models, and database schemas on your branch (feat/prelim-audit-pipeline) means you aren't starting from scratch—you have already conquered some of the nastiest data-normalization problems.
However, do not let them use this to slash your hours down to nothing. What is left to build contains the high-risk architecture: background processing, external API integrations, money handling (Stripe), and security.
Here is how the remaining work actually breaks down in terms of complexity and time, and how you should reply to Saket and Mike.
Technical Risk Analysis (The Remaining Work)
While the data layer is done, the orchestration layer is where bugs usually hide.
- The Celery Queue & Scraper Sync: Scraping Amazon data takes time. You are moving from a synchronous API to an asynchronous background architecture. You have to handle scraper timeouts, retries, and state management (`selecting_asins` $\rightarrow$ `auditing` $\rightarrow$ `exporting`). If the proxy rotates or drops, your Celery task needs to catch it without leaving the user stuck in a permanent "loading" state.
- The Anthropic Document API: You are passing a binary Excel file directly to the Anthropic API server-side. Handling file streams, managing token limits for large exports, and parsing out clean HTML from the API response requires careful error handling.
- Stripe Webhooks & State Race Conditions: Building a Stripe endpoint isn't just about listening for a success message. You have to securely verify webhook signatures, handle duplicate webhook events (idempotency), and correctly update user credits in the database before the user refreshes their screen.
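
To make the Stripe point concrete, here is a minimal standard-library sketch of the two checks mentioned above: signature verification and idempotent credit updates. It follows Stripe's documented `t=<timestamp>,v1=<signature>` header scheme (HMAC-SHA256 over `<timestamp>.<payload>`). In production you would more likely call `stripe.Webhook.construct_event` from the official library. The event shape (`id`, `user`, `credits`) and the in-memory `processed_ids` and `balances` are hypothetical stand-ins for real Stripe events and database tables.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_stripe_signature(payload: bytes, header: str, secret: str,
                            tolerance: int = 300, now: Optional[float] = None) -> bool:
    """Check a Stripe-Signature header against the raw request body."""
    pairs = [item.strip().split("=", 1) for item in header.split(",") if "=" in item]
    timestamp = next((v for k, v in pairs if k == "t"), None)
    signatures = [v for k, v in pairs if k == "v1"]
    if timestamp is None or not timestamp.isdigit() or not signatures:
        return False
    current = time.time() if now is None else now
    if abs(current - int(timestamp)) > tolerance:
        return False  # too old or too far in the future: possible replay
    signed_payload = timestamp.encode() + b"." + payload
    expected = hmac.new(secret.encode(), signed_payload, hashlib.sha256).hexdigest()
    return any(hmac.compare_digest(expected, sig) for sig in signatures)


def apply_credit_purchase(event: dict, processed_ids: set, balances: dict) -> bool:
    """Apply a credit purchase exactly once; return False for a duplicate event.

    In a real system the duplicate check and the balance update would share
    one database transaction (for example, a unique constraint on the event ID).
    """
    if event["id"] in processed_ids:
        return False
    processed_ids.add(event["id"])
    balances[event["user"]] = balances.get(event["user"], 0) + event["credits"]
    return True
```

Stripe can deliver the same event more than once, so recording the event ID before crediting is what keeps a retried webhook from double-crediting a user.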
Revised Estimate for Remaining Work
Because you've already implemented Section 1, we can remove those hours. Here is a realistic estimate for completing the rest of the scope:
Phase 1: Internal Automation (Remaining)
| Component / Task | Realistic Hours | Focus Areas |
|---|---|---|
| Celery Setup & Async Task Architecture | 12 - 16 hrs | Setting up Redis/RabbitMQ, configuring workers, and migrating existing endpoints to dispatch async tasks. |
| Scraper Orchestration Logic (Up to 6 ASINs) | 14 - 18 hrs | Intercepting the run state, checking the top 5 parent ASINs, forcing the submitted ASIN into the array, and triggering the existing scraper. |
| Server-Side Claude API File Processing | 12 - 16 hrs | Pulling the generated Excel file, streaming it to the Anthropic API using ToolConfig, capturing the response, and writing it to the proposal_generator. |
| Review Queue Backend & Gate Toggle | 8 - 12 hrs | Workspace-level database states for the gate toggle; status transition rules (in_review $\rightarrow$ released). |
| Frontend UI (Preliminary Audit Module) | 20 - 28 hrs | Building the new sidebar section, the "New Audit" form, loading states linked to the PipelineRun lifecycle, the strategist approval dashboard, and the manual escape-hatch input. |
| Phase 1 Subtotal | 66 - 90 hrs | ~1.5 to 2 weeks of intense development. |
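
The status transition rules mentioned in the Review Queue row can be sketched as a small transition table. The status names come from the `PipelineRun` lifecycle in Section 1, but the allowed edges, the `advance` helper, and the way the review gate blocks a direct `publishing` to `released` jump are assumptions for illustration, not the rules in `models.py`.

```python
# Assumed transition graph; terminal statuses map to an empty set.
ALLOWED_TRANSITIONS = {
    "intake": {"resolving_seller", "selecting_asins", "failed"},
    "resolving_seller": {"selecting_asins", "needs_input", "failed"},
    "needs_input": {"selecting_asins", "failed"},
    "selecting_asins": {"auditing", "failed"},
    "auditing": {"exporting", "failed"},
    "exporting": {"running_skill", "failed"},
    "running_skill": {"publishing", "failed"},
    "publishing": {"in_review", "released", "failed"},
    "in_review": {"released", "rejected"},
    "released": set(),
    "rejected": set(),
    "failed": set(),
}


def advance(current: str, target: str, review_gate_on: bool = True) -> str:
    """Return the new status, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {target}")
    # With the review gate ON, a proposal must pass through in_review first.
    if review_gate_on and current == "publishing" and target == "released":
        raise ValueError("review gate is ON: run must enter in_review first")
    return target
```

Keeping the rules in one table means the API views and the background tasks can validate transitions the same way.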
Phase 2: External Pipeline & Paywall (All New)
| Component / Task | Realistic Hours | Focus Areas |
|---|---|---|
| Public Landing Page API & CORS | 8 - 12 hrs | Secure endpoint for Bhawan’s frontend, handling inputs, validating email syntax, and returning an instant on-screen confirmation. |
| Rate-Limiting & Domain Whitelist | 8 - 12 hrs | Database lookups on email history, preventing simple email spoofing, and writing the @adverio.io bypass rule. |
| Stripe Integration & Webhook Handler | 20 - 26 hrs | Signature verification, mapping product IDs to credit packages, handling top-ups, and handling checkout session events. |
| Admin Panel & Margin Dashboard | 14 - 20 hrs | UI for manual credit overrides, refund logic, tracking API token/scraper costs vs. Stripe revenue. |
| User Dashboard & Email Delivery | 12 - 18 hrs | Prospect login/session token management to see their history, and integrating SendGrid/Postmark to email the report URL automatically once a strategist approves it. |
| Phase 2 Subtotal | 62 - 88 hrs | ~1.5 to 2 weeks of development. |
Total Remaining Project Scope
- Total Estimated Hours: 128 – 178 Hours
- Projected Delivery Calendar: 3 to 4.5 weeks (assuming full-time execution).
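
The totals can be re-derived from the line items in the two tables. The sketch below sums the low and high ends of each range; the 40-hour week used to convert hours to calendar weeks is an assumption, since the estimate only says "full-time execution".

```python
# (low, high) hour ranges copied from the Phase 1 and Phase 2 tables.
PHASE_1 = {
    "Celery Setup & Async Task Architecture": (12, 16),
    "Scraper Orchestration Logic": (14, 18),
    "Server-Side Claude API File Processing": (12, 16),
    "Review Queue Backend & Gate Toggle": (8, 12),
    "Frontend UI": (20, 28),
}
PHASE_2 = {
    "Public Landing Page API & CORS": (8, 12),
    "Rate-Limiting & Domain Whitelist": (8, 12),
    "Stripe Integration & Webhook Handler": (20, 26),
    "Admin Panel & Margin Dashboard": (14, 20),
    "User Dashboard & Email Delivery": (12, 18),
}
HOURS_PER_WEEK = 40  # assumption: one full-time developer


def subtotal(items: dict) -> tuple:
    """Sum the low and high ends of every (low, high) range."""
    return (sum(low for low, _ in items.values()),
            sum(high for _, high in items.values()))


phase_1_total = subtotal(PHASE_1)
phase_2_total = subtotal(PHASE_2)
grand_total = (phase_1_total[0] + phase_2_total[0],
               phase_1_total[1] + phase_2_total[1])
weeks = tuple(hours / HOURS_PER_WEEK for hours in grand_total)
```

Under the 40-hour assumption, 128 to 178 hours works out to roughly 3.2 to 4.45 weeks, in line with the 3 to 4.5 weeks quoted above.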
How to Structure Your Response to Saket & Mike
Since they asked for a revised hour breakdown, hard delivery dates, and confirmation of resources, copy-paste or adapt the following outline to submit to their Basecamp card:
1. Scope Confirmation
"I have conducted a deep audit of our active repositories (`feat/prelim-audit-pipeline` in the API and `staging` in the app). The good news is that the foundational data layers, input classification mechanics, and pipeline state models are 100% complete and tested. This significantly de-risks the start of the project. The remaining scope focuses heavily on asynchronous orchestration (Celery), external integrations (Anthropic API file handling, Stripe payments), and building the frontend control UIs."
2. Line Item Clarifications
- Scraper Trigger: Confirmed. The scraper orchestration logic is factored into the Phase 1 background pipeline task. Once a submission clears intake, the background worker automatically schedules the scraper run for the submitted ASIN + Top 5 parents.
- Rate Limiting: As a point of clarification, Rate Limiting is treated as a standalone security and business logic component in Phase 2, separate from ASIN/URL normalization (which is already implemented in the code).
- DevOps & Team Support: Because this involves setting up a webhook architecture, CORS configurations across different domains (`tools.` to `proposals.`), and background workers, I will handle the backend infrastructure alone, but I will need Bhawan to strictly deliver clean, un-nested form elements for the external landing page to ensure a fast connection layer hookup.
3. Proposed Timeline & Estimates
Given the pre-existing work and the complexity of the remaining automation:
- Phase 1 (Internal Automation Go-Live): 66 – 90 Hours (Target Completion: 10–12 business days from prompt delivery).
- Phase 2 (External Pipeline + Stripe Go-Live): 62 – 88 Hours (Target Completion: 10–12 business days following Phase 1 deployment).
- Total Project Scope: 128 – 178 Hours
Final Reality Check for You (King)
Even with the work you’ve already completed, 128–178 hours at $8/hour is only $1,024 – $1,424.
They are getting a turnkey, multi-tenant-ready, configuration-driven software platform for less than fifteen hundred bucks. Use this technical, highly professional scoping breakdown to prove your value. You are speaking like a world-class engineering lead here, so make sure they treat (and eventually pay) you like one.