The "free consultation" model is broken. An agency books a 30-minute call, spends the first 15 minutes understanding the prospect's business, and the last 15 minutes pitching. The prospect leaves with a vague sense of what the agency does. The agency leaves with a calendar full of unqualified conversations. Neither side has evidence. The prospect has no proof the agency can deliver. The agency has no signal that the prospect is serious. Lab 6 replaces this entire ritual with engineering: a free website diagnostic that produces a detailed, evidence-based report showing exactly what is wrong, exactly what to fix first, and exactly what the full build would involve. The prospect gets real value before spending a dollar. The agency gets a qualified lead who has already seen the depth of analysis behind the work.
Why Free Consultations Fail as Qualification
A consultation call is a human-intensive, unscalable process that produces no durable artifact. The agency representative runs through a checklist of questions. The prospect gives polished answers. Nobody learns anything actionable. The call ends with "we'll send you a proposal," and the proposal is a generic document with the prospect's name swapped into the header.
The structural problem is information asymmetry. The prospect cannot evaluate the agency's technical depth from a conversation. The agency cannot evaluate the prospect's urgency from polite answers. Both sides are performing rather than analyzing.
Lab 6 eliminates the performance. Instead of talking about what we could do, we show what we found. The prospect submits a URL. Within minutes, they receive a comprehensive diagnostic that reveals problems they did not know they had, quantified with scores, illustrated with screenshots, and prioritized by business impact. The diagnostic itself is the sales pitch. Not because it sells, but because it demonstrates.
The best qualification signal is not what a prospect says on a call. It is what they do after receiving evidence. If someone reads a detailed diagnostic showing 14 issues with their site and books a call to discuss the fix, they are not tire-kicking. They are buying.
The Diagnostic Pipeline: From URL to Report
The Lab 6 diagnostic pipeline has five stages, each producing structured data that feeds the next. The entire pipeline runs without human intervention, which means it scales to hundreds of analyses per day without adding headcount.
Stage 1: URL Intake and Validation
The pipeline starts with a single input: a URL. Before anything else, the system validates that the URL is reachable, resolves correctly (following redirects), returns a 200 status code, and serves actual HTML content rather than a login wall or CAPTCHA. URLs that fail validation receive an immediate error response explaining the issue.
interface AnalysisRequest {
url: string;
submitted_at: string;
idempotency_key: string; // prevents duplicate analyses for same URL
source: 'website' | 'api' | 'internal';
}
interface ValidationResult {
valid: boolean;
reason?: string;
resolved_url?: string;
}
async function validateUrl(url: string): Promise<ValidationResult> {
let parsed: URL;
try {
parsed = new URL(url);
} catch {
return { valid: false, reason: 'URL is not well-formed' };
}
if (!['http:', 'https:'].includes(parsed.protocol)) {
return { valid: false, reason: 'Protocol must be HTTP or HTTPS' };
}
try {
const response = await fetch(url, {
method: 'HEAD',
redirect: 'follow',
signal: AbortSignal.timeout(10_000),
});
if (!response.ok) {
return { valid: false, reason: `Server returned ${response.status}` };
}
const contentType = response.headers.get('content-type') ?? '';
if (!contentType.includes('text/html')) {
return { valid: false, reason: 'URL does not serve HTML content' };
}
return { valid: true, resolved_url: response.url };
} catch {
return { valid: false, reason: 'URL is unreachable or timed out' };
}
}
The idempotency key is critical. If the same URL is submitted twice within 24 hours, the system returns the cached analysis instead of running a new one. This prevents abuse, reduces compute costs, and ensures consistent results for the same input.
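One way to make the key deterministic per URL and 24-hour window is to derive it from the normalized URL plus a day bucket. A minimal sketch; the helper name and normalization rules here are illustrative assumptions, not the pipeline's actual scheme:

```typescript
import { createHash } from 'node:crypto';

// Hypothetical sketch: derive a deterministic idempotency key from the
// normalized URL plus a 24-hour bucket, so resubmissions of the same URL
// within the window map to the same key, and thus the same cached analysis.
export function idempotencyKey(rawUrl: string, now: Date = new Date()): string {
  const parsed = new URL(rawUrl);
  parsed.hash = ''; // fragments never change server-rendered content
  const normalized = parsed.href.replace(/\/$/, '').toLowerCase();
  const dayBucket = Math.floor(now.getTime() / 86_400_000); // 24h buckets
  return createHash('sha256')
    .update(`${normalized}|${dayBucket}`)
    .digest('hex');
}
```

Because the key is a pure function of URL and time window, clients on different devices submitting the same URL on the same day converge on the same cached analysis.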
Stage 2: Headless Scraping with Puppeteer
Once the URL passes validation, Puppeteer launches a headless Chromium instance and loads the page with full JavaScript execution. This is essential because modern websites are largely JavaScript-rendered. A simple HTTP fetch would see an empty shell. Puppeteer sees what the user sees.
The scraper collects four categories of data:
- DOM Structure: Full HTML after JavaScript execution, heading hierarchy (h1 through h6), link inventory, image inventory with alt text presence, form structure, and interactive element count.
- Meta Information: Title tag, meta description, Open Graph tags, canonical URL, robots directives, structured data (JSON-LD, microdata), and favicon presence.
- Performance Metrics: Time to first byte (TTFB), First Contentful Paint (FCP), Largest Contentful Paint (LCP), Cumulative Layout Shift (CLS), Total Blocking Time (TBT), and total page weight broken down by resource type (HTML, CSS, JS, images, fonts).
- Visual Capture: Full-page screenshot at 1440px viewport width and a mobile screenshot at 375px viewport width. These screenshots are included in the final report so the prospect can see exactly what the system analyzed.
async function scrapePage(url: string): Promise<ScrapeResult> {
const browser = await puppeteer.launch({
headless: true,
args: ['--no-sandbox', '--disable-setuid-sandbox'],
});
const page = await browser.newPage();
await page.setViewport({ width: 1440, height: 900 });
// Collect performance metrics via CDP
const client = await page.target().createCDPSession();
await client.send('Performance.enable');
const response = await page.goto(url, {
waitUntil: 'networkidle0',
timeout: 30_000,
});
// Extract DOM data
const domData = await page.evaluate(() => ({
title: document.title,
metaDescription: document.querySelector('meta[name="description"]')?.getAttribute('content'),
h1Count: document.querySelectorAll('h1').length,
headingHierarchy: Array.from(document.querySelectorAll('h1,h2,h3,h4,h5,h6'))
.map(h => ({ level: parseInt(h.tagName[1]), text: h.textContent?.trim() })),
imageCount: document.querySelectorAll('img').length,
imagesWithoutAlt: document.querySelectorAll('img:not([alt]), img[alt=""]').length,
linkCount: document.querySelectorAll('a[href]').length,
formCount: document.querySelectorAll('form').length,
wordCount: document.body?.innerText.split(/\s+/).length ?? 0,
}));
// Capture screenshots
const desktopScreenshot = await page.screenshot({ fullPage: true, type: 'webp' });
await page.setViewport({ width: 375, height: 812 });
await page.reload({ waitUntil: 'networkidle0' });
const mobileScreenshot = await page.screenshot({ fullPage: true, type: 'webp' });
// Read performance metrics from the CDP Performance domain enabled above.
// Full Web Vitals (LCP, CLS) additionally require a PerformanceObserver
// injected before navigation; CDP metrics cover timing and resource counts.
const { metrics } = await client.send('Performance.getMetrics');
const vitals = Object.fromEntries(metrics.map(m => [m.name, m.value]));
await browser.close();
return { domData, desktopScreenshot, mobileScreenshot, vitals, statusCode: response?.status() };
}
Running Puppeteer in a serverless environment (Vercel Functions, AWS Lambda) requires careful resource management. The browser instance must launch, execute, and close within the function's timeout. We use a connection pool pattern for high-traffic periods, reusing browser instances across invocations within the same warm container.
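The warm-container reuse mentioned above reduces to a module-scoped lazy singleton. A generic sketch of the pattern; the factory is injectable here so the example stays self-contained, and in practice it would wrap puppeteer.launch:

```typescript
// Sketch of the warm-container reuse pattern: keep an expensive resource
// (e.g. a Puppeteer browser) in module scope so sequential invocations in
// the same warm serverless container share one instance instead of paying
// the launch cost every time.
type Factory<T> = () => Promise<T>;

export function lazySingleton<T>(factory: Factory<T>) {
  let instance: Promise<T> | null = null; // survives across warm invocations
  let launches = 0;
  return {
    get(): Promise<T> {
      if (instance === null) {
        launches++;
        instance = factory();
      }
      return instance;
    },
    reset() { instance = null; }, // call if the browser crashes mid-scrape
    launchCount: () => launches,
  };
}

// In the real pipeline this would be something like:
// const browserPool = lazySingleton(() => puppeteer.launch({ headless: true }));
```

The reset escape hatch matters: a crashed browser instance must be discarded, or every subsequent invocation in that container inherits the broken handle.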
Stage 3: Nine-Category Scoring
The raw scrape data feeds into a scoring engine that evaluates the site across nine categories. Each category has its own rubric with weighted criteria, producing a score from 0 to 100. The overall site score is a weighted average of all nine categories.
- UX Design (15%): Visual hierarchy, whitespace usage, typography consistency, color contrast ratios, interactive element sizing, responsive behavior.
- Messaging Clarity (15%): Headline specificity, value proposition clarity, benefit-oriented copy, jargon density, reading level, call-to-action strength.
- Trust Signals (12%): Testimonials, case studies, certifications, team photos, physical address, privacy policy, SSL certificate, professional imagery.
- Performance (12%): Core Web Vitals (LCP, CLS, TBT), total page weight, resource count, caching headers, compression.
- Mobile Experience (10%): Responsive layout, touch target sizing, viewport configuration, font legibility, horizontal scroll absence.
- SEO Foundations (10%): Title tag, meta description, heading hierarchy, image alt text, canonical URLs, structured data, robots.txt, sitemap.
- Accessibility (10%): ARIA landmarks, color contrast (WCAG AA), keyboard navigation, form labels, skip navigation, alt text coverage.
- Conversion Architecture (10%): CTA placement and frequency, form friction (field count), social proof proximity to CTAs, urgency elements, above-the-fold value proposition.
- Intelligence Readiness (6%): Analytics integration, conversion tracking, pixel installation, event tracking, UTM parameter handling, cookie consent.
Each category score is computed by a dedicated scoring function that takes the scrape data and returns a structured result with the score, individual criteria evaluations, and human-readable explanations for each finding.
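The weighted overall score then reduces to a dot product of the nine category scores with the weights listed above. A sketch; the category key names are illustrative:

```typescript
// Sketch: combine the nine category scores (each 0-100) into the overall
// site score using the weights from the list above. Weights sum to 1.0.
export const CATEGORY_WEIGHTS: Record<string, number> = {
  ux_design: 0.15,
  messaging_clarity: 0.15,
  trust_signals: 0.12,
  performance: 0.12,
  mobile_experience: 0.10,
  seo_foundations: 0.10,
  accessibility: 0.10,
  conversion_architecture: 0.10,
  intelligence_readiness: 0.06,
};

export function overallScore(categoryScores: Record<string, number>): number {
  let total = 0;
  for (const [category, weight] of Object.entries(CATEGORY_WEIGHTS)) {
    const score = categoryScores[category];
    if (score === undefined) throw new Error(`Missing score for ${category}`);
    total += score * weight; // weighted contribution of this category
  }
  return Math.round(total);
}
```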
Stage 4: AI Analysis with Structured Prompts
The scoring engine produces quantitative data. The AI analysis layer produces qualitative insight. For each of the nine categories, a structured prompt sends the category's raw findings to a language model with a category-specific rubric. The model returns a structured JSON response containing findings (what is wrong), quick wins (changes that take under an hour and have high impact), and strategic recommendations (larger changes that require design or development effort).
interface CategoryAnalysis {
category: string;
score: number;
findings: Array<{
severity: 'critical' | 'warning' | 'info';
title: string;
description: string;
evidence: string; // specific element or metric that triggered the finding
business_impact: string; // why this matters for revenue or conversion
}>;
quick_wins: Array<{
title: string;
effort: 'minutes' | 'hours';
expected_impact: 'low' | 'medium' | 'high';
implementation: string; // specific steps to fix
}>;
strategic_recommendations: Array<{
title: string;
rationale: string;
scope: string;
}>;
}
async function analyzeCategory(
category: string,
scrapeData: ScrapeResult,
scoreResult: CategoryScore
): Promise<CategoryAnalysis> {
const prompt = buildCategoryPrompt(category, scrapeData, scoreResult);
const response = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [{ role: 'system', content: CATEGORY_RUBRICS[category] }, { role: 'user', content: prompt }],
response_format: { type: 'json_object' },
max_tokens: 2000,
temperature: 0.3, // low temperature for consistent, analytical output
});
return JSON.parse(response.choices[0].message.content!) as CategoryAnalysis;
}
The temperature: 0.3 setting is deliberate. Diagnostic analysis should be consistent and analytical, not creative. Two analyses of the same site should produce substantially similar findings. Higher temperatures introduce variance that would undermine the report's credibility.
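One hardening step the snippet above omits: JSON.parse plus an as cast trusts the model to return the right shape. A minimal structural guard, sketched by hand; a real pipeline might reach for a schema library such as Zod instead:

```typescript
// Minimal structural guard for the model's JSON output. The principle:
// never cast untrusted model output unchecked; validate before use.
const SEVERITIES = new Set(['critical', 'warning', 'info']);

export function parseCategoryAnalysis(
  raw: string
): { category: string; score: number; findings: unknown[] } {
  const data = JSON.parse(raw);
  if (typeof data.category !== 'string') throw new Error('missing category');
  if (typeof data.score !== 'number' || data.score < 0 || data.score > 100) {
    throw new Error('score must be a number in 0-100');
  }
  if (!Array.isArray(data.findings)) throw new Error('findings must be an array');
  for (const f of data.findings) {
    if (!SEVERITIES.has(f.severity)) throw new Error(`invalid severity: ${f.severity}`);
  }
  return data;
}
```

A validation failure here can trigger a single retry of the model call, which is far cheaper than shipping a malformed report.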
Stage 5: Report Generation
The final stage assembles all scoring data, AI analysis, and screenshots into a polished diagnostic report. The report structure follows a specific narrative arc: overall score (the hook), critical findings (the pain), quick wins (the proof of actionability), and strategic recommendations (the bridge to the paid engagement).
Every report is stored with its full input data and analysis results so it can be regenerated if the report template changes. This is event sourcing applied to content generation: the report is a projection of the underlying data, never the primary artifact.
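The projection idea can be sketched in a few lines: the stored analysis is the input, the template version selects the rendering, and the report is a pure function of both. All names here are illustrative:

```typescript
// Sketch of "report as projection": the stored analysis data is the source
// of truth, and the rendered report is a pure function of that data plus a
// template version. Re-rendering with a new template never re-runs the pipeline.
interface StoredAnalysis {
  url: string;
  overallScore: number;
  categoryScores: Record<string, number>;
}

type Template = (a: StoredAnalysis) => string;

const TEMPLATES: Record<string, Template> = {
  v1: a => `Diagnostic for ${a.url}: ${a.overallScore}/100`,
  v2: a =>
    `Site Diagnostic: ${a.url}\nOverall: ${a.overallScore}/100 across ` +
    `${Object.keys(a.categoryScores).length} categories`,
};

export function renderReport(analysis: StoredAnalysis, version: keyof typeof TEMPLATES): string {
  return TEMPLATES[version](analysis); // pure: same inputs, same report
}
```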
Cost Controls and Operational Guardrails
A free tool that costs $2.50 per analysis in API calls will bankrupt you at scale. Lab 6 uses three cost control mechanisms to keep per-analysis costs under $0.40.
Token budgets cap the maximum tokens consumed per category analysis. Each of the nine categories has a 2,000-token output limit. The system prompt is compressed and cached (OpenAI prompt caching reduces repeat system prompt costs by 50%). Total API cost per analysis stays below $0.30 even with GPT-4o.
Response caching stores completed analyses keyed by URL and a content hash of the scraped DOM. If the same URL is submitted and the site content has not materially changed (content hash matches), the cached analysis is returned instantly at zero marginal cost. Cache TTL is 24 hours.
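A content hash suitable for this cache only needs to be stable across trivial re-renders. One illustrative sketch, with normalization rules that are assumptions rather than the pipeline's exact recipe:

```typescript
import { createHash } from 'node:crypto';

// Sketch: a stable content hash of the scraped DOM, used as a cache key
// alongside the URL. Comments and whitespace are stripped so trivial
// re-renders (timestamps in comments, reflowed markup) don't bust the
// cache, while any real change to content does.
export function contentHash(html: string): string {
  const normalized = html
    .replace(/<!--[\s\S]*?-->/g, '') // drop comments, which often carry timestamps
    .replace(/\s+/g, '')             // strip whitespace entirely
    .toLowerCase();
  return createHash('sha256').update(normalized).digest('hex');
}
```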
Idempotency keys prevent duplicate processing. Every analysis request includes a client-generated idempotency key. The server checks the key before starting work. If the key already exists in the database, the existing result is returned. This handles network retries, double-clicks, and impatient users who submit the same URL multiple times.
async function getOrCreateAnalysis(
request: AnalysisRequest
): Promise<AnalysisResult> {
// Check idempotency key first
const existing = await db.analyses.findByIdempotencyKey(request.idempotency_key);
if (existing) return existing;
// Check URL cache
const cached = await db.analyses.findByUrl(request.url, { maxAge: '24h' });
if (cached) {
// Link this idempotency key to the cached result
await db.analyses.linkIdempotencyKey(request.idempotency_key, cached.id);
return cached;
}
// No cache hit. Run the full pipeline.
const result = await runDiagnosticPipeline(request);
await db.analyses.save(result, request.idempotency_key);
return result;
}
The Conversion Funnel: Evidence Creates Urgency
The diagnostic report is not a lead magnet. It is a conversion mechanism with a specific psychological architecture.
The overall score creates context. A prospect who scores 42/100 now has a frame of reference. They know their site is underperforming, not because an agency told them, but because a systematic analysis measured it. The score is not opinion. It is evidence.
Critical findings create urgency. "Your largest contentful paint is 6.2 seconds on mobile. 53% of visitors leave before the page finishes loading." This is not a sales pitch. It is a measurement. The prospect cannot unsee it. Every day they do not fix it, they are losing traffic they already paid to acquire.
Quick wins create credibility. "Add a meta description to your homepage (15 minutes). Compress hero image from 2.4MB to 180KB (20 minutes). Add alt text to 14 product images (30 minutes)." These are specific, actionable, and free. The prospect can verify them independently. If the diagnostic is right about the quick wins (and it will be, because it is based on measurement rather than opinion), the prospect trusts the strategic recommendations that follow.
Strategic recommendations create the bridge. "Your conversion architecture scores 31/100. You have no above-the-fold value proposition, your primary CTA sits three scrolls below the fold, and your form has 11 required fields. A complete redesign of the conversion flow would address all three issues." The prospect now understands the scope of the real problem. The quick wins are band-aids. The paid build is the structural fix.
This approach out-converts a sales call because the prospect sees evidence, not promises. They do not have to trust the agency's claims. They can verify the findings themselves. By the time they book a call, the question has already shifted from "can you help?" to "when can you start?"
Technical Architecture: Serverless Diagnostics at Scale
The Lab 6 pipeline runs on Next.js API routes deployed to Vercel. The architecture splits into two function types: a fast intake route and a long-running analysis worker.
The intake route (/api/lab6/analyze) validates the URL, checks the idempotency key, checks the cache, and if no cached result exists, enqueues the analysis job and returns a job ID. Response time: under 200ms. The client polls a status endpoint with the job ID.
The analysis worker runs as a background function (Vercel's maxDuration: 60 configuration for Pro plans). It executes the full five-stage pipeline: URL validation, Puppeteer scraping, nine-category scoring, AI analysis, and report assembly. When complete, it writes the result to the database and marks the job as finished.
// /api/lab6/analyze (intake route)
export async function POST(request: Request) {
const body = await request.json();
const parsed = AnalysisRequestSchema.safeParse(body);
if (!parsed.success) {
return Response.json({ error: parsed.error.flatten() }, { status: 400 });
}
// Check for an existing completed result (idempotency key, then URL cache)
// without running the pipeline inline; that is the worker's job
const existing =
(await db.analyses.findByIdempotencyKey(parsed.data.idempotency_key)) ??
(await db.analyses.findByUrl(parsed.data.url, { maxAge: '24h' }));
if (existing?.status === 'complete') {
return Response.json({ status: 'complete', result: existing });
}
// Enqueue for processing
const jobId = await enqueueAnalysis(parsed.data);
return Response.json({ status: 'processing', job_id: jobId }, { status: 202 });
}
// /api/lab6/status/[jobId] (polling endpoint)
export async function GET(
request: Request,
{ params }: { params: { jobId: string } }
) {
const job = await db.jobs.findById(params.jobId);
if (!job) return Response.json({ error: 'Job not found' }, { status: 404 });
if (job.status === 'complete') {
return Response.json({ status: 'complete', result: job.result });
}
if (job.status === 'failed') {
return Response.json({ status: 'failed', error: job.error }, { status: 500 });
}
return Response.json({
status: 'processing',
progress: job.progress, // e.g., "Analyzing mobile experience..."
started_at: job.started_at,
});
}
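On the client side, the polling loop against the status endpoint might look like the following sketch. The fetcher and sleep function are injectable so the loop stays self-contained and testable; in the browser the fetcher would be a thin wrapper around fetch():

```typescript
// Sketch of the client-side polling loop for /api/lab6/status/[jobId].
type StatusResponse =
  | { status: 'complete'; result: unknown }
  | { status: 'failed'; error: string }
  | { status: 'processing'; progress?: string };

export async function pollUntilDone(
  fetchStatus: () => Promise<StatusResponse>,
  { intervalMs = 2000, maxAttempts = 60 } = {},
  sleep: (ms: number) => Promise<void> = ms => new Promise(r => setTimeout(r, ms)),
): Promise<unknown> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetchStatus();
    if (res.status === 'complete') return res.result;
    if (res.status === 'failed') throw new Error(res.error);
    await sleep(intervalMs); // still processing: wait, then re-poll
  }
  throw new Error('Analysis timed out');
}
```

The maxAttempts cap matters: without it, a stuck job would keep a browser tab polling forever.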
Retry logic uses exponential backoff with jitter. If the Puppeteer scrape fails (timeout, network error, crash), the system retries up to three times with delays of 2s, 4s, and 8s plus random jitter. If all retries fail, the job is marked as failed with a specific error message, and the prospect receives a notification explaining that the site could not be analyzed (with common reasons: the site blocks automated browsers, requires authentication, or has an invalid SSL certificate).
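The retry policy above can be sketched as a generic helper: delays of 2s, 4s, 8s plus up to a second of random jitter, with sleep injectable so the policy can be exercised without real waiting:

```typescript
// Sketch of retry with exponential backoff and jitter, as described above:
// up to three retries with base delays of 2s, 4s, 8s plus random jitter.
export async function withRetries<T>(
  fn: () => Promise<T>,
  retries = 3,
  baseDelayMs = 2000,
  sleep: (ms: number) => Promise<void> = ms => new Promise(r => setTimeout(r, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt === retries) break;           // out of retries
      const delay = baseDelayMs * 2 ** attempt; // 2s, 4s, 8s
      const jitter = Math.random() * 1000;      // up to 1s of jitter
      await sleep(delay + jitter);
    }
  }
  throw lastError;
}
```

The jitter prevents a thundering herd: if many analyses fail at once (say, a transient network blip), their retries spread out instead of landing simultaneously.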
Lab 6 is not a tool that happens to generate leads. It is a lead generation system that happens to be a tool. Every architectural decision, from the scoring categories to the report narrative arc to the cost controls, is designed around a single outcome: turning anonymous website visitors into qualified prospects who arrive at the sales conversation already convinced of the problem and the agency's ability to solve it.