Snapshots

Snapshots are pre-rendered copies of your pages that PerfectSearch serves to search engine bots and AI crawlers instead of raw JavaScript. Each snapshot captures the fully rendered DOM of a page using Playwright, then stores it as both HTML and Markdown. When a bot requests a page, PerfectSearch serves the cached snapshot immediately, so your content is visible and indexable without the bot having to run any JavaScript.

What are snapshots?

Snapshots are pre-rendered copies of your web pages created by a headless Playwright browser. Instead of serving raw JavaScript to bots — which many crawlers cannot execute — PerfectSearch serves these static snapshots so that every word of your content is immediately visible. Snapshots are stored in both HTML and Markdown formats.

When PerfectSearch renders a snapshot, it loads your page in a full Chromium browser, waits for JavaScript to execute and the DOM to settle, then captures the final rendered HTML. It also generates a clean Markdown version of the page by extracting the text content, headings, links, and images. Both versions are stored in the database and served to the appropriate type of bot.

Snapshots solve the fundamental problem of JavaScript SEO: search engine bots and AI crawlers often see a blank page or partial content when they visit a JavaScript-heavy site. By serving pre-rendered snapshots, PerfectSearch ensures that bots see the same complete content that your human visitors see, without requiring the bot to execute any JavaScript.

What snapshot formats are available?

PerfectSearch stores each snapshot in three formats: HTML, Markdown, and Markdown LLM. The HTML format is a complete rendering of the page DOM. Markdown is a clean text extraction. Markdown LLM is an optimized variant with additional structural metadata for AI ingestion. PerfectSearch auto-detects the bot type and serves the right format.

  • HTML — The full rendered DOM of the page, including all elements, attributes, and inline styles. Served to traditional search engine crawlers like Googlebot, Bingbot, and YandexBot. This is what search engines use to index your content, extract structured data, and evaluate page quality.
  • Markdown — A clean text extraction of the page content with headings, paragraphs, lists, links, and images preserved in standard Markdown syntax. Strips navigation, footers, ads, and other non-content elements. Smaller payload size and faster to parse than HTML.
  • Markdown LLM — An enhanced Markdown variant that includes additional structural information optimized for large language model ingestion. Adds section boundaries, semantic role annotations, and metadata headers. Served to AI retrieval crawlers like ChatGPT-User, ClaudeBot, and PerplexityBot to help the AI better understand and cite your content.

PerfectSearch automatically detects the type of bot from its User-Agent string and serves the appropriate format. Search engine bots receive HTML. AI and LLM bots receive Markdown LLM. You can override this behavior per site in the dashboard settings if you prefer a specific format for all bots.
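To make the detection rule concrete, here is a minimal Bash sketch of User-Agent based format selection. It is not PerfectSearch's detection code: the function name `snapshot_format_for_ua`, the substring matching, and the token list (taken from the bot names on this page) are assumptions for illustration. The optional second argument stands in for the per-site override.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of User-Agent based format selection.
# Token lists come from the bot names on this page; the real detection rules may differ.
# Usage: snapshot_format_for_ua "<user-agent>" [override-format]
snapshot_format_for_ua() {
  local ua=${1,,}          # lowercase so matching is case-insensitive (bash 4+)
  local override=${2:-}    # stands in for the per-site format override
  local default
  case $ua in
    *chatgpt-user*|*claudebot*|*perplexitybot*) default=markdown-llm ;;
    *googlebot*|*bingbot*|*yandex*)             default=html ;;
    *) echo none; return 0 ;;   # not a recognized bot: no snapshot in this sketch
  esac
  echo "${override:-$default}"  # an override applies to every recognized bot
}
```

For example, `snapshot_format_for_ua 'Mozilla/5.0 (compatible; bingbot/2.0)'` prints `html`, and adding `markdown` as a second argument prints `markdown` instead.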

How does the snapshot lifecycle work?

Every snapshot has one of three states: fresh, stale, or missing. Fresh snapshots are within their cache TTL and served immediately. Stale snapshots have exceeded their TTL but are still served while a background re-render is queued. Missing snapshots have no cached version, so PerfectSearch returns a 404 and queues the page for rendering.

  • Fresh — The snapshot was rendered within the configured cache TTL. It is served immediately with no additional work. The response header x-perfectsearch: hit indicates a fresh cache hit.
  • Stale — The snapshot exists but was rendered longer ago than the configured cache TTL. PerfectSearch still serves the stale content to avoid a blank response, but simultaneously queues a background re-render job. The next request after the re-render completes receives the fresh version. The response header shows x-perfectsearch: stale.
  • Missing — No snapshot exists for this URL. The Snapshot API returns a 404 response and queues a render job. The bot does not receive content on this request, but the page will be rendered and cached for subsequent visits. This typically happens on the first bot visit to a new page.
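To check which state a page is in, you can inspect the x-perfectsearch header. The small Bash helper below pulls that header out of raw HTTP response headers; the function name is hypothetical, and it relies only on the header name and the hit and stale values described above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the x-perfectsearch header value (e.g. hit or stale)
# from raw HTTP response headers read on stdin. Header names match case-insensitively.
perfectsearch_status() {
  awk -F': *' 'tolower($1) == "x-perfectsearch" { sub(/\r$/, "", $2); print $2; exit }'
}
```

To try it against a live page, request it with a crawler User-Agent and keep only the headers, for example `curl -s -o /dev/null -D - -A 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)' https://www.example.com/products | perfectsearch_status`. This assumes your integration passes the header through to the response the bot receives; `www.example.com` is a placeholder.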

This lifecycle follows a stale-while-revalidate pattern: bots always get a response as quickly as possible (even if stale), while PerfectSearch works in the background to keep snapshots fresh. The only time a bot receives no content is the very first visit before any snapshot has been created.
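The lifecycle rules reduce to comparing a snapshot's age with the TTL. This Bash sketch classifies a snapshot the same way; passing render times as Unix timestamps, and treating an age exactly equal to the TTL as stale, are assumptions of the sketch rather than documented behavior.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the fresh / stale / missing decision.
# Args: rendered_at (Unix seconds, empty if no snapshot), ttl (seconds), now (Unix seconds, optional)
snapshot_state() {
  local rendered_at=$1 ttl=$2 now=${3:-$(date +%s)}
  if [[ -z $rendered_at ]]; then
    echo missing   # no snapshot: the bot gets a 404 and a render job is queued
  elif (( now - rendered_at < ttl )); then
    echo fresh     # within the TTL: served with no extra work
  else
    echo stale     # past the TTL: still served while a background re-render is queued
  fi
}
```

For example, a page rendered 25 hours ago under the default 24-hour TTL is stale: `snapshot_state $(( $(date +%s) - 25 * 3600 )) 86400` prints `stale`.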

How do I configure cache TTL?

The cache TTL determines how long a snapshot is considered fresh before it becomes stale and triggers a background re-render. The default TTL is 24 hours, which works well for most sites. You can configure the TTL per site in the dashboard under Settings, with values ranging from 1 hour to 30 days.
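If you script against TTL values, converting them to seconds makes the allowed range explicit. The `Nh`/`Nd` notation and the `ttl_seconds` function below are assumptions for this sketch, not a dashboard or API format; only the 1 hour to 30 day bounds come from this page.

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert a TTL such as "6h" or "7d" to seconds,
# rejecting anything outside the documented 1 hour to 30 day range.
ttl_seconds() {
  local value=$1 secs
  if [[ $value =~ ^([0-9]+)([hd])$ ]]; then
    local n=$((10#${BASH_REMATCH[1]}))   # force base 10 so "08h" is not read as octal
    if [[ ${BASH_REMATCH[2]} == h ]]; then
      secs=$(( n * 3600 ))
    else
      secs=$(( n * 86400 ))
    fi
  else
    echo "invalid TTL: $value" >&2
    return 1
  fi
  if (( secs < 3600 || secs > 30 * 86400 )); then
    echo "TTL out of range (1h to 30d): $value" >&2
    return 1
  fi
  echo "$secs"
}
```

For example, `ttl_seconds 24h` prints `86400`, the default TTL, while `ttl_seconds 31d` prints an error to stderr and returns a non-zero status.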

Choosing the right TTL depends on how frequently your content changes:

  • 1–6 hours — For sites with rapidly changing content like news, stock prices, or live event pages. Lower TTLs mean more re-renders and higher render usage.
  • 24 hours (default) — A balanced choice for most sites. Content is re-rendered daily, keeping snapshots reasonably fresh without excessive render volume.
  • 7–30 days — For sites with mostly static content like documentation, portfolios, or archived pages. Minimizes render usage.

Remember that stale snapshots are still served — the TTL only controls when a background re-render is queued. Setting a very short TTL does not cause bots to see blank pages; it just means PerfectSearch re-renders your pages more often, which increases your monthly render usage.
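To gauge how the TTL affects render usage, you can estimate a rough upper bound on monthly re-renders. The sketch below assumes a 30-day month and that every page is requested by a bot at least once per TTL window, so each page is re-rendered at most about once per window. It ignores manual re-renders and invalidations, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical estimate: rough upper bound on TTL-driven re-renders per 30-day month.
# Args: number of pages, TTL in hours
max_monthly_renders() {
  local pages=$1 ttl_hours=$2
  local hours_per_month=$(( 30 * 24 ))
  local windows=$(( (hours_per_month + ttl_hours - 1) / ttl_hours ))  # ceiling division
  echo $(( pages * windows ))
}
```

With 1,000 pages, moving from the default 24-hour TTL to 6 hours raises this bound from 30,000 to 120,000 renders a month, while a 7-day TTL (168 hours) lowers it to 5,000.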

How do I manually re-render a snapshot?

You can manually re-render snapshots in two ways: through the Snapshot Explorer in the dashboard for individual pages, or through the POST /v1/snapshot/queue API endpoint for programmatic and bulk re-rendering. Both methods queue a new render job that replaces the existing snapshot once complete.

Via the dashboard:

  1. Navigate to your site and open the Snapshots tab.
  2. Find the page you want to re-render using the search bar.
  3. Click the Re-render button on the snapshot row.
  4. The snapshot status changes to “Rendering” while the job is in progress.

Via the API:

bash
curl -X POST https://search.perfectline.io/v1/snapshot/queue \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"site_id": "your-site-id", "paths": ["/products", "/about", "/blog/latest"]}'

The API accepts an array of paths and queues a render job for each one. See the POST /v1/snapshot/queue API reference for all options.
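For bulk re-rendering from a script, the Bash sketch below builds the same request body as the curl example and sends a list of paths in batches. The function names, the `PERFECTSEARCH_API_KEY` environment variable, and the default batch size of 50 are assumptions, not documented limits; check the API reference for any real per-request cap.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for bulk re-rendering via POST /v1/snapshot/queue.
# Paths are inserted without JSON escaping, so they must not contain " or \.

# Print the JSON body for one queue request.
build_queue_payload() {
  local site_id=$1
  shift
  local paths="" p
  for p in "$@"; do
    paths+="${paths:+,}\"${p}\""   # comma-separate the quoted paths
  done
  printf '{"site_id":"%s","paths":[%s]}\n' "$site_id" "$paths"
}

# Queue every path in a file (one path per line, no blank lines) in batches.
queue_paths_from_file() {
  local site_id=$1 file=$2 batch=${3:-50}
  local -a all_paths
  mapfile -t all_paths < "$file" || return 1
  local i
  for ((i = 0; i < ${#all_paths[@]}; i += batch)); do
    curl -sS -X POST https://search.perfectline.io/v1/snapshot/queue \
      -H "Authorization: Bearer ${PERFECTSEARCH_API_KEY}" \
      -H "Content-Type: application/json" \
      -d "$(build_queue_payload "$site_id" "${all_paths[@]:i:batch}")"
  done
}
```

For example, `queue_paths_from_file your-site-id paths.txt` queues every path listed in `paths.txt`. Ordinary URL paths contain no raw double quotes or backslashes, so the unescaped JSON is safe for them.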

How do I use the Snapshot Explorer?

The Snapshot Explorer is accessible from the Snapshots tab in your site dashboard. It provides a searchable list of all snapshots for your site, with the ability to preview rendered content, inspect metadata, and trigger re-renders. Use the search bar to filter by URL path and quickly find specific pages.

Each snapshot row displays the URL path, the snapshot status (fresh, stale, or rendering), the last rendered timestamp, and the HTML and Markdown file sizes. Click a row to expand it and see a preview of the rendered content in both HTML and Markdown formats. You can switch between formats using the tab controls.

The Snapshot Explorer also shows aggregate statistics at the top: total snapshots, the percentage that are fresh vs. stale, and your total storage usage. This gives you a quick overview of your snapshot health without needing to examine individual pages.

How do I bulk invalidate snapshots?

Bulk invalidation lets you mark multiple snapshots as stale at once, triggering background re-renders for all of them. You can invalidate by path pattern using the API, or purge your entire cache from the site settings page. Invalidation does not delete snapshots: it marks them as stale, and the existing versions continue to be served until their re-renders complete.

Via the API:

bash
curl -X POST https://search.perfectline.io/v1/webhook/invalidate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"site_id": "your-site-id", "pattern": "/blog/**"}'

The pattern field supports the same wildcard syntax as access control rules. Use /** to invalidate everything. See the POST /v1/webhook/invalidate API reference for details.
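A natural place to call this endpoint is a post-deploy step. The Bash sketch below sends one invalidation request per pattern and stops at the first HTTP error; the function names and the `PERFECTSEARCH_API_KEY` environment variable are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical post-deploy hook for POST /v1/webhook/invalidate.

# Print the JSON body for one invalidate request (inputs must not contain " or \).
build_invalidate_payload() {
  printf '{"site_id":"%s","pattern":"%s"}\n' "$1" "$2"
}

# Send one invalidate request per pattern; return non-zero on the first failure.
invalidate_patterns() {
  local site_id=$1 pattern
  shift
  for pattern in "$@"; do
    curl -sS --fail -X POST https://search.perfectline.io/v1/webhook/invalidate \
      -H "Authorization: Bearer ${PERFECTSEARCH_API_KEY}" \
      -H "Content-Type: application/json" \
      -d "$(build_invalidate_payload "$site_id" "$pattern")" || return 1
  done
}
```

For example, `invalidate_patterns your-site-id '/blog/**' '/products/**'`. Quote the patterns so the shell passes `**` through literally instead of expanding it against local files.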

Via the dashboard:

Go to your site's Settings page and click the Purge All Cache button. This invalidates every snapshot for the site and queues re-renders for all of them. Use this after a major site redesign or content migration to ensure all snapshots reflect the latest version of your pages.

Invalidation vs. deletion

Invalidating a snapshot does not delete it. The stale snapshot continues to be served to bots while a fresh version is being rendered in the background. This ensures that bots never receive a blank page during re-rendering. If you need to immediately stop serving a specific page, use an access control rule to block the path instead.