Sitemap

PerfectSearch uses your sitemap to discover all the pages on your site, visualize your site structure as an interactive tree, and track snapshot coverage across every URL. Adding your sitemap is the fastest way to ensure every page has a pre-rendered snapshot ready for bots, and the coverage view makes it easy to spot gaps.

How do I add my sitemap?

Go to your site in the PerfectSearch dashboard and open the Sitemap tab. Enter your sitemap URL (for example, https://example.com/sitemap.xml) into the input field and click Add. PerfectSearch fetches the sitemap, parses all URLs, and builds the visual tree within seconds.

Step-by-step:

  1. Navigate to Dashboard and select your site.
  2. Open the Sitemap tab.
  3. Paste your sitemap URL into the input field. This is typically https://yourdomain.com/sitemap.xml or https://yourdomain.com/sitemap-index.xml.
  4. Click Add. PerfectSearch fetches and parses the sitemap.
  5. Once parsed, the sitemap tree appears with all discovered URLs organized by path segments.
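Conceptually, the parsing step boils down to collecting every <loc> element from the XML. The following is a minimal sketch (the helper name `parse_sitemap` is hypothetical, not PerfectSearch's actual implementation), using only the Python standard library:

```python
import xml.etree.ElementTree as ET

# Standard sitemap namespace, per sitemaps.org
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def parse_sitemap(xml_text: str) -> list[str]:
    """Extract page URLs from a standard <urlset> sitemap."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.iter(f"{SITEMAP_NS}loc")
        if loc.text
    ]

sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/blog/hello</loc></url>
</urlset>"""

print(parse_sitemap(sample))
```

Every URL collected this way becomes a node in the sitemap tree.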

If your sitemap URL is listed in your robots.txt file, PerfectSearch can also auto-detect it. After adding your site, check the Sitemap tab — if a sitemap was found in your robots.txt, it will be pre-populated and ready to confirm.

What does the sitemap tree show?

The sitemap tree is a visual, hierarchical representation of every URL in your sitemap, organized by path segments. Each node in the tree represents a path segment, and leaf nodes represent actual pages. Every URL displays a colored badge showing its current snapshot status: cached, stale, or missing.

The tree is collapsible and expandable. Top-level segments like /blog, /products, and /docs appear as folders that you can expand to see their child pages. Each folder shows an aggregate snapshot coverage percentage, so you can quickly identify which sections of your site have incomplete coverage without drilling into every page.

Clicking on an individual URL in the tree opens a detail panel showing the snapshot status, last rendered timestamp, HTML and Markdown file sizes, and a link to preview the snapshot in the Snapshot Explorer. You can also trigger a re-render directly from this panel.
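To illustrate how URLs are grouped by path segments, here is a rough sketch of building such a tree (the `build_tree` helper and the empty-string leaf marker are illustrative assumptions, not PerfectSearch internals):

```python
from urllib.parse import urlparse

def build_tree(urls: list[str]) -> dict:
    """Nest URLs into a dict keyed by path segment; '' marks a page leaf."""
    tree: dict = {}
    for url in urls:
        node = tree
        for seg in (s for s in urlparse(url).path.split("/") if s):
            node = node.setdefault(seg, {})
        node[""] = url  # leaf entry holding the full URL
    return tree

tree = build_tree([
    "https://example.com/blog/hello",
    "https://example.com/blog/world",
    "https://example.com/docs/start",
])
# Top-level keys are the folders you see in the dashboard: blog, docs
```

With this shape, the /blog folder's aggregate coverage is simply a roll-up over the leaves beneath it.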

| Badge Color | Status  | Meaning |
| ----------- | ------- | ------- |
| Green       | Cached  | A fresh snapshot exists and is within the cache TTL. Ready to serve. |
| Yellow      | Stale   | Snapshot exists but has exceeded its cache TTL. Still served while re-rendering in the background. |
| Red         | Missing | No snapshot has been rendered for this URL. Bots will receive a 404 until rendering completes. |

How does snapshot coverage work?

Snapshot coverage compares the URLs in your sitemap against your snapshot cache. Each URL is classified as cached (green), stale (yellow), or missing (red). The coverage percentage tells you what fraction of your sitemap URLs have a fresh snapshot ready to serve to bots. Aim for 100% coverage to ensure every page is instantly visible to search engines and AI crawlers.
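The per-URL classification follows directly from the last render time and the cache TTL. A minimal sketch (the function name and signature are assumptions for illustration):

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def snapshot_status(last_rendered: Optional[datetime],
                    ttl: timedelta,
                    now: datetime) -> str:
    """Classify a URL's snapshot as 'cached', 'stale', or 'missing'."""
    if last_rendered is None:
        return "missing"          # never rendered
    if now - last_rendered <= ttl:
        return "cached"           # fresh, within TTL
    return "stale"                # exists but expired
```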

The coverage summary at the top of the Sitemap tab shows three numbers:

  • Cached — Pages with a fresh snapshot within the cache TTL. These are fully ready for bots.
  • Stale — Pages with an expired snapshot. The stale version is still served, but a re-render is needed to refresh the content.
  • Missing — Pages with no snapshot at all. Bots requesting these pages will get a 404 until a snapshot is rendered.

If you have missing pages, click the Render Missing button to queue render jobs for all uncached URLs at once. This is especially useful after adding your sitemap for the first time or after publishing a large batch of new pages.

How does sitemap-driven crawl work?

When you add a sitemap to PerfectSearch, all discovered URLs are automatically queued for rendering. This sitemap-driven crawl ensures that every page on your site has a snapshot ready before the first bot visit, eliminating the cold-start problem where a bot's first visit returns a 404. Subsequent sitemap refreshes detect new URLs and queue only those for rendering.

The initial crawl processes URLs in batches to avoid overwhelming your origin server. PerfectSearch's rendering workers respect a configurable concurrency limit (default: 5 concurrent renders per site) and add a brief delay between batches. For a site with 1,000 pages, the initial crawl typically completes within 30-60 minutes depending on page complexity and server response times.
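The batching behavior described above can be sketched roughly as follows (this is an illustrative model, not PerfectSearch's render pipeline; `render` stands in for whatever performs one render):

```python
import asyncio

async def crawl_in_batches(urls, render, batch_size=5, delay=1.0):
    """Render URLs in fixed-size batches, pausing between batches
    so the origin server is never hit by more than batch_size renders."""
    results = []
    for i in range(0, len(urls), batch_size):
        batch = urls[i:i + batch_size]
        results += await asyncio.gather(*(render(u) for u in batch))
        if i + batch_size < len(urls):
            await asyncio.sleep(delay)  # brief pause between batches
    return results
```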

PerfectSearch re-fetches your sitemap periodically (every 24 hours by default) to detect new or removed URLs. New URLs are queued for rendering automatically. Removed URLs are flagged in the dashboard but their snapshots are not deleted — you can clean them up manually from the Snapshot Explorer if needed.
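The refresh logic reduces to a set difference between the previous and current sitemap contents. A minimal sketch (hypothetical helper name):

```python
def diff_sitemap(previous: set[str], current: set[str]) -> tuple[set[str], set[str]]:
    """Return (new URLs to queue for rendering, removed URLs to flag)."""
    return current - previous, previous - current
```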

If your sitemap references a sitemap index file (a sitemap of sitemaps), PerfectSearch follows all child sitemap references and aggregates the URLs into a single tree. This is common for large e-commerce sites and content platforms with tens of thousands of pages that split their sitemap across multiple files.
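Whether a fetched file is a sitemap index or a page sitemap can be told apart by its root element, <sitemapindex> versus <urlset>. A sketch of that check (the `sitemap_kind` helper is an assumption for illustration):

```python
import xml.etree.ElementTree as ET

def sitemap_kind(xml_text: str) -> str:
    """Return 'index' for a sitemap-of-sitemaps, 'urlset' otherwise."""
    root = ET.fromstring(xml_text)
    tag = root.tag.rsplit("}", 1)[-1]  # strip the XML namespace prefix
    return "index" if tag == "sitemapindex" else "urlset"
```

For an index, each child <loc> is itself a sitemap URL to fetch and parse recursively before merging the results into one tree.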

What sitemap formats are supported?

PerfectSearch supports standard XML sitemaps, sitemap index files, and robots.txt Sitemap directives. All three formats are parsed automatically when you add a sitemap URL. You do not need to specify the format — PerfectSearch detects it from the response content and handles each type appropriately.

  • Standard XML sitemap — The most common format. A single XML file containing <url> elements, each with a <loc> specifying the page URL. PerfectSearch also reads optional <lastmod> timestamps to prioritize recently-updated pages during rendering.
  • Sitemap index — An XML file containing <sitemap> elements that reference multiple child sitemaps. PerfectSearch fetches and parses each child sitemap and merges all URLs into a single tree. Common for large sites with more than 50,000 URLs.
  • robots.txt Sitemap directive — A Sitemap: line in your robots.txt file that points to your sitemap URL. PerfectSearch checks your robots.txt during site setup and auto-detects any declared sitemaps. Multiple Sitemap: directives are supported.
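Extracting Sitemap: directives from robots.txt is straightforward, since the directive name is case-insensitive and the value is everything after the first colon. A sketch (hypothetical helper, not PerfectSearch's detector):

```python
def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Collect every Sitemap: directive from a robots.txt body."""
    found = []
    for line in robots_txt.splitlines():
        key, _, value = line.partition(":")  # split only on the first colon
        if key.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found
```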

Keep your sitemap up to date

PerfectSearch re-fetches your sitemap every 24 hours to discover new pages. If you add a batch of new pages and want them rendered immediately, you can manually refresh the sitemap from the Sitemap tab by clicking the “Refresh” button. This triggers an immediate fetch and queues any newly-discovered URLs for rendering without waiting for the next automatic refresh cycle.