---

title: Crawl a Website for Knowledge Base
slug: knowledge-base-website-crawler
description: Import multiple pages into a knowledge base by crawling a website from a root URL.
---


Use the crawler when you want to import more than a single page from your website. The crawler starts from a root URL, follows links based on your settings, and imports discovered pages into your Knowledge Base.
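Conceptually, this discovery process is a breadth-first traversal from the root URL, bounded by your page limit and crawl depth. The sketch below is illustrative only (the function names and the `get_links` stand-in for fetching and parsing are assumptions, not Synthflow's implementation):

```python
from collections import deque

def crawl(root, get_links, max_pages=50, max_depth=3):
    """Breadth-first page discovery from a root URL.

    get_links(url) is a stand-in for fetching a page and extracting
    its links; the real crawler also indexes each page's content.
    """
    seen = {root}
    queue = deque([(root, 0)])  # (url, depth from root)
    imported = []
    while queue and len(imported) < max_pages:
        url, depth = queue.popleft()
        imported.append(url)  # in the real crawler: fetch, extract, index
        if depth == max_depth:
            continue  # don't follow links past the depth limit
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return imported
```

This is why **Max Pages** and crawl depth interact: a low depth can stop discovery before the page limit is reached, and vice versa.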

<Note>
  The crawler runs asynchronously. A crawl can take a few minutes depending on your website size, crawl depth, and page limits.
</Note>

## Set up a website crawl

<Steps>
  <Step title="Open the import modal and choose Website Crawl">
    Go to **Knowledge Bases** and open your knowledge base. Click **Add Document**, choose **Paste from URL**, then switch to **Website Crawl**.

    ![Import from URL modal with Website Crawl selected](https://files.buildwithfern.com/synthflow.docs.buildwithfern.com/ae1e731255ca2a528f6df2973da0a6c810f44409ed7660a27ffd9d127f60d72d/docs/assets/screenshots/kb-crawler-import-modal.png)
  </Step>

  <Step title="Configure crawl settings">
    Set the required fields:

    * **Name**: internal name for this source.
    * **Root URL**: starting URL for discovery (for example, `https://docs.synthflow.ai`).
    * **Max Pages (limit)**: maximum number of pages to import.
    * **Sitemap Handling**: whether sitemap URLs are included, skipped, or used exclusively.
    * **Sync Schedule**: automatic recrawl cadence.

    Use **Advanced Settings** for include/exclude path rules, URL parameter handling, subdomain inclusion, and crawl depth.

    ![Advanced crawler settings including include/exclude paths and crawl depth](https://files.buildwithfern.com/synthflow.docs.buildwithfern.com/ba9cd5acf8db5eb7517141b2ef859c304b449ad0c1063b6280eb6b34fc94dab5/docs/assets/screenshots/kb-crawler-advanced-settings.png)
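Include/exclude rules act as a filter on each discovered URL's path before it is crawled. A minimal sketch of that filtering, assuming simple prefix matching (the actual rule syntax is whatever you configure in Advanced Settings):

```python
from urllib.parse import urlparse

def should_crawl(url, include_prefixes=None, exclude_prefixes=()):
    """Decide whether a discovered URL passes the path rules.

    Exclude rules win over include rules; if no include rules are
    set, every non-excluded path is allowed.
    """
    path = urlparse(url).path
    if any(path.startswith(p) for p in exclude_prefixes):
        return False
    if include_prefixes is not None:
        return any(path.startswith(p) for p in include_prefixes)
    return True
```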
  </Step>

  <Step title="Start the crawl">
    Click **Start Crawl**. The import runs in the background, and the source status shows **Crawling in progress** while pages are discovered and indexed.

    ![Crawler in progress status](https://files.buildwithfern.com/synthflow.docs.buildwithfern.com/2ae43972f305247122061f2c8887fa604aa21e16b58dba145fe5fac213329bd4/docs/assets/screenshots/kb-crawler-in-progress.png)
  </Step>

  <Step title="Review imported pages">
    When crawling finishes, open the source details to review the imported pages list and inspect extracted content.

    ![Imported pages and extracted content preview](https://files.buildwithfern.com/synthflow.docs.buildwithfern.com/dbfdbd88fd986593225cf4c54061bafb1305a06a374e4c8ffa720d9d8138aa4a/docs/assets/screenshots/kb-crawler-results.png)
  </Step>
</Steps>

## Recommendations

* Start with a lower page limit while testing, then increase gradually.
* Use include/exclude rules to avoid irrelevant sections (for example `/blog` or `/changelog`).
* Keep URL-parameter ignoring enabled to reduce duplicate page imports.
* Use a lower crawl depth for focused imports and higher depth for full-site coverage.
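
To see why URL-parameter ignoring reduces duplicates: tracking parameters like `?utm_source=...` produce many URLs for the same page, and stripping the query collapses them to one entry. A hedged sketch of that normalization (illustrative only, not Synthflow's exact logic):

```python
from urllib.parse import urlsplit, urlunsplit

def normalize(url):
    """Collapse parameter and trailing-slash variants of a URL.

    Drops the query string and fragment, and trims a trailing slash
    (keeping the bare root path), so duplicates map to one key.
    """
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))
```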