
# Firecrawl Website Content Extractor (n8n Workflow)
This n8n workflow uses the **Firecrawl API** to extract structured data (e.g., quotes and authors) from web pages such as [Quotes to Scrape](https://quotes.toscrape.com/), and retries when the extraction has not finished yet.
---
## Workflow Overview
### Purpose:
- Crawl and extract **structured web data** using Firecrawl
- Wait for asynchronous scraping to complete
- Retrieve and validate results
- Support retries if content is not ready
---
## Step-by-Step Node Breakdown
### 1. **Manual Trigger**
- Node: `When clicking ‘Test workflow’`
- Used to **manually test** or execute the workflow during setup or debugging.
---
### 2. **Firecrawl Extract API Request**
- Node: `Extract`
- Sends a `POST` request to `https://api.firecrawl.dev/v1/extract`
- Payload includes:
  - `urls`: List of pages to crawl (`https://quotes.toscrape.com/*`)
  - `prompt`: "Extract all quotes and their corresponding authors from the website."
  - `schema`: JSON schema defining the expected structure (`quotes[]`, each with `text` and `author`)
> Uses an **HTTP Header Auth** credential for the Firecrawl API
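
The request described above can be sketched outside n8n as plain Python. This is a minimal illustration, not the workflow's exported configuration: the helper name `build_extract_request`, the exact JSON schema layout, and the `Authorization: Bearer ...` header format are assumptions, since the source only states that an HTTP Header Auth credential is used. The function only assembles the request and makes no network call.

```python
import json

EXTRACT_URL = "https://api.firecrawl.dev/v1/extract"


def build_extract_request(api_key: str, urls: list[str], prompt: str) -> dict:
    """Assemble the POST request the Extract node sends (no network call)."""
    # Assumed schema layout: an object holding a `quotes` array whose items
    # each carry a `text` and an `author` string.
    schema = {
        "type": "object",
        "properties": {
            "quotes": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "text": {"type": "string"},
                        "author": {"type": "string"},
                    },
                    "required": ["text", "author"],
                },
            }
        },
        "required": ["quotes"],
    }
    return {
        "method": "POST",
        "url": EXTRACT_URL,
        "headers": {
            # Assumed header format for the HTTP Header Auth credential.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": {"urls": urls, "prompt": prompt, "schema": schema},
    }


request = build_extract_request(
    api_key="YOUR_API_KEY",  # placeholder, not a real key
    urls=["https://quotes.toscrape.com/*"],
    prompt="Extract all quotes and their corresponding authors from the website.",
)
print(json.dumps(request["body"], indent=2))
```

Keeping the schema next to the prompt makes it easy to swap both when pointing the workflow at a different site.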
---
### 3. **Wait for 30 Seconds**
- Node: `30 Secs`
- Gives Firecrawl time to finish processing in the background
- Prevents hitting the API before results are ready
---
### 4. **Get Results**
- Node: `Get Results`
- Performs a `GET` request to the status URL using `{{ $('Extract').item.json.id }}` to retrieve extraction results.
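
As a rough sketch of what that expression does, the snippet below takes the `id` from the Extract response and builds the URL to poll. The path pattern `/v1/extract/<id>` and the helper name `status_url` are assumptions; the source only says the `id` is used to reach the status URL.

```python
def status_url(
    extract_response: dict,
    base: str = "https://api.firecrawl.dev/v1/extract",
) -> str:
    """Build the polling URL from the Extract response (assumed URL pattern)."""
    job_id = extract_response.get("id")
    if not job_id:
        # Without an id there is nothing to poll, so fail early.
        raise ValueError("Extract response has no 'id'; cannot build status URL")
    return f"{base}/{job_id}"


# "abc123" is a made-up job id used only for illustration.
print(status_url({"id": "abc123"}))
```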
---
### 5. **Condition Check**
- Node: `If`
- Checks if the `data` array is empty (i.e., no results yet)
- If **data is empty**:
  - Waits **10 more seconds** and retries
- If **data is available**:
  - Passes data to the next step (e.g., processing or storage)
---
### 6. **Retry Delay**
- Node: `10 Seconds`
- Waits briefly before sending another `GET` request to Firecrawl
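
Steps 3 to 6 together form a wait, fetch, check, retry loop. The sketch below expresses that loop in Python under a few assumptions: `poll_for_data` is a hypothetical helper, `fetch` stands in for the `Get Results` request, and `max_attempts` is an added safety cap that the workflow as described does not have (it keeps retrying while `data` is empty). The `sleep` parameter is injectable so the loop can be exercised without real delays.

```python
import time
from typing import Callable


def poll_for_data(
    fetch: Callable[[], dict],
    initial_wait: float = 30.0,   # the "30 Secs" node
    retry_wait: float = 10.0,     # the "10 Seconds" node
    max_attempts: int = 6,        # assumption: not part of the workflow
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Wait, fetch results, and retry while `data` is empty."""
    sleep(initial_wait)
    for attempt in range(1, max_attempts + 1):
        result = fetch()
        if result.get("data"):  # same check as the If node: is data non-empty?
            return result
        if attempt < max_attempts:
            sleep(retry_wait)
    raise TimeoutError(f"no data after {max_attempts} attempts")


# Simulated responses: two empty polls, then a result.
responses = iter([
    {"data": []},
    {"data": []},
    {"data": {"quotes": [{"text": "It is our choices, Harry, that show what "
                                  "we truly are, far more than our abilities.",
                          "author": "J.K. Rowling"}]}},
])
waits: list[float] = []
result = poll_for_data(lambda: next(responses), sleep=waits.append)
print(waits)  # [30.0, 10.0, 10.0]
print(result["data"]["quotes"][0]["author"])  # J.K. Rowling
```

Capping the number of attempts is a design choice worth considering in n8n as well, since an extraction that never returns data would otherwise loop indefinitely.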
---
### 7. **Edit Fields (Optional Output Formatting)**
- Node: `Edit Fields`
- Placeholder to structure or format the extracted results (quotes and authors)
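
One possible shape for this formatting step, sketched in Python: flatten the result into one row per quote. The function name `format_quotes`, the output keys, and the assumption that results arrive nested as `data.quotes` are illustrative, not taken from the workflow.

```python
def format_quotes(result: dict) -> list[dict]:
    """Flatten an extraction result into one row per quote."""
    # Assumes the response nests results as result["data"]["quotes"].
    quotes = (result.get("data") or {}).get("quotes", [])
    return [
        {"quote": q["text"].strip(), "author": q["author"].strip()}
        for q in quotes
        if q.get("text") and q.get("author")  # skip incomplete entries
    ]


sample = {
    "data": {
        "quotes": [
            {"text": " It is our choices, Harry, that show what we truly are, "
                     "far more than our abilities. ",
             "author": "J.K. Rowling"},
            {"text": "", "author": "Unknown"},  # dropped: empty text
        ]
    }
}
rows = format_quotes(sample)
print(rows)
```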
---
## Sticky Note: Firecrawl Setup Guide
Included as an embedded reference:
- [10% Firecrawl Discount](https://firecrawl.link/nateherk)
- Instructions to:
  - Add Firecrawl API credentials in **n8n**
  - Use the Firecrawl Community Node for **self-hosted** instances
  - Set up the schema and prompt for targeted data extraction
---
## Key Features
- API-based crawling with schema-structured output
- Fixed initial wait plus retry loop for asynchronous results
- AI prompt integration for intelligent data parsing
- Flexible for different URLs, prompts, and schemas
---
## Sample Output Schema
```json
{
  "quotes": [
    {
      "text": "The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.",
      "author": "Albert Einstein"
    },
    {
      "text": "It is our choices, Harry, that show what we truly are, far more than our abilities.",
      "author": "J.K. Rowling"
    }
  ]
}