# 🌐 Firecrawl Website Content Extractor (n8n Workflow)


This n8n automation workflow uses the **Firecrawl API** to extract structured data (e.g., quotes and authors) from web pages such as [Quotes to Scrape](https://quotes.toscrape.com/), and retries the result lookup when extraction has not finished yet.


---


## 🔍 Workflow Overview


### 🎯 Purpose:

- Crawl and extract **structured web data** using Firecrawl

- Wait for asynchronous scraping to complete

- Retrieve and validate results

- Support retries if content is not ready


---


## 🔧 Step-by-Step Node Breakdown


### 1. 🧪 **Manual Trigger**

- Node: `When clicking ‘Test workflow’`

- Used to **manually test** or execute the workflow during setup or debugging.


---


### 2. 📤 **Firecrawl Extract API Request**

- Node: `Extract`

- Sends a `POST` request to `https://api.firecrawl.dev/v1/extract`

- Payload includes:

  - `urls`: List of pages to crawl (`https://quotes.toscrape.com/*`)

  - `prompt`: "Extract all quotes and their corresponding authors from the website."

  - `schema`: JSON schema defining expected structure (`quotes[]`, each with `text` and `author`)


> 📌 Uses an **HTTP Header Auth** credential for the Firecrawl API
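For readers who want to see the request outside n8n, here is a minimal Python sketch of the payload described above. The endpoint, `urls`, and `prompt` values come from this page; the exact JSON Schema body (including the `required` lists), the `Authorization: Bearer` header format, and the helper name `build_extract_request` are assumptions made for illustration. The function only builds the request and does not send it.

```python
import json
import urllib.request

FIRECRAWL_EXTRACT_URL = "https://api.firecrawl.dev/v1/extract"

# Assumed JSON Schema for `quotes[]`, each with `text` and `author`.
# The `required` lists are an illustrative choice, not taken from the workflow.
QUOTES_SCHEMA = {
    "type": "object",
    "properties": {
        "quotes": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "text": {"type": "string"},
                    "author": {"type": "string"},
                },
                "required": ["text", "author"],
            },
        }
    },
    "required": ["quotes"],
}


def build_extract_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request made by the `Extract` node."""
    payload = {
        "urls": ["https://quotes.toscrape.com/*"],
        "prompt": "Extract all quotes and their corresponding authors from the website.",
        "schema": QUOTES_SCHEMA,
    }
    return urllib.request.Request(
        FIRECRAWL_EXTRACT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumption: the header credential carries a Bearer token.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would submit the job; the response is expected to contain the `id` that the later `Get Results` step refers to.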


---


### 3. ⏱️ **Wait for 30 Seconds**

- Node: `30 Secs`

- Gives Firecrawl time to finish processing in the background

- Prevents hitting the API before results are ready


---


### 4. 📥 **Get Results**

- Node: `Get Results`

- Performs a `GET` request to the status URL using `{{ $('Extract').item.json.id }}` to retrieve extraction results.
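As a rough equivalent of this node in Python, the sketch below builds the status request from a job id. It assumes the status URL is the extract endpoint with the id appended (`/v1/extract/{id}`) and that the same Bearer header is used; neither detail is spelled out on this page, and `build_status_request` is a hypothetical helper name.

```python
import urllib.request
from urllib.parse import quote


def build_status_request(job_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the GET request that fetches extraction results."""
    # Assumption: the status URL is the extract endpoint followed by the job id.
    url = f"https://api.firecrawl.dev/v1/extract/{quote(job_id, safe='')}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}"},  # assumed header format
        method="GET",
    )
```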


---


### 5. ✅❌ **Condition Check**

- Node: `If`

- Checks if the `data` array is empty (i.e., no results yet)

- If **data is empty**:

  - Waits **10 more seconds** and retries

- If **data is available**:

  - Passes data to the next step (e.g., processing or storage)


---


### 6. 🔁 **Retry Delay**

- Node: `10 Seconds`

- Waits briefly before sending another `GET` request to Firecrawl
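Steps 3 to 6 together form a wait-then-poll loop, which can be sketched in Python as follows. The 30-second and 10-second delays come from the workflow; the `max_attempts` cap, the injected `fetch` and `sleep` callables, and the function name `poll_for_results` are assumptions added so the sketch terminates and can be tried without calling the API.

```python
import time
from typing import Any, Callable, Dict


def poll_for_results(
    fetch: Callable[[], Dict[str, Any]],
    initial_wait: float = 30.0,
    retry_wait: float = 10.0,
    max_attempts: int = 10,  # assumption: the workflow itself states no cap
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Wait, fetch results, and retry while `data` is missing or empty."""
    sleep(initial_wait)  # step 3: give the extraction time to finish
    for attempt in range(1, max_attempts + 1):
        response = fetch()  # step 4: GET the results
        if response.get("data"):  # step 5: non-empty data means we are done
            return response
        if attempt < max_attempts:
            sleep(retry_wait)  # step 6: short delay before the next GET
    raise TimeoutError(f"no data after {max_attempts} attempts")
```

In practice `fetch` would wrap the status request and parse its JSON body; passing a stub for `sleep` keeps a dry run from actually waiting.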


---


### 7. 🛠️ **Edit Fields (Optional Output Formatting)**

- Node: `Edit Fields`

- Placeholder to structure or format the extracted results (quotes and authors)
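One possible formatting step, sketched in Python: flatten the response into a clean list of rows. It assumes the extracted quotes sit under `data.quotes` in the response, which matches the schema on this page but is not confirmed here; `format_quotes` is a hypothetical helper.

```python
from typing import Any, Dict, List


def format_quotes(result: Dict[str, Any]) -> List[Dict[str, str]]:
    """Return trimmed {text, author} rows, skipping incomplete entries."""
    data = result.get("data") or {}  # tolerate a missing or empty `data`
    rows: List[Dict[str, str]] = []
    for entry in data.get("quotes", []):
        text = str(entry.get("text", "")).strip()
        author = str(entry.get("author", "")).strip()
        if text and author:
            rows.append({"text": text, "author": author})
    return rows
```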


---


## 🧾 Sticky Note: Firecrawl Setup Guide


Included as an embedded reference:

- 🔗 [10% Firecrawl Discount](https://firecrawl.link/nateherk)

- 🧰 Instructions to:

  - Add Firecrawl API credentials in **n8n**

  - Use the Firecrawl Community Node for **self-hosted** instances

  - Set up the schema and prompt for targeted data extraction


---


## ✅ Key Features


- 🔌 API-based crawling with schema-structured output

- ⏱️ Smart waiting + retry mechanism

- 🧠 AI prompt integration for intelligent data parsing

- ⚙️ Flexible for different URLs, prompts, and schemas


---


## 📦 Sample Output Schema


```json
{
  "quotes": [
    {
      "text": "The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.",
      "author": "Albert Einstein"
    },
    {
      "text": "It is our choices, Harry, that show what we truly are, far more than our abilities.",
      "author": "J.K. Rowling"
    }
  ]
}
```


You will get a JSON (6KB) file