Scraperr

Scraperr: A self-hosted web scraping solution with XPath-based extraction, queue management, domain spidering, and data export.

Introduction

A powerful self-hosted web scraping solution that allows you to scrape websites without writing a single line of code.

📚 Check out the docs for a comprehensive quickstart guide and detailed information.

✨ Key Features:

  • XPath-Based Extraction: Precisely target page elements.
  • Queue Management: Submit and manage multiple scraping jobs.
  • Domain Spidering: Option to scrape all pages within the same domain.
  • Custom Headers: Add JSON headers to your scraping requests.
  • Media Downloads: Automatically download images, videos, and other media.
  • Results Visualization: View scraped data in a structured table format.
  • Data Export: Export your results in markdown and CSV formats.
  • Notification Channels: Send completion notifications through various channels.
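
The README does not show how Scraperr's spider decides which links to follow. The stand-alone Python sketch below illustrates what a same-domain spider typically does at each page, using only the standard library. It collects the links on a page, resolves them to absolute URLs, drops fragments and non-HTTP schemes, and keeps only links on the same host. The names LinkCollector and same_domain_links are hypothetical. Treating "same domain" as an exact host match, so that subdomains count as different, is also an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:  # value is None for a bare attribute
                    self.hrefs.append(value)


def same_domain_links(page_url, html):
    """Return absolute, fragment-free links on page_url's host, in first-seen order."""
    parser = LinkCollector()
    parser.feed(html)
    host = urlparse(page_url).netloc
    links = []
    for href in parser.hrefs:
        # Resolve relative links against the page, then strip any #fragment.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parsed = urlparse(absolute)
        # Keep only http(s) links on the exact same host, without duplicates.
        if parsed.scheme in ("http", "https") and parsed.netloc == host and absolute not in links:
            links.append(absolute)
    return links


if __name__ == "__main__":
    html = """
    <a href="/about">About</a>
    <a href="https://example.com/blog#top">Blog</a>
    <a href="https://other.org/">Elsewhere</a>
    <a href="mailto:hi@example.com">Mail</a>
    """
    print(same_domain_links("https://example.com/index.html", html))
    # prints ['https://example.com/about', 'https://example.com/blog']
```

A spider would push these links onto its job queue and repeat the process for each one, tracking visited URLs so that it never fetches the same page twice.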

🚀 Getting Started:

Docker
make up
Helm

Refer to the docs for Helm deployment: https://scraperr-docs.pages.dev/guides/helm-deployment

⚖️ Legal and Ethical Guidelines:

When using Scraperr, please remember to:

  1. Respect robots.txt: Always check a website's robots.txt file to see which pages the site allows crawlers to access.
  2. Terms of Service: Adhere to each website's Terms of Service regarding data extraction.
  3. Rate Limiting: Implement reasonable delays between requests to avoid overloading servers.
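
The README does not say whether Scraperr applies guidelines 1 and 3 automatically. The stand-alone sketch below shows one way a scraper can apply them, using Python's standard urllib.robotparser. It skips URLs that robots.txt disallows and pauses between consecutive requests, using the site's Crawl-delay when one is given. The function name polite_fetch_order, the user agent "example-bot", the example robots.txt rules, and the 1-second default delay are all illustrative assumptions.

```python
import time
from urllib import robotparser

# Example robots.txt content (assumed). A real scraper would download it
# from the site's /robots.txt before crawling.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def polite_fetch_order(urls, robots_txt, user_agent="example-bot",
                       default_delay=1.0, sleep=time.sleep):
    """Return the URLs robots.txt allows, pausing between consecutive ones.

    `sleep` is injectable so the pacing can be tested without waiting.
    """
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    # Record a "last fetched" time: can_fetch() and crawl_delay() treat
    # rules with no recorded fetch time as not yet loaded.
    rp.modified()

    # Prefer the site's Crawl-delay for this agent, else our own default.
    delay = rp.crawl_delay(user_agent) or default_delay

    allowed = []
    for url in urls:
        if not rp.can_fetch(user_agent, url):
            continue  # robots.txt disallows this path for our user agent
        if allowed:
            sleep(delay)  # wait before every request after the first
        allowed.append(url)  # a real scraper would send the HTTP request here
    return allowed


if __name__ == "__main__":
    urls = [
        "https://example.com/",
        "https://example.com/private/report",
        "https://example.com/blog",
    ]
    print(polite_fetch_order(urls, ROBOTS_TXT))
    # prints ['https://example.com/', 'https://example.com/blog'] after a 2-second pause
```

robots.txt is advisory. Following it, along with each site's Terms of Service, remains the user's responsibility.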

Disclaimer: Scraperr is intended for use only on websites that explicitly permit scraping. The creator accepts no responsibility for misuse of this tool.

💬 Join the Community:

Get support, report bugs, and chat with other users and contributors.

👉 Join the Scraperr Discord

📄 License:

This project is licensed under the MIT License. See the LICENSE file for details.

👏 Contributions:

Development is made easier by the webapp template.

To get started, simply run make build up-dev.
