HivemunkCrawler

HivemunkCrawler is the web crawler operated by hivemunk. It visits publicly available beekeeping-related business websites to help build a directory and product index for beekeeping supplies. The purpose of the crawler is to help beekeepers find equipment, compare availability and prices across sellers, and discover beekeeping supply businesses they may not already know about. HivemunkCrawler is focused on beekeeping businesses and beekeeping equipment. It is not a general-purpose web crawler.

User agent

HivemunkCrawler identifies itself with the following user agent:

HivemunkCrawler/1.0 (+https://hivemunk.com/crawler)

The URL in the user agent points to this page so website owners can understand why the crawler visited their site and how to contact hivemunk.
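Website owners who want to spot these visits in their own server logs can match on the "HivemunkCrawler/<version>" token in the user agent. The following is a minimal sketch in Python. The function name and the sample strings are assumptions made for this example and are not part of any hivemunk tooling.

```python
import re
from typing import Optional

# Matches the "HivemunkCrawler/<version>" token from the user agent shown above.
HIVEMUNK_TOKEN = re.compile(r"HivemunkCrawler/(?P<version>\d+(?:\.\d+)*)")


def hivemunk_version(user_agent: str) -> Optional[str]:
    """Return the crawler version if the user agent names HivemunkCrawler, else None."""
    match = HIVEMUNK_TOKEN.search(user_agent)
    return match.group("version") if match else None


if __name__ == "__main__":
    # Hypothetical user agent values, as they might appear in an access log.
    samples = [
        "HivemunkCrawler/1.0 (+https://hivemunk.com/crawler)",
        "Mozilla/5.0 (X11; Linux x86_64)",
    ]
    for ua in samples:
        print(ua, "->", hivemunk_version(ua))
```

A user agent string can be set by any client, so a match is a hint about who visited rather than proof.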

Why HivemunkCrawler visits websites

HivemunkCrawler visits websites for a few related purposes:

Business discovery
To identify businesses that may be relevant to beekeepers, including beekeeping supply stores, honey producers, farm stores, beekeeper associations, and related organizations.
Business verification
To determine whether a discovered website is actually relevant to beekeeping, whether it sells physical products, and whether it appears to be a business that should be included in the hivemunk directory.
Directory building
To collect limited business-level information such as website URL, business type, public address, and general product focus.
Product indexing
When crawling appears to be allowed, to collect limited product-level information from beekeeping supply websites so beekeepers can search and compare products across sellers.

How websites are discovered

Candidate websites may be discovered from public sources such as:

Search engines
Public map and business listing results
Supplier directories
Beekeeping association pages
Public business websites
Other publicly available references

A website being discovered does not automatically mean it will be added to the hivemunk directory or product index. Candidate websites are reviewed by automated checks first.

What HivemunkCrawler may collect

Depending on the website and what is publicly available, HivemunkCrawler may collect the following business-level information:

Website URL
Domain name
Business name
Business type or category
Whether the business appears relevant to beekeeping
Whether the business appears to sell physical products
Public business address, if shown on the website
Contact or location page URLs
E-commerce platform, if identifiable
Whether a terms of service, terms of use, or legal page exists
Whether a sitemap or robots.txt file exists

For product indexing, HivemunkCrawler may collect limited product-level information such as:

Product name
Product page URL
Listed price
Currency
Product category or collection
Availability text, such as "in stock," "out of stock," or "sold out"
Basic product attributes visible on listing or product pages
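To make the scope of the fields above concrete, the sketch below models one product entry as a small Python record. It is illustrative only: the class name, field names, and the availability helper are assumptions made for this example and do not describe hivemunk's actual storage format.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ProductRecord:
    """Hypothetical shape of one product entry, mirroring the list above."""
    name: str
    url: str  # product page URL on the seller's site
    listed_price: Optional[str] = None  # kept as shown on the page
    currency: Optional[str] = None
    category: Optional[str] = None
    availability_text: Optional[str] = None  # e.g. "in stock", "sold out"
    attributes: Dict[str, str] = field(default_factory=dict)


def availability_status(text: Optional[str]) -> str:
    """Map free-form availability text to a coarse status (assumed labels)."""
    if not text:
        return "unknown"
    lowered = text.strip().lower()
    if "out of stock" in lowered or "sold out" in lowered:
        return "unavailable"
    if "in stock" in lowered:
        return "available"
    return "unknown"


if __name__ == "__main__":
    record = ProductRecord(
        name="Hive tool",
        url="https://example.com/products/hive-tool",
        availability_text="Sold out",
    )
    print(record.name, availability_status(record.availability_text))
```

Note that the record holds no images, reviews, or full descriptions, in line with the limits described on this page.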

The product index is intended to point users back to the original seller. hivemunk does not intend to replace the seller's website.

What HivemunkCrawler does not do

HivemunkCrawler does not:

Create accounts
Log in to websites
Submit forms
Sign up for newsletters
Add products to carts
Start checkout flows
Place orders
Bypass paywalls
Bypass CAPTCHAs
Access password-protected pages
Access private customer areas
Attempt to evade bot protections
Copy customer reviews
Download or rehost product images
Republish full website content
Republish full product descriptions as a substitute for visiting the seller's website

HivemunkCrawler is designed to collect limited structured information for discovery, comparison, and referral.

How HivemunkCrawler behaves

HivemunkCrawler works in stages.

1. Discovery stage

During discovery, hivemunk identifies possible beekeeping-related businesses from public sources.

At this stage, the crawler does not necessarily visit the business website. The goal is to create a candidate list of websites that may be relevant to beekeepers.

2. Verification stage

During verification, HivemunkCrawler may visit a small number of publicly available pages to understand what kind of business the website represents.

This may include requests to:

The homepage
robots.txt
sitemap.xml
Contact pages
About pages
Visit or location pages
Terms of service, terms of use, or legal pages

In some cases, the website URL that was originally discovered may redirect to a different URL before reaching the actual business website. HivemunkCrawler may follow these redirects to find the correct page.

If the homepage does not contain enough information for the directory, HivemunkCrawler may also visit related pages such as contact, about, or similar pages.

The verification stage looks for limited signals such as:

Whether the site is relevant to beekeeping
Whether the site sells physical products
Whether the site appears to be a supply store, honey producer, association, educational site, or another type of business
Whether a public business address is shown
Whether product pages appear to exist
Whether the website publishes crawling restrictions

This stage is intentionally limited. Its purpose is to decide whether the site should be considered for the directory or later product indexing.

3. Permission review stage

Before product indexing, HivemunkCrawler checks for signs that automated access should be avoided or limited.

This includes reviewing:

robots.txt, specifically for rules addressed to HivemunkCrawler
Terms of service
Terms of use
Legal pages
Other pages that appear to describe automated access, scraping, crawling, bots, or data extraction

Terms of service, terms of use, and legal pages are analyzed using automated tools to determine whether they explicitly restrict automated access or data collection. General "all rights reserved" or intellectual property clauses do not count as restrictions. Only explicit prohibitions on automated access or data collection are treated as blocks.

If product crawling appears to be disallowed, the site may be excluded from product indexing or marked for manual review.

The verification stage (step 2) involves only a small number of requests and takes place before this review. The full robots.txt and terms review happens in the permission review stage (step 3), before any product indexing begins.
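The difference between a general intellectual property clause and an explicit prohibition can be illustrated with a simple keyword check. The sketch below is not the analysis hivemunk runs. It is a deliberately simplified stand-in, with assumed function names and term lists, that flags a sentence only when it mentions automated access and also contains prohibiting language.

```python
import re

# Assumed vocabulary for this example; a real review would need to be broader.
AUTOMATION_TERMS = re.compile(
    r"\b(?:scrap(?:e|es|ing|ers?)|crawl(?:s|ing|ers?)?|bots?|robots?|spiders?"
    r"|automated|data (?:extraction|mining))\b",
    re.IGNORECASE,
)
PROHIBITION_TERMS = re.compile(
    r"\b(?:prohibit(?:s|ed)?|forbidden|not permitted|may not|must not|shall not)\b",
    re.IGNORECASE,
)


def explicitly_restricts_automation(text: str) -> bool:
    """True if any single sentence both names automated access and prohibits it."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return any(
        AUTOMATION_TERMS.search(s) and PROHIBITION_TERMS.search(s)
        for s in sentences
    )


if __name__ == "__main__":
    general = "All rights reserved. All content is the property of Example Apiary."
    explicit = "You may not use bots, scrapers, or other automated means to access this site."
    print(explicitly_restricts_automation(general))
    print(explicitly_restricts_automation(explicit))
```

Checking sentence by sentence avoids treating an unrelated prohibition elsewhere on the page as a crawling restriction. A keyword check like this will still miss or misread some wording, which is one reason a site may be marked for manual review.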

4. Product indexing stage

If a website appears relevant to beekeeping and product crawling appears to be allowed, HivemunkCrawler may visit product-related pages.

When the e-commerce platform provides methods to access product data, HivemunkCrawler uses those methods. When such methods are not available, HivemunkCrawler may visit individual product pages directly.

The goal of this stage is to collect limited structured product information, not to copy the website.

5. Refresh stage

Business and product information may be refreshed periodically.

Refreshes are used to keep information accurate, including product availability, listed prices, business URLs, and whether a business still appears relevant to beekeeping.

Refresh frequency may vary depending on the type of information and the behavior of the website.

Headless browser

HivemunkCrawler renders pages using a headless browser. This means it loads and executes JavaScript on the pages it visits, similar to how a regular web browser would. This may trigger analytics, tracking, or statistics scripts on the visited site. The crawler does not interact with the page beyond loading it: it does not click buttons, scroll, or fill in forms.

robots.txt

HivemunkCrawler checks robots.txt for rules specifically addressed to it. If HivemunkCrawler is disallowed, the site will be excluded from product indexing.

To block HivemunkCrawler from your entire site, add the following to the robots.txt file at the root of your domain:

User-agent: HivemunkCrawler
Disallow: /

Blocking HivemunkCrawler may prevent your business from appearing in the hivemunk beekeeping supply directory or product index.
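If you want to confirm that your rules read the way you intend before publishing them, you can test them locally. The sketch below uses the robots.txt parser from Python's standard library. The helper name and the example.com URLs are assumptions for this example, and Python's parser is only an approximation of how any particular crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# The blocking rules shown above.
ROBOTS_TXT = """\
User-agent: HivemunkCrawler
Disallow: /
"""


def can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # HivemunkCrawler is blocked everywhere; other crawlers are unaffected.
    print(can_fetch(ROBOTS_TXT, "HivemunkCrawler", "https://example.com/"))
    print(can_fetch(ROBOTS_TXT, "SomeOtherBot", "https://example.com/"))
```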

Blocking product indexing only

If you want your business to appear in the hivemunk directory but do not want product pages crawled, you can block product-related paths instead of blocking the entire site.

Exact paths vary by website platform. If you are not sure what to block, contact us and we can help.
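As an illustration, the sketch below tests a robots.txt that blocks only product-related paths. The paths /products/ and /collections/ are hypothetical placeholders, not the right values for every platform, and the helper name and example.com URLs are likewise assumptions. The check uses Python's standard library parser, which approximates how a crawler would read the rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical product paths; replace them with the paths your platform uses.
ROBOTS_TXT = """\
User-agent: HivemunkCrawler
Disallow: /products/
Disallow: /collections/
"""


def allowed_for_hivemunk(robots_txt: str, url: str) -> bool:
    """Report whether the rules let HivemunkCrawler fetch the given URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("HivemunkCrawler", url)


if __name__ == "__main__":
    for url in (
        "https://example.com/",
        "https://example.com/contact",
        "https://example.com/products/hive-tool",
    ):
        print(url, allowed_for_hivemunk(ROBOTS_TXT, url))
```

With rules like these, the homepage and contact page stay reachable for the directory while the product paths are excluded.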

Allowing HivemunkCrawler

To explicitly allow HivemunkCrawler, you can add:

User-agent: HivemunkCrawler
Allow: /

You do not need to add this if your site already allows general crawling.

Terms of service and legal restrictions

HivemunkCrawler may check terms of service, terms of use, or legal pages for restrictions related to:

Scraping
Crawling
Bots
Automated access
Data extraction
Screen scraping
Use of robots.txt

If your website terms prohibit automated crawling, scraping, or data extraction, HivemunkCrawler may exclude your site from product indexing or mark it for manual review.

Product images

HivemunkCrawler is not intended to download, copy, or rehost product images.

If product images are displayed in hivemunk in the future, they should generally be handled by linking back to the original seller or by using images only with permission, license, partnership, or another appropriate basis.

Product descriptions

HivemunkCrawler is not intended to republish full product descriptions.

Product indexing may use short product names, categories, prices, availability indicators, and product URLs to help beekeepers find relevant products. Users should visit the seller's website for the complete product listing, description, shipping terms, return policy, and checkout process.

Accuracy

HivemunkCrawler collects information from public websites, but product data can change quickly.

Prices, availability, shipping costs, taxes, promotions, and product details may change after a page is crawled. hivemunk may display when information was last checked, but the seller's website is the authoritative source for current product details.

Before purchasing, users should confirm all information directly on the seller's website.

Removal requests

If you want your website removed from the hivemunk directory or product index, contact:

hello@hivemunk.com

Please include:

Your domain name
Whether you want the entire business listing removed
Whether you only want product indexing disabled
Whether any information is inaccurate and should be corrected

Removal and correction requests are reviewed manually.

Correction requests

If hivemunk has incorrect information about your business, contact:

hello@hivemunk.com

Please include the correct information and the URL where it can be verified.

Examples of corrections include:

Business name
Business address
Website URL
Whether you sell beekeeping supplies
Whether you sell online
Product category information

Contact

For crawler questions, removal requests, correction requests, or crawl-rate concerns, contact:

hello@hivemunk.com

Please include your domain name so we can review the correct website.