
HivemunkCrawler
HivemunkCrawler is the web crawler operated by hivemunk. It visits publicly available beekeeping-related business websites to help build a directory and product index for beekeeping supplies. The purpose of the crawler is to help beekeepers find equipment, compare availability and prices across sellers, and discover beekeeping supply businesses they may not already know about. HivemunkCrawler is focused on beekeeping businesses and beekeeping equipment. It is not a general-purpose web crawler.
User agent
HivemunkCrawler identifies itself with the following user agent:
HivemunkCrawler/1.0 (+https://hivemunk.com/crawler)
The URL in the user agent points to this page so website owners can understand why the crawler visited their site and how to contact hivemunk.
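Website owners who want to spot these visits in their own server logs can match on the product token at the start of the user agent. The following is a minimal sketch; the function name `is_hivemunk_crawler` is a hypothetical helper, not part of any hivemunk tooling. A User-Agent header can be sent by any client, so a match only shows what the request claimed to be.

```python
def is_hivemunk_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent string starts with the HivemunkCrawler product token."""
    # The product token is the text before the first "/" (the version follows it).
    product = user_agent.strip().split("/", 1)[0]
    return product.lower() == "hivemunkcrawler"


# The first string is the user agent documented on this page; the second is a made-up example.
documented = is_hivemunk_crawler("HivemunkCrawler/1.0 (+https://hivemunk.com/crawler)")
unrelated = is_hivemunk_crawler("Mozilla/5.0 (compatible; OtherBot/2.0)")
```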
Why HivemunkCrawler visits websites
HivemunkCrawler visits websites for a few related purposes:
- Business discovery: To identify businesses that may be relevant to beekeepers, including beekeeping supply stores, honey producers, farm stores, beekeeper associations, and related organizations.
- Business verification: To determine whether a discovered website is actually relevant to beekeeping, whether it sells physical products, and whether it appears to be a business that should be included in the hivemunk directory.
- Directory building: To collect limited business-level information such as website URL, business type, public address, and general product focus.
- Product indexing: When crawling appears to be allowed, to collect limited product-level information from beekeeping supply websites so beekeepers can search and compare products across sellers.
How websites are discovered
Candidate websites may be discovered from public sources such as:
- Search engines
- Public map and business listing results
- Supplier directories
- Beekeeping association pages
- Public business websites
- Other publicly available references
A website being discovered does not automatically mean it will be added to the hivemunk directory or product index. Candidate websites are reviewed by automated checks first.
What HivemunkCrawler may collect
Depending on the website and what is publicly available, HivemunkCrawler may collect the following business-level information:
- Website URL
- Domain name
- Business name
- Business type or category
- Whether the business appears relevant to beekeeping
- Whether the business appears to sell physical products
- Public business address, if shown on the website
- Contact or location page URLs
- E-commerce platform, if identifiable
- Whether a terms, legal, or terms of service page exists
- Whether a sitemap or robots.txt file exists
For product indexing, HivemunkCrawler may collect limited product-level information such as:
- Product name
- Product page URL
- Listed price
- Currency
- Product category or collection
- Availability text, such as "in stock," "out of stock," or "sold out"
- Basic product attributes visible on listing or product pages
The product index is intended to point users back to the original seller. hivemunk does not intend to replace the seller's website.
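To make the shape of this product-level data concrete, the fields listed above could be pictured as a single record per product. This is only an illustrative sketch: the class name `ProductRecord`, the field names, and the sample values are assumptions for this example and do not describe hivemunk's actual storage format.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Dict, Optional


@dataclass
class ProductRecord:
    """Hypothetical record holding the limited product fields listed above."""
    name: str                                 # Product name
    url: str                                  # Product page URL on the seller's site
    price: Optional[Decimal] = None           # Listed price, if shown
    currency: Optional[str] = None            # Currency of the listed price
    category: Optional[str] = None            # Product category or collection
    availability_text: Optional[str] = None   # e.g. "in stock", "sold out"
    attributes: Dict[str, str] = field(default_factory=dict)  # Basic visible attributes


# Made-up sample values on a placeholder domain.
record = ProductRecord(
    name="Stainless steel hive tool",
    url="https://example.com/products/hive-tool",
    price=Decimal("12.50"),
    currency="USD",
    availability_text="in stock",
)
```

Every record keeps the product page URL, in line with the aim of pointing users back to the original seller.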
What HivemunkCrawler does not do
HivemunkCrawler does not:
- Create accounts
- Log in to websites
- Submit forms
- Sign up for newsletters
- Add products to carts
- Start checkout flows
- Place orders
- Bypass paywalls
- Bypass CAPTCHAs
- Access password-protected pages
- Access private customer areas
- Attempt to evade bot protections
- Copy customer reviews
- Download or rehost product images
- Republish full website content
- Republish full product descriptions as a substitute for visiting the seller's website
HivemunkCrawler is designed to collect limited structured information for discovery, comparison, and referral.
How HivemunkCrawler behaves
HivemunkCrawler works in stages.
1. Discovery stage
During discovery, hivemunk identifies possible beekeeping-related businesses from public sources.
At this stage, the crawler may not visit the business website yet. The goal is to create a candidate list of websites that may be relevant to beekeepers.
2. Verification stage
During verification, HivemunkCrawler may visit a small number of publicly available pages to understand what kind of business the website represents.
This may include requests to:
- The homepage
- robots.txt
- sitemap.xml
- Contact pages
- About pages
- Visit or location pages
- Terms, legal, or terms of service pages
In some cases, the website URL that was originally discovered may redirect to a different URL before reaching the actual business website. HivemunkCrawler may follow these redirects to find the correct page.
In some cases, HivemunkCrawler may scan additional related pages beyond the homepage to find information needed for the directory. For example, contact, about, or similar pages may be visited if the homepage does not contain enough information.
The verification stage looks for limited signals such as:
- Whether the site is relevant to beekeeping
- Whether the site sells physical products
- Whether the site appears to be a supply store, honey producer, association, educational site, or another type of business
- Whether a public business address is shown
- Whether product pages appear to exist
- Whether the website publishes crawling restrictions
This stage is intentionally limited. Its purpose is to decide whether the site should be considered for the directory or later product indexing.
3. Permission review stage
Before product indexing, HivemunkCrawler checks for signs that automated access should be avoided or limited.
This includes reviewing:
- robots.txt, specifically for rules addressed to HivemunkCrawler
- Terms of service
- Terms of use
- Legal pages
- Other pages that appear to describe automated access, scraping, crawling, bots, or data extraction
Terms, legal, and terms of service pages are analyzed using automated tools to determine whether they explicitly restrict automated access or data collection. General "all rights reserved" or intellectual property clauses do not count as restrictions — only explicit prohibitions on automated access or data collection are treated as blocks.
If product crawling appears to be disallowed, the site may be excluded from product indexing or marked for manual review.
The verification stage (step 2) involves only a small number of requests and occurs before the full permission review. The full robots.txt and terms review happens during the permission review stage (step 3), before any product indexing begins.
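The distinction between a general intellectual property clause and an explicit prohibition can be illustrated with a simple keyword check. This is a rough sketch under stated assumptions, not the automated tooling hivemunk uses: the word lists, the sentence-level matching, and the function name `explicitly_restricts_automation` are all invented for this example, and a real review would need to handle far more phrasing.

```python
import re

# Assumed vocabulary for this sketch only.
AUTOMATION = r"\b(?:scrap\w*|crawl\w*|spider\w*|bots?|robots?|automated (?:access|means|tools?)|data (?:extraction|mining))\b"
PROHIBITION = r"\b(?:may not|must not|shall not|not permitted|prohibit\w*|forbidden)\b"


def explicitly_restricts_automation(terms_text: str) -> bool:
    """Return True if any sentence both prohibits something and mentions automated access."""
    # Split on whitespace that follows sentence-ending punctuation.
    for sentence in re.split(r"(?<=[.!?])\s+", terms_text):
        lowered = sentence.lower()
        if re.search(PROHIBITION, lowered) and re.search(AUTOMATION, lowered):
            return True
    return False


# Made-up sample clauses.
generic_clause = "All rights reserved. Content is the intellectual property of Example Apiary."
explicit_clause = "You may not use bots, scrapers, or other automated means to access this site."

generic_result = explicitly_restricts_automation(generic_clause)
explicit_result = explicitly_restricts_automation(explicit_clause)
```

In this sketch the generic clause is not treated as a restriction, while the explicit clause is, which mirrors the rule described above.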
4. Product indexing stage
If a website appears relevant to beekeeping and product crawling appears to be allowed, HivemunkCrawler may visit product-related pages.
When the e-commerce platform provides methods to access product data, HivemunkCrawler uses those methods. When such methods are not available, HivemunkCrawler may visit individual product pages directly.
The goal of this stage is to collect limited structured product information, not to copy the website.
5. Refresh stage
Business and product information may be refreshed periodically.
Refreshes are used to keep information accurate, including product availability, listed prices, business URLs, and whether a business still appears relevant to beekeeping.
Refresh frequency may vary depending on the type of information and the behavior of the website.
Headless browser
HivemunkCrawler renders pages using a headless browser. This means it loads and executes JavaScript on the pages it visits, similar to how a regular web browser would. This may trigger analytics, tracking, or statistics scripts on the visited site. The crawler does not interact with the page beyond loading it — it does not click buttons, scroll, or fill in forms.
robots.txt
HivemunkCrawler checks robots.txt for rules specifically addressed to it. If HivemunkCrawler is disallowed, the site will be excluded from product indexing.
To block HivemunkCrawler from your entire site, add the following to the robots.txt file at the root of your domain:
User-agent: HivemunkCrawler
Disallow: /
Blocking HivemunkCrawler may prevent your business from appearing in the hivemunk beekeeping supply directory or product index.
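If you want to confirm that a rule like the one above has the intended effect before publishing it, you can test it locally. The following is a minimal sketch using Python's standard `urllib.robotparser` module. It shows how one standard parser reads the rule and is not a description of HivemunkCrawler's own parser; `example.com` and `SomeOtherBot` are placeholders.

```python
from urllib.robotparser import RobotFileParser

# The block-all rule shown above.
ROBOTS_TXT = """\
User-agent: HivemunkCrawler
Disallow: /
"""


def can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The rule names HivemunkCrawler only, so other crawlers are unaffected by it.
hivemunk_blocked = not can_fetch(ROBOTS_TXT, "HivemunkCrawler", "https://example.com/")
others_unaffected = can_fetch(ROBOTS_TXT, "SomeOtherBot", "https://example.com/")
```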
Blocking product indexing only
If you want your business to appear in the hivemunk directory but do not want product pages crawled, you can block product-related paths instead of blocking the entire site.
Exact paths vary by website platform. If you are not sure what to block, contact us and we can help.
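As an illustration only, the sketch below embeds a robots.txt that blocks two product-related paths and checks which URLs it would exclude, using Python's standard `urllib.robotparser`. The paths `/products/` and `/collections/` and the `example.com` URLs are assumptions for this example; your platform may use different paths.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the two paths are placeholders, not a recommendation.
ROBOTS_TXT = """\
User-agent: HivemunkCrawler
Disallow: /products/
Disallow: /collections/
"""


def blocked_urls(robots_txt: str, user_agent: str, urls: list) -> list:
    """Return the subset of urls that user_agent is not allowed to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(user_agent, u)]


urls = [
    "https://example.com/",
    "https://example.com/contact",
    "https://example.com/products/hive-tool",
    "https://example.com/collections/smokers",
]
blocked = blocked_urls(ROBOTS_TXT, "HivemunkCrawler", urls)
```

With these rules, the homepage and contact page remain reachable while the two product-related URLs are excluded.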
Allowing HivemunkCrawler
To explicitly allow HivemunkCrawler, you can add:
User-agent: HivemunkCrawler
Allow: /
You do not need to add this if your site already allows general crawling.
Terms of service and legal restrictions
HivemunkCrawler may check terms, legal, or terms of service pages for restrictions related to:
- Scraping
- Crawling
- Bots
- Automated access
- Data extraction
- Screen scraping
- Use of robots.txt
If your website terms prohibit automated crawling, scraping, or data extraction, HivemunkCrawler may exclude your site from product indexing or mark it for manual review.
Product images
HivemunkCrawler is not intended to download, copy, or rehost product images.
If product images are displayed in hivemunk in the future, they should generally be handled by linking back to the original seller or by using images only with permission, license, partnership, or another appropriate basis.
Product descriptions
HivemunkCrawler is not intended to republish full product descriptions.
Product indexing may use short product names, categories, prices, availability indicators, and product URLs to help beekeepers find relevant products. Users should visit the seller's website for the complete product listing, description, shipping terms, return policy, and checkout process.
Accuracy
HivemunkCrawler collects information from public websites, but product data can change quickly.
Prices, availability, shipping costs, taxes, promotions, and product details may change after a page is crawled. hivemunk may display when information was last checked, but the seller's website is the authoritative source for current product details.
Before purchasing, users should confirm all information directly on the seller's website.
Removal requests
If you want your website removed from the hivemunk directory or product index, contact:
hello@hivemunk.com
Please include:
- Your domain name
- Whether you want the entire business listing removed
- Whether you only want product indexing disabled
- Whether any information is inaccurate and should be corrected
Removal and correction requests are reviewed manually.
Correction requests
If hivemunk has incorrect information about your business, contact:
hello@hivemunk.com
Please include the correct information and the URL where it can be verified.
Examples of corrections include:
- Business name
- Business address
- Website URL
- Whether you sell beekeeping supplies
- Whether you sell online
- Product category information
Contact
For crawler questions, removal requests, correction requests, or crawl-rate concerns, contact:
hello@hivemunk.com
Please include your domain name so we can review the correct website.
