GlippyBot

Crawler information & allow-list guide

GlippyBot is the user-agent used by the Glippy Desktop app when a user runs a site audit. It is not a centralized search engine crawler. Each request comes from the machine of the person using Glippy, not from Glippy infrastructure.

User-Agent string

GlippyBot identifies itself with the following User-Agent header on every request:

Mozilla/5.0 (compatible; GlippyBot/1.0.0; +https://www.glippy.dev/bot)

The version segment matches the installed Glippy Desktop release, so older clients may send a slightly different version number.

What GlippyBot does

Fetches HTML pages on the seed domain a Glippy user has chosen to audit.
Reads robots.txt, sitemap.xml, and llms.txt.
Follows internal links up to the user-configured depth and page limit (default: depth 3, 100 pages).
Honors robots.txt rules by default; the user can turn this off in the app.
Issues plain GET requests and reads the responses. It never executes JavaScript or submits forms.
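
To make the depth and page limits above concrete, here is a minimal sketch of a bounded breadth-first crawl. It is not Glippy's actual code. The function name, the get_links callable, the breadth-first order, and the choice to count the seed page as depth 0 are all assumptions made for illustration, and a toy in-memory site stands in for real HTTP requests.

```python
from collections import deque


def bounded_crawl(seed, get_links, max_depth=3, max_pages=100):
    """Breadth-first walk limited by depth and page count.

    get_links is a hypothetical callable: URL -> list of internal URLs.
    The seed is treated as depth 0 (an assumption, not documented behavior).
    """
    fetched = []
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue and len(fetched) < max_pages:
        url, depth = queue.popleft()
        fetched.append(url)  # stands in for "GET the page"
        if depth == max_depth:
            continue  # do not follow links past the depth limit
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return fetched


# Toy in-memory site used instead of real network access.
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/a/1"],
    "/a/1": ["/a/1/x"],
    "/a/1/x": ["/a/1/x/deep"],
    "/b": ["/"],
}


def toy_links(url):
    return SITE.get(url, [])
```

With the default limits, bounded_crawl("/", toy_links) stops before "/a/1/x/deep" because that page sits at depth 4.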

Where requests come from

GlippyBot has no fixed IP range. The crawler runs locally on each Glippy user's machine, so requests come from that user's residential or corporate IP. Allow-listing GlippyBot by IP is therefore not practical; allow-list it by User-Agent instead.

How to allow-list GlippyBot

If your WAF (Cloudflare, Akamai, SiteGround, DataDome, Imperva, etc.) is challenging or blocking GlippyBot, the cleanest fix is to add an allow-rule for the User-Agent string above. Most WAF dashboards let you create a rule like:

"If User-Agent contains GlippyBot, then skip the bot challenge."
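
If you enforce bot challenges in your own application code rather than in a WAF dashboard, the same rule might look like the sketch below. The function name and the plain-dict headers argument are hypothetical, and the case-insensitive comparison is an assumption; adapt it to however your framework exposes request headers.

```python
def should_skip_bot_challenge(headers):
    """Return True when the User-Agent header contains 'GlippyBot'.

    headers is assumed to be a plain dict of request headers.
    The match is case-insensitive (an assumption; a WAF may differ).
    """
    user_agent = headers.get("User-Agent", "")
    return "glippybot" in user_agent.lower()
```

A substring match like this is deliberately loose, so it keeps working when the version segment of the User-Agent changes.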

If you prefer to control crawler behavior at the protocol level, use robots.txt:

robots.txt: allow GlippyBot everywhere

User-agent: GlippyBot
Allow: /

robots.txt: block GlippyBot

User-agent: GlippyBot
Disallow: /

GlippyBot honors Disallow directives by default. If you disallow it, Glippy users running the default settings will not be able to audit your site.

robots.txt: limit GlippyBot to public pages

User-agent: GlippyBot
Disallow: /admin/
Disallow: /api/
Disallow: /private/
Allow: /
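
Before deploying rules like these, you can check them locally. The sketch below uses Python's standard urllib.robotparser and the placeholder host example.com; the helper name is hypothetical. Python's parser applies the first matching rule in file order, and GlippyBot's own parser may resolve conflicts differently, so treat the result as a sanity check rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: GlippyBot
Disallow: /admin/
Disallow: /api/
Disallow: /private/
Allow: /
"""


def is_allowed(path, rules=RULES, agent="GlippyBot"):
    """Check a path against robots.txt rules for the given agent token."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    # example.com is a placeholder host; only the path matters here.
    return parser.can_fetch(agent, "https://example.com" + path)
```

For example, is_allowed("/blog/post") is True while is_allowed("/admin/users") is False.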

Crawl rate

GlippyBot defaults to 5 concurrent requests with no artificial delay between them, capped at 5,000 pages per crawl. The user can lower these limits in the app. If you'd like a global rate hint, set Crawl-delay in your robots.txt:

User-agent: GlippyBot
Crawl-delay: 2
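
To confirm that the Crawl-delay value parses as intended, a quick local check with Python's standard urllib.robotparser is sketched below. The helper name is hypothetical, and this only shows how Python reads the directive, not how GlippyBot schedules its requests.

```python
from urllib.robotparser import RobotFileParser


def crawl_delay_for(rules, agent="GlippyBot"):
    """Return the Crawl-delay (in seconds) that applies to agent, or None."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.crawl_delay(agent)
```

Calling crawl_delay_for("User-agent: GlippyBot\nCrawl-delay: 2\n") returns 2.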

Got blocked or rate-limited?

If a Glippy user reports that GlippyBot is being blocked on your site, the in-app banner shows them the exact User-Agent string and the IP address making the request. They can forward those details to you so that you can allow-list the traffic in your WAF.

Verify a GlippyBot request

GlippyBot does not publish an IP list, because it has no fixed IP range (see "Where requests come from"). The User-Agent header is the canonical identifier. If you need to verify that traffic is genuine GlippyBot traffic, look for the full string shown under "User-Agent string", allowing for a different version segment. Impersonators rarely match it exactly, and a malicious crawler is more likely to spoof a major search engine's UA than a small tool's UA.
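
A strict match on the full string can be sketched as follows. The three-part numeric version pattern is an assumption based on the 1.0.0 example, and the function name is hypothetical. Because any client can copy a User-Agent header, a match shows only that the request claims to be GlippyBot.

```python
import re

# Full User-Agent format, with the version segment captured.
# Assumes the version is always three dot-separated numbers.
GLIPPYBOT_UA = re.compile(
    r"Mozilla/5\.0 \(compatible; GlippyBot/(\d+\.\d+\.\d+); "
    r"\+https://www\.glippy\.dev/bot\)"
)


def glippybot_version(user_agent):
    """Return the version if the header matches the full format, else None."""
    match = GLIPPYBOT_UA.fullmatch(user_agent)
    return match.group(1) if match else None
```

A header that merely contains the word GlippyBot, without the rest of the format, returns None.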

Questions?

If you have questions about GlippyBot's behavior, want to request a rate-limit change for a specific deployment, or report abuse, contact [email protected].
