Marfeel crawlers, user agents, and IP whitelisting

The Marfeel tracker sends only essential data with each request, keeping it lightweight and consuming minimal bandwidth. All additional page data is collected by dedicated crawlers.

Because many URLs can point to the same content, Marfeel crawlers only crawl canonical URLs and their amphtml counterparts. All URLs pointing to the same canonical are stored as aliases.

Make sure both the canonical and amphtml link rel elements are set correctly on all your content so that Marfeel crawling works as expected.
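As a quick self-check, the following sketch extracts the canonical and amphtml link rel values from a page's HTML using only the Python standard library. It is an illustration, not Marfeel's parser: the class name `LinkRelParser`, the helper `find_link_rels`, and the `example.com` URLs are all hypothetical.

```python
from html.parser import HTMLParser


class LinkRelParser(HTMLParser):
    """Collects href values of <link rel="canonical"> and <link rel="amphtml">."""

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated tokens; attribute values may be None.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        href = attr_map.get("href")
        for wanted in ("canonical", "amphtml"):
            # Keep only the first occurrence of each rel value.
            if wanted in rel_tokens and href and wanted not in self.links:
                self.links[wanted] = href


def find_link_rels(html):
    """Return a dict such as {"canonical": "...", "amphtml": "..."}."""
    parser = LinkRelParser()
    parser.feed(html)
    return parser.links


# Hypothetical page used only to demonstrate the helper.
sample = """
<html><head>
  <link rel="canonical" href="https://example.com/news/story">
  <link rel="amphtml" href="https://example.com/amp/news/story">
</head><body></body></html>
"""
print(find_link_rels(sample))
```

A page that returns an empty dict, or only one of the two keys when an AMP version exists, is worth reviewing.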

Learn more about how Marfeel reads your pages metadata.

All Marfeel bots follow these rules to be responsible web citizens:

  • Sites are not proactively crawled to identify new content. Marfeel only crawls URLs with active users.
  • Marfeel limits the number of concurrent requests to each client’s servers. Re-crawls are rate limited to 1,000 requests every 5 minutes.
  • All assets are centrally cached so different bots can reuse them without fetching them separately.
  • Redirects are not followed unless necessary.

When a domain first starts using Marfeel, crawling may be more intense during the first days because there is a lot of content to discover. The crawler respects server capacity and slows down over time.
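To compare your own access logs against the re-crawl limit stated above (1,000 requests every 5 minutes), a sketch like the following may help. It assumes a sliding window, which the rule does not specify, and it assumes you have already extracted the crawler's request times as numeric timestamps; the function name `peak_requests_in_window` is hypothetical.

```python
from collections import deque

WINDOW_SECONDS = 5 * 60  # 5 minutes, taken from the rule above
MAX_REQUESTS = 1000      # 1,000 requests per window, taken from the rule above


def peak_requests_in_window(timestamps, window=WINDOW_SECONDS):
    """Return the largest number of requests seen in any sliding window.

    `timestamps` are request times in seconds (for example, Unix epoch values).
    The window is treated as half-open: (ts - window, ts].
    """
    recent = deque()
    peak = 0
    for ts in sorted(timestamps):
        recent.append(ts)
        # Drop requests that fall outside the window ending at `ts`.
        while recent[0] <= ts - window:
            recent.popleft()
        peak = max(peak, len(recent))
    return peak
```

If `peak_requests_in_window(times) <= MAX_REQUESTS` holds for the re-crawl traffic in your logs, the observed load is within the stated limit under these assumptions.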

The Marfeel Editorial Crawler visits a URL and builds the editorial profile of a page using its metadata. It crawls URLs when they first receive a hit and every time the content is modified.

The user agent used by the editorial crawler is:

Mozilla/5.0 (compatible; NewsRoom.BI/0.1; +http://www.newsroom.bi/bot.html)

If the editorial crawler fails to process a page correctly, the editorial crawling troubleshooting guide covers common causes and fixes.

To detect structured data, meta tags, and other potential issues in client HTML, Marfeel periodically crawls all relevant URLs (those with traffic) using the following user agents:

Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA51N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.67 Mobile Safari/537.36 (compatible; mrfCompass-Booldog/1.0)
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36(KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36 (compatible; mrfCompass-Booldog/1.0)
Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; mrfCompass-Marshall/1.0)
  • mrfCompass-Booldog crawls each URL initially using a mobile user agent. If a vary: User-Agent header is received in the response, it crawls the URL using a desktop user agent as well.
  • mrfCompass-Marshall crawls all amphtml links found by mrfCompass-Booldog.
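The mobile-first rule in the list above can be sketched as a small decision function. This only illustrates the described behavior and is not Marfeel's implementation; the function names and the "mobile" and "desktop" labels are hypothetical.

```python
def varies_on_user_agent(headers):
    """Return True if the response's Vary header lists User-Agent."""
    for name, value in headers.items():
        # Header names and Vary tokens are case-insensitive.
        if name.lower() == "vary":
            tokens = [token.strip().lower() for token in value.split(",")]
            if "user-agent" in tokens:
                return True
    return False


def user_agents_to_crawl(mobile_response_headers):
    """Mobile is always crawled; desktop is added when the response varies on User-Agent."""
    agents = ["mobile"]
    if varies_on_user_agent(mobile_response_headers):
        agents.append("desktop")
    return agents
```

In practice this means a site that serves different HTML per device should send `Vary: User-Agent` so that both variants are audited.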

Flowcards that load content directly from specific URLs use a dedicated bot to fetch that content. This bot identifies itself with the following user agent:

Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA51N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.67 Mobile Safari/537.36 (compatible; mrfCompass-Jukebox/1.0)

The crawling frequency respects the cache-control header returned by the server.
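How exactly the bot interprets the header is not documented here, so the following is only a rough sketch of one plausible reading: reuse a fetched copy for `max-age` seconds, and refetch every time when `no-store` or `no-cache` is present. The function names and the `default` parameter are hypothetical, and the simple comma split does not handle quoted values that contain commas.

```python
def parse_cache_control(header_value):
    """Parse a Cache-Control header into a dict of lowercase directive -> value (or None)."""
    directives = {}
    for part in header_value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"') or None
    return directives


def seconds_until_refetch(header_value, default=0):
    """Return how long a fetched copy may be reused, in seconds (assumed policy)."""
    directives = parse_cache_control(header_value)
    if "no-store" in directives or "no-cache" in directives:
        return 0
    max_age = directives.get("max-age")
    if max_age is not None and max_age.isdigit():
        return int(max_age)
    # No usable max-age: fall back to the caller's default.
    return default
```

Under this reading, a response with `Cache-Control: public, max-age=600` would be refetched at most once every ten minutes.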

Social experiences, including Facebook, Twitter (X), Telegram, Reddit, and LinkedIn, use the following user agent:

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36 (compatible; mrfCompass-Social/1.0)

These experiences and services use Marfeel’s public IPs when crawling your site.

The Amplify service uses the following user agent to handle images in autoposting:

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36 (compatible; mrfCompass-Amplify/1.0)

The Previewer uses the following user agents:

// Mobile
Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.6099.144 Mobile Safari/537.36 (compatible; mrfCompass-Preview/1.0/1.0) Googlebot
// Desktop
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 (compatible; mrfCompass-Preview/1.0/1.0) Googlebot

To ensure Marfeel can access and monitor your website, whitelist the crawler user agents listed above or whitelist the static IP addresses available at hub.marfeel.com/crawler-ips.json. Many hosting and CDN providers include WAF services that may treat Marfeel bots as potentially malicious and block them.
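If your firewall or application layer supports custom matching, a user agent allow rule could look like the sketch below. The product tokens come from the user agent strings listed in this article; the pattern and the function name `claims_to_be_marfeel` are hypothetical. User agents are trivial to spoof, so a match only shows what a request claims to be, and it is best combined with the IP allowlist or the reverse DNS verification described in this article.

```python
import re

# Product tokens taken from the user agent strings listed in this article.
MARFEEL_UA_PATTERN = re.compile(
    r"NewsRoom\.BI/|mrfCompass-(?:Booldog|Marshall|Jukebox|Social|Amplify|Preview)/"
)


def claims_to_be_marfeel(user_agent):
    """Return True if the User-Agent header contains a Marfeel crawler token."""
    return bool(MARFEEL_UA_PATTERN.search(user_agent or ""))
```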

If you have content behind a hard paywall, make sure that requests from these IPs can access it. Otherwise, modules such as content metrics, the recommender, and audits might lack the information they need to function properly.
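A paywall or application-level check for crawler IPs might be sketched as follows. The networks below are placeholders from the reserved documentation ranges, not Marfeel addresses; replace them with the entries from Marfeel's published crawler IP list. The names `ALLOWED_NETWORKS` and `is_allowed_ip` are hypothetical.

```python
import ipaddress

# Placeholder entries for illustration only (reserved documentation ranges).
# Replace them with the addresses from Marfeel's published crawler IP list.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.10/32"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_allowed_ip(remote_addr, networks=ALLOWED_NETWORKS):
    """Return True if `remote_addr` falls inside one of the allowed networks."""
    try:
        address = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Not a valid IPv4 or IPv6 address.
        return False
    return any(address in network for network in networks)
```

When your site sits behind a proxy or CDN, make sure `remote_addr` is the real client address rather than the proxy's.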

To whitelist Marfeel crawlers’ IPs on Cloudflare, follow these steps:

  1. In your Cloudflare console, under WAF, click the firewall icon on the Tools tab.
  2. Under IP Access Rules, add Marfeel's crawler IP addresses, either individually or in range format:
     a. Enter the IP address.
     b. Choose Whitelist as the action to apply.
     c. Choose the website where the whitelisting rule applies.
  3. Click Add.
  4. Repeat for each IP.

All Marfeel crawler IP addresses have a reverse DNS record pointing to crawler.marfeel.com. Use this to verify that a bot claiming to be a Marfeel crawler is authentic:

  1. Run a reverse DNS lookup on the accessing IP address from your logs, using the host command.
  2. Verify that the domain name is crawler.marfeel.com.
  3. Run a forward DNS lookup on the domain name retrieved in step 1, using the host command.
  4. Verify that one of the returned addresses matches the original accessing IP address from your logs.

$ host 162.55.235.182
182.235.55.162.in-addr.arpa domain name pointer crawler.marfeel.com.
$ host crawler.marfeel.com
crawler.marfeel.com is an alias for vampiresquid.het.mrf.io.
vampiresquid.het.mrf.io has address 162.55.235.186
vampiresquid.het.mrf.io has address 162.55.235.182
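The same four steps can be automated. The sketch below uses Python's standard `socket` module and is only one possible way to script the check; the function name `is_marfeel_crawler` and its injectable `reverse_lookup` and `forward_lookup` parameters are hypothetical and exist so the logic can be exercised without network access. Note that `socket.gethostbyname_ex` returns IPv4 addresses only.

```python
import socket

EXPECTED_HOSTNAME = "crawler.marfeel.com"


def is_marfeel_crawler(ip, reverse_lookup=socket.gethostbyaddr,
                       forward_lookup=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check for an IP found in your logs."""
    try:
        # Steps 1-2: reverse lookup, then compare the returned name.
        hostname, _aliases, _addresses = reverse_lookup(ip)
        if hostname.rstrip(".").lower() != EXPECTED_HOSTNAME:
            return False
        # Steps 3-4: forward lookup, then confirm the original IP is listed.
        _name, _aliases, addresses = forward_lookup(hostname)
    except OSError:
        # socket.herror and socket.gaierror are subclasses of OSError.
        return False
    return ip in addresses
```

Calling `is_marfeel_crawler("162.55.235.182")` with the default resolvers performs live DNS queries, so results depend on the DNS records at the time of the call.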
What user agents do Marfeel crawlers use?

Marfeel uses several crawlers, each with a distinct user agent: the Editorial Crawler identifies as NewsRoom.BI/0.1, the Audits crawler uses mrfCompass-Booldog and mrfCompass-Marshall, the Flowcards crawler uses mrfCompass-Jukebox, the Social crawler uses mrfCompass-Social, the Amplify crawler uses mrfCompass-Amplify, and the Previewer uses mrfCompass-Preview.

How do I whitelist Marfeel crawlers on Cloudflare?

In your Cloudflare console, go to WAF and open the Tools tab. Add each Marfeel crawler IP address (available at hub.marfeel.com/crawler-ips.json) under IP Access Rules, choose Whitelist as the action, select the website, and click Add. Repeat for each IP.

How can I verify that a bot is a genuine Marfeel crawler?

Run a reverse DNS lookup on the accessing IP address using the host command. Verify that the returned domain name is crawler.marfeel.com. Then run a forward DNS lookup on that domain and confirm that one of the returned addresses matches the original IP address from your logs.