Marfeel crawlers, user agents, and IP whitelisting
The Marfeel tracker sends only essential data with each request, keeping it lightweight and consuming minimal bandwidth. All additional page data is collected by dedicated crawlers.
Because many URLs can point to the same content, Marfeel crawlers only crawl canonical URLs and their amphtml counterparts. All URLs pointing to the same canonical are stored as aliases.
Make sure the canonical and amphtml link rel elements are correctly set in all your content so that Marfeel crawling works properly.
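As a rough way to check a page yourself, the sketch below extracts the canonical and amphtml link elements from an HTML string using only the Python standard library. The names `LinkRelCollector` and `find_link_rels` are hypothetical helpers for illustration, not part of any Marfeel tooling, and the URLs in the usage note are placeholders.

```python
from html.parser import HTMLParser


class LinkRelCollector(HTMLParser):
    """Collects the href of the first canonical and amphtml <link> elements."""

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens; values can be None.
        rel_tokens = (attr.get("rel") or "").lower().split()
        href = attr.get("href")
        for wanted in ("canonical", "amphtml"):
            if wanted in rel_tokens and href:
                # Keep the first occurrence of each relation.
                self.links.setdefault(wanted, href)


def find_link_rels(html):
    """Return a dict such as {"canonical": url, "amphtml": url}."""
    collector = LinkRelCollector()
    collector.feed(html)
    return collector.links
```

Calling `find_link_rels(page_html)` on a page that declares both elements returns both URLs; a missing key suggests the corresponding link element is absent or has no href.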
Good citizen practices
All Marfeel bots follow these rules to be responsible web citizens:
- Sites are not proactively crawled to identify new content. Marfeel only crawls URLs with active users.
- Marfeel limits the number of concurrent requests to each client’s servers. Re-crawls are rate limited to 1,000 requests every 5 minutes.
- All assets are centrally cached so different bots can reuse them without fetching them separately.
- Redirects are not followed unless necessary.
Marfeel crawlers
Editorial crawler
The Marfeel Editorial Crawler visits a URL and builds the editorial profile of a page using its metadata. It crawls URLs when they first receive a hit and every time the content is modified.
The user agent used by the editorial crawler is:
Mozilla/5.0 (compatible; NewsRoom.BI/0.1; +http://www.newsroom.bi/bot.html)
If the editorial crawler fails to process a page correctly, the editorial crawling troubleshooting guide covers common causes and fixes.
Audits crawler
To detect structured data, meta tags, and other potential issues in client HTML, Marfeel periodically crawls all relevant URLs (those with traffic) using the following user agents:
Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA51N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.67 Mobile Safari/537.36 (compatible; mrfCompass-Booldog/1.0)
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36(KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36 (compatible; mrfCompass-Booldog/1.0)
Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; mrfCompass-Marshall/1.0)
mrfCompass-Booldog crawls each URL initially using a mobile user agent. If a vary: User-Agent header is received in the response, it crawls the URL using a desktop user agent as well. mrfCompass-Marshall crawls all amphtml links found by mrfCompass-Booldog.
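To see whether your own pages would trigger the additional desktop crawl, you can inspect the Vary header of a response. The following is a minimal sketch of that check, assuming the response headers are available as a plain mapping; `varies_by_user_agent` is a hypothetical helper and does not claim to reproduce Marfeel's internal logic.

```python
def varies_by_user_agent(headers):
    """Return True if the response's Vary header lists User-Agent.

    `headers` is any mapping of header name to value. Header names and
    Vary field names are compared case-insensitively.
    """
    vary_value = ""
    for name, value in headers.items():
        if name.lower() == "vary":
            vary_value = value
    # Vary is a comma-separated list of header field names.
    fields = [field.strip().lower() for field in vary_value.split(",")]
    return "user-agent" in fields
```

For example, `varies_by_user_agent({"Vary": "Accept-Encoding, User-Agent"})` is true, while a response with only `Vary: Accept-Encoding` is not.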
Flowcards crawler
Flowcards that load content directly from specific URLs use a dedicated bot to fetch that content. This bot identifies itself with the following user agent:
Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA51N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.67 Mobile Safari/537.36 (compatible; mrfCompass-Jukebox/1.0)
The crawling frequency respects the cache-control header returned by the server.
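If you want to know what freshness lifetime your server advertises, the sketch below reads the max-age directive from a Cache-Control header value. It is only an illustration: which directives the Flowcards bot takes into account is not specified here, so treating max-age as the relevant one is an assumption, and `max_age_seconds` is a hypothetical helper.

```python
def max_age_seconds(cache_control):
    """Return the max-age value (in seconds) from a Cache-Control header.

    Returns None when the directive is missing or not a whole number.
    """
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.strip().lower() == "max-age":
            value = value.strip().strip('"')
            return int(value) if value.isdigit() else None
    return None
```

A header such as `public, max-age=300` yields 300, meaning the content is declared fresh for five minutes.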
Social experiences
Social experiences including Facebook, Twitter (X), Telegram, Reddit, and LinkedIn use the following user agent:
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36 (compatible; mrfCompass-Social/1.0)
These experiences and services use Marfeel’s public IPs when crawling your site.
Amplify
The Amplify service uses the following user agent to handle images in autoposting:
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36 (compatible; mrfCompass-Amplify/1.0)
Previewer crawlers
// Mobile
Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.6099.144 Mobile Safari/537.36 (compatible; mrfCompass-Preview/1.0/1.0) Googlebot
// Desktop
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 (compatible; mrfCompass-Preview/1.0/1.0) Googlebot
Whitelisting Marfeel crawlers
To ensure Marfeel can access and monitor your website, whitelist the crawler user agents listed above or the static IP addresses available at hub.marfeel.com/crawler-ips.json. Many hosting and CDN providers include WAF services that may treat Marfeel bots as potentially malicious and block them.
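If your allow rules are implemented in application code rather than in a WAF console, matching on the product token of each user agent is usually more robust than comparing the full string. The sketch below assumes the tokens listed on this page; the mapping `MARFEEL_UA_TOKENS`, its labels, and the function `marfeel_crawler_name` are hypothetical names chosen for illustration.

```python
# Product tokens taken from the user agents documented on this page.
# The labels on the right are illustrative, not official identifiers.
MARFEEL_UA_TOKENS = {
    "NewsRoom.BI/": "editorial",
    "mrfCompass-Booldog/": "audits",
    "mrfCompass-Marshall/": "audits-amp",
    "mrfCompass-Jukebox/": "flowcards",
    "mrfCompass-Social/": "social",
    "mrfCompass-Amplify/": "amplify",
    "mrfCompass-Preview/": "previewer",
}


def marfeel_crawler_name(user_agent):
    """Return a label for the Marfeel crawler named in a User-Agent header.

    Returns None when no known token is present.
    """
    for token, label in MARFEEL_UA_TOKENS.items():
        if token in user_agent:
            return label
    return None
```

A user agent string can be spoofed by any client, so a match here is best combined with an IP-based check before a request is trusted.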
Cloudflare
To whitelist Marfeel crawlers’ IPs on Cloudflare, follow these steps:
- In your Cloudflare console, under WAF, click the firewall icon on the Tools tab.
- Add Marfeel’s crawler IP addresses, individually or in range format, under IP Access Rules:
a. Enter the IP address.
b. Choose Whitelist as the action to apply.
c. Choose the website where the whitelisting rule should apply.
- Click Add.
- Repeat for each IP.
Verifying Marfeel crawlers
All Marfeel crawler IP addresses have a reverse DNS record pointing to crawler.marfeel.com. Use this to verify that a bot claiming to be a Marfeel crawler is authentic:
- Run a reverse DNS lookup on the accessing IP address from your logs, using the host command.
- Verify that the returned domain name is crawler.marfeel.com.
- Run a forward DNS lookup on the domain name retrieved in the first step, using the host command.
- Verify that one of the returned addresses is the same as the original accessing IP address from your logs.
$ host 162.55.235.182
182.235.55.162.in-addr.arpa domain name pointer crawler.marfeel.com.
$ host crawler.marfeel.com
crawler.marfeel.com is an alias for vampiresquid.het.mrf.io.
vampiresquid.het.mrf.io has address 162.55.235.186
vampiresquid.het.mrf.io has address 162.55.235.182
What user agents do Marfeel crawlers use?
Marfeel uses several crawlers, each with a distinct user agent: the Editorial Crawler identifies as NewsRoom.BI/0.1, the Audits crawler uses mrfCompass-Booldog and mrfCompass-Marshall, the Flowcards crawler uses mrfCompass-Jukebox, the Social crawler uses mrfCompass-Social, the Amplify crawler uses mrfCompass-Amplify, and the Previewer uses mrfCompass-Preview.
How do I whitelist Marfeel crawlers on Cloudflare?
In your Cloudflare console, go to WAF and open the Tools tab. Add each Marfeel crawler IP address (available at hub.marfeel.com/crawler-ips.json) under IP Access Rules, choose Whitelist as the action, select the website, and click Add. Repeat for each IP.
How can I verify that a bot is a genuine Marfeel crawler?
Run a reverse DNS lookup on the accessing IP address using the host command. Verify that the returned domain name is crawler.marfeel.com. Then run a forward DNS lookup on that domain and confirm that it returns the original IP address from your logs.
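The same reverse-then-forward check can be automated. The following is a minimal sketch using Python's standard `socket` module in place of the host command; `is_marfeel_crawler` is a hypothetical helper, and the two lookup parameters exist only so the function can be exercised without network access. Note that `socket.gethostbyname_ex` resolves IPv4 addresses only.

```python
import socket


def is_marfeel_crawler(ip,
                       expected_host="crawler.marfeel.com",
                       reverse_lookup=socket.gethostbyaddr,
                       forward_lookup=socket.gethostbyname_ex):
    """Verify an IP with a reverse DNS lookup followed by a forward lookup."""
    try:
        # Step 1: reverse lookup, IP -> host name.
        hostname, _aliases, _addresses = reverse_lookup(ip)
    except OSError:
        return False
    # Step 2: the PTR record must name the expected host.
    if hostname.rstrip(".").lower() != expected_host:
        return False
    try:
        # Step 3: forward lookup, host name -> list of IPv4 addresses.
        _name, _aliases, addresses = forward_lookup(hostname)
    except OSError:
        return False
    # Step 4: the original IP must be among the forward results.
    return ip in addresses
```

With the default arguments, `is_marfeel_crawler("162.55.235.182")` performs live DNS queries. The forward lookup matters because the owner of an IP range controls its reverse DNS record and could point it at any name, whereas only the owner of the domain controls which addresses the name resolves to.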