# Overview

Marfeel uses a crawler to browse our tenants' sites for new content, extract it, and keep their Marfeel site up to date. The benefits of using a crawler are:

  • No integration needed on the customer's side
  • Automated, out-of-the-box process that extracts content directly from a tenant's website
  • CMS agnostic

The original HTML of article and section pages is processed to remove unnecessary elements such as navigation bars, footers, and ad placements. This process also optimizes metadata and prepares assets such as images for lazy loading.
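
As a rough illustration of this cleanup step, the sketch below uses only Python's standard library. It is not Marfeel's actual pipeline: the `ArticleCleaner` class, the `clean_html` function, and the `DROPPED_TAGS` set are hypothetical names chosen for this example, and adding the `loading="lazy"` attribute is just one assumed way to prepare images for lazy loading.

```python
import html
from html.parser import HTMLParser

# Containers dropped together with their contents (an assumption for this sketch).
# script and style are included so their raw contents never need re-escaping.
DROPPED_TAGS = {"nav", "footer", "aside", "script", "style"}
# Void elements never get a closing tag in the output.
VOID_TAGS = {"img", "br", "hr", "meta", "link", "input"}


class ArticleCleaner(HTMLParser):
    """Drops boilerplate containers and marks images for lazy loading."""

    def __init__(self):
        super().__init__()
        self.parts = []      # cleaned output fragments
        self.skip_depth = 0  # > 0 while inside a dropped container

    def handle_starttag(self, tag, attrs):
        if tag in DROPPED_TAGS:
            self.skip_depth += 1
            return
        if self.skip_depth:
            return
        # Add loading="lazy" unless the image already declares a loading mode.
        if tag == "img" and not any(name == "loading" for name, _ in attrs):
            attrs = attrs + [("loading", "lazy")]
        rendered = "".join(
            f" {name}" if value is None
            else f' {name}="{html.escape(value, quote=True)}"'
            for name, value in attrs
        )
        self.parts.append(f"<{tag}{rendered}>")

    def handle_endtag(self, tag):
        if tag in DROPPED_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if not self.skip_depth and tag not in VOID_TAGS:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # The parser hands over unescaped text, so escape it again on output.
        if not self.skip_depth:
            self.parts.append(html.escape(data, quote=False))


def clean_html(source: str) -> str:
    """Return the source with dropped containers removed and images lazy-loaded."""
    cleaner = ArticleCleaner()
    cleaner.feed(source)
    cleaner.close()
    return "".join(cleaner.parts)


print(clean_html('<nav>Menu</nav><p>Hi</p><img src="a.jpg">'))
# <p>Hi</p><img src="a.jpg" loading="lazy">
```

Comments and doctype declarations are silently discarded here because the sketch does not override the corresponding handlers. A production cleaner would also need rules for ad placements, which cannot be recognized by tag name alone.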

By default, all dynamic content generated by JavaScript is ignored, so advertisements and interactive elements are not extracted. Because Marfeel crawlers do not execute any JS code, they also do not count as a visit. Scripts can be retrieved later and integrated into a Marfeel site.

Marfeel's version of the HTML is then hosted and delivered from Marfeel's infrastructure.

One main JSON file in a tenant's site repository, definition.json, helps us control which content is transformed.

All pages also go through common transformers and detectors.

Article pages are extracted with Boilerpipe.

Section pages are extracted with a ripper, which can be JSOUP, whiteCollar or MarfeelPress.
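
The routing described above can be sketched as a small lookup. This is only an illustration of the article/section split: `PageType`, `SECTION_RIPPERS`, and `pick_extractor` are hypothetical names, and how a ripper is actually chosen for a tenant is not covered here.

```python
from enum import Enum
from typing import Optional


class PageType(Enum):
    ARTICLE = "article"
    SECTION = "section"


# Ripper names as listed in the text above.
SECTION_RIPPERS = ("JSOUP", "whiteCollar", "MarfeelPress")


def pick_extractor(page_type: PageType, ripper: Optional[str] = None) -> str:
    """Return the name of the extractor that would handle a page."""
    if page_type is PageType.ARTICLE:
        # Articles always go through Boilerpipe; any ripper argument is ignored.
        return "Boilerpipe"
    if ripper not in SECTION_RIPPERS:
        raise ValueError(f"unknown or missing ripper: {ripper!r}")
    return ripper
```

For example, `pick_extractor(PageType.SECTION, "whiteCollar")` returns `"whiteCollar"`, while a section page with no ripper raises `ValueError`.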

Content "freshness" is normally managed automatically; a refresh can also be forced through the invalidation API.