# Glossary

This glossary defines the content-platform terms used at Marfeel and shows how they relate to each other.

Reach out to add more definitions!

# Alibaba

MarfeelAlibaba is the section extraction orchestrator. It selects the configured Ripper and Extractor to retrieve section information from the target tenant.

By default, it uses whiteCollarRipper and boilerpipeExtractor.
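The selection logic described above can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the registries, the `extract_section` function, and the shape of the configuration are hypothetical and do not reflect MarfeelAlibaba's real API. Only the component names and the two defaults come from this glossary.

```python
from typing import Callable, Dict, List

# Hypothetical registries: stand-ins for the real components in Gutenberg.
RIPPERS: Dict[str, Callable[[str], List[str]]] = {
    "whiteCollarRipper": lambda url: [f"{url}/article-1", f"{url}/article-2"],
    "JsoupRipper": lambda url: [f"{url}/article-1"],
}
EXTRACTORS: Dict[str, Callable[[str], dict]] = {
    "boilerpipeExtractor": lambda item: {"item": item, "extractor": "boilerpipeExtractor"},
}

# Defaults named in this glossary entry.
DEFAULT_RIPPER = "whiteCollarRipper"
DEFAULT_EXTRACTOR = "boilerpipeExtractor"


def extract_section(section_url: str, config: dict) -> List[dict]:
    """Pick the configured ripper and extractor, falling back to the defaults."""
    ripper = RIPPERS[config.get("ripper", DEFAULT_RIPPER)]
    extractor = EXTRACTORS[config.get("extractor", DEFAULT_EXTRACTOR)]
    # The ripper gathers raw section items; the extractor turns each into section info.
    return [extractor(item) for item in ripper(section_url)]


default_result = extract_section("https://example.com/sports", {})
custom_result = extract_section("https://example.com/sports", {"ripper": "JsoupRipper"})
```

An empty configuration resolves to both defaults, while a tenant that sets only `ripper` still gets the default extractor.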

# AMP

AMP stands for Accelerated Mobile Pages. It is an open standard framework for any publisher to have pages load quickly on mobile devices.

At Marfeel, AMP pages are generated automatically using MarfeelJigsaw.

# Boilerpipe

Boilerpipe is the component in charge of extracting article pages. It includes Fetchers, Extractors, and SAXProcessors.

# Details

Details is the Marfeel term for article pages.

# DocumentModifiers

DocumentModifiers allow further transformations to HTML elements. They target specific elements and have multiple purposes.

Typical uses include collapsing images into galleries and removing unwanted content from the article.

TIP

Find the details of DocumentModifiers in the Article extraction article.
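As a rough illustration of a modifier that targets specific elements, the sketch below removes unwanted blocks from an article. It is not Marfeel's DocumentModifier API: `remove_by_class`, `apply_modifiers`, and the `related-links` class are hypothetical names, and the input is assumed to be well-formed markup so that Python's standard XML parser can read it.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List

# A modifier is modelled here as a function that mutates the document tree in place.
Modifier = Callable[[ET.Element], None]


def remove_by_class(class_name: str) -> Modifier:
    """Build a modifier that drops every element carrying the given class."""
    def modify(root: ET.Element) -> None:
        for parent in list(root.iter()):
            for child in list(parent):
                if class_name in child.get("class", "").split():
                    parent.remove(child)
    return modify


def apply_modifiers(html: str, modifiers: List[Modifier]) -> str:
    """Parse the markup, run each modifier in order, and serialize the result."""
    root = ET.fromstring(html)  # assumes well-formed markup
    for modifier in modifiers:
        modifier(root)
    return ET.tostring(root, encoding="unicode")


article = '<div><p>Body text</p><div class="related-links">Read more</div></div>'
cleaned = apply_modifiers(article, [remove_by_class("related-links")])
```

Modelling each modifier as a small function keeps every transformation independent, so several can be chained on the same document.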

# Extractor

Extractors are components in charge of information retrieval. Depending on the context, the term can refer to different Marfeel components.

## Extractor in WhiteCollar

In a WhiteCollar configuration file, extractor is the section where article retrieval is configured.

## Section extractor

Section extraction is handled by MarfeelAlibaba.

## Article extractor

MarfeelExtractor is a component within Boilerpipe. It retrieves the text content in article pages.

TIP

BoilerpipeExtractor is the default extractor for non-MarfeelPress tenants.

BoilerpipePressExtractor is the default extractor configuration for tenants using MarfeelPress.

## Provider extractor

Provider extractors are designed to automatically detect providers in tenant pages. They are created along with the provider implementation.

TIP

Learn the details in its dedicated article.

## Metadata extractor

A metadata extractor retrieves information from the tenant's page, either mosaic or details, to pass it to widgets or ad servers.

# Fetcher

Fetchers retrieve content from the tenant's site. The content is then processed by MarfeelExtractor.

TIP

Check the Article pages extraction article for a complete picture of the article extraction process.

# Gutenberg

Gutenberg is Marfeel's backend: a monorepo that contains the core components for extracting content, processing it, and serving the Marfeelized result. It also includes the Marfeel Insight application.

# Invalidation

Invalidation at Marfeel refers to the process of refreshing content. It covers triggering the refresh at the right time, extracting the content, processing it, and refreshing the cache layers. Invalidation is finished once the new content lands in the production environment.

# Jigsaw

MarfeelJigsaw is a library that transforms Marfeel HTML into AMP-compliant HTML using filters and XSL transformations.
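To give a feel for what a filter might do, here is a minimal sketch that rewrites `<img>` elements as `<amp-img>`. It is written in Python for brevity and is not MarfeelJigsaw's code: the function names are hypothetical, the XSL step is left out entirely, and the input is assumed to be well-formed markup with `width` and `height` already present on each image.

```python
import xml.etree.ElementTree as ET


def img_to_amp_img(root: ET.Element) -> None:
    """Filter: rename every <img> to <amp-img> and request a responsive layout."""
    for element in list(root.iter("img")):
        element.tag = "amp-img"
        element.set("layout", "responsive")


def to_amp(html: str) -> str:
    """Parse the markup, apply the filter, and serialize the result."""
    root = ET.fromstring(html)  # assumes well-formed markup
    img_to_amp_img(root)
    # Custom elements are not void elements, so emit <amp-img></amp-img>
    # rather than the self-closing <amp-img/>.
    return ET.tostring(root, encoding="unicode", short_empty_elements=False)


amp_html = to_amp('<div><img src="photo.jpg" width="600" height="400"/></div>')
```

A real conversion has to handle many more cases, such as images without dimensions, scripts, and inline styles, which this sketch deliberately ignores.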

# Mosaic

Mosaic is the Marfeel term for section pages.

# Ripper

A Ripper is the component of MarfeelAlibaba in charge of retrieving the information needed to populate section pages in Marfeel. There are several Ripper implementations, including whitecollarRipper, PuppeteerRipper, JsoupRipper, and MPressRipper.

These implementations can run on Gutenberg or on the MarfeelMRippers microservice, depending on the active feature toggles. For example, one toggle controls whether JsoupRipper runs in the microservice.
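The toggle-based routing can be sketched as follows. This is only an illustration under stated assumptions: the toggle name `jsoup-ripper-microservice`, the `run_ripper` function, and the two stand-in callables are hypothetical and are not taken from Gutenberg.

```python
from typing import Callable, Dict, List

RipperFn = Callable[[str], List[str]]


def run_ripper(
    section_url: str,
    toggles: Dict[str, bool],
    run_in_gutenberg: RipperFn,
    run_in_microservice: RipperFn,
    toggle_name: str = "jsoup-ripper-microservice",  # hypothetical toggle name
) -> List[str]:
    """Route the ripper call according to the feature toggle (off when absent)."""
    if toggles.get(toggle_name, False):
        return run_in_microservice(section_url)
    return run_in_gutenberg(section_url)


# Stand-ins for the two execution targets.
def local(url: str) -> List[str]:
    return [f"gutenberg:{url}"]


def remote(url: str) -> List[str]:
    return [f"microservice:{url}"]


in_process = run_ripper("https://example.com/sports", {}, local, remote)
delegated = run_ripper(
    "https://example.com/sports", {"jsoup-ripper-microservice": True}, local, remote
)
```

Because both targets share the same signature, the caller does not need to know where the ripper actually ran.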

Deploying Ripper changes

Because Ripper implementations are also used by the MarfeelMRippers microservice, it is important to update the microservice after deploying changes to Gutenberg. For any change to a Ripper in Gutenberg, contact the Content Platform chapter for instructions on the microservices shuttle.

# SAXProcessors

SAXProcessors are processing tools in charge of detecting and modifying HTML elements during article extraction. They process images, media, commenting systems, and similar elements, replacing what is necessary to produce the Marfeelized version of the article.

TIP

There are two SAXProcessors in Marfeel: ImageDocumentSAXProcessor and HTMLDocumentSAXProcessor.

Check their behaviour in detail in the Article extraction article.
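For readers unfamiliar with SAX-style processing, the sketch below shows the general idea of detecting and modifying elements while a document streams through a handler. It uses Python's standard `xml.sax` module and is not a reproduction of either Marfeel processor: the `ImageTaggingProcessor` class and the choice to add `loading="lazy"` are assumptions made purely for illustration.

```python
import xml.sax
from io import StringIO
from xml.sax.saxutils import XMLGenerator


class ImageTaggingProcessor(XMLGenerator):
    """Re-emit the document event by event, tagging each <img> on the way through."""

    def __init__(self, out: StringIO) -> None:
        super().__init__(out, encoding="utf-8")
        self.images_seen = 0

    def startElement(self, name, attrs):
        attributes = dict(attrs.items())
        if name == "img":
            # Detection and modification happen in the same streaming pass.
            self.images_seen += 1
            attributes["loading"] = "lazy"
        super().startElement(name, attributes)


output = StringIO()
processor = ImageTaggingProcessor(output)
xml.sax.parseString(b'<article><p>Intro</p><img src="a.jpg"/></article>', processor)
result = output.getvalue()
```

Unlike a tree-based transformation, the handler never holds the whole document in memory; it reacts to each opening tag as the parser reports it.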