
Custom Fields for editorial metadata extraction

Custom Fields lets you extract custom metadata from your articles during crawling and use it across Amplify, Recommender, and editorial workflows. You define what to capture, where to store it, and how it syncs. Extracted data then flows automatically to the downstream products you sync it to.

[Image: Custom Fields configuration panel showing extraction rules and sync options]

Define extraction rules using XPath or JSONPath syntax to target specific elements in your page HTML or LD+JSON structured data. You can extract values from meta tags, HTML attributes, text content, and application/ld+json blocks embedded in the page.
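As a rough illustration of the difference between capturing an attribute value and capturing text content, the sketch below runs XPath-style queries with Python's standard library. It is not Marfeel's crawler: the page snippet, the helper names `attribute_value` and `text_content`, and the URLs are all hypothetical. `xml.etree.ElementTree` supports only a subset of XPath and needs well-formed markup, so the attribute is read with `.get()` rather than an `/@content` step.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page snippet; real article HTML is usually messier.
PAGE = """<html>
  <head>
    <meta name="description" content="A short summary of the article." />
  </head>
  <body>
    <figure><img src="https://example.com/lead.jpg" alt="Lead photo" /></figure>
    <h1 class="headline">Example headline</h1>
  </body>
</html>"""


def attribute_value(root, xpath, attribute):
    """Return the named attribute of the first element matching xpath, or None."""
    element = root.find(xpath)
    return None if element is None else element.get(attribute)


def text_content(root, xpath):
    """Return the text inside the first element matching xpath, or None."""
    element = root.find(xpath)
    return None if element is None else "".join(element.itertext()).strip()


root = ET.fromstring(PAGE)

# "Attribute Value" style extraction: read one attribute of the matched element.
description = attribute_value(root, ".//meta[@name='description']", "content")
image_url = attribute_value(root, ".//figure/img", "src")

# "Text Content" style extraction: read the text inside the matched element.
headline = text_content(root, ".//h1[@class='headline']")

print(description, image_url, headline, sep="\n")
```

A query that matches nothing returns `None` here; how the real crawler treats a non-matching expression is not covered by this sketch.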

Choose whether to capture attribute values or text content, store data as custom properties, tags, or system metadata, and set conditions for when extraction should happen. You control how extracted data behaves with existing values: overwrite, fill only empty fields, or append.
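The three behaviors for existing values can be pictured with a small function. This is a sketch under assumptions, not the product's implementation: the field is modeled as a list of strings, the policy names `overwrite`, `fill_if_empty`, and `append` are made-up identifiers for the three options, and the real append behavior (ordering, de-duplication, separators) may differ.

```python
def merge_value(existing, new, policy):
    """Combine an existing list of values with a newly extracted one.

    existing: list of current values (empty list means the field is empty)
    new:      the newly extracted value
    policy:   "overwrite", "fill_if_empty", or "append" (hypothetical names)
    """
    if policy == "overwrite":
        # Always replace whatever was there.
        return [new]
    if policy == "fill_if_empty":
        # Keep the current values unless the field is empty.
        return list(existing) if existing else [new]
    if policy == "append":
        # Keep the current values and add the new one at the end.
        return list(existing) + [new]
    raise ValueError(f"unknown policy: {policy!r}")


print(merge_value(["staff"], "premium", "overwrite"))
print(merge_value(["staff"], "premium", "fill_if_empty"))
print(merge_value(["staff"], "premium", "append"))
```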

If the information exists in your server-side rendered HTML, Custom Fields captures it automatically during crawling. No manual tagging needed.

Custom Fields works with static HTML content available at crawl time. It cannot extract values from the JavaScript DataLayer or any data that requires JS execution to be present on the page.

For more details on how Marfeel’s editorial crawler works and what metadata it extracts by default, see How does Marfeel extract the metadata from articles.

Custom Fields extends this system by letting you define your own extraction rules on top of the standard metadata detection, using the same XPath and JSONPath syntax to pinpoint exactly what you need.

Navigate to Organization > Custom Fields and click + New Field.

The configuration form has two sections: What to Extract and Where to Save.

[Image: Custom Field creation form with expression input and save options]

  1. Expression: Enter an XPath (starts with / or //) or JSONPath (starts with $) expression targeting the element you want to capture. A live preview lets you test the expression against any article URL before saving.
  2. Extract: Choose what to pull from the matched element:
       • Attribute Value: Extracts a specific attribute (e.g., content, src, href, alt)
       • Text Content: Extracts the text inside the matched element
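The prefix rule for expressions (XPath starts with / or //, JSONPath starts with $) can be sketched as a simple check. The function name `expression_kind` is hypothetical, and this only classifies by the leading character; it does not validate that the expression is well formed.

```python
def expression_kind(expression):
    """Classify an expression as "xpath" or "jsonpath" by its first character."""
    expr = expression.strip()
    if expr.startswith("$"):
        return "jsonpath"
    if expr.startswith("/"):  # covers both "/" and "//"
        return "xpath"
    raise ValueError("expression must start with '/', '//' or '$'")


print(expression_kind('//meta[@name="description"]/@content'))
print(expression_kind('$.@graph[?(@.@type=="NewsArticle")].thumbnailUrl'))
```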

| Setting | Options | Description |
| --- | --- | --- |
| Save as | Custom property, Tag, MRF Metadata | Determines how the extracted value is stored. Custom properties and tags flow to downstream products. MRF Metadata updates core article fields such as mrf:authors or mrf:title. |
| Name format | name or name:value or name:{value} | For tags, defines the key-value structure. Use {value} to dynamically insert the extracted value. |
| If already exists | Overwrite with new value, Fill only if empty, Append | Controls behavior when the target field already has a value. |
| Condition | Always save, Pattern | Determines whether extraction runs unconditionally or only when a pattern is found. |
| Sync to | Amplify, Recommender | Select which downstream products receive the extracted data. You can enable both. |

Use the preview URL field to test your expression against a real article before saving. Click the refresh icon to re-run the extraction and verify results. You can also use the Editorial Crawler Inspector to see exactly what the crawler extracts from any URL.

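To make the Name format setting concrete, here is a minimal sketch of how a name:{value} pattern could expand into a tag. The function name `render_tag` and the sample tag names are hypothetical; the only behavior taken from the documentation is that {value} is replaced by the extracted value.

```python
def render_tag(name_format, value):
    """Expand a name format, replacing every {value} with the extracted value."""
    return name_format.replace("{value}", value)


print(render_tag("tier:{value}", "premium"))  # dynamic value
print(render_tag("tier:gold", "premium"))     # fixed name:value, no placeholder
print(render_tag("paywalled", "premium"))     # plain name
```

Plain string replacement is used instead of `str.format` so that extracted values containing braces do not raise formatting errors.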

| Expression | Extract | Result |
| --- | --- | --- |
| //meta[@name="description"]/@content | Attribute Value | Extracts the article’s meta description |
| //figure/img | Attribute Value (src) | Extracts the first figure image URL |
| //*[@id="post-642"]/div/div[2]/div/div[2]/p/img | Attribute Value (src) | Extracts a specific image by DOM path |
| /html/head/meta[33] | Attribute Value (content) | Extracts a specific meta tag by position |
| $.@graph[?(@.@type=="NewsArticle")].thumbnailUrl | | Extracts thumbnailUrl from LD+JSON structured data |

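The JSONPath row above can be approximated in plain Python to show what it selects. This sketch is an assumption-laden stand-in, not the crawler's JSONPath engine: the class `JsonLdCollector`, the function `thumbnail_urls`, and the sample page are hypothetical. It reads application/ld+json blocks from static HTML and returns thumbnailUrl from every @graph node whose @type is NewsArticle. The second script in the sample (a dataLayer assignment) is ignored, since nothing is executed.

```python
import json
from html.parser import HTMLParser

# Hypothetical static HTML with one JSON-LD block and one plain script.
PAGE = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org",
 "@graph": [
   {"@type": "WebSite", "name": "Example News"},
   {"@type": "NewsArticle", "headline": "Example headline",
    "thumbnailUrl": "https://example.com/thumb.jpg"}
 ]}
</script>
<script>window.dataLayer = [{"tier": "premium"}];</script>
</head><body></body></html>"""


class JsonLdCollector(HTMLParser):
    """Collect the parsed contents of application/ld+json script blocks."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.documents.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed JSON-LD blocks


def thumbnail_urls(documents):
    """Return thumbnailUrl from each NewsArticle node found under @graph."""
    urls = []
    for doc in documents:
        if not isinstance(doc, dict):
            continue
        for node in doc.get("@graph", []):
            if (isinstance(node, dict)
                    and node.get("@type") == "NewsArticle"
                    and "thumbnailUrl" in node):
                urls.append(node["thumbnailUrl"])
    return urls


collector = JsonLdCollector()
collector.feed(PAGE)
urls = thumbnail_urls(collector.documents)
print(urls)
```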

| Restriction | Limit |
| --- | --- |
| Maximum custom field rules per account | 5 |
| Tag value length | 128 characters |
| Custom property value length | 1,024 characters |
| Reserved fields | mrf:canonical and mrf:cms_id cannot be overwritten |

The mrf:canonical field is protected and cannot be overwritten by Custom Fields. This prevents accidental changes to article canonicalization.
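If you want to check candidate values against these limits before configuring a rule, a small pre-flight check might look like the sketch below. The function names, the `save_as` identifiers (`tag`, `custom_property`, `mrf_metadata`), and the messages are hypothetical. The documentation states the limits but not what happens to values that exceed them, so this sketch only reports problems and does not truncate anything.

```python
MAX_RULES = 5                 # custom field rules per account
MAX_TAG_LENGTH = 128          # characters
MAX_PROPERTY_LENGTH = 1024    # characters
RESERVED_FIELDS = {"mrf:canonical", "mrf:cms_id"}


def can_add_rule(existing_rule_count):
    """Return True if one more rule still fits under the per-account limit."""
    return existing_rule_count < MAX_RULES


def validate_value(save_as, target, value):
    """Return a list of human-readable problems; an empty list means no issue."""
    problems = []
    if target in RESERVED_FIELDS:
        problems.append(f"{target} is reserved and cannot be overwritten")
    limit = {"tag": MAX_TAG_LENGTH,
             "custom_property": MAX_PROPERTY_LENGTH}.get(save_as)
    if limit is not None and len(value) > limit:
        problems.append(
            f"{save_as} value is {len(value)} characters; the limit is {limit}")
    return problems


print(can_add_rule(4), can_add_rule(5))
print(validate_value("tag", "tier", "x" * 129))
print(validate_value("mrf_metadata", "mrf:canonical", "https://example.com/a"))
```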

Custom Fields opens up practical workflows across editorial, recommendations, and social distribution.

  • Content tiers and paywall status: Extract internal classifications that inform distribution strategies and performance analysis
  • Tracking parameters: Capture values that feed into analytics initiatives or integrate with third-party systems your newsroom relies on
  • Image alt text: Extract alt attributes from article images so downstream systems (like Recommender) can use proper alt text instead of falling back to the article title, improving PageSpeed scores
  • Article excerpts: Pull og:description or custom summary fields to make them available for newsletter rendering through Recommender layouts
  • Custom thumbnails: Extract thumbnailUrl or other image properties from structured data so Recommender can use publisher-preferred images instead of applying automated crops

Custom metadata automatically flows through the Recommender engine, making it available when building recommendation experiences. Combined with Recommender Layouts, custom properties become part of the recommendation data passed to your layout templates. You can display excerpts, use custom thumbnails, add premium badges, and more.

See Custom Fields in Recommender Layouts for template examples and implementation details.

Custom metadata extends to your social distribution workflow in Amplify in two ways: custom placeholders in post templates let you include extracted metadata in social post text, and Post Image settings let you choose which image property is used when sharing.

[Image: Amplify post template with custom field placeholders for social distribution]

See Custom Fields in Amplify Layouts for template examples and implementation details.

What extraction methods does Custom Fields support?

Custom Fields supports XPath expressions (starting with / or //) to target HTML elements and JSONPath expressions (starting with $) to extract values from application/ld+json structured data blocks. You can extract attribute values or text content from matched elements.

What are the limits for Custom Fields?

Each account can have up to 5 custom field rules. Tag values are limited to 128 characters and custom property values to 1,024 characters. The mrf:canonical and mrf:cms_id fields are reserved and cannot be overwritten.

Can Custom Fields extract data from JavaScript or the DataLayer?

No. Custom Fields works only with static HTML content available at crawl time. It cannot extract values from the JavaScript DataLayer or any data that requires JS execution to be present on the page.