# Metadata

Deprecation

Metadata providers are deprecated. New implementations should use Middleware instead.

To configure analytics providers, ad servers, or any other third-party integrations, Marfeel uses metadata detected in a tenant's original site.

Marfeel scans every script it finds on a page for active metadata. It then parses, extracts, and stores this information so it can be used client-side and replicated in the Marfeel solution with the same logic the publisher uses on their website.

At extraction time, the Nashorn JavaScript engine executes the detected scripts and extracts the required metadata properties.

This article focuses on how to extract metadata. How the extracted metadata is used depends on the page type (section or article) and on the purpose.

Ripper compatibility

For sections, the metadataProviders object in the providers.json file only takes effect in combination with the whiteCollar ripper.

Sections extracted with other rippers don't support metadata extraction.

Complete repository structure for metadata:

www.example.com
├─── index
├─── providers.json
└─── providers
    ├─── metadataProviders
    │   ├─── ExampleMetadataDetector.js
    │   └─── ExampleAdTargetingDetector.js
    └─── test
        ├─── fixtures.json
        └─── resources
            ├─── ExampleMetadataDetector.html
            └─── ExampleAdTargetingDetector.html

# Entry point

www.example.com
├─── index
└─── providers.json

As with all Marfeel Nashorn implementations, the entry point for the metadata providers is the providers.json file:

"metadataProviders":{
    "details": { //relative to the mediagroup root
        "www.example.com/providers/metadataProviders/ExampleMetadataDetector.js": [],
        "marfeel/providers/metadataProviders/AwesomeMetadataDetector.js": []
    },
    "mosaic": { //relative to the tenant folder (www.example.com/)
        "providers/metadataProviders/ExampleAdTargetingDetector.js": []
    }
}

The metadataProviders object can contain two objects:

  • details: for article pages metadata detection
  • mosaic: for section pages metadata detection

Each detector is declared with its file path as the key and, by default, an empty array as the value. Keep all detectors in the same folder, inside providers/metadataProviders.

Marfeel tenant Detectors

The Marfeel tenant holds common metadata detectors. As with any Marfeel extension, first check whether a detector has already been implemented.

Relative path

The key is a relative path, but it must not contain any folder navigation (../). If you need the same detector for several tenants of the same Media Group, move the detector to the Marfeel tenant.

# Detector script

www.example.com
├─── index
├─── providers.json
└─── providers
    └─── metadataProviders
        ├─── ExampleMetadataDetector.js
        └─── ExampleAdTargetingDetector.js

Create one script per detector. Each script is plain (vanilla) JavaScript written in ES5 syntax.

var ExampleMetadataDetector = function() {};

ExampleMetadataDetector.prototype = {
    className: 'mrf-metadataTargetings',
    name: 'ExampleMetadataDetector',
    type: 'AD_TARGETINGS',
    propertyType: 'someAdData',

    // Optional. No return value.
    prepareEnvironment: function(context, articleUri) {},

    // Mandatory. Returns a boolean. Placeholder body.
    isCandidate: function(scriptContent, scriptAttributes) {
        return false;
    },

    // Returns a JS object, which can be empty. Placeholder body.
    getMetadata: function(context, scriptContent, articleUri) {
        return {};
    }
};

# Properties

  • className: string. The Marfeel class name given to the resulting script. Choose something specific to the current detector.
  • name: string. Must match both the object name and the filename.
  • type: string. Must be one of the supported metadata types.
  • propertyType: string. The name of the resulting property.

With the previous script example, the result would be:

<script class="mrf-metadataTargetings" type="application/ld+json">
  {
    "@context":"http://schema.org",
    "@type":"AdTargetings",
    "someAdData": {
      ...,
      "tag":["cats","dogs"]
    }
  }
</script>

# prepareEnvironment function

Optional function without a return value.

Because detector scripts run server-side, neither the window object nor the other browser APIs are available. Use the context variable as the global scope to mock the objects and functions the script needs in order to work.

TIP

This method runs only once for each page.
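
As a minimal sketch of this kind of mocking, the detector below stubs a few browser globals on context. It assumes that context is a writable object acting as the global scope of the detected script. The names setTargeting and capturedTargeting are hypothetical and only illustrate how a vendor call could be replaced by a recorder.

```javascript
var ExampleMetadataDetector = function() {};

ExampleMetadataDetector.prototype = {
    name: 'ExampleMetadataDetector',

    // Assumption: `context` is a writable object used as the global scope
    // of the detected script. All mocked names below are illustrative.
    prepareEnvironment: function(context, articleUri) {
        // Let scripts that reference `window.something` resolve against context.
        context.window = context;

        // Minimal stand-in for the parts of `document` a script may touch.
        context.document = {
            location: { href: articleUri },
            getElementById: function() { return null; }
        };

        // Hypothetical vendor function: record its arguments instead of
        // doing any real work, so they can be read back later.
        context.capturedTargeting = {};
        context.setTargeting = function(key, value) {
            context.capturedTargeting[key] = value;
        };
    }
};
```

Only mock what the publisher script actually uses. A small, explicit environment is easier to cover with fixtures than a broad imitation of the browser.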

# isCandidate function

Mandatory function returning a boolean value.

The function receives the script content as argument, and should return true if that script contains relevant metadata for the detector.

The second argument, scriptAttributes, contains a map of all the attributes of the script tag.

It is particularly useful for external scripts, where you can check whether the script path is the relevant one:

return scriptAttributes.getValue('src') === 'candidateURL';

TIP

This method can be called several times: once for every script detected on a page.
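
The sketch below combines both checks: the src attribute for external scripts and the script content for inline scripts. The URL and the adTargeting marker are hypothetical, and the code assumes that getValue returns a string, or an empty value when the attribute is absent.

```javascript
var ExampleAdTargetingDetector = function() {};

ExampleAdTargetingDetector.prototype = {
    name: 'ExampleAdTargetingDetector',

    isCandidate: function(scriptContent, scriptAttributes) {
        // External script: decide based on its src attribute (hypothetical URL).
        var src = scriptAttributes ? scriptAttributes.getValue('src') : null;
        if (src) {
            return String(src).indexOf('//ads.example.com/targeting.js') !== -1;
        }

        // Inline script: decide based on a marker in the content
        // (hypothetical variable name).
        return typeof scriptContent === 'string' &&
            scriptContent.indexOf('adTargeting') !== -1;
    }
};
```

Because this function runs for every script on the page, keeping it to cheap string checks avoids doing parsing work on scripts that turn out to be irrelevant.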

# getMetadata function

This function is called once per script identified as candidate. There is no guarantee on the order of execution.

This method should always return an object. It can be empty, or contain the metadata key-value pairs identified from the script.
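
One possible shape for this function, as a sketch only: it assumes the publisher script declares a strict-JSON object literal in a variable called pageData (a hypothetical name) and reads it directly from scriptContent. Real scripts may instead need to be read through the mocked context.

```javascript
var ExampleMetadataDetector = function() {};

ExampleMetadataDetector.prototype = {
    name: 'ExampleMetadataDetector',

    getMetadata: function(context, scriptContent, articleUri) {
        // Hypothetical publisher script shape: var pageData = {...};
        var pattern = /var\s+pageData\s*=\s*(\{[\s\S]*?\});/;
        var match = pattern.exec(scriptContent || '');
        if (!match) {
            // Always return an object, even when nothing was found.
            return {};
        }

        try {
            var data = JSON.parse(match[1]);
            // Keep only the keys this detector cares about.
            return { page_type: data.page_type, tag: data.tag };
        } catch (e) {
            // The literal was not strict JSON: fall back to an empty object.
            return {};
        }
    }
};
```

Since the order of execution is not guaranteed, the sketch derives its result only from its own arguments and does not rely on state left behind by a previous call.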

# Testing a metadata detector

All metadata detectors can be tested with fixtures.

All test-related files live in a test folder inside the providers folder.

# fixtures.json

www.example.com
├─── index
├─── providers.json
└─── providers
    ├─── metadataProviders
    │   ├─── ExampleMetadataDetector.js
    │   └─── ExampleAdTargetingDetector.js
    └─── test
        └─── fixtures.json

{
  "www.example.com/providers/metadataProviders/ExampleMetadataDetector.js":[
    {
      "in": "ExampleMetadataDetector.html",
      "out": {
        "metadata": "{\"polopolyid\":\"1.3054026\",\"nome_edizione\":\"\",\"page_type\":\"articolo\",\"parent\":\"\"}"
      }
    }
  ]
}

This file follows the same structure as your original providers.json, except that each array now contains objects with at least two properties, in and out:

  • in: the name of the HTML file used for the test.

  • out: the expected output. The metadata value is a string representation of JSON, so remember to escape all double quotes.

  • inExtraFiles: used to test detectors of external scripts. It is an object mapping the URL of each real script to a test script stored next to the HTML file:

"inExtraFiles": {
   "http://externalscript.com": "testscriptAdDetector.js"
}
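
Escaping the expected metadata by hand is error-prone. As a convenience sketch, not part of the Marfeel tooling, the escaped string can be generated with JSON.stringify. The script below rebuilds the fixture shown earlier in this section; run it with any JavaScript engine and copy the output into fixtures.json.

```javascript
// Expected metadata as a plain object (values from the fixture example).
var expectedMetadata = {
    polopolyid: '1.3054026',
    nome_edizione: '',
    page_type: 'articolo',
    parent: ''
};

var detectorPath = 'www.example.com/providers/metadataProviders/ExampleMetadataDetector.js';

var fixtures = {};
fixtures[detectorPath] = [
    {
        'in': 'ExampleMetadataDetector.html',
        out: {
            // First stringify: turns the object into its JSON string form.
            metadata: JSON.stringify(expectedMetadata)
        }
    }
];

// Second stringify: serializes the whole file and escapes the inner quotes.
var fixturesJson = JSON.stringify(fixtures, null, 2);
```

The double serialization is what produces the escaped quotes: the inner call creates the string value, and the outer call escapes it when writing the file content.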

TIP

To run the tests locally, refer to the article on how to test providers.

# The test HTML document

www.example.com
├─── index
├─── providers.json
└─── providers
    ├─── metadataProviders
    │   ├─── ExampleMetadataDetector.js
    │   └─── ExampleAdTargetingDetector.js
    └─── test
        ├─── fixtures.json
        └─── resources
            ├─── ExampleMetadataDetector.html
            └─── ExampleAdTargetingDetector.html

Next to fixtures.json, create a resources folder to hold all the HTML documents needed for testing. We recommend one file per metadata detector, named after the detector, to make discovery easier.

Each HTML file should be valid HTML and contain only the script relevant to the current metadata detector.

# Implementation examples