# WC library

The WhiteCollar library functions can be used either as modifier functions, or anywhere in the whiteCollar where they are useful.

Whenever you need a custom behavior not covered by the library yet, evaluate if it makes sense to add it for everybody.

The library code is in Gutenberg.

# getAllProcessedNodesFromGetters

WC.getAllProcessedNodesFromGetters(Array getters, Array globalModifiers)

Receives an array of getter functions and an array of modifiers. Applies the modifiers to all items selected by getters.

# limitArticles

WC.limitArticles(Number limit)

Limits the number of extracted articles to the specified limit.

# filterEqConsecutiveArticles

WC.filterEqConsecutiveArticles(String propertyName [default 'uri'])

Filters out consecutive articles that share the same value for the property named by the argument.

If no argument is specified, it defaults to URI.
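The behaviour described above can be sketched in plain JavaScript. This is an illustration only, not the library source: the helper name `dropConsecutiveDuplicates` and the sample items are hypothetical, and keeping the first article of each run is an assumption.

```javascript
// Hypothetical stand-in for the described behaviour; not WC's implementation.
// Assumption: the first article of each run of equal values is kept.
function dropConsecutiveDuplicates(articles, propertyName = 'uri') {
  return articles.filter(function (article, index) {
    if (index === 0) return true;
    // Compare with the previous article in the ORIGINAL list.
    return article[propertyName] !== articles[index - 1][propertyName];
  });
}

const consecutiveSample = [
  { uri: '/a' }, { uri: '/a' }, { uri: '/b' }, { uri: '/a' }
];
console.log(dropConsecutiveDuplicates(consecutiveSample).map((a) => a.uri));
// prints [ '/a', '/b', '/a' ]
```

Note that the last `/a` survives because it is not adjacent to the earlier ones.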

# uniqueBy

WC.uniqueBy(String propertyName)

Filters out articles that share the same value for the property named by the argument, whether or not they are consecutive.

If no argument is specified, it defaults to URI.
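A minimal sketch of this kind of de-duplication follows. It is not the library code: `uniqueByProperty` is a hypothetical name, and keeping the first article seen for each value is an assumption.

```javascript
// Hypothetical sketch of de-duplicating by a property; not WC's implementation.
// Assumption: the first article seen for each value is the one that is kept.
function uniqueByProperty(articles, propertyName = 'uri') {
  const seen = new Set();
  return articles.filter(function (article) {
    const value = article[propertyName];
    if (seen.has(value)) return false;
    seen.add(value);
    return true;
  });
}

const uniqueSample = [{ uri: '/a' }, { uri: '/b' }, { uri: '/a' }];
console.log(uniqueByProperty(uniqueSample).map((a) => a.uri));
// prints [ '/a', '/b' ]
```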

# applyBlacklist

WC.applyBlacklist(Array blacklistedStrings)

Filters out all items whose URI contains any of the strings in the blacklist array.
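As an illustration of the matching rule, here is a standalone sketch. It assumes plain substring matching on `item.uri`; the helper name `withoutBlacklisted` and the sample data are hypothetical and not part of the library.

```javascript
// Hypothetical sketch; assumes case-sensitive substring matching on item.uri.
function withoutBlacklisted(items, blacklistedStrings) {
  return items.filter(function (item) {
    return !blacklistedStrings.some(function (fragment) {
      return item.uri.includes(fragment);
    });
  });
}

const blacklistSample = [
  { uri: 'https://example.com/news/1' },
  { uri: 'https://example.com/sponsored/2' }
];
console.log(withoutBlacklisted(blacklistSample, ['/sponsored/']).map((i) => i.uri));
// prints [ 'https://example.com/news/1' ]
```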

# contains

WC.contains(Array or String container)(Any content)

This curried function checks whether content is contained in container.

For example, to check if the URI is part of the "awesome" subdomain:

WC.contains(item.uri)("awesome.");
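Because the function is curried, the first call can be stored and reused as a predicate. The sketch below mimics that shape with a hypothetical `containsSketch` helper built on the standard `includes` method of strings and arrays; it is not the library's implementation.

```javascript
// Hypothetical curried helper with the same call shape as described above.
// Both String and Array provide includes(), so one body covers both cases.
const containsSketch = (container) => (content) => container.includes(content);

// Reuse the partially applied function as a predicate.
const isAllowedSection = containsSketch(['sports', 'culture']);
console.log(['sports', 'ads', 'culture'].filter(isAllowedSection));
// prints [ 'sports', 'culture' ]

console.log(containsSketch('http://awesome.example.com/a')('awesome.'));
// prints true
```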

# getHref

WC.getHref(Node node)

Returns the attribute href from the node.

# getSrc

WC.getSrc(Node node)

Returns the attribute src from the node.

# getAlt

WC.getAlt(Node node)

Returns the alt attribute of the node. If it is not found, it falls back to the title attribute.

# getLazyImg

WC.getLazyImg(String selector, String attribute, String altAttribute)

Retrieves the image from a lazy-loaded image element, including its source and alt text.

The selector is used as argument in a call to qs.

The attribute is the element attribute where the image source is stored. It defaults to data-src.

The altAttribute is the element attribute where the image alt is stored. It defaults to alt, then falls back to title.

# getSectionName

WC.getSectionName(Function extractor)

Applies the extractor function to the current page URI, and returns a String.

If no argument is specified, the default extractor takes the first segment of the pathname.

For example:

// current page URI: "http://example.tenant.com/sports/and/others"
var sectionName = WC.getSectionName(); // sectionName will be "sports"
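To make the default rule concrete, here is a sketch of an extractor that takes the first pathname segment. It assumes the extractor receives the page URI as a string; the function name `firstPathSegment` and the empty-string fallback for the home page are hypothetical choices, not documented behaviour.

```javascript
// Hypothetical extractor; assumes it is called with the page URI as a string.
function firstPathSegment(uri) {
  // "/sports/and/others" -> ["sports", "and", "others"]
  const segments = new URL(uri).pathname.split('/').filter(Boolean);
  // Assumption: return an empty string when the path has no segments.
  return segments.length > 0 ? segments[0] : '';
}

console.log(firstPathSegment('http://example.tenant.com/sports/and/others'));
// prints sports
```

A custom extractor passed to WC.getSectionName would presumably follow the same shape, receiving the URI and returning the section name.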

# getPageNumber

WC.getPageNumber(Function extractor)

Applies the extractor function to the current page URI, and returns a String.

If no argument is specified, the default extractor takes the number from '/page/(number)'. If that pattern doesn't exist in the current URI, it returns 1.

For example:

// current page URI: "http://example.tenant.com/home/page/3"
var pageNumber = WC.getPageNumber(); // pageNumber will be 3
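The default rule can be approximated with a regular expression. The sketch below is an assumption about how such an extractor might look: `pageFromUri` is a hypothetical name, and returning a Number rather than a String is a guess based on the example above.

```javascript
// Hypothetical extractor for the '/page/(number)' pattern described above.
// Assumption: the result is a Number, and 1 is used when nothing matches.
function pageFromUri(uri) {
  const match = /\/page\/(\d+)/.exec(uri);
  return match ? Number(match[1]) : 1;
}

console.log(pageFromUri('http://example.tenant.com/home/page/3')); // prints 3
console.log(pageFromUri('http://example.tenant.com/home'));        // prints 1
```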

# convertToArray

WC.convertToArray(Object arrayLike)

Converts array-like objects to Array.

# filterFalsy

WC.filterFalsy(Array content)

Filters out all items of an array which are falsy values ('', 0, NaN, null...).
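Both convertToArray and filterFalsy correspond to standard JavaScript idioms. The sketch below shows plain equivalents, assuming nothing about how the library implements them; `toArraySketch` and `dropFalsySketch` are hypothetical names.

```javascript
// Plain JavaScript equivalents; not the library's own code.
function toArraySketch(arrayLike) {
  // Array.from accepts any object with a length and indexed entries,
  // such as a NodeList or the arguments object.
  return Array.from(arrayLike);
}

function dropFalsySketch(content) {
  // Boolean(value) is false for '', 0, NaN, null, undefined and false.
  return content.filter(Boolean);
}

const arrayLikeSample = { 0: 'a', 1: '', 2: 'b', length: 3 };
console.log(dropFalsySketch(toArraySketch(arrayLikeSample)));
// prints [ 'a', 'b' ]
```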

# getText

WC.getText(Node node)

Returns the text content of a node, trimmed.

# qs

WC.qs(String selector, Node node[default document])

Equivalent to the standard querySelector, starting from node if provided, or from document otherwise. Returns the first matching element.

# qsAll

WC.qsAll(String selector, Node node[default document])

Equivalent to the standard querySelectorAll, starting from node if provided, or from document otherwise. Returns an array of all matching elements.

# merge

WC.merge(obj1, obj2)

Equivalent to the Ramda merge function, R.merge.

Creates a new object with the own properties of the first object merged with the own properties of the second object. If a key exists in both objects, the value from the second object will be used.
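The merge semantics described above match what `Object.assign` does with a fresh target object. The following sketch uses a hypothetical `mergeSketch` helper to show the precedence rule; it is not the library's code.

```javascript
// Shallow merge of own enumerable properties into a NEW object.
// Neither input is mutated; later sources win on key conflicts.
function mergeSketch(obj1, obj2) {
  return Object.assign({}, obj1, obj2);
}

const defaults = { width: '100%', height: 'auto' };
const overrides = { height: '300px' };
console.log(mergeSketch(defaults, overrides));
// prints { width: '100%', height: '300px' }
console.log(defaults.height);
// prints auto
```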

# create3piWidget

WC.create3piWidget(String className, Object options)

Creates a 3pi widget in Mosaic. The parameters are the following:

  • className: the class assigned to the parent element of the widget iframe.
  • options: a JSON object with the following parameters:
    • src: widget source. The same as in the widgets.json file. This is mandatory.
    • selector: widget selector. The same as in the widgets.json file. This is also mandatory and must be a class, not an id.
    • width: iframe width. If no width is given, the default value is 100%.
    • height: iframe height. If no height is given, the default value is auto.
    • params: JSON object with the parameters needed.

# getBalcon (deprecated)

Builds the pocket object for a content group. This method is deprecated in favour of defining only a key in the whiteCollar pocket.

Use the layout descriptor for the layout-related configuration of the content group.

# getBalconKey (only puppeteer ripper)

Returns the key for the specified content group. It uses the deprecated logic of getBalcon to retrieve the content group from the node and returns the previous balcon.name. This method is compatible with the layout descriptor approach.

Example:

// Parentheses are needed so the arrow function returns an object literal
pocket: (node) => ({
  key: WC.getBalconKey(node, '.news', 'h1')
})

# getBestSrcFromSrcSet

WC.getBestSrcFromSrcSet(String srcset)

Returns the src candidate whose width is closest to 480px.

Example:

For the srcset "elva-fairy-320w.jpg 320w, elva-fairy-480w.jpg 480w, elva-fairy-800w.jpg 800w", it returns elva-fairy-480w.jpg.

  • srcset: the srcset string from the img element.
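The selection rule can be sketched as follows. This is a simplified illustration, not the library source: `closestSrc` is a hypothetical name, only width descriptors such as `480w` are handled, and URLs containing commas are not supported.

```javascript
// Hypothetical sketch: pick the srcset candidate closest to a target width.
// Assumptions: candidates use "<url> <N>w" form and URLs contain no commas.
function closestSrc(srcset, targetWidth = 480) {
  const candidates = srcset.split(',').map(function (entry) {
    const parts = entry.trim().split(/\s+/);
    const match = /^(\d+)w$/.exec(parts[1] || '');
    return match ? { url: parts[0], width: Number(match[1]) } : null;
  }).filter(Boolean);

  if (candidates.length === 0) return undefined;

  // Keep the candidate with the smallest distance to the target width.
  return candidates.reduce(function (best, candidate) {
    const closer = Math.abs(candidate.width - targetWidth) <
      Math.abs(best.width - targetWidth);
    return closer ? candidate : best;
  }).url;
}

console.log(closestSrc(
  'elva-fairy-320w.jpg 320w, elva-fairy-480w.jpg 480w, elva-fairy-800w.jpg 800w'
));
// prints elva-fairy-480w.jpg
```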

# difference

WC.difference(Array first, Array second)

From R.difference.

Finds the set (i.e. no duplicates) of all elements in the first list not contained in the second list. Objects and Arrays are compared in terms of value equality, not reference equality.

WC.difference([1,2,3,4], [7,6,5,4,3]); //=> [1,2]

# notExtractableIf

WC.notExtractableIf(Function checker)

Runs the checker function for each item and marks them as NOT extractable if the function returns true.

Should be used inside the modifiers array.

    ...
    modifiers: [WC.notExtractableIf(function (item) {
        return item.title.indexOf('test') > -1;
    })]
    ...

# getUniqueUri

WC.getUniqueUri()

Returns a URI made from the current href and a random query parameter, for example https://page.com/?marfeelqp=1234567.

Useful when you want to create a dummy item for an iframe or widget.
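The idea can be sketched with the standard URL API. In this illustration the href is passed in explicitly so the snippet runs outside a browser; `uniqueUriSketch` is a hypothetical name, and the range of the random value is an assumption.

```javascript
// Hypothetical sketch; the real function reads the current page href itself.
// Assumption: the random value is an integer below ten million.
function uniqueUriSketch(href) {
  const url = new URL(href);
  const randomValue = Math.floor(Math.random() * 1e7);
  url.searchParams.set('marfeelqp', String(randomValue));
  return url.toString();
}

console.log(uniqueUriSketch('https://page.com/'));
// prints something like https://page.com/?marfeelqp=1234567
```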

# getInnerHtml

Dangerous

Be very sure about the HTML you are extracting: injecting it can break the whole section page.

WC.getInnerHtml(HTMLNode node)

Shortcut function to get the innerHTML from a node.

# extractIframe (deprecated)

WC.extractIframe(Object pocket, String title)

Returns the selectors to extract an iframe for WC. Use layoutDescriptor instead to insert iframes into Mosaic.

    ...
    {
        title: 'h2 > a',
        uri: 'h2 > a',
        ...
    },
    WC.extractIframe({}, 'Top results of Tennis table'),
    {
        title: '.title',
        uri: '.link > a',
        ...
    },
    ...

# cleanBalconName

WC.cleanBalconName(String name)

Removes invalid characters from the name. The only allowed characters are a-z and 0-9.

# capitalize

WC.capitalize(String string)

Transforms the first character of a string to upper case.

# containsClass

WC.containsClass(HTMLNode element, String className)

Returns true if the element class list contains the className defined.

# containsId

WC.containsId(HTMLNode element, String id)

Returns true if the element.id is equal to the id.

# encodeToLatinAlfabet

WC.encodeToLatinAlfabet(String string)

Applies the encodeURI JavaScript function and then applies cleanBalconName.

Very useful for content group names that contain accented or Asian characters.
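The two-step transformation can be sketched as below. Both helpers are hypothetical stand-ins for cleanBalconName and encodeToLatinAlfabet; in particular, lower-casing the input before stripping characters is an assumption, since the text only says that a-z and 0-9 are allowed.

```javascript
// Hypothetical stand-in for cleanBalconName.
// Assumption: the name is lower-cased before invalid characters are removed.
function cleanNameSketch(name) {
  return name.toLowerCase().replace(/[^a-z0-9]/g, '');
}

// Hypothetical stand-in for encodeToLatinAlfabet: percent-encode, then clean.
function encodeToLatinSketch(name) {
  return cleanNameSketch(encodeURI(name));
}

console.log(cleanNameSketch('Top News!'));   // prints topnews
console.log(encodeToLatinSketch('España'));  // prints espac3b1a
```

Percent-encoding first means non-Latin characters leave their hexadecimal byte values in the name instead of disappearing entirely.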

# prefferedImgAttribute (only puppeteer ripper)

modifiers: [WC.prefferedImgAttribute(attributeName)]

Prioritizes the attribute specified when obtaining the image using the default media extractor.

Example:

selector: '.news-article',
extractors: {
  uri: 'h2 > a',
  title: 'h2 > a',
  media: 'img'
},
modifiers: [WC.prefferedImgAttribute('data-image-url')]

It will try to get the media from the data-image-url attribute of the selected IMG node.

Call to contributions

Not all WC library methods are described yet.

Open an issue on MarfeelDocs if you know what a function does, or directly open a PR to update this file.

Missing functions:

  • createGroups
  • removeUnwantedNodes
  • createSectionStaticContent
  • createSectionHtmlContent