# JSOUP

Content groups in JSOUP

At the moment, the Jsoup ripper does not support content groups.

TIP

JSOUP performs better than WhiteCollarRipper, so use it whenever possible. If a section doesn't need content groups, try extracting it with JSOUP.

To use the Jsoup ripper for a specific section, use the feedRipper attribute in the section configuration.

"sectionDefinitions" : [ {
    "name" : "seo", --> the name of the section
    "title" : "Seo", --> the title of the section
    "feedDefinitions" : [ {
      "uri" : "https://example.com/seo",
      "alibabaDefinition" : {
        "configuration" : {
          "feedRipper" : "jsoupRipper",
          "jsoupSelectors" : "index/src/jsoup/seo.properties"
        }
      }
    } ]
  }]

jsoupSelectors defines the path of the file containing the article selectors. This file must be under src/jsoup/ in the site code repository.

TIP

The .properties file extension is mainly used in Java to store configurable parameters.

The .properties file plays the same role as a whitecollar: it defines every selector needed to extract the section's content.

The following example shows the contents of the selectors file referenced in the configuration above:

ARTICLES=article, .article
TITLE=.title
URI=a
IMG=img
DATE=date
AUTHOR=.author
EXCERPT=.excerpt
SUBTITLE=.subtitle

TIP

As you can see, multiple selectors can be combined by separating them with commas.
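To illustrate how such a file is parsed, here is a minimal Java sketch. It assumes the selectors file is read with the standard java.util.Properties loader, which this page does not confirm; the class name SelectorFileDemo and the method loadSelectors are hypothetical and are not part of the ripper.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper for illustration only; not the ripper's actual code.
public class SelectorFileDemo {

    // Parses the text of a selectors file into key/value pairs.
    public static Properties loadSelectors(String content) {
        Properties selectors = new Properties();
        try {
            selectors.load(new StringReader(content));
        } catch (IOException e) {
            // Reading from an in-memory string should not fail; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return selectors;
    }

    public static void main(String[] args) {
        String content = String.join("\n",
                "ARTICLES=article, .article",
                "TITLE=.title",
                "URI=a");
        Properties selectors = loadSelectors(content);
        // The comma-separated list is kept as one value: "article, .article"
        System.out.println(selectors.getProperty("ARTICLES"));
    }
}
```

Under this assumption, the whole comma-separated list arrives as one string, so it can be handed to a CSS selector engine as a single selector group rather than being split by hand.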

# Static content with JSOUP

We can retrieve static elements using the special selector HTML_ARTICLES in our .properties file.

HTML_ARTICLES=.static-content

HTML_ARTICLES

This is a special extractor. It returns all the HTML inside the selector as static content. To place that content, reference it in the layout descriptor with the key jsoupWidget.

Layout descriptor example for this section:

{
    "layouts": [
        {
            "name": "newspaper/pill",
            "key": "jsoupWidget"
        },
        "newspaper/thumb"
    ]
}

In this case, the static content is placed at the top of the section, followed by all the articles.