# Section pagination

The section pagination feature replaces a section's feed definition, respecting the tenant's original website pagination.

It works on any section the tenant paginates, including the usual dynamic sections such as tags or authors. Pagination improves SEO because crawlers visit more section pages, which in turn leads to more articles being crawled.

Infinite scroll pairs with section pagination to improve user engagement: it transparently appends more content to the section as the user scrolls.

Infinite scroll is not visible to search engine crawlers, so it does not affect the SEO impact of pagination.

We must answer these two questions to paginate a section:

  1. Before the extraction, at page request time: is the requested URL a section page? If so, from which section?
  2. At section extraction time: what are the next and previous pages of this section?

Once we answer these questions, we can paginate anything.

# Required configuration

# Page pattern

We need a pagePattern to recognise a section page. A pagePattern at Marfeel is a regular expression that reflects the way the tenant implements pagination.

For instance, if the tenant appends /page/2/ to a section URL to indicate page 2, the pagePattern is "/page/([0-9]+)/".

  • "[0-9]+" matches any page number,
  • the parentheses capture the actual number.
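
As a minimal sketch of how such a pattern behaves, the snippet below applies the example pagePattern to a URL and reads the captured page number. The helper name `page_number` is hypothetical, and Python's `re` module only stands in for whatever regex engine the backend actually uses.

```python
import re
from typing import Optional

# The example pagePattern from the text above.
PAGE_PATTERN = re.compile(r"/page/([0-9]+)/")


def page_number(url: str) -> Optional[int]:
    """Return the captured page number, or None if the URL has no page suffix."""
    match = PAGE_PATTERN.search(url)
    # group(1) is the part captured by the parentheses in the pattern.
    return int(match.group(1)) if match else None
```

A URL such as `https://tenant.com/sport/page/2/` yields `2`, while a URL without the `/page/N/` suffix yields `None`.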

Once a page is identified as a section, rippers can look for clues in the DOM while extracting the section.

Selectors such as [.next, .previous] help the ripper find the right tags: the ones whose href links to the previous or next page.

# Standard Example

Let's render tenant.com/sport/page/42.

Is the requested URL a section page? If so, from which section?

Yes, it is a page of the sport section, since it matches the sport pagePattern: "tenant.com/sport/page/([0-9]+)".

TIP

page/number is a standard way of building pagination URLs, so it can be detected automatically by Gutenberg.

Let's load this section page.

What are the next/previous pages of this section?

When we extract sport/page/42, we find the pagination links:

previous = tenant.com/sport/page/41 ; next = tenant.com/sport/page/43

We can now render the pagination with the appropriate links: [ 41 | 42 | 43 ].

Infinite scroll kicks in and appends the content of these links automatically while the user scrolls down.

# Custom Example

Let's render tenant.com/news/10_0_0_0_0_3.

Is the requested URL a section page? If so, from which section?

Without any configuration in the definition.json, this page appears to be an article.

If the news section has the pattern "tenant.com/news/10_0_0_0_0_([0-9]+)", the page can be recognised as a section page.

What are the next/previous pages of this section?

If the HTML tags for pagination are not standard, like this:

<div class="paginate">
  <a class="prev2" href="/news/10_0_0_0_0_2">prev</a>
  <a href="/news/10">1</a>
  <a href="/news/10_0_0_0_0_2">2</a>
  <a href="/news/10_0_0_0_0_3">3</a>
  <a href="/news/10_0_0_0_0_4">4</a>
  <a href="/news/10_0_0_0_0_5">5</a>
  <a class="next2" href="/news/10_0_0_0_0_4">next</a>
</div>

In this case, configure the selectors .paginate>.prev2 and .paginate>.next2 in the section's ripper file to identify the previous and next pages.
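
To make the effect of those two selectors concrete, here is a rough sketch that finds the same links in the markup above. It uses Python's standard `html.parser` rather than the actual ripper, and the class name `PaginationLinkFinder` is hypothetical; it only approximates the `.paginate>.prev2` and `.paginate>.next2` child selectors by checking the classes of the direct parent element.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"br", "img", "input", "hr", "meta", "link"}


class PaginationLinkFinder(HTMLParser):
    """Collect the href of .paginate>.prev2 and .paginate>.next2."""

    def __init__(self):
        super().__init__()
        self.open_classes = []  # class lists of the currently open elements
        self.previous_href = None
        self.next_href = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        parent_classes = self.open_classes[-1] if self.open_classes else []
        # Only direct children of an element with class "paginate" qualify.
        if "paginate" in parent_classes:
            if "prev2" in classes:
                self.previous_href = attributes.get("href")
            elif "next2" in classes:
                self.next_href = attributes.get("href")
        if tag not in VOID_TAGS:
            self.open_classes.append(classes)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.open_classes:
            self.open_classes.pop()


HTML = """
<div class="paginate">
  <a class="prev2" href="/news/10_0_0_0_0_2">prev</a>
  <a href="/news/10">1</a>
  <a href="/news/10_0_0_0_0_3">3</a>
  <a class="next2" href="/news/10_0_0_0_0_4">next</a>
</div>
"""

finder = PaginationLinkFinder()
finder.feed(HTML)
print(finder.previous_href, finder.next_href)
```

The numbered links in between carry no distinguishing class, which is why the dedicated prev2 and next2 classes are the ones worth targeting.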

# Feature toggles

Enable pagination with the feature flag renderSectionPagination (disabled by default).

Infinite scroll is controlled by the flag lazyPagination. It is enabled by default, so it automatically kicks in when pagination is enabled.

# Configuration

Most tenants are paginated with no configuration at all.

If pagination doesn't work by default on a site, it might be due to unrecognised tags or special URL patterns.

# Customize extracted pages

Marfeel relies on rel attributes to detect pagination.

These tags are <link rel="prev"> and <link rel="next">, or <a rel="prev"> and <a rel="next">.

Each tag must contain an href attribute with the appropriate absolute URL.

Section pages are extracted based on those tags; the selectors used to find them can be configured in definition.json.

This configuration is propagated and used by default in both JSOUPRipper and WhiteCollar.

{
  ...
  "configuration" : {
    "pageSelectors" : ".pagination li:first-child a | .pagination li:last-child a"
  },
  ...
}

Note that both prev and next selectors are specified in the same property separated by a pipe (|) character as follows:

"pageSelectors" : "${prevSelector} | ${nextSelector}"

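As an illustration of that format, the following sketch splits a pageSelectors value into its two parts. The function name `split_page_selectors` is hypothetical, and the sketch assumes the value contains exactly one pipe, so it would not cope with selectors that themselves use a `|` character.

```python
from typing import Tuple


def split_page_selectors(page_selectors: str) -> Tuple[str, str]:
    """Split "prev | next" into a (prev_selector, next_selector) pair."""
    # Unpacking raises ValueError if there is not exactly one pipe.
    prev_selector, next_selector = (part.strip() for part in page_selectors.split("|"))
    return prev_selector, next_selector
```

Applied to the configuration above, the first element is `.pagination li:first-child a` and the second is `.pagination li:last-child a`.
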
# Show section pagination

In this section, "showing" section pagination is equivalent to infinite scroll, if it is active.

The most common pagination pattern is /page/([0-9]+)/. For a section URL such as http://example.com/desporte/, the next page would be http://example.com/desporte/page/2/.

This pattern is supported by default; no configuration is needed.

Other similar page patterns such as /([0-9]+)/ are automatically configured at tenant scaffolding time by MarfeelAlfred.

If MarfeelAlfred has not detected the page patterns adequately, configure them in the definition.json file.

Affect all sections at once by declaring a pagePattern in the root configuration object of definition.json:

{
  ...,
  "configuration" : {
    "pagePattern": "/page/([0-9]+)/"
  },
  ...
}

Prefer this method if most sections are paginated with the same pattern.

When pagePattern is configured globally in definition.json, there's no need to add the .* prefix. The backend adds it when necessary, depending on whether the section is a default or a dynamic one.

When it's configured for only one section, add the .* yourself when necessary.

If some of the tenant's URLs end with a / and others don't, try the following pattern: /page/([0-9]+)/?.
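
To illustrate what the optional slash changes, this sketch compares both patterns against the two URL styles. Python's `re.search` is used here as a stand-in; how the backend anchors the match is an assumption.

```python
import re

STRICT = re.compile(r"/page/([0-9]+)/")    # requires the trailing slash
LENIENT = re.compile(r"/page/([0-9]+)/?")  # trailing slash is optional

urls = (
    "http://example.com/desporte/page/2/",
    "http://example.com/desporte/page/2",
)

for url in urls:
    # STRICT matches only the first URL; LENIENT matches both.
    print(url, bool(STRICT.search(url)), bool(LENIENT.search(url)))
```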

Exclude individual sections with the enablePagination flag in their specific configuration:

"enablePagination": "false"

By default, the home section is excluded. Include it with the same flag:

"enablePagination": "true"

When tenants don't count the first page of a section in their page numbering, the pageNumberStartsFromZero flag is required in the definition.json configuration. For example, "Page 1" is test.com/games/ and "Page 2" is test.com/games/page/1/.

"configuration": {
  "pageNumberStartsFromZero": "true"
}
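
The numbering shift can be pictured with a small sketch. It is based only on the example above (test.com/games/page/1/ being "Page 2") and is not the backend's implementation; the function name `displayed_page` is hypothetical.

```python
from typing import Optional


def displayed_page(url_number: Optional[int], starts_from_zero: bool) -> int:
    """Map the number captured from the URL to the page number shown to users."""
    if url_number is None:
        return 1  # section root without a page suffix, e.g. test.com/games/
    # With the flag on, /page/1/ is the second page, /page/2/ the third, and so on.
    return url_number + 1 if starts_from_zero else url_number
```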

TIP

Section pagination rendering is feature-toggled per tenant, so you need to activate the renderSectionPagination feature to show section pages.

For sites without a common pattern, set specific pagePatterns in the configuration of each section:

{
  "name" : "tag",
  "title" : "Tags",
  "type" : "DYNAMIC",
  "uri" : "/tag/**",
  "configuration" : {
    "titlePattern" : "/tag/(.*)"
  },
  "alibabaDefinition" : {
    "configuration": {
      "feedRipper" : "jsoupRipper",
      "jsoupSelectors" : "index/src/jsoup/tag.properties"
    }
  },
  "pagePatterns": [{
    "pattern": ".*/page/([0-9]+)"
  }]
}

pagePatterns is an array: define as many patterns as required for a section:

{
  "pagePatterns": [
    {
      "pattern": ".*/page/([0-9]+)"
    },
    {
      "pattern": ".*/pagina/([0-9]+)"
    }
  ]
}
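
A sketch of how several patterns might be tried in turn is shown below. It assumes that the first matching pattern wins and that a pattern must match the whole URL (hence the `.*` prefix); both are assumptions rather than documented behaviour, and `match_page` is a hypothetical helper.

```python
import re
from typing import Optional

# The two patterns from the configuration above.
PAGE_PATTERNS = [r".*/page/([0-9]+)", r".*/pagina/([0-9]+)"]


def match_page(url: str) -> Optional[int]:
    """Return the page number from the first pattern that matches the full URL."""
    for pattern in PAGE_PATTERNS:
        match = re.fullmatch(pattern, url)
        if match:
            return int(match.group(1))
    return None  # not a paginated page of this section
```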

A pattern can also hold its own alibabaDefinition object:

{
  "pagePatterns": [
    {
      "pattern": ".*/page/([0-9]+)",
      "alibabaDefinition" : {
        "configuration" : {
          "feedRipper" : "jsoupRipper",
          "jsoupSelectors" : "index/src/jsoup/tag.properties"
        }
      }
    }
  ]
}

# UI Customization

Rendering the section pagination is not limited to the next and previous pages. We can show as many section pages as were discovered during the extraction.

Use ui.json to set the maximum number of pages the paginated section should show:

{
  "siteStructure": {
    "pagination": {
      "numPages": 5
    }
  }
}

The default is 7:

  • 3 previous pages
  • The current page
  • 3 next pages
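
One way to picture this window is the sketch below, which keeps the discovered pages that fall within an equal distance on each side of the current page. How the actual renderer distributes pages, especially for even values of numPages or near the first page, is not specified here, so treat the symmetric window and the name `visible_pages` as assumptions.

```python
from typing import List


def visible_pages(current: int, discovered: List[int], num_pages: int = 7) -> List[int]:
    """Keep up to num_pages discovered pages centred on the current page."""
    half = (num_pages - 1) // 2  # 3 on each side for the default of 7
    return [page for page in sorted(discovered)
            if current - half <= page <= current + half]
```

With the default, page 42 of a long section would show pages 39 to 45; with numPages set to 5, it would show pages 40 to 44.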

# Local and preview branch behaviour

Section pagination uses the absolute links provided by the tenant's HTML, so keep in mind that clicking a section page in a local or preview environment takes you to the production environment.