# PhantomJS

*Deprecated*

PhantomJS is being replaced by Puppeteer.

PhantomJS is a headless web browser scriptable with JavaScript. At Marfeel, we use it to run the whiteCollar, which extracts and organises section pages.

This article describes all the options for running it locally and the error codes it can generate. Those error codes are visible both in a local environment during development and in the documents stored in Kibana, under the invalidations index.

WARNING

PhantomJS doesn't support ES2015+ JavaScript.

# Command line

The script that runs PhantomJS is mrf-phantomjs, whose implementation is in MarfeelXP/Jinks.

If you don't pass one of the options that specify the section to extract, this command always extracts the home section.

TIP

You can run mrf-phantomjs -h from anywhere in the console to see all the options.

Help output:

mrf-phantomjs [-h] [-e | -p SECTIONNUM | -n SECTIONNAME] [-b SUBFOLDER]
              [-g PAGENUM] [-w WCPATH] [-u URL] [-s USERAGENT]
              [-m METADATA] [-l] [-d] [-v]

optional arguments:
  -h, --help            show this help message and exit
  -e, --extract         extract manually
  -p SECTIONNUM, --sectionNum SECTIONNUM
                        the number of the section you want to extract
  -n SECTIONNAME, --sectionName SECTIONNAME
                        the name of the section you want to extract
  -b SUBFOLDER, --subfolder SUBFOLDER
                        the subfolder of the tenant
  -g PAGENUM, --pageNum PAGENUM
                        the number of the page you want to extract
  -w WCPATH, --wcPath WCPATH
                        the path of the whiteCollar script
  -u URL, --url URL     the url of the page you want to extract
  -s USERAGENT, --useragent USERAGENT
                        the useragent you want to use
  -m METADATA, --metadata METADATA
                        the metadataProviders to use
  -l, --legacy          use legacy alibaba in order to support apiOrder,
                        layout and disableSortByRelevance
  -d, --debug           debug in safari at localhost:9001
  -v, --verbose         show traceback logs
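
In the usage line, -e, -p and -n share one bracket and are separated by `|`, which conventionally means that at most one of them can be given. As a minimal sketch of that rule, the Python helper below assembles an argument list using only flags that appear in the help output. `build_command` and its parameter names are hypothetical and are not part of Jinks; the mutual-exclusion check is an assumption read from the usage line, not from the script's source.

```python
from typing import List, Optional


def build_command(
    section_num: Optional[int] = None,
    section_name: Optional[str] = None,
    page_num: Optional[int] = None,
    debug: bool = False,
    verbose: bool = False,
) -> List[str]:
    """Assemble an mrf-phantomjs argument list (hypothetical helper)."""
    # -p and -n sit in the same "[ ... | ... ]" group of the usage line,
    # so this sketch assumes they cannot be combined.
    if section_num is not None and section_name is not None:
        raise ValueError("use either section_num (-p) or section_name (-n), not both")

    command = ["mrf-phantomjs"]
    if section_num is not None:
        command += ["-p", str(section_num)]
    elif section_name is not None:
        command += ["-n", section_name]
    # With no section selector, the home section is extracted.
    if page_num is not None:
        command += ["-g", str(page_num)]
    if debug:
        command.append("-d")
    if verbose:
        command.append("-v")
    return command
```

For example, `build_command(section_num=2, verbose=True)` returns `['mrf-phantomjs', '-p', '2', '-v']`, which could then be passed to `subprocess.run` from inside the tenant's repository.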

# Exit codes

Marfeel uses the following set of exit codes to pinpoint any issue or controlled error that arises during this extraction.

  • 3 Fail loading page from arg[1]: The page PhantomJS is trying to extract items from cannot be loaded. That is, the URL is not valid.

  • 4 Fail injecting whiteCollar script: The page loads correctly, but the whiteCollar being used is not found.

  • 5 No items found on loaded page: No items could be found on the page, but the body size is greater than 1KB.

  • 7 Tenant whiteCollar script not present: The whiteCollar has a bad configuration.

  • 8 Tenant whiteCollar script malformed: The whiteCollar is not configured correctly.

  • 11 whiteCollar script failed to extract items: The whiteCollar failed to extract and format items for Marfeelization.

  • 12 MetadataProvider error: Extracting the active MetadataProviders failed, or the MetadataProviders were not added in the metadataProvider JS files.

  • 13 Redirection response without redirectURL: PhantomJS is unable to handle the redirection.

  • 14 Fail loading request from page: A resource requested by the page failed to load.

  • 16 Redirection loop encountered: The URL is a redirection loop and will never end up with a valid 200 response.

  • 17 Client timeout: PhantomJS could not retrieve the answer from the tenant because it took too long.

  • 18 Empty Body: Thrown when the body of the retrieved section is smaller than 1KB.

  • 19 Redirect found: Thrown when a redirect is found. Only used on the consumer profile for now.

  • 255: Something unexpected occurred that cannot be identified.
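
When running extractions from a script, the codes listed above can be turned into readable messages with a simple lookup. The following Python sketch is only an illustration: `EXIT_CODES`, `describe_exit_code` and `run_extraction` are hypothetical names that do not exist in Jinks, and the sketch assumes mrf-phantomjs is on the PATH and reports these values as its process exit status.

```python
import subprocess
from typing import List, Tuple

# Exit codes and short titles, transcribed from the list above.
EXIT_CODES = {
    3: "Fail loading page from arg[1]",
    4: "Fail injecting whiteCollar script",
    5: "No items found on loaded page",
    7: "Tenant whiteCollar script not present",
    8: "Tenant whiteCollar script malformed",
    11: "whiteCollar script failed to extract items",
    12: "MetadataProvider error",
    13: "Redirection response without redirectURL",
    14: "Fail loading request from page",
    16: "Redirection loop encountered",
    17: "Client timeout",
    18: "Empty Body",
    19: "Redirect found",
    255: "Unexpected, unidentified error",
}


def describe_exit_code(code: int) -> str:
    """Return the documented title for a code, or a fallback message."""
    return EXIT_CODES.get(code, f"Unknown exit code {code}")


def run_extraction(args: List[str]) -> Tuple[int, str]:
    """Run mrf-phantomjs with the given arguments (hypothetical wrapper).

    Assumes the command is installed; the return code is not checked here
    so that the caller can decide how to react to each documented code.
    """
    result = subprocess.run(["mrf-phantomjs", *args], check=False)
    return result.returncode, describe_exit_code(result.returncode)
```

Calling `describe_exit_code(17)` returns `"Client timeout"`, while any value missing from the list falls back to an "Unknown exit code" message instead of raising.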

# Debugging

To debug PhantomJS's behaviour in detail, you can use the Safari browser.

You might need to do so in the following cases:

  • Missing or malformed properties across the extracted items (articles). For instance, with a wrong uri selector, no articles will appear.
  • To test against the real DOM PhantomJS is working with (without the tenant's JS execution).
  • To spot bugs in the functions created in the whiteCollar.

  1. From the console, place yourself inside the tenant's repository and execute phantom with the -d option:
cd www.example.com
mrf-phantomjs -d -pX<section number>
  2. Open the Safari browser and go to localhost:9000
  3. In the browser console, type: document.marfeel.alibaba.execute() in order to start the extraction process.

WARNING

At this point, the whiteCollar has already been executed once, including any setup function it contains.

Running it again may create duplicated content.