Bad Robot!* Sneaky Bots in mPulse Data – For Retailers

*With apologies to JJ Abrams

mPulse collects massive amounts of RUM data every day: Navigations, SPA transitions, XHR calls, and page Resources, to name just some of what gets included. And all of it is collected from within real browsers.

However, a real browser can be light years removed from a real visitor. mPulse tries to eliminate as many automated visits as possible, but blacklists can’t move as fast as the automators: the people and organizations that develop and release scripted browsers into the world for purposes ranging from innocuous to malevolent.

The hardest things to detect, unless you have a system such as Akamai Bot Manager, are the patterns that indicate traffic is unusual, especially when that traffic is not enough to trigger a massive spike in the number of beacons being collected by mPulse. Those patterns do exist, though, and they can be spotted easily if you focus on a few key areas.

Watch your product pages

A favorite target for one species of bot is the product page (or PDP). This species, the price scraper, is designed to collect volumes of pricing data for competitors looking for some advantage in a hyper-competitive world. While relatively innocuous, the sheer volume of these requests can slow the experience of real visitors to the site, especially if the bots don’t behave as expected.

An example of this happened a few years ago, when a retailer saw a massive spike in requests (to the tune of thousands per day!) for a single product from a single location and browser version. This negatively affected the performance of the site as a whole.
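
This kind of spike is straightforward to look for if you can export beacon-level data. The sketch below is a minimal illustration, not an mPulse feature; the file name, the "timestamp", "product_id", "country", and "user_agent" columns, and the threshold are all assumptions for the example.

```python
import pandas as pd

# Hypothetical export of PDP beacons; column names are assumptions.
beacons = pd.read_csv("pdp_beacons.csv", parse_dates=["timestamp"])

# Count beacons per day for each product/location/browser combination.
daily = (
    beacons.groupby(
        [beacons["timestamp"].dt.date, "product_id", "country", "user_agent"]
    )
    .size()
    .rename("beacon_count")
    .reset_index()
)

# Flag combinations whose daily count is far above the norm; the
# 20x-median threshold is arbitrary and should be tuned.
threshold = daily["beacon_count"].median() * 20
suspicious = daily[daily["beacon_count"] > threshold]
print(suspicious.sort_values("beacon_count", ascending=False).head(10))
```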

Watch for Linux

Linux is a popular operating system…for servers. However, except in some very specific cases, it should not appear as one of the top 5 operating systems on your site. When it does, treat it as a red flag.

In the instance shown here, filtering the data to Linux only quickly revealed a scraper bot targeting this customer’s PDP content. And while this bot accounted for only 4% of the PDP beacons for that day, these requests skewed the median of the page group, increasing it from 2.03s (without Linux) to 2.09s (with Linux). This was due to a 20.66s median load time for the Linux bots.
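
The arithmetic behind that skew is easy to reproduce if you can work with beacon-level data. The following is a minimal sketch, not an mPulse feature; the file name and the "page_group", "os", and "load_time_ms" columns are assumptions for the example.

```python
import pandas as pd

# Hypothetical export of one day's beacons; column names are assumptions.
beacons = pd.read_csv("pdp_beacons.csv")
pdp = beacons[beacons["page_group"] == "PDP"]

all_median = pdp["load_time_ms"].median()
no_linux_median = pdp.loc[pdp["os"] != "Linux", "load_time_ms"].median()
linux_median = pdp.loc[pdp["os"] == "Linux", "load_time_ms"].median()

# Even a small, very slow population shifts the page-group median.
print(f"All PDP beacons:  {all_median / 1000:.2f}s")
print(f"Excluding Linux:  {no_linux_median / 1000:.2f}s")
print(f"Linux only:       {linux_median / 1000:.2f}s")
```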

These bots were using contemporary versions of popular browsers (Chrome/70 and Firefox/66, the release versions at the time the data was collected), making them nearly indistinguishable from real traffic. The only dimension that flagged them as bots was the Linux OS.

In this example from another retailer, the Linux presence on the PDP is even greater, comprising 27% of the traffic, a share orders of magnitude greater than Linux represents in the real-user population.

Watch for Old Browser Versions

Another factor is that bots don’t always upgrade to the latest browser version at the same rate as real users, so finding a population of older browser versions in the data usually indicates one of two things:

  1. A population of real users who run old browsers due to corporate restrictions on software, or
  2. A population of bots that have not used the auto-upgrade feature to move to the latest version.

In the example at the right, the retailer sees its visitors upgrading Chrome on a regular cycle, with older versions aging out as expected thanks to Chrome’s auto-upgrade feature.

In the next example, the pattern is very different: Chrome/71 did not age out as the auto-upgrade feature would predict. Where Retailer 1 saw Chrome/71 age out by February 19, Retailer 2 saw the population of Chrome/71 stabilize at a level much higher than can be explained by residual, non-upgraded visitors.
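
One way to spot this pattern is to compare each version’s share of traffic with the newest version of the same browser family. The sketch below is a rough illustration under assumed file and column names and arbitrary cutoffs, not an mPulse feature; it simply flags versions that are well behind the current release yet still carry meaningful traffic.

```python
import pandas as pd

# Hypothetical export of a recent week's beacons; "browser" holds values
# like "Chrome/71". Column names and cutoffs are assumptions.
beacons = pd.read_csv("beacons_last_week.csv")

share = beacons["browser"].value_counts(normalize=True)

# Split "Chrome/71" into a family and a major version number.
versions = share.index.str.extract(r"(?P<family>\w+)/(?P<major>\d+)")
versions["share"] = share.values
versions = versions.dropna(subset=["family", "major"])
versions["major"] = versions["major"].astype(int)

latest_major = versions.groupby("family")["major"].transform("max")

# Versions two or more majors behind the newest observed release that still
# hold more than 1% of traffic did not age out the way auto-upgrade predicts.
stale = versions[(versions["major"] <= latest_major - 2) & (versions["share"] > 0.01)]
print(stale.sort_values("share", ascending=False))
```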

But it’s not just Chrome that is affected; Firefox can also be used to create bot traffic. In the example below, the largest real-user Firefox populations that should appear are versions 65 and 68. In the data, however, Firefox/60 and Firefox/38 are present in numbers far exceeding those of the real-user visitors: a clear indicator of bot traffic.

These bots also negatively affected the recorded performance of Firefox during this period, as the median load time for both Firefox/38 and Firefox/60 on the site was above 80 seconds.

Bots Matter in RUM

As shown by the data from two customers above, bots matter in a number of ways:

  • They can skew your mPulse performance metrics in a way that could lead to incorrect conclusions about the performance of key page groups.
  • They can inflate the metrics for certain OS and browser families in a way that could lead to incorrect assumptions about the composition of the visitor population.
  • They can cost the customer money, not just in inflated mPulse beacon counts, but in higher CDN and bandwidth usage bills.

While it is impossible to isolate and eliminate/block/trim all of them from mPulse data, watching for some of these signals can help organizations realize that bots could be a larger issue than they think, requiring more effective remediation than simple blacklists and filter rules.

Performance Trends for ???? – Smarter Systems

IF-repair by Yo Mostro (Flickr)

Most of the trending items that I have discussed in the last two weeks are things that can be done today: problems that we are aware of and know need to be resolved. One item on my trend list, the appearance of smarter performance measurement systems, is something the WPO industry may have to wait a few years to see.
A smarter performance measurement system is one that can learn what, when, and from where items need to be monitored by analyzing the behavior of your customers/employees and your systems. A hypothetical example of a smarter performance measurement system at work would be the connection between RUM and synthetic monitoring. Everyone in WPO claims that these must be used together, but today it takes humans to do the configuration that delivers the advantages of combining them. If RUM/analytics know where your customers are, what they do, and when they do it, then why can’t these same systems deploy (maybe even create and deploy!) synthetic tests to those regions automatically to capture detailed diagnostic data?
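
That last step is easy to sketch as a thought experiment. The example below is purely hypothetical: the file name, the "country" column, the 5% traffic cutoff, and the test configuration format are all invented for illustration and are not the API of any real product.

```python
import pandas as pd

# Hypothetical export of RUM beacons with a "country" column.
rum = pd.read_csv("rum_beacons.csv")

# Pick the regions that account for the bulk of real-user traffic.
traffic_share = rum["country"].value_counts(normalize=True)
target_regions = traffic_share[traffic_share > 0.05].index  # 5% cutoff is an assumption

# Emit a (made-up) synthetic test configuration for each target region.
synthetic_tests = [
    {
        "url": "https://www.example.com/",
        "region": country,
        "frequency_minutes": 15,
        "reason": f"{traffic_share[country]:.0%} of RUM traffic",
    }
    for country in target_regions
]
for test in synthetic_tests:
    print(test)
```
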
Why do measurement systems rely on us to manually configure the defaults for measurements? Why can’t we take a survey when we start with a system (and then every month or so after that) that helps it determine the what/when/where/why/how of the data and information we are looking to collect, and have the system create a set of test deployment defaults and information displays that match our requirements?
The list of questions goes on, but it doesn’t have to. Measurement systems have, for too long, been built to rely on expert humans to configure and interpret results. Now we have a chance to step back and ask “If we built a performance measurement system for a non-expert, what would it look like?”
More data isn’t the goal of performance measurement systems – more information is what we want.

Managing Performance Measurement: Who uses this stuff anyway?

Clogged Pipe by staale.skaland (Flickr)

One of the least glamorous parts of managing performance measurement data is the time I have to take every month to wade through my measurements and decide which stay on and which get shut off. Since I’m the only person who uses my measurement account, this process usually takes less than 10 minutes, but can take longer if I’ve ignored it for too long.
For large organizations that are collecting data on multiple platforms, this process may be more involved. By the time you look at the account, the tests have likely accumulated for months or years, collecting data that no one looks at or cares about. They remain active only because no one owns them, so no one ever asks to disable them.
What can you do to prevent this? Adding some measurement management tasks to your calendar will help prevent Performance Cruft from clogging your information pipes.

  1. Define who can create measurements. When you examine account permissions on your measurement systems, do you find that far more people than are necessary (YMMV on this number) have measurement creation privileges? If so, why? If someone should not have the ability to create new measurements, then take the permissions away. Defining a measurement change policy that spells out how measurements get added will help you reduce the amount of cruft in your measurement system.
  2. Create no measurement without an owner. This one is relatively easy – no new measurement gets added to or maintained on any measurement system without having one or more names attached to it. Making people take responsibility for the data being collected helps you with future validations and, if your system is set up this way, with assigning measurement cost to specific team budgets. It’s likely that management will make this doubly enforceable by assigning the cost of any measurement that has no owner to the performance team.
  3. Set measurement expiry dates. If a measurement will be absolutely critical only during a specific time range, then only run the measurement for that time. There is no sense collecting data for longer than necessary, as you have likely already stored the data you need from that period for future analysis or comparisons.
  4. Validate measurement usage monthly or quarterly. Once names have been associated with measurements, the next step is to meet with all of the stakeholders monthly or quarterly to ensure that the measurements are still meaningful to their owners. Without a program of continuous follow-through, it will take little time for the system to get clogged again.
  5. Cull aggressively. If a measurement has no owner or is no longer meaningful to its owners, disable it immediately (a simple audit script, sketched after this list, can surface these candidates). Keep the data, but stop the collection. If it has no value to the organization, no one will miss it. If stopping the data leads to much screaming and yelling, assign the measurement to those people and reactivate it.
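
A simple audit script can take some of the drudgery out of steps 2, 3, and 5. The sketch below assumes a hypothetical inventory file with "name", "owner", and "expires" columns; it is not tied to any particular monitoring product.

```python
import pandas as pd

# Hypothetical inventory of measurements; column names are assumptions.
inventory = pd.read_csv("measurement_inventory.csv", parse_dates=["expires"])

today = pd.Timestamp.today()
no_owner = inventory["owner"].isna()
expired = inventory["expires"].notna() & (inventory["expires"] < today)

# Measurements with no owner or a lapsed expiry date are cull candidates:
# keep the data already collected, but stop collecting more.
cull_candidates = inventory[no_owner | expired]
print(cull_candidates[["name", "owner", "expires"]])
```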

Managing data collection is not the sexiest part of the web performance world, but admitting you have a data collection cruft problem is the first step along the path of effective measurement management.