One of the most important things to do with any sort of metric is to make sure that it’s measuring what you’re expecting it to measure, and that’s true in performance as well.
I remember a company telling me years ago that how well their site performed was judged against a certain testing company’s industry benchmarks. There were many issues with those benchmarks, but one of the least obvious was that the browsers the tests were run on were badly outdated. As a result, certain optimizations just weren’t going to do a thing for them. Resource hints were just emerging, and the site would have benefited from them greatly, but the browsers running the tests didn’t support them, so until we got the company focused on the right tools and the right metrics, shipping that optimization wouldn’t appear to move the needle at all.
Other times, metrics just tell us the wrong thing. Part of the reason for the emergence of more user-experience-focused metrics is that load time simply wasn’t a very useful metric for a lot of companies. For some it was, sure, but for others, the point at which the load event fired was fairly arbitrary and completely detached from the user experience.
We still battle those issues today, but it can feel much worse.
I was helping a company analyze their Largest Contentful Paint.
We found that in some situations, a loading image was being flagged as the LCP element. Chrome tries to filter out low-content images by looking at their bits per pixel, and this one was right on the edge of that threshold. By compressing the image by just 790 more bytes, we could push it below the threshold so it was no longer considered an LCP candidate, shaving ~20s off the reported Largest Contentful Paint.
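For context, here’s a rough sketch of how that heuristic behaves. The 0.05 bits-per-pixel cutoff is my understanding of Chrome’s current threshold (it may change between versions), and the sizes and dimensions below are invented for illustration; the point is that a difference of roughly 790 bytes is enough to flip which element gets reported.

```js
// Rough sketch of Chrome's low-entropy filter for LCP image candidates.
// The 0.05 bits-per-pixel cutoff is an assumption based on my reading of
// current Chrome behavior; the numbers below are made up for illustration.
const LOW_ENTROPY_THRESHOLD = 0.05; // bits per pixel

function countsTowardLcp(encodedBytes, displayedWidth, displayedHeight) {
  const bitsPerPixel = (encodedBytes * 8) / (displayedWidth * displayedHeight);
  return bitsPerPixel >= LOW_ENTROPY_THRESHOLD;
}

// A small placeholder image displayed at 1200×800:
countsTowardLcp(6100, 1200, 800); // true  (~0.051 bpp): eligible to be the LCP element
countsTowardLcp(5310, 1200, 800); // false (~0.044 bpp): ignored, so LCP falls to other content
```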
Now, the goal of LCP is to provide you a measurement of when the largest, and by association, ideally the most important, piece of content loads. That image was absolutely not the most important content. And while fixing the issue did improve LCP by quite a bit, it had absolutely no impact on the user experience.
That’s not to say the optimization was useless. By tidying things up, the LCP metric became more stable, more reliable, and less noisy. It was a win, just not one that made any impact on user experience or business metrics—it was a win mostly for the metric itself.
That’s not really anything new with metrics, but the stakes do feel a bit different now that metrics like LCP are tied to search engine optimization. Sometimes it can feel like we’re working to appease the search engine rather than improve the user experience.
Don’t get me wrong, I love the core web vitals initiative (sounds so cool when you say “initiative”…like the Avengers or something) and I think it’s done far more good than bad for the web. But I do worry that we’ve gotten a little shallow with our performance work in general. There are a lot of companies focused solely on core web vitals in their performance work, and a steadily rising number of tools, agencies, and consultancies that do the same.
But it’s important to remember that while the core web vital metrics attempt to be very user focused (and in general, I would argue they do a good job of it), there are exceptions. There are times when those metrics simply aren’t measuring what matters. It doesn’t mean we ignore them—particularly with the search incentive behind them—but it does make it more important than ever to understand what those metrics are measuring on our own sites and, where they fall short, to find ways to measure what really matters to our users and businesses.
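As one hedged example of what that can look like in practice: the Element Timing API lets you tell the browser which element you consider important and report when it actually renders, alongside or instead of LCP. Support is currently limited to Chromium-based browsers, and the “hero-image” identifier and “/analytics” endpoint below are placeholders, not a prescription.

```js
// Markup (placeholder name): <img src="hero.jpg" elementtiming="hero-image">
const observer = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    // renderTime is when the annotated element was painted; it falls back to
    // loadTime for cross-origin images served without Timing-Allow-Origin.
    navigator.sendBeacon(
      '/analytics', // placeholder endpoint
      JSON.stringify({
        metric: entry.identifier, // "hero-image"
        value: entry.renderTime || entry.loadTime,
      })
    );
  }
});
observer.observe({ type: 'element', buffered: true });
```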
Philip Walton is always a must-read, and his latest is no exception.
Phil took a look at the state of ES5 transpilation on the web, and the findings weren’t exactly encouraging.
If you look at the data below on how popular websites today are actually transpiling and deploying their code to production, it turns out that most sites on the internet ship code that is transpiled to ES5, yet still doesn’t work in IE 11—meaning the transpiler and polyfill bloat is being downloaded by 100% of their users, but benefiting none of them.
First off, very, very few companies are actively supporting IE 11 these days, so for the vast majority of sites there’s virtually no reason to be doing this anymore.
But, as Phil noted, the default configuration of Babel—the most popular JavaScript transpiler—is still to transpile to ES5. Defaults matter a ton, and one of the best ways to prevent these kinds of issues is for tools to be smart about the default settings they ship.
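If you’re using Babel, the fix is usually just to tell @babel/preset-env what you actually need to support rather than leaning on the default. A minimal sketch, assuming @babel/preset-env and core-js 3 are installed; the specific targets are placeholders you’d swap for your own browser data:

```js
// babel.config.js
module.exports = {
  presets: [
    [
      '@babel/preset-env',
      {
        // Target browsers that support <script type="module"> instead of the
        // ES5-era default, so modern syntax is left untouched.
        targets: { esmodules: true },
        // Only include polyfills for features the code actually uses.
        useBuiltIns: 'usage',
        corejs: 3,
      },
    ],
  ],
};
```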
We knew this was coming, but Chrome officially ended support for First Input Delay. At least, in their tools.
The metric itself can still be recorded, but they’ve removed the data from CrUX, PageSpeed Insights, the web-vitals.js library, and all the connected tooling.
It’s been a long time coming. FID was really only useful before it became a widely available, widely publicized metric. There was some platform-level optimization that happened because of it, for example, but by the time it was a metric sites had to pay attention to, it was mostly useless. That was particularly true because the “good” threshold set for it, 100ms, was almost criminally high.
So far, at least in my experience, Interaction to Next Paint has been a much more useful and interesting metric.
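If you’re collecting field data with the web-vitals.js library, moving over is straightforward. A minimal sketch (the analytics endpoint is a placeholder):

```js
// Requires web-vitals v3 or later, which reports INP (onFID has since been
// removed from the library entirely).
import { onINP, onLCP } from 'web-vitals';

function sendToAnalytics(metric) {
  // metric.name, metric.value, and metric.rating come from the library.
  navigator.sendBeacon('/analytics', JSON.stringify(metric)); // placeholder endpoint
}

onINP(sendToAnalytics);
onLCP(sendToAnalytics);
```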