The Blind Spots of Google Lighthouse Score for Web Vitals


[Video and transcription of my BrightonSEO talk, April 2023]

Hello, everyone. I’m Aymen Loukil, international SEO and Web Performance consultant. Today, I will share with you how a few years ago Google Lighthouse made me fail and lose money so you don’t have to face what I faced.

=> I prefer to watch the recording

So, back to my 2017 journey in web performance consulting, I was a big fan of Google Lighthouse. And this was my web performance workflow: start with Google Lighthouse and finish with Google Lighthouse. And between these two steps, we edit, implement, and try to optimize things, right?

So, for one of my biggest customers, an international e-commerce website, we were working on optimizing the PLP (Product Listing Page) template on mobile. And we were able to drastically improve the Google Lighthouse score, moving from 42 to 63, and that was awesome, really, because web developers love making an impact. And me, as a consultant, I need to help my customers.

So, this was me, excited, asking to deploy to production. “Please deploy it. I’m eager to see how this will behave in production,” right?

So, you know what? We deployed. Then we needed to wait at least 28 days for the next monthly CrUX (Chrome User Experience Report) dataset, to see whether it benefited our users or not.

So, every second Tuesday of the month, the monthly dataset is released and we can see if we really improved things. And, you know, on your website, many things can happen in one month: features added, bug fixes, regressions, you know? That’s really too long to wait before being able to say, “Okay, we optimized things for our users,” right?

So, drum roll. And when the dataset went live, unfortunately, users faced a deterioration, not an improvement, on our main metric. LCP, for example, moved from 73% good to 70%.

I was disappointed. What we did here was a shot in the dark. And that feels bad as a consultant, because my daily job is to help my customers make an impact and make websites faster for real users. So, I needed to reflect on this, to understand why it happened and why there was a mismatch between Google Lighthouse data and users’ data, which is disappointing. And, yes, we need to justify our work to our customers.

We were excited. Google Lighthouse improved, but ultimately, we didn’t help our users. So, what’s wrong with the Google Lighthouse scoring system? This is the question I was trying to answer, and it wasn’t simple.

What’s wrong with the Google Lighthouse score?

So, consider we have 100 users on our website. Does what we did on the PLP page fall somewhere in there? Does it represent a real user? I don’t know. Maybe yes, maybe no.

Let’s have a look at a PageSpeed Insights report showing both datasets, Lighthouse and field (real users’) data, right? I’m sorry, but it’s a real mess. This data is confusing. While Google Lighthouse reports a perfect CLS score of 0, 75% of our users are facing a bad CLS. And looking at LCP, 75% of our users get a perfect Largest Contentful Paint, which is great, while Google Lighthouse says we have a three-second LCP, which is not good at all.

So, this doesn’t surprise me, because emulation isn’t real life. I’m sorry, Nintendo, but playing tennis in real life doesn’t feel like playing tennis on a games console.

Many brag about their Google Lighthouse achievements

So, every day, there are many people bragging about their Google Lighthouse achievements. Yeah, the perfect score, 100. Which is good in a way, because we are making real progress. Maybe you are impacting your users, and maybe not. But I deeply understand these people, because I was also part of this in 2018. And web performance score obsession isn’t new. As humans, we have always been assessed by, and obsessed with, grades, you know: the education system, medical exams, and so on.

Does the Google Lighthouse score correlate with user data?

So, okay, let’s go back. Google Lighthouse data doesn’t match users’ data. Do they even correlate in some way? Brendan Kenny, a Googler, did some research on HTTP Archive data and found that 50% of pages with a perfect Google Lighthouse score don’t pass Web Vitals. 50%! With a Lighthouse score of 100, you have a 1-in-2 chance. And even with a score of 50, you may still pass the Web Vitals assessment. There is a blind spot somewhere in the Google Lighthouse scoring system, isn’t there?

So, the question is, do they have anything in common? Of course, they do. Both of them load web pages with a device, or a device emulation, in a given context, right? The main difference is that users interact with your website: they buy products, zoom, scroll, and fill in forms. Lighthouse doesn’t.

So, talking about metrics… sorry, First Input Delay and Interaction to Next Paint, but Google Lighthouse doesn’t interact. Lighthouse only reports a partial CLS, the above-the-fold CLS, with no scrolling, and its LCP depends on the test conditions. So, ultimately, they don’t share as many metrics as we think they do.

Google Lighthouse has variability issues:

One day, a developer sent me a question by email: why, when we run three consecutive tests on the same page under the same conditions, do we never get the same score? And this was my answer: it’s like Rock, Paper, Scissors! Sometimes, you can’t even say that the score improved thanks to what you did.

We can trick/cheat the Google Lighthouse score:

Barry Pollard published a post, “Making the slowest fast page”: a perfect Lighthouse score and a frustrating user experience.

Three months ago, the web agency of one of my customers sent us an email: “We fixed all the web performance issues on the website”, three exclamation marks, with a screenshot. And I was like, “Pardon?” I mean, okay, I checked the website. Oh my God, they had hacked it. They added this code to the source code. Please don’t do this. You are just misleading people, you are just cheating, and the Google Lighthouse score doesn’t impact SEO anyway, right?

So, yeah, this code is cloaking code. It checks whether the page is being loaded by Google Lighthouse, and if so, it serves bare HTML only. “We solved all the web performance issues, we can go back home, we no longer need consultants or agencies.” So, I answered, “Do you mainly care about your users, or about Google Lighthouse?” That is the question.
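For the curious, the trick boils down to a server-side User-Agent check: Lighthouse identifies itself with “Chrome-Lighthouse” in its User-Agent string. Here is a minimal sketch of the anti-pattern, shown so you can recognize it, not use it; the page names are placeholders:

```python
# Anti-pattern: cloaking for Lighthouse. Do NOT do this on a real site.
# Lighthouse's User-Agent string contains "Chrome-Lighthouse", so a server
# can detect the audit and serve a stripped-down page just for the test.

def is_lighthouse(user_agent: str) -> bool:
    """Return True if the request appears to come from Google Lighthouse."""
    return "Chrome-Lighthouse" in user_agent

def choose_page(user_agent: str) -> str:
    """Cloaking: bare HTML for Lighthouse, the real page for everyone else."""
    if is_lighthouse(user_agent):
        return "bare.html"  # near-empty page: perfect score, meaningless test
    return "full.html"      # the real, heavy page your users actually get
```

The perfect score this produces tells you nothing about what real users experience, which is exactly the point of the story above.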

Your users are different

Your users are different. And as Nick mentioned, we often make assumptions: that 3G, 4G, and 5G are everywhere, that device CPUs get faster every year. And you know what? I checked the Google Analytics devices report, and the iPhone is our top audience device. Yeah, iPhone, everywhere.

I’m sorry, but you are just making assumptions, because the iPhone isn’t real life. iPhone and iOS devices only have about 30% of the worldwide market share; the other 70% are Android devices.

So, hear me out. Your users potentially have 2019 or 2020 devices, maybe with a cracked screen. As a user, I don’t care about your Google Lighthouse score or your technical stack, or even whether it’s ChatGPT-powered. We don’t care. We just need information, or to complete some task. I’m looking for insurance, I’m looking to buy a product; I don’t care about your technical scoring stuff, right?

Rely on your users’ data when it comes to web performance:

Thanks to that reflection work, I was able to transform my failed web performance workflow, and also to create Speetals, a real user monitoring tool. So, I transformed my workflow into this.

If you want to impact users’ data, you need to start by auditing it. If you want to impact something, you need to monitor and audit that thing, not audit with other tools that don’t represent your users.

So, start by auditing in the field, meaning on users’ data, and quickly validate against your users’ data. Between these two steps, of course, we use synthetic tools: Lighthouse, Chrome DevTools, or your favorite tool on the market, right? So, listen to your users’ experience and validate your efforts with them. This is the core idea of this workflow.

Of course, for this first step, you can use your own real user monitoring tool. You can enjoy writing SQL queries and building Looker Studio dashboards on CrUX data; I created Speetals to make this happen more easily and simply.

So, let’s start with the first step. We need to hear our users’ experience. We no longer talk about scores; we talk about distributions, the green, orange, and red distribution: good, needs improvement, poor, right? So, when I audit website performance, I put mobile and desktop side by side. That helps me identify improvement points, but also distribution gaps. We sometimes assume that if our mobile CLS is good, our desktop one must be too, which is not always the case. Also, here, we have a First Input Delay problem on mobile, which makes sense, because interaction needs CPU, etc. So, always compare devices and distributions across the metrics.

Here, I’m just showing Core Web Vitals metrics, but I also check Time to First Byte, First Contentful Paint, and INP, the new Core Web Vitals metric. Okay. Another thing I love to do, and which is very helpful, is to see how every metric on every device is distributed, thanks to a histogram. This is, for example, the LCP histogram showing the LCP our users are experiencing. On the X-axis, we have LCP values in milliseconds, and on the Y-axis, the percentage of users, plus markers for the good and poor thresholds. And you see this P75 marker? This is the 75th percentile. It means that 75% of your users experience an LCP of up to 6 seconds here, which is crazy, far too slow. So, when we optimize web performance on websites, we need to push people to the left. The goal is to push our 75th percentile to the left, into the green, the safe zone. And if the P75 lands there, the metric turns green, and it passes Web Vitals.
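To make that P75 marker concrete: the 75th percentile can be estimated from a CrUX-style histogram, where each bin has a start and end value in milliseconds and a density (the share of page loads in that bin), by linear interpolation. The distribution below is invented for illustration:

```python
# Estimate a percentile from a CrUX-style histogram. Each bin is
# (start_ms, end_ms, density), where density is the share of page loads
# that fell into that bin. The example LCP distribution is made up.

def percentile_from_histogram(bins, p=0.75):
    """Linearly interpolate the p-quantile inside the bin that crosses it."""
    cumulative = 0.0
    for start, end, density in bins:
        if cumulative + density >= p:
            fraction = (p - cumulative) / density  # how far into this bin
            return start + fraction * (end - start)
        cumulative += density
    return float(bins[-1][1])  # p lies beyond the recorded data

# Hypothetical LCP distribution: 40% good, 25% needs improvement, 35% poor.
lcp_bins = [(0, 2500, 0.40), (2500, 4000, 0.25), (4000, 12000, 0.35)]
p75 = percentile_from_histogram(lcp_bins)  # ~6286 ms, about 6.3 seconds
```

Pushing users “to the left” means growing the density of the early bins, which mechanically drags this interpolated P75 down toward the green zone.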

Another thing I love to do, which is very helpful, is to compare metric distributions against the competition. Okay? I’m not going to name the brand, but it can be surprising, even shocking, to see your website performing poorly against the competition. This helps set goals, but also challenges teams and builds a performance culture inside your team: “Yes, okay, let’s go compete with them.”

Who here has an international website? Great. Don’t forget to compare your audience’s distribution in each of your target markets. This is one of my customers, a French-based marketplace, but an international website on a single domain name, right? It’s a .com domain. So, it is so important to check the gap between users in, say, European countries and users in the Middle East. Don’t neglect this.

Step two. Okay. We have plenty of domain-level data: distributions, metrics. We already know which metric, on which device, we need to work on. Now, we need to do the right thing first to make the most impact. So, here, we transition from domain-level data to page-level data. It’s useless to monitor every page of your website; it doesn’t make sense. What makes more sense is to take your top page types: say a PLP, a sign-up page, a help page, a blog, a PDP, etc. And what I do is sort these page types from worst to best performing on each metric. Here, for example, I sorted them by LCP. So, if you want to improve that figure on mobile or desktop, start with the low-hanging fruit: the page type, here the main category, with the most red in its LCP distribution. Right?

Step three. This is what I was already doing in 2017: implement and test in the lab. You can use Google Lighthouse; I love Lighthouse. And then, of course, the most important step. Okay. You and your customer’s team made efforts. You invested time and money in improving web performance. Now, you need to know whether you hit the mark. So, don’t wait 28 days; that’s too late to validate, and you won’t see whether you made an impact. Instead, do daily validation based on CrUX data. So, here, for example, we deployed a CLS fix on the PDP, on desktop, and made 34% of progress. We deployed here, and two days later, we could already see the impact, whether it was improving or regressing, right? It’s important to have these iterations, this rhythm of deploy-then-validate within a one-week window. You won’t need more than one week to validate your efforts.
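For this kind of daily validation, the public CrUX API can be queried directly. Here is a minimal sketch, assuming you have an API key; the page URL is an example, while the endpoint and response shape follow the CrUX API documentation:

```python
# A minimal sketch of a daily field-data check against the CrUX API.
# The real request is a POST to the endpoint below with ?key=YOUR_API_KEY.
import json

CRUX_ENDPOINT = "https://chromeuxreport.googleapis.com/v1/records:queryRecord"

def build_query(page_url: str) -> str:
    """Request body asking CrUX for LCP field data on one page, phones only."""
    return json.dumps({
        "url": page_url,
        "formFactor": "PHONE",
        "metrics": ["largest_contentful_paint"],
    })

def lcp_p75(crux_response: dict) -> int:
    """Extract the LCP 75th percentile (in ms) from a CrUX API response."""
    metric = crux_response["record"]["metrics"]["largest_contentful_paint"]
    return int(metric["percentiles"]["p75"])

# Truncated, hypothetical response, following the documented shape:
sample = {"record": {"metrics": {"largest_contentful_paint":
          {"percentiles": {"p75": 2300}}}}}
passes_good = lcp_p75(sample) <= 2500  # 2.5 s is the "good" LCP threshold
```

Running a check like this every day, per page type and per device, is what closes the loop within a one-week window instead of a 28-day one.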

Another example: a 7% improvement in LCP on mobile on the PLP. Deployed, validated, and you move on to other tickets, other optimizations, and you iterate. This is the principle of my workflow and my methodology. So, remember the 75th percentile: when you grow the green distribution, your 75th percentile chart must go down. You are pushing people to the left. Here, our 75th percentile moved from 4 seconds to 2 seconds, which is great. That means the metric was in the red zone of Web Vitals and is now going green. So, validated.

And remember my frustration. You can prevent it by detecting regressions fast. So important. Here, a regression occurred on the PDP because the marketing team decided to add a sliding reviews widget, and since every review has a different size, it generated a lot of CLS. Thanks to Slack alerts, we were able to detect it fast and react fast.
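A regression alert of this kind boils down to a simple comparison of today’s distribution against a baseline. A minimal sketch, where the five-point tolerance and the message format are my own assumptions, not Speetals’ actual logic:

```python
# Flag a CLS regression by comparing the share of "good" page loads today
# against a baseline. Tolerance and message format are illustrative only.

def cls_regressed(baseline_good_pct: float, today_good_pct: float,
                  tolerance: float = 5.0) -> bool:
    """True when the good-CLS share drops by more than `tolerance` points."""
    return (baseline_good_pct - today_good_pct) > tolerance

def check_and_alert(page: str, baseline: float, today: float):
    """Return an alert message (e.g. to post to Slack) or None if all is fine."""
    if cls_regressed(baseline, today):
        return f"CLS regression on {page}: good share {baseline:.0f}% -> {today:.0f}%"
    return None
```

Wired to a daily CrUX refresh and a Slack webhook, a check like this turns a 28-day surprise into a two-day fix.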

Of course, finally, we still validate monthly, to confirm the outcomes, but also to report to management: yes, we invested, we made efforts, and it’s validated. That’s how you keep buy-in and sponsorship for web performance work, and it is so important to keep the trend going. Here, the website went through a migration and lost a lot of web performance, and here is where we started working on it again.

So, to sum up: never rely on Google Lighthouse alone to monitor web performance. Use Lighthouse, use your synthetic tools, but focus mainly on your users’ data, because at the end of the day, your users are the ones buying your services or products, not Google Lighthouse. Okay? Fast validation is essential. Don’t wait 28 days. Keep the validation cycle really short, validate the outcomes, and enjoy. Thank you very much for coming. Thank you for attending. Thank you, Brighton.

Want to avoid all the above pitfalls?

Monitor your website with Speetals, a user-centric site speed tool. Focus on your users: prioritize, optimize, validate, and repeat!

[Transcription by Speechpad]

Aymen Loukil
Web Performance Consultant and Speetals Founder

