
What’s Up With Tests

This is a transcript of What's Up With That Episode 4, a 2022 video discussion between Sharon (yangsharon@chromium.org) and Stephen (smcgruer@chromium.org).

The transcript was automatically generated by speech-to-text software. It may contain minor errors.


Testing is important! What kinds of tests do we have in Chromium? What are they all about? Join in as Stephen, who led Chrome's involvement in web platform tests, tells us all about them.

Notes:


00:00 SHARON: Hello, everyone, and welcome to "What's Up With That," the series that demystifies all things Chrome. I'm your host, Sharon. And today we're talking testing. Within Chrome, there are so many types of tests. What are they all? What's the difference? What are the Chromium-specific quirks? Today's guest is Stephen. He previously led Chrome's involvement in web platform tests. Since then, he's worked on rendering, payments, and interoperability. As a fun aside, he's one of the first people I met who worked on Chrome and is maybe part of why I'm here today. So welcome, Stephen.

00:33 STEPHEN: Well, thank you very much for having me, Sharon, I'm excited to be here.

00:33 SHARON: Yeah, I'm excited to have you here. So today, we're in for maybe a longer episode. Testing is a huge topic, especially for something like Chrome. So grab a snack, grab a drink, and let's start. We'll start with what are all of the things that we have testing for in Chrome. What's the purpose of all these tests we have?

00:51 STEPHEN: Yeah. It's a great question. It's also an interesting one because I wanted to put one caveat on this whole episode, which is that there is no right answer in testing. Testing, even in the literature, never mind in Chromium itself, is not a solved problem. And so you'll hear a lot of different opinions. People will have different thoughts. And I'm sure that no matter how hard we try, by the end of this episode, our inbox will be filled with angry emails from people being like, no, you are wrong. So all of the stuff we're saying here today is my opinion, albeit I'll try and be as useful as possible. But yeah, so why do we test was the question, right? So there's a lot of different reasons that we write tests. Obviously, correctness is the big one. You're writing some code, you're creating a feature, you want it to be correct. Other reasons we write them, I mean, tests can be useful as a form of documentation in itself. If you're ever looking at a class and you're like, what does - why is this doing this, why is the code doing this, the test can help inform that. They're also useful - I think a topic of this podcast is sort of security. Tests can be very useful for security. Often when we have a security bug, we go back and we write what are called regression tests, so at least we try and never do that security failure again. And then there are other reasons. We have tests for performance. We have tests for - our launch process uses tests. There's lots and lots of reasons we have tests.

02:15 SHARON: Now that you've covered all of the different reasons why we test, how do we do each of these types of tests in Chromium? What are the test types we have?

02:27 STEPHEN: Yeah. So the main test types we have in Chromium are unit tests, browser tests, what we call web tests, and then a bunch of more specialized ones: performance tests, testing on Android, and of course manual testing.

02:43 SHARON: We will get into each of these types now, I guess. The first type of test you mentioned is unit tests. Why don't you tell us a quick rundown of what unit tests are. I'm sure most people have encountered them or heard of them before. But just a quick refresher for those who might not.

02:55 STEPHEN: Yeah, absolutely. So as the name implies, a unit test is all about testing a unit of code. And what that is isn't very well defined. But you can usually think of it as just a class, a file, a small isolated component that doesn't have to talk to all the other bits of the code to work. Really, the goal is to write something that tests just the code under test - so that new method you've added or whatever. And it should be quick and easy to run.

03:22 SHARON: So on the screen now we have an example of a pretty typical unit test we see in Chrome. So there's three parts here. Let's go through each of them. So the first type - the first part of this is TEST_P. What is that telling us?

03:38 STEPHEN: Yeah. So that is - in Chromium we use a unit testing framework called Google Test. It's very commonly used for C++. You'll see it all over the place. You can go look up documentation. The test macros, that's what this is, are essentially the hook into Google Test to say, hey, the thing that's coming here is a test. There are three types. There's plain TEST, which just says here is a function, and it is a test function. TEST_F says that you basically have a wrapper class. It's often called a test fixture, which can do some common setup across multiple different tests, common teardown, and that sort of thing. And finally, TEST_P is what we call a parameterized test. And what this means is that the test can take some input parameters, and it will run the same test with each of those values. Very useful for things like when you want to test a new flag. What happens if the flag is on or off?

04:34 SHARON: That's cool. And a lot of the things we're mentioning for unit tests also apply to browser tests, which we'll cover next. The parameterization is an example of something that carries over to both. So that's the first part. That's the TEST_P, the macro. What's the second part, PendingBeaconHostTest? What is that?

04:54 STEPHEN: Yeah. So that is the fixture class, the test container class I was talking about. So in this case, we're assuming that in order to write a beacon test, whatever that is, they have some setup, some teardown they need to do. They might want to encapsulate some common functionality. So all you have to do to write one of these classes is declare a C++ class and subclass it from the Google Test base class, testing::Test.

05:23 SHARON: So this is a TEST_P, but you mentioned that this is a fixture. So are fixture tests a subset of parameterized tests?

05:35 STEPHEN: Parameterized tests are a subset of fixture tests, is that the right way around to put it? All parameterized tests are fixture tests. Yes.

05:41 SHARON: OK.

05:41 STEPHEN: You cannot have a parameterized test that does not have a fixture class. And the reason for that is how Google Test actually works under the covers: it passes those parameters to your test class. You additionally have to extend from testing::WithParamInterface. And that says, hey, I'm going to take parameters.
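To make the fixture and parameter relationship concrete, here is a toy sketch in plain C++ that mimics the mechanism without depending on Google Test. In real code you would typically derive the fixture from `testing::TestWithParam<bool>` (which combines `testing::Test` and `testing::WithParamInterface<bool>`), read the value with `GetParam()`, and instantiate it with something like `INSTANTIATE_TEST_SUITE_P(All, PendingBeaconHostTest, testing::Bool())`. The `ToyWithParamInterface`, `ToyBeaconTest`, and `RunForEachParam` names below are hypothetical; the point is only that the runner, not the test body, hands each parameter value to a fresh fixture object.

```cpp
#include <vector>

// Toy stand-in for testing::WithParamInterface<T>; NOT Google Test.
// The "framework" stores the current parameter; the test body reads it.
template <typename T>
class ToyWithParamInterface {
 public:
  virtual ~ToyWithParamInterface() = default;
  const T& GetParam() const { return param_; }
  void SetParam(const T& param) { param_ = param; }  // called by the runner only

 private:
  T param_{};
};

// Hypothetical fixture: the bool parameter plays the role of a feature flag.
class ToyBeaconTest : public ToyWithParamInterface<bool> {
 public:
  // Test body: pretends one beacon is sent only when the flag is on.
  int SendOneOfBeacons() const { return GetParam() ? 1 : 0; }
};

// Toy runner: builds a fresh fixture for each parameter value, the way each
// TEST_P instantiation gets its own fixture object.
std::vector<int> RunForEachParam(const std::vector<bool>& params) {
  std::vector<int> results;
  for (bool flag : params) {
    ToyBeaconTest fixture;
    fixture.SetParam(flag);
    results.push_back(fixture.SendOneOfBeacons());
  }
  return results;
}
```

Running `RunForEachParam({false, true})` exercises the same body once with the flag off and once with it on.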

06:04 SHARON: OK. But not all fixture tests are parameterized tests.

06:04 STEPHEN: Correct.

06:04 SHARON: OK. And the third part of this, SendOneOfBeacons. What is that?

06:10 STEPHEN: That is your test name. Whatever you want to call your test, whatever you're testing, put it here. Again, naming tests is as hard as naming anything. A lot of yak shaving, finding out what exactly you should call the test. I particularly enjoy when you see test names that themselves have underscores in them. It's great.

06:30 SHARON: Uh-huh. What do you mean by yak shaving?

06:35 STEPHEN: Oh, also known as painting a bike shed? Bike shed, is that the right word? Anyway, generally speaking -

06:40 SHARON: Yeah, I've heard -

06:40 STEPHEN: arguing about pointless things because at the end of the day, most of the time it doesn't matter what you call it.

06:46 SHARON: OK, yeah. So I've written this test. I've decided it's going to be parameterized. I've come up with a test fixture for it. I have finally named my test. How do I run my tests now?

06:57 STEPHEN: Yeah. So all of the tests in Chromium are built into different test binaries. And these are usually named after the top-level directory that they're under. So we have components_unittests, content_unittests. I think the Chrome one is just called unit_tests because it's special. We should really rename that. But I'm going to assume a bunch of legacy things depend on it. Once you have built whichever binary is appropriate, you can just run it from your out directory, so out/release/components_unittests, for example. If you don't pass any flags, that will run every single components unit test. You probably don't want to do that. They're not that slow, but they're not that fast. So there is a flag, --gtest_filter, which lets you filter which tests run. It takes a test name after that. The format of test names is always test class, dot, test name. So for example, here it's PendingBeaconHostTest.SendOneOfBeacons.

08:04 SHARON: Mm-hmm. And just a fun aside for that one, if you do have parameterized tests, the name will have an extra slash and a number at the end. So normally, whenever I use it, I just put a star before and after. And that generally covers the cases.
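A parameterized test's full name also gains a prefix from its instantiation, so it looks like `All/PendingBeaconHostTest.SendOneOfBeacons/0` when the instantiation is named `All`; that is why a star on both ends helps. The sketch below is a simplified, hypothetical matcher in the spirit of `--gtest_filter`: `*` matches any run of characters, `?` matches one character, patterns are separated by `:`, and anything after a `-` is a negative pattern. It is not Google Test's actual implementation.

```cpp
#include <cstddef>
#include <string>

// Simplified wildcard match: '*' matches any run of characters (including
// none) and '?' matches exactly one character.
bool GlobMatches(const std::string& pattern, const std::string& name) {
  std::size_t p = 0;
  std::size_t n = 0;
  std::size_t star = std::string::npos;  // position of the last '*' seen
  std::size_t resume = 0;                // where in |name| that '*' resumes
  while (n < name.size()) {
    if (p < pattern.size() && (pattern[p] == '?' || pattern[p] == name[n])) {
      ++p;
      ++n;
    } else if (p < pattern.size() && pattern[p] == '*') {
      star = p++;
      resume = n;
    } else if (star != std::string::npos) {
      // Backtrack: let the last '*' swallow one more character.
      p = star + 1;
      n = ++resume;
    } else {
      return false;
    }
  }
  while (p < pattern.size() && pattern[p] == '*') ++p;  // trailing stars
  return p == pattern.size();
}

// True if any ':'-separated pattern in |patterns| matches |name|.
bool MatchesAny(const std::string& patterns, const std::string& name) {
  std::size_t start = 0;
  while (true) {
    std::size_t end = patterns.find(':', start);
    std::string one = patterns.substr(
        start, end == std::string::npos ? std::string::npos : end - start);
    if (!one.empty() && GlobMatches(one, name)) return true;
    if (end == std::string::npos) return false;
    start = end + 1;
  }
}

// "POSITIVE[-NEGATIVE]": run tests matching a positive pattern (or all tests
// if none is given) unless they also match a negative pattern.
bool PassesFilter(const std::string& filter, const std::string& name) {
  std::size_t dash = filter.find('-');
  std::string positive = filter.substr(0, dash);
  std::string negative =
      dash == std::string::npos ? std::string() : filter.substr(dash + 1);
  if (positive.empty()) positive = "*";
  return MatchesAny(positive, name) && !MatchesAny(negative, name);
}
```

With this model, the exact name `PendingBeaconHostTest.SendOneOfBeacons` does not match the parameterized instance, while `*SendOneOfBeacons*` does.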

08:17 STEPHEN: Yeah, absolutely.

08:23 SHARON: Cool. So with the actual test names, you will often see them prefixed with either MAYBE_ or DISABLED_, or before the test there will be an ifdef, usually with a platform, and then depending on the case, it'll prefix the test name with something. Disabled is pretty clear. Maybe is a bit less clear. But can you tell us a bit about these prefixes?

08:51 STEPHEN: Yeah, absolutely. So this is our way of trying to deal with that dreaded thing in testing, flake. A test is flaky when it doesn't produce a consistent result: sometimes it passes, sometimes it fails. We have in Chromium a whole continuous integration waterfall. That is a bunch of bots on different platforms that are constantly building and running Chrome tests to make sure that nothing breaks, that bad changes don't come in. And flaky tests make that very hard. When something fails, was that a real failure? And so when a test is particularly flaky and is causing the build sheriffs trouble, they will come in and disable that test. Basically say, hey, sorry, but this test is causing too much pain. Now, as you said, the DISABLED_ prefix, that's pretty obvious. If you put that in front of a test, Google Test knows about it and it says, nope, will not run this test. It will be compiled, but it will not be run. MAYBE_ doesn't actually mean anything. It has no meaning to Google Test. But that's where you'll see, as you said, these ifdefs. And that's so that we can disable it on just one platform. So maybe your test is flaky only on macOS, and you'll see basically, oh, if macOS, define the MAYBE_ name as the DISABLED_ name. Otherwise, define MAYBE_ as the normal test name.
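The MAYBE_ pattern described above can be sketched as a self-contained C++ snippet. Chromium code normally checks `BUILDFLAG(IS_MAC)` from its build config; this sketch assumes the compiler-defined `__APPLE__` macro instead so it compiles on its own, and the `TOY_STRINGIFY` helpers are hypothetical, there only to show which name the test macro would receive. Google Test treats any test whose name starts with `DISABLED_` as disabled (it can still be forced to run with `--gtest_also_run_disabled_tests`).

```cpp
#include <string>

// The MAYBE_ pattern. Assumption: __APPLE__ stands in for Chromium's
// BUILDFLAG(IS_MAC) so the sketch needs no Chromium headers.
#if defined(__APPLE__)
#define MAYBE_SendOneOfBeacons DISABLED_SendOneOfBeacons
constexpr bool kBuildingForApple = true;
#else
#define MAYBE_SendOneOfBeacons SendOneOfBeacons
constexpr bool kBuildingForApple = false;
#endif

// Two-step stringification so MAYBE_ expands before becoming text, just as
// TEST_F(Fixture, MAYBE_Name) sees the already-expanded name.
#define TOY_STRINGIFY_INNER(x) #x
#define TOY_STRINGIFY(x) TOY_STRINGIFY_INNER(x)

const std::string kResolvedTestName = TOY_STRINGIFY(MAYBE_SendOneOfBeacons);

// Mirrors Google Test's rule: a "DISABLED_" name prefix means "don't run".
bool IsDisabledName(const std::string& name) {
  return name.rfind("DISABLED_", 0) == 0;
}
```

On a non-Apple build `kResolvedTestName` is the plain test name; on an Apple build it carries the `DISABLED_` prefix.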

10:14 SHARON: Makes sense. We'll cover flakiness a bit later. But yeah, that's a huge problem. And we'll talk about that for sure. So these prefixes, the parameterization and stuff, this applies to both unit and browser tests.

10:27 STEPHEN: Yeah.

10:27 SHARON: Right? OK. So what are browser tests? Chrome's a browser. Browser test, seems like there's a relation.

10:34 STEPHEN: Yeah. They test the browser. Isn't it obvious? Yeah. Browser tests are our version - our sort of version of an integration or a functional test, depending on how you look at things. What that really means is they're testing larger chunks of the browser at once. They are integrating multiple components. And this is somewhere that I think Chrome's a bit weird, because in many large projects, you can have an integration test that doesn't need to bring your entire product up in order to run. Unfortunately, or fortunately, I guess it depends on your viewpoint, Chrome is so interconnected, it's so interdependent, that more or less we have to bring up a huge chunk of the browser in order to connect any components together. And so that's what browser tests are. When you run one of these, there's a massive amount of machinery in the background that goes ahead and basically brings up the browser and actually runs it, for some definition of what a browser is. And then you can write a test that pokes at things within that running browser.

11:42 SHARON: Yeah. I think I've heard before multiple times is that browser tests launch the whole browser. And that's -

11:47 STEPHEN: More or less true. It's - yeah.

11:47 SHARON: Yes. OK. Does that also mean that because you're running all this stuff that all browser tests have fixtures? Is that the case?

11:59 STEPHEN: Yes, that is the case. Absolutely. So there is only - I think it's - oh my goodness, probably on the screen here somewhere. But it's IN_PROC_BROWSER_TEST_F and IN_PROC_BROWSER_TEST_P. There is no version that doesn't have a fixture.

12:15 SHARON: And what does the in proc part of that macro mean?

12:15 STEPHEN: So that's, as far as I know - and I might get corrected on this. I'll be interested to learn. But it refers to the fact that we run these in the same process. Normally, Chromium has a multi-process architecture. For the case of testing, we put that aside and just run everything in the same process so that it doesn't leak, basically.

12:38 SHARON: Yeah. There are flags you can pass when you run them, like --single-process. And then there's --single-process-tests. And they do slightly different things. But if you do run into that, you'll probably be working with people who can explain the differences between those. So something that I've seen quite a bit in browser and unit tests, and only in these, are run loops. Can you just briefly touch on what those are and what we use them for in tests?

13:05 STEPHEN: Oh, yeah. That's a fun one. I think actually previously on an episode of this very program, you and Dana talked a little bit about the fact that Chrome is not a completely synchronous program, that we do task splitting. We have a task scheduler. And so run loops are part of that, basically. They're part of our stack for handling asynchronous tasks. And so this comes up in testing because sometimes you might be testing something that's not synchronous. It takes a callback, for example, rather than returning a value. And so if you just wrote your test as normal, you call the function and pass a callback, but then your test function ends before that callback ever runs. Run loop gives you the ability to say, hey, put this callback into some controlled run loop. And then after that, you can basically say, hey, wait on this run loop. I think it's often called quit when idle, which basically says keep running until you have no more tasks to run, including our callback, and then finish. They're powerful. They're very useful, obviously, with asynchronous code. They're also a source of a lot of flake and pain. So handle with care.
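A toy model of the idea: an asynchronous function posts its result as a task, and the test has to run the queue until it is idle before checking the result. `ToyTaskQueue` and `FetchBeaconCountAsync` are hypothetical names for illustration, not Chromium's API; real Chromium tests typically use `base::RunLoop` (for example, passing `run_loop.QuitClosure()` as the callback and then calling `run_loop.Run()`), with unit tests usually also declaring a `base::test::TaskEnvironment`.

```cpp
#include <deque>
#include <functional>
#include <utility>

// Toy single-threaded task queue: a stand-in for Chromium's task scheduling
// and base::RunLoop, not their real API.
class ToyTaskQueue {
 public:
  void PostTask(std::function<void()> task) {
    tasks_.push_back(std::move(task));
  }

  // Keeps running tasks, including ones posted by earlier tasks, until the
  // queue is empty ("idle"), then returns.
  void RunUntilIdle() {
    while (!tasks_.empty()) {
      std::function<void()> task = std::move(tasks_.front());
      tasks_.pop_front();
      task();
    }
  }

 private:
  std::deque<std::function<void()>> tasks_;
};

// Hypothetical asynchronous API: instead of returning a value, it posts a
// task that later hands the result to |callback|.
void FetchBeaconCountAsync(ToyTaskQueue& queue,
                           std::function<void(int)> callback) {
  queue.PostTask([callback] { callback(3); });
}
```

If the test checked the result right after calling `FetchBeaconCountAsync`, the callback would not have run yet; only after `RunUntilIdle()` is the value available.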

14:24 SHARON: Yeah. One tip is to use the --gtest_repeat flag. That one lets you run your test however many times you want.

14:30 STEPHEN: Yeah.

14:36 SHARON: And that can help with testing for flakiness or if you're trying to debug something flaky. In tests, we have a variety of macros that we use. In the unit test and the browser tests, you see a lot of macros, like EXPECT_EQ, EXPECT_GT. These seem like they're part of maybe Google test. Is that true?

14:54 STEPHEN: Yeah. They come from Google Test itself. So they're not technically Chromium-specific. But they basically come in two flavors. There are the EXPECT_SOMETHING macros. And there are the ASSERT_SOMETHING macros. And the biggest thing to know about them is that an expect causes a test to fail, but it doesn't stop the test from executing. The test will continue to execute the rest of the code. An assert actually stops the test right there; it returns from the current test function. And so this can be useful, for example, if you want to line up a bunch of expects and your code still makes sense. You're like, OK, I expect it to return an object, and it's got these fields. And I'm just going to expect each one of the fields. That's probably fine to do. And it may be nice to have output that's like, no, actually, both of these fields are wrong. Assert is used when you're like, OK, if this fails, the rest of the test makes no sense. A very common thing you'll see: call an API, get back some sort of pointer, hopefully a smart pointer. And you're going to be like, assert that this pointer is non-null, because if this pointer is null, everything else is just going to be useless.

15:57 SHARON: I think we see a lot more expects than asserts in general anecdotally from looking at the test. Do you think, in your opinion, that people should be using asserts more generously rather than expects, or do we maybe want to see what happens - what does go wrong if things continue beyond a certain point?

16:15 STEPHEN: Yeah. I mean, general guidance would be just keep using expect. That's fine. It's also not a big deal if your test actually just crashes. It's a test. It can crash. It's OK. So use expects. Use an assert if, like I said, the rest of the test wouldn't make any sense otherwise. So most often, if you're like, hey, is this pointer null or not, and I'm going to go do something with this pointer, assert it there. That's probably the main time you'd use it.
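The difference between the two flavors can be sketched with a toy pair of macros in plain C++ (hypothetical, not Google Test's definitions). The toy ASSERT records a failure and returns from the current function, which is also how Google Test's `ASSERT_*` macros stop a test and why they can only be used in functions that return `void`; the toy EXPECT records a failure and keeps going. `Beacon` and `CheckBeacon` are made-up names.

```cpp
#include <string>
#include <vector>

// Toy result record; NOT Google Test.
struct ToyTestResult {
  std::vector<std::string> failures;
  bool reached_end = false;
};

// EXPECT-style: record the failure, keep executing.
#define TOY_EXPECT_TRUE(result, cond)                            \
  do {                                                           \
    if (!(cond)) (result).failures.push_back("EXPECT: " #cond);  \
  } while (0)

// ASSERT-style: record the failure, then return from the current function.
#define TOY_ASSERT_TRUE(result, cond)                   \
  do {                                                  \
    if (!(cond)) {                                      \
      (result).failures.push_back("ASSERT: " #cond);    \
      return;                                           \
    }                                                   \
  } while (0)

struct Beacon {  // hypothetical object returned by the code under test
  int id;
  std::string url;
};

void CheckBeacon(const Beacon* beacon, ToyTestResult& result) {
  TOY_ASSERT_TRUE(result, beacon != nullptr);  // nothing below makes sense if null
  TOY_EXPECT_TRUE(result, beacon->id == 1);
  TOY_EXPECT_TRUE(result, beacon->url == "https://example.com/");
  result.reached_end = true;
}
```

A beacon with two wrong fields reports both failures and still reaches the end; a null pointer stops at the assert before anything dereferences it.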

16:45 SHARON: A lot of the browser test classes, like the fixture classes themselves, are subclassed from other base classes.

16:53 STEPHEN: Mm-hmm.

16:53 SHARON: Can you tell us about that?

16:53 STEPHEN: Yeah. So basically, we have one base class for browser tests. I think it's literally called BrowserTestBase, which sits at the bottom and does a lot of the very low-level setup of bringing up a browser. But as folks know, there's more than one browser in the Chromium project. There is Chrome, the Chrome browser that is the more full-fledged version. But there's also content shell, which people might have seen. It's built out of content. It's a very simple browser. And then there are other things. We have a headless mode. There is a headless Chrome you can build which doesn't show any UI. You can run it entirely from the command line.

17:32 SHARON: What's the difference between headless and content shell?

17:39 STEPHEN: So content shell does have a UI. If you run content shell, you will actually see a little UI pop up. What content shell doesn't have is all of those features from Chrome that make Chrome Chrome, if you will. So I mean, everything from bookmarks, to integration with having an account profile, that sort of stuff is not there. I don't think content shell even supports tabs. I think it's just one page you get. It's almost entirely used for testing. But then, headless, sorry, as I was saying, it's just literally there is no UI rendered. It's just headless.

18:13 SHARON: That sounds like it would make -

18:13 STEPHEN: And so, yeah. And so - sorry.

18:13 SHARON: testing faster and easier. Go on.

18:18 STEPHEN: Yeah. That's a large part of the point, as well as when you want to deploy a browser in an environment where you don't see the UI. So for example, if you're running on a server or something like that. But yeah. So for each of these, we then subclass that BrowserTestBase in order to provide specific types. So there's content browser test. There's headless browser test. And then of course, Chrome has to be special, and they called their version in-process browser test because it wasn't confusing enough. But again, it's sort of straightforward. If you're in Chrome, /chrome, use InProcessBrowserTest. If you're in /content, use ContentBrowserTest. It's pretty straightforward most of the time.

18:58 SHARON: That makes sense. Common functions you see overridden from those base classes are these setup functions. So there's SetUp, SetUpOnMainThread; there seem to be a lot of different setup options. Is there anything we should know about any of those?

19:13 STEPHEN: I don't think that - I mean, most of it's fairly straightforward. I believe you should mostly be using SetUpOnMainThread. I can't say that for sure. But generally speaking, SetUpOnMainThread, teardown on main thread - or is it shutdown main thread? I can't remember - whichever the one is for afterwards, are what you should usually be using in a browser test. You can also usually do most of your work in a constructor. That's something that people often don't know about testing. I think it's something that's changed over time. Even with unit tests, people use the setup function a lot. You can just do it in the constructor a lot of the time. Most of the background initialization has already happened.

19:45 SHARON: I've definitely wondered that, especially when you have things in the constructor as well as in a setup method. It's one of those things where you just kind of think, I'm not going to touch this because eh, but -

19:57 STEPHEN: Yeah. There are some rough edges, I believe. By the time SetUpOnMainThread runs, some things have been initialized that aren't around when your class is being constructed. So it is fair. I'm not sure I have any great advice other than that you may need to dig in if it happens.

20:19 SHARON: One last thing there. Which one gets run first, the setup functions or the constructor?

20:19 STEPHEN: The constructor always happens first. You have to construct the object before you can use it.
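A toy sketch of the ordering discussed above, assuming a simplified model: the constructor runs first, then the setup hooks, then the test body, then the teardown hooks. The hook names mirror `SetUp`, `SetUpOnMainThread`, and `TearDownOnMainThread` from Chromium's browser test classes, but `ToyBrowserTestBase` and `ToyBeaconBrowserTest` are hypothetical and do not reproduce how `BrowserTestBase` actually launches the browser; they only log the order.

```cpp
#include <string>
#include <vector>

// Toy stand-in for a browser-test fixture base; NOT Chromium's
// BrowserTestBase. It only records the order in which hooks run.
class ToyBrowserTestBase {
 public:
  explicit ToyBrowserTestBase(std::vector<std::string>& log) : log_(log) {}
  virtual ~ToyBrowserTestBase() = default;

  // Drives one test. By the time anyone can call Run(), the fixture object,
  // including the derived constructor, has already been fully constructed.
  void Run() {
    SetUp();
    log_.push_back("browser started");  // stands in for the heavy machinery
    SetUpOnMainThread();
    RunTestOnMainThread();
    TearDownOnMainThread();
    TearDown();
  }

 protected:
  virtual void SetUp() { log_.push_back("SetUp"); }
  virtual void SetUpOnMainThread() { log_.push_back("SetUpOnMainThread"); }
  virtual void RunTestOnMainThread() = 0;  // the test body
  virtual void TearDownOnMainThread() { log_.push_back("TearDownOnMainThread"); }
  virtual void TearDown() { log_.push_back("TearDown"); }

  std::vector<std::string>& log_;
};

// Hypothetical fixture that does some of its setup in the constructor.
class ToyBeaconBrowserTest : public ToyBrowserTestBase {
 public:
  explicit ToyBeaconBrowserTest(std::vector<std::string>& log)
      : ToyBrowserTestBase(log) {
    log_.push_back("constructor");
  }

 protected:
  void RunTestOnMainThread() override { log_.push_back("test body"); }
};
```

Anything that needs the running browser belongs in the main-thread hooks; plain member initialization can live in the constructor.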

20:25 SHARON: Makes sense. This doesn't specifically relate to a browser test or unit test, but it does seem like it's worth mentioning, which is the content public test API. So if you want to learn more about content and content public, check out episode three with John. But today we're talking about testing. So we're talking about content public test. What is in that directory? And how does that - how can people use what's in there?

20:48 STEPHEN: Yeah. It's basically just a bunch of useful helper functions and classes for when you are doing mostly browser tests. So for example, there are methods in there that will automatically handle navigating the browser to a URL and actually waiting until it's finished loading. There are other methods for essentially accessing the tab strip of a browser. So if you have multiple tabs and you're testing some cross-tab thing, there are methods in there to do that. I think the content browser test base class lives there as well. So take a look at it. It's the equivalent of base in many ways, but for testing. If you're like, someone should have written a library function for this, possibly someone already has, and you should take a look. And if they haven't, you should write one.

21:43 SHARON: Yeah. I've definitely heard people, code reviewers, say that when you want to add something that seems a bit test-only to content public, put it in content public test, because that doesn't get compiled into the actual release binaries. So if things are a bit less than ideal there, it's a more forgiving place for that.

22:02 STEPHEN: Yeah, absolutely. I mean, one of the big things about all of our test code is that in many cases you can make it so that it's not compiled into the binary. And that is useful both for binary size and, as you said, for code that's a bit less than ideal. One thing you can do actually in tests, by the way, for code that you cannot avoid putting into the binary - so let's say you've got a class, and because you've not written your class properly to do dependency injection, you need to access a member for testing. You need to set a member. But you only want that to happen from test code. No real code should ever do this. You can actually name methods blah-blah-blah ForTest or ForTesting. This has no impact on the generated code. But we have presubmits that actually go ahead and check, hey, are you calling this from code that's not marked as test code? And if that happens, the presubmit will fail on upload. So it could be useful.
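A rough, hypothetical sketch of the kind of rule that presubmit enforces: flag a call to a method ending in `ForTesting` or `ForTest` from a file that isn't a test file. Chromium's real check is a Python presubmit script with its own file patterns and exceptions; the regular expressions here are assumptions, and this version only catches member calls through `.` or `->` so that the method's own declaration isn't flagged.

```cpp
#include <regex>
#include <string>

// Hypothetical heuristic for "is this a test-only file?".
bool IsTestFile(const std::string& path) {
  static const std::regex kTestFile(
      R"((_unittest|_browsertest|_test|_test_util)\.(cc|h|mm)$)");
  return std::regex_search(path, kTestFile);
}

// Matches member calls like "host->SetClockForTesting(" or
// "obj.ResetForTest(", but not the method's own declaration.
bool CallsTestOnlyMethod(const std::string& line) {
  static const std::regex kTestOnlyCall(
      R"((\.|->)\w+(ForTesting|ForTest)\s*\()");
  return std::regex_search(line, kTestOnlyCall);
}

// Flags production-code lines that call a test-only method.
bool ViolatesTestOnlyRule(const std::string& path, const std::string& line) {
  return !IsTestFile(path) && CallsTestOnlyMethod(line);
}
```

The same call is fine in `beacon_host_unittest.cc` but flagged in `beacon_host.cc`, which is the behavior described in the conversation.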

23:03 SHARON: And another thing that relates to that would be the friend test or friend something macro that you see in classes. Is that a gtest thing also?

23:15 STEPHEN: It's not a gtest thing. It's just a C++ thing. So C++ has the concept of friending another class. It's very cute. It basically just says, this other class can access my internal state. Don't worry, we're friends. Generally speaking, that's a bad idea. We write classes for a reason: encapsulation. The entire goal of a class is to encapsulate behavior and to hide the implementation details that you don't want exposed. But obviously, again, when you're writing tests, sometimes it is the correct thing to poke a hole in the class and get at something. There are very much schools of thought here. Some people would be like, you should be doing dependency injection. Some people are like, no, just friend your class. It's OK. If folks want to look up more, go look up the difference between open box and closed box testing.
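For what it's worth, the `FRIEND_TEST` macro itself does come from Google Test (`gtest/gtest_prod.h`), but it is a thin wrapper over the plain C++ `friend` declaration: `FRIEND_TEST(Suite, Name)` expands to `friend class Suite_Name_Test`, which is the class name `TEST(Suite, Name)` generates. The sketch below copies that expansion under a toy name so it compiles without Google Test; the class under test names the test class as its friend, which lets the test read a private member. `BeaconQueue` and its test are hypothetical.

```cpp
#include <cstddef>
#include <deque>

// Same expansion as Google Test's FRIEND_TEST, copied under a toy name so
// this compiles without Google Test.
#define TOY_FRIEND_TEST(test_suite_name, test_name) \
  friend class test_suite_name##_##test_name##_Test

// Hypothetical class under test: keeps only the most recent kMaxSize ids.
class BeaconQueue {
 public:
  void Add(int id) {
    if (ids_.size() == kMaxSize) ids_.pop_front();
    ids_.push_back(id);
  }

 private:
  TOY_FRIEND_TEST(BeaconQueueTest, DropsOldest);

  static constexpr std::size_t kMaxSize = 2;
  std::deque<int> ids_;
};

// The class name that TEST(BeaconQueueTest, DropsOldest) would generate.
class BeaconQueueTest_DropsOldest_Test {
 public:
  static bool Run() {
    BeaconQueue queue;
    queue.Add(1);
    queue.Add(2);
    queue.Add(3);
    // Reading the private ids_ compiles only because of the friend line.
    return queue.ids_.size() == 2 && queue.ids_.front() == 2;
  }
};
```

The dependency-injection alternative would instead expose the queue's contents, or the policy that trims it, through the public interface.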

24:00 SHARON: For those of you who are like, oh, this sounds really cool, I will learn more.

24:00 STEPHEN: Yeah, for my test nerds out there.

24:06 SHARON: [LAUGHS] Yeah, Stephen's got a club. Feel free to join.

24:06 STEPHEN: Yeah. [LAUGHTER]

24:11 SHARON: You get a card. Moving on to our next type of test, which is your wheelhouse, which is web tests. This is something I don't know much about. So tell us all about it.

24:22 STEPHEN: [LAUGHS] Yeah. This is my - this is where hopefully I'll shine. It's the area I should know most about. But web tests are - they're an interesting one. So I would describe them as our version of an end-to-end test, in that a web test really is just an HTML file or a JavaScript file, and when you run it, you literally bring up - you'll remember I said that browser tests are most of a whole browser. Web tests bring up a whole browser. It's just the same browser, content shell or Chrome. And it runs that whole browser. And the test does something, either in HTML or JavaScript, that then is asserted and checked. And the reason I say that I would call them this is that I have heard people argue that they're technically unit tests, where the unit is the JavaScript file and the entire browser is just, like, an abstraction that you don't care about. I guess it's how you view them really. I view the browser as something that is big and flaky, and therefore these are end-to-end tests. Some people disagree.

25:22 SHARON: In our last episode, John touched on these tests and how the scope that each test covers is very small, but how you run them is not. And I guess you can pick whichever side you like more and go with that. So what are examples of things we test with these kinds of tests?

25:49 STEPHEN: Yeah. So the two big categories of things that we test with web tests are basically web APIs, so JavaScript APIs, provided by the browser to do something. There are so many of those, everything from the fetch API for fetching stuff to the web serial API for talking to devices over serial ports. The web is huge. But anything you can talk to via JavaScript API, we call those JavaScript tests. It's nice and straightforward. The other thing that web tests usually encompass are what are called rendering tests or sometimes referred to as ref tests for reference tests. And these are checking the actual, as the first name implies, the rendering of some HTML, some CSS by the browser. The reason they're called reference tests is that usually the way you do this to check whether a rendering is correct is you set up your test, and then you compare it to some image or some other reference rendering that you're like, OK, this should look like that. If it does look like that, great. If it doesn't, I failed.
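The core of a reference test is a comparison between two renderings. In web-platform-tests, a reftest names its reference file with `<link rel="match" href="...">`, and the harness compares screenshots of the two pages. The sketch below only models that final comparison step on a hypothetical `ToyImage` of packed RGBA pixels, counting exact per-pixel differences; real harnesses do more than this, for example supporting fuzzy matching.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical image: width * height pixels, each packed RGBA in a uint32_t.
struct ToyImage {
  int width = 0;
  int height = 0;
  std::vector<std::uint32_t> pixels;
};

// Returns the number of pixels that differ, or -1 if the sizes differ.
int CountDifferingPixels(const ToyImage& test, const ToyImage& reference) {
  if (test.width != reference.width || test.height != reference.height ||
      test.pixels.size() != reference.pixels.size()) {
    return -1;
  }
  int diffs = 0;
  for (std::size_t i = 0; i < test.pixels.size(); ++i) {
    if (test.pixels[i] != reference.pixels[i]) ++diffs;
  }
  return diffs;
}

// Exact-match reftest: pass only if the test rendering equals the reference.
bool ReftestPasses(const ToyImage& test, const ToyImage& reference) {
  return CountDifferingPixels(test, reference) == 0;
}
```

The two-pixel gap Stephen mentions later is exactly the kind of small difference a check like this catches.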

26:54 SHARON: Ah-ha. And are these the same as - so there's a few other test names that are all kind of similar. And as someone who doesn't work in them, they all kind of blur together. So I've also heard web platform tests. I've heard layout tests. I've heard Blink tests, all of which do - all of which are JavaScript HTML-like and have some level of images in them. So are these all the same thing? And if not, what's different?

27:19 STEPHEN: Yeah. So yes and no, I guess, is my answer. So a long time ago, there were basically layout tests. And that was something we inherited from the WebKit project when we forked Blink from WebKit all those years ago. And they're exactly what I've described. They were both JavaScript-based tests and HTML-based tests for just doing reference renderings. However, web platform tests came up as an external project, actually. Web platform tests is not a Chromium project. It is external upstream. You can find them on GitHub. And their goal was to create a test suite shared between all browsers so that all browsers could run the same tests and we could actually tell, hey, is the web interoperable? Does it work the same way no matter what browser you're on? The answer is, no. But we're trying. And so inside of Chromium we said, that's great. We love this idea. And so what we did was we actually import web platform tests into our layout tests. So web platform tests now become a subdirectory of layout tests. OK?

28:30 SHARON: OK. [LAUGHS]

28:30 STEPHEN: To make things more confusing, we don't just import them, we also export them. We run a continuous two-way sync. And this means that Chromium developers don't have to worry about that upstream web platform test project most of the time. They just land their code in Chromium, and a magic process happens, and it goes up into the GitHub project. So that's where we were for many years - layout tests, which are a whole bunch of legacy tests, and then also web platform tests. But fairly recently - and I say that knowing that COVID means that might be anything within the last three years because who knows where time went - we decided to rename layout tests, and the name we chose was web tests. So now you have web tests, of which web platform tests are a subset. Easy.

29:20 SHARON: Cool.

29:20 STEPHEN: [LAUGHS]

29:20 SHARON: Cool. And what about Blink tests? Are those separate, or are they all the same thing?

29:27 STEPHEN: I mean, if they're talking about the JavaScript and HTML tests, that's going to just be another name for the web tests. I find that term confusing because there is also the blink_tests target, which builds the infrastructure that is used to run web tests. So that's probably what you're referring to, blink_tests. It is the target that you build to run these tests.

29:50 SHARON: I see. So blink_tests is a target. These other ones, web tests and web platform tests, are actual test suites.

29:57 STEPHEN: Correct. Yes. That's exactly right.

30:02 SHARON: OK. All right.

30:02 STEPHEN: Simple.

30:02 SHARON: Yeah. So easy. So you mentioned that the web platform tests are cross-browser. But a lot of browsers are based on Chromium. Is it one of those things where it's open source, but the majority of the people contributing to and maintaining it are Chrome engineers?

30:23 STEPHEN: I must admit, I don't know what that stat is nowadays. Back when I was working on interoperability, we did measure this. And it was certainly the case that Chromium is a large project. There were a lot of tests being contributed by Chromium developers. But we also saw historically - I would like to recognize Mozilla, most of all, who were a huge contributor to the web platform test project over the years and are probably the reason that it succeeded. And we also - web platform test also has a fairly healthy community of completely outside developers. So people that just want to come along. And maybe they're not able to or willing to go into a browser, and actually build a browser, and muck with code. But they could write a test for something. They can find a broken behavior and be like, hey, there's a test here, Chrome and Firefox do different things.

31:08 SHARON: What are examples of the interoperability things that you're testing for in these cross-browser tests?

31:17 STEPHEN: Oh, wow, that's a big question. I mean, really everything and anything. So on the ref test side, the rendering tests, it actually does matter that a web page renders the same in different browsers. And that is very hard to achieve. It's hard to make two completely different engines render some HTML and CSS exactly the same way. But it also matters. We often see bugs where you have a lovely - you've got a lovely website. It's got this beautiful header at the top and some content. And then on one browser, there's a two-pixel gap here, and you can see the background, and it's not a great experience for your users. So ref tests, for example, are used to try and track those down. And then, on the JavaScript side, I mean really, web platform APIs are complicated. They're very powerful. There's a reason they are in the browser and you cannot do them in plain JavaScript. And that is because they are so powerful. So for example, WebUSB to talk to USB devices, you can't just do that from JavaScript. But because they're so powerful, because they're so complicated, it's also fairly easy for two browsers to have slightly different behavior. And again, it comes down to what is the web developer's experience. When I try and use the WebUSB API, for example, am I going to have to write code that's like, if Chrome, call it this way, if Fire - we don't want that. That is what we do not want for the web. And so that's the goal.

32:46 SHARON: Yeah. What a team effort it is, making the whole web work. All right. That's cool. So in your time working on these web platform tests, do you have any fun stories you'd like to share or any fun things that might be interesting to know?

33:02 STEPHEN: Oh, wow. [LAUGHS] One thing I like to bring up - I'm afraid it's not that fun, but I like to repeat it a lot of times because it's weird and people get tripped up by it - is that inside of Chromium, we don't run web platform tests using the Chrome browser. We run them using content shell. And this is partially historical. That's how layout tests run. We always ran them under content shell. And it's partially for I guess what I will call feasibility. As I talked about earlier, content shell is much simpler than Chrome. And that means that if you want to just run one test, it is faster, it is more stable, it is more reliable I guess I would say, than trying to bring up the behemoth that is Chrome and making sure everything goes correctly. And this often trips people up because in the upstream world of this web platform test project, they run the test using the proper Chrome binary. And so they're different. And different things do happen. Sometimes it's rendering differences. Sometimes it's because web APIs are not always implemented in both Chrome and content shell. So yeah, fun fact.

34:19 SHARON: Oh, boy. [LAUGHTER]

34:19 STEPHEN: Oh, yeah.

34:19 SHARON: And we wonder why flakiness is a problem. Ah. [LAUGHS]

34:19 STEPHEN: Yeah. It's a really sort of fun but also scary fact that even if we put aside web platform tests and we just look at layout tests, we don't test what we ship. Layout tests run on content shell, and then we turn around and we're like, here's a Chrome binary. Like uh, those are different. But, hey, we do the best we can.

34:43 SHARON: Yeah. We're out here trying our best. So that all sounds very cool. Let's move on to our next type of test, which is performance. You might have heard the term telemetry thrown around. Can you tell us what telemetry is and what these performance tests are?

34:54 STEPHEN: I mean, I can try. We've certainly gone straight from the thing I know a lot about into the thing I know very little about. But -

35:05 SHARON: I mean, to Stephen's credit, this is a very hard episode to find one single guest for. People who are working extensively usually in content aren't working a ton in performance or web platform stuff. And there's no one who is - just does testing and does every kind of testing. So we're trying our best. [INAUDIBLE]

35:24 STEPHEN: Yeah, absolutely. You just need to find someone arrogant enough that he's like, yeah, I'll talk about all of those. I don't need to know the details. It's fine. But yeah, performance tests, I mean, the name is self-explanatory. These are tests that are trying to ensure the performance of Chromium. And this goes back to the four S's when we first started Chrome as a project - speed, simplicity, security, and I've forgotten the fourth S now. Speed, simplicity, security - OK, let's not reference the four S's then. [LAUGHTER] You have the Comet. You tell me.

36:01 SHARON: Ah. Oh, I mean, I don't read it every day. Stability. Stability.

36:08 STEPHEN: Stability. God damn it. That's literally what the rest of this is about. OK, where were we?

36:13 SHARON: We're leaving this in, don't worry. [LAUGHTER]

36:19 STEPHEN: Yeah. So the basic idea of performance tests is to test performance because as much as you can view behavior as a correctness thing, in Chromium we also consider performance a correctness thing. It is not a good thing if a change lands and performance regresses. Obviously, testing performance is also hard to do in absolute terms. There's a lot of noise in any sort of performance testing. And so, we do it essentially heuristically, probabilistically. We run whatever the tests are, which I'll talk about in a second. And then we look at the results and we try and say, hey, OK, is there a statistically significant difference here? And there's actually a whole performance sheriffing rotation to try and track these down. But in terms of, yeah, you mentioned telemetry. That weird word. You're like, what is a telemetry test? Well, telemetry is the name of the framework that Chromium uses. It's part of the wider catapult project, which is all about different performance tools. And none of the names, as far as I know, mean anything. They're just like, hey, catapult, that's a cool name. I'm sure someone will explain to me now the entire history behind the name catapult and why it's absolutely vital. But anyway, so telemetry basically is a framework that, when you give it some input, which I'll talk about in a second, launches a browser, performs some actions on a web page, and records metrics about those actions. So the input, the test essentially, is basically a collection of: go to this web page, do these actions, record these metrics. And I believe in telemetry that's called a story, the story of someone visiting a page, I guess, is the idea. One important thing to know is that because it's sort of insane to actually visit real websites, which keep changing, we actually cache the websites. We download a version of the websites once and actually check that in. And when you go run a telemetry test, it's not running against literally the real Reddit.com or something. It's running against a version we saved at some point.
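Stephen describes the performance check as asking whether the difference between runs is statistically significant, without naming a specific statistical test. As an illustration only, here is a minimal C++ sketch of one common choice, Welch's t-statistic, comparing timing samples from before and after a change. The function names and the threshold of 3.0 are hypothetical and are not taken from telemetry, catapult, or the perf sheriffing tools, which may use different statistics entirely.

```cpp
#include <cmath>
#include <limits>
#include <vector>

// Mean of a set of timing samples (for example, load times in ms).
double Mean(const std::vector<double>& xs) {
  double sum = 0.0;
  for (double x : xs) sum += x;
  return sum / static_cast<double>(xs.size());
}

// Unbiased sample variance (divides by n - 1); needs at least two samples.
double SampleVariance(const std::vector<double>& xs) {
  const double m = Mean(xs);
  double sq = 0.0;
  for (double x : xs) sq += (x - m) * (x - m);
  return sq / static_cast<double>(xs.size() - 1);
}

// Welch's t-statistic for "after" vs. "before". A large positive value means
// the "after" samples are higher (slower, for timings) by more than the noise
// in either set would plausibly explain.
double WelchT(const std::vector<double>& before,
              const std::vector<double>& after) {
  const double diff = Mean(after) - Mean(before);
  const double se =
      std::sqrt(SampleVariance(before) / static_cast<double>(before.size()) +
                SampleVariance(after) / static_cast<double>(after.size()));
  if (se == 0.0) {
    // Both sets are perfectly constant: any difference at all is "infinite".
    return diff == 0.0
               ? 0.0
               : std::copysign(std::numeric_limits<double>::infinity(), diff);
  }
  return diff / se;
}

// Hypothetical alert rule: flag a regression when t exceeds a threshold.
// The default threshold is an arbitrary illustrative choice.
bool LooksLikeRegression(const std::vector<double>& before,
                         const std::vector<double>& after,
                         double threshold = 3.0) {
  return WelchT(before, after) > threshold;
}
```

A fuller version would convert t into a p-value using the Welch-Satterthwaite degrees of freedom; the fixed threshold here just keeps the sketch short.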

38:31 SHARON: And how often - so I haven't really come across anyone who actually works on this, though you don't interact with everyone. But as new web features get added and things in the browser change, how often are these tests specifically getting updated to reflect that?

38:44 STEPHEN: I would have to plead some ignorance there. It's certainly also been my experience as a browser engineer who has worked on many web APIs that I've never written a telemetry test myself. I've never seen one added. My understanding is that they are - a lot of the use cases are fairly general with the hope that if you land some performance problematic feature, it will regress on some general test. And then we can be like, oh, you've regressed. Let's figure out why. Let's dig in and debug. But it certainly might be the case if you are working on some feature and you think that it might have performance implications that aren't captured by those tests, there is an entire team that works on the speed of Chromium. I cannot remember their email address right now. But hopefully we will get that and put that somewhere below. But you can certainly reach out to them and be like, hey, I think we should test the performance of this. How do I go about doing that?

39:41 SHARON: Yeah. That sounds useful. I've definitely gotten bugs filed against me for performance stuff. [LAUGHS] Cool. So that makes sense. Sounds like good stuff. And in talking to some people in preparation for this episode, I had a few people mention Android testing specifically. Not any of the other platforms, just Android. So do you want to tell us why that might be? What are they doing over there that warrants additional mention?

40:15 STEPHEN: Yeah. I mean, I think probably the answer would just be that Android is such a huge part of our code base. Chrome is a browser, a multi-platform browser, runs on multiple desktop platforms, but it also runs on Android. And it runs on iOS. And so I assume that iOS has its own testing framework. I must admit, I don't know much about that at all. But certainly on Android, we have a significant amount of testing framework built up around it. And so there's the option, the ability for you to test your Java code as well as your C++ code.

40:44 SHARON: That makes sense. And yeah, with iOS, because they don't use Blink, I guess that reduces the number of tests that they might need to add, whereas on Android they're still using Blink. But there are a lot of differences because it is mobile, so they need ways to actually test those things. So let's go more general now. At almost every stage, you've mentioned flakiness. So let's briefly run down, what is flakiness in a test?

41:14 STEPHEN: Yes. So flakiness for a test is just - the definition is just that the test does not consistently produce the same output. When you're talking about flakiness, you actually don't care what the output is. A test that always fails, that's fine. It always fails. But a test that passes 90% of the time and fails 10%, that's not good. That test is not consistent. And it will cause problems.
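The definition above (same test, inconsistent outcome) can be checked mechanically by rerunning a test and seeing whether the results ever disagree. This is a minimal C++ sketch; `RunRepeatedly` and `RepeatResult` are hypothetical names. GoogleTest's `--gtest_repeat` flag provides the real rerun mechanism for gtest-based suites.

```cpp
#include <functional>

// Outcome of running the same test repeatedly.
struct RepeatResult {
  int runs = 0;
  int passes = 0;
  // Flaky means inconsistent: it neither always passes nor always fails.
  bool IsFlaky() const { return passes > 0 && passes < runs; }
};

// Runs |test| |runs| times; |test| returns true on a pass.
RepeatResult RunRepeatedly(const std::function<bool()>& test, int runs) {
  RepeatResult result;
  result.runs = runs;
  for (int i = 0; i < runs; ++i) {
    if (test()) ++result.passes;
  }
  return result;
}
```

A test that always fails is reported as not flaky, matching Stephen's point that consistency, not the pass/fail result, is what matters.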

41:46 SHARON: What are common causes of this?

41:46 STEPHEN: I mean, part of the cause is, as I've said, we write a lot of integration tests in Chromium. Whether those are browser tests, or whether those are web tests, we write these massive tests that span huge stacks. And what comes implicitly with that is timing. Timing is almost always the problem - timing and asynchronicity. Whether that is in the same thread or multiple threads, you write your test, you run it on your developer machine, and it works. And you're like, cool, my test works. But what you don't realize is that you're assuming that in some part of the browser, this function ran, then this function ran. And that always happens on your developer machine because you have this CPU, and this much memory, and et cetera, et cetera. Then you commit your code, you land your code, and somewhere a bot runs. And that bot is slower than your machine. And on that bot, those two functions run in the opposite order, and something goes horribly wrong.

42:50 SHARON: What can the typical Chrome engineer writing these tests do in the face of this? What are some practices that you generally should avoid or generally should try to do more often that will keep this from happening in your test?

43:02 STEPHEN: Yeah. So first of all, write more unit tests, write fewer browser tests, please. Unit tests are - as I've talked about, they're small. They're compact. They focus just on the class that you're testing. And too often, in my opinion - again, I'm sure we'll get some nice emails stating I'm wrong - but too often, in my opinion, people go straight to a browser test. And they bring up a whole browser just to test functionality in their class. This sometimes requires writing your class differently so that it can be tested by a unit test. That's worth doing. Beyond that, though, when you are writing a browser test or a web test, something that is more integration, more end to end, be aware of where timing might be creeping in. So to give an example, in a browser test, you often do things like start by loading some web contents. And then you will try and poke at those web contents. Well, one thing that people often don't realize is that loading web contents is not a synchronous process. Actually knowing when a page is finished loading is slightly difficult. It's quite interesting. And so there are helper functions to try and let you wait for this to happen, sort of event waiters. And you should - unfortunately, the first part is you have to be aware of this, which is just hard to be. But the second part is, once you are aware of where these can creep in, make sure you're waiting for the right events. And make sure that once those events have happened, you are in a state where the next call makes sense.
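To make the waiting advice concrete, the sketch below simulates a page that finishes loading on a background thread, and a test that waits for an explicit load-finished signal before inspecting state, instead of assuming the load is already done or sleeping for a fixed time. `FakePage` and its methods are hypothetical stand-ins written with the standard library; in real browser tests you would use Chromium's own waiting helpers (for example `content::WaitForLoadStop`) rather than a class like this.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical stand-in for "a page that finishes loading asynchronously".
class FakePage {
 public:
  // Starts loading on a background thread and returns immediately, like a
  // navigation that has been kicked off but not completed. Call at most once.
  void StartLoad(std::string title) {
    loader_ = std::thread([this, title = std::move(title)] {
      std::this_thread::sleep_for(std::chrono::milliseconds(20));  // Work.
      std::lock_guard<std::mutex> lock(mu_);
      title_ = title;
      loaded_ = true;
      cv_.notify_all();  // This is the "load finished" event.
    });
  }

  // Blocks until the load-finished event has fired. Waiting on the event,
  // not on a fixed sleep, keeps the test correct on slow bots.
  void WaitForLoad() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return loaded_; });
  }

  std::string title() {
    std::lock_guard<std::mutex> lock(mu_);
    return title_;
  }

  ~FakePage() {
    if (loader_.joinable()) loader_.join();
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  bool loaded_ = false;
  std::string title_;
  std::thread loader_;
};
```

Reading `title()` immediately after `StartLoad()` without the wait is exactly the kind of ordering assumption that passes on a fast developer machine and fails on a slower bot.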

44:28 SHARON: That makes sense. You mentioned rewriting your classes so they're more easily testable by a unit test. So what are common things you can do in terms of how you write or structure your classes that make them more testable? That also just seems like good general software engineering practice.

44:50 STEPHEN: Yeah, absolutely. So one of the biggest ones I think we see in Chromium is to not use singleton accessors to get at state. And what I mean by that is, you'll see a lot of code in Chromium that just goes ahead and, through some mechanism, says, hey, get the current web contents. And as I think you've talked about on this program before, web contents is this massive class with all these methods. And so if you just go ahead and get the current web contents and then go do stuff on that web contents, whatever, when it comes to running a test, well, it's like, hold on. That's trying to fetch a real web contents. But we're writing a unit test. What does that even look like? And so the way around this is to do what we call dependency injection. And I'm sure as I've said that word, a bunch of listeners or viewers have just recoiled in fear. But we don't lean heavily into dependency injection in Chromium. But it is useful for things like this. Instead of saying, go get the web contents, pass a web contents into your class. Make a web contents available as an input. And that means when you create the test, you can use a fake or a mock web contents. We can talk about the difference between fakes and mocks as well. And then, instead of having it go do real things in real code, you can just be like, no, no, no. I'm testing my class. When you call into web contents to do a thing, just return this value. I don't care about web contents. Someone else is going to test that.
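Here is a minimal C++ sketch of the dependency-injection idea. Instead of the class reaching out through a global accessor to fetch the current web contents, it receives a small interface in its constructor, and the test passes in a fake. `PageInfo`, `SecureBadge`, and `FakePageInfo` are hypothetical; `PageInfo` stands in for the handful of methods a class might need from something as large as `WebContents`.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical narrow interface: only what the class under test needs.
class PageInfo {
 public:
  virtual ~PageInfo() = default;
  virtual std::string GetUrl() const = 0;
};

// Class under test: its dependency is injected, not fetched from a global.
class SecureBadge {
 public:
  explicit SecureBadge(const PageInfo& page) : page_(page) {}
  // Shows the badge only for URLs that start with "https://".
  bool ShouldShow() const { return page_.GetUrl().rfind("https://", 0) == 0; }

 private:
  const PageInfo& page_;
};

// Test-only fake: returns whatever URL the test chose.
class FakePageInfo : public PageInfo {
 public:
  explicit FakePageInfo(std::string url) : url_(std::move(url)) {}
  std::string GetUrl() const override { return url_; }

 private:
  std::string url_;
};

// A unit test written with plain asserts; no browser is started.
void TestSecureBadge() {
  FakePageInfo secure("https://example.com/");
  assert(SecureBadge(secure).ShouldShow());
  FakePageInfo insecure("http://example.com/");
  assert(!SecureBadge(insecure).ShouldShow());
}
```

Because `SecureBadge` never touches a real browser object, the test runs as a plain unit test.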

46:19 SHARON: Something else I've either seen or been told in code review is to add delegates and whatnot.

46:25 STEPHEN: Mm-hmm.

46:25 SHARON: Is that a good general strategy for making things more testable?

46:25 STEPHEN: Yeah. It's similar to the idea of doing dependency injection by passing in your web contents. Instead of passing in your web contents, pass in a class that can provide things. And it's sort of a balance. It's a way to balance, if you have a lot of dependencies, do you really want to add 25 different inputs to your class? Probably not. But you define a delegate interface, and then you can mock out that delegate. You pass in that one delegate, and then when the delegate's get-web-contents method is called, you can mock that out. So very much the same goal, another way to do it.
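The delegate variant bundles several dependencies behind one interface. In this sketch (all names hypothetical), `BannerController` asks its delegate for two pieces of state and reports a result back through it, and the test supplies a fake delegate with canned answers and a record of what was shown.

```cpp
#include <string>
#include <vector>

// Hypothetical delegate: one interface instead of many constructor inputs.
class BannerDelegate {
 public:
  virtual ~BannerDelegate() = default;
  virtual bool IsIncognito() const = 0;
  virtual int GetOpenTabCount() const = 0;
  virtual void ShowBanner(const std::string& text) = 0;
};

// Class under test: all outside state comes through the delegate.
class BannerController {
 public:
  explicit BannerController(BannerDelegate* delegate) : delegate_(delegate) {}

  // Shows a banner only in normal (non-incognito) windows with 10+ tabs.
  void MaybeShowBanner() {
    if (delegate_->IsIncognito()) return;
    if (delegate_->GetOpenTabCount() >= 10) {
      delegate_->ShowBanner("You have many tabs open");
    }
  }

 private:
  BannerDelegate* delegate_;  // Not owned.
};

// Test double: canned answers plus a record of every banner shown.
class FakeBannerDelegate : public BannerDelegate {
 public:
  bool incognito = false;
  int tab_count = 0;
  std::vector<std::string> shown;

  bool IsIncognito() const override { return incognito; }
  int GetOpenTabCount() const override { return tab_count; }
  void ShowBanner(const std::string& text) override { shown.push_back(text); }
};
```

Adding a new dependency later means adding a method to the delegate interface rather than another constructor parameter.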

47:04 SHARON: That sounds good. Yeah, I think in general, a lot of these testing best practices, making things testable, aren't Chrome-specific. They're general software engineering or C++ practices, and you can look into those separately. Here we're mostly talking about what the Chrome-specific things are. Right?

47:24 STEPHEN: Yeah.

47:24 SHARON: Things that you can't just find as easily on Stack Overflow and such. So you mentioned fakes and mocks just now. Do you want to tell us a bit about the difference there?

47:32 STEPHEN: I certainly can do it. Though I want to caveat that you can also just go look up those on Stack Overflow. But yeah. So just to go briefly into it, in testing you'll often see the concept of a fake version of a class and also a mock version of a class. And the difference is just that a fake version of the class is a, what I'm going to call a real class that you write in C++. And you will probably write some code to be like, hey, when it calls this function, maybe you keep some state internally. But you're not using the real web contents, for example. You're using a fake. A mock is actually a thing out of the GoogleTest support library. It's part of a - Google Mock is the name of the sub-library, I guess, the sub-framework that provides this. And it is basically a bunch of magic that makes that fake stuff happen automatically. So you can basically say, hey, instead of a web contents, just mock that web contents out. And the nice part about mocks is, you don't have to define behavior for any method you don't care about. So if there are, as we've discussed, 100 methods inside web contents, you don't have to implement them all. You can be like, OK, I only care about the do Foobar method. When that is called, do this.
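To contrast the two kinds of test double, the sketch below hand-rolls both in plain C++. The fake is a working, simplified implementation (an in-memory map), while the mock-style double has no real behavior: it returns a canned value and records calls so the test can check how the dependency was used. In Chromium you would normally generate the second kind with gMock's `MOCK_METHOD` macro, which also spares you from implementing methods you don't care about; it is hand-written here only to keep the example self-contained. `PrefStore` and the other names are hypothetical.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical dependency: a small key-value store.
class PrefStore {
 public:
  virtual ~PrefStore() = default;
  virtual void Set(const std::string& key, int value) = 0;
  virtual int Get(const std::string& key) const = 0;
};

// FAKE: a real but simplified implementation that keeps state in memory.
class FakePrefStore : public PrefStore {
 public:
  void Set(const std::string& key, int value) override { values_[key] = value; }
  int Get(const std::string& key) const override {
    auto it = values_.find(key);
    return it == values_.end() ? 0 : it->second;
  }

 private:
  std::map<std::string, int> values_;
};

// MOCK-STYLE: no real behavior. Get() returns a canned value; Set() only
// records its arguments so the test can verify the interaction.
class RecordingPrefStore : public PrefStore {
 public:
  int canned_get_result = 0;
  std::vector<std::pair<std::string, int>> set_calls;

  void Set(const std::string& key, int value) override {
    set_calls.emplace_back(key, value);
  }
  int Get(const std::string&) const override { return canned_get_result; }
};

// Code under test: bumps a visit counter stored in the store.
void IncrementVisits(PrefStore& store) {
  store.Set("visits", store.Get("visits") + 1);
}
```

With the fake, the test checks the resulting state; with the mock-style double, it checks which calls were made and with what arguments.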

48:51 SHARON: Makes sense. One last type of test, which we don't hear about that often in Chrome but does exist quite a bit in other areas, is manual testing. So do we actually have manual testing in Chrome? And if so, how does that work?

49:03 STEPHEN: Yeah, we actually do. We're slightly crossing the boundary here from the open Chromium into the product that is Google Chrome. But we do have manual tests. And they are useful. They are a thing. Most often, you will see this in two cases as a Chrome engineer. You basically work with the test team. As I said, all a little bit internal now. But you work with the test team to define a set of test cases for your feature. And these are almost always end-to-end tests. So go to this website, click on this button, you should see this flow, this should happen, et cetera. And sometimes we run these just as part of the launch process. So when you're first launching a new feature, you can be like, hey, I would love for some people to basically go through this and smoke test it, make sure that everything is correct. Some things we test every release. They're so important that we need to have them tested. We need to be sure they work. But obviously, all of the caveats about manual testing out there in the real world, they apply equally to Chromium or to Chrome. Manual testing is slow. It's expensive. We require people - specialized people that we have to pay and that they have to sit there, and click on things, and that sort of thing, and file bugs when it doesn't work. So wherever possible, please do not write manual tests. Please write automated testing. Test your code, please. But then, yeah, it can be used.

50:33 SHARON: In my limited experience working on Chrome, the only place that I've seen there actually be any level of dependency on manual tests has been in accessibility stuff -

50:38 STEPHEN: Yeah.

50:38 SHARON: which kind of makes sense. A lot of that stuff is not necessarily - it is stuff that you would want to have a person check because, sure, we can think that the speaker is saying this, but we should make sure that that's the case.

50:57 STEPHEN: Exactly. I mean, that's really where manual testing shines. We can't integration-test accessibility because you can't test the screen reader device or the speaker device. Whatever you're using, we can't test that part. So yes, you have to then have a manual test team that checks that things are actually working.

51:19 SHARON: That's about all of our written down points to cover. Do you have any general thoughts, things that you think people should know about tests, things that people maybe ask you about tests quite frequently, anything else you'd like to share with our lovely listeners?

51:30 STEPHEN: I mean, I think I've covered most of them. Please write tests. Write tests not just for code you're adding but for code you're modifying, for code that you wander into a directory and you say, how could this possibly work? Go write a test for it. Figure out how it could work or how it couldn't work. Writing tests is good.

51:50 SHARON: All right. And we like to shout out a Slack channel of interest. Which one or ones would be a good Slack channel to post in if you have questions or want to get more into testing?

52:03 STEPHEN: Yeah. It's a great question. I mean, I always like to - I think it's been called out before, but the hashtag #halp channel is very useful for getting help in general. There is a hashtag #wpt channel. If you want to go ask about web platform tests, that's there. There's probably a hashtag #testing. But I'm going to admit, I'm not in it, so I don't know.

52:27 SHARON: Somewhat related is there's a hashtag #debugging channel.

52:27 STEPHEN: Oh.

52:27 SHARON: So if you want to learn about how to actually do debugging and not just do log print debugging.

52:34 STEPHEN: Oh, I was about to say, do you mean by printf'ing everywhere in your code?

52:41 SHARON: [LAUGHS] So there are a certain few people who like to do things in an actual debugger or enjoy doing that. And for a test, that can be a useful thing too - a tool to have. So that also might be something of interest. All right, yeah. And kind of generally, as you mentioned a lot of things are your opinion. And it seems like we currently don't have a style guide for tests or best practices kind of thing. So how can we -

53:13 STEPHEN: [LAUGHS] How can we get there? How do we achieve that?

53:19 SHARON: How do we get one?

53:19 STEPHEN: Yeah.

53:19 SHARON: How do we make that happen?

53:19 STEPHEN: It's a hard question. We do - there is documentation for testing, but it's everywhere. I think there's /docs/testing, which has some general information. But so often, there's just random READMEs around the code base that are like, oh, hey, here's the content public test API surface. Here's a bunch of useful information you might want to know. I hope you knew to look in this location. Yeah, it's a good question. Should we have some sort of process for - like you said, like a style guide but for testing? Yeah, I don't know. Maybe we should enforce that people dependency inject their code.

54:04 SHARON: Yeah. Well, if any aspiring test nerds want to really get into it, let me know. I have people who are also interested in this and maybe can give you some tips to get started. But yeah, this is a hard problem and especially with so many types of tests everywhere. I mean, even just getting one for each type of test would be useful, let alone all of them together. So anyway - well, that takes us to the end of our testing episode. Thank you very much for being here, Stephen. I think this was very useful. I learned some stuff. So that's cool. So hopefully other people did too. And, yeah, thanks for sitting and answering all these questions.

54:45 STEPHEN: Yeah, absolutely. I mean, I learned some things too. And hopefully we don't have too many angry emails in our inbox now.

54:52 SHARON: Well, there is no email list, so people can't email in if they have issues. [LAUGHTER]

54:58 STEPHEN: If you have opinions, keep them to yourself -

54:58 SHARON: Yeah. [INAUDIBLE]

54:58 STEPHEN: until Sharon invites you on her show.

55:05 SHARON: Yeah, exactly. Yeah. Get on the show, and then you can air your grievances at that point. [LAUGHS] All right. Thank you.