Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Improving bug reports by supporting inline reproductions #35389

Closed
orta opened this issue Nov 27, 2019 · 6 comments · Fixed by #39628
Closed

RFC: Improving bug reports by supporting inline reproductions #35389

orta opened this issue Nov 27, 2019 · 6 comments · Fixed by #39628
Labels
Infrastructure Issue relates to TypeScript team infrastructure

Comments

@orta
Copy link
Contributor

orta commented Nov 27, 2019

We want to provide better tools for providing reproductions of TypeScript bugs in issues. In an issue we really only have text to work with, so we need some kind of text representation of a TypeScript project.

Classes of issues

Looking through the bugs label, there are a large amounts of cases we can probably address:

  • Bad JS/DTS Emit
  • This code should be this type
  • This code is slow to compile / use IntelliSense
  • This code crashes
  • This code should error
  • This code should not error
  • This refactor didn't do what I expected

There are a few we probably can't:

  • This code infinite loops

I'm interested in expanding either of these lists! ^

Possible solution

Two parts:

  • We use something like arcanis/sherlock to run any issues with re-pros on a nightly-ish schedule with different builds of TypeScript and store the results using something like this - this runs TypeScript code with markup which is built for repros

  • A page on the website which helps you generate a single typescript codeblock which is a reproduction of a bug

Possible Markup

We've been working on a TypeScript markup language for the website and this called ts-twoslash which adds commands inside comments. I'd recommend reading the README for that before expanding some of these examples.

Bad JS Emit
// @showEmit
// @emitShouldContain: this string
// @emitShouldNotContain: this other string

console.log("hello world")
Bad DTS Emit
// @showEmit
// @showEmittedFile: index.d.ts
// @emitShouldContain: 123
// @emitShouldNotContain: 456

export const myVariable = "123123"
This code should be this type
// @noImplicitAny: false

const a = 123
//    ^? = const a: 123

or

// @noImplicitAny: false

const a = 123
//    ^? != const a: any
This code is slow
// @slow

const a = 123
//    ^?

This annotation could indicate that it should fail if it takes over 1s

This code crashes
const a = 123
a.l

In theory there shouldn't need to be a crashes annotation, all code should not crash. I'm open to debates on why that might need to be different though.

This code should error
// @expectErrors: 4311

const a = "hello"
a * a
This code should not error
const a = "hello"
a * a

The lack of an error annotation should indicate this. If it is expected that there should be some errors, then it can use the @errors: 3211 which is already set up

This refactor didn't do what I expected

I'm still a bit hazy on this, in fourslash there are a lot of options here - but to try provide a really high-level

const a = "hello"
   a * a
// ^^^^^! refactor

This could create a JSON dump of all the refactoring options maybe?

Web Page

We have been working towards having the playground support a custom set of tabs in the sidebar, it's not a far reach to imagine that an editor for twoslash could be the playground but with tabs focused on running ts-twoslash and getting useful results.

Here's a rough idea of what that could feel like:

Screen Shot 2019-11-27 at 12 03 02 PM

The blockers on that:

@orta orta added the Infrastructure Issue relates to TypeScript team infrastructure label Nov 27, 2019
@RyanCavanaugh
Copy link
Member

I really like the idea of a workbench.

I think it's sufficient for the markup language to be able to show the setup behavior without prescribing the expected output; the author can describe in prose what they don't like about it. We've found that emitShouldContain/etc style tests are usually too sensitive to "spelling" effects (e.g. Array<T> vs T[]).

For something like a refactoring, showing the refactorings available (perhaps filtered by a search string?) and their post-render output would be sufficient.

Error message numbers only have "best effort" stability and I'd generally avoid trying to hardcode them. This is also subject to oversensitivity; when we add a new "Here's what we think is going on" error message on top of an existing error message, the observed top-level error code changes. Error positions are also subject to some movement; this is why I generally lean toward "Let people demonstrate the behavior" rather than "Let people prescribe the behavior". Showing version-to-version changes in the demonstrated behavior is also likely to be more instructive than pass/fail.

@elibarzilay
Copy link
Contributor

What I had in mind is something that is more focused on how to implement this, and specifically independent of the style of tests -- anything goes, as long as it has support in the TS repo.

With this in mind (and therefore nothing about the front end that you'd actually use), the functionality that would be implemented is best left for just the tests that are in the repo -- so in the below I'm talking about fourslash and baseline tests, but if the twoslash thing is added, then that would be another option. (IOW, I see this as independent of the style of tests.)

  • Take the whole repo, build it, and dump the result in a docker image (almost as is, removing unnecessary stuff, and possibly with some interface script described below).

  • There would be a classification of test files by their "style" -- baseline or fourslash, and some files would not be classified (tests that are more involved than a one-file thing (like the docker tests), or things that are not test cases by themselves).

  • The resulting image has two purposes:

    • given some text, grep known files for that text (including the file names)
    • run gulp runtests --t="someTest"
  • When someone interacts with this:

    • the first step would be to grep for something similar and/or decide on the style of tests.
    • then, edit a file that they found, or write a new one
    • then the file is executed via a docker run with that one file mounted in the right place, and with an filename and an argument that makes only that one file run and the results rendered but
    • (the docker run would include a timeout and memory limits)
    • these rendered results should normally be a test failure: an uncaught exception, a timeout, running out of memory; or in the case of a baseline test, the resulting files
  • At this point, the test case could be attached to a PR with some additional human explanation. That explanation could be empty (a test just failed, and it shouldn't), or it could describe what's wrong with the output in the case of a baseline test (shouldn't get errors, something in the types file should be different etc). The bottom line is that after this there is a test case ready to add.

  • When the attachement to a PR is done, also add a "test case available" label.

  • Better to attach to an existing PR rather than make it create a PR, since I see bugs where people have interest in helping so this accomodates X submitting a bug, and Y adding a test case.

  • (In the world I came from there would be concern about adding a label only after verifying some signature, but this is all just simplifying what people can write, and no tests would be added automatically anyway, so there's no need for any of that.)

@orta
Copy link
Contributor Author

orta commented Nov 27, 2019

Cool @elibarzilay - this looks like a case of creating a setup where we get working test cases which matches our own in the compiler and it gives us the freedom to cover crashing / OOM / infinite loop tests.

I'm open to building the front-end to something like this, but I'm wary of building a backend to support this myself. My past experience building something similar was that this will be quite a lot of maintenance and work. If that's something you're up for building and owning, then I can think about how we teach and build out a UI for it in the site.

FWIW, I still prefer like my original proposal for a few reasons:

  • Low barrier to entry for people wanting to make reports because twoslash is barely any different from normal typescript, and easy to document
  • No server to maintain and secure, it runs in a browser or inside node via cron'd GitHub Actions
  • Shares underlaying infra code with the site, which means it will get upkeep almost for free

Which is basically a worse is better argument. It does a lot of things worse:

  • It's a unique way to write tests (being neither fourslash, nor baselines)
  • It'd require us to take those issues to make our own tests when fixing issues
  • It doesn't cover all cases of tests
  • It's basically an approximation of the underlaying bug

But it covers most of the cases and is more convenient

@orta
Copy link
Contributor Author

orta commented Jul 1, 2020

Update

I have an (unpolished) bug workbench in the staging site:

Screen Shot 2020-07-01 at 8 53 00 AM

Yesterday, I got the verification steps working with twoslash code samples as a GitHub Action: https://github.com/microsoft/TypeScript-Twoslash-Repro-Action

This action verifies any code samples in an issue body, or comments against the last 6 versions of TypeScript and the latest nightly build. I just need to work on the reporting aspects of it.

Each run keeps track of:

  • Any highlighted queries
  • Compiler errors
  • Emit output
  • Compiler crashes

@weswigham
Copy link
Member

This action verifies any code samples in an issue body, or comments against the last 6 versions of TypeScript and the latest nightly build. I just need to work on the reporting aspects of it.

Suggestion: why not have the action bisect releases (if it fails the most recent release)/nightlies (if it passes on the most recent release but not the most recent nightly), so it can identify if the issue is a regression, when it was introduced.

@orta
Copy link
Contributor Author

orta commented Jul 5, 2020

Aye, It should support this OOTB today (going from green to red on a new nightly run is just another state change it would detect) - historical results are always kept around, for example this comment shows some different results over time with different builds of TS

As it runs every night, you'd get the list of commits between the two nightlies in the comment which should make it easy to track down

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Infrastructure Issue relates to TypeScript team infrastructure
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants