
Automatic Time Binning Fails for Large Time Intervals #484

Closed
manzt opened this issue Aug 16, 2024 · 5 comments · Fixed by #501

Comments

@manzt
Collaborator

manzt commented Aug 16, 2024

Thanks for all the work on Mosaic! It's been really fun building quak on top of it. Some tricky SQL errors keep popping up (manzt/quak#35, manzt/quak#21), and I think I've finally tracked them down: they stem from an initial parser error in code that relies on Mosaic's automatic time/date binning from #435.

```
Error: Parser Error: syntax error at or near "50000000000"
LINE 1: SELECT (TIME_BUCKET(INTERVAL 50000000000 year, "date_of_birth")) AS ...
                                     ^
```

I'm able to reproduce the error with vgplot and the athletes.csv dataset.

```html
<!doctype html>
<html lang="en">
  <body>
    <script type="module">
      import * as vg from "https://cdn.jsdelivr.net/npm/@uwdata/vgplot@0.10.0/+esm";

      let url = "https://raw.githubusercontent.com/uwdata/mosaic/main/data/athletes.csv";
      let connector = vg.wasmConnector();
      await vg.coordinator().databaseConnector(connector);

      // Note: with vg.coordinator().exec([vg.loadCSV("https://...")]),
      // duckdb-wasm tries to make a range request for the CSV URL, which
      // GitHub does not support, so we fetch the file manually and
      // register it as a table.
      let db = await connector.getDuckDB();
      await db.registerFileText("athletes.csv", await fetch(url).then((r) => r.text()));

      let conn = await connector.getConnection();
      await conn.insertCSVFromPath("athletes.csv", { schema: "main", name: "athletes" });

      // Create a subset of the data with a smaller range of date_of_birth;
      // binning this subset does not cause an error.
      await vg.coordinator().exec([
        "CREATE VIEW athletes_partial AS SELECT * FROM athletes ORDER BY date_of_birth ASC LIMIT 1000",
      ]);

      let plot = vg.plot(
        vg.rectY(
          vg.from("athletes"), // this causes an error
          // vg.from("athletes_partial"), // this is fine
          {
            x: vg.bin("date_of_birth"),
            y: vg.count(),
            fill: "steelblue",
            inset: 0.5,
          },
        ),
      );
      document.body.appendChild(plot);
    </script>
  </body>
</html>
```

It's important to note that a smaller date range of the same data (athletes_partial) works just fine.

I think this might just be an untested error in the generated SQL, since I can reproduce the error and fix it by quoting the interval:

```diff
 await vg.coordinator().query(
-  `SELECT (TIME_BUCKET(INTERVAL 50000000000 year, "date_of_birth")) from athletes`,
+  `SELECT (TIME_BUCKET(INTERVAL '50000000000 year', "date_of_birth")) from athletes`,
 );
```
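
To see the contrast directly, both forms can be run with error handling (a minimal sketch reusing the coordinator and `athletes` table from the reproduction above; even the quoted form may still fail after parsing, just with a clearer message):

```js
// Sketch: assumes the coordinator and athletes table set up in the
// reproduction above. The unquoted literal fails at parse time; the
// quoted one parses, though it may still fail later at runtime.
for (const interval of ["INTERVAL 50000000000 year", "INTERVAL '50000000000 year'"]) {
  try {
    await vg.coordinator().query(
      `SELECT TIME_BUCKET(${interval}, "date_of_birth") FROM athletes LIMIT 1`,
    );
    console.log(`${interval}: parsed and ran`);
  } catch (err) {
    console.log(`${interval}: ${err}`);
  }
}
```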

I don't know enough about the auto-binning implementation to understand when this code path is hit, so "large time intervals" may not correctly characterize when this fails. Perhaps a better issue title would be "Automatic time binning fails for some data".

@jheer
Member

jheer commented Aug 25, 2024

Thanks! Looks like a simple (face-palm inducing) bug in the time binning routine for multi-year intervals. It produced intervals that were much too large, which then caused query parse errors. Adding quotes to the intervals fixes the parse error, after which DuckDB reports more informative error messages, but the underlying cause is the same.
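
To illustrate the class of bug (a purely hypothetical sketch, not Mosaic's actual binning code): computing a multi-year step from a millisecond extent without converting units yields absurdly large interval literals like the one in the error above.

```js
const MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365;

// Hypothetical bug: the extent is in milliseconds, but the result is
// interpolated into SQL as a count of *years*, yielding a huge interval.
function yearStepBuggy(minDate, maxDate, targetBins) {
  return Math.ceil((maxDate - minDate) / targetBins); // still milliseconds!
}

// Corrected: convert the span to years before dividing.
function yearStepFixed(minDate, maxDate, targetBins) {
  const spanYears = (maxDate - minDate) / MS_PER_YEAR;
  return Math.max(1, Math.ceil(spanYears / targetBins));
}

const lo = new Date(1950, 0, 1);
const hi = new Date(2000, 0, 1);
console.log(yearStepBuggy(lo, hi, 20)); // ≈ 7.9e10 -> INTERVAL 78891840000 year
console.log(yearStepFixed(lo, hi, 20)); // 3 -> INTERVAL '3 year'
```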

@manzt
Collaborator Author

manzt commented Aug 25, 2024

Thanks for the fix!

@manzt
Collaborator Author

manzt commented Aug 27, 2024

Thanks again for the fix! I hate to ask, but could these changes end up in a patch release, or would they have to wait until the next minor?

@jheer
Member

jheer commented Sep 13, 2024

Sorry for the delay. I plan to cut a new release as soon as #519 lands.

@manzt
Collaborator Author

manzt commented Sep 16, 2024

No problem. Thanks!

@jheer mentioned this issue Sep 16, 2024