Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

qm.Load from child to parent constructs potentially unnecessarily large sql statements #267

Closed
robotlovesyou opened this issue Apr 4, 2018 · 4 comments

Comments

@robotlovesyou
Copy link

robotlovesyou commented Apr 4, 2018

If you're having a generation problem please answer these questions before submitting your issue. Thanks!

What version of SQLBoiler are you using (sqlboiler --version)?

v2.6.0

If this happened at generation time what was the full SQLBoiler command you used to generate your models? (if not applicable leave blank)

If this happened at runtime what code produced the issue? (if not applicable leave blank)

I can't share my schema but it's roughly as follows

Table A
a_id: integer
owner_id: integer
b_id: integer
other columns

Table B
b_id: integer
other columns

So, the owner can have many A rows, each A row has exactly one B but each B can belong to more than one A

If I select all the A entries belonging to owner with something similar to

as, err := db.As(
db,
qm.Where("owner_id=?", ownerID),
qm.Load("B"),
).All()

Then for every row in A, the ID of B is repeated, so say I have 100 entries in A belonging to owner, where 50 entries have the B with id 1, and 50 have the B with id 2, then instead of just two entries in the in clause of the SQL select, I'll have 100.

What is the output of the command above with the -d flag added to it? (Provided you are comfortable sharing this, it contains a blueprint of your schema)

NA

Please provide a relevant database schema so we can replicate your issue (Provided you are comfortable sharing this)

See above

Further information. What did you do, what did you expect?

I would expect the qm.Load modifier to ensure that it is only including each id once.

@aarondl
Copy link
Member

aarondl commented Apr 9, 2018

So, you have:

A has one B; B has many A
And when we do a Load() we load all B's columns instead of removing the ID is the crux of the issue?

One of the problems is mapping that would require some code acrobatics. Since we basically bind all the rows of B to B itself. We'd have to do something magical to get that ID to bind to B as well as to B_ID. I think this is doable, but low priority atm.

I'll leave this opened, thanks for the report.

@robotlovesyou
Copy link
Author

robotlovesyou commented Apr 9, 2018

Hi, no I don't think I've been clear. That's not what I mean.

Say I have 10 A rows, each of which has one of two B rows

if I do

as, err := db.As(
db,
qm.Where("owner_id=?", ownerID),
qm.Load("B"),
).All()

Then my sql for selecting the B rows will look something like

SELECT * FROM B WHERE id in (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)

With parameters something like

[1,1,1,2,2,1,1,2,2,2]

So imagine what the SQL is going to look like if I have 100, 1000, or 10,000 rows!

I would have thought it should be something like

SELECT * FROM B WHERE id in (?, ?)

With parameters something like

[1,2]

@aarondl
Copy link
Member

aarondl commented Apr 9, 2018

I'll have to look into this further later, I couldn't find any code that would support this claim (not saying you're wrong, just saying I have no idea why that could happen). Sounds like it's going to take some unique queries, or de-duping of the IDs which isn't very great either. Thanks again for reporting the issue.

@aarondl aarondl added bug and removed performance labels Apr 9, 2018
aarondl added a commit that referenced this issue Jun 5, 2018
Take a schema like youtube where there are users and videos, in the
past eager loading when doing a query like Videos(Load("User")) would
create something that would look like this:

SELECT * FROM "users" WHERE "id" IN ($1,$2,$3,$4,$5,$6,$7);
[1 1 1 1 1 2 2]

Now we de-duplicate the IDs of the eager loaded relationships, so even
though there are 7 videos, they only refer to 2 users so we only look up
those two.

SELECT * FROM "videos";
[]
SELECT * FROM "users" WHERE "id" IN ($1,$2);
[1 2]

- Fix #267
@aarondl aarondl closed this as completed Jun 5, 2018
@H0schi
Copy link

H0schi commented Dec 29, 2023

Hi!

I stumbled upon this issue when looking into why queries take long on my machine with a "large" number of rows (50k+) that eagerly load about 5 other tables of data, all with a 1:N relationship. Having this deduplicate logic for arguments makes my query take up to 200s. After removing the deduplicate logic, my query finishes after only about 1s.

Is there really a drawback to having the same arguments multiple times in the SQL query? The underlying database shouldn't really care if there are duplicate keys, correct? I think the trade of of having "ugly queries" with too many arguments and performance is worth it.

I vote for removing the duplicate keys check, but maybe I missed something. If there is nothing against removing the deduplication, we'd be happy to create a pull request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants