Bulk import #168
src/adapter/mysql.cr
Outdated
next unless model.valid?
stmt << "("
stmt << model.to_h[primary_name].to_s + ", " unless primary_auto
stmt << fields.map { |field| model.to_h[field].is_a?(String) ? "'" + model.to_h[field].to_s + "'" : model.to_h[field] }.join(", ")
I'm not a fan of how I'm handling the quoting of String type fields. Any better ideas?
The adapters now provide a quote method, but the functionality has some bugs.
I do use the quote method for column/table names. However, since it quotes with backticks, it couldn't be used for string values afaik?
Sorry, I didn't see that you are quoting values. A parameterized query is better for that, and I believe the adapters automatically do what they need to with values in parameterized queries.
That was my initial thought, however when I tried it I kept getting an error like db params invalid format or something like that (not home atm to look). The params should just be an array of values, where the first value of the array maps to the first ? placeholder, correct?
I'll mess with it again and see if I can get the params working, probably this weekend.
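For readers following along, here is a minimal sketch of the parameterized approach being discussed: one `?` placeholder per value, plus a single flat array whose order matches the placeholders. The `Value` alias, `build_bulk_insert`, and the `users` table are hypothetical names for illustration, not Granite's API, and identifier quoting is left out for brevity.

```crystal
# Hypothetical stand-in for the value types a model row can hold.
alias Value = String | Int32 | Nil

# Builds a multi-row INSERT with one `?` per value and collects the values
# into a flat array in the same order as the placeholders.
def build_bulk_insert(table : String, fields : Array(String), rows : Array(Array(Value)))
  params = [] of Value
  statement = String.build do |stmt|
    stmt << "INSERT INTO " << table << " (" << fields.join(", ") << ") VALUES "
    rows.each_with_index do |row, i|
      stmt << ", " unless i.zero?
      stmt << '(' << Array.new(row.size, "?").join(", ") << ')'
      params.concat(row)
    end
  end
  {statement, params}
end

rows = [["Ann", 30] of Value, ["O'Brien", nil] of Value]
statement, params = build_bulk_insert("users", ["name", "age"], rows)
puts statement # INSERT INTO users (name, age) VALUES (?, ?), (?, ?)
puts params.size
```

Because the values never enter the SQL string, a value such as `O'Brien` needs no manual quoting. With a crystal-db driver the array would then be handed over next to the statement (in current crystal-db versions that is `db.exec(statement, args: params)`; treat the exact call as an assumption for the version used in this PR).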
Thanks for working on this, I think it's going to be a good feature.
.gitignore
Outdated
@@ -4,6 +4,7 @@
/.crystal/
/doc/
/config/*.db
/.idea/
?
Oh, this is an editor thing. You should set up and ignore your editor files in a system-wide global gitignore file instead.
@Blacksmoke16 interesting, are you coding with IntelliJ IDEA? Just curious, what plugin are you using to get Crystal support? 😅
@faustinoaq I am, yes :) Using RubyMine with https://github.com/intellij-crystal/intellij-crystal
It's pretty meh compared to the full native support Ruby gets in RubyMine, but it provides syntax highlighting and that's good enough for me atm.
src/granite_orm/transactions.cr
Outdated
# the array must contain only one model class
# invalid model records will be skipped
def self.import(model_array)
raise ArgumentError.new("Model class mismatch: expected array of only #{self} models.") unless model_array.all? { |model| model.class == self }
Can this definition be changed so that the compiler will enforce the type constraint instead of a runtime error?
First guess, something like this might work:
def self.import(model_array : Array(self))
Ooo yes! I will do that tonight (still getting used to static typed land :P)
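A small sketch of what the suggested restriction buys, assuming two hypothetical model classes (`Post` and `Comment` are illustrative only, and the method body is a stand-in for the real bulk insert). With `Array(self)` the mismatch is caught by the compiler, so the runtime `ArgumentError` check is no longer needed.

```crystal
# Hypothetical model classes for illustration; not Granite's real base class.
class Post
  getter title : String

  def initialize(@title : String); end

  # `self` in the restriction means Post here, so only Array(Post) is accepted.
  def self.import(model_array : Array(self)) : Int32
    model_array.size # stand-in for running the bulk insert
  end
end

class Comment
  getter body : String

  def initialize(@body : String); end
end

puts Post.import([Post.new("a"), Post.new("b")])
# Post.import([Comment.new("x")]) # rejected at compile time: no overload matches
```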
Add type filter to import class method Reformat some code
src/adapter/mysql.cr
Outdated
next unless model.valid?
stmt << '('
stmt << "?," unless primary_auto
stmt << Array.new(fields.size, '?').join(',')
The unless primary_auto references above/below here can be removed once #169 gets merged, since fields will cover it by default.
I'll probably also be able to remove the primary_auto param, since that is the only reason I needed it.
@robacarp What do you think the best way to handle the lifecycle hooks is for this? Best approach/worth doing/etc?
Handle timestamps Start of specs Ran crystal tool format
Am currently having some issues with timestamps and/or the import statement setting timestamps. https://travis-ci.org/amberframework/granite-orm/builds/360749166#L579 Am not sure what is causing it...have to keep messing with it. EDIT: Error I am getting locally:
A list of all the affected tests:
I'll look into that, thanks! Probably best to just keep it before_create, run import statement, after_create? Mostly my question is how that would specifically work, since each model isn't saved normally, so without running the callback in the .each of models_array it wouldn't really be similar to
EDIT: To be more clear, should the callbacks be run for EACH model, like before_save, add to insert string, after_create? Or should the callbacks be executed once before and once after the import statement is run?
@Blacksmoke16 ah, of course. I think they should run for each record. For example a user object which needs a password hash is not a valid record until the before_create executes. Proposing a workflow here:
@amberframework/granite-orm-contributors thoughts?
@Blacksmoke16 FYI, depending on your personal git workflow, you may need to update your
src/adapter/pg.cr
Outdated
end.chomp(',')

if update_keys = options["on_duplicate_key_update"]?
statement += " ON CONFLICT (#{quote(primary_name)}) DO UPDATE SET "
This seems right, but would you mind explaining the behavior in a terse comment for future maintainers? (And me...)
Yea, I plan on updating the README with info on how to use everything. But basically this was added in Postgres 9.5:
https://www.postgresql.org/docs/9.5/static/sql-insert.html
It's basically the Postgres version of MySQL's ON DUPLICATE KEY UPDATE.
From the docs:
INSERT INTO distributors (did, dname)
VALUES (5, 'Gizmo Transglobal'), (6, 'Associated Computing, Inc')
ON CONFLICT (did) DO UPDATE SET dname = EXCLUDED.dname;
It's basically saying that if there is a conflict on the did value, then dname should be updated to the new value that would have otherwise been excluded.
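As a rough illustration of how the adapter could assemble that clause, here is a sketch that maps each update key to a `col = EXCLUDED.col` assignment and joins them, which avoids the trailing-separator `chomp`. The `quote` helper and the `on_conflict_update` name are hypothetical, and double-quoted identifiers are assumed for Postgres.

```crystal
# Hypothetical helper; double-quote quoting is assumed for Postgres identifiers.
def quote(name : String) : String
  %("#{name}")
end

# Builds the ON CONFLICT clause for the given primary key and columns to update.
def on_conflict_update(primary_name : String, update_keys : Array(String)) : String
  assignments = update_keys.map { |key| "#{quote(key)} = EXCLUDED.#{quote(key)}" }
  " ON CONFLICT (#{quote(primary_name)}) DO UPDATE SET #{assignments.join(", ")}"
end

puts on_conflict_update("did", ["dname"])
# prints:  ON CONFLICT ("did") DO UPDATE SET "dname" = EXCLUDED."dname"
```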
@@ -76,6 +76,47 @@ class Granite::Adapter::Pg < Granite::Adapter::Base
end
end

def import(table_name : String, primary_name : String, fields, model_array, **options)
Your **options can be a little cleaner and have a similar effect:
def import(table_name : String, primary_name : String, fields, model_array : Array(self), *, update_on_duplicate_key = false)
That would mean I would need to have a defined param for each option, right?
As of now there is on_duplicate_key_ignore: Bool, on_duplicate_key_update: Array(String) of columns to be updated, and possibly in future also validate: true, as giving the option to skip validation might speed up large imports. I don't have strong feelings one way or another.
You can probably have the best of both worlds using Crystal's overloaded method signatures. If you need them, here are the docs on overloading.
I don’t have strong opinions on this either. I just noticed a rubyism that might be able to be cleaned up with some crystal.
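To make the trade-off concrete, here is a minimal sketch of the named-argument style suggested above. Arguments declared after the bare `*` can only be passed by name, and an unknown option name is a compile error rather than a silently ignored key in `**options`. The signature is simplified and hypothetical; the real adapter method takes more arguments.

```crystal
# Hypothetical, simplified signature for illustration.
# Everything after the bare `*` must be passed by name.
def import(rows : Array(String), *, update_on_duplicate : Bool = false, ignore_on_duplicate : Bool = false) : String
  mode = if update_on_duplicate
           "update"
         elsif ignore_on_duplicate
           "ignore"
         else
           "plain"
         end
  "#{rows.size} rows, #{mode} insert"
end

puts import(["a", "b"])                       # 2 rows, plain insert
puts import(["a"], update_on_duplicate: true) # 1 rows, update insert
# import(["a"], on_duplicate: true) # compile error: no such named parameter
```

The cost, as noted in the thread, is that every option needs its own declared parameter.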
Just a little update on this. Been busy with finals coming up and such, will make some more commits this weekend.
# The save method will check to see if the primary exists yet. If it does it
# will call the update method, otherwise it will call the create method.
# This will update the timestamps apropriately.
# This will update the timestamps appropriately.
🥇
src/adapter/mysql.cr
Outdated
update_keys.each do |key|
statement += "#{quote(key)}=VALUES(#{quote(key)}), "
end
statement = statement.chomp(", ")
Does MySQL not have an ON DUPLICATE IGNORE option?
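For context on the question: MySQL expresses "skip duplicates" with `INSERT IGNORE` at the start of the statement rather than with an `ON DUPLICATE ...` clause. A minimal sketch of both variants follows; the helper names are hypothetical and the backtick quoting mirrors what the thread says the MySQL adapter's `quote` does.

```crystal
# Hypothetical helpers for illustration; not the adapter's actual methods.
def quote(name : String) : String
  "`#{name}`"
end

# Suffix that overwrites the listed columns when a duplicate key is hit.
def on_duplicate_update(update_keys : Array(String)) : String
  assignments = update_keys.map { |key| "#{quote(key)}=VALUES(#{quote(key)})" }
  " ON DUPLICATE KEY UPDATE #{assignments.join(", ")}"
end

# Prefix that either inserts normally or skips rows that hit a duplicate key.
def insert_prefix(table : String, ignore_duplicates : Bool) : String
  ignore_duplicates ? "INSERT IGNORE INTO #{quote(table)}" : "INSERT INTO #{quote(table)}"
end

puts insert_prefix("users", true)
puts on_duplicate_update(["name", "age"])
```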
@Blacksmoke16 I don't know because I didn't test this, but I think you'll be able to trigger the callbacks from the class level by using these methods. If it's not straightforward to trigger the callbacks externally, I'm fine pushing the callbacks to a later pull for the import feature. There's no need to get way out into the weeds.
I'll poke around with it this weekend and we can go from there. Thanks!
@robacarp from my testing I am able to trigger the callback successfully to a degree. Since we are in a class method, the method that is to be called in the model doesn't exist.
Where
README.md
Outdated

Each model has an `import` class level method to import an array of models in one bulk insert statement.
```Crystal
models [
```
Oops, missed = on these, I'll fix that tonight.
end
end

def self.import(model_array : Array(self), update_on_duplicate : Bool, columns : Array(String))
Since you would never use update_on_duplicate and ignore_on_duplicate together, what if we use an enum instead of boolean flags, with options for how to handle duplicates? Maybe duplicates: ignore and duplicates: update?
Would there be a nice way to also require columns key when using the update option?
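One possible answer, sketched under assumptions: Crystal overloads can differ by their required named arguments, so giving `columns` no default in the update overload makes the compiler demand it. The `Post` class and the string return values are hypothetical stand-ins; this shows the overload approach rather than the enum idea above.

```crystal
# Hypothetical model for illustration; return values stand in for the real insert.
class Post
  # Plain insert: no duplicate handling.
  def self.import(model_array : Array(self)) : String
    "plain insert of #{model_array.size}"
  end

  # Ignoring duplicates needs no column list.
  def self.import(model_array : Array(self), *, ignore_on_duplicate : Bool) : String
    "insert of #{model_array.size}, ignore duplicates: #{ignore_on_duplicate}"
  end

  # `columns` has no default, so it is required whenever update_on_duplicate is passed.
  def self.import(model_array : Array(self), *, update_on_duplicate : Bool, columns : Array(String)) : String
    "insert of #{model_array.size}, update #{columns.join(", ")} on duplicate: #{update_on_duplicate}"
  end
end

posts = [Post.new, Post.new]
puts Post.import(posts)
puts Post.import(posts, ignore_on_duplicate: true)
puts Post.import(posts, update_on_duplicate: true, columns: ["title"])
# Post.import(posts, update_on_duplicate: true) # compile error: no overload matches
```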
@Blacksmoke16 did you figure out the callbacks?
Also, what happens when one of the rows fails to import? Do all of them rollback?
Sadly no, the issue is trying to run a model's instance method from the class scope. As for failed rows, I would have to test it if you have a specific case; otherwise, iirc, the whole import fails since it is just one SQL query.
Thanks for your work here, I think this is a nice feature to add.
💯 Excellent work.
I just added bulk import to the Amber documentation as well 🎉 https://amberframework.gitbook.io/amber/guides/models/granite/bulk-insertions
PR for #156
Progress
Questions:
Either, hooks would not run for each model being inserted, but would get triggered once all are inserted and such.