Bulk import #168

Merged: 24 commits, Apr 26, 2018
Commits
814de94
Start of bulk-import method
Blacksmoke16 Mar 30, 2018
d1b2134
Use parameterized statement
Blacksmoke16 Mar 30, 2018
93b6e67
Handle duplicate key IGNORE/UPDATE
Blacksmoke16 Mar 31, 2018
6435d63
Add commas between update values
Blacksmoke16 Mar 31, 2018
85c36a0
merge in the fields unity code
Blacksmoke16 Mar 31, 2018
73e7de5
Remove now undeeded primary_auto variable
Blacksmoke16 Mar 31, 2018
1a03145
Add import for pg adapter
Blacksmoke16 Mar 31, 2018
54dd5b7
Also quote other column key
Blacksmoke16 Mar 31, 2018
c74d126
Add adapter for sqlite
Blacksmoke16 Mar 31, 2018
e0e1be4
Merge branch 'master' into bulk-import
faustinoaq Apr 4, 2018
84364d9
Fix spec
Blacksmoke16 Apr 7, 2018
2d15769
Add more tests, fix sqlite adapter
Blacksmoke16 Apr 7, 2018
bbd2f7f
Update readme
Blacksmoke16 Apr 7, 2018
2425ad2
Fix spacing issue in sqlite adapater
Blacksmoke16 Apr 7, 2018
87009d5
Fix indentation
Blacksmoke16 Apr 7, 2018
acc4a6e
Fix typos in readme examples
Blacksmoke16 Apr 7, 2018
4b10e27
Merge branch 'master' into bulk-import
faustinoaq Apr 7, 2018
87268a0
Fix indentations
Blacksmoke16 Apr 7, 2018
f6dcf2a
Add overload methods for update/ignore_on_duplicate, update readme
Blacksmoke16 Apr 8, 2018
e3e525a
Add = sign to model arrays in readme
Blacksmoke16 Apr 12, 2018
371d0b4
Merge branch 'master' into bulk-import
drujensen Apr 22, 2018
9fe4bcb
Specify imports do not trigger callbacks yet
Blacksmoke16 Apr 25, 2018
5b31b80
proper grammar
Blacksmoke16 Apr 25, 2018
027be38
Merge branch 'master' into bulk-import
drujensen Apr 26, 2018
65 changes: 65 additions & 0 deletions README.md
@@ -116,6 +116,71 @@ class Site < Granite::ORM::Base
end
```

### Bulk Insertions

#### Import

**Note: As of now, imports do not trigger callbacks.**

Each model has an `import` class-level method that imports an array of models in a single bulk INSERT statement.
```Crystal
models = [
  Model.new(id: 1, name: "Fred", age: 14),
  Model.new(id: 2, name: "Joe", age: 25),
  Model.new(id: 3, name: "John", age: 30),
]

Model.import(models)
```

#### update_on_duplicate

The `import` method accepts optional `update_on_duplicate` and `columns` parameters. Set `update_on_duplicate: true` and pass `columns` (an array of strings) to name the columns that should be updated when the primary key constraint is violated.
```Crystal
models = [
  Model.new(id: 1, name: "Fred", age: 14),
  Model.new(id: 2, name: "Joe", age: 25),
  Model.new(id: 3, name: "John", age: 30),
]

Model.import(models)

Model.find!(1).name # => Fred

models = [
  Model.new(id: 1, name: "George", age: 14),
]

Model.import(models, update_on_duplicate: true, columns: %w(name))

Model.find!(1).name # => George
```

##### NOTE: If using PostgreSQL, you must have version 9.5+ to use the `update_on_duplicate` feature.

#### ignore_on_duplicate

The `import` method has an optional `ignore_on_duplicate` parameter, a boolean, which skips records that would violate the primary key constraint.
```Crystal
models = [
  Model.new(id: 1, name: "Fred", age: 14),
  Model.new(id: 2, name: "Joe", age: 25),
  Model.new(id: 3, name: "John", age: 30),
]

Model.import(models)

Model.find!(1).name # => Fred

models = [
  Model.new(id: 1, name: "George", age: 14),
]

Model.import(models, ignore_on_duplicate: true)

Model.find!(1).name # => Fred
```

### SQL

To clear all the rows in the database:
59 changes: 59 additions & 0 deletions spec/granite_orm/transactions/import_spec.cr
@@ -0,0 +1,59 @@
require "../../spec_helper"

{% for adapter in GraniteExample::ADAPTERS %}
module {{adapter.capitalize.id}}
  describe "{{ adapter.id }} .import" do
    it "should import 3 new objects" do
      to_import = [
        Parent.new(name: "ImportParent1"),
        Parent.new(name: "ImportParent2"),
        Parent.new(name: "ImportParent3"),
      ]
      Parent.import(to_import)
      Parent.all("WHERE name LIKE ?", ["Import%"]).size.should eq 3
    end

    it "should work with on_duplicate_key_update" do
      to_import = [
        Parent.new(id: 111, name: "ImportParent1"),
        Parent.new(id: 112, name: "ImportParent2"),
        Parent.new(id: 113, name: "ImportParent3"),
      ]

      Parent.import(to_import)

      to_import = [
        Parent.new(id: 112, name: "ImportParent112"),
      ]

      Parent.import(to_import, update_on_duplicate: true, columns: ["name"])

      if parent = Parent.find 112
        parent.name.should eq "ImportParent112"
        parent.id.should eq 112
      end
    end

    it "should work with on_duplicate_key_ignore" do
      to_import = [
        Parent.new(id: 111, name: "ImportParent1"),
        Parent.new(id: 112, name: "ImportParent2"),
        Parent.new(id: 113, name: "ImportParent3"),
      ]

      Parent.import(to_import)

      to_import = [
        Parent.new(id: 113, name: "ImportParent113"),
      ]

      Parent.import(to_import, ignore_on_duplicate: true)

      if parent = Parent.find 113
        parent.name.should eq "ImportParent3"
        parent.id.should eq 113
      end
    end
  end
end
{% end %}
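
The spec wraps its examples in a `{% for %}` macro loop so that the same tests are generated once per adapter. A minimal sketch of that pattern, independent of Granite, might look like the following; the `adapter_name` method and the top-level `ADAPTERS` constant are made up for illustration.

```crystal
# Illustrative only: mirrors the shape of the spec's macro loop.
ADAPTERS = ["pg", "mysql", "sqlite"]

{% for adapter in ADAPTERS %}
  # Expands to `module Pg`, `module Mysql`, `module Sqlite`.
  module {{adapter.capitalize.id}}
    # Hypothetical method: returns the adapter string this module was built from.
    def self.adapter_name : String
      {{adapter}}
    end
  end
{% end %}

puts Pg.adapter_name
puts Mysql.adapter_name
puts Sqlite.adapter_name
```

Because the loop runs at compile time, each generated module is an ordinary namespace, which is what lets the spec reuse the same `Parent` model name for every adapter.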
2 changes: 1 addition & 1 deletion spec/spec_helper.cr
@@ -1,7 +1,7 @@
require "spec"

module GraniteExample
- ADAPTERS = ["pg","mysql","sqlite"]
+ ADAPTERS = ["pg", "mysql", "sqlite"]
end

require "../src/granite_orm"
3 changes: 3 additions & 0 deletions src/adapter/base.cr
@@ -49,6 +49,9 @@ abstract class Granite::Adapter::Base
# This will insert a row in the database and return the id generated.
abstract def insert(table_name, fields, params, lastval) : Int64

# This will insert an array of models as one insert statement
abstract def import(table_name : String, primary_name : String, fields, model_array, **options)

# This will update a row in the database.
abstract def update(table_name, primary_name, fields, params)
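
Declaring `import` as `abstract` on the base class means every adapter has to provide its own implementation. The sketch below shows the same arrangement with a toy hierarchy. `ToyAdapter`, `ToyMysql`, and the named-only `ignore_on_duplicate` argument (in the spirit of the `*, update_on_duplicate_key = false` signature suggested in review further down this PR) are assumptions for illustration, not Granite code.

```crystal
# Hypothetical base class: subclasses must implement `import`.
abstract class ToyAdapter
  # Arguments after the bare `*` can only be passed by name.
  abstract def import(table_name : String, rows : Array(String), *, ignore_on_duplicate : Bool = false) : String
end

# Hypothetical adapter: returns a description instead of touching a database.
class ToyMysql < ToyAdapter
  def import(table_name : String, rows : Array(String), *, ignore_on_duplicate : Bool = false) : String
    verb = ignore_on_duplicate ? "INSERT IGNORE" : "INSERT"
    "#{verb} INTO #{table_name}: #{rows.size} rows"
  end
end

adapter = ToyMysql.new
puts adapter.import("parents", ["a", "b"])
puts adapter.import("parents", ["a", "b"], ignore_on_duplicate: true)
```

Compared with `**options`, a named argument is checked by the compiler, so a misspelled option name fails at compile time rather than being silently ignored.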

40 changes: 40 additions & 0 deletions src/adapter/mysql.cr
@@ -76,6 +76,46 @@ class Granite::Adapter::Mysql < Granite::Adapter::Base
end
end

  def import(table_name : String, primary_name : String, fields, model_array, **options)
    params = [] of DB::Any
    now = Time.now.to_utc
    fields.reject! { |field| field === "id" } if primary_name === "id"

    statement = String.build do |stmt|
      stmt << "INSERT"
      stmt << " IGNORE" if options["ignore_on_duplicate"]?
      stmt << " INTO #{quote(table_name)} ("
      stmt << fields.map { |field| quote(field) }.join(", ")
      stmt << ") VALUES "

      model_array.each do |model|
        model.updated_at = now if model.responds_to? :updated_at
        model.created_at = now if model.responds_to? :created_at
        next unless model.valid?
        stmt << '('
        stmt << Array.new(fields.size, '?').join(',')
        params.concat fields.map { |field| model.to_h[field] }
        stmt << "),"
      end
    end.chomp(',')

    if options["update_on_duplicate"]?
      if columns = options["columns"]?
        statement += " ON DUPLICATE KEY UPDATE "
        columns.each do |key|
          statement += "#{quote(key)}=VALUES(#{quote(key)}), "
        end
        statement = statement.chomp(", ")
      end
    end

    log statement, params

    open do |db|
      db.exec statement, params
    end
  end

private def last_val
return "SELECT LAST_INSERT_ID()"
end
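
To make the shape of the generated MySQL statement easier to see, here is a standalone sketch that builds only the SQL text. `mysql_bulk_insert` is a hypothetical helper, and the backtick quoting is an assumption about what the adapter's `quote` produces; neither is taken from Granite.

```crystal
# Hypothetical helper, not part of Granite: builds the SQL text only.
# One "(?,?,...)" group is emitted per row, mirroring the loop in `import`.
def mysql_bulk_insert(table : String, fields : Array(String), row_count : Int32,
                      update_columns : Array(String)? = nil) : String
  quoted = fields.map { |field| "`#{field}`" }.join(", ")
  row = "(" + Array.new(fields.size, "?").join(",") + ")"
  rows = Array.new(row_count, row).join(",")

  sql = "INSERT INTO `#{table}` (#{quoted}) VALUES #{rows}"
  if update_columns && !update_columns.empty?
    # VALUES(col) refers to the value that would have been inserted.
    assignments = update_columns.map { |col| "`#{col}`=VALUES(`#{col}`)" }.join(", ")
    sql += " ON DUPLICATE KEY UPDATE #{assignments}"
  end
  sql
end

puts mysql_bulk_insert("parents", ["name", "age"], 2, ["name"])
```

The parameters themselves travel separately in the `params` array, so the statement text never contains model values.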
43 changes: 43 additions & 0 deletions src/adapter/pg.cr
@@ -76,6 +76,49 @@ class Granite::Adapter::Pg < Granite::Adapter::Base
end
end

  def import(table_name : String, primary_name : String, fields, model_array, **options)

Review comment (Member):
Your **options can be a little cleaner and have a similar effect:

def import(table_name : String, primary_name : String, fields, model_array : Array(self), *, update_on_duplicate_key = false)

Reply (Blacksmoke16, Contributor Author, Apr 2, 2018):
That would mean I would need to have a defined param for each option, right?

As of now there is on_duplicate_key_ignore: Bool, on_duplicate_key_update: Array(Strings) to be updated, and possibly in future also validate: true, as giving the option to ignore validation might speed up large imports. I don't have strong feelings one way or another.

Reply (Member):
You can probably have the best of both worlds using a Crystal overloaded method signature. If you need, here are the docs on overloading.

Reply (robacarp, Member, Apr 3, 2018):
I don't have strong opinions on this either. I just noticed a Rubyism that might be able to be cleaned up with some Crystal.

    params = [] of DB::Any
    now = Time.now.to_utc
    fields.reject! { |field| field === "id" } if primary_name === "id"
    index = 0

    statement = String.build do |stmt|
      stmt << "INSERT"
      stmt << " INTO #{quote(table_name)} ("
      stmt << fields.map { |field| quote(field) }.join(", ")
      stmt << ") VALUES "

      model_array.each do |model|
        model.updated_at = now if model.responds_to? :updated_at
        model.created_at = now if model.responds_to? :created_at
        next unless model.valid?
        stmt << '('
        stmt << fields.map_with_index { |_f, idx| "$#{index + idx + 1}" }.join(',')
        params.concat fields.map { |field| model.to_h[field] }
        stmt << "),"
        index += fields.size
      end
    end.chomp(',')

    if options["update_on_duplicate"]?
      if columns = options["columns"]?
        statement += " ON CONFLICT (#{quote(primary_name)}) DO UPDATE SET "
        columns.each do |key|
          statement += "#{quote(key)}=EXCLUDED.#{quote(key)}, "
        end
      end
      statement = statement.chomp(", ")
    elsif options["ignore_on_duplicate"]?
      statement += " ON CONFLICT DO NOTHING"
    end

    log statement, params

    open do |db|
      db.exec statement, params
    end
  end

private def last_val
return "SELECT LASTVAL()"
end
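
Unlike MySQL's `?`, PostgreSQL placeholders are numbered, which is why the adapter carries a running `index` across rows. The sketch below isolates that numbering and the two conflict clauses. Both helpers are hypothetical, and the double-quote identifier quoting is an assumption about the adapter's `quote`.

```crystal
# Hypothetical helpers for illustration; Granite builds these strings inline.

# "$n" placeholders for `rows` rows of `width` columns, numbered continuously.
def pg_placeholders(rows : Int32, width : Int32) : String
  groups = (0...rows).map do |row|
    "(" + (1..width).map { |col| "$#{row * width + col}" }.join(",") + ")"
  end
  groups.join(",")
end

# Conflict clause: update the given columns, skip duplicates, or add nothing.
def pg_conflict_clause(primary : String, update_columns : Array(String)? = nil, ignore : Bool = false) : String
  if update_columns && !update_columns.empty?
    # EXCLUDED refers to the row that was proposed for insertion.
    sets = update_columns.map { |col| "\"#{col}\"=EXCLUDED.\"#{col}\"" }.join(", ")
    " ON CONFLICT (\"#{primary}\") DO UPDATE SET #{sets}"
  elsif ignore
    " ON CONFLICT DO NOTHING"
  else
    ""
  end
end

puts pg_placeholders(2, 3)
puts pg_conflict_clause("id", ["name"])
puts pg_conflict_clause("id", nil, true)
```

`DO UPDATE` requires a conflict target, here the primary key column, whereas `DO NOTHING` can omit it, which matches the two statements the adapter emits.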
34 changes: 34 additions & 0 deletions src/adapter/sqlite.cr
@@ -74,6 +74,40 @@ class Granite::Adapter::Sqlite < Granite::Adapter::Base
end
end

  def import(table_name : String, primary_name : String, fields, model_array, **options)
    params = [] of DB::Any
    now = Time.now.to_utc
    fields.reject! { |field| field === "id" } if primary_name === "id"

    statement = String.build do |stmt|
      stmt << "INSERT "
      if options["update_on_duplicate"]?
        stmt << "OR REPLACE "
      elsif options["ignore_on_duplicate"]?
        stmt << "OR IGNORE "
      end
      stmt << "INTO #{quote(table_name)} ("
      stmt << fields.map { |field| quote(field) }.join(", ")
      stmt << ") VALUES "

      model_array.each do |model|
        next unless model.valid?
        model.updated_at = now if model.responds_to? :updated_at
        model.created_at = now if model.responds_to? :created_at
        stmt << '('
        stmt << Array.new(fields.size, '?').join(',')
        params.concat fields.map { |field| model.to_h[field] }
        stmt << "),"
      end
    end.chomp(',')

    log statement, params

    open do |db|
      db.exec statement, params
    end
  end

private def last_val
return "SELECT LAST_INSERT_ROWID()"
end
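
The SQLite adapter handles duplicates in the statement prefix rather than in a trailing clause. A small sketch of that branching, using a hypothetical `sqlite_insert_prefix` helper that is not part of Granite:

```crystal
# Hypothetical helper: picks the INSERT prefix the way the adapter's
# String.build block does, with update taking precedence over ignore.
def sqlite_insert_prefix(update_on_duplicate : Bool = false, ignore_on_duplicate : Bool = false) : String
  if update_on_duplicate
    "INSERT OR REPLACE INTO "
  elsif ignore_on_duplicate
    "INSERT OR IGNORE INTO "
  else
    "INSERT INTO "
  end
end

puts sqlite_insert_prefix
puts sqlite_insert_prefix(update_on_duplicate: true)
puts sqlite_insert_prefix(ignore_on_duplicate: true)
```

Note that `OR REPLACE` removes the conflicting row and inserts the new one in full, and the SQLite `import` shown in this diff never reads the `columns` option, so on this adapter the update is not limited to the listed columns.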
31 changes: 29 additions & 2 deletions src/granite_orm/transactions.cr
@@ -7,9 +7,36 @@ module Granite::ORM::Transactions
@updated_at : Time?
@created_at : Time?

# The import class method will run a batch INSERT statement for each model in the array
# the array must contain only one model class
# invalid model records will be skipped
    def self.import(model_array : Array(self))
      begin
        @@adapter.import(table_name, primary_name, fields.dup, model_array)
      rescue err
        raise DB::Error.new(err.message)
      end
    end

    def self.import(model_array : Array(self), update_on_duplicate : Bool, columns : Array(String))

Review comment (drujensen, Member, Apr 22, 2018):
Since you would never use update_on_duplicate and ignore_on_duplicate together, what if we use an enum instead of boolean flags, with options for how to handle duplicates? Maybe duplicates: ignore, duplicates: update?

Reply (Contributor Author):
Would there be a nice way to also require the columns key when using the update option?

      begin
        @@adapter.import(table_name, primary_name, fields.dup, model_array, update_on_duplicate: update_on_duplicate, columns: columns)
      rescue err
        raise DB::Error.new(err.message)
      end
    end

    def self.import(model_array : Array(self), ignore_on_duplicate : Bool)
      begin
        @@adapter.import(table_name, primary_name, fields.dup, model_array, ignore_on_duplicate: ignore_on_duplicate)
      rescue err
        raise DB::Error.new(err.message)
      end
    end
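
The review thread above raises two questions: whether the boolean flags could become a single `duplicates` option, and whether `columns` could be required when updating. One possible way to get both is to swap the suggested enum for small value types, so that the update policy cannot be constructed without its columns. Everything below (`IgnoreDuplicates`, `UpdateDuplicates`, `DuplicatePolicy`, `duplicate_summary`, `import_plan`) is a hypothetical sketch, not what the PR merged.

```crystal
# Hypothetical policy types, not part of Granite.
struct IgnoreDuplicates
end

# `record` generates a struct whose constructor requires `columns`.
record UpdateDuplicates, columns : Array(String)

alias DuplicatePolicy = IgnoreDuplicates | UpdateDuplicates | Nil

# One overload per policy; Crystal dispatches on the runtime type of the union.
def duplicate_summary(policy : IgnoreDuplicates) : String
  "skip conflicting rows"
end

def duplicate_summary(policy : UpdateDuplicates) : String
  "update #{policy.columns.join(", ")} on conflict"
end

def duplicate_summary(policy : Nil) : String
  "plain insert"
end

# Stand-in for `import`: returns a description instead of running SQL.
def import_plan(count : Int32, duplicates : DuplicatePolicy = nil) : String
  "#{count} rows: #{duplicate_summary(duplicates)}"
end

puts import_plan(3)
puts import_plan(3, duplicates: IgnoreDuplicates.new)
puts import_plan(3, duplicates: UpdateDuplicates.new(columns: ["name"]))
```

With this shape, ignore and update are mutually exclusive by construction, and omitting `columns` from `UpdateDuplicates.new` is a compile-time error.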

# The save method will check to see if the primary exists yet. If it does it
# will call the update method, otherwise it will call the create method.
- # This will update the timestamps apropriately.
+ # This will update the timestamps appropriately.

Review comment (Member):
🥇

def save
return false unless valid?

@@ -91,7 +118,7 @@ module Granite::ORM::Transactions
end
end

- module ClassMethods
+ module ClassMethods
def create(**args)
create(args.to_h)
end
2 changes: 1 addition & 1 deletion src/granite_orm/validators.cr
@@ -8,7 +8,7 @@ require "./error"
# !user.name.to_s.blank?
# end
#
- # validate :name, "can't be blank", -> (user : User) do
+ # validate :name, "can't be blank", ->(user : User) do
# !user.name.to_s.blank?
# end
#