Additional README updates #563
Merged: jkowens merged 13 commits into zdennis:master from dillonwelch:additional-readme-updates on Nov 14, 2018.
Commits (13, all by dillonwelch):

* f3b7f78 Add section about validate_uniqueness option
* 6236a32 Add section about not using array of hashes
* 017a815 Add section about counter cache logic
* 7a04386 Add section about running tests
* c826757 Add section about ActiveRecord timestamps
* e8c28a2 Add section with examples
* 1ffbffa Add sections for duplicate key options + move uniqueness validation o…
* f1a45eb Add section on supported adapters
* b3f69b6 Add relevant examples from the home wiki page
* 225a645 Fix up formatting issues from Wiki to README copy-paste
* 0e15c5b Add name to contributors list
* d0a94a2 Make updates on hashes examples
* 7aad08d Make another update on hash examples
@@ -21,16 +21,313 @@ and then the reviews:
That would be about 4M SQL insert statements vs 3, which results in vastly improved performance. In our case, it converted
an 18-hour batch process to <2 hrs.

The gem provides the following high-level features:

* activerecord-import can work with raw columns and arrays of values (fastest)
* activerecord-import works with model objects (faster)
* activerecord-import can perform validations (fast)
* activerecord-import can perform on duplicate key updates (requires MySQL or Postgres 9.5+)

## Table of Contents

* [Examples](#examples)
  * [Introduction](#introduction)
  * [Columns and Arrays](#columns-and-arrays)
  * [ActiveRecord Models](#activerecord-models)
  * [Batching](#batching)
  * [Recursive](#recursive)
* [Options](#options)
  * [Duplicate Key Ignore](#duplicate-key-ignore)
  * [Duplicate Key Update](#duplicate-key-update)
  * [Uniqueness Validation](#uniqueness-validation)
* [Array of Hashes](#array-of-hashes)
* [Counter Cache](#counter-cache)
* [ActiveRecord Timestamps](#activerecord-timestamps)
* [Callbacks](#callbacks)
* [Supported Adapters](#supported-adapters)
* [Additional Adapters](#additional-adapters)
* [Requiring](#requiring)
  * [Autoloading via Bundler](#autoloading-via-bundler)
  * [Manually Loading](#manually-loading)
  * [Load Path Setup](#load-path-setup)
* [Conflicts With Other Gems](#conflicts-with-other-gems)
* [More Information](#more-information)
* [Contributing](#contributing)
  * [Running Tests](#running-tests)

### Examples

#### Introduction

Without `activerecord-import`, you'd write something like this:

```ruby
10.times do |i|
  Book.create! :name => "book #{i}"
end
```

This would end up making 10 SQL calls. YUCK! With `activerecord-import`, you can instead do this:

```ruby
books = []
10.times do |i|
  books << Book.new(:name => "book #{i}")
end
Book.import books # or use import!
```

and only have 1 SQL call. Much better!
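
To make the "10 calls vs. 1 call" difference concrete, here is a plain-Ruby sketch that only builds SQL strings; it does not use activerecord-import and never touches a database. The helper names `quote`, `single_row_inserts`, and `multi_row_insert` are hypothetical, the `books` table and `name` column are taken from the example above, and the naive quoting is for illustration only. The SQL the gem actually generates may differ in detail.

```ruby
# Hypothetical illustration only: these helpers build SQL strings and never
# talk to a database. The quoting is deliberately naive (it only doubles
# single quotes); real code must rely on the adapter's quoting.
def quote(value)
  "'#{value.to_s.gsub("'", "''")}'"
end

# One INSERT per record, roughly what calling Book.create! in a loop costs.
def single_row_inserts(table, column, values)
  values.map { |v| "INSERT INTO #{table} (#{column}) VALUES (#{quote(v)})" }
end

# One multi-row INSERT that carries every record at once.
def multi_row_insert(table, column, values)
  tuples = values.map { |v| "(#{quote(v)})" }.join(", ")
  "INSERT INTO #{table} (#{column}) VALUES #{tuples}"
end

names = Array.new(10) { |i| "book #{i}" }

puts single_row_inserts("books", "name", names).size # prints 10
puts multi_row_insert("books", "name", names)        # a single statement
```

The saving comes from the round trips: ten statements mean ten trips to the database, while the multi-row form sends all ten tuples in one.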

#### Columns and Arrays

The `#import` method can take an array of column names (strings or symbols) and an array of arrays. Each child array represents an individual record and its list of values in the same order as the columns. This is the fastest import mechanism and also the most primitive.

```ruby
columns = [ :title, :author ]
values = [ ['Book1', 'FooManChu'], ['Book2', 'Bob Jones'] ]

# Importing without model validations
Book.import columns, values, :validate => false

# Import with model validations
Book.import columns, values, :validate => true

# when not specified :validate defaults to true
Book.import columns, values
```
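
Because each child array is positional, its first value belongs to the first column, its second value to the second, and so on. The plain-Ruby sketch below makes that correspondence explicit with a hypothetical helper, `rows_to_attributes`, and rejects rows whose length does not match the column list. It illustrates the convention only; it is not the gem's internal code, and how activerecord-import itself reports a length mismatch is not shown here.

```ruby
# Hypothetical helper illustrating how `columns` and `values` line up.
# No database or activerecord-import code is involved.
def rows_to_attributes(columns, values)
  values.map do |row|
    unless row.length == columns.length
      raise ArgumentError, "row #{row.inspect} has #{row.length} values for #{columns.length} columns"
    end
    # Pair the Nth column with the Nth value of this row.
    columns.zip(row).to_h
  end
end

columns = [:title, :author]
values  = [['Book1', 'FooManChu'], ['Book2', 'Bob Jones']]

p rows_to_attributes(columns, values)
```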

#### ActiveRecord Models

The `#import` method can take an array of models. The attributes will be read from each model by looking at the columns available on the model.

```ruby
books = [
  Book.new(:title => "Book 1", :author => "FooManChu"),
  Book.new(:title => "Book 2", :author => "Bob Jones")
]

# without validations
Book.import books, :validate => false

# with validations
Book.import books, :validate => true

# when not specified :validate defaults to true
Book.import books
```

The `#import` method can also take an array of column names and an array of models. The column names determine which fields of data are imported. The following example imports books with only the `:title` field set:

```ruby
books = [
  Book.new(:title => "Book 1", :author => "FooManChu"),
  Book.new(:title => "Book 2", :author => "Bob Jones")
]
columns = [ :title ]

# without validations
Book.import columns, books, :validate => false

# with validations
Book.import columns, books, :validate => true

# when not specified :validate defaults to true
Book.import columns, books

# result in table books
# title  | author
#--------|--------
# Book 1 | NULL
# Book 2 | NULL
```
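
When a column list is passed alongside models, only those attributes are read from each record, and every other column is left for the database to fill. The sketch below mimics that selection with plain hashes standing in for `Book` instances (an assumption made so it runs without Rails); `rows_for_columns` is a hypothetical name, not part of the gem.

```ruby
# Plain hashes stand in for Book model instances in this sketch.
books = [
  { title: "Book 1", author: "FooManChu" },
  { title: "Book 2", author: "Bob Jones" }
]
columns = [:title]

# Hypothetical helper: keep only the listed columns, in the listed order.
def rows_for_columns(records, columns)
  records.map { |record| columns.map { |column| record[column] } }
end

rows = rows_for_columns(books, columns)
p rows # => [["Book 1"], ["Book 2"]]
# :author is not in `columns`, so its value is never sent; assuming the column
# has no default, the database stores NULL, as in the table above.
```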

#### Batching

The `#import` method can take a `:batch_size` option to control the number of rows to insert per INSERT statement. The default is the total number of records being inserted, so there is a single INSERT statement.

```ruby
books = [
  Book.new(:title => "Book 1", :author => "FooManChu"),
  Book.new(:title => "Book 2", :author => "Bob Jones"),
  Book.new(:title => "Book 1", :author => "John Doe"),
  Book.new(:title => "Book 2", :author => "Richard Wright")
]
columns = [ :title ]

# 2 INSERT statements for 4 records
Book.import columns, books, :batch_size => 2
```
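
The number of INSERT statements is simply the number of rows divided by `:batch_size`, rounded up. A minimal plain-Ruby sketch of that split, using `Enumerable#each_slice`; the helper name `batches` is hypothetical and nothing here calls activerecord-import.

```ruby
# Hypothetical sketch of how :batch_size splits rows into INSERT statements.
# It only groups arrays; no SQL is built or executed.
def batches(rows, batch_size = nil)
  # Default described above: all rows in one statement. The [.., 1].max guard
  # keeps each_slice valid when rows is empty.
  batch_size ||= [rows.size, 1].max
  rows.each_slice(batch_size).to_a
end

# One single-column row per book, matching columns = [ :title ] above.
rows = [["Book 1"], ["Book 2"], ["Book 1"], ["Book 2"]]

puts batches(rows).size    # prints 1 (no :batch_size given)
puts batches(rows, 2).size # prints 2
puts batches(rows, 3).size # prints 2 (3 rows, then 1)
```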

#### Recursive

NOTE: This only works with PostgreSQL.

Assume that Books `has_many` Reviews.

```ruby
books = []
10.times do |i|
  book = Book.new(:name => "book #{i}")
  book.reviews.build(:title => "Excellent")
  books << book
end
Book.import books, recursive: true
```

### Options

#### Duplicate Key Ignore

[MySQL](http://dev.mysql.com/doc/refman/5.0/en/insert-on-duplicate.html), [SQLite](https://www.sqlite.org/lang_insert.html), and [PostgreSQL](https://www.postgresql.org/docs/current/static/sql-insert.html#SQL-ON-CONFLICT) (9.5+) support `on_duplicate_key_ignore`, which allows you to skip records if a primary or unique key constraint is violated.

```ruby
book = Book.create! title: "Book1", author: "FooManChu"
book.title = "Updated Book Title"
book.author = "Bob Barker"

Book.import [book], on_duplicate_key_ignore: true

book.reload.title  # => "Book1" (stayed the same)
book.reload.author # => "FooManChu" (stayed the same)
```
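
For reference, these are the standard SQL forms each of the three databases offers for skipping conflicting rows. The sketch below only assembles strings; the method name `insert_ignoring_duplicates` and the adapter symbols are hypothetical, and the SQL activerecord-import actually generates may differ in detail.

```ruby
# Hypothetical sketch: standard "skip on conflict" syntax per database.
# It builds strings only; nothing is sent to a database.
def insert_ignoring_duplicates(adapter, table, columns, tuples_sql)
  cols = columns.join(", ")
  case adapter
  when :mysql
    "INSERT IGNORE INTO #{table} (#{cols}) VALUES #{tuples_sql}"
  when :sqlite
    "INSERT OR IGNORE INTO #{table} (#{cols}) VALUES #{tuples_sql}"
  when :postgresql # 9.5+
    "INSERT INTO #{table} (#{cols}) VALUES #{tuples_sql} ON CONFLICT DO NOTHING"
  else
    raise ArgumentError, "unsupported adapter: #{adapter.inspect}"
  end
end

columns = %w[id title author]
tuples  = "(1, 'Updated Book Title', 'Bob Barker')"

[:mysql, :sqlite, :postgresql].each do |adapter|
  puts insert_ignoring_duplicates(adapter, "books", columns, tuples)
end
```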

The option `:on_duplicate_key_ignore` is bypassed when `:recursive` is enabled for [PostgreSQL imports](https://github.com/zdennis/activerecord-import/wiki#recursive-example-postgresql-only).

#### Duplicate Key Update

MySQL, PostgreSQL (9.5+), and SQLite (3.24.0+) support `on duplicate key update` (also known as "upsert"), which allows you to specify fields whose values should be updated if a primary or unique key constraint is violated.

One big difference between MySQL and PostgreSQL support is that MySQL will handle any conflict that happens, but PostgreSQL requires that you specify which columns the conflict would occur over. SQLite models its upsert support after PostgreSQL.
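
The difference shows up directly in the SQL: MySQL's `ON DUPLICATE KEY UPDATE` names no key, while PostgreSQL's `ON CONFLICT (...) DO UPDATE` needs a conflict target. The plain-Ruby sketch below builds both clauses; `upsert_clause` is a hypothetical helper, SQLite is treated like PostgreSQL per the note above, and the gem's shorthand form fills in the primary key as the target for you, which this sketch deliberately does not do.

```ruby
# Hypothetical sketch of the two upsert dialects. It only builds strings.
def upsert_clause(adapter, update_columns, conflict_target: nil)
  case adapter
  when :mysql
    # MySQL reacts to a conflict on any primary or unique key.
    sets = update_columns.map { |c| "#{c} = VALUES(#{c})" }.join(", ")
    "ON DUPLICATE KEY UPDATE #{sets}"
  when :postgresql, :sqlite
    # PostgreSQL requires an explicit conflict target for DO UPDATE;
    # SQLite is handled the same way here.
    if conflict_target.nil? || conflict_target.empty?
      raise ArgumentError, "conflict_target is required for #{adapter}"
    end
    sets = update_columns.map { |c| "#{c} = EXCLUDED.#{c}" }.join(", ")
    "ON CONFLICT (#{conflict_target.join(', ')}) DO UPDATE SET #{sets}"
  else
    raise ArgumentError, "unsupported adapter: #{adapter.inspect}"
  end
end

puts upsert_clause(:mysql, [:title])
puts upsert_clause(:postgresql, [:title], conflict_target: [:id])
```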

##### Basic Update

```ruby
book = Book.create! title: "Book1", author: "FooManChu"
book.title = "Updated Book Title"
book.author = "Bob Barker"

# MySQL version
Book.import [book], on_duplicate_key_update: [:title]

# PostgreSQL version
Book.import [book], on_duplicate_key_update: {conflict_target: [:id], columns: [:title]}

# PostgreSQL shorthand version (conflict target must be primary key)
Book.import [book], on_duplicate_key_update: [:title]

book.reload.title  # => "Updated Book Title" (changed)
book.reload.author # => "FooManChu" (stayed the same)
```

##### Using the value from another column

```ruby
book = Book.create! title: "Book1", author: "FooManChu"
book.title = "Updated Book Title"

# MySQL version
Book.import [book], on_duplicate_key_update: {author: :title}

# PostgreSQL version (no shorthand version)
Book.import [book], on_duplicate_key_update: {
  conflict_target: [:id], columns: {author: :title}
}

book.reload.title  # => "Book1" (stayed the same)
book.reload.author # => "Updated Book Title" (changed)
```

##### Using Custom SQL

```ruby
book = Book.create! title: "Book1", author: "FooManChu"
book.author = "Bob Barker"

# MySQL version
Book.import [book], on_duplicate_key_update: "author = values(author)"

# PostgreSQL version
Book.import [book], on_duplicate_key_update: {
  conflict_target: [:id], columns: "author = excluded.author"
}

# PostgreSQL shorthand version (conflict target must be primary key)
Book.import [book], on_duplicate_key_update: "author = excluded.author"

book.reload.title  # => "Book1" (stayed the same)
book.reload.author # => "Bob Barker" (changed)
```

##### PostgreSQL: Using Constraints

```ruby
book = Book.create! title: "Book1", author: "FooManChu", edition: 3, published_at: nil
book.published_at = Time.now

# in migration
execute <<-SQL
  ALTER TABLE books
    ADD CONSTRAINT for_upsert UNIQUE (title, author, edition);
SQL

# PostgreSQL version
Book.import [book], on_duplicate_key_update: {constraint_name: :for_upsert, columns: [:published_at]}

book.reload.title        # => "Book1" (stayed the same)
book.reload.author       # => "FooManChu" (stayed the same)
book.reload.edition      # => 3 (stayed the same)
book.reload.published_at # => 2017-10-09 (changed)
```

#### Uniqueness Validation

By default, `activerecord-import` will not validate for uniqueness when importing records. Starting with `v0.27.0`, there is a parameter called `validate_uniqueness` that can be passed in to trigger this behavior. Use this option with caution, as there are many potential pitfalls.

```ruby
Book.import books, validate_uniqueness: true
```
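
One example of the kind of pitfall meant here (an illustration, not an exhaustive list): a uniqueness check compares each record against rows already stored in the table, so two new records that duplicate each other within the same import can both pass. The sketch below models that with an in-memory `Set` standing in for the table; `passes_uniqueness?` is a hypothetical name and is not how ActiveRecord implements the validator.

```ruby
require "set"

# Stand-in for the titles already stored in the books table.
existing_titles = Set["Book1"]

# Hypothetical check shaped like a uniqueness validation: it only looks at
# what is already persisted, not at the other records in the same import.
def passes_uniqueness?(title, existing_titles)
  !existing_titles.include?(title)
end

batch = ["Book2", "Book2", "Book1"]
results = batch.map { |title| passes_uniqueness?(title, existing_titles) }
p results # => [true, true, false]
# Both "Book2" records pass; only a unique index in the database would stop
# the second one.
```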

### Array of Hashes

Due to the counter-intuitive behavior that can occur when dealing with hashes instead of ActiveRecord objects, `activerecord-import` will raise an exception when passed an array of hashes. If you have an array of hash attributes, you should instead use them to instantiate an array of ActiveRecord objects and then pass that into `import`.

See https://github.com/zdennis/activerecord-import/issues/507 for discussion.

```ruby
arr = [
  { bar: 'abc' },
  { baz: 'xyz' },
  { bar: '123', baz: '456' }
]

# An exception will be raised
Foo.import arr

# better
arr.map! { |args| Foo.new(args) }
Foo.import arr
```

### Counter Cache

When running `import`, `activerecord-import` does not automatically update counter cache columns. To update these columns, you will need to do one of the following:

* Provide a value for the counter cache column on each object you pass in.
* Manually update the column after the records have been imported.
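
For the second option, the increments can be computed from the imported records before the parent rows are updated. The plain-Ruby sketch below does that counting, with hashes standing in for `Review` records and a hypothetical `reviews_count` column. In a Rails app the resulting numbers could then be applied with ActiveRecord's `update_counters` (or recomputed with `reset_counters`); the sketch only mentions that call in a comment and does not run it.

```ruby
# Hashes stand in for imported Review records; book_id is the foreign key.
imported_reviews = [
  { book_id: 1, title: "Excellent" },
  { book_id: 1, title: "Great" },
  { book_id: 2, title: "Fine" }
]

# Count how many new reviews each book received.
increments = Hash.new(0)
imported_reviews.each { |review| increments[review[:book_id]] += 1 }

increments.each do |book_id, count|
  # In Rails, something like
  #   Book.update_counters(book_id, reviews_count: count)
  # would apply the increment (reviews_count is a hypothetical column).
  puts "book #{book_id}: +#{count}"
end
```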

### ActiveRecord Timestamps

If you're familiar with ActiveRecord, you're probably familiar with its timestamp columns: `created_at`, `created_on`, `updated_at`, `updated_on`, etc. When importing data, the timestamp fields will continue to work as expected and each timestamp column will be set.

Should you wish to set those columns yourself, you may use the option `timestamps: false`.

However, it is also possible to set just `:created_at` on specific records. In this case, despite using `timestamps: true`, `:created_at` will be set only on records where that field is `nil`. The same rule applies to associated records when the option `recursive: true` is enabled.

If you are using custom time zones, these will be respected during imports as well, as long as `ActiveRecord::Base.default_timezone` is set, which it is for practically all Rails apps.
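
The `:created_at` rule above amounts to "fill it in only where it is still `nil`". A minimal plain-Ruby sketch of just that rule, with hashes standing in for records; `fill_created_at` is a hypothetical name, and the sketch does not model how `updated_at`/`updated_on` are handled.

```ruby
require "time"

# Hypothetical sketch of the created_at rule described above: with timestamps
# enabled, created_at is filled in only where it is still nil.
def fill_created_at(records, now)
  records.each { |record| record[:created_at] ||= now }
end

preset = Time.utc(2017, 1, 1)
now    = Time.utc(2018, 11, 14)
records = [
  { title: "Book 1", created_at: preset }, # explicitly set: kept as-is
  { title: "Book 2", created_at: nil }     # nil: receives the import time
]

fill_created_at(records, now)
records.each { |r| puts "#{r[:title]}: #{r[:created_at].iso8601}" }
```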

### Callbacks

@@ -70,6 +367,24 @@ end
Book.import valid_books, validate: false
```

### Supported Adapters

The following database adapters are currently supported:

* MySQL - supports core import functionality plus on duplicate key update support (included in activerecord-import 0.1.0 and higher)
* MySQL2 - supports core import functionality plus on duplicate key update support (included in activerecord-import 0.2.0 and higher)
* PostgreSQL - supports core import functionality (included in activerecord-import 0.1.0 and higher)
* SQLite3 - supports core import functionality (included in activerecord-import 0.1.0 and higher)
* Oracle - supports core import functionality through DML trigger (available as an external gem: [activerecord-import-oracle_enhanced](https://github.com/keeguon/activerecord-import-oracle_enhanced))
* SQL Server - supports core import functionality (available as an external gem: [activerecord-import-sqlserver](https://github.com/keeguon/activerecord-import-sqlserver))

If your adapter isn't listed here, please consider creating an external gem, as described under Additional Adapters, to provide support. If you do, feel free to update this README to include a link to the new adapter's repository!

To test which features are supported by your adapter, use the following methods on a model class:

* `supports_import?(*args)`
* `supports_on_duplicate_key_update?`
* `supports_setting_primary_key_of_imported_objects?`

### Additional Adapters

Additional adapters can be provided by gems external to activerecord-import by providing an adapter that matches the naming convention set up by activerecord-import (and subsequently activerecord) for dynamically loading adapters. This also involves providing a folder on the load path that follows the activerecord-import naming convention, so that activerecord-import can dynamically load the file.

@@ -179,6 +494,28 @@ For more information on activerecord-import please see its wiki: https://github.

To document new information, please add to the README instead of the wiki. See https://github.com/zdennis/activerecord-import/issues/397 for discussion.

### Contributing

Review comment: @jkowens not sure if there's any other info here you want to add

#### Running Tests

The first thing you need to do is set up your database(s):

* copy `test/database.yml.sample` to `test/database.yml`
* modify `test/database.yml` for your database settings
* create databases as needed

After that, you can run the tests. They run against multiple databases and ActiveRecord versions.

This is one example of how to run the tests:

```sh
rm Gemfile.lock
AR_VERSION=4.2 bundle install
AR_VERSION=4.2 bundle exec rake test:postgresql test:sqlite3 test:mysql2
```

Once you have pushed up your changes, you can find your CI results [here](https://travis-ci.org/zdennis/activerecord-import/).

# License

This is licensed under the Ruby license.
@jkowens @ahmad-sanad I want to confirm that I understood the issue here correctly.
So an array of hashes does work, but only if the columns are consistent in every hash in the array.
e.g.
The potential workarounds discussed were:
Also, thanks for adding this!!!
NP! Thanks for the clarification. I will fix this up.
So in regards to no. 2, the thought was to make use of Enumerable's `#group_by`. For example:
I think this would be the recommended workaround, but I don't have any benchmark data to support that.
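
A minimal sketch of that `#group_by` idea, assuming (as described in this thread) that an array of hashes imports fine as long as every hash in it has the same keys: group the hashes by their key set and import each group separately. `Foo` is the README's example model, the fourth hash is added hypothetically so one group holds more than one entry, and the import call itself is left as a comment so the grouping runs without Rails.

```ruby
# Hashes mirror the README example; the fourth one is a hypothetical addition
# so that one group contains more than one hash.
arr = [
  { bar: 'abc' },
  { baz: 'xyz' },
  { bar: '123', baz: '456' },
  { bar: 'def' }
]

# Group by the sorted key list so { bar:, baz: } and { baz:, bar: } land together.
groups = arr.group_by { |hash| hash.keys.sort }

groups.each do |keys, hashes|
  # Every hash in `hashes` has the same keys, so in an app this group could be
  # imported on its own, e.g. Foo.import hashes
  puts "#{keys.inspect}: #{hashes.size} hash(es)"
end
```

Sorting the keys is a small design choice: it makes the grouping independent of the order in which keys were inserted into each hash.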
Updated hash examples again with `group_by` @jkowens