Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible fix for issue #409 #416

Closed
wants to merge 6 commits into from
Closed

Possible fix for issue #409 #416

wants to merge 6 commits into from

Conversation

jonseaberg
Copy link
Contributor

I had a similar problem to the one reported in #409. I think this is a very specific problem limited to the following setup:

  • MySQL using utf_unicode_ci or utf_general_ci collation
  • A unique index on the names column of the tags table

In my case I had a tag with text "inupiat" and someone was trying to add the tag "Iñupiat". With my setup these strings are considered the same to the database, but compare differently in ruby. My solution was to tie the given text "Iñupiat" to the existing tag with text "inupiat". I did this by creating a table of strings from the given list and joining that to the tags table.

mysql> SELECT tags.*, names.list_name FROM `tags` INNER JOIN (SELECT 'IÑUPIAT' AS list_name) AS names ON tags.name = names.list_name WHERE (lower(name) = 'iñupiat');
+--------+---------+---------+------+-----------+
| id     | name    | context | sort | list_name |
+--------+---------+---------+------+-----------+
| 168084 | inupiat | NULL    |    0 | IÑUPIAT   |
+--------+---------+---------+------+-----------+
1 row in set (0.00 sec)

This gives back the original string and the existing tag if one exists. Now we can return the matched tag or create one if needed.

The other two commits are changes I made to get the specs running on my machine. They may be addressed by other work, so I am happy to remove them if needed.

@bf4
Copy link
Collaborator

bf4 commented Dec 10, 2013

Can you rebase against master and force push?

@jonseaberg
Copy link
Contributor Author

@bf4 Rebased and pushed. Let me know if there is anything else I can do.

@@ -1,4 +1,3 @@
# coding: utf-8
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plz put back?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Sorry.

@jonseaberg
Copy link
Contributor Author

This change solved my issue, but after some thought it may not be the best approach. I am thinking it might be better to leave the code as is and catch the ActiveRecord::RecordNotUnique exception thrown on create, build the list of duplicates and re-request those to get the full list to return. Something like:

      duplicates = []
      result = []
      existing_tags = Hash[Tag.named_any(list).group_by{|tag| comparable_name(tag.name)}.map{|k,v| [k,v.first]}]
      list.each do |tag_name|
        existing_tag = existing_tags[comparable_name(tag_name)]
        if existing_tag
          result << existing_tag
        else
          begin
            result << Tag.create(:name => tag_name)
          rescue ActiveRecord::RecordNotUnique
            duplicates << tag_name
          end
        end
      end

      if duplicates.empty?
        result
      else
        result + Tag.named_any(duplicates)
      end

This may also help if two users try to add the same tag at the same time.

@bf4
Copy link
Collaborator

bf4 commented Dec 11, 2013

fwiw, Ernie Miller suggested the following uniqueness check code in his RailsConf talk last year

# UNIQUENESS
class Record < ActiveRecord::Base
  # validates :email, uniqueness: true
  # def save(*)
  #  super
  # rescue ActiveRecord::RecordNotUnique => e
  #  errors.add(:email, :taken)
  #  false
  # end
  validates :title,
    uniqueness: { scope: :subtitle }
  def save(*)
    super
  rescue ActiveRecord::RecordNotUnique => e
    attr_name, *scope = e.message.
      match(/Key \(([^\)]+)\)/)[1].split(/,\s*/)
    message = errors.generate_message(attr_name, :taken)
    if scope.any?
      message << " (scope: #{scope.to_sentence})"
    end
    errors.add attr_name, message
    false
  end
end

@bf4
Copy link
Collaborator

bf4 commented Dec 11, 2013

In any case... I haven't carefully read the code yet.. my comments thus far are on a first review. Will do more careful review soon.

@@ -15,10 +15,14 @@ def using_sqlite?
::ActiveRecord::Base.connection && ::ActiveRecord::Base.connection.adapter_name == 'SQLite'
end

def using_mysql?
::ActiveRecord::Base.connection && ::ActiveRecord::Base.connection.adapter_name.downcase =~ /mysql/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you change this to make a db connection, when before it just checked the configured adapter? /mysql/ === ActiveRecord::Base.connection_config[:adapter]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it to be consistent with using_postgresql? and using_sqlite?. I did some testing and they all can be implemented to check the value of ActiveRecord::Base.connection_config[:adapter]. Do you think it makes sense to change all of the using_ methods to check the connection_config?

@bf4
Copy link
Collaborator

bf4 commented Dec 27, 2013

Essentially, though, the bug is that the code identifies inupiat as identical to Iñupiat when strict_case_match is off? If so, then that should be one of the tests, no?

@jonseaberg
Copy link
Contributor Author

Good point. The tests I wrote for this case were deleted in my last commit. I think I have confused the issue by committing a second solution. My first solution addressed the problem I described and does not assume a unique index is on tag name. The current code and my solution are subject to a race condition which would allow duplicate tags, so I considered a second solution.

The second solution I came up with in the last commit assumes there is a unique index on the tag name, which may be best for another time. That said, I think I should revert my last commit. What do you think?

@bf4
Copy link
Collaborator

bf4 commented Jan 10, 2014

Let's try to finish this up

…icate tags."

This reverts commit 7be5492.

This code assumes there is a unique index on tags.name.  Reverting because
adding a unique index on name is custom behavior and not every consumer of
the library will add the index.
@jonseaberg
Copy link
Contributor Author

@bf4 Sounds good. I think its good to solve one issue at a time, so I reverted my last commit which tried to handle the case of creating duplicate tags. This PR only deals with the original issue of accent insensitive collation resulting in duplicate tags. Can you take another look and let me know if there are additional changes to make?

@bf4
Copy link
Collaborator

bf4 commented Jan 13, 2014

Firstly, you probably want to rebase off of master and get the tests passing :)

@@ -15,10 +15,18 @@ def using_sqlite?
::ActiveRecord::Base.connection && ::ActiveRecord::Base.connection.adapter_name == 'SQLite'
end

def using_mysql?
/mysql/ === ActiveRecord::Base.connection_config[:adapter]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use the same semantics as the other db checks (that do not require a db connection)

@jonseaberg
Copy link
Contributor Author

@bf4 Closing in favor of #498

@jonseaberg jonseaberg closed this Mar 19, 2014
@bf4
Copy link
Collaborator

bf4 commented Mar 19, 2014

@jonseaberg Thanks for your work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants