Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor search query with quotes #2491

Merged
merged 24 commits into from
Dec 1, 2022
Merged

Refactor search query with quotes #2491

merged 24 commits into from
Dec 1, 2022

Conversation

mrharpo
Copy link
Contributor

@mrharpo mrharpo commented Nov 16, 2022

Quotes

Refactors ApplicationHelper::query_to_terms_array to lib/query_to_terms_array.rb

Better handles quoted searches and edge cases.

  • Splits into sub-functions:
    • terms_array
      • The final array of search terms
    • stopwords
      • Get stopwords from stopwords.txt
      • Store in Rails cache
    • quoted_phrases
      • Returns list of phrases between two quotation marks as Array<String>
    • quoted_terms_arrays
      • Returns terms_array of quoted phrases as Array<Array<String>>
    • unquoted_query
      • Returns String of original query, minus quoted phrases
    • unquoted_terms
      • Returns Array<Array<String>> of all unquoted terms from query, minus stopwords
    • strip_special_chars
      • Removes all non-alphanumeric characters
  • Better documentation of steps
  • Only parse quotes if even number of "
    • Queries with an odd number get stripped of punctuation
    • Fixes 500 error from solr

app/views/catalog/index.html.erb

  • Adds ` ` (backtick) escaping
    • Allows all " and ' characters
  • Uses raw() formatting

specs/lib/query_to_terms_array_spec.rb

  • Adds unit tests to ensure full coverage of functionality
    • Empty query returns []
    • Capitalizes the user query
    • Removes punctuation from quoted and unquoted strings and
    • Leaves punctuation in quoted strings
    • Matches multiple sets of double quoted phrases
    • Strips all quotes if there are an odd number of quotation marks

Closes #2453

@mrharpo mrharpo requested a review from afred November 16, 2022 19:21
@mrharpo mrharpo self-assigned this Nov 16, 2022
Copy link
Contributor

@afred afred left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is still in draft, and looking good, but just a few stylistic change suggestions:

  • For methods whose main job is to return a thing, it often reads better to just have the method named the thing, and avoid the get_ prefix. Also, parentheses are usually left off when there aren't any params defined., e.g. use def stopwords instead of def get_stopwords().
  • No need for parentheses unless you have params defined.
  • Blocks within do...end are indented 2 spaces.
  • No need to assign a local variable stopwords if it's not getting used anywhere else.
  • Put all these methods into a PORO somewhere, like ./lib/query_to_terms_array.rb.
  • Use constructor to take in the parameter, assign it to an (ideally immutable)
    attribute.
  • For immutable attributes, use attr_reader, which is just a shortcut for:
    def query
      @query
    end
    
    This is not required, but i like it because it ensures you won't accidentally
    change the @query attribute, and it provides a clean separation between the
    object's interface (public methods meant to be used by consumer) and data
    attributes which, by default are all mutable.
  • Any method that does not need to be called by a consumer of the object
    can be private. In Ruby, any method defined beneath private
    is private
    • I like to indent below private, but some people don't, because there is no
      end after.
    • Using private does not affect functionality or performance, but it's
      good OOP practice because it:
      • Forces the programmer to think about the object interface, what it
        consumers of the object need, and what they don't.
      • You are only obliged to test the public interface of an object, so
        keeping as much private as possible saves you some work.

Putting it all together i imagine would shape up to be something like this:

# ./lib/query_to_terms_array.rb
class QueryToTermsArray
  attr_reader :query

  def initialize(query)
    @query = query
  end

  def terms_array
    # logic!
  end

  private
  
    def stopwords
      Rails.cache('stopwords') do
        # get the stopwords from the file
      end
    end

    def quoted_phrases
      query.scan(/"([^"]*)"/)
    end

    def query_minus_quoted_phrases
      # logic!
    end
end

# And can be used in controllers like this...
terms_array = QueryToTermsArray.new(query).term_array

@mrharpo mrharpo marked this pull request as ready for review November 23, 2022 18:00
Adds specs and does a little refactoring of constructor and other methods to
(hopefully) make the logic a little more clear and maintainable.
@afred afred assigned mrharpo and unassigned mrharpo Nov 30, 2022
Copy link
Contributor

@afred afred left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@afred afred merged commit 65251e0 into master Dec 1, 2022
@afred afred deleted the MLAAAPB-53-quote-search branch December 1, 2022 03:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Conducting searches with quotes yields results that don't include transcript snippets
2 participants