Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Line items occasionally have multiple redundant shipments created for them #4190

Closed
dpritchett opened this issue Jan 10, 2014 · 28 comments
Closed

Comments

@dpritchett
Copy link
Contributor

2-0-stable ( revision: 6c87593 )

Description

Twice in the past few weeks I've seen an order get its line items assigned to multiple identical shipments. As an example, earlier today I had a line item with a quantity of two. My order showed up with three shipments of two units of that product, each with its own $2 USPS shipping charges. When one of our warehouse users deleted one of the line items from one of the shipments, it left us with an invalid unrenderable order with two shipments and no line items.

I suspect a race condition where Heroku possibly dispatched the same PUT request to two dynos at about the same time, leading to multiple shipments and inventory units being created. Unfortunately the relevant records are destroyed and I can't confirm further.

Misc

This has been observed roughly around the time I push new code to heroku, on which I am running ten dynos using two Puma processes each. Our backend is a single Heroku postgres database though.

Stacktrace

This is the render failure stacktrace after the line item was deleted via the delete button on one of the three shipment manifest partials.
https://gist.github.com/dpritchett/6252e1cce6952fa2394b

cc @peterberkenbosch

p.s. I plan to snapshot the database before letting anyone fix the order next time.

@peterberkenbosch
Copy link
Member

@dpritchett did the row locking stuff make this go away?

@dpritchett
Copy link
Contributor Author

Sure did! Haven't seen a single deadlock since I implemented it Monday morning. I put it on the order controller, updater, and populator, but you might be able to get away with just putting it on the updater and working your way out.

Order-level rowlocks seem drastic at first but there can't be that many concurrent browser sessions with a legitimate interest in changing the same order, right?

@peterberkenbosch
Copy link
Member

Glad to here, can you share that snippet here as well for reference maybe?

@dpritchett
Copy link
Contributor Author

They aren't nearly as clean as the :update wrapper I linked already, but here they are if you're curious: https://gist.github.com/dpritchett/d919ff806a79a97ebbcb

Basically I just eagerly lock the order first chance I get on any order-changing method I could think of. You might even need to think about doing the same to any API controllers that touch Order. I'd like to think that just putting locks on the Populator and the Updater would suffice, but I haven't proven it either way just yet.

@dpritchett
Copy link
Contributor Author

While the deadlock fix has gotten our store to work a lot smoother it hasn't saved us from duplicate shipments yet. cc @rsbrown

@radar
Copy link
Contributor

radar commented Jan 24, 2014

Could this perhaps be a duplicate call to create_proposed_shipments happening at the same time? Would that be possible?

On Sat, Jan 25, 2014 at 9:13 AM, Daniel J. Pritchett
notifications@github.com wrote:

While the deadlock fix has gotten our store to work a lot smoother it hasn't saved us from duplicate shipments yet. cc @rsbrown

Reply to this email directly or view it on GitHub:
#4190 (comment)

@JDutil
Copy link
Member

JDutil commented Feb 13, 2014

We have a client on Spree 2-0-stable & Heroku experiencing same issue. Coworker is going to be testing out your fixes, and we will let you know our results.

@JDutil
Copy link
Member

JDutil commented Feb 13, 2014

Also noteworthy: Only a single database as well, and 2 dynos so race condition sounds likely here as well.

@dpritchett
Copy link
Contributor Author

#4334 looks like a related patch, might want to check it out as well.
On Feb 12, 2014 7:16 PM, "Jeff Dutil" notifications@github.com wrote:

We have a client on Spree 2-0-stable & Heroku experiencing same issue.
Coworker is going to be testing out your fixes, and we will let you know
our results.

Reply to this email directly or view it on GitHubhttps://github.com//issues/4190#issuecomment-34938615
.

@dpritchett
Copy link
Contributor Author

We wound up using js to disable multiple clicks of the continue button. Ugly but it seems to have helped. The real problem is still an inability for the shipment and package to realize when they're doubling up on the same line item. Possibly inventory unit could use another foreign key or a new position column to enforce the 1:1 nature of the implicit relation to line item quantity? I don't like the sound of that but I don't have a better idea.

@radar
Copy link
Contributor

radar commented Feb 13, 2014

@dpritchett I reckon that #4334 could be the solution to this problem. I'd be keen for you to try that out before doing any additional work on adding more foreign keys.

@JDutil
Copy link
Member

JDutil commented Feb 13, 2014

I think it was brought up to link inventory units to line items in order to improve returns & exchanges. Definitely best not to start mucking with inventory units if #4334 resolves this, but they may be another road to go down if issues persist.

@dpritchett
Copy link
Contributor Author

Here's the log entry that proved to me that the problem was double-submitting. This order was reported to me as having two redundant shipments and I traced it back to here.

My only issue with #4334 is that the store I reported is on 2.0. Not sure if I'll be able to test newer fixes in production any time soon. I'd say if #4334 looks good for y'all then don't worry about me on 2.0 for now. It might be worth thinking about how we could simulate this double PUT for future integration tests though...

Feb 04 15:11:36 myapp-production app/web.2:  Started PUT
"/checkout/update/address" for 108.18.204.240 at 2014-02-04 21:11:36
+0000
Feb 04 15:11:36 myapp-production app/web.4:  Started PUT
"/checkout/update/address" for 108.18.204.240 at 2014-02-04 21:11:36
+0000

@radar
Copy link
Contributor

radar commented Feb 13, 2014

2-1-stable, 2-0-stable. Same but different. 232e618 adds this to my 2-0-stable branch, but may need some modifications depending on how much the tests complain. I'd run them myself but I am in the middle of some work on 2-2-stable, so I figured I'd leave this one to CI.

@radar
Copy link
Contributor

radar commented Mar 3, 2014

@dpritchett Is this good now?

@dpritchett
Copy link
Contributor Author

I am still using my own patches on the store that prompted this issue. Your commits look solid to me, so I'm going to close this. If I get the time+budget to update the store in question I'll revisit this as needed.

Thanks!

@davidoh
Copy link

davidoh commented Sep 25, 2014

I think we need to revisit this issue. @marksb and I encountered the same problem this week with multiple shipments created for a single line item (with quantity 1).

There are multiple before_action callbacks that are calling current_order, but not all of them are calling it with the :lock option. Since the current_order method memoizes @current_order, if the first call to the method does not have the :lock option any subsequent calls to the current_order method will not lock the row in the database even if they are called with the :lock option.

We are using the 2-2-stable branch, and as far as we can see there are several calls to the current_order method in Spree::Core::ControllerHelpers::Order (listed in order):

  1. set_current_order which is added as a before_filter in the same helper. This is the main problem because it's called before load_order_with_lock which is next.
  2. load_order_with_lock which is called by Spree::CheckoutController. This needs to be the first callback that accesses current_order so that the order is locked.
  3. There are a couple of subsequent calls to current_order but since they are after load_order_with_lock they don't interfere with the order being locked.

Does anyone have any thoughts or can confirm this issue? If people can confirm this we can submit a pull request.

@JDutil
Copy link
Member

JDutil commented Sep 29, 2014

This was a tough issue to reproduce when I was dealing with it, and haven't had any issues with this for awhile now.

If your able to work up a PR properly locking the current_order I can review it, but I'm not sure I can provide much help on this front as I don't have a reproducible example or failing spec.

@DangerDawson
Copy link
Contributor

I am also seeing duplicate shipments on line items, and like the above I noticed that it was coming from duplicate HTTP requests.

I have setup a rig which successfully recreates the problem ( mixture of event-machine and some hackery on the controller to turn off some of the security filters ).

I also have a couple of fixes which I will tidy up and send a pull request for although before I do I want to get some collaboration on which strategy / fix I should submit:

  1. Basically wrap the problem code in a transaction
  2. Use a combination of Redis and a GUID to prevent duplicate requests ( we could use the DB instead of Redis although we would have to store a GUID with an index on it, which is not terribly efficient, and also use a daemon / cron task to auto expire the GUID's so the table stays small )

Also from what I can gather postgresql has auto-commit on by default, so I am a bit confused how calling lock on a model without explicitly wrapping it in a transaction will actually do?

e.g.

current_order = Spree::Order.includes(:adjustments).lock(options[:lock]).find_by(id: session[:order_id], currency: current_currency)

Anyway I would love to discuss either option 1 or 2 before I provide a fix, any thoughts?

@JDutil
Copy link
Member

JDutil commented Oct 4, 2014

Option 1 seems like the most appropriate fix. We don't want Spree to depend on redis. I'm also not familiar with what a lock within a transaction would do I would need to research it.

@DangerDawson
Copy link
Contributor

So after a small distraction I am back looking at a possible fix, of which I have 2:

First Approach

We use a combination of pessimistic and optimistic locking to check if the order has been processed by the current transaction:

  • Create a column called state_version on the order model
  • Increment the column and render the checkout/edit.html.rb page and include the value in a hidden field
  • On a commit we lock the order, test and increment the counter in the following way:
    def load_order_with_lock
      @order = current_order(lock: true)
      redirect_to spree.cart_path and return unless @order

      @order.with_lock do
        if params[:order] && params[:order][:state_version]
          if (@order.lock_versionx.to_i != params[:order][:state_version].to_i)
            redirect_to spree.cart_path and return
          end
          @order.increment!(:state_version)
        end
      end

      if params[:state]
        redirect_to checkout_state_path(@order.state) if @order.can_go_to_state?(params[:state]) && !skip_state_validation?
        @order.state = params[:state]
      end
    end

The above needs a bit of tidying up, but you get the ideal

Second Approach

We use a DB table / favourite key value store to write a UUID to a column with a unique constraint against it and timestamp.

  • Create a UUID and assign it to a hidden field on the checkout/edit.html.rb page
  • Put a before filter on the CheckoutController which atomically tries to write the UUID to a data store and continues on a success or redirects on a failure.

Of the 2 approaches the second one is by far simpler, although it really cries out for Redis as you can use key expiry, but can still be accomplished in an ever growing DB table..

thoughts?

@JDutil JDutil reopened this Oct 17, 2014
@JDutil
Copy link
Member

JDutil commented Oct 21, 2014

@DangerDawson I've been looking into this, and I think the first approach is what I'm going for. I'm trying to implement AR's built in optimistic locking http://api.rubyonrails.org/classes/ActiveRecord/Locking/Optimistic.html but it's broken quite a lot of the build. I'll push up where I'm at soon. Perhaps you can provide any feedback, help track down causes of spec failures & peer review the work to help out if interested.

JDutil pushed a commit to JDutil/spree that referenced this issue Oct 21, 2014
@JDutil
Copy link
Member

JDutil commented Oct 21, 2014

@DangerDawson having a difficult time with the built in optimistic locking as it effects so many other pieces of the application. I don't think I would be able to push it out to stable branches either with such a large change that would break custom checkout forms. Using your suggestion of a custom lock version column we're able to roll out the change, and people with custom forms are unaffected if we leave the conditional check for the parameter to be present.

I've pushed a branch with your suggested way of handling the locking just for the checkout controller here:
https://github.com/JDutil/spree/compare/fix-4190

Could you give it a try to see how it works out for you on the simulation you've created. I've still got a couple specs that need to be fixed before merging, and would be good to get some feedback from people on it.

JDutil pushed a commit to JDutil/spree that referenced this issue Oct 21, 2014
@DangerDawson
Copy link
Contributor

Hi Jeff,

I have created a pull request so we can compare notes, I have also added specs:

#5515

Let's have a chat tomorrow ( its late here ) and see if we can agree on the correct fix, then work together on getting any failing specs working.

D

JDutil pushed a commit to JDutil/spree that referenced this issue Oct 22, 2014
… controller. refs spree#4190

Fix broken specs.

Add regression test for state lock version. Fixes spree#4190
JDutil pushed a commit to JDutil/spree that referenced this issue Oct 22, 2014
… controller. refs spree#4190

Fix broken specs.

Add regression test for state lock version. Fixes spree#4190

Conflicts:
	frontend/spec/controllers/spree/checkout_controller_spec.rb
JDutil pushed a commit to JDutil/spree that referenced this issue Oct 22, 2014
… controller. refs spree#4190

Fix broken specs.

Add regression test for state lock version. Fixes spree#4190

Conflicts:
	frontend/spec/controllers/spree/checkout_controller_spec.rb
@JDutil JDutil closed this as completed in db52e1e Oct 22, 2014
@JDutil
Copy link
Member

JDutil commented Oct 22, 2014

@DangerDawson sorry hadn't checked my email before just pushing the fix I implemented. It looks like the approach is essentially the same though. I think simply incrementing version is sufficient, and avoids adding yet another callback to orders like using the uuid does. We really want to cutback on all the order callbacks to improve performance. Also I think it's a bit better to do a redirect and display a message to the user if the version check fails rather than simply returning OK status in case there is some sort of false positive.

It would be a good idea to refactor this though to break the version check out of the load_order_with_lock method as you've done though. Would you want to submit a PR refactoring it to be a bit cleaner like your PR, but drop the uuid stuff? Or I can revisit it.

For anyone else experiencing this issue if you update to latest commit on your branch, and ensure that you have the state_lock_version hidden field https://github.com/spree/spree/blob/master/frontend/app/views/spree/checkout/edit.html.erb#L18 added to your checkout edit form if you've customized it at all this should clear the issue up for you as well.

@DangerDawson
Copy link
Contributor

@JDutil I Will refactor it later today / tonight and also add a spec to ensure with_lock is being called.. then send you a pull request

@JDutil
Copy link
Member

JDutil commented Oct 23, 2014

@DangerDawson awesome thank you.

vergeman pushed a commit to vergeman/spree that referenced this issue Jan 28, 2015
… controller. refs spree#4190

Fix broken specs.

Add regression test for state lock version. Fixes spree#4190
@amitpatelx
Copy link

amitpatelx commented Jul 5, 2017

Intermittently shipments get deleted after order is complete. Do you think that issue has to anything with the locking issue and is the vergeman@918e87f solution for that problem?

I am using I am using Spree 2-3-stable and Rails 4.1.15 on Heroku with two dynos and puma.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants