Skip to content
This repository has been archived by the owner on Feb 8, 2018. It is now read-only.

run Gittip #7 #169

Closed
chadwhitacre opened this issue Jul 20, 2012 · 17 comments
Closed

run Gittip #7 #169

chadwhitacre opened this issue Jul 20, 2012 · 17 comments

Comments

@chadwhitacre
Copy link
Contributor

Care must be taken to use the old Stripe code one more time instead of the new Balanced code. We didn't get data migrated from Stripe to Balanced in time to use Balanced this week.

@chadwhitacre
Copy link
Contributor Author

pid-34940 thread-140735091989696 (MainThread) Traceback (most recent call last):
pid-34940 thread-140735091989696 (MainThread)   File "/Users/whit537/personal/gittip/www.gittip.com/gittip/cli.py", line 22, in payday
pid-34940 thread-140735091989696 (MainThread)     billing.payday()
pid-34940 thread-140735091989696 (MainThread)   File "/Users/whit537/personal/gittip/www.gittip.com/gittip/billing.py", line 396, in payday
pid-34940 thread-140735091989696 (MainThread)     payday_loop(payday_start, participants)
pid-34940 thread-140735091989696 (MainThread)   File "/Users/whit537/personal/gittip/www.gittip.com/gittip/billing.py", line 436, in payday_loop
pid-34940 thread-140735091989696 (MainThread)     payday_one(payday_start, participant)
pid-34940 thread-140735091989696 (MainThread)   File "/Users/whit537/personal/gittip/www.gittip.com/gittip/billing.py", line 451, in payday_one
pid-34940 thread-140735091989696 (MainThread)     short = total - participant['balance']
pid-34940 thread-140735091989696 (MainThread) TypeError: unsupported operand type(s) for -: 'Decimal' and 'NoneType'

@chadwhitacre
Copy link
Contributor Author

There's a typecheck on total immediately before, so the offending value is participant['balance'].

@chadwhitacre
Copy link
Contributor Author

Per schema.sql, balance can be NULL but should default to 0.0:

balance               numeric(35,2)               DEFAULT 0.0

@chadwhitacre
Copy link
Contributor Author

Two out of 3,178 records in the participants table have balance NULL.

@chadwhitacre
Copy link
Contributor Author

The only places I see where balance is set explicitly are in billing.py:

        UPDATE participants
           SET balance = (balance + pending)
             , pending = NULL
        UPDATE participants
           SET balance=(balance - %s)
         WHERE id=%s
           AND pending IS NOT NULL
     RETURNING balance
        UPDATE participants
           SET last_bill_result=%s
             , balance=(balance + %s)
         WHERE id=%s

@chadwhitacre
Copy link
Contributor Author

It turns out that NULL + number is NULL:

gittip=# select null + 1 as result;
 result 
--------
   NULL
(1 row)

gittip=# 

@chadwhitacre
Copy link
Contributor Author

This first one smells worse to me than the other two:

        UPDATE participants
           SET balance = (balance + pending)
             , pending = NULL

@chadwhitacre
Copy link
Contributor Author

Is it guaranteed that pending in the first assignment is the value of pending before the second assignment?

Sure sounds like it:

The expression can use the old values of this and other columns in the table.

@chadwhitacre
Copy link
Contributor Author

Let's go look at history for those two NULL balances ...

@chadwhitacre
Copy link
Contributor Author

Interesting! One joined a week ago, the other is unclaimed but was created right after the first joined. I bet the first tipped the second.

Hypothesis: the memory errors that caused #143 also factored into this.

@chadwhitacre
Copy link
Contributor Author

Oh wow! So it turns out both balance NULL accounts were created during payday last week!

2012-07-13 17:49:02.299015+00 ts_start
2012-07-13 17:50:09.919073+00 ctime for first participant with balance NULL
2012-07-13 18:00:26.152734+00 ctime for second participant with balance NULL
2012-07-13 18:01:43.170725+00 ts_end

I double-checked, and a GitHub user gets an entry in the participants table simply when anyone visits https://www.gittip.com/github/{login}/ for the first time. I cross-checked the tips table, and indeed the two participants in this case are unrelated, other than they both were "upserted" in Gittip while payday was running.

So how exactly did this happen?

@chadwhitacre
Copy link
Contributor Author

Ah! It's because when they were created, their pending value was set to NULL. Then when this was added to their 0.0 balance at the end of payday, balance became NULL. QED.

@chadwhitacre
Copy link
Contributor Author

So how do we repair the existing database, and how do we close this hole?

Ack! This means that anyone creating an account RIGHT NOW will fall prey to the same bug. :-(

@chadwhitacre
Copy link
Contributor Author

And in fact we do have three new participants, with pending NULL. Hmm ...

@chadwhitacre
Copy link
Contributor Author

Fix

The repair is to set balance to 0.0 where it is NULL. Right? My concern is that this would clobber someone's positive balance. But since they joined during the immediately prior payday, it's guaranteed that they couldn't possibly have a positive balance at the start of this payday, although they may of course receive tips during this payday (although the two cases in question aren't this week).

For the three ... I mean four (you turkeys!) participants that are created during the current payday, we should let them go until the end of this payday run. Their balance will become NULL during the ending of payday (as with the two from last week), at which point we should reset their balance to 0.0.

Repair

The first fix is to put a NOT NULL constraint on balance (see #35, and rue the day), so we would catch this the week it first happened instead of the next week.

The second fix is to add a WHERE pending IS NOT NULL clause to the update where this broke down:

        UPDATE participants
           SET balance = (balance + pending)
             , pending = NULL
         WHERE pending IS NOT NULL

That will ensure that participants created after payday starts are not clobbered at the end of payday.

We then need to tweak the queries at the start of payday to guarantee that any participants that were created after the start of payday will not be included in the list of participants to include in this payday. That means adding WHERE ctime > ts_start clauses on the queries that zero out the pending column and fetch the list of participants for payday.

Of course now that we have a test suite for billing (#44) we should also write a test for this before fixing.

Here's another question: What happens if someone who signed up before payday tips an account that's created during payday? I believe we need to take steps to protect against that as well.

@chadwhitacre
Copy link
Contributor Author

The people who joined during the first (aborted) payday run were picked up in the second, meaning they had their pending value set to 0.0 before proceeding, and balance survived the event intact. No-one else joined during the second run, so there's no further need to repair the db at this point relative to this issue.

@chadwhitacre
Copy link
Contributor Author

Fix for root cause of failure reticketed as #170. Case closed.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant