Skip to content
This repository has been archived by the owner on Feb 8, 2018. It is now read-only.

deduplicate routes and customers #3806

Closed
chadwhitacre opened this issue Oct 5, 2015 · 15 comments
Closed

deduplicate routes and customers #3806

chadwhitacre opened this issue Oct 5, 2015 · 15 comments

Comments

@chadwhitacre
Copy link
Contributor

Reticketed from #2779. We don't have unique (network, address) in exchanges, nor participants.balanced_customer_href.

@chadwhitacre
Copy link
Contributor Author

Can all duplicates be traced to an absorption?

@chadwhitacre chadwhitacre changed the title resolve duplicate routes and customers deduplicate routes and customers Oct 5, 2015
@chadwhitacre
Copy link
Contributor Author

I find 12 balanced_customer_hrefs that are duplicated, each in two records.

@chadwhitacre
Copy link
Contributor Author

11 are explained by absorptions.

@chadwhitacre
Copy link
Contributor Author

For the twelfth, the two participant accounts are clearly a duplicate of each other, but it's not clear how they both ended up with the same balanced_customer_href.

@chadwhitacre
Copy link
Contributor Author

The twelfth only has a bank account, not a credit card.

@chadwhitacre
Copy link
Contributor Author

For the 11 absorptions, the path is clear: delete balanced_customer_href from the archived_as account. How about the twelfth?

@chadwhitacre
Copy link
Contributor Author

For the twelfth, it's clear that one is the primary account and the other is secondary, based on the amount of tip & withdrawal activity. There's only one withdrawal against the secondary account. I propose that we:

  • remove balanced_customer_href from the secondary account
  • add a note to the one exchange associated with that account linking to here (I checked, and there isn't already a note)
  • when we mix bank accounts into backfill the status, route, and ref columns of the exchanges table #2779, special case the assignment of the appropriate transaction URL and JSON payload to the one transaction associated with the secondary (duplicate) account

@chadwhitacre
Copy link
Contributor Author

Looks like we've got a similar situation to #2085 but with credit cards (not with bank accounts[!]):

=> select count(*) from exchange_routes where address like 'CC%';
┌───────┐
│ count │
├───────┤
│  3619 │
└───────┘
(1 row)

=> select count(*) from exchange_routes where address like '/cards/CC%';
┌───────┐
│ count │
├───────┤
│  4018 │
└───────┘
(1 row)

@chadwhitacre
Copy link
Contributor Author

Planning to fix with:

UPDATE exchange_routes
   SET address='/cards/'||address
 WHERE address like 'CC%';

@chadwhitacre
Copy link
Contributor Author

Wrong! Those are network='braintree-cc'. ;-)

@chadwhitacre
Copy link
Contributor Author

Actually, I don't think we want to deduplicate routes for archived participants. The reason is that the histories are in fact kept separate. In this case, it's appropriate to have separate exchange_routes that both reference the same Balanced customer.

@chadwhitacre
Copy link
Contributor Author

... which means we should back out #3806 (comment).

@chadwhitacre
Copy link
Contributor Author

Done.

@chadwhitacre
Copy link
Contributor Author

#!/usr/bin/env python -u
from __future__ import absolute_import, division, print_function, unicode_literals

from gratipay import wireup

db = wireup.db(wireup.env())


with db.get_cursor() as cur:
    customers = cur.all("""\
    SELECT * FROM (
        SELECT DISTINCT ON(balanced_customer_href)
               balanced_customer_href, count(username) n, array_agg(username) usernames
          FROM participants
         WHERE balanced_customer_href is not null
      GROUP BY balanced_customer_href
    ) _ ORDER BY n DESC
    """)
    for rec in customers:
        assert rec.n in (1, 2)
        if rec.n == 2:
            print(rec.n, rec.balanced_customer_href, ", ".join(rec.usernames), end='')
            one, two = rec.usernames
            absorption = cur.one("""\
                SELECT *
                  FROM absorptions
                 WHERE (archived_as=%(one)s AND absorbed_by=%(two)s)
                    OR (archived_as=%(two)s AND absorbed_by=%(one)s)
            """, dict(one=one, two=two))
            print(":", absorption.id if absorption else None)
            if absorption is None:
                print('    |')
                print('    |')
                continue

            print("    | Should unset balanced_customer_href for {}.".format(absorption.archived_as))
            good_routes = cur.all("""\

                SELECT er.*, p.username, p.id as user_id
                  FROM exchange_routes er
                  JOIN participants p ON er.participant = p.id
                 WHERE p.username=%s
                   AND network='balanced-cc'

            """, (absorption.absorbed_by,))
            if good_routes:
                assert len(good_routes) == 1

            bad_routes = cur.all("""\

                SELECT er.*, p.username, p.id as user_id
                  FROM exchange_routes er
                  JOIN participants p ON er.participant = p.id
                 WHERE p.username=%s
                   AND network='balanced-cc'

            """, (absorption.archived_as,))
            if bad_routes:
                assert len(bad_routes) == 1
                if good_routes and (bad_routes[0].id == good_routes[0].id):
                    print("    | True duplicate! {}".format(bad_routes[0].id))
                elif good_routes and (bad_routes[0].address == good_routes[0].address):
                    print("    | Should fold route for {} into route for {}."
                          .format(bad_routes[0].user_id, good_routes[0].user_id))
                else:
                    print("    | Archived participant has an unduplicated route!")
            else:
                print('    |')

            if 0 and absorption is not None:
                cur.one("""\

                    UPDATE participants
                       SET balanced_customer_href=null
                     WHERE username=%s

                """)

    print()
    raise Exception  # trigger rollback

@chadwhitacre
Copy link
Contributor Author

Actually, I don't think we want to deduplicate routes for archived participants. The reason is that the histories are in fact kept separate. In this case, it's appropriate to have separate exchange_routes that both reference the same Balanced customer.

Closing as wont-fix based on this.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

1 participant