Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch TheyWorkForYou’s internal calculation of comparisons to twfy-votes #1742

Closed
ajparsons opened this issue Nov 21, 2023 · 8 comments · Fixed by #1755
Closed

Switch TheyWorkForYou’s internal calculation of comparisons to twfy-votes #1742

ajparsons opened this issue Nov 21, 2023 · 8 comments · Fixed by #1755
Assignees

Comments

@ajparsons
Copy link
Contributor

The XML files from twfy-votes also contain additional fields around comparison with parties.

This can in principle replace the ‘PartyCohort’ generation approach in TheyWorkForYou with a faster and more maintainable approach, and remove complexity from the TheyWorkForYou codebase.

Implementing this requires changing the current sourcing of party comparison scores from TheyWorkForYou’s database, to using entries stashed in the key lookup from the XML import.

This is not quite a prerequisite (but would help a lot) with the change to the public whip comparison algorithm.

Options include:

  1. Don’t do this. Make any adjustments to the comparison algorithm in TheyWorkForYou at same time and maintain both approaches.
  2. Change the data source, but leave the PartyCohort approach in the code base.
  3. Change the data source, remove duplicate approaches from the code base.

I think there’s a good case for at least 2, and would leave 2 vs 3 up to people with a better sense of the pros and cons.

--

A reason to keep the current approach intact would be as a third cross-check against the two approaches used in TWFY-votes. But this is practically a bit awkward - and wouldn’t be subject to the same automated cross-checking. There is a broader question about improving the twfy-votes test suite based on the TheyWorkForYou examples. TheyWorkForYou has scenario tests on fake data - whereas twfy-votes tests by applying two different approaches (following the same principles) on live data and checking for alignment.

@dracos
Copy link
Member

dracos commented Nov 21, 2023

Not sure who has the "better sense of the pros and cons" - I don't see any advantage to having the same calculation in multiple places from a maintenance point of view. (I do think fake data tests are useful in general.)

@ajparsons
Copy link
Contributor Author

Yeah, given they're already defined in TWFY just need a bit of thought on moving those across.

@ajparsons
Copy link
Contributor Author

ajparsons commented Nov 27, 2023

So above I suggest changing everything so it stores information in the same place as the per person score, but an easier way might be to adapt the partypolicy table and store it there.

This table already is being reused by the cohort approach - where the 'party' column, now refers to a hash of a set of comparable MPs.

We can adjust this key to be person_id-party-chamber - e.g. 10001-labour-commons, and then use this to store the comparisons.

So the two steps would be:

  1. Adjust member>cohortKey to create this new key (it should use 'cohortParty' for the party bit)
  2. Add to the new json ingest something that populates this table based on the party comparison scores coming in.

The advantage of this is a) it's not a lot of finding all the references and b) twfy-votes is sometimes calculating multiple party comparisons for party switchers. This isn't being fed down the TWFY feed, but could be.
The TWFY database would have all the comparison options, and I could lose the logic in twfy-votes replicating the 'which party is primary comparison', and it would all be done in TWFY.

@dracos
Copy link
Member

dracos commented Dec 14, 2023

Sorry, I don't follow your latest comment. I think my original comment (which was a bit terse, sorry), was saying go with your option 3, and remove stuff from TWFY if it's being calculated elsewhere. But I think you then/now think the opposite, and say remove it from twfy-votes and calculate it only in TWFY? If you think that works best and has the advantages, then I'm fine with that too, just not clear from this on the implementation.

@ajparsons
Copy link
Contributor Author

No sorry I'm being confusing.

What I'm saying in the last comment is:

  • move all the calculations out to twfy-votes
  • continue to use the same partyPolicy table to store the data - given that all the display logic is set to feed from that table (and the existing cohort logic can be amended to store and retrieve individual comparison scores).

This is least steps from where we currently are - but is less thorough than storing the party comparisons in the same place as the actual scores.

Basically, calculations out of TWFY, there are a few options for where to store them when they come back - leave it with you on what's the best long term approach.

@dracos
Copy link
Member

dracos commented Dec 15, 2023

partypolicy table has a house column so presumably that doesn't need to be in the "party" key. If you're going that way, I'd add a person_id column to party_policy and then 'party' could actually be just the party?

I /think/ with the changes in #1752, if I've understood this right, this is looking at the comparison_party, comparison_distance_from_policy fields in the alignments and adding them to the right place in TWFY (given the above), removing all the cohort stuff that would otherwise be generating that data? And changing the cohort code to then use this new data storage, and it could assume that the per-person figure it gets for a policy is the 'right' one? Have I understood that?

@ajparsons
Copy link
Contributor Author

That sounds right.

  • Originally, we were storing the comparison for a whole party per policy.
  • Currently, we're optimising by storing for an effectively similar cohort per policy.
  • We want to move it so each row is an individual comparison per policy (and take out all the generating logic).

Adding some fields to the table would make it cleaner that's what's happening - yep. I think that means very slightly more work with the cohortKey because it won't be doing a single string lookup anymore - but not a big amount of work - and the clarity is probably worth it.

@ajparsons
Copy link
Contributor Author

The associated PR is good to merge, I'm happy with the tests on this function, will do this early next week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants