Analysing expenses with negative values #66

Closed
wants to merge 2 commits into from

andrepinho
Contributor

I'm trying to figure out how the expenses with a negative net_value affect last year's data-set.
I tried to test the hypothesis from issue #29, but it's not going to be as simple as I thought it would be.

TL;DR
Looking only at last year's data-set:

  • There are 17646 such expenses.
  • Looking for those 17646 document_number values among the positive expenses finds 13458 matches.
  • Trying to isolate the negative values by removing the ones with positive matches leaves
    4458 expenses (only 4403 unique).

Then looking at previous year's data-set:

  • Matching those 4458 expenses (negative values without a positive counterpart) by document_number finds 6655 matches.
  • Of those, 17 have a negative value.
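The counting steps listed above could be sketched roughly as follows. This is a minimal illustration on a toy DataFrame, not the notebook's actual code: only the column names `document_number` and `net_value` come from the discussion, and the rows are invented.

```python
import pandas as pd

# Toy stand-in for one year's data-set (invented rows, for illustration only).
expenses = pd.DataFrame({
    "document_number": ["A1", "A1", "B2", "C3", "C3", "D4"],
    "net_value": [-100.0, 100.0, -50.0, -30.0, -30.0, 80.0],
})

negative = expenses[expenses["net_value"] < 0]
positive = expenses[expenses["net_value"] > 0]

# Positive expenses that share a document_number with some negative expense.
positive_matches = positive[
    positive["document_number"].isin(negative["document_number"])
]

# Negative expenses whose document_number never appears among the positives.
unmatched_negative = negative[
    ~negative["document_number"].isin(positive["document_number"])
]

print(len(negative), len(positive_matches), len(unmatched_negative))
print(unmatched_negative["document_number"].nunique())
```

Comparing `len(unmatched_negative)` with the `nunique()` count shows how the number of expenses and the number of distinct document numbers can differ, as in the 4458 versus 4403 figures. The same `isin` lookup could be run against a second DataFrame holding the previous year's data.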

So... it's a mess. I'd appreciate any insight that might explain those numbers, as well as ideas on how to improve this analysis in general.

@pingfreud

@andrepinho, any further progress?

This looks nasty.

@andrepinho
Contributor Author

andrepinho commented Sep 26, 2016

@pingfreud No real advances yet. I actually haven't looked at the analysis since I opened the PR. Shame on me.
I've been thinking about it though: one idea I have is to check whether any document_number shows up in more than 2 expenses. That might explain some of these numbers while raising the question: how can the same flight appear more than twice?

Will give it a try soon enough.
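A rough sketch of that check, assuming the data is loaded in a pandas DataFrame with a `document_number` column (the rows below are invented for illustration):

```python
import pandas as pd

# Invented sample: "A1" appears three times, "B2" twice, "C3" once.
expenses = pd.DataFrame({
    "document_number": ["A1", "A1", "A1", "B2", "B2", "C3"],
    "net_value": [100.0, -100.0, 100.0, 50.0, -50.0, 30.0],
})

# Count how many expenses carry each document_number.
counts = expenses["document_number"].value_counts()

# Keep only the document numbers that show up in more than 2 expenses.
repeated = counts[counts > 2]

# Pull the full rows for those document numbers for manual inspection.
suspicious = expenses[expenses["document_number"].isin(repeated.index)]
print(suspicious)
```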

@weslleymberg

I think document_number alone is not very reliable because it is very prone to human error. I've commented on this here and here on issue #32.

Maybe adding other columns such as applicant_id or congressperson_id (or any other column you find interesting) to the filter can help you find more reliable matches.
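One way to try this, sketched on invented rows, is to merge negative and positive expenses on several key columns instead of `document_number` alone. The column names come from the discussion; the suffixes and variable names are just illustrative choices.

```python
import pandas as pd

# Invented sample: two different applicants reuse the document number "X9".
expenses = pd.DataFrame({
    "applicant_id": [1, 1, 2, 3],
    "document_number": ["X9", "X9", "X9", "Y7"],
    "net_value": [-100.0, 100.0, 100.0, -40.0],
})

negative = expenses[expenses["net_value"] < 0]
positive = expenses[expenses["net_value"] > 0]

# Matching on document_number alone pairs the negative expense with
# both positive "X9" rows, including another applicant's.
by_number = negative.merge(
    positive, on="document_number", suffixes=("_neg", "_pos")
)

# Adding applicant_id to the join keys keeps only the same applicant's row.
by_number_and_applicant = negative.merge(
    positive, on=["document_number", "applicant_id"], suffixes=("_neg", "_pos")
)

print(len(by_number), len(by_number_and_applicant))
```

Stricter keys reduce accidental matches, but they can also hide real pairs if the extra column is itself recorded inconsistently, so it may be worth comparing both results.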

Another thing to try is to get a list of all unique tickets, remove the prefix, and search by the ticket number alone with something like df[df['document_number'].str.contains("<ticket number>")] (not sure about the syntax).

I think this can help you with these questions:

Are those remaining 4.188 negative expenses without a corresponding positive one?

Are the document numbers messed up? Maybe something like "Bilhete: 957-2117.270689" for the negative and "Vôo: 957-2117.270689" for the positive.
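The prefix idea could be tested with a sketch like the one below. It assumes the prefix is always a label followed by a colon, as in the two examples above; that format is an assumption, and the third row is invented.

```python
import pandas as pd

expenses = pd.DataFrame({
    "document_number": [
        "Bilhete: 957-2117.270689",
        "Vôo: 957-2117.270689",
        "Bilhete: 123-4567.890123",  # invented row for contrast
    ],
    "net_value": [-250.0, 250.0, 90.0],
})

# Literal substring search: regex=False keeps "." and "-" from being read
# as regex syntax, and na=False treats missing document numbers as no match.
ticket = "957-2117.270689"
same_ticket = expenses[
    expenses["document_number"].str.contains(ticket, regex=False, na=False)
]

# Strip everything up to the last colon (assumed "<label>: <number>" format).
expenses["ticket_number"] = expenses["document_number"].str.replace(
    r"^.*:\s*", "", regex=True
)

# Net value per bare ticket number; a total of 0 suggests a refunded ticket.
totals = expenses.groupby("ticket_number")["net_value"].sum()
print(totals)
```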

Hope this helps =)

@Irio
Collaborator

Irio commented Nov 23, 2016

@andrepinho Before we can merge it, please finish your analysis with the following changes:

  • Explain the reason you're doing the analysis in the notebook itself. The explanation may link to a GitHub issue, but the notebook should be self-contained.
  • Explain the conclusions you drew from the analysis. You may check this example before writing your own.
  • Make sure .html and .py files are up to date with your .ipynb.

andrepinho closed this Nov 23, 2016
@andrepinho
Contributor Author

@Irio I'm closing this PR because it's out of date with the latest findings. The rework on net_values actually solves the issue that this analysis tries to shed light upon.

Irio added a commit that referenced this pull request Feb 27, 2018
…egularities

Multiprocessing for irregularities command