Analysing expenses with negative values #66

andrepinho · 2016-09-15T05:13:29Z

I'm trying to figure out how expenses with a negative net_value affect last year's dataset.
I tried to test the hypothesis from issue #29, but it's not going to be as simple as I thought.

TL;DR
Looking only at last year's data-set:

There are 17646 of those expenses.
Looking for those 17646 document_number values among the positive expenses finds 13458 matches.
When trying to isolate the negative values by removing the ones with positive matches, I got 4458 expenses (only 4403 unique).
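
The steps above can be sketched in pandas roughly like this. It is a minimal illustration on made-up toy data, not the code from the notebook; only the column names `document_number` and `net_value` come from the dataset, and the variable names are hypothetical.

```python
import pandas as pd

# Toy data standing in for last year's dataset (values are invented).
expenses = pd.DataFrame({
    "document_number": ["A1", "A1", "B2", "C3", "C3", "D4"],
    "net_value": [100.0, -100.0, -50.0, 30.0, -30.0, 80.0],
})

negative = expenses[expenses["net_value"] < 0]
positive = expenses[expenses["net_value"] > 0]

# True for each negative expense whose document_number also
# appears in at least one positive expense.
has_match = negative["document_number"].isin(positive["document_number"])

matched = negative[has_match]
unmatched = negative[~has_match]

print(len(negative), len(matched), len(unmatched))
print(unmatched["document_number"].nunique())
```

Note that counting matching rows on the negative side and counting them on the positive side can give different totals whenever a document_number is repeated, so it helps to be explicit about which side is being counted.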

Then looking at previous year's data-set:

6655 matches by document_number for the 4458 (negative values without a positive counterpart).
Of those, 17 have a negative value.

So... it's a mess. I'd appreciate any insight that might explain those numbers, as well as ideas on how to improve this analysis in general.

pingfreud · 2016-09-23T19:32:08Z

@andrepinho, any further advances?

This looks nasty.

andrepinho · 2016-09-26T18:43:33Z

@pingfreud No real advances yet. I actually haven't looked at the analysis since I opened the PR. Shame on me.
I've been thinking about it though: one idea is to check whether any document_number shows up in more than 2 expenses. That might explain some of these numbers, while raising the question: how can the same flight appear more than twice?

Will give it a try soon enough.
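
A rough sketch of that check, assuming the expenses are loaded into a pandas DataFrame. The data here is invented and the variable names are hypothetical.

```python
import pandas as pd

# Toy data: "A1" appears three times, "B2" twice, "C3" once.
expenses = pd.DataFrame({
    "document_number": ["A1", "A1", "A1", "B2", "B2", "C3"],
    "net_value": [100.0, -100.0, 100.0, 50.0, -50.0, 30.0],
})

# How many expenses share each document_number.
counts = expenses["document_number"].value_counts()

# Keep only the numbers that show up in more than 2 expenses.
repeated = counts[counts > 2]

# All rows carrying one of those document numbers, for manual inspection.
suspects = expenses[expenses["document_number"].isin(repeated.index)]
print(suspects)
```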

weslleymberg · 2016-10-20T01:17:15Z

I think document_number alone is not very reliable because it is very prone to human error. I've commented on this here and here on issue #32.

Maybe adding other columns such as applicant_id or congressperson_id (or any other column you find interesting) to the filter can help you find more reliable matches.
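
As a sketch of what matching on more than one column could look like, the example below uses `DataFrame.merge` with `indicator=True` on a composite key. The data is invented and the choice of `applicant_id` plus `document_number` as the key is only an assumption for illustration.

```python
import pandas as pd

# Toy data: applicants 1 and 2 share document_number "A1",
# but only applicant 1 has a positive expense for it.
expenses = pd.DataFrame({
    "applicant_id": [1, 1, 2, 3],
    "document_number": ["A1", "A1", "A1", "B2"],
    "net_value": [100.0, -100.0, -40.0, -50.0],
})

keys = ["applicant_id", "document_number"]

negative = expenses[expenses["net_value"] < 0]
# One row per distinct key among the positive expenses.
positive_keys = expenses.loc[expenses["net_value"] > 0, keys].drop_duplicates()

# A left merge keeps every negative expense; the _merge column says
# whether a positive expense with the same key exists.
merged = negative.merge(positive_keys, on=keys, how="left", indicator=True)
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched[keys + ["net_value"]])
```

With document_number alone, applicant 2's negative expense would have been counted as matched; with the composite key it is not.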

Another thing to try is to get a list of all unique tickets, remove the prefix, and search by the ticket's number alone with something like df[df['document_number'].str.contains("<ticket number>")] (not sure about the syntax).

I think this can help you with these questions:

Are those remaining 4188 negative expenses really without a corresponding positive one?

Are the document numbers messed up? Maybe something like "Bilhete: 957-2117.270689" for the negative and "Vôo: 957-2117.270689" for the positive.
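
Here is a minimal sketch of both ideas: a literal search for one ticket number, and stripping the prefix so negative and positive rows can be grouped together. The two sample strings come from the example above; the remaining data, the `ticket_number` column, and the regular expression are assumptions, so the real prefixes may need a different pattern.

```python
import pandas as pd

expenses = pd.DataFrame({
    "document_number": [
        "Bilhete: 957-2117.270689",
        "Vôo: 957-2117.270689",
        "Bilhete: 957-0000.111111",
        None,
    ],
    "net_value": [-250.0, 250.0, -80.0, 10.0],
})

# str.contains treats the pattern as a regex by default, so "." would
# match any character; regex=False makes it a literal search and
# na=False treats missing document numbers as non-matches.
ticket = "957-2117.270689"
hits = expenses[expenses["document_number"].str.contains(ticket, regex=False, na=False)]

# Assumed pattern: the ticket number is the trailing run of digits,
# dots and hyphens; everything before it is the prefix.
expenses["ticket_number"] = expenses["document_number"].str.extract(
    r"([\d.-]+)\s*$", expand=False
)

# Net total per ticket: a negative fully offset by a positive sums to zero.
totals = expenses.groupby("ticket_number")["net_value"].sum()
print(totals)
```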

Hope this helps =)

Irio · 2016-11-23T15:18:43Z

@andrepinho Before we can merge it, please finish your analysis with the following changes:

Explain the reason you're doing the analysis in the notebook itself. The explanation may link to a GitHub issue, but the notebook should be self-contained.
Explain the conclusions you drew from the analysis. You may check this example before writing your own.
Make sure .html and .py files are up to date with your .ipynb.

andrepinho · 2016-11-23T20:28:34Z

@Irio I'm closing this PR because it's out of date with the latest findings. The rework on net_values actually solves the issue that this analysis tries to shed light upon.

andrepinho added 2 commits September 15, 2016 01:28

Exploring how documents with negative value impact the dataset

43dc688

Some more exploration but ran out of ideas

4583a83

filipelinhares added work in progress analysis labels Sep 15, 2016

andrepinho mentioned this pull request Oct 19, 2016

Negative expenses on the dataset #29

Closed

andrepinho closed this Nov 23, 2016

Irio added a commit that referenced this pull request Feb 27, 2018

Merge pull request #66 from datasciencebr/cuducos-multiprocessing-irr…

c67e0d7

…egularities Multiprocessing for irregularities command
