Backup pictures from all receipts #33

Irio · 2016-09-01T23:05:59Z

It is vital for the project to have a way of accessing all receipts, from every reimbursement since the first one available, without depending on the Chamber of Deputies.

Besides serving as proof for legal reports, it's useful for offline analyses. #32 is one I have in mind; running OCR to generate new structured data is another.

Here's a function that, based on a record from quota datasets, returns the picture URL from the Chamber of Deputies' website:

def document_url(record):
    return 'http://www.camara.gov.br/cota-parlamentar/documentos/publ/%s/%s/%s.pdf' % \
        (record['applicant_id'], record['year'], record['document_id'])
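A minimal sketch of how this function could feed a local backup, assuming Python 3 and only the standard library. `receipt_path`, `fetch_receipt`, and the folder layout are hypothetical names chosen for illustration, not something defined in this thread, and the record values in the usage note are made up.

```python
import os
import urllib.request

BASE_URL = 'http://www.camara.gov.br/cota-parlamentar/documentos/publ/%s/%s/%s.pdf'


def document_url(record):
    # Same URL pattern as the function above.
    return BASE_URL % (record['applicant_id'], record['year'], record['document_id'])


def receipt_path(record, target_dir):
    # Hypothetical layout: <target_dir>/<applicant_id>/<year>/<document_id>.pdf
    return os.path.join(target_dir, str(record['applicant_id']),
                        str(record['year']), '%s.pdf' % record['document_id'])


def fetch_receipt(record, target_dir):
    # Download one receipt and return the local path it was saved to.
    path = receipt_path(record, target_dir)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    urllib.request.urlretrieve(document_url(record), path)
    return path
```

Calling `fetch_receipt({'applicant_id': 1, 'year': 2016, 'document_id': 42}, 'receipts')` would try to save the file under `receipts/1/2016/42.pdf`; error handling for missing or unavailable documents is left out of this sketch.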

cuducos · 2016-09-02T03:27:26Z

I can write a src/fetch_receipts.py (or even refactor src/fetch_datasets.py to include that). I'm not sure how long it would take, or how many GB of storage it will require.

But let's find out ; )

If you don't mind, assign this issue to me and I'll tackle it soon.

(OCR would be awesome in the near future!)

andrewhr · 2016-09-02T13:54:00Z

As I understand it, the initial idea is just to have some kind of routine that downloads everything to a given folder for local storage. I'm OK with that.

Is it within the scope of this issue to create a secondary, online source for this image bank? Something like S3? If that is desirable, the backup routine needs to be incremental and run on a recurring basis. What do you both think? Maybe split this task into a separate issue?
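One possible reading of "incremental" here, sketched under assumptions: before each run, skip any receipt whose file already exists locally. `receipt_path`, `pending_records`, and the folder layout are hypothetical names for illustration, and the record keys follow the `document_url` function quoted earlier in this thread.

```python
import os


def receipt_path(record, target_dir):
    # Hypothetical layout: <target_dir>/<applicant_id>/<year>/<document_id>.pdf
    return os.path.join(target_dir, str(record['applicant_id']),
                        str(record['year']), '%s.pdf' % record['document_id'])


def pending_records(records, target_dir):
    # Yield only the records whose receipt is not yet stored in target_dir,
    # so a recurring run downloads (or uploads) just what is missing.
    for record in records:
        if not os.path.exists(receipt_path(record, target_dir)):
            yield record
```

Checking for file existence is the simplest criterion; it would not notice a partially downloaded file, so a real routine might also compare sizes or checksums.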

cuducos · 2016-09-02T14:19:33Z

@andrewhr Exactly : )

As we say in the CONTRIBUTING.md:

a copy of this data will be available elsewhere (just in case…).

The src/backup_data.py script, given the proper API keys, copies the files to an Amazon S3 bucket.

cuducos · 2016-09-02T14:25:06Z

To complement my last comment: I don't mean that everything we have is working perfectly. I just wanted to say that we have a basis for what you're proposing, @andrewhr. We are in tune!

That said, any kind of improvement to this pipeline is welcome. Feel free to open a new issue to discuss and implement enhancements on that topic ; )

Irio added the data collection label Sep 2, 2016

cuducos self-assigned this Sep 2, 2016

Irio mentioned this issue Sep 2, 2016

Simple web service to return everything we know about a given reimbursement #34

Closed

cuducos mentioned this issue Sep 2, 2016

Script to fetch receipt images #35

Merged

cuducos closed this as completed in #35 Sep 14, 2016

cuducos added a commit that referenced this issue Sep 14, 2016

Merge pull request #35 from datasciencebr/ec-fetch-receipts

22b0482

Script to fetch receipt images Fix #33

Irio unassigned cuducos Feb 9, 2018

Irio pushed a commit that referenced this issue Feb 27, 2018

Merge pull request #33 from datasciencebr/pyup-update-django-1.10.2-t…

61e681c

…o-1.10.3 Update django to 1.10.3
