Feature request: Data aggregation #13
Hi @awerlang, many thanks for your request and questions ; ) The use case that gave birth to Jarbas was a simple API to easily share documents found during data exploration and analysis. With that in mind, Jarbas wasn't designed to explore data, but to bring you the data of receipts you found while exploring the datasets (presumably with Jupyter Notebooks inside Serenata de Amor). IMHO, Jupyter Notebooks and the quantitative analysis tools packed with Anaconda (the main Python distribution at Serenata de Amor) are a way better fit for exploring data than a standard Python distribution running Django (which is what we have here at Jarbas). That said, I'd address your questions in these terms:
Feel free to create a Jupyter Notebook within the Serenata de Amor repo to explore that; it will be a better choice in terms of performance. Also, working with Jupyter Notebooks allows you to ponder the bias of each exploration (e.g. asking which deputy spends more will probably tend to highlight deputies from Northern states, as they have a higher allowance).
I would say that's not on our radar. The API is useful for listing the data of the receipt(s) (and, in the future, of the supplier(s)) related to the documents found in Serenata de Amor's exploration and analysis. But, again, this is just my humble opinion. In spite of all that, I acknowledge that delegating this kind of analysis to Jupyter Notebooks makes this data less accessible. But I do believe it's a temporary condition: once we find relevant answers to questions such as "how did deputies spend the most money?", they will probably foster communication and PR material, and the numbers will then become accessible to anyone. I also acknowledge that this route introduces the bias of our own curatorial layer. I would like to hear from more people (including you, André) about these two specific things related to the question André raised:
I have yet to try out Jupyter. Before I'm able to do that, I'll say that this API can keep its current focus. My original question was more about the API being able to crunch numbers than about providing a nice UI for exploring; I do agree we don't need to reinvent the wheel for that. Also, I should have added this earlier, but anyway: I filed this issue because it's more time-efficient to let the database work on the data than to fetch every single result from the API and then process it in memory. So, if we want to work on a dataset, should we load all the data from the source files into memory inside Jupyter and then do anything we want with a programming language, all in memory? If so, have you gotten good performance? I didn't know that OPS tool you linked; it's a good starting point. I would like to see charts. Maps to compare flight-ticket expenses would be awesome too. PS: As I said, I have yet to try out Jupyter. Perhaps I'll be satisfied once I do.
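The trade-off described above (database-side aggregation vs. fetching every row and processing in memory) can be sketched with Python's built-in sqlite3. Table and column names here are made up for illustration; they are not Jarbas' actual schema.

```python
import sqlite3

# Hypothetical expenses table; real Serenata de Amor data has different columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE expenses (congressperson TEXT, cnpj_cpf TEXT, net_value REAL)"
)
conn.executemany(
    "INSERT INTO expenses VALUES (?, ?, ?)",
    [("A", "111", 100.0), ("A", "222", 50.0), ("B", "111", 200.0)],
)

# The database does the aggregation; only the small summary result
# crosses the wire, instead of every individual expense row.
rows = conn.execute(
    "SELECT congressperson, SUM(net_value) AS total "
    "FROM expenses GROUP BY congressperson ORDER BY total DESC"
).fetchall()
print(rows)
```

This is the efficiency argument in miniature: `GROUP BY` runs where the data lives, so the client never materializes the full dataset.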
Sure, that's what Anaconda (Python, NumPy, Pandas & co.) is for : ) They are pretty clever at managing datasets; go for it. And Jupyter Notebooks are awesome for exploring, for analyzing and, last but not least, for sharing your work.
One way to quickly explore huge amounts of data is through data aggregation. For instance: what are all the CNPJ/CPFs found in expenses? How did deputies spend the most money?
Does this API aim to provide such a feature? If you intend to address this in any other fashion, please let me know.
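The two aggregation questions above are a one-liner each in pandas, which is the route the maintainers suggest. A minimal sketch, using made-up column names rather than the real dataset's schema:

```python
import pandas as pd

# Toy stand-in for the expenses dataset (hypothetical columns).
df = pd.DataFrame({
    "congressperson": ["A", "A", "B", "C"],
    "cnpj_cpf": ["111", "222", "111", "333"],
    "net_value": [100.0, 50.0, 200.0, 75.0],
})

# "What are all the CNPJ/CPFs found in expenses?"
suppliers = df["cnpj_cpf"].unique()

# "How did deputies spend the most money?" -- total per deputy, largest first.
totals = (
    df.groupby("congressperson")["net_value"]
    .sum()
    .sort_values(ascending=False)
)
print(totals)
```

In a notebook, the same `df` would come from `pd.read_csv(...)` on the project's source files, and the results plot directly into charts.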