2028 - Sort heading for CSV Export #2053

Merged
merged 28 commits into from
Sep 14, 2017

Contributor

@rowasc rowasc commented Sep 12, 2017

This pull request makes the following changes:

  • Respects survey ID, survey stage, priority number and alphabetical order (in that order).
  • Important: fields that are part of the post itself, including title and description, do not get stage and priority sorting. They are common fields, so they are sorted alphabetically and placed before any custom field in the CSV.

The sorting procedure is:

  • All native fields go first.
  • After the native fields:
    --- group by survey ID and stage ID
    --- sort groups by numerical key (i.e. form id = 2 + stage id = 2 goes after form id = 1 + stage id = 2)
    --- sort by priority inside each group; if the priority is the same, fall back to alphabetical order
    --- flatten and attach to the native fields
    --- export all records with the correct field order by iterating over the heading attributes for each record and printing the value if there is one.
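
The procedure above can be sketched roughly as follows. This is a minimal illustration and not the code in this pull request: the function names `sortHeadings` and `rowForHeading`, and the field shape (`form_id`, `stage_id`, `priority`, `label`), are assumptions made for the example, and it assumes PHP 7 or later. Emitting an empty cell for a missing value is also an assumption about how columns are kept aligned.

```php
<?php
/**
 * Hypothetical sketch: order CSV heading keys.
 * Assumed attribute shape: form_id, stage_id, priority, label.
 */
function sortHeadings(array $nativeKeys, array $attributes): array
{
    // Native fields go first, sorted alphabetically.
    sort($nativeKeys, SORT_STRING);

    // Group custom attributes by survey (form) id and stage id.
    $groups = [];
    foreach ($attributes as $key => $attr) {
        $groups[$attr['form_id']][$attr['stage_id']][$key] = $attr;
    }

    // Sort groups numerically: form id first, then stage id.
    ksort($groups, SORT_NUMERIC);
    $sorted = [];
    foreach ($groups as $stages) {
        ksort($stages, SORT_NUMERIC);
        foreach ($stages as $fields) {
            // Inside a group: priority first, label as the tie-breaker.
            uasort($fields, function (array $a, array $b): int {
                return $a['priority'] <=> $b['priority']
                    ?: strcmp($a['label'], $b['label']);
            });
            // Flatten the group into the final key list.
            foreach (array_keys($fields) as $key) {
                $sorted[] = $key;
            }
        }
    }
    return array_merge($nativeKeys, $sorted);
}

/**
 * Hypothetical sketch: build one row that follows the heading order.
 * A missing value becomes an empty cell (an assumption of this example).
 */
function rowForHeading(array $heading, array $record): array
{
    $row = [];
    foreach ($heading as $key) {
        $row[] = array_key_exists($key, $record) ? $record[$key] : '';
    }
    return $row;
}

$order = sortHeadings(
    ['title', 'content'],
    [
        'links'    => ['form_id' => 1, 'stage_id' => 1, 'priority' => 7, 'label' => 'Links'],
        'person'   => ['form_id' => 2, 'stage_id' => 2, 'priority' => 5, 'label' => 'Person Status'],
        'status'   => ['form_id' => 1, 'stage_id' => 1, 'priority' => 5, 'label' => 'Status'],
        'geometry' => ['form_id' => 1, 'stage_id' => 1, 'priority' => 5, 'label' => 'Geometry test'],
    ]
);
$row = rowForHeading($order, ['title' => 'Flooded road', 'status' => 'open']);
```

Here `$order` starts with the native keys, and `geometry` sorts before `status` because both have priority 5 and the label breaks the tie.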

TODO

  • Group and sort by survey, so that if you have 4 fields, two in each survey, each survey's fields are next to each other and not just sorted by priority + stage (which would usually match p1 - p1, s1-s1, p2-p1, p2-p1, etc.)

Test checklist:

  • Create one of each type of survey field in a survey.

    • Sort them however you like.
    • Add posts to the survey, filling in the fields. Make sure you publish the posts.
    • Export posts.
    • The CSV heading should match the sort order of your survey.
  • Create a second survey with some custom fields.

    • Sort them however you like.
    • Add posts to the survey, filling in the fields. Make sure you publish the posts.
    • Make sure both surveys have published posts.
    • Export posts.
    • The CSV heading should match the sort order of your survey, and each set of survey fields should be grouped by survey.
  • Run ./bin/behat

Fixes #2028

Ping @ushahidi/platform

… in the post itself.

- Missing support for multi value fields
- needs cleanup & testing -
- Works fine so far with single value fields
Contributor

@rjmackay rjmackay left a comment

I'll caveat this with the fact I'm pretty tired. But this looks pretty solid so far. There's some messy data wrangling, but I think that's mostly due to our messy/mismatched data structure, not your code.

Contributor Author

rowasc commented Sep 12, 2017

Some data that helped me test this and verify the heading:
Native fields:
author_email,author_realname,color,completed_stages.0,completed_stages.1,contact_id,content,created,form_id,form_name,id,locale,message_id,parent_id,post_date,published_to.0,sets.0,slug,source,status,tags.0,tags.1,title,type,updated,user_id
Attributes:
"Last Location (point).lat","Last Location (point).lon","Test varchar.0","Test varchar.1",Categories.0,Categories.1,"Geometry test.0","Second Point.lat","Second Point.lon",Status.0,Links.0,Links.1,"Person Status.0","Last Location.0","Test Field Level Locking 3.0","Test Field Level Locking 4.0","Test Field Level Locking 5.0","A Test Field Level Locking 7.0","Test Field Level Locking 6.0"

Attributes, data for sorting:
"Last Location (point).lat","Last Location (point).lon" => no form id, stage 1, priority 5
"Test varchar.0","Test varchar.1" => form id = 1, stage 1, priority 1
Categories.0,Categories.1 => form id = 1, stage 1, priority 3
"Geometry test.0" => form id = 1, stage 1, priority 5
"Second Point.lat","Second Point.lon" => form id = 1, stage 1, priority 5
Status.0 => form id = 1, stage 1, priority 5
Links.0,Links.1 => form id = 1, stage 1, priority 7
"Person Status.0" => form id = 2, stage 2, priority 5
"Last Location.0" => form id = 4, stage 1, priority 5
"Test Field Level Locking 3.0" => form id = 4, stage 6, priority 0
"Test Field Level Locking 4.0" => form id = 4, stage 6, priority 0
"Test Field Level Locking 5.0" => form id = 4, stage 6, priority 0
"A Test Field Level Locking 7.0" => form id = 4, stage 7, priority 0
"Test Field Level Locking 6.0" => form id = 4, stage 7, priority 0

@rowasc rowasc changed the title from "WIP - 2028" to "2028 - Sort heading for CSV Export" on Sep 12, 2017
Contributor

@willdoran willdoran left a comment

  • Great use of comments
  • Very good function prototypes

I've read through the sorting and, other than code cleaning suggestions, I'm not sure of a better way to sort the data. I'm going to go back through it again. In the meantime, how well does it handle large datasets?

foreach ($fields as $fieldKey => $fieldAttr) {
    if (!is_array($fieldAttr)) {
        $headingResult[$fieldKey] = $fieldAttr;
    } else if (is_array($fieldAttr) && isset($fieldAttr['nativeField'])){
Contributor

I think you can drop the is_array check because isset will test anything for the presence of a key and the previous if has already confirmed that this must be an array to reach this line.
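
As a rough illustration of this suggestion, the branch could look like the sketch below. The function name `classifyField` and the return labels are hypothetical and exist only to make the example self-contained; they are not names from this pull request. It assumes PHP 7 or later.

```php
<?php
/**
 * Hypothetical sketch of the branch without the repeated is_array() call.
 * 'plain', 'native' and 'custom' are illustrative labels only.
 */
function classifyField($fieldAttr): string
{
    if (!is_array($fieldAttr)) {
        return 'plain';
    }
    // $fieldAttr is known to be an array here, so isset() alone is enough.
    if (isset($fieldAttr['nativeField'])) {
        return 'native';
    }
    return 'custom';
}
```

One caveat: `isset()` returns false when the key exists but holds `null`, so this only matches fields whose `nativeField` value is non-null.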

 * uasort is used here to preserve the associative array keys when they are sorted
 */
uasort($attributeKeys, function ($item1, $item2) {
    if ($item1['priority'] == $item2['priority']){
Contributor

I think we can use strict === here since we expect the type to be the same.

Contributor Author

yeah makes sense.
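
A comparator with the strict check might look like the sketch below. The sample data and the `label` key used for the alphabetical fallback are assumptions for illustration. The strict comparison is only safe if both priorities really have the same type (for example, both cast to int); if one side arrives from the database as a string, `===` would treat equal priorities as different.

```php
<?php
// Hypothetical sample data; the 'label' key is an assumption.
$attributeKeys = [
    'status'   => ['priority' => 5, 'label' => 'Status'],
    'varchar'  => ['priority' => 1, 'label' => 'Test varchar'],
    'geometry' => ['priority' => 5, 'label' => 'Geometry test'],
];

// uasort keeps the associative keys attached to their values.
uasort($attributeKeys, function (array $item1, array $item2): int {
    if ($item1['priority'] === $item2['priority']) {
        // Same priority: fall back to alphabetical order.
        return strcmp($item1['label'], $item2['label']);
    }
    return $item1['priority'] < $item2['priority'] ? -1 : 1;
});

$sortedKeys = array_keys($attributeKeys);
```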

/**
 * Separate by fields that have custom priority and fields that do not have custom priority assigned
 */
foreach ($fields as $fieldKey => $fieldAttr) {
Contributor

For readability, could you move the (non)priority splitting into a separate function?

Contributor

Am I missing this bit? I meant moving lines 112-119 to their own function, just to make this a top-level function. Sorry, I might not have been clear enough, my bad.

Contributor Author

Ah! I just moved the specific bit with nativeFields :| Sorry about that. I will ping you in a few minutes with that update.
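
For reference, the kind of extraction being asked for could be sketched like this. The helper name `splitByPriority` and the `priority` key are assumptions; the actual lines 112-119 are not shown in this thread, so this is only a guess at the shape of the refactor.

```php
<?php
/**
 * Hypothetical helper: split fields into those with a custom priority
 * and those without one. The 'priority' key name is an assumption.
 *
 * @return array Two arrays: [fields with priority, fields without].
 */
function splitByPriority(array $fields): array
{
    $withPriority = [];
    $withoutPriority = [];
    foreach ($fields as $fieldKey => $fieldAttr) {
        if (is_array($fieldAttr) && isset($fieldAttr['priority'])) {
            $withPriority[$fieldKey] = $fieldAttr;
        } else {
            $withoutPriority[$fieldKey] = $fieldAttr;
        }
    }
    return [$withPriority, $withoutPriority];
}

list($custom, $plain) = splitByPriority([
    'title'  => 'title',
    'status' => ['priority' => 5],
    'slug'   => ['nativeField' => true],
]);
```

The calling function then reads as a short sequence of steps (split, sort, merge) instead of one long loop.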

Contributor Author

rowasc commented Sep 12, 2017

Some example runs from postman:

2028 Branch

605 posts export -> 95.75kb -> 8529ms
605 posts export -> 95.75kb -> 8584ms
605 posts export -> 95.75kb -> 7427ms

Develop branch

605 posts export -> 93.95kb -> 8096ms
605 posts export -> 93.95kb -> 7789ms
605 posts export -> 93.95kb -> 8371ms

The size difference is accounted for by the exported fields and column differences (the tags, sets and completed_stages columns are always present only in the new CSV export).
I'm still checking, but we should be OK in terms of performance changes.

Contributor Author

rowasc commented Sep 12, 2017

@willdoran review feedback was addressed. Can you check? Thanks!

Contributor Author

rowasc commented Sep 13, 2017

Results from exporting 4510 posts in 2028 and develop branches.

2028 Branch

  • 4510 posts export -> 635.1 kb -> 41155ms
  • 4510 posts export -> 635.1 kb -> 39775ms
  • 4510 posts export -> 635.1 kb -> 40261ms
  • 4510 posts export -> 635.1 kb -> 41636ms
  • 4510 posts export -> 635.1 kb -> 43092ms

Develop branch

  • 4510 posts export -> 621.86 kb -> 45362ms
  • 4510 posts export -> 621.86 kb -> 46351ms
  • 4510 posts export -> 621.86 kb -> 47052ms
  • 4510 posts export -> 621.86 kb -> 39440ms
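
To compare the two branches at a glance, the reported timings can be averaged with a few lines of PHP. The numbers are copied from the Postman runs in this thread; the array labels are made up for the example. With only three to five runs per branch, the means are a rough indication rather than a benchmark.

```php
<?php
// Timings in milliseconds, copied from the runs reported in this thread.
$runs = [
    '2028 (605 posts)'     => [8529, 8584, 7427],
    'develop (605 posts)'  => [8096, 7789, 8371],
    '2028 (4510 posts)'    => [41155, 39775, 40261, 41636, 43092],
    'develop (4510 posts)' => [45362, 46351, 47052, 39440],
];

$means = [];
foreach ($runs as $label => $timings) {
    // Arithmetic mean of the runs for this branch and dataset.
    $means[$label] = array_sum($timings) / count($timings);
    printf("%-22s mean %.1f ms over %d runs\n", $label, $means[$label], count($timings));
}
```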

@willdoran
Contributor

Yep looks good.

@rowasc rowasc merged commit 3d98d04 into develop Sep 14, 2017
@rowasc rowasc deleted the 2028 branch September 14, 2017 12:17
rowasc added a commit that referenced this pull request Sep 14, 2017
rowasc added a commit that referenced this pull request Sep 14, 2017
rowasc added a commit that referenced this pull request Sep 14, 2017
3 participants