List files processed but excluded from multiuser stats and leader boards #71

jteresco · 2018-04-15T18:41:38Z

We have a user who has submitted a second list for travels in an app in addition to his real travels. He asked to remove it from the system to keep the site's stats and leader boards legitimate. A task to consider is how to allow such a list file to be processed for a user and to be able to see maps and stats for such a user but exclude them from multiuser stats and leader boards. These kinds of list files might be useful for some other purposes. One way I might use them is to see stats and maps for a single trip for which I'd submit a list file.

jteresco · 2021-08-20T22:40:57Z

This issue came back to mind as I traveled a route this week that previously had 0 TM travelers. But since it will show up in both terescoj.list (my overall travels) and jt2021.list (my 2021 travels), it will look like 2 users once I add it to both (I'm just adding travels on this trip to terescoj.list for now). I am pretty sure there are several "users" like this - in fact I have three extra users for subsets of my own travels.

I'm not sure how best to go forward on this. An extra file somewhere listing .list file names that should be excluded from rankings and traveler counts? Can it all be done in the site update and DB or will there need to be some web front end fixes as well?

yakra · 2021-08-21T02:04:10Z

Or some kind of delimiter pattern like _jt2021.list, __jt2021__.list, +jt2021.list, etc.?

jteresco · 2022-12-31T03:58:56Z

Or some kind of delimiter pattern like _jt2021.list, jt2021.list, +jt2021.list, etc.?

Thinking back on this issue a bit, trying to decide if using such a pattern for these kinds of lists might mean this could be entirely a web front end issue. Maybe it's a matter of modifying the SQL queries to ignore traveler names that have the selected pattern.

jteresco · 2022-12-31T04:10:04Z

TravelMapping/Web#741

jteresco · 2022-12-31T04:13:45Z

The DataProcessing contribution here might be to introduce a new DB table for the travelers, with a column indicating if this is a "real" user or one of these users that should be excluded from stats and rankings. This could also be an opportunity to store some other user info in the DB if there are things we wanted to store during site update.

jteresco · 2024-06-11T01:43:40Z

I might try to tackle this one soon. Summarizing options for how to indicate that a .list file is not a "real user":

Put non-real user list in a subdirectory of list_files in the repository, or possibly all lists into one of two subdirectories, one for real users and one for non-real.
Add a separate file in the repository listing the names of non-real users. If this were to be a .csv file, it could have other augmentations like a "real name" for the user, a home location, etc.
Require something special in the filenames for non-real users like starting with an underscore.

When a non-real user is identified during the site update process, a new table would need to be introduced into the DB to make this information available.

michihdeu · 2024-06-11T09:40:38Z

I like option 2. I dislike Option 1 since it is a potential source of error.

jteresco · 2024-06-12T00:39:05Z

Advantages of the extra file listing these kinds of things (option 2): that's much easier to undo or ignore, and nothing else in the repository needs to change to implement it, and new "normal" files can be created without needing to do anything with the file.

jteresco · 2024-06-12T02:36:29Z

Here's how I'm thinking of it: the site update program will read in a new file in the list_files (or rlist_files) directory which I'm calling travelerinfo.csv. The initial version looks like this so far, putting in several of the known non-real traveler names:

traveler;real
defaults;1
6lane.list;0
6lanetest.list;0
futureinterstates.list;0
jt2015a3;0
jt2020;0
jt2020q5;0
jt2021;0
jt2022;0
jt2023;0
jt2024;0

The site update program would read this file, then use its contents to add lines to the generated .sql file to create and populate a new table called travelers. It would have a table row for every .list file that was imported, and a DB table column for each field in the CSV file. The first line of the travelerinfo.csv file gives the column names, and the second line gives the default values to use for each traveler if they are not listed in the remainder of the CSV. For those listed in the CSV, the values there would be used instead.

This would mean additional fields could be added later without further modifications to the site update program.

Once this information is in the DB, it could be used by the web front end in the implementation for TravelMapping/Web#741 to exclude the non-real users from rank counting.

@yakra (or others), any thoughts?

michihdeu · 2024-06-12T15:51:00Z

I liked the augmentations idea of option 2. I‘d minimum add a description column to name what the user refers to eg 6lanetest

jteresco · 2024-06-13T02:41:54Z

Definitely on the description column.

yakra · 2024-06-14T02:56:08Z

This would mean additional fields could be added later without further modifications to the site update program.

We'd still need to get all the CREATE TABLE args right.
This suggests to me an external .sql file as done in df09ebd.
siteupdate could read it in and spit it right back out into the DB file, then append the data from the .csv

For turning the .csv data into an SQL line, maybe use strtok, strcspn or some std::string-based method to get our single-quotes, commas, parentheses, whatever else; whatever makes sense & gets the job done.

Sanity checks:

Double up ' single quotes as needed
Disallow " double quotes
Make sure each line has the right number of fields.
As needed, make sure numeric fields are in fact numeric.
Probably use the same valid_num_str function used to validate Waypoint lat/lng.
When to do this? When the field is indicated in the previous external .sql file as being INTEGER, FLOAT, or... are there other numeric types?
Anything else?

jteresco · 2024-06-14T13:10:03Z

The sanity checks are definitely important. Especially doubling up the single quotes. I would have forgotten that.

For determining types of each field, maybe another "special" line? So the format could be:

Line of field names
Line of field SQL types (can be used both to pick correct sanity checks and directly in CREATE TABLE)
Line of default values
Data lines

jteresco · 2024-06-25T16:00:28Z

In the unrankedusers branch, there is now code to read in a .csv and create a new database table. It is not yet being used in any meaningful way. The TravelMappingTest database on noreaster currently includes this table.

jteresco · 2024-06-25T17:50:20Z

To do: omit unranked users from traveled-format graphs.

yakra · 2024-06-26T02:33:43Z

To do: omit unranked users from traveled-format graphs.

Want to find a way to do this that retains the performance benefits of using TMBitsets.
The first thing that comes to mind is to use a sorted TMArray of TravelerList objects.

Best way to do that is to sort the ids/usernames before constructing the objects, which means doing this before this. This looks safe to do per my first quick glance at the new code, as we're just setting up listinfo.
This allows us to change the sort criterion here to use a lambda/function that checks whether both strings are contained in listinfo and sort appropriately.

Hmm. Exact mechanics TBD.
I assume travelers not in listinfo will all default to being ranked.
For those that are, the data will specify whether they're ranked or not, correct?
We can assume the ranked column will always be at a certain column number, or we can find that column in the header row (fieldnames), and use that to index into the listinfo vectors.
Then there's the matter of how ranked & unranked are specified in the data field on each row -- 1 or 0, a text value, possible other values beyond simple ranked/unranked, etc.

The big advantage of doing this:
This keeps all of the ranked users together at the front of the array, and all the unranked users together at the end.
Just change the 2nd arg in this ctor call to num_ranked_travelers or whatever we end up calling it, and the TMBitset for travelers on edges can then point to a contiguous subset of the TMBitset for travelers on segments, and graph generation can proceed just as it does now -- even a wee bit quicker. The existing TMBitset::fast_union function can be used without modification.

Disadvantage:
Travelers will no longer be sorted alphabetically overall.
Just the 2 sub-arrays of ranked & unranked travelers will be.
I can't think of anywhere this comes into play off the top of my head other than stats CSVs. Is this something we even care about?

Should we also omit unranked travelers from stats CSVs?

jteresco · 2024-06-26T02:44:57Z

To do: omit unranked users from traveled-format graphs.

Want to find a way to do this that retains the performance benefits of using TMBitsets.

Thanks for the thoughts on this. Please feel free to make this a very low priority. Maybe it deserves to be spun off into its own issue.

yakra · 2024-06-26T03:17:03Z

Does the .csv file exist anywhere?
I suppose I could read the forum thread ;)

michihdeu · 2024-06-26T18:01:59Z

Do we need any check for https://github.com/TravelMapping/UserData/blob/master/list_files/listfileinfo.csv to avoid crashing the site update?

yakra · 2024-06-26T18:49:26Z

^ Jim's new code has a few errorchecks.

DataProcessing/siteupdate/cplusplus/classes/TravelerList/TravelerList.cpp

Line 212 in a4d1d07

{ el.add_error("Error opening listinfo.csv file.");

DataProcessing/siteupdate/cplusplus/classes/TravelerList/TravelerList.cpp

Line 242 in a4d1d07

    
           {	el.add_error("Number of defaults does not match number of fieldnames in listinfo.csv.");

DataProcessing/siteupdate/cplusplus/classes/TravelerList/TravelerList.cpp

Line 268 in a4d1d07

    
           {	el.add_error("Number of fields does not match number of fieldnames in listinfo.csv for list " + listname);

I haven't looked closely enough too see whether anything else is necessary.
Maybe a sanity check for a minimum of 2 lines, header + defaults?

michihdeu · 2024-06-27T17:59:21Z

Invalid characters?

yakra · 2024-06-30T20:18:30Z

Whether numeric fields can be fully & correctly converted to ints/floats?

@jteresco wrote:

This would mean additional fields could be added later without further modifications to the site update program

@jteresco wrote:

Line of field SQL types (can be used both to pick correct sanity checks and directly in CREATE TABLE)

Has this idea been abandoned?
I really don't have an issue with that being either a Yes or No, other than to note that actually doing this would be more complicated and harder to get right.
If abandoning, easier to add some add'l errorchecks.

DataProcessing/siteupdate/cplusplus/functions/sql_file.cpp

Lines 397 to 399 in a4d1d07

    
           sqlfile << "CREATE TABLE listEntries (traveler VARCHAR(" << DBFieldLength::traveler 
        
           	<< "), description VARCHAR(" << DBFieldLength::listDescription 
        
           	<< "), includeInRanks BOOLEAN);\n";

The number of fields is hard-coded at 3.

DataProcessing/siteupdate/cplusplus/classes/TravelerList/TravelerList.cpp

Lines 240 to 244 in a4d1d07

    
           // check that the number of defaults matches the number of fieldnames 
        
           if (TravelerList::fieldnames.size() != TravelerList::defaults.size()) 
        
           {	el.add_error("Number of defaults does not match number of fieldnames in listinfo.csv."); 
        
           	return; 
        
           }

For this errorcheck, we may be better off making sure the number of fields in each line is the same as a constant, e.g. DBFieldLength::listEntriesNumFields.
TravelerList::read_listinfo could compare each field against DBFieldLength::listDescription.

DataProcessing/siteupdate/cplusplus/functions/sql_file.cpp

Line 420 in a4d1d07

// report any remaining entries in the listinfo map as errors (TODO)

Difficulty here is that by the time we're writing the DB, we've already checked for ErrorList entries, found none, and failed to abort.
To do the TMBitset stuff, something has to happen earlier in the program.
Makes sense to combine this stuff.
Maybe something like...

Gather traveler IDs
Read listinfo, store necessary info with the ID, and delete entries from the map
- One option is to expand the traveler IDs, to a pair, tuple or custom struct
- Or maybe construct barebones traveler objects with the listinfo, separating out .list reading into its own function, separate from the ctor, to happen later.
- ...depending on how everything is done.
Sort travelers.
Later on, read .list files.

jteresco · 2024-07-01T01:52:36Z

I don't think it's worth pursuing the more generic .csv idea. If we come up with more fields to add some time, we can update the site update code to handle.

Also, I have no objection to combining/moving around how and when the .csv is read in o help with error checking and efficiency.

Thanks!

jteresco · 2024-07-04T13:32:18Z

This has been merged into master and was used for a morning site update on 7/4/2024. Closing this issue. Please open new issues for feature requests and bug reports related to this new functionality.

jteresco added enhancement user statistics labels Jul 10, 2018

jteresco mentioned this issue Dec 31, 2022

Restrict stats/rankings to "real" users TravelMapping/Web#741

Closed

jteresco added a commit that referenced this issue Jun 25, 2024

Read listinfo.csv for #71

b5694a9

jteresco added a commit that referenced this issue Jun 25, 2024

Build listEntries table in SQL file for #71

a4d1d07

jteresco referenced this issue in TravelMapping/UserData Jun 26, 2024

First version of listfileinfo.csv for unranked users

d348884

jteresco added a commit that referenced this issue Jul 2, 2024

Unranked users changes for #71

9aafcf9

jteresco closed this as completed Jul 4, 2024

This was referenced Jul 4, 2024

Datachecks for listfileinfo.csv #635

Open

Traveled-format graphs should exclude unranked travelers #636

Open

Display of list file description from listfileinfo.csv TravelMapping/Web#783

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

List files processed but excluded from multiuser stats and leader boards #71

List files processed but excluded from multiuser stats and leader boards #71

jteresco commented Apr 15, 2018

jteresco commented Aug 20, 2021

yakra commented Aug 21, 2021 •

edited

Loading

jteresco commented Dec 31, 2022

jteresco commented Dec 31, 2022

jteresco commented Dec 31, 2022

jteresco commented Jun 11, 2024

michihdeu commented Jun 11, 2024 •

edited

Loading

jteresco commented Jun 12, 2024

jteresco commented Jun 12, 2024

michihdeu commented Jun 12, 2024

jteresco commented Jun 13, 2024

yakra commented Jun 14, 2024

jteresco commented Jun 14, 2024

jteresco commented Jun 25, 2024 •

edited

Loading

jteresco commented Jun 25, 2024

yakra commented Jun 26, 2024 •

edited

Loading

jteresco commented Jun 26, 2024

yakra commented Jun 26, 2024

michihdeu commented Jun 26, 2024

yakra commented Jun 26, 2024

michihdeu commented Jun 27, 2024

yakra commented Jun 30, 2024

jteresco commented Jul 1, 2024

jteresco commented Jul 4, 2024

List files processed but excluded from multiuser stats and leader boards #71

List files processed but excluded from multiuser stats and leader boards #71

Comments

jteresco commented Apr 15, 2018

jteresco commented Aug 20, 2021

yakra commented Aug 21, 2021 • edited Loading

jteresco commented Dec 31, 2022

jteresco commented Dec 31, 2022

jteresco commented Dec 31, 2022

jteresco commented Jun 11, 2024

michihdeu commented Jun 11, 2024 • edited Loading

jteresco commented Jun 12, 2024

jteresco commented Jun 12, 2024

michihdeu commented Jun 12, 2024

jteresco commented Jun 13, 2024

yakra commented Jun 14, 2024

jteresco commented Jun 14, 2024

jteresco commented Jun 25, 2024 • edited Loading

jteresco commented Jun 25, 2024

yakra commented Jun 26, 2024 • edited Loading

jteresco commented Jun 26, 2024

yakra commented Jun 26, 2024

michihdeu commented Jun 26, 2024

yakra commented Jun 26, 2024

michihdeu commented Jun 27, 2024

yakra commented Jun 30, 2024

jteresco commented Jul 1, 2024

jteresco commented Jul 4, 2024

yakra commented Aug 21, 2021 •

edited

Loading

michihdeu commented Jun 11, 2024 •

edited

Loading

jteresco commented Jun 25, 2024 •

edited

Loading

yakra commented Jun 26, 2024 •

edited

Loading