Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make most output file and directory names shorter #618

Closed
3 tasks done
BryceStevenWilley opened this issue Sep 19, 2022 · 11 comments · Fixed by #700
Closed
3 tasks done

Make most output file and directory names shorter #618

BryceStevenWilley opened this issue Sep 19, 2022 · 11 comments · Fixed by #700
Assignees

Comments

@BryceStevenWilley
Copy link
Collaborator

BryceStevenWilley commented Sep 19, 2022

Windows (which most users will be downloading Github artifacts to if their test fails), has a default limit of full file paths of 260. Trying to extract specific files to view their contents will fail sometimes. We should more generally have shorter artifact names.

Steps to shorten artifact names:

  1. Assess how long names are currently: guess an average amounts of chars for each item, e.g. scenario names, timestamps
  2. see by how many characters we would be over the 260 limit
  3. Choose from the below ideas of how to best shorten file names to get under 260 (not worth it if we are well above 260 with all changes):
  • Time - env vars for timezones, allows use to remove UTC from the end (4 characters per instance, 8 total?)
  • "16hrs 14mins 15secs" -> “16h14m15s” (10 characters per instance, 20 total)
  • Shorten start of folder name: just alkiln instead of alkin_tests_output (12 characters)
  • use timestamp instead of human-readable times in some places (which?) (27 characters, likely would only do that inside the folder, so only in one place)
  • shorten amount of scenario name that's included (context dependent on how much we want to shorten)
@BryceStevenWilley
Copy link
Collaborator Author

BryceStevenWilley commented Oct 4, 2022

Ran into this today, so here's some rough calculations for how longs names are in windows, taking the following steps:

The prefix of the path is C:\Users\bwilley\Downloads\. Most people's user names won't be longer than 25 chars or so, so that's 43 chars max for the prefix max, avg closer to 28.

Then due to how windows unzips files, it's the main zip name + the very similar internal folder name: alkiln_tests_output-2022-10-04 at 17hrs 25mins 58secs UTC\alkiln_tests_output-2022-91-04 at 17hr 26min 41sec 573ms UTC. In this case it's 118 characters, and I think that's what it will always be, no variation.

  • 20 for the titles,
  • 20 for the dates,
  • 57 for the timestamps

Here's an example image that won't extract inside that dir: error_on-direct-standard-fields-Fail with was uexepctedly not able to continue for invalid field input message-2022-91-04 at 17hr 31min 30sec 942ms UTC.jpg, which is 155 characters.

  • 23 for screen id
  • 78 for scenario description,
  • 10 for the date,
  • 30 for the timestamp
  • 4 for the extension

The whole thing was 299, so ~40 characters over the limit.

@BryceStevenWilley
Copy link
Collaborator Author

Ran into this again. It does seem that a lot of the characters are from things that we can't shorten, only take out completely (the Downloads folder + screen id + scenario description, 30 + 10 to 30 + 50 to 90 = 90 to 150). The conservative way, avoiding any sort of info loss would be to do "16hrs 14mins 15secs" -> "16h14m15s", saving a total of 30 chars, and changing "alkin_test_output" to "alkiln" to save another 10. That would put this specific example below the file limit, and would help most people avoid the problem.

If we want to be more rigorous, we could put a hard length limit on the screen id and scenario description (how much info is in the scenario description after the first 80 chars?). That can come later though, and IMO isn't urgent.

BryceStevenWilley added a commit that referenced this issue Oct 25, 2022
Avoid artifact names whose whole paths are longer than 260 characters,
which causes issues on Windows machines. See
#618 for more details and
analysis. Doesn't fix completely, but does make less likely.
BryceStevenWilley added a commit that referenced this issue Nov 9, 2022
Avoid artifact names whose whole paths are longer than 260 characters,
which causes issues on Windows machines. See
#618 for more details and
analysis. Doesn't fix completely, but does make less likely.
BryceStevenWilley added a commit that referenced this issue Nov 9, 2022
Avoid artifact names whose whole paths are longer than 260 characters,
which causes issues on Windows machines. See
#618 for more details and
analysis. Doesn't fix completely, but does make less likely.
@plocket
Copy link
Collaborator

plocket commented Apr 7, 2023

To reiterate and add a little, my current plan:

  • "alkin_test_output" to "alkiln", as you described (-11)
  • Shorten the time for the root folder as you suggested (-13) I think this would be pretty safe - not sure how anyone would run 2 tests on the same branch within the same second. If we're worried, we could add ms, which would instead be -9.
  • Change any date-time (other than the root folder) to be a timestamp (just numbers) not a human readable time. Really it's just an id to keep things in order at that point. (-27 and retains ms). I'm concerned people will set a field, take a screenshot, set another field, etc, and I'm not sure how fast puppeteer is - might it be able to take those 2 screenshots within the same second?
  • Create a max for scenario descriptions < 80 because I've seen some essays there (maybe we're doing that already)
  • Max screen id of 30
  • 'screenshot' to 'pic'

So taking the maxes and using a scenario's default screenshot name, which I think is the longest path name (where 1234567890123 represents a timestamp):

Dir names:

input count
Prefix 43
alkiln-2022-10-04 at 17h25m58sUTC 33
alkiln-2022-10-04 at 17h25m58sUTC 33
80 char scenario name1234567890123 93

Filename

input count
screenshot_on- 14
max-screen-id 30
1234567890123 13
.jpg 4

Total = 263

[That's 3 too many, actually, so changing 'screenshot' to 'pic' will save us 7 and get us just under the limit. Maybe will have a 20 char limit on screen id instead of 30 to give us some extra room to work with. As I said lower down, I've seen some really deeply nested stuff with long names. Maybe decrease the max scenario name cutoff too.]

Leaving 36 extra characters for a really really nested folder name.

I'm not completely sure about losing that much of the human-readable date/time info from the scenario folders and screenshot filenames, but even 2022-10-04 17h25m58s123msUTC comes out to 28 chars, which would be 15 more for each, which I guess is 30, leaving 6 extra for people deeply nesting stuff in folders with long names. I happen to know at least one person who really deeply nests stuff.

Maybe we could shorten 'screenshot' to 'pic'. That would give us 7 more...

I'll try to clobber out something.

Just had a possible idea for some of our current users, though! Bit of a pain, but maybe they could put the zip onto another system, like google drive, and open files there.

@plocket plocket self-assigned this Apr 7, 2023
@plocket
Copy link
Collaborator

plocket commented Apr 7, 2023

lol, just rediscovered the random inputs screenshots prefix for a folder name: random_answers_screenshots-timestamp. Inside that are pics with random-timestamp.jpg names. Yikes. And that random screenshot should really have the page id.

So, with all the changes (after the adjustments I mentioned, but forgot to calculate into that table):

Dir names:

input count
Prefix 43
alkiln-2022-10-04 at 17h25m58sUTC/ 34
alkiln-2022-10-04 at 17h25m58sUTC/ 34
70 char scenario name1234567890123/ 84
random_input-1234567890123/ 27
total 222

Filename:

input count
random- 7
max-screen-id 20
1234567890123 13
.jpg 4
total 44

Total: 266 (6 over the limit)

We'll also have story_table_pics soon with a similar configuration.

Le sigh. That's not too bad if their folders aren't too deeply nested I suppose. That is, assuming they don't go over 37 chars in their prefix.

@plocket
Copy link
Collaborator

plocket commented Apr 7, 2023

Do [dir-separating] slashes (/) count as characters?

@BryceStevenWilley
Copy link
Collaborator Author

Yeah, it includes folder separators.

the maximum length for a path is MAX_PATH, which is defined as 260 characters. A local path is structured in the following order: drive letter, colon, backslash, name components separated by backslashes, and a terminating null character. For example, the maximum path on drive D is "D:\some 256-character path string"

https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=registry

@plocket
Copy link
Collaborator

plocket commented Apr 7, 2023

With a dir prefix that's 36 chars long (with slashes), current longest of our own paths is 272 274 [as per the updated description below - thanks!]:

Edited:
c:/Users/abcdefg/Downloads/1234567890/alkiln-2023-02-13 at 00h27m28s213ms UTC/alkiln-2023-02-13 at 00h27m28s213ms UTC/Fail with error page from random input-2023-02-13 at 00h31m08s455ms UTC/random_answers_screenshots-2023-02-13 at 00h31m09s224ms UTC/random-1676248269525.jpg

@nonprofittechy
Copy link
Member

Thanks, this is helpful for us Windows folks. Just a quick note about your path calculation: you need to include the drive letter and colon in the path length. That just adds two letters. Like: c:\users\ not just \users\

@plocket
Copy link
Collaborator

plocket commented Apr 7, 2023

I keep being tempted to take this opportunity to refactor, but I'm thinking that I'd like this change to be fairly small since we want to cherry pick it into v4 as well as have it in v5. I do think refactoring this functionality would be a win in the future.

@plocket
Copy link
Collaborator

plocket commented Apr 8, 2023

This is 255 characters:

c:/Users/abcdefg/Downloads/1234567890/alkiln-2023-04-08 at 02h08m58sUTC/alkiln-2023-04-08 at 02h08m58sUTC/Longest filepath should be under 184 characters with scenario descript-1680919740070/random1680919742254/random-25-characters-questi1680919742327.jpg

Thoughts?

plocket added a commit that referenced this issue Apr 8, 2023
plocket added a commit that referenced this issue Apr 8, 2023
Merge v5 first, then this will close #618.
plocket added a commit that referenced this issue Apr 19, 2023
plocket added a commit that referenced this issue Apr 19, 2023
plocket added a commit that referenced this issue Apr 19, 2023
@plocket
Copy link
Collaborator

plocket commented Apr 20, 2023

Closed by #700 and PRs for v5 as well

@plocket plocket closed this as completed Apr 20, 2023
plocket added a commit that referenced this issue Jun 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants