Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the report UUID to the ScubaResults.json filename #1426

Merged
merged 40 commits into from
Nov 22, 2024

Conversation

adhilto
Copy link
Collaborator

@adhilto adhilto commented Nov 14, 2024

🗣 Description

  • Add the report UUID to the ScubaResults.json filename.
  • Adds a new parameter -NumberOfUUIDCharactersToTruncate to allow for trunacation or omission of the UUID appneded to the file name.

TODOs discovered over the course of this issue.
#1254
#1354
#1437
#1438

💭 Motivation and context

Ensures each results file has a unique file name. Closes #1422.

🧪 Testing

  • Ensured PowerShell unit tests pass
  • Added new unit tests to cover the case when there are multiple matching ScubaResults*.json files
  • Manually tested Invoke-Scuba, works as expected
  • Manually tested Invoke-ScubaCached, works as expected
  • Manually tested Invoke-ScubaCached where the input folder has one file matching "ScubaResults*.json", "ScubaResults.json", works as expected
  • Manually tested Invoke-ScubaCached where the input folder has one file matching "ScubaResults*.json", "ScubaResults_{uuid}.json", works as expected
  • Manually tested Invoke-ScubaCached where the input folder has two file matching "ScubaResults*.json", the most recently created file is used, as expected
  • Manually tested Invoke-Scuba and Invoke-ScubaCached with a non-default OutPath parameter
  • Manually tested both Invoke-Scuba and Invoke-ScubaCached with a non-default OutJsonFileName parameter

✅ Pre-approval checklist

  • This PR has an informative and human-readable title.
  • PR targets the correct parent branch (e.g., main or release-name) for merge.
  • Changes are limited to a single goal - eschew scope creep!
  • Changes are sized such that they do not touch excessive number of files.
  • All future TODOs are captured in issues, which are referenced in code comments.
  • These code changes follow the ScubaGear content style guide.
  • Related issues these changes resolve are linked preferably via closing keywords.
  • All relevant type-of-change labels added.
  • All relevant project fields are set.
  • All relevant repo and/or project documentation updated to reflect these changes.
  • Unit tests added/updated to cover PowerShell and Rego changes.
  • Functional tests added/updated to cover PowerShell and Rego changes.
  • All relevant functional tests passed.
  • All automated checks (e.g., linting, static analysis, unit/smoke tests) passed.

✅ Pre-merge checklist

  • PR passed smoke test check.

  • Feature branch has been rebased against changes from parent branch, as needed

    Use Rebase branch button below or use this reference to rebase from the command line.

  • Resolved all merge conflicts on branch

  • Notified merge coordinator that PR is ready for merge via comment mention

  • Demonstrate changes to the team for questions and comments.
    (Note: Only required for issues of size Medium or larger)

✅ Post-merge checklist

  • Feature branch deleted after merge to clean up repository.
  • Verified that all checks pass on parent branch (e.g., main or release-name) after merge.

@adhilto adhilto added the enhancement This issue or pull request will add new or improve existing functionality label Nov 14, 2024
@adhilto adhilto added this to the Jellyfish milestone Nov 14, 2024
@adhilto adhilto self-assigned this Nov 14, 2024
@adhilto adhilto linked an issue Nov 14, 2024 that may be closed by this pull request
@buidav buidav self-assigned this Nov 16, 2024
@buidav buidav removed their request for review November 16, 2024 01:03
@buidav buidav force-pushed the scubaresults-uuid-in-name branch from c063b89 to 60d9137 Compare November 18, 2024 04:06
@tkol2022
Copy link
Collaborator

Should we make the NumberOfUUIDCharactersToTruncate and KeepIndividualJson parameters mutually exclusive? I was thinking if you pass KeepIndividualJson then it doesn't make sense to consider the NumberOfUUIDCharactersToTruncate? Should the tool provide an error alerting the user?

If we decide to implement this, we can add this as an enhancement later because I don't feel comfortable making too many changes given the importance of this release.

image

@tkol2022
Copy link
Collaborator

We might want to do some data validation on the OutJsonFileName parameter. If you pass "*" as the name, it crashes.
Maybe add a ValidatePattern to the parameter definition in Invoke-Scuba?

[Parameter(Mandatory = $false, ParameterSetName = 'Configuration')]
[Parameter(Mandatory = $false, ParameterSetName = 'Report')]
[ValidateNotNullOrEmpty()]
[ValidatePattern('^[a-zA-Z0-9]+$')]
[string]
$OutJsonFileName = [ScubaConfig]::ScubaDefault('DefaultOutJsonFileName')

Again this is an enhancement I think we can make in the future but wanted to discuss.

image

@tkol2022
Copy link
Collaborator

tkol2022 commented Nov 20, 2024

The probability of New-Guid generating an exception is probably negligibly low, but if it does fail and the 0000 value is created as the GUID, won't this cause filename collisions in the downstream file system where the results will be posted if another user happened to get an exception as well? There would be two files, both with the 0000 GUID in the filename. Again the probability is low.

image

@buidav
Copy link
Collaborator

buidav commented Nov 20, 2024

Should we make the NumberOfUUIDCharactersToTruncate and KeepIndividualJson parameters mutually exclusive? I was thinking if you pass KeepIndividualJson then it doesn't make sense to consider the NumberOfUUIDCharactersToTruncate? Should the tool provide an error alerting the user?
If we decide to implement this, we can add this as an enhancement later because I don't feel comfortable making too many changes given the importance of this release.

We've talked about eventually deprecating the KeepIndividualJSON parameter in the future because it already is mutually exclusive with the OutJSONFileName parameter. Need to add an issue to this epic #1224 .
For this change at least push it off for later as KeepIndividualJson is essentially our legacy output now. We already have one mutually exclusive parameter, I expect even more mutually exclusive parameters in the future until we deprecate.

@buidav
Copy link
Collaborator

buidav commented Nov 20, 2024

We might want to do some data validation on the OutJsonFileName parameter. If you pass "*" as the name, it crashes. Maybe add a ValidatePattern to the parameter definition in Invoke-Scuba?

[Parameter(Mandatory = $false, ParameterSetName = 'Configuration')]
[Parameter(Mandatory = $false, ParameterSetName = 'Report')]
[ValidateNotNullOrEmpty()]
[ValidatePattern('^[a-zA-Z0-9]+$')]
[string]
$OutJsonFileName = [ScubaConfig]::ScubaDefault('DefaultOutJsonFileName')

Again this is an enhancement I think we can make in the future but wanted to discuss.

Oh we should probably start checking for this for every file name parameter 👀. Will queue up an issue for this for every file name name parameter.

For this PR adding the validation to OutJSONFileName is a low lift.

@tkol2022
Copy link
Collaborator

I haven't tested this section of the code yet but I have a question. Does this code make the logic windows specific? What types of tweaks will we need in the future if we work on a cross-platform version that doesn't have the same file path limitation coded here?

image

@buidav
Copy link
Collaborator

buidav commented Nov 20, 2024

The probability of New-Guid generating an exception is probably negligibly low, but if it does fail and the 0000 value is created as the GUID, won't this cause filename collisions in the downstream file system where the results will be posted if another user happened to get an exception as well? There would be two files, both with the 0000 GUID in the filename. Again the probability is low.

I've been thinking about this as well. Should we just throw an full exception in this case and stop ScubaGear execution?
If New-Guid does fail, there most likely is something catastrophically wrong with the underlying system.

@tkol2022
Copy link
Collaborator

I see that we are truncating the filename GUID in multiple places, ConvertTo-ResultsCsv and Merge-JsonOutput resulting in duplicate logic that we have to keep in synch over time. What if we truncate the GUID in a helper function? Is it worth it?

image

image

Also, the Merge-JsonOutput function gets the GUID from the line of code below which gets it from the settings export object, whereas the ConvertTo-ResultsCsv function gets the GUID from a parameter named $Guid. Should they be consistent? Getting the value from the settings export object seems like a roundabout way to get it.

$ReportUuid = $SettingsExportObject.report_uuid

@tkol2022
Copy link
Collaborator

Testing long paths

This section covers testing with some long file names and folder paths. The purpose was so simulate the conditions where the path of the output JSON file exceeds the maximum system limit and triggers our newly added custom error message in the Merge-JsonOutput function. In some cases I am not getting the expected custom defined error message.

Case 1: long OutJsonFileName parameter value
I am not getting the expected custom error because the GetFullPath cmdlet is failing before it gets to our custom error message.

Invoke-Scuba -ProductNames sharepoint -CertificateThumbPrint "93fd41a761f434f12ee8ac3ff771347597435696" -AppID "db791fde-1ff0-493f-be3d-03d9ba6cb087" -Organization "y2zj1.onmicrosoft.com" -NumberOfUUIDCharactersToTruncate 13 -OutJsonFileName "aB3cD4eF5gH6iJ7kL8mN9oP0qR1sT2uV3wX4yZ5aB6cD7eF8gH9iJ0kL1mN2oP3qR4sT5uV6wX7yZ8aB9cD0eF1gH2iJ3kL4mN5oP6qR7sT8uV9wX0yZ1aB2cD3eF4gH5iJ6kL7mN8oP9qR0sT1uV2wX3yZ4aB5cD6eF7gH8iJ9kL0mN1oP2qR3sT4uV5wX6yZ7aB8cD9eF0gH1iJ2kL3mN4oP5qR6sT7uV8wX9yZ0"

image

Case 2: long OutPath parameter value
This works as expected because I get the custom error message in Merge-JsonOutput.

Invoke-Scuba -ProductNames sharepoint -CertificateThumbPrint "93fd41a761f434f12ee8ac3ff771347597435696" -AppID "db791fde-1ff0-493f-be3d-03d9ba6cb087" -Organization "y2zj1.onmicrosoft.com" -NumberOfUUIDCharactersToTruncate 13 -OutPath "C:\Users\tkolovos\Downloads\ScubaLongFolder\aB3cD4eF5gH6iJ7kL8mN9oP0qR1sT2uV3wX4yZ5aB6cD7eF8gH9iJ0kL1mN2oP3qR4sT5uV6wX7yZ8aB9cD0eF1gH2iJ3kL4mN5oP6qR7sT8uV9wX0yZ1aB2cD3eF4gH5iJ6kL7mN8oP9qR0sT1uV2wX3yZ4aB5cD6eF7gH8iJ9kL0mN1oP2qR3sT4uV5wX6yZ7aB8cD9eF"

image

Case 3: long OutFolderName parameter value
This works as expected because I get the custom error message in Merge-JsonOutput.

Invoke-Scuba -ProductNames sharepoint -CertificateThumbPrint "93fd41a761f434f12ee8ac3ff771347597435696" -AppID "db791fde-1ff0-493f-be3d-03d9ba6cb087" -Organization "y2zj1.onmicrosoft.com" -NumberOfUUIDCharactersToTruncate 13 -OutFolderName "aB3cD4eF5gH6iJ7kL8mN9oP0qR1sT2uV3wX4yZ5aB6cD7eF8gH9iJ0kL1mN2oP3qR4sT5uV6wX7yZ8aB9cD0eF1gH2iJ3kL4mN5oP6qR7sT8uV9wX0yZ1aB2cD3eF4gH5iJ6kL7mN8oP9qR0sT1uV2wX3yZ4aB5cD6eF7gH8iJ9kL0mN1oP2qR3sT4uV5wX6yZ7aB8cD9eF"

image

Case 4: long OutFolderName parameter value that is exactly 256 characters (under the limit)
This works as expected and it created the json merged in the long output folder.

Invoke-Scuba -ProductNames sharepoint -CertificateThumbPrint "93fd41a761f434f12ee8ac3ff771347597435696" -AppID "db791fde-1ff0-493f-be3d-03d9ba6cb087" -Organization "y2zj1.onmicrosoft.com" -NumberOfUUIDCharactersToTruncate 13 -OutFolderName "aB3cD4eF5gH6iJ7kL8mN9oP0qR1sT2uV3wX4yZ5aB6cD7eF8gH9iJ0kL1mN2oP3qR4sT5uV6wX7yZ8aB9cD0eF1gH2iJ3kL4mN5oP6qR7sT8uV9wX0yZ1aB2cD3eF4gH5iJ6kL7"

image

Case 5: long OutFolderName parameter value that is exactly 256 characters (under the limit) but no truncated characters
This works as expected because I get the custom error message since the addition of the full GUID make the path exceed the limit.

Invoke-Scuba -ProductNames sharepoint -CertificateThumbPrint "93fd41a761f434f12ee8ac3ff771347597435696" -AppID "db791fde-1ff0-493f-be3d-03d9ba6cb087" -Organization "y2zj1.onmicrosoft.com" -NumberOfUUIDCharactersToTruncate 0 -OutFolderName "aB3cD4eF5gH6iJ7kL8mN9oP0qR1sT2uV3wX4yZ5aB6cD7eF8gH9iJ0kL1mN2oP3qR4sT5uV6wX7yZ8aB9cD0eF1gH2iJ3kL4mN5oP6qR7sT8uV9wX0yZ1aB2cD3eF4gH5iJ6kL7"

image

@adhilto
Copy link
Collaborator Author

adhilto commented Nov 20, 2024

We might want to do some data validation on the OutJsonFileName parameter. If you pass "*" as the name, it crashes. Maybe add a ValidatePattern to the parameter definition in Invoke-Scuba?

[Parameter(Mandatory = $false, ParameterSetName = 'Configuration')]
[Parameter(Mandatory = $false, ParameterSetName = 'Report')]
[ValidateNotNullOrEmpty()]
[ValidatePattern('^[a-zA-Z0-9]+$')]
[string]
$OutJsonFileName = [ScubaConfig]::ScubaDefault('DefaultOutJsonFileName')

Again this is an enhancement I think we can make in the future but wanted to discuss.

Oh we should probably start checking for this for every file name parameter 👀. Will queue up an issue for this for every file name name parameter.

For this PR adding the validation to OutJSONFileName is a low lift.

I agree with adding validation, but I think it's not as trivial to implement as it seems. See https://stackoverflow.com/questions/1976007/what-characters-are-forbidden-in-windows-and-linux-directory-names.

For example, S{c}u b-a_(R)e%s^ult$s.json is a completely valid file name. Will anyone actually use all those characters in a file name? I sure hope not, but we shouldn't impose stricter rules than Windows itself imposes. As some less extreme examples, the example regex in your comment wouldn't allow names such as "Scuba-Results.json", "Scuba_Results.json", and "Scuba Results.json".

TLDR: I think we should save this for a later issue to make sure we think through all the edge cases.

@adhilto
Copy link
Collaborator Author

adhilto commented Nov 20, 2024

I haven't tested this section of the code yet but I have a question. Does this code make the logic windows specific? What types of tweaks will we need in the future if we work on a cross-platform version that doesn't have the same file path limitation coded here?

image

Good point, this is a Windows specific limitation. Linux, for example, appears to impose a 255 character limit on file name but no limit on path. https://stackoverflow.com/questions/6571435/limit-on-file-name-length-in-bash

Perhaps a more simple (and more system agnostic) approach would be to simply remove all this added logic and just wrap the Set-Content call in a try/catch? Effectively let the system decide what it can or can't handle.

@adhilto
Copy link
Collaborator Author

adhilto commented Nov 20, 2024

The probability of New-Guid generating an exception is probably negligibly low, but if it does fail and the 0000 value is created as the GUID, won't this cause filename collisions in the downstream file system where the results will be posted if another user happened to get an exception as well? There would be two files, both with the 0000 GUID in the filename. Again the probability is low.

I've been thinking about this as well. Should we just throw an full exception in this case and stop ScubaGear execution? If New-Guid does fail, there most likely is something catastrophically wrong with the underlying system.

I'd be ok with New-Guid failing being a fatal error.

@mitchelbaker-cisa
Copy link
Collaborator

The probability of New-Guid generating an exception is probably negligibly low, but if it does fail and the 0000 value is created as the GUID, won't this cause filename collisions in the downstream file system where the results will be posted if another user happened to get an exception as well? There would be two files, both with the 0000 GUID in the filename. Again the probability is low.

I've been thinking about this as well. Should we just throw an full exception in this case and stop ScubaGear execution? If New-Guid does fail, there most likely is something catastrophically wrong with the underlying system.

I'd be ok with New-Guid failing being a fatal error.

Capturing the two other locations in InvokeSCuBACached where the same guid logic is performed, fatal error should be thrown for these instances as well.

Also, since this try/catch block is repeated 3 times in the orchestrator, does it make sense to break it out as its own utility function?

Screenshot (339)

@adhilto
Copy link
Collaborator Author

adhilto commented Nov 20, 2024

I see that we are truncating the filename GUID in multiple places, ConvertTo-ResultsCsv and Merge-JsonOutput resulting in duplicate logic that we have to keep in synch over time. What if we truncate the GUID in a helper function? Is it worth it?

image

image

Also, the Merge-JsonOutput function gets the GUID from the line of code below which gets it from the settings export object, whereas the ConvertTo-ResultsCsv function gets the GUID from a parameter named $Guid. Should they be consistent? Getting the value from the settings export object seems like a roundabout way to get it.

$ReportUuid = $SettingsExportObject.report_uuid

@tkol2022 I just made this refactor (addresses both comments here).

@tkol2022
Copy link
Collaborator

We might want to do some data validation on the OutJsonFileName parameter. If you pass "*" as the name, it crashes. Maybe add a ValidatePattern to the parameter definition in Invoke-Scuba?

[Parameter(Mandatory = $false, ParameterSetName = 'Configuration')]
[Parameter(Mandatory = $false, ParameterSetName = 'Report')]
[ValidateNotNullOrEmpty()]
[ValidatePattern('^[a-zA-Z0-9]+$')]
[string]
$OutJsonFileName = [ScubaConfig]::ScubaDefault('DefaultOutJsonFileName')

Again this is an enhancement I think we can make in the future but wanted to discuss.

Oh we should probably start checking for this for every file name parameter 👀. Will queue up an issue for this for every file name name parameter.
For this PR adding the validation to OutJSONFileName is a low lift.

I agree with adding validation, but I think it's not as trivial to implement as it seems. See https://stackoverflow.com/questions/1976007/what-characters-are-forbidden-in-windows-and-linux-directory-names.

For example, S{c}u b-a_(R)e%s^ult$s.json is a completely valid file name. Will anyone actually use all those characters in a file name? I sure hope not, but we shouldn't impose stricter rules than Windows itself imposes. As some less extreme examples, the example regex in your comment wouldn't allow names such as "Scuba-Results.json", "Scuba_Results.json", and "Scuba Results.json".

TLDR: I think we should save this for a later issue to make sure we think through all the edge cases.

Agree or honestly just let it fail whenever the orchestrator tries to create the file and avoid trying to be too fancy with validation logic, that like you said, might be non trivial. Also keep in mind, we need to think cross-platform.

@tkol2022
Copy link
Collaborator

The probability of New-Guid generating an exception is probably negligibly low, but if it does fail and the 0000 value is created as the GUID, won't this cause filename collisions in the downstream file system where the results will be posted if another user happened to get an exception as well? There would be two files, both with the 0000 GUID in the filename. Again the probability is low.

I've been thinking about this as well. Should we just throw an full exception in this case and stop ScubaGear execution? If New-Guid does fail, there most likely is something catastrophically wrong with the underlying system.

Yes, I am not a fan of the 000 GUID. That just seems a little whacky although it will likely work :)

@tkol2022
Copy link
Collaborator

I haven't tested this section of the code yet but I have a question. Does this code make the logic windows specific? What types of tweaks will we need in the future if we work on a cross-platform version that doesn't have the same file path limitation coded here?
image

Good point, this is a Windows specific limitation. Linux, for example, appears to impose a 255 character limit on file name but no limit on path. https://stackoverflow.com/questions/6571435/limit-on-file-name-length-in-bash

Perhaps a more simple (and more system agnostic) approach would be to simply remove all this added logic and just wrap the Set-Content call in a try/catch? Effectively let the system decide what it can or can't handle.

My preference would be to do this (your try/catch idea). Maybe we can catch a specific exception that occurs when the path is too long and then print the custom error message.

Copy link
Collaborator

@mitchelbaker-cisa mitchelbaker-cisa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • PS unit tests pass

  • Ran Invoke-SCuBA against GCC tenant successfully

  • Ran Invoke-SCuBA -KeepIndividualJSON against GCC tenant successfully, ProviderSettingsExport.json is generated

  • Ran Invoke-SCuBACached against GCC successfully

  • Ran Invoke-SCuBA against GCC high tenant successfully

  • Ran Invoke-SCuBA -KeepIndividualJSON against GCC high tenant successfully, ProviderSettingsExport.json is generated

Ran Invoke-SCuBACached with input that had multiple ScubaResults.json files; pulled from the most recent created
Ran Invoke-SCuBA/Invoke-SCuBACached` with non-default OutPath/OutJsonFileName parameters

Tested -NumberOfUUIDCharactersToTruncate in config file; works as expected with various cases (valid positive integer, invalid negative integer) although for invalid cases the error is caught in the generic MergeJson block as opposed to the standard message, "Hey, the integer you provided is not included 0, 13, 18, 36 provided in ValidateSet()."
Screenshot (341)

@buidav buidav force-pushed the scubaresults-uuid-in-name branch from 87cef86 to 031a26f Compare November 22, 2024 01:31
@mitchelbaker-cisa mitchelbaker-cisa merged commit 77dc279 into main Nov 22, 2024
27 checks passed
@mitchelbaker-cisa mitchelbaker-cisa deleted the scubaresults-uuid-in-name branch November 22, 2024 18:41
schrolla pushed a commit that referenced this pull request Nov 25, 2024
* Initial implementation of adding UUID to the file name

* Add back missing ConvertFrom-Json call

* Mock Get-ChildItem in unit tests

* Document addition of UUID to ScubaResults file name

* Add unit tests for when there are multiple ScubaResults*.json files

* Correct minor typo in documentation

* remove wildcard search in ConvertTo-ResultsCSV code path

* add error handling of window path length limit errors

* fix some of the tests

* fix the all the current broken unit tests

* additional unit tests

* add back in accidentally removed fields in config

* complete lorem ipsum

* todo message

* remove UUID truncation for now

* first draft

* add truncation param to documentation

* spacing

* fix failing test cases; handle full truncation case

* make code consistent; add code comments to describe it's purpose

* review feedback; point to additional options in the error message

* PR Review: Fix absolute path check; fix config file override

* review feedback; move new parameter in alphabetical order in docs

* keep documentation consistent

* remove configuration paramset from scubacached

* code comments for the new edge case

* Remove OBE unit test

* Remove duplicate word

* fix typos, wording and formatting in config

* Refactor truncation logic into own function

* rm duplicate text from PowerShell as well

* captialize

* remove long path check let the set-content naturally error out

* add long path error check within the catch block

* remove todo

* Add UUID to mock data for cached tests

* Fix unit tests

* Remove commented out validation code.

Co-authored-by: mitchelbaker-cisa <149098823+mitchelbaker-cisa@users.noreply.github.com>

* add validation set to check invalid config file parameter

* Remove stacktrace

---------

Co-authored-by: buidav <105074908+buidav@users.noreply.github.com>
Co-authored-by: mitchelbaker-cisa <149098823+mitchelbaker-cisa@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement This issue or pull request will add new or improve existing functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Append UUID to ScubaResults.json by default
4 participants