[Request] Can you add attackToExcel.get_stix_data_from("/path/to/export/folder") to make loading data much faster? Or some other, more efficient cache file format?
#73
Is your feature request related to a problem?
The example from the usage page we've been using takes an extremely long time to load.
Describe the solution you'd like
Just make it a little clearer (in the basic usage example) how we can not only export, but also cache and import the ATT&CK matrix data rather than slowly loading it every time.
Describe alternatives you've considered
There doesn't seem to be one, since the documentation only mentions an export feature, not an import.
Additional context
```python
import mitreattack.attackToExcel.attackToExcel as attackToExcel
import mitreattack.attackToExcel.stixToDf as stixToDf

# download and parse ATT&CK STIX data
# SUGGESTED ADDITION / PSEUDO CODE:
attackToExcel.export("enterprise-attack", "v8.1", "/path/to/export/folder")

# instead of:
# attackdata = attackToExcel.get_stix_data("enterprise-attack")
# allow:
attackdata = attackToExcel.get_stix_data_from("/path/to/export/folder")
# END ADDITION

# get Pandas DataFrames for techniques, associated relationships, and citations
techniques_data = stixToDf.techniquesToDf(attackdata, "enterprise-attack")

# show T1102 and sub-techniques of T1102
techniques_df = techniques_data["techniques"]
print(techniques_df[techniques_df["ID"].str.contains("T1102")]["name"])
```
I don't really know if exporting as Excel is the most efficient way to cache the data (probably not), but it seems to be the only format supported. My only goal is to get the data into a DataFrame as efficiently as possible, instead of having to take a five-minute coffee break every time I restart my Jupyter kernel.
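For what it's worth, one stopgap that avoids re-downloading entirely is to save the STIX bundle to disk once and load it locally. Since get_stix_data returns a stix2 MemoryStore, building one from a local file should plug into stixToDf the same way; this is an unverified sketch, and the URL and file name are just examples:

```python
import os

import requests
from stix2 import MemoryStore

import mitreattack.attackToExcel.stixToDf as stixToDf

BUNDLE_URL = "https://raw.githubusercontent.com/mitre/cti/master/enterprise-attack/enterprise-attack.json"
BUNDLE_PATH = "enterprise-attack.json"

# download the bundle once; later runs read the local copy
if not os.path.exists(BUNDLE_PATH):
    response = requests.get(BUNDLE_URL)
    response.raise_for_status()
    with open(BUNDLE_PATH, "w", encoding="utf-8") as f:
        f.write(response.text)

# loading from disk takes seconds rather than minutes
attackdata = MemoryStore()
attackdata.load_from_file(BUNDLE_PATH)

techniques_data = stixToDf.techniquesToDf(attackdata, "enterprise-attack")
```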
We're going to solve this on our end by adding some code that uses Apache Parquet to store the DataFrame efficiently, but that isn't something that would make sense as a PR to a library designed for converting to Excel. That said, people shouldn't need to invent a caching solution for this, in my opinion; it would make sense to support one by default when the library takes 3-5 minutes to load data into a DataFrame.
Like I said, I don't know if it really fits into the library, since it's named as an Excel conversion tool, but I'm thinking something like the following.
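A rough sketch of the cache we have in mind; the file path is a placeholder, to_parquet needs pyarrow or fastparquet installed, and any list-valued columns may need flattening first:

```python
import os

import pandas as pd

import mitreattack.attackToExcel.attackToExcel as attackToExcel
import mitreattack.attackToExcel.stixToDf as stixToDf

CACHE_PATH = "techniques.parquet"  # placeholder cache location

if os.path.exists(CACHE_PATH):
    # cheap path: read the cached DataFrame back in well under a second
    techniques_df = pd.read_parquet(CACHE_PATH)
else:
    # expensive path: download and parse the full STIX data once
    attackdata = attackToExcel.get_stix_data("enterprise-attack")
    techniques_data = stixToDf.techniquesToDf(attackdata, "enterprise-attack")
    techniques_df = techniques_data["techniques"]
    techniques_df.to_parquet(CACHE_PATH)  # requires pyarrow or fastparquet
```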
Oh, I probably should have suggested Python's pickling feature rather than Parquet, which is better suited to very large and diverse data structures. I only have a year of Python experience, so I'd forgotten pickle was the right option here. Nonetheless, I still think caching the data in a file should be a built-in option in the library rather than something the user does manually. I could understand if the maintainers of the project feel differently; it's not that hard to cache with pandas.DataFrame.to_pickle. Just my suggestion / opinion.
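For completeness, here is the pickle version of the same cache, with the same placeholder path; note that pickles are Python-specific and shouldn't be loaded from untrusted sources:

```python
import os

import pandas as pd

import mitreattack.attackToExcel.attackToExcel as attackToExcel
import mitreattack.attackToExcel.stixToDf as stixToDf

CACHE_PATH = "techniques.pkl"  # placeholder cache location

if os.path.exists(CACHE_PATH):
    techniques_df = pd.read_pickle(CACHE_PATH)
else:
    attackdata = attackToExcel.get_stix_data("enterprise-attack")
    techniques_df = stixToDf.techniquesToDf(attackdata, "enterprise-attack")["techniques"]
    # unlike Parquet, pickle round-trips arbitrary Python objects in cells
    techniques_df.to_pickle(CACHE_PATH)
```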