-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
quickTunerPreproc.py, preprocessor script for quick tuner scripts #1575
base: develop
Are you sure you want to change the base?
Conversation
This script accepts the *.debug files that are generated from tuningRunner.py as input and concatenates them into a single tsv file for use in the next step of the quick tuning pipeline: generation.
…pts." This reverts commit 0489788.
def __init__(self, pargs): | ||
self.input_dir = pargs.input_dir | ||
|
||
@staticmethod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why staticmethod? Not a criticism, I'm just less familiar with why that's done in python and I'm curious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was intended to be imported and called within the quickTunerGen.py set of classes but I could not rationalize each individual class having their own instance of this preprocessor class hanging out inside:
So, it was either have a standalone function or a class with a static method that would allow me to do something like:
qtPreprocessor.process( /* args */ )
From my experience with using the @staticmethod
decorator, it has always been what to use when you want to package functions together that have similar functionality and can share data but do not share state.
df = pd.read_csv(file, sep='\t') | ||
if normalize: | ||
scaler = MinMaxScaler() | ||
df['TFlops'] = scaler.fit_transform(df[['TFlops']]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why one set of brackets on the left but two on the right??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that was a mess up on my end. Fortunately (or unfortunately) the syntax is forgiving.
Also, this will actually be changed to add a NormalizedTFlops
column instead of overwriting the current one.
if normalize:
scaler = MinMaxScaler()
df['NormalizedTFlops'] = scaler.fit_transform(df[['TFlops']])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Python nitpicking
|
||
to: | ||
|
||
import uuid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this instead be a patch to rocmlir_worker
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not in its current form; it cuts off the usual functionality of passing results to stdout and we don't have code to read results from the files and send it on. I do have an open issue about collecting all the results into the database and this may become part of it.
import glob | ||
from sklearn.preprocessing import MinMaxScaler | ||
|
||
class qtPreprocessor(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit class case, and I'm pretty sure Python 3 doesn't require the (object)
anymore
|
||
dfs = [] | ||
ct = 0 | ||
for file in tsv_files: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can emumerate()
instead of maintaining a ct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will replace this with a len()
call since it is not needed within the loop only for generating stats.
counter for processing loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Broadly approved
import glob | ||
from sklearn.preprocessing import MinMaxScaler | ||
|
||
class qtPreprocessor(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Standard complaining re style on class names
1eac2a7
to
30999f4
Compare
This script accepts the *.debug files that are generated from tuningRunner.py as input and concatenates them into a single tsv file for use in the next step of the quick tuning pipeline: generation.