Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update to use fstrings #224

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Conversation

JadynHax
Copy link

@JadynHax JadynHax commented Jul 31, 2020

Fstrings are better for RAM usage and processing time, as well as being more easily readable. Literally no reason not to use them over string + string stuff. Converted over some of the string.format() stuff to fstrings as well for slightly reduced filesize.

Also removed os.path.join() statements in the copy_[XYZ]_[to/from]_gdrive() functions. They were unnecessary, took up a bit extra processing time/RAM, and didn't make this any more compatible because these were already functions that can be run on Colaboratory anyways.

Then, finally, added .txt file extension to sample files so they can be viewed directly in Colaboratory and you don't need to download them, and it's more obviously a file to be opened for those running this tool on their own computers.

A Recommendation

I didn't implement it in this pull request, but I recommend using zipfile to compress the checkpoints for Google Drive storage (or at least gzip to make compressed .tar.gz archives). While it realistically doesn't save much percentage-wise, it still shaves off a good 100+ MB from each checkpoint, which is insanely beneficial for Google Drive storage.

If you're okay with using something not directly from Python for that, you can even use the following code I have made for creating extremely well-compressed checkpoint archives with 7-Zip. With this code, I've achieved a filesize reduction of 200 MB or more. It simply takes ~5 minutes extra to compress the archive, though I consider this a small price to pay for a huge filesize reduction. Since these functions can run only in Colaboratory, there should be no problems using this programming in them.

# These functions are fully tested and functional.

def  get_archive_name(checkpoint_folder):
	"""Converts a folder path into a filename for a .7z archive"""
	archive_name = checkpoint_folder.replace(os.path.sep,  '_') + '.7z'

	return archive_name

def  copy_checkpoint_to_gdrive(run_name='run1', copy_folder=False):
	"""Copies the checkpoint folder to a mounted Google Drive."""
	is_mounted()

	checkpoint_folder = os.path.join('checkpoint', run_name)

	if copy_folder:
		shutil.copytree(checkpoint_folder,  f"/content/drive/My Drive/{checkpoint_folder}")

	else:
		file_path = get_archive_name(checkpoint_folder)

		if os.path.exists(file_path):
			# Necessary, otherwise the command just adds all the files to the existing archive.
			os.remove(file_path)

		!7z a -slp -mx -myx -r -m0=LZMA2:d=29:fb=128:mf=bt4:mc=9999:lc=4:c=1536m -ms1f -mmt4 -mf=BCJ2 -bsp2 -- {file_path}  {checkpoint_folder}

		shutil.copyfile(file_path,  f"/content/drive/My Drive/{file_path}")

def  copy_checkpoint_from_gdrive(run_name='run1', copy_folder=False):
	"""Copies the checkpoint folder from a mounted Google Drive."""
	is_mounted()

	checkpoint_folder = os.path.join('checkpoint', run_name)

	if copy_folder:
		shutil.copytree("/content/drive/My Drive/" + checkpoint_folder, checkpoint_folder)

	else:
		file_path = get_archive_name(checkpoint_folder)
		shutil.copyfile("/content/drive/My Drive/" + file_path, file_path)

		!7z x -slp -mx -myx -r -m0=LZMA2:d=29:fb=128:mf=bt4:mc=9999:lc=4:c=1536m -ms1f -mmt4 -mf=BCJ2 -bsp2 -- {file_path}  {checkpoint_folder}

Fstrings are better for RAM usage and processing time, as well as being more easily readable. Literally no reason not to use them over "string + string" stuff. Converted over some of the "string.format()" stuff to fstrings as well for slightly reduced filesize.

Also removed os.path.join() statements in the "copy_[XYZ]_[to/from]_gdrive()" functions. They were unnecessary, took up a bit extra processing time/RAM, and didn't make this any more compatible because these were already functions that can be run on Colaboratory anyways.

Then, finally, added .txt file extension to sample files so they can be viewed directly in Colaboratory and you don't need to download them, and it's more obviously a file to be opened for those running this tool on their own computers.
os.path.join(checkpoint_path,
'model-{}').format(counter-1))
os.path.join(checkpoint_path, f"model-{counter-1}"))
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, changes nothing besides a slight efficiency increase.

Comment on lines -297 to +296
text = '======== SAMPLE {} ========\n{}\n'.format(
index + 1, text)
text = f'======== SAMPLE {index + 1} ========\n{text}\n'
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More readable in the code, slightly smaller filesize

Comment on lines -304 to +302
os.path.join(SAMPLE_DIR, run_name,
'samples-{}').format(counter), 'w') as fp:
os.path.join(SAMPLE_DIR, run_name, f"samples-{counter}.txt", 'w') as fp:
Copy link
Author

@JadynHax JadynHax Jul 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once again, more readable in the code, slightly smaller filesize, and added .txt file extension for accessibility in Colaboratory.

Comment on lines -351 to +353
'[{counter} | {time:2.2f}] loss={loss:2.2f} avg={avg:2.2f}'
'[{} | {2.2f}] loss={2.2f} avg={2.2f}'
.format(
counter=counter,
time=time.time() - start_time,
loss=v_loss,
avg=avg_loss[0] / avg_loss[1]))
counter,
time.time() - start_time,
v_loss,
avg_loss[0] / avg_loss[1]))
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really no particular reason here, other than the fact that the identifiers were unnecessary (and personal preference).

Comment on lines -576 to +573
checkpoint_folder = os.path.join('checkpoint', run_name)
checkpoint_folder = f'checkpoint/{run_name}'
Copy link
Author

@JadynHax JadynHax Jul 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More efficient, and os.path.join() was unneeded, since this is a function that can only be run on Colaboratory with the /content/XYZ references later in the function.

EDIT: called the function a "command" lmao

Comment on lines -579 to +576
shutil.copytree(checkpoint_folder, "/content/drive/My Drive/" + checkpoint_folder)
shutil.copytree(checkpoint_folder, f"/content/drive/My Drive/{checkpoint_folder}")
Copy link
Author

@JadynHax JadynHax Jul 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you get the point by now. Fstrings are just overall better in these scenarios.

I'll stop commenting on every single one of these, as you can probably guess by now why I used them in each scenario.

r = requests.get(url_base + "/models/" + model_name + "/" + file_name, stream=True)
r = requests.get(f"{url_base}/models/{model_name}/{file_name}", stream=True)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes nothing, besides increasing the efficiency of the code.

Copy link
Author

@JadynHax JadynHax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commenting with reasons for changes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant