Update to use fstrings #224

JadynHax · 2020-07-31T17:09:51Z

Fstrings are better for RAM usage and processing time, as well as being more easily readable. Literally no reason not to use them over string + string stuff. Converted over some of the string.format() stuff to fstrings as well for slightly reduced filesize.

Also removed os.path.join() statements in the copy_[XYZ]_[to/from]_gdrive() functions. They were unnecessary, took up a bit extra processing time/RAM, and didn't make this any more compatible because these were already functions that can be run on Colaboratory anyways.

Then, finally, added .txt file extension to sample files so they can be viewed directly in Colaboratory and you don't need to download them, and it's more obviously a file to be opened for those running this tool on their own computers.

A Recommendation

I didn't implement it in this pull request, but I recommend using zipfile to compress the checkpoints for Google Drive storage (or at least gzip to make compressed .tar.gz archives). While it realistically doesn't save much percentage-wise, it still shaves off a good 100+ MB from each checkpoint, which is insanely beneficial for Google Drive storage.

If you're okay with using something not directly from Python for that, you can even use the following code I have made for creating extremely well-compressed checkpoint archives with 7-Zip. With this code, I've achieved a filesize reduction of 200 MB or more. It simply takes ~5 minutes extra to compress the archive, though I consider this a small price to pay for a huge filesize reduction. Since these functions can run only in Colaboratory, there should be no problems using this programming in them.

# These functions are fully tested and functional.

def  get_archive_name(checkpoint_folder):
	"""Converts a folder path into a filename for a .7z archive"""
	archive_name = checkpoint_folder.replace(os.path.sep,  '_') + '.7z'

	return archive_name

def  copy_checkpoint_to_gdrive(run_name='run1', copy_folder=False):
	"""Copies the checkpoint folder to a mounted Google Drive."""
	is_mounted()

	checkpoint_folder = os.path.join('checkpoint', run_name)

	if copy_folder:
		shutil.copytree(checkpoint_folder,  f"/content/drive/My Drive/{checkpoint_folder}")

	else:
		file_path = get_archive_name(checkpoint_folder)

		if os.path.exists(file_path):
			# Necessary, otherwise the command just adds all the files to the existing archive.
			os.remove(file_path)

		!7z a -slp -mx -myx -r -m0=LZMA2:d=29:fb=128:mf=bt4:mc=9999:lc=4:c=1536m -ms1f -mmt4 -mf=BCJ2 -bsp2 -- {file_path}  {checkpoint_folder}

		shutil.copyfile(file_path,  f"/content/drive/My Drive/{file_path}")

def  copy_checkpoint_from_gdrive(run_name='run1', copy_folder=False):
	"""Copies the checkpoint folder from a mounted Google Drive."""
	is_mounted()

	checkpoint_folder = os.path.join('checkpoint', run_name)

	if copy_folder:
		shutil.copytree("/content/drive/My Drive/" + checkpoint_folder, checkpoint_folder)

	else:
		file_path = get_archive_name(checkpoint_folder)
		shutil.copyfile("/content/drive/My Drive/" + file_path, file_path)

		!7z x -slp -mx -myx -r -m0=LZMA2:d=29:fb=128:mf=bt4:mc=9999:lc=4:c=1536m -ms1f -mmt4 -mf=BCJ2 -bsp2 -- {file_path}  {checkpoint_folder}

Fstrings are better for RAM usage and processing time, as well as being more easily readable. Literally no reason not to use them over "string + string" stuff. Converted over some of the "string.format()" stuff to fstrings as well for slightly reduced filesize. Also removed os.path.join() statements in the "copy_[XYZ]_[to/from]_gdrive()" functions. They were unnecessary, took up a bit extra processing time/RAM, and didn't make this any more compatible because these were already functions that can be run on Colaboratory anyways. Then, finally, added .txt file extension to sample files so they can be viewed directly in Colaboratory and you don't need to download them, and it's more obviously a file to be opened for those running this tool on their own computers.

JadynHax · 2020-07-31T17:26:09Z

gpt_2_simple/gpt_2.py

- os.path.join(checkpoint_path,
- 'model-{}').format(counter-1))
+ os.path.join(checkpoint_path, f"model-{counter-1}"))


Again, changes nothing besides a slight efficiency increase.

JadynHax · 2020-07-31T17:28:33Z

gpt_2_simple/gpt_2.py

- text = '======== SAMPLE {} ========\n{}\n'.format(
- index + 1, text)
+ text = f'======== SAMPLE {index + 1} ========\n{text}\n'


More readable in the code, slightly smaller filesize

JadynHax · 2020-07-31T17:29:37Z

gpt_2_simple/gpt_2.py

- os.path.join(SAMPLE_DIR, run_name,
- 'samples-{}').format(counter), 'w') as fp:
+ os.path.join(SAMPLE_DIR, run_name, f"samples-{counter}.txt", 'w') as fp:


Once again, more readable in the code, slightly smaller filesize, and added .txt file extension for accessibility in Colaboratory.

JadynHax · 2020-07-31T17:31:00Z

gpt_2_simple/gpt_2.py

- '[{counter} | {time:2.2f}] loss={loss:2.2f} avg={avg:2.2f}'
+ '[{} | {2.2f}] loss={2.2f} avg={2.2f}'
 .format(
- counter=counter,
- time=time.time() - start_time,
- loss=v_loss,
- avg=avg_loss[0] / avg_loss[1]))
+ counter,
+ time.time() - start_time,
+ v_loss,
+ avg_loss[0] / avg_loss[1]))


Really no particular reason here, other than the fact that the identifiers were unnecessary (and personal preference).

JadynHax · 2020-07-31T17:34:01Z

gpt_2_simple/gpt_2.py

- checkpoint_folder = os.path.join('checkpoint', run_name)
+ checkpoint_folder = f'checkpoint/{run_name}'


More efficient, and os.path.join() was unneeded, since this is a function that can only be run on Colaboratory with the /content/XYZ references later in the function.

EDIT: called the function a "command" lmao

JadynHax · 2020-07-31T17:34:59Z

gpt_2_simple/gpt_2.py

- shutil.copytree(checkpoint_folder, "/content/drive/My Drive/" + checkpoint_folder)
+ shutil.copytree(checkpoint_folder, f"/content/drive/My Drive/{checkpoint_folder}")


I think you get the point by now. Fstrings are just overall better in these scenarios.

I'll stop commenting on every single one of these, as you can probably guess by now why I used them in each scenario.

JadynHax · 2020-07-31T17:35:37Z

gpt_2_simple/gpt_2.py

- r = requests.get(url_base + "/models/" + model_name + "/" + file_name, stream=True)
+ r = requests.get(f"{url_base}/models/{model_name}/{file_name}", stream=True)


Changes nothing, besides increasing the efficiency of the code.

JadynHax

Commenting with reasons for changes

JadynHax commented Jul 31, 2020

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update to use fstrings #224

Update to use fstrings #224

JadynHax commented Jul 31, 2020 •

edited

Loading

JadynHax Jul 31, 2020

JadynHax Jul 31, 2020

JadynHax Jul 31, 2020 •

edited

Loading

JadynHax Jul 31, 2020

JadynHax Jul 31, 2020 •

edited

Loading

JadynHax Jul 31, 2020 •

edited

Loading

JadynHax Jul 31, 2020

JadynHax left a comment

		checkpoint_folder = os.path.join('checkpoint', run_name)
		checkpoint_folder = f'checkpoint/{run_name}'

		shutil.copytree(checkpoint_folder, "/content/drive/My Drive/" + checkpoint_folder)
		shutil.copytree(checkpoint_folder, f"/content/drive/My Drive/{checkpoint_folder}")

		r = requests.get(url_base + "/models/" + model_name + "/" + file_name, stream=True)
		r = requests.get(f"{url_base}/models/{model_name}/{file_name}", stream=True)

Update to use fstrings #224

Are you sure you want to change the base?

Update to use fstrings #224

Conversation

JadynHax commented Jul 31, 2020 • edited Loading

A Recommendation

JadynHax Jul 31, 2020

Choose a reason for hiding this comment

JadynHax Jul 31, 2020

Choose a reason for hiding this comment

JadynHax Jul 31, 2020 • edited Loading

Choose a reason for hiding this comment

JadynHax Jul 31, 2020

Choose a reason for hiding this comment

JadynHax Jul 31, 2020 • edited Loading

Choose a reason for hiding this comment

JadynHax Jul 31, 2020 • edited Loading

Choose a reason for hiding this comment

JadynHax Jul 31, 2020

Choose a reason for hiding this comment

JadynHax left a comment

Choose a reason for hiding this comment

JadynHax commented Jul 31, 2020 •

edited

Loading

JadynHax Jul 31, 2020 •

edited

Loading

JadynHax Jul 31, 2020 •

edited

Loading

JadynHax Jul 31, 2020 •

edited

Loading