update from_pretrained method #15

Open
wants to merge 1 commit into
base: development

not-lain
Contributor

fixes #14
This is a bit of a hacky approach, since I'm using wrappers and **kwargs instead of the original parameter names in from_pretrained. Let me know if you approve of this, or if we should pass the parameters to the methods explicitly as well.

It's still very useful when running

help(AutoTikTokenizer.from_pretrained) 
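
As a rough illustration of how a **kwargs-based wrapper can still show useful output in help(), here is a minimal sketch. It is not the code from this PR: reference_loader, borrow_docs and this from_pretrained are hypothetical names, and the approach shown (copying the docstring and setting __signature__) is just one way to do it.

```python
import inspect


def reference_loader(repo_id, *, token=None, cache_dir=None, force_download=False):
    """Hypothetical stand-in for the function whose parameters we want documented."""
    return {
        "repo_id": repo_id,
        "token": token,
        "cache_dir": cache_dir,
        "force_download": force_download,
    }


def borrow_docs(reference):
    """Copy the docstring and signature of `reference` onto the decorated function."""
    def decorator(func):
        func.__doc__ = reference.__doc__
        # inspect.signature() (and therefore help()) honours __signature__ when present.
        func.__signature__ = inspect.signature(reference)
        return func
    return decorator


@borrow_docs(reference_loader)
def from_pretrained(repo_id, **kwargs):
    # The wrapper itself only sees **kwargs and forwards them unchanged.
    return reference_loader(repo_id, **kwargs)
```

With this in place, help(from_pretrained) lists the named parameters of the reference function rather than a bare **kwargs, while the wrapper body stays generic.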

@bhavnicksm
Collaborator

Hey @not-lain!

Thanks for opening a PR!

Could you explain what exactly is happening in the code here? I see **kwargs being passed around, but I'm not sure which parameters these arguments could carry.

Or, if you could point me to a link that explains which kwargs are important to have, that would also be great.

Thanks!

@not-lain
Contributor Author

Sure thing, let's consider the following minimal function:

# accepts any keyword arguments
def placeholder(**kwargs):
    print(kwargs)


placeholder(a=5, b="hello")
# prints: {'a': 5, 'b': 'hello'}

Basically, **kwargs accepts any number of keyword arguments and stores them in a dictionary.
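
To connect this to the PR, the sketch below shows what forwarding **kwargs from one function to another looks like. The names inner and outer are hypothetical and only stand in for a wrapper and the function it calls.

```python
def inner(repo_id, token=None):
    """Hypothetical target function with a fixed set of parameters."""
    return f"{repo_id} (token set: {token is not None})"


def outer(repo_id, **kwargs):
    # kwargs is a plain dict of whatever keyword arguments the caller supplied;
    # unpacking it with ** passes each entry on as a keyword argument.
    return inner(repo_id, **kwargs)
```

Calling outer("gpt2", token="abc") reaches inner with token="abc". A keyword that inner does not know, such as outer("gpt2", cache="x"), only fails inside inner with a TypeError, which is why it helps to agree on which kwargs are actually supported.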

@bhavnicksm
Collaborator

Hey @not-lain! 😆

Thanks for the explanation! What I meant was: are there any specific kwargs we want to support? Sorry, that wasn't clear earlier...

Thanks!

@not-lain
Contributor Author

Hi @bhavnicksm
I think the top ones would be:

  • token
  • cache_dir & local_dir
  • force_download (useful after a network failure or when cached files are corrupted)

Let me know if you approve of these parameters or have any other recommendations in mind, and I'll update the PR accordingly.
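
For comparison, here is a minimal sketch of the explicit alternative, where the parameters listed above are named in the signature instead of travelling through **kwargs. Everything in it is an assumption for illustration: fake_download is a hypothetical stand-in for a real downloader, the "tokenizer.json" filename is made up, and the defaults are guesses rather than the library's actual behaviour.

```python
def fake_download(repo_id, filename, **options):
    """Hypothetical stand-in for a real downloader; it just records its inputs."""
    return {"repo_id": repo_id, "filename": filename, **options}


def from_pretrained(
    repo_id,
    *,
    token=None,
    cache_dir=None,
    local_dir=None,
    force_download=False,
    download=fake_download,  # injectable so the sketch runs without network access
):
    """Sketch of a loader that names its supported options explicitly."""
    options = {
        "token": token,
        "cache_dir": cache_dir,
        "local_dir": local_dir,
        "force_download": force_download,
    }
    # "tokenizer.json" is an assumed filename, used here only for illustration.
    return download(repo_id, "tokenizer.json", **options)
```

The trade-off is that a misspelled option such as force_downlod=True fails immediately at the call site with a TypeError, and help() shows the supported names without any extra wrapper, at the cost of having to update the signature whenever a new option is needed.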
