chore: Fix multiple typos (huggingface#28574)
hugo-syn authored and wgifford committed Jan 21, 2024
1 parent d231358 commit 33f438d
Showing 5 changed files with 5 additions and 5 deletions.
2 changes: 1 addition & 1 deletion examples/research_projects/codeparrot/README.md
@@ -50,7 +50,7 @@ The raw dataset contains many duplicates. We deduplicated and filtered the datas
- fraction of alphanumeric characters < 0.25
- containing the word "auto-generated" or similar in the first 5 lines
- filtering with a probability of 0.7 of files with a mention of "test file" or "configuration file" or similar in the first 5 lines
- filtering with a probability of 0.7 of files with high occurence of the keywords "test " or "config"
- filtering with a probability of 0.7 of files with high occurrence of the keywords "test " or "config"
- filtering with a probability of 0.7 of files without a mention of the keywords `def` , `for`, `while` and `class`
- filtering files that use the assignment operator `=` less than 5 times
- filtering files with ratio between number of characters and number of tokens after tokenization < 1.5 (the average ratio is 3.6)
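The heuristics listed in this hunk can be sketched roughly as follows. This is a minimal illustration rather than the project's actual preprocessing code: the function names, the regular expression used to detect keywords, and the exact-string match on "auto-generated" are assumptions, and only a subset of the rules is shown (the keyword-frequency and tokenizer-ratio rules are left out because their thresholds or tokenizer are not given here).

```python
import random
import re

PROB_FILTER = 0.7   # filtering probability quoted in the list above
MIN_ALNUM = 0.25    # minimum fraction of alphanumeric characters
MIN_ASSIGNMENTS = 5  # minimum number of "=" characters


def alnum_fraction(text: str) -> float:
    """Fraction of characters that are alphanumeric (0.0 for empty text)."""
    if not text:
        return 0.0
    return sum(ch.isalnum() for ch in text) / len(text)


def is_autogenerated(text: str, scan_lines: int = 5) -> bool:
    """True if "auto-generated" appears in the first `scan_lines` lines.

    Assumption: only this exact phrase is checked, not the "or similar" variants.
    """
    head = "\n".join(text.splitlines()[:scan_lines]).lower()
    return "auto-generated" in head


def lacks_code_keywords(text: str) -> bool:
    """True if none of def / for / while / class occurs as a whole word."""
    return re.search(r"\b(def|for|while|class)\b", text) is None


def should_drop(text: str, rng: random.Random) -> bool:
    """Apply the deterministic rules first, then the probabilistic one."""
    if alnum_fraction(text) < MIN_ALNUM:
        return True
    if is_autogenerated(text):
        return True
    if text.count("=") < MIN_ASSIGNMENTS:
        return True
    # Files without any code keyword are dropped with probability 0.7.
    if lacks_code_keywords(text) and rng.random() < PROB_FILTER:
        return True
    return False
```

Passing an explicit `random.Random` instance keeps the probabilistic rule reproducible when a seed is fixed.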
2 changes: 1 addition & 1 deletion examples/research_projects/jax-projects/README.md
@@ -1153,7 +1153,7 @@ In the following, we will describe how to do so using a standard console, but yo
2. Once you've installed the google cloud sdk, you should set your account by running the following command. Make sure that `<your-email-address>` corresponds to the gmail address you used to sign up for this event.

```bash
$ gcloud config set account <your-email-adress>
$ gcloud config set account <your-email-address>
```

3. Let's also make sure the correct project is set in case your email is used for multiple gcloud projects:
2 changes: 1 addition & 1 deletion examples/research_projects/jax-projects/big_bird/README.md
@@ -57,4 +57,4 @@ wget https://huggingface.co/datasets/vasudevgupta/natural-questions-validation/r
python3 evaluate.py
```

You can find our checkpoint on HuggingFace Hub ([see this](https://huggingface.co/vasudevgupta/flax-bigbird-natural-questions)). In case you are interested in PyTorch BigBird fine-tuning, you can refer to [this repositary](https://github.com/thevasudevgupta/bigbird).
You can find our checkpoint on HuggingFace Hub ([see this](https://huggingface.co/vasudevgupta/flax-bigbird-natural-questions)). In case you are interested in PyTorch BigBird fine-tuning, you can refer to [this repository](https://github.com/thevasudevgupta/bigbird).
@@ -27,7 +27,7 @@ To adapt the script for other models, we need to also change the `ParitionSpec`

TODO: Add more explantion.

Before training, let's prepare our model first. To be able to shard the model, the sharded dimention needs to be a multiple of devices it'll be sharded on. But GPTNeo's vocab size is 50257, so we need to resize the embeddings accordingly.
Before training, let's prepare our model first. To be able to shard the model, the sharded dimension needs to be a multiple of devices it'll be sharded on. But GPTNeo's vocab size is 50257, so we need to resize the embeddings accordingly.
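The resizing arithmetic can be sketched as below. The helper name `padded_vocab_size` and the device count of 8 are assumptions made for illustration; the text only states that the sharded dimension must be a multiple of the number of devices and that the vocabulary size is 50257.

```python
def padded_vocab_size(vocab_size: int, num_devices: int) -> int:
    """Return the smallest multiple of `num_devices` that is >= `vocab_size`."""
    if num_devices <= 0:
        raise ValueError("num_devices must be positive")
    # Ceiling division, then scale back up to a multiple of num_devices.
    return -(-vocab_size // num_devices) * num_devices


# Hypothetical setup: 8 devices, GPTNeo vocabulary of 50257 entries.
print(padded_vocab_size(50257, 8))  # 50264
```

The padded rows correspond to no real token, so they only exist to make the embedding matrix evenly divisible across devices.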

```python
from transformers import FlaxGPTNeoForCausalLM, GPTNeoConfig
2 changes: 1 addition & 1 deletion examples/research_projects/mlm_wwm/README.md
@@ -95,4 +95,4 @@ python run_mlm_wwm.py \

**Note1:** On TPU, you should the flag `--pad_to_max_length` to make sure all your batches have the same length.

**Note2:** And if you have any questions or something goes wrong when runing this code, don't hesitate to pin @wlhgtc.
**Note2:** And if you have any questions or something goes wrong when running this code, don't hesitate to pin @wlhgtc.
