Support of generate #261
Conversation
Next steps are to create tests and add docs.
There are 3 open questions:
As a side note, to support downstream evals during training, we do not necessarily need
Let's keep the existing format and not use a new rule.
The
I don't really see the point of
I just added future parity with
However, I meant a different thing here — Fast-LLM's
That said, @tscholak is okay with using tuples as keys in
We still need to support different global batch sizes for the same
For example, consider a training scenario with:
This configuration uses
However, when performing evaluation during training, we cannot use the same
Additionally,
Yeah, you're right —
The tuple and dot formats are basically the same, it's just parsed vs unparsed. I don't think there is much use for the unparsed version outside of a runnable, because dicts are better in-code.
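As a concrete illustration of "parsed vs unparsed" (a sketch only; the key shown happens to appear in the test config later in this thread, but the snippet is not Fast-LLM code):

```python
# Unparsed ("dot") form: a flat string, handy on the command line or in a runnable.
dot_key = "training.checkpoint.interval"

# Parsed (tuple) form: the same key split into its components.
tuple_key = tuple(dot_key.split("."))  # ("training", "checkpoint", "interval")

# In code, a nested dict is usually the more convenient representation.
config = {"training": {"checkpoint": {"interval": 1}}}
node = config
for part in tuple_key[:-1]:
    node = node[part]
print(node[tuple_key[-1]])  # 1
```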
Again,
Note that sequential micro-batches are irrelevant for inference; they're just separate batches for all practical purposes.
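To make the batch-size point above concrete, here is a small illustrative calculation; the numbers are assumptions chosen for illustration, not values from this PR or Fast-LLM defaults:

```python
# Illustrative numbers only (assumptions, not taken from this PR).
micro_batch_size = 4          # samples per forward pass on one data-parallel rank
sequential_micro_batches = 8  # gradient-accumulation steps, training only
data_parallel_ranks = 16

# Training: the global batch accumulates over sequential micro-batches.
train_global_batch = micro_batch_size * sequential_micro_batches * data_parallel_ranks  # 512

# Inference/eval: sequential micro-batches are just separate batches,
# so the same batch config yields a different effective global batch.
eval_global_batch = micro_batch_size * data_parallel_ranks  # 64
```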
Thanks for the details.
Still, if you pass a
OK, I’ll remove the batch config–related parameters from the constructor.
I needed to downgrade
Does this relate to #249?
Should be a separate issue as
However, this branch is from the May 12 main and it builds docs on GitHub.
Looks good, only need to address the documentation and slow test issues.
Concerning the doc, I pushed a tentative fix in #271, but we won't know if it works until we merge the PR.
Concerning the tests, I suggest you add one with a dummy checkpoint (e.g. the one from test_checkpoint) and keep the existing one with a skip mark. These longer tests do make sense, and I'd like us to find a way to bring them back without disrupting existing workflows.
@@ -0,0 +1,77 @@
---
This won't be published because of #249. I think the problem is missing variables in
Fast-LLM/.github/workflows/docs.yaml
Line 59 in 3ac976b
FLASH_ATTENTION_SKIP_CUDA_BUILD=TRUE FLASH_ATTENTION_FORCE_BUILD=TRUE pip install --no-build-isolation -e ".[CORE,OPTIONAL,DEV,DOCS]"
(Like those in
Fast-LLM/.github/workflows/docs.yaml
Line 34 in 3ac976b
FLASH_ATTENTION_SKIP_CUDA_BUILD=TRUE FLASH_ATTENTION_FORCE_BUILD=TRUE MAMBA_SKIP_CUDA_BUILD=TRUE MAMBA_FORCE_BUILD=TRUE CAUSAL_CONV1D_FORCE_BUILD=TRUE CAUSAL_CONV1D_SKIP_CUDA_BUILD=TRUE pip install --no-build-isolation -e ".[CORE,OPTIONAL,DEV,DOCS]"
LGTM!
We can postpone finding a smaller model for the test to another time. The current test should be kept but not run by default.
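For reference, one generic pytest pattern for keeping a test collected but not run by default is an opt-in flag wired up in conftest.py. This is a sketch of that pattern only, not necessarily how Fast-LLM's test suite is actually configured:

```python
# conftest.py: skip tests marked "slow" unless --run-slow is passed.
# Generic pytest pattern; names here are illustrative.
import pytest


def pytest_addoption(parser):
    parser.addoption("--run-slow", action="store_true", default=False, help="also run tests marked as slow")


def pytest_collection_modifyitems(config, items):
    if config.getoption("--run-slow"):
        return
    skip_slow = pytest.mark.skip(reason="slow test; pass --run-slow to run it")
    for item in items:
        if "slow" in item.keywords:
            item.add_marker(skip_slow)
```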
I have added
I’ve added faster tests using a dummy checkpoint from the checkpoint tests. They’re still relatively slow (around 11–15 seconds), so I’ve marked them as
@pytest.mark.slow
@requires_cuda
@pytest.mark.depends? on the checkpoint test?
It does not depend on it; it trains a little independently and saves directly to HF format:
```python
import pytest


@pytest.fixture(scope="module")
def small_model():
    from .common import _CONFIGS, TEST_RESULTS_PATH, run_test_script

    # Reuse the llama test configuration, train briefly, and export a checkpoint
    # in HF/llama format at every step.
    _, _, _, common_config, fast_llm_checkpoint_format = _CONFIGS["llama"]
    run_test_script(
        "test_llama_generate_and_forward",
        common_config
        + ["training.checkpoint.interval=1", "training.export.format=llama", "training.export.interval=1"],
    )
    return TEST_RESULTS_PATH / "test_llama_generate_and_forward/export/llama/2", fast_llm_checkpoint_format
```
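As a rough illustration of how this fixture might be consumed, here is a hypothetical smoke test. The test name, shapes, and assertions are assumptions, not code from the PR; the actual tests presumably exercise Fast-LLM's own Hugging Face wrapper rather than plain transformers, but the exported HF/llama checkpoint can also be loaded with the standard API:

```python
import torch
from transformers import AutoModelForCausalLM


def test_small_model_generate_smoke(small_model):
    # Hypothetical usage of the fixture: load the exported HF-format checkpoint.
    checkpoint_path, _fast_llm_checkpoint_format = small_model
    model = AutoModelForCausalLM.from_pretrained(checkpoint_path)

    input_ids = torch.randint(0, model.config.vocab_size, (2, 8))
    output = model.generate(input_ids, max_new_tokens=4, do_sample=False)

    # Greedy decoding appends at least one token per sequence.
    assert output.shape[0] == 2
    assert output.shape[1] > input_ids.shape[1]
```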
✨ Description
Adds support for `generate` and extends support for `forward`, without handling `cache`, `past_key_values`, `labels`, `attention` output, or `inputs_embeds`. `position_ids` are ignored and reconstructed from the attention mask. Currently, only data-parallel generation is supported.
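As a sketch of what "reconstructed from the attention mask" typically means, here is the standard cumulative-sum formula; this is illustrative and not necessarily the exact code in this PR:

```python
import torch

# Rebuild position_ids from a (possibly left-padded) attention mask.
attention_mask = torch.tensor([[0, 0, 1, 1, 1],
                               [1, 1, 1, 1, 1]])

position_ids = attention_mask.long().cumsum(-1) - 1
position_ids.masked_fill_(attention_mask == 0, 0)  # padded positions get a dummy index

# position_ids is now:
# tensor([[0, 0, 0, 1, 2],
#         [0, 1, 2, 3, 4]])
```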
Closes #217
🔍 Type of change
Select all that apply:
📝 Changes
List the key changes introduced in this PR:
- `HuggingfacePreTrainedModel` interface to achieve feature parity with `FastLLMModel.from_pretrained`, and modified the `__init__` method to optionally accept `runner` and `config`.
- `forward` to support `generate`, aligning behavior with Hugging Face `generate`.

✅ Checklist
Make sure the following tasks are completed before submitting the PR:
General
Testing