Closed
Labels
module: llm — Issues related to LLM examples and apps, and to the extensions/llm/ code
triaged — This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
Description
🚀 The feature, motivation and pitch
Currently, users need to manually download Hugging Face safetensors, convert them to the llama_transformer format, and load the checkpoint and config for export and inference.
It would be great to directly download and cache the converted checkpoints (so the conversion doesn't have to run again) and go straight to inference, similar to what mlx_lm does:
```python
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/dolphin3.0-llama3.2-3B-4Bit")

prompt = "hello"
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```
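For the download-and-cache half, here is a minimal sketch of what this could look like. It assumes huggingface_hub's `snapshot_download` (a real API that already caches downloads locally) for fetching the safetensors; `convert_to_llama_transformer` and the cache layout are hypothetical placeholders standing in for today's manual conversion step:

```python
# Sketch only: snapshot_download is a real huggingface_hub API that caches
# downloads under ~/.cache/huggingface, so repeated calls skip the network.
# convert_to_llama_transformer is a hypothetical placeholder for the manual
# safetensors -> llama_transformer conversion that exists today.
import os
from huggingface_hub import snapshot_download


def convert_to_llama_transformer(hf_dir: str, out_path: str) -> None:
    raise NotImplementedError("stand-in for the existing manual conversion step")


def load_converted(repo_id: str) -> str:
    cache_dir = os.path.expanduser("~/.cache/executorch/converted")
    os.makedirs(cache_dir, exist_ok=True)
    converted = os.path.join(cache_dir, repo_id.replace("/", "--") + ".pt")
    if not os.path.exists(converted):
        # First call: download (or reuse cached) safetensors, then convert once.
        hf_dir = snapshot_download(repo_id)
        convert_to_llama_transformer(hf_dir, converted)
    return converted  # cached checkpoint path, ready for export/inference
```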
Alternatives
No response
Additional context
No response
RFC (Optional)
No response