SolCoder is an LLM that generates Solidity source code from user prompt comments. It is hosted on the Hugging Face repository Pipper/Solcoder and is continuously updated. The model output is focused on generating functions from user comments.
P.S.: The model does not take surrounding source code into account, for example when it is used from an editor. The generated code might therefore not compile as-is, because it can reference context variables and other dependencies that are expected to be defined before the generated function. The user may need to adjust these so that they match the surrounding code.
The dataset used is an aggregation of multiple data sources.
You can follow the instructions in the docs and contact Kaan Uzdogan for the credentials.
See slither-solidity for the sourcify data processing.
mwritescode/slither-audited-smart-contracts
from datasets import load_dataset
# load a single language subset (here: Solidity)
ds = load_dataset("bigcode/the-stack-dedup", data_dir="data/solidity", split="train")
IMPORTANT: None of the code in the dataset has been statically analyzed as part of this aggregation, and code that was already audited upstream has not been audited any further. Use a static analyzer such as Slither for that purpose.
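As a rough illustration of the static-analysis step recommended above, the following sketch wraps the Slither command line from Python. It assumes Slither is installed and available on the `PATH` as `slither`; the helper names `build_slither_command` and `run_slither` and the file name `Token.sol` are hypothetical and not part of this repository.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def build_slither_command(contract: Path) -> List[str]:
    """Build the Slither invocation for one contract file.

    `--json -` asks Slither to write its JSON report to stdout.
    """
    return ["slither", str(contract), "--json", "-"]


def run_slither(contract: Path) -> Optional[str]:
    """Run Slither on `contract` and return its raw stdout.

    Returns None when the `slither` executable is not found (assumption:
    the caller prefers a soft failure over an exception in that case).
    """
    if shutil.which("slither") is None:
        return None
    # Slither can exit with a non-zero code when it reports findings,
    # so the return code is deliberately not treated as an error here.
    result = subprocess.run(
        build_slither_command(contract),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

Calling `run_slither(Path("Token.sol"))` on a generated or dataset contract would return the report text to inspect, or `None` if Slither is not installed.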
IMPORTANT: The token must have write access. It is used to push the data and the model to the HF Hub automatically at creation time.
export HF_TOKEN="your write access token"
export HF_BEARER_TK="your api bearer token from HF"
accelerate launch --num_cpu_threads_per_process 8 train/train_phi2_lora_qa.py
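Because the training run pushes to the HF Hub, it can help to confirm that both environment variables shown above are set before launching. The following is a minimal sketch of such a check; the helper name `missing_hf_vars` is hypothetical, and the training script itself may read these variables differently.

```python
import os
from typing import List, Mapping

# The two variables exported in the setup step above.
REQUIRED_VARS = ("HF_TOKEN", "HF_BEARER_TK")


def missing_hf_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required variables that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

For example, `missing_hf_vars()` returns an empty list when both variables are exported, and otherwise lists the names that still need to be set.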