Commit b26782e

Merge pull request #193 from alimaredia/data-prep-utils
Introduce new utils repo

# Create Separate Repo for User Utilities

## Idea Overview

Create a separate repository within the `instructlab` GitHub org called `support-utils`.
This repository would house scripts and notebooks that enhance the InstructLab experience but fall outside the scope of the LAB Methodology implemented in the [InstructLab Core](https://github.com/instructlab/instructlab) repository.
Many users and community members already have such scripts that they use day to day.
The `support-utils` repo would give the maintainers of the InstructLab project a place to collect and curate these scripts for the benefit of the community.
After use and review by users and developers, scripts in this repository may become features of, or be incorporated into, the InstructLab Core repository.

## Repository Structure

The repository will have two categories of scripts, which live in either the `hack` or the `beta` directory.

```text
support-utils
|
|- beta
|
|- hack
```

The `hack` directory is open to contributions of scripts of any quality.

Scripts in the `beta` directory will be required to have documentation and automated functional tests.
These scripts are meant to be run by users to gather feedback, and they may graduate into full-blown features in other InstructLab repos.

Beyond this initial layout, the structure within those two directories will evolve as scripts are contributed to each.

## Additional Info

A few areas of focus for the first scripts that will be added to the repository are:

- Automating `qna.yaml` creation
- Assessing document readiness, given the known limitations of Docling
- Visualizing synthetically generated data for inspection

This repo would not be released as a package on PyPI; initially, releases would be published only as `.zip` and `.tar.gz` archives on GitHub.
Releases would give users access to specific versions of the scripts in `beta` and would also support project management during development.
