Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
introduce document for instructlab-sdk #184
introduce document for instructlab-sdk #184
Changes from all commits
cf0bf3f
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intent with the Python SDK should be to serve the Data Scientists and AI engineers doing experiments as the primary use. Then the constructs required for embedding or extending InstructLab from other products and platforms.
For a data science experience I propose to be inspired to what sklearn or keras or pytorch high level structure provide for this persona.
For example, what would it take to have an experience similar to :
I would propose to focus on accelerating the goal of the persona (the data scientists), and not on mapping or exposing the InstructLab internal architecture.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is possibly a good end goal, but would require significant re-architecture of InstructLab, and the libraries (what they expose).
For the purposes of this dev-doc, I can incorporate some of this into what I am proposing, but for an alpha SDK, keeping the interactions as simple yet expandable as possible is my goal.
So we should aim to not require library adjustments/changes at first, and then we can add functionality once the structure is in place.
Let me incorporate some of this and I will update the PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @williamcaban has a point here about the persona we're building this SDK for. Who are we making this SDK for? Datascientist? People trying to build REST-API based instructlab services? The design of the SDK might vary based on who our target user of the SDK would be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's definitely important to understand the persona and goal of the SDK. William is proposing a fairly large re-architecting of the APIs, while this dev doc seems mostly focused on exposing existing CLI flows and functionality via a Python SDK. Is our goal to expose the existing end-to-end flow via a Python SDK? Or to provide a new way to interact with various InstructLab components that's more granular and flexible in how you compose your own end-to-end workflow from the pieces we're exposing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am making some updates to the PR (some are already up), aiming to meld the two approaches a bit, lmk what you think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the updates look like a reasonable enough place for work to start.
The actual Python code and list of APIs is only illustrative, right? From reading this, it's my understanding that the actual parameters and methods on individual classes shown here are just an example since you call out future work is to design the actual SDK based on the structure above and negotiate user contracts with library maintainers. For example, if SDG points out that the
Data.Generator
constructor needs different params or that we have to actually use the taxonomy diff results as input to data generation as opposed to just checking if there is a diff, this kind of thing can be ironed out later?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One additional point I want to raise - what we definitely don't want is to be exposing pure library functionalities through a class in the Core repo - users that want that should just be importing the libraries directly.
As Charlie notes above what we want here is an SDK for the opinionated
ilab
workflow that this package is centered around - for example, our Data class wouldn't expose every public function in the SDG library, but it would expose functions we have withinsrc/instructlab/data
such asgenerate_data
andlist_data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bbrowning yep, the code is pretty much illustrative. The structure of classes I think is what I am focusing on the most. The actual arguments, functions within the
Data
class, etc can be ironed out later on. If we were to figure that out now, it would take forever!