
[WIP] 30 mins timeout for nightly CI - nodes PR:298 #394

Draft · wants to merge 3 commits into main
Changes from all commits
@@ -1,20 +1,29 @@
Hugging Face Pipeline for Image Classification.
The HUGGING_FACE_PIPELINE node uses a classification pipeline to process and classify an image.

For more information about Vision Transformers,
see: https://huggingface.co/google/vit-base-patch16-224

For a complete list of models, see:
https://huggingface.co/models?pipeline_tag=image-classification

For examples of how the revision parameter (such as 'main') is used,
see: https://huggingface.co/google/vit-base-patch16-224/commits/main

Parameters
----------
default : Image
The input image to be classified.
The image must be a PIL.Image object, wrapped in a Flojoy Image object.
model : str
The model to be used for classification.
If not specified, Vision Transformers (i.e. 'google/vit-base-patch16-224') are used.
revision : str
The revision of the model to be used for classification.
If not specified, 'main' is used.

Returns
-------
DataFrame:
A DataFrame containing the columns 'label' (the classification label)
and 'score' (its confidence score).
All scores are between 0 and 1 and sum to 1.
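For orientation, a minimal sketch of the kind of pipeline call this node wraps, assuming the standard `transformers` API; the model and revision mirror the defaults described above, and the input file name is hypothetical:

```python
# A minimal sketch, assuming the standard Hugging Face `transformers` API.
from PIL import Image
from transformers import pipeline

# Build an image-classification pipeline with the default model/revision.
classifier = pipeline(
    task="image-classification",
    model="google/vit-base-patch16-224",
    revision="main",
)

image = Image.open("cat.png")  # hypothetical input file
predictions = classifier(image)
# Each prediction is a dict such as {'label': 'tabby cat', 'score': 0.92}.
for pred in predictions:
    print(pred["label"], pred["score"])
```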
57 changes: 57 additions & 0 deletions docs/nodes/AI_ML/LOAD_MODEL/ONNX_MODEL/ONNX_MODEL.md
@@ -0,0 +1,57 @@

[//]: # (Custom component imports)

import DocString from '@site/src/components/DocString';
import PythonCode from '@site/src/components/PythonCode';
import AppDisplay from '@site/src/components/AppDisplay';
import SectionBreak from '@site/src/components/SectionBreak';
import AppendixSection from '@site/src/components/AppendixSection';

[//]: # (Docstring)

import DocstringSource from '!!raw-loader!./a1-[autogen]/docstring.txt';
import PythonSource from '!!raw-loader!./a1-[autogen]/python_code.txt';

<DocString>{DocstringSource}</DocString>
<PythonCode GLink='AI_ML/LOAD_MODEL/ONNX_MODEL/ONNX_MODEL.py'>{PythonSource}</PythonCode>

<SectionBreak />



[//]: # (Examples)

## Examples

import Example1 from './examples/EX1/example.md';
import App1 from '!!raw-loader!./examples/EX1/app.json';



<AppDisplay
nodeLabel='ONNX_MODEL'
appImg={''}
outputImg={''}
>
{App1}
</AppDisplay>

<Example1 />

<SectionBreak />



[//]: # (Appendix)

import Notes from './appendix/notes.md';
import Hardware from './appendix/hardware.md';
import Media from './appendix/media.md';

## Appendix

<AppendixSection index={0} folderPath='nodes/AI_ML/LOAD_MODEL/ONNX_MODEL/appendix/'><Notes /></AppendixSection>
<AppendixSection index={1} folderPath='nodes/AI_ML/LOAD_MODEL/ONNX_MODEL/appendix/'><Hardware /></AppendixSection>
<AppendixSection index={2} folderPath='nodes/AI_ML/LOAD_MODEL/ONNX_MODEL/appendix/'><Media /></AppendixSection>


41 changes: 41 additions & 0 deletions docs/nodes/AI_ML/LOAD_MODEL/ONNX_MODEL/a1-[autogen]/docstring.txt
@@ -0,0 +1,41 @@
ONNX_MODEL loads a serialized ONNX model and uses it to make predictions using ONNX Runtime.

This allows a wide range of deep learning frameworks and hardware platforms to be supported.

Notes
-----

ONNX is an open format for representing deep learning models.
ONNX defines a common set of operators - the building blocks of machine learning
and deep learning models - and a common file format to enable AI developers
to use models with a variety of frameworks, tools, runtimes, and compilers.

See: https://onnx.ai/

ONNX Runtime is a high-performance inference engine for machine
learning models in the ONNX format. It has been shown to considerably increase
inference performance across a broad range of ML models and hardware platforms.

See: https://onnxruntime.ai/docs/

Moreover, the ONNX Model Zoo is a collection of pre-trained models for common
machine learning tasks. The models are stored in ONNX format and are ready to use
in different inference scenarios.

See: https://github.com/onnx/models

Parameters
----------
file_path : str
Path to an ONNX model to load and use for prediction.

default : Vector
The input tensor to use for prediction.
For now, only a single input tensor is supported.
Note that the input tensor shape is not checked against the model's input shape.

Returns
-------
Vector:
The predictions made by the ONNX model.
For now, only a single output tensor is supported.
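For context, a hedged sketch of how a model file for this node might be produced, assuming PyTorch's built-in ONNX exporter; the toy network, input shape, and output path are illustrative assumptions, not part of this node:

```python
# A minimal sketch, assuming PyTorch's built-in ONNX exporter.
import torch

# A toy model standing in for any network you want to serve via ONNX_MODEL.
model = torch.nn.Sequential(
    torch.nn.Linear(4, 8),
    torch.nn.ReLU(),
    torch.nn.Linear(8, 2),
)
model.eval()

dummy_input = torch.randn(1, 4)  # shape is an assumption for this toy model
torch.onnx.export(model, dummy_input, "toy_model.onnx")
# "toy_model.onnx" can then be passed to ONNX_MODEL via its file_path parameter.
```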
62 changes: 62 additions & 0 deletions docs/nodes/AI_ML/LOAD_MODEL/ONNX_MODEL/a1-[autogen]/python_code.txt
@@ -0,0 +1,62 @@
from flojoy import flojoy, run_in_venv, Vector
from flojoy.utils import FLOJOY_CACHE_DIR


@flojoy
@run_in_venv(
    pip_dependencies=[
        "onnxruntime",
        "numpy",
        "onnx",
    ]
)
def ONNX_MODEL(
    file_path: str,
    default: Vector,
) -> Vector:
    import os
    import urllib.request

    import numpy as np
    import onnx
    import onnxruntime as rt

    model_name = os.path.basename(file_path)

    if file_path.startswith("http://") or file_path.startswith("https://"):
        # Download the ONNX model from a URL into FLOJOY_CACHE_DIR.
        onnx_model_zoo_cache = os.path.join(
            FLOJOY_CACHE_DIR, "cache", "onnx", "model_zoo"
        )
        os.makedirs(onnx_model_zoo_cache, exist_ok=True)

        filename = os.path.join(onnx_model_zoo_cache, model_name)
        urllib.request.urlretrieve(
            url=file_path,
            filename=filename,
        )

        # Use the downloaded copy from now on.
        file_path = filename

    # Pre-load the serialized model to validate that it is well-formed.
    model = onnx.load(file_path)
    onnx.checker.check_model(model)

    # Use ONNX Runtime to make predictions with the ONNX model.
    sess = rt.InferenceSession(file_path, providers=["CPUExecutionProvider"])

    # TODO(jjerphan): Assuming a single input and a single output for now.
    input_name = sess.get_inputs()[0].name
    label_name = sess.get_outputs()[0].name

    # TODO(jjerphan): For now, NumPy is assumed to be the main backend for Flojoy.
    # We might adapt this in the future so that other tensor backends can be used
    # by applications relying on deep learning libraries.
    input_tensor = np.asarray(default.v, dtype=np.float32)
    predictions = sess.run([label_name], {input_name: input_tensor})[0]

    return Vector(v=predictions)
1 change: 1 addition & 0 deletions docs/nodes/AI_ML/LOAD_MODEL/ONNX_MODEL/appendix/hardware.md
@@ -0,0 +1 @@
This node does not require any peripheral hardware to operate. Please see INSTRUMENTS for nodes that interact with the physical world through connected hardware.
1 change: 1 addition & 0 deletions docs/nodes/AI_ML/LOAD_MODEL/ONNX_MODEL/appendix/media.md
@@ -0,0 +1 @@
No supporting screenshots, photos, or videos have been added to the media.md file for this node.
1 change: 1 addition & 0 deletions docs/nodes/AI_ML/LOAD_MODEL/ONNX_MODEL/appendix/notes.md
@@ -0,0 +1 @@
No theory or technical notes have been contributed for this node yet.
16 changes: 7 additions & 9 deletions docs/nodes/AI_ML/NLP/COUNT_VECTORIZER/a1-[autogen]/docstring.txt
@@ -1,10 +1,8 @@
The COUNT_VECTORIZER node receives a collection (matrix, vector or dataframe) of text documents and converts it to a matrix of token counts.

Returns
-------
tokens: DataFrame
Holds all the unique tokens observed from the input.
word_count_vector: Vector
Contains the occurrences of these tokens from each sentence.
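A hedged sketch of the underlying idea, using scikit-learn's CountVectorizer (the node's exact implementation may differ); the corpus is a toy assumption:

```python
# A minimal sketch, assuming scikit-learn's CountVectorizer.
from sklearn.feature_extraction.text import CountVectorizer

documents = ["the cat sat", "the cat sat on the mat"]  # toy corpus

vectorizer = CountVectorizer()
word_count_vector = vectorizer.fit_transform(documents)

print(vectorizer.get_feature_names_out())  # unique tokens, e.g. ['cat' 'mat' 'on' 'sat' 'the']
print(word_count_vector.toarray())         # per-document token counts
```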
@@ -1,44 +1,44 @@
The PROPHET_PREDICT node runs a Prophet model on the incoming dataframe.

The DataContainer input type must be a dataframe, and the first column (or index) of the dataframe must be of a datetime type.

This node always returns a DataContainer of a dataframe type. It will also always return an 'extra' field with a key 'prophet', of which the value is the JSONified Prophet model.
This model can be loaded as follows:

```python
from prophet.serialize import model_from_json

model = model_from_json(dc_inputs.extra["prophet"])
```

Parameters
----------
run_forecast : bool
If True (default case), the dataframe of the returning DataContainer
('m' parameter of the DataContainer) will be the forecasted dataframe.
It will also have an 'extra' field with the key 'original', which is
the original dataframe passed in.

If False, the returning dataframe will be the original data.

This node will also always have an 'extra' field, run_forecast, which
matches that of the parameters passed in. This is for future nodes
to know if a forecast has already been run.

Default = True

periods : int
The number of periods to predict out. Only used if run_forecast is True.
Default = 365

Returns
-------
DataFrame
With parameter as df.
Indicates either the original df passed in, or the forecasted df
(depending on whether run_forecast is True).

DataContainer
With parameter as 'extra'.
Contains the key run_forecast, which corresponds to the input parameter,
and potentially 'original' in the event that run_forecast is True.
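As a hedged follow-on to the snippet above, one plausible way a downstream consumer could reuse the JSONified model; `dc_inputs` follows the naming in the snippet, and 'ds'/'yhat' are Prophet's standard column names:

```python
# A minimal sketch, assuming the standard Prophet API.
from prophet.serialize import model_from_json

model = model_from_json(dc_inputs.extra["prophet"])

# Extend the fitted history by 365 periods and forecast.
future = model.make_future_dataframe(periods=365)
forecast = model.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())
```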
@@ -1,10 +1,9 @@
The LEAST_SQUARE node computes the coefficients that minimize the distance between the input data (a 'Matrix' or 'OrderedPair') and the fitted regression.

Returns
-------
OrderedPair
x: input matrix (data points)
y: fitted line computed with returned regression weights
Matrix
m: fitted matrix computed with returned regression weights
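A hedged sketch of the underlying computation for the OrderedPair case, using NumPy's least-squares solver; the data values are toy assumptions and the node's internals may differ:

```python
# A minimal sketch, assuming an ordinary least-squares line fit with NumPy.
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])  # toy data points
y = np.array([1.1, 1.9, 3.2, 3.8])

# Design matrix [x, 1] so that y ≈ slope * x + intercept.
A = np.column_stack([x, np.ones_like(x)])
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)

fitted_line = slope * x + intercept  # the 'y' returned by the node
```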
@@ -1,10 +1,9 @@

The DEEPLAB_V3 node returns a segmentation mask from an input image in a dataframe.

The input image is expected to be a DataContainer of an 'image' type.

The output is a DataContainer of an 'image' type with the same dimensions as the input image, but with the red, green, and blue channels replaced with the segmentation mask.

Returns
-------
Image
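A hedged sketch of a DeepLabV3 forward pass with torchvision, to show where such a mask could come from; the weights flag, preprocessing constants, and input file are assumptions, not this node's exact code:

```python
# A minimal sketch, assuming torchvision's pre-trained DeepLabV3.
import torch
from PIL import Image
from torchvision import transforms
from torchvision.models.segmentation import deeplabv3_resnet50

model = deeplabv3_resnet50(weights="DEFAULT").eval()

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

image = Image.open("street.png").convert("RGB")  # hypothetical input
batch = preprocess(image).unsqueeze(0)

with torch.no_grad():
    output = model(batch)["out"][0]  # (num_classes, H, W) logits
mask = output.argmax(0)              # per-pixel class index, same H x W as input
```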
4 changes: 2 additions & 2 deletions docs/nodes/EXTRACTORS/FILE/READ_S3/a1-[autogen]/docstring.txt
@@ -7,9 +7,9 @@ The READ_S3 node takes a S3_key name, S3 bucket name, and file name as input, an
Parameters
----------
s3_name : str
name of the key that the user used to save the access and secret access keys
bucket_name : str
Amazon S3 bucket name that they are trying to access
file_name : str
name of the file that they want to extract

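For orientation, a hedged sketch of the kind of S3 read this node performs, assuming boto3 with credentials already configured; the bucket and key names are hypothetical:

```python
# A minimal sketch, assuming boto3 with credentials already configured.
import io

import boto3
import pandas as pd

s3 = boto3.client("s3")

# Hypothetical bucket and key for illustration.
response = s3.get_object(Bucket="my-bucket", Key="data/measurements.csv")
df = pd.read_csv(io.BytesIO(response["Body"].read()))
print(df.head())
```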
@@ -1,11 +1,10 @@
The R_DATASET node retrieves a pandas DataFrame from 'rdatasets', using the provided dataset_key parameter, and returns it wrapped in a DataContainer.

Parameters
----------
dataset_key : str
The key identifying which dataset to retrieve from 'rdatasets'.

Returns
-------
DataFrame
A DataContainer object containing the retrieved pandas DataFrame.
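A hedged sketch of one way to fetch an rdatasets entry, assuming statsmodels' get_rdataset helper (the node itself may use a different client); 'iris' is a hypothetical key:

```python
# A minimal sketch, assuming statsmodels' rdatasets helper.
import statsmodels.api as sm

# "iris" from R's built-in 'datasets' package, used here as a hypothetical key.
dataset = sm.datasets.get_rdataset("iris", package="datasets")
df = dataset.data  # a pandas DataFrame
print(df.head())
```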