# Add spellcheck for docs #272

Merged: 8 commits, Aug 22, 2023. Showing changes from all commits.
8 changes: 6 additions & 2 deletions .github/workflows/python_lint.yml
```diff
@@ -13,13 +13,17 @@ jobs:
       uses: actions/setup-python@v4
       with:
         python-version: 3.11
+    - name: Install aspell for pyspelling
+      run: sudo apt-get install -y aspell
     - name: Upgrade pip
       run: pip install --upgrade pip
     - name: Install packages
-      run: pip install "flake8>=4.0.1" "black>=22.6.0" "mypy==0.981" # install 0.981 of mypy since future versions seem to be not working with `--exclude`
+      run: pip install "flake8>=4.0.1" "black>=22.6.0" "mypy==0.981" "pyspelling>=2.8.2" # install 0.981 of mypy since future versions seem to be not working with `--exclude`
     - name: flake8 lint
       run: flake8 .
     - name: black lint
       run: black --diff --check .
     - name: mypy typechecking
-      run: mypy .
+      run: mypy .
+    - name: spellcheck
+      run: pyspelling
```
2 changes: 1 addition & 1 deletion .gitignore
```diff
@@ -143,4 +143,4 @@ dmypy.json
 /data/*.csv
 
 **/.DS_Store
-
+wordlist.dic
```
29 changes: 29 additions & 0 deletions .pyspelling.yml
```yaml
matrix:
- name: Python source
  sources:
  - docs/examples/**/*.py
  dictionary:
    wordlists:
    - .wordlist.txt
    output: docs/wordlist.dic
    encoding: utf-8
  pipeline:
  - pyspelling.filters.python:
- name: markdown
  sources:
  - 'docs/**/*.md'
  dictionary:
    wordlists:
    - .wordlist.txt
    output: wordlist.dic
  pipeline:
  - pyspelling.filters.markdown:
  - pyspelling.filters.html:
      comments: false
      attributes:
      - title
      - alt
      ignores:
      - code
      - pre
  - pyspelling.filters.url:
```
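With this config in place, the CI check can be reproduced locally. A minimal sketch, assuming `aspell` and `pyspelling` are installed as in the workflow above (the `-n` flag selects one named task from the matrix):

```python
import subprocess

# Run the full spellcheck matrix, exactly as the CI step does.
subprocess.run(["pyspelling"], check=True)

# Or run a single named task from .pyspelling.yml, e.g. only the markdown sources.
subprocess.run(["pyspelling", "-n", "markdown"], check=True)
```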
187 changes: 187 additions & 0 deletions .wordlist.txt
AdministratorAccess
ai
api
APIs
architected
assertEqual
AST
async
autoscaling
Avro
backend
backfill
backfilled
bmi
bmr
bool
boolean
booleans
CIDR
classmethod
classmethods
codebase
codepaths
compilable
config
configs
csv
dataclass
Datadog
dataflow
DataFrame
dataframe
DataFrames
dataset
dataset's
datasets
datastore
datastores
datetime
dateutil
DDL
declaratively
dedup
denormalize
dev
df
dfe
Dockerfile
docsnip
ds
DSL
DSLs
durations
embeddings
enabledTLSProtocols
featureset
featuresets
fintech
Flink
frontend
GCP
GCP's
geocoding
geoid
Github
Grafana
Graviton
Groupby
groupby
GRPC
gserviceaccount
hackathon
hardcoded
html
hudi
iam
InfoSec
Instacart
IOPS
ip
ish
ith
jdbc
JSON
json
JSX
JVM
kafka
Kaggle
Kubernetes
kwarg
kwargs
LastK
latencies
LHS
lifecycle
lookup
lookups
metaflags
MockClient
multicolumn
mysql
nan
natively
Nones
noqa
np
nullable
OAuth
OOM
OpenSSL's
PagerDuty
params
parseable
pid
PII
PLAINTEXT
PoolableConnectionFactory
postgres
pre
precompute
precomputed
PrivateLink
protobuf
protobufs
Pulumi
Pydantic
PyO
quickstart
realtime
Realtimeliness
regex
regexes
repo
RHS
RocksDB
ROI
RPCs
runtime
SASL
scalability
scalable
> **Contributor:** Can we add `extractor`, `depends_on`, `version`, `inputs` (all our sources maybe? `kinesis`, `bigquery`, etc.)?
>
> **Contributor Author:** Most of those are already part of the dictionary -- I'm not quite sure why `scalable` and `scalability` aren't. It's just this text file (which doesn't even need to be sorted), so it's trivial to add things later.

schemas
SDK
SearchRequest
SHA
Signifier
signup
SLA
snowflakecomputing
SSL
stateful
Stddev
str
strftime
struct
TestCase
TestDataset
tiering
TLS
TLSv
Tokio
Tokio's
UI
uid
> **Contributor:** Can we keep it sorted across cases?
>
> **Contributor Author:** Done.

uint
uints
uncomment
unittest
uptime
uptimes
UserCreator
UserCreditScore
UserFeature
UserFeatures
userid
UserInfo
UserInfoDataset
UserLocation
UserPost
UserTransactionsAbroad
VPC
webhook
webhooks
WIP
YAML
3 changes: 2 additions & 1 deletion docs/.gitignore
```diff
@@ -3,4 +3,5 @@ examples/**.json
 .vscode
 venv/
 **/__pycache__/*
-.idea/
+.idea/
+wordlist.dic
```
28 changes: 14 additions & 14 deletions docs/README.md
To get up and running contributing to the documentation, take the following steps:

2. Rename `.env.example` to `.env` and fill out the values.
   - `GITHUB_TOKEN` should be a valid Github PAT with access to read the `fennel-ai/turbo` repo
   - `GITHUB_REPO` is the location of the Dockerfile that builds the frontend, and should be set to `fennel-ai/turbo`
3. Run `make up` in your terminal from the root.
   - This will pull in the Docs UI repo from Github, and run it on `localhost:3001/docs`
4. Edit the markdown and python files in this repo, and get hot-reloading showing the latest changes on `localhost`.
5. Commit changes once you're ready.
   - Upon commit, the python files will run through the test suite and block any broken examples from going live in the documentation.

> When new updates are made to the UI, you may need to run `make build` before `make up` in order to force Docker to fetch the latest changes and rebuild the image.

## `./examples`
The example directory holds Python test files. Anywhere in these files, you can wrap any number of lines between `# docsnip` comments.
**e.g.** `example.py`:
```python
from fennel import *

# docsnip my_snippet
@dataset
class UserInfoDataset:
    name: str
    email: str
    id: str
    age: int
# /docsnip

def my_pipeline():
    # todo
    return False
```

Now, in any of our markdown files you can write:
The `index.yml` file is used to set global configuration options for the docs.

Any pages that are _not_ in the file are still generated in dev and production (if they are not a `draft`) and can be navigated/linked to, but won't appear in the sidebar.

The `version` field gives us a way to easily pick out the version tag for this branch of the documentation from the UI side.
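As an aside, the `# docsnip` convention above is simple enough to sketch in a few lines. This is a hypothetical extractor for illustration only (the real tooling that powers the docs UI is not shown in this PR); the function name, regex, and file path are assumptions:

```python
import re

def extract_snippet(source: str, name: str) -> str:
    """Return the code wrapped by `# docsnip <name>` ... `# /docsnip`."""
    # Lazily capture everything between the named opening marker and the
    # first closing marker that follows it.
    pattern = rf"#\s*docsnip\s+{re.escape(name)}\n(.*?)#\s*/docsnip"
    match = re.search(pattern, source, flags=re.DOTALL)
    if match is None:
        raise KeyError(f"no docsnip named {name!r}")
    return match.group(1).rstrip()

# Hypothetical path, following the example above.
with open("docs/examples/example.py") as f:
    print(extract_snippet(f.read(), "my_snippet"))
```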
2 changes: 1 addition & 1 deletion docs/examples/datasets/datasets.py
```diff
@@ -90,7 +90,7 @@ class User:
 
 # invalid - no explicitly marked `timestamp` field
 # and multiple fields of type `datetime` so timestamp
-# field is amgiguous
+# field is ambiguous
 def test_ambiguous_timestamp_field():
     with pytest.raises(Exception) as e:
         # docsnip invalid_user_dataset_ambiguous_timestamp_field
```
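For contrast, a rough sketch of an unambiguous dataset, where one `datetime` field is explicitly marked as the timestamp. The import path and the `field(...)` markers are assumptions based on how datasets appear elsewhere in these docs:

```python
from datetime import datetime

from fennel.datasets import dataset, field  # import path is an assumption

@dataset
class UserSignup:
    uid: int = field(key=True)  # assumed key marker
    signup_time: datetime = field(timestamp=True)  # explicitly marked timestamp
    last_seen: datetime  # a second datetime field is fine once one is marked
```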
4 changes: 2 additions & 2 deletions docs/pages/api-reference/aggregations.md
# Aggregations

Aggregations are provided to the `aggregate` operator and specify how the aggregation should happen. All aggregations take two common arguments:

1. `window`: Window - argument that specifies the length of the duration across which Fennel needs to perform the aggregation. See how [duration](/api-reference/duration) is specified in Fennel.
2. `into_field`: str - the name of the field in the output dataset that corresponds to this aggregation.

Besides these common arguments, here is the rest of the API reference for all the aggregations.

### 1. Count

Count computes a rolling count for each group key across a window. It returns 0 by default, and its output type is always `int`.
The count aggregate also takes an optional boolean argument `unique`. If set to true, it counts the number of unique values in the given window.
The field over which the count is computed is specified by the `of` parameter of type `str`.
Count also takes an `approx` argument that, when set to true, makes the count approximate but allows Fennel to be more efficient with state storage.
Currently, Fennel only supports approximate unique counts, hence if `unique` is set to true, `approx` must also be set to true.
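To make the parameters concrete, here is a rough sketch of a unique count over a rolling 7-day window, based only on the arguments described above; the import paths and the `Window` constructor are assumptions:

```python
from fennel.lib.aggregate import Count  # import path is an assumption
from fennel.lib.window import Window    # likewise an assumption

# Approximate count of distinct senders over a rolling 7-day window.
# unique=True currently requires approx=True, per the note above.
unique_senders = Count(
    of="sender_id",
    window=Window("7d"),
    into_field="num_unique_senders",
    unique=True,
    approx=True,
)
```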

### 2. Sum  
12 changes: 6 additions & 6 deletions docs/pages/api-reference/client.md
---
title: Client
order: 0
status: WIP
---

# Client

Given some input and output features, extracts the current value of all the output features.

**Arguments:**

* `output_feature_list: List[Union[Feature, Featureset]]`: list of features (written as fully qualified name of a feature along with the featureset) that should be extracted. Can also take featureset objects as input, in which case all features in the featureset are extracted.
* `input_feature_list: List[Union[Feature, Featureset]]` : list of features/featuresets for which values are known
* `input_df: Dataframe`: a pandas dataframe object that contains the values of all features in the input feature list. Each row of the dataframe can be thought of as one entity for which features are desired.
* `log: bool` - boolean which indicates if the extracted features should also be logged (for log-and-wait approach to training data generation). Default is False

This method throws an error if the schema of the dataframe (i.e. column names and types) doesn't match.

### **extract_historical_features**

For offline training of models, users often need to extract features for a large number of entities.
This method allows users to extract features for a large number of entities in a single call while ensuring
point-in-time correctness of the extracted features.

This is an asynchronous API that returns a request id and the path to the output folder in S3 containing the extracted features.

**Arguments:**

A completion rate of 1.0 and a failure rate of 0.0 indicates that all processing has been completed.

### **extract_historical_features_progress**

This method allows users to monitor the progress of the `extract_historical_features` asynchronous operation.
It accepts the request ID that was returned by the `extract_historical_features` method and returns the current status of that operation.

The response formats of this function and the `extract_historical_features` function are identical.
* request_id
* output s3 bucket
* output s3 path prefix
* completion rate
* failure rate

A completion rate of 1.0 indicates that all processing has been completed.
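To make the request lifecycle concrete, here is a minimal polling sketch; the `client` object, the dict-shaped response, and its field names are assumptions based on the fields listed above:

```python
import time

def wait_for_extraction(client, request_id: str, poll_seconds: int = 30) -> dict:
    """Poll extract_historical_features_progress until processing completes."""
    while True:
        status = client.extract_historical_features_progress(request_id)
        # Field names assumed from the response fields listed above.
        if status["failure_rate"] > 0.0:
            raise RuntimeError(f"extraction reported failures: {status}")
        if status["completion_rate"] >= 1.0:
            return status  # includes the output S3 bucket and path prefix
        time.sleep(poll_seconds)
```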