# Add spellcheck for docs #272

Merged: 8 commits, Aug 22, 2023. Showing changes from all commits.
8 changes: 6 additions & 2 deletions .github/workflows/python_lint.yml
```diff
@@ -13,13 +13,17 @@ jobs:
       uses: actions/setup-python@v4
       with:
         python-version: 3.11
+    - name: Install aspell for pyspelling
+      run: sudo apt-get install -y aspell
     - name: Upgrade pip
       run: pip install --upgrade pip
     - name: Install packages
-      run: pip install "flake8>=4.0.1" "black>=22.6.0" "mypy==0.981" # install 0.981 of mypy since future versions seem to be not working with `--exclude`
+      run: pip install "flake8>=4.0.1" "black>=22.6.0" "mypy==0.981" "pyspelling>=2.8.2" # install 0.981 of mypy since future versions seem to be not working with `--exclude`
     - name: flake8 lint
       run: flake8 .
     - name: black lint
       run: black --diff --check .
     - name: mypy typechecking
-      run: mypy .
+      run: mypy .
+    - name: spellcheck
+      run: pyspelling
```
2 changes: 1 addition & 1 deletion .gitignore
```diff
@@ -143,4 +143,4 @@ dmypy.json
 /data/*.csv
 
 **/.DS_Store
-
+wordlist.dic
```
29 changes: 29 additions & 0 deletions .pyspelling.yml
```yaml
matrix:
- name: Python source
  sources:
  - docs/examples/**/*.py
  dictionary:
    wordlists:
    - .wordlist.txt
    output: docs/wordlist.dic
    encoding: utf-8
  pipeline:
  - pyspelling.filters.python:
- name: markdown
  sources:
  - 'docs/**/*.md'
  dictionary:
    wordlists:
    - .wordlist.txt
    output: wordlist.dic
  pipeline:
  - pyspelling.filters.markdown:
  - pyspelling.filters.html:
      comments: false
      attributes:
      - title
      - alt
      ignores:
      - code
      - pre
  - pyspelling.filters.url:
```
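With this config in place, the CI check can be reproduced locally. A minimal sketch, assuming `aspell` and `pyspelling` are installed as in the workflow above (the `-n` flag selects one named task from the matrix):

```python
import subprocess

# Run the full spellcheck matrix, exactly as the CI step does.
subprocess.run(["pyspelling"], check=True)

# Or run a single named task from .pyspelling.yml, e.g. only the markdown sources.
subprocess.run(["pyspelling", "-n", "markdown"], check=True)
```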
187 changes: 187 additions & 0 deletions .wordlist.txt
AdministratorAccess
ai
api
APIs
architected
assertEqual
AST
async
autoscaling
Avro
backend
backfill
backfilled
bmi
bmr
bool
boolean
booleans
CIDR
classmethod
classmethods
codebase
codepaths
compilable
config
configs
csv
dataclass
Datadog
dataflow
DataFrame
dataframe
DataFrames
dataset
dataset's
datasets
datastore
datastores
datetime
dateutil
DDL
declaratively
dedup
denormalize
dev
df
dfe
Dockerfile
docsnip
ds
DSL
DSLs
durations
embeddings
enabledTLSProtocols
featureset
featuresets
fintech
Flink
frontend
GCP
GCP's
geocoding
geoid
Github
Grafana
Graviton
Groupby
groupby
GRPC
gserviceaccount
hackathon
hardcoded
html
hudi
iam
InfoSec
Instacart
IOPS
ip
ish
ith
jdbc
JSON
json
JSX
JVM
kafka
Kaggle
Kubernetes
kwarg
kwargs
LastK
latencies
LHS
lifecycle
lookup
lookups
metaflags
MockClient
multicolumn
mysql
nan
natively
Nones
noqa
np
nullable
OAuth
OOM
OpenSSL's
PagerDuty
params
parseable
pid
PII
PLAINTEXT
PoolableConnectionFactory
postgres
pre
precompute
precomputed
PrivateLink
protobuf
protobufs
Pulumi
Pydantic
PyO
quickstart
realtime
Realtimeliness
regex
regexes
repo
RHS
RocksDB
ROI
RPCs
runtime
SASL
scalability
scalable
> **Contributor:** Can we add `extractor`, `depends_on`, `version`, `inputs` (all our sources maybe? `kinesis`, `bigquery`, etc.)?
>
> **Contributor Author:** Most of those are already part of the dictionary -- I'm not quite sure why `scalable` and `scalability` aren't. It's just this text file (which doesn't even need to be sorted), so it's trivial to add things later.

schemas
SDK
SearchRequest
SHA
Signifier
signup
SLA
snowflakecomputing
SSL
stateful
Stddev
str
strftime
struct
TestCase
TestDataset
tiering
TLS
TLSv
Tokio
Tokio's
UI
uid
> **Contributor:** Can we keep it sorted across cases?
>
> **Contributor Author:** Done.

uint
uints
uncomment
unittest
uptime
uptimes
UserCreator
UserCreditScore
UserFeature
UserFeatures
userid
UserInfo
UserInfoDataset
UserLocation
UserPost
UserTransactionsAbroad
VPC
webhook
webhooks
WIP
YAML
3 changes: 2 additions & 1 deletion docs/.gitignore
```diff
@@ -3,4 +3,5 @@ examples/**.json
 .vscode
 venv/
 **/__pycache__/*
-.idea/
+.idea/
+wordlist.dic
```
28 changes: 14 additions & 14 deletions docs/README.md
To get up and running contributing to the documentation, take the following steps:

2. Rename `.env.example` to `.env` and fill out the values.
   - `GITHUB_TOKEN` should be a valid Github PAT with access to read the `fennel-ai/turbo` repo
   - `GITHUB_REPO` is the location of the Dockerfile that builds the frontend, and should be set to `fennel-ai/turbo`
3. Run `make up` in your terminal from the root.
   - This will pull in the Docs UI repo from Github, and run it on `localhost:3001/docs`
4. Edit the markdown and python files in this repo, and get hot-reloading showing the latest changes on `localhost`.
5. Commit changes once you're ready.
   - Upon commit, the python files will run through the test suite and block any broken examples from going live in the documentation.

> When new updates are made to the UI, you may need to run `make build` before `make up` in order to force Docker to fetch the latest changes and rebuild the image.

## `./examples`
The example directory holds Python test files. Anywhere in these files, you can wrap any number of lines between `# docsnip` comments.
**e.g.** `example.py`:
```python
from fennel import *

# docsnip my_snippet
@dataset
class UserInfoDataset:
    name: str
    email: str
    id: str
    age: int
# /docsnip

def my_pipeline():
    # todo
    return False
```

Now, in any of our markdown files you can write:
The `index.yml` file is used to set global configuration options for the docs.

Any pages that are _not_ in the file are still generated in dev and production (if they are not a `draft`) and can be navigated/linked to, but won't appear in the sidebar.

The `version` field gives us a way to easily pick out the version tag for this branch of the documentation from the UI side.
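As an aside, the `# docsnip` convention above is simple enough to sketch in a few lines. This is a hypothetical extractor for illustration only (the real tooling that powers the docs UI is not shown in this PR); the function name, regex, and file path are assumptions:

```python
import re

def extract_snippet(source: str, name: str) -> str:
    """Return the code wrapped by `# docsnip <name>` ... `# /docsnip`."""
    # Lazily capture everything between the named opening marker and the
    # first closing marker that follows it.
    pattern = rf"#\s*docsnip\s+{re.escape(name)}\n(.*?)#\s*/docsnip"
    match = re.search(pattern, source, flags=re.DOTALL)
    if match is None:
        raise KeyError(f"no docsnip named {name!r}")
    return match.group(1).rstrip()

# Hypothetical path, following the example above.
with open("docs/examples/example.py") as f:
    print(extract_snippet(f.read(), "my_snippet"))
```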
2 changes: 1 addition & 1 deletion docs/examples/datasets/datasets.py
```diff
@@ -90,7 +90,7 @@ class User:
 
 # invalid - no explicitly marked `timestamp` field
 # and multiple fields of type `datetime` so timestamp
-# field is amgiguous
+# field is ambiguous
 def test_ambiguous_timestamp_field():
     with pytest.raises(Exception) as e:
         # docsnip invalid_user_dataset_ambiguous_timestamp_field
```
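For contrast, a rough sketch of an unambiguous dataset, where one `datetime` field is explicitly marked as the timestamp. The import path and the `field(...)` markers are assumptions based on how datasets appear elsewhere in these docs:

```python
from datetime import datetime

from fennel.datasets import dataset, field  # import path is an assumption

@dataset
class UserSignup:
    uid: int = field(key=True)  # assumed key marker
    signup_time: datetime = field(timestamp=True)  # explicitly marked timestamp
    last_seen: datetime  # a second datetime field is fine once one is marked
```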
4 changes: 2 additions & 2 deletions docs/pages/api-reference/aggregations.md
# Aggregations

Aggregations are provided to the `aggregate` operator and specify how the aggregation should happen. All aggregations take two common arguments:

1. `window`: Window - argument that specifies the length of the duration across which Fennel needs to perform the aggregation. See how [duration](/api-reference/duration) is specified in Fennel.
2. `into_field`: str - the name of the field in the output dataset that corresponds to this aggregation.

Besides these common arguments, here is the rest of the API reference for all the aggregations.

### 1. Count

Count computes a rolling count for each group key across a window. It returns 0 by default, and its output type is always `int`.
The count aggregate also takes an optional boolean argument `unique`. If set to true, it counts the number of unique values in the given window.
The field over which the count is computed is specified by the `of` parameter of type `str`.
Count also takes an `approx` argument that, when set to true, makes the count approximate but allows Fennel to be more efficient with state storage.
Currently, Fennel only supports approximate unique counts, hence if `unique` is set to true, `approx` must also be set to true.
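To make the parameters concrete, here is a rough sketch of a unique count over a rolling 7-day window, based only on the arguments described above; the import paths and the `Window` constructor are assumptions:

```python
from fennel.lib.aggregate import Count  # import path is an assumption
from fennel.lib.window import Window    # likewise an assumption

# Approximate count of distinct senders over a rolling 7-day window.
# unique=True currently requires approx=True, per the note above.
unique_senders = Count(
    of="sender_id",
    window=Window("7d"),
    into_field="num_unique_senders",
    unique=True,
    approx=True,
)
```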

### 2. Sum  
12 changes: 6 additions & 6 deletions docs/pages/api-reference/client.md
---
title: Client
order: 0
status: WIP
---

# Client

Given some input and output features, extracts the current value of all the output features.

**Arguments:**

* `output_feature_list: List[Union[Feature, Featureset]]`: list of features (written as fully qualified name of a feature along with the featureset) that should be extracted. Can also take featureset objects as input, in which case all features in the featureset are extracted.
* `input_feature_list: List[Union[Feature, Featureset]]` : list of features/featuresets for which values are known
* `input_df: Dataframe`: a pandas dataframe object that contains the values of all features in the input feature list. Each row of the dataframe can be thought of as one entity for which features are desired.
* `log: bool` - boolean which indicates if the extracted features should also be logged (for log-and-wait approach to training data generation). Default is False

This method throws an error if the schema of the dataframe (i.e. column names and types) doesn't match.

### **extract_historical_features**

For offline training of models, users often need to extract features for a large number of entities.
This method allows users to extract features for a large number of entities in a single call while ensuring
point-in-time correctness of the extracted features.

This is an asynchronous API that returns a request id and the path to the output folder in S3 containing the extracted features.

**Arguments:**

A completion rate of 1.0 and a failure rate of 0.0 indicates that all processing has been completed.

### **extract_historical_features_progress**

This method allows users to monitor the progress of the `extract_historical_features` asynchronous operation.
It accepts the request ID that was returned by the `extract_historical_features` method and returns the current status of that operation.

The response formats of this function and the `extract_historical_features` function are identical.
* request_id
* output s3 bucket
* output s3 path prefix
* completion rate
* failure rate

A completion rate of 1.0 indicates that all processing has been completed.
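To make the request lifecycle concrete, here is a minimal polling sketch; the `client` object, the dict-shaped response, and its field names are assumptions based on the fields listed above:

```python
import time

def wait_for_extraction(client, request_id: str, poll_seconds: int = 30) -> dict:
    """Poll extract_historical_features_progress until processing completes."""
    while True:
        status = client.extract_historical_features_progress(request_id)
        # Field names assumed from the response fields listed above.
        if status["failure_rate"] > 0.0:
            raise RuntimeError(f"extraction reported failures: {status}")
        if status["completion_rate"] >= 1.0:
            return status  # includes the output S3 bucket and path prefix
        time.sleep(poll_seconds)
```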