chore: Fix spelling (#229)
Signed-off-by: Josh Soref <2119212+jsoref@users.noreply.github.com>
Co-authored-by: Paul-Cornell <paul@unstructured.io>
jsoref and Paul-Cornell authored Sep 12, 2024
1 parent 83e681f commit ff1b226
Showing 52 changed files with 78 additions and 78 deletions.
2 changes: 1 addition & 1 deletion api-reference/api-services/accessing-unstructured-api.mdx
@@ -2,7 +2,7 @@
title: Overview
---

To process an indvidual file, you can choose from several available methods, including a direct `POST` request, Python code, and JavaScript/TypeScript code.
To process an individual file, you can choose from several available methods, including a direct `POST` request, Python code, and JavaScript/TypeScript code.
Whether you're using the Free Unstructured API, the Unstructured Serverless API, the Unstructured API on Azure/AWS, or your local deployment of the Unstructured API, the functionality is the same.

Choose your preferred method:
2 changes: 1 addition & 1 deletion api-reference/api-services/api-parameters.mdx
@@ -52,7 +52,7 @@ The following parameters only apply when a chunking strategy is specified. Other
| `overlap_all` (_bool_) | `overlapAll` (_boolean_) | True to have an overlap also applied to "normal" chunks formed by combining whole elements. Use with caution, as this can introduce noise into otherwise clean semantic units. Default: none. |
| `similarity_threshold` (_float_) | `similarityThreshold` (_number_) | Applies only when the chunking strategy is set to `by_similarity`. The minimum similarity text in consecutive elements must have to be included in the same chunk. Must be between 0.0 and 1.0, exclusive (0.01 to 0.99, inclusive). Default: 0.5. |
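
The range rule for `similarity_threshold` in the table row above can be checked client-side before a request is sent. The following is a minimal sketch only; `validate_similarity_threshold` is a hypothetical helper name and is not part of any Unstructured SDK.

```python
def validate_similarity_threshold(value: float) -> float:
    """Return the value unchanged if it lies strictly between 0.0 and 1.0.

    Hypothetical helper: it mirrors the documented rule
    ("between 0.0 and 1.0, exclusive") and nothing more.
    """
    if not 0.0 < value < 1.0:
        raise ValueError(
            f"similarity_threshold must be between 0.0 and 1.0, exclusive; got {value}"
        )
    return value


# The documented default passes the check.
threshold = validate_similarity_threshold(0.5)
```

Rejecting out-of-range values locally gives a clearer error than waiting for the server to refuse the request.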

The following parameters are specific to the Python and Javascript/TypeScript clients and are not sent to the server. [Learn more](/api-reference/api-services/sdk-python#page-splitting).
The following parameters are specific to the Python and JavaScript/TypeScript clients and are not sent to the server. [Learn more](/api-reference/api-services/sdk-python#page-splitting).

| POST, Python | JavaScript/TypeScript | Description |
|---------------------------------------|---------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
2 changes: 1 addition & 1 deletion api-reference/api-services/aws.mdx
@@ -60,7 +60,7 @@ You will establish the foundational network structure for deploying the Unstruct

* Select an **Availability Zone**.

* Specify the **IPv4 CIDR block** (for exampple, `10.0.0.0/16`).
* Specify the **IPv4 CIDR block** (for example, `10.0.0.0/16`).

* Specify the **IPv4 subnet CIDR block** (for example, `10.0.1.0/24`).

6 changes: 3 additions & 3 deletions api-reference/api-services/azure.mdx
@@ -9,7 +9,7 @@ Follow these steps to deploy the Unstructured API service into your Azure accoun
Go to [https://portal.azure.com](https://portal.azure.com/).
</Step>
<Step title="Access the Azure Marketplace">
Go to the [Unstructured Data Prepocessing - Customer Hosted API](https://azuremarketplace.microsoft.com/en-us/marketplace/apps/unstructured1691024866136.customer_api_v1?tab=Overview/) offering in the Azure Marketplace.
Go to the [Unstructured Data Preprocessing - Customer Hosted API](https://azuremarketplace.microsoft.com/en-us/marketplace/apps/unstructured1691024866136.customer_api_v1?tab=Overview/) offering in the Azure Marketplace.

![Azure Marketplace](/img/api/Azure_Step2.png)
</Step>
@@ -98,8 +98,8 @@ Follow these steps to deploy the Unstructured API service into your Azure accoun
```

8. Now run the container again, setting the environment variables at the same time: Run the following command: `sudo docker image ls`.
9. Note the `RESPOSITORY` and `TAG` value for the Docker image.
10. Run the following command, replacing `<REPOSITORY>` and `<TAG>` with the `RESPOSITORY` and `TAG` values for the Docker image, and replacing
9. Note the `REPOSITORY` and `TAG` value for the Docker image.
10. Run the following command, replacing `<REPOSITORY>` and `<TAG>` with the `REPOSITORY` and `TAG` values for the Docker image, and replacing
`<VAR1>=<value1>`, `<VAR2>=<value2>` and so on with the environment variable name and value pairs:

```bash
2 changes: 1 addition & 1 deletion api-reference/api-services/chunking.mdx
@@ -23,7 +23,7 @@ import SharedChunkingStrategyBasic from '/snippets/concepts/chunking-strategy-ba

<SharedChunkingStrategyBasic/>

import SharedChunkingStrategyByTitle from '/snippets/concepts/chunking-stategy-by-title.mdx';
import SharedChunkingStrategyByTitle from '/snippets/concepts/chunking-strategy-by-title.mdx';

<SharedChunkingStrategyByTitle/>

4 changes: 2 additions & 2 deletions api-reference/api-services/overview.mdx
@@ -47,9 +47,9 @@ Unstructured Serverless API services provide the following benefits beyond the [

## Supported file types

import SupportedFileTpes from '/snippets/general-shared-text/supported-file-types.mdx';
import SupportedFileTypes from '/snippets/general-shared-text/supported-file-types.mdx';

<SupportedFileTpes />
<SupportedFileTypes />

## Data ingestion

4 changes: 2 additions & 2 deletions api-reference/api-services/saas-api-development-guide.mdx
@@ -14,9 +14,9 @@ To call the Unstructured Serverless API, you need an API key and API URL:
5. To get your API key, click the copy icon in the **Actions** column for your API key. Store your copied API key in a secure location. Do not share it with others.
6. To get your API URL, click the copy icon next to the URL next to **API URL**. Store your copied API URL in a secure location. Do not share it with others.

import SeverlessKeyNoFreeURL from '/snippets/general-shared-text/serverless-api-key-no-free-access.mdx';
import ServerlessKeyNoFreeURL from '/snippets/general-shared-text/serverless-api-key-no-free-access.mdx';

<SeverlessKeyNoFreeURL />
<ServerlessKeyNoFreeURL />

[Try the quickstart](#quickstart).

4 changes: 2 additions & 2 deletions api-reference/api-services/supported-file-types.mdx
@@ -2,6 +2,6 @@
title: Supported file types
---

import SupportedFileTpes from '/snippets/general-shared-text/supported-file-types.mdx';
import SupportedFileTypes from '/snippets/general-shared-text/supported-file-types.mdx';

<SupportedFileTpes />
<SupportedFileTypes />
@@ -2,7 +2,7 @@
title: Speed up processing of large files and batches
---

When you use Unstructued API services, here are some techniques that you can try to help speed up the processing of large files and large batches of files.
When you use Unstructured API services, here are some techniques that you can try to help speed up the processing of large files and large batches of files.

- Choose your partitioning strategy wisely. For example, if you have simple PDFs that don't have images and tables, you might be able to use the `fast` strategy. Try the `fast` strategy on a few of your documents before you try using the `hi_res` strategy. [Learn more](/api-reference/api-services/partitioning).
- For processing large numbers of documents, use [ingestion](/ingestion/overview) and [add CPUs](#adding-cpus).
2 changes: 1 addition & 1 deletion api-reference/how-to/choose-partitioning-strategy.mdx
@@ -44,7 +44,7 @@ See [Changing partition strategy for a PDF](/api-reference/api-services/examples

Setting `--strategy` or `strategy` to `auto` leaves the decision up to Unstructured on a file-by-file basis about which partitioning strategy to use. Specifically:

- If the file is an image, the `hi_res` stategy is used for that file. The `layout_v1.0.0` high-resolution object detection model is used.
- If the file is an image, the `hi_res` strategy is used for that file. The `layout_v1.0.0` high-resolution object detection model is used.
- If the file is a PDF, the local processing logic or Unstructured tries to detect whether there are any embedded tables or images in that file.

- If no embedded tables or images are detected, the `fast` strategy is used for that file. No high-resolution object detection model is used.
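
The portion of the `auto` decision visible in this hunk can be sketched as a small function. This is an illustration under stated assumptions, not Unstructured's implementation: `auto_strategy` and its boolean parameters are hypothetical, and any case the excerpt does not describe returns `None` rather than a guess.

```python
from typing import Optional


def auto_strategy(
    is_image: bool,
    is_pdf: bool,
    has_embedded_tables_or_images: bool,
) -> Optional[str]:
    """Hypothetical sketch of the `auto` rules shown in the excerpt above."""
    if is_image:
        # Images always use the high-resolution strategy.
        return "hi_res"
    if is_pdf and not has_embedded_tables_or_images:
        # PDFs with no embedded tables or images use the fast strategy.
        return "fast"
    # Remaining cases are not covered by the excerpt, so nothing is assumed.
    return None


strategy = auto_strategy(is_image=False, is_pdf=True, has_embedded_tables_or_images=False)
```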
2 changes: 1 addition & 1 deletion api-reference/how-to/embedding.mdx
@@ -52,7 +52,7 @@ To use the Ingest CLI or Ingest Python library to generate embeddings, do the fo
- `mixedbread-ai` for [Mixedbread](https://www.mixedbread.ai/). [Learn more](https://www.mixedbread.ai/docs/embeddings/overview).
- `octoai` for [Octo AI](https://octo.ai/). [Learn more](https://octo.ai/docs/text-gen-solution/using-unstructured-io-for-embedding-documents).

2. Run the following command to install the required Python pacakge for the embedding provider:
2. Run the following command to install the required Python package for the embedding provider:

- For `langchain-aws-bedrock`, run `pip install "unstructured-ingest[bedrock]"`.
- For `langchain-huggingface`, run `pip install "unstructured-ingest[embed-huggingface]"`.
2 changes: 1 addition & 1 deletion api-reference/ingest/ingest-cli.mdx
@@ -19,7 +19,7 @@ import AdditionalIngestDependencies from '/snippets/general-shared-text/ingest-d

<AdditionalIngestDependencies />

For additional installation options, see [Unstructed Ingest CLI](/ingestion/overview#unstructured-ingest-cli) in the [Ingest](/ingestion/overview) section.
For additional installation options, see [Unstructured Ingest CLI](/ingestion/overview#unstructured-ingest-cli) in the [Ingest](/ingestion/overview) section.

<Info>To migrate from older, deprecated versions of the Ingest CLI that used `pip install unstructured`, see the [migration guide](/ingestion/overview#migration-guide).</Info>

6 changes: 3 additions & 3 deletions api-reference/ingest/overview.mdx
@@ -25,7 +25,7 @@ The following 3-minute video shows how to use the Unstructured Ingest Python lib
## Ingest flow

The Unstructured ingest flow is similar to an extract, transform and load (ETL) data pipeline.
Because of this, a customer-defined implementation of the Unstructured ingest flow is sometimes referred to an an _ingest pipeline_ or simply a _pipline_.
Because of this, a customer-defined implementation of the Unstructured ingest flow is sometimes referred to as an _ingest pipeline_ or simply a _pipeline_.
An Unstructured ingest pipeline contains the following logical steps:

<Steps>
@@ -35,7 +35,7 @@ An Unstructured ingest pipeline contains the following logical steps:
For example, this could include information such as the path to the files to be analyzed.

- For the Unstructured CLI, you can control this behavior, where available for a connector, through its `--input-path` command option.
- For the Unstructured Ingest Python library's v2 calling pattern, you can control this behavior, where available for a connector, through its `<Prefix>IndexerConfig` class (where `<Prefex>` represents the connector provider's name, such as `Azure` for Azure.)
- For the Unstructured Ingest Python library's v2 calling pattern, you can control this behavior, where available for a connector, through its `<Prefix>IndexerConfig` class (where `<Prefix>` represents the connector provider's name, such as `Azure` for Azure.)
</Step>
<Step title="Post-Index Filter">
After indexing, you might not want to download everything that was indexed.
@@ -47,7 +47,7 @@ An Unstructured ingest pipeline contains the following logical steps:
<Step title="Download">
Using the information generated from the indexer and the filter, downloads the content as files on the local file system for processing. This may require manipulation of the data to prepare it for partitioning.

For example, this could incude information such as the path to a local directory to download files to.
For example, this could include information such as the path to a local directory to download files to.

- For the Unstructured CLI, you can control this behavior through a connector's `--download-dir` command option.
- For the Unstructured Ingest Python library's v2 calling pattern, you can control this behavior through a connector's `<Prefix>DownloaderConfig` class.
2 changes: 1 addition & 1 deletion api-reference/ingest/python-ingest.mdx
@@ -33,7 +33,7 @@ import AdditionalIngestDependencies from '/snippets/general-shared-text/ingest-d

<AdditionalIngestDependencies />

For additional installation options, and information about v2 and v1 implementations in this library, see the [Unstructed Ingest Python library](/ingestion/overview#unstructured-ingest-python-library) in the [Ingest](/ingestion/overview) section.
For additional installation options, and information about v2 and v1 implementations in this library, see the [Unstructured Ingest Python library](/ingestion/overview#unstructured-ingest-python-library) in the [Ingest](/ingestion/overview) section.

<Info>To migrate from older, deprecated versions of the Ingest Python library that used `pip install unstructured`, see the [migration guide](/ingestion/overview#migration-guide).</Info>

2 changes: 1 addition & 1 deletion api-reference/troubleshooting/api-key-url.mdx
@@ -22,7 +22,7 @@ SDKError: API error occurred: Status 401

1. The Unstructured API key, Unstructured API URL, or both are missing or malformed in your script or code.
2. The API key, API URL, or both are not present in your current session.
3. The API key is no longer valid, or the the API key and API URL combination is not valid.
3. The API key is no longer valid, or the API key and API URL combination is not valid.
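
The second cause (values not present in the current session) can be checked before making any API call. The sketch below assumes the key and URL are stored in environment variables named `UNSTRUCTURED_API_KEY` and `UNSTRUCTURED_API_URL`; those names and the helper `missing_unstructured_settings` are assumptions for illustration, so substitute whatever names your script actually reads.

```python
import os
from typing import List, Mapping, Optional


def missing_unstructured_settings(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of required settings that are unset or blank.

    The variable names below are assumed for this sketch; adjust them to
    match your own script.
    """
    env = os.environ if env is None else env
    required = ("UNSTRUCTURED_API_KEY", "UNSTRUCTURED_API_URL")
    return [name for name in required if not env.get(name, "").strip()]


missing = missing_unstructured_settings()
if missing:
    print("Missing or empty in this session: " + ", ".join(missing))
```

This only confirms that the values exist; it cannot tell whether a key is still valid or matches the URL, which is the third cause.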

## Suggested solutions

2 changes: 1 addition & 1 deletion examplecode/codesamples/api/huggingchat.mdx
@@ -10,7 +10,7 @@ along with some queries about this content.

To run this example, you'll need:

- The [hugchat](https://pypi.org/project/hugchat/) pacakge for Python, or the [huggingface-chat](https://www.npmjs.com/package/huggingface-chat) pacakge for JavaScript/TypeScript.
- The [hugchat](https://pypi.org/project/hugchat/) package for Python, or the [huggingface-chat](https://www.npmjs.com/package/huggingface-chat) package for JavaScript/TypeScript.
- Your Unstructured API key and API URL. [Get an API key and API URL](/api-reference/api-services/saas-api-development-guide#get-started).
- Your Hugging Face account's email address and account password. [Get an account](https://huggingface.co/join).
- A PDF file for Unstructured to process. This example uses a sample PDF file containing the text of the United States Constitution,
4 changes: 2 additions & 2 deletions examplecode/notebooks.mdx
@@ -40,9 +40,9 @@ description: "Notebooks contain complete working sample code for end to end solu
<br/>
```Unstructured``` ```🤗 Hugging Face``` ```LangChain``` ```Llama 3```
</Card>
<Card title="Building RAG With Powerpoint presentations" href="https://colab.research.google.com/drive/1NmLSmUMb9ozlELnWa3J4WwdrBfGomwPk?usp=sharing">
<Card title="Building RAG With PowerPoint presentations" href="https://colab.research.google.com/drive/1NmLSmUMb9ozlELnWa3J4WwdrBfGomwPk?usp=sharing">
<br/>
A RAG solution that is based on Powerpoint files.
A RAG solution that is based on PowerPoint files.
<br/>
```Unstructured``` ```🤗 Hugging Face``` ```LangChain``` ```Llama 3```
</Card>
2 changes: 1 addition & 1 deletion faq/faq.mdx
@@ -44,7 +44,7 @@ Yes, you can still use your old API keys. We will migrate all the user keys to t

When you log in to the Serverless API dashboard, you can access your API keys by clicking the `API Keys` link in the side navigation.
Under the `Actions` column, click the `Copy` icon to copy the key or an example code snippet to process the documents
using the Unstructured REST API, or the [Unstructured Ingest CLI](/ingestion/overview#unstructured-ingest-cli), or the [Unstructured Python SDK](https://github.com/Unstructured-IO/unstructured-python-client) or [Unstructured JavaScript/Typescript SDK](https://github.com/Unstructured-IO/unstructured-js-client).
using the Unstructured REST API, or the [Unstructured Ingest CLI](/ingestion/overview#unstructured-ingest-cli), or the [Unstructured Python SDK](https://github.com/Unstructured-IO/unstructured-python-client) or [Unstructured JavaScript/TypeScript SDK](https://github.com/Unstructured-IO/unstructured-js-client).

### What is the new Unstructured API pricing structure?

2 changes: 1 addition & 1 deletion ingestion/overview.mdx
@@ -84,7 +84,7 @@ flowchart LR
- For the Unstructured Ingest CLI or the Unstructured Ingest Python library, to use this flow:

- When using the Unstructured Ingest CLI, omit the `--partition-by-api`, `--api-key`, and `--partition-endpoint` options.
- When using the Unstructured Ingest Python library, omit `partition_by_api` or explicitly set `parition_by_api=False`. Also omit `api_key` and `partition_endpoint`.
- When using the Unstructured Ingest Python library, omit `partition_by_api` or explicitly set `partition_by_api=False`. Also omit `api_key` and `partition_endpoint`.

## Unstructured Ingest CLI

2 changes: 1 addition & 1 deletion open-source/concepts/models.mdx
@@ -16,7 +16,7 @@ elements = partition(filename=filename,

* To use any model with the partition, set the `strategy` to `hi_res` as shown above.

* To maintain the consistency between the `unstructured` and `unstructured-api` libraries, we are deprecating the `model_name` parameter. Please use `hi_res_model_name` parameter when specifing a model.
* To maintain the consistency between the `unstructured` and `unstructured-api` libraries, we are deprecating the `model_name` parameter. Please use `hi_res_model_name` parameter when specifying a model.
</Note>

The `hi_res_model_name` parameter supports the `yolox` and `detectron2_onnx` arguments.
2 changes: 1 addition & 1 deletion open-source/core-functionality/chunking.mdx
@@ -84,7 +84,7 @@ import SharedChunkingStrategyBasic from '/snippets/concepts/chunking-strategy-ba

<SharedChunkingStrategyBasic/>

import SharedChunkingStrategyByTitle from '/snippets/concepts/chunking-stategy-by-title.mdx';
import SharedChunkingStrategyByTitle from '/snippets/concepts/chunking-strategy-by-title.mdx';

<SharedChunkingStrategyByTitle/>

4 changes: 2 additions & 2 deletions open-source/core-functionality/cleaning.mdx
@@ -151,9 +151,9 @@ Examples:
```python
from unstructured.cleaners.core import clean_non_ascii_chars

text = "\x88This text contains®non-ascii characters!●"
text = "\x88This text contains ®non-ascii characters!●"

# Returns "This text containsnon-ascii characters!"
# Returns "This text contains non-ascii characters!"
clean_non_ascii_chars(text)

```
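
For readers without the `unstructured` package installed, the effect shown in the corrected example can be reproduced with the standard library alone. This is a sketch of the same behavior, not necessarily how `clean_non_ascii_chars` is implemented; `strip_non_ascii` is a hypothetical name.

```python
def strip_non_ascii(text: str) -> str:
    """Drop every character that cannot be encoded as ASCII."""
    # errors="ignore" silently discards characters outside the ASCII range.
    return text.encode("ascii", "ignore").decode("ascii")


text = "\x88This text contains ®non-ascii characters!●"

# Returns "This text contains non-ascii characters!"
cleaned = strip_non_ascii(text)
```

Because characters are dropped rather than replaced, the space added before `®` in this commit is what keeps "contains" and "non-ascii" separated in the output.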