Update readthedocs to reference endpoints
erinys committed May 18, 2024
1 parent 3d1759f commit 9693c44
Showing 11 changed files with 81 additions and 73 deletions.
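
The change repeated across these files is mechanical: every `xet://<repo_owner>/...` URL gains an `<endpoint>:` prefix. A minimal sketch of that rewrite follows, for anyone updating their own scripts or docs. The helper name `add_endpoint` and the regex are illustrative assumptions, not part of pyxet, and the sketch assumes the endpoint itself contains no `:` (for example, no port number).

```python
import re

# Public endpoint named in the updated README; enterprise deployments differ.
DEFAULT_ENDPOINT = "xethub.com"

# Matches "xet://" only when the first segment lacks an "<endpoint>:" prefix.
# Hypothetical pattern: assumes an endpoint never contains "/" or ":".
_OLD_STYLE = re.compile(r"xet://(?![^/\s:]+:)")


def add_endpoint(text: str, endpoint: str = DEFAULT_ENDPOINT) -> str:
    """Rewrite old-style xet:// URLs in `text` to the endpoint-qualified form.

    URLs that already carry an endpoint are left untouched, so the function
    can be applied repeatedly without stacking prefixes.
    """
    return _OLD_STYLE.sub(lambda _match: f"xet://{endpoint}:", text)


if __name__ == "__main__":
    print(add_endpoint("xet ls xet://<username>/titanic/main"))
```

Because already-qualified URLs are skipped, running the helper over a file that mixes old and new forms should be safe.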
28 changes: 15 additions & 13 deletions README.md
@@ -103,15 +103,17 @@ Our examples are based on a small Titanic dataset you can see and explore [here]

A XetHub URL for pyxet is in the form:
```
xet://<repo_owner>/<repo_name>/<branch>/<path_to_file>
xet://<endpoint>:<repo_owner>/<repo_name>/<branch>/<path_to_file>
```

Use our public `xethub.com` endpoint unless you're on a custom enterprise deployment.

Reading files from pyxet is easy: `pyxet.open` on a Xet path will return a
python file-like object which you can directly read from.

```python
import pyxet
print(pyxet.open('xet://XetHub/titanic/main/README.md').readlines())
print(pyxet.open('xet://xethub.com:XetHub/titanic/main/README.md').readlines())
```


@@ -125,7 +127,7 @@ dataframe:
import pyxet # make xet:// protocol available
import pandas as pd # assumes pip install pandas has been run

df = pd.read_csv('xet://XetHub/titanic/main/titanic.csv')
df = pd.read_csv('xet://xethub.com:XetHub/titanic/main/titanic.csv')
df
```

@@ -198,11 +200,11 @@ To write files with pyxet, you need to first make a repository you have access t
An easy thing you can do is to simply fork the titanic repo. You can do so with

```bash
xet repo fork xet://XetHub/titanic
xet repo fork xet://xethub.com:XetHub/titanic
```
(see the Xet CLI documentation below)

This will create a private version of the titanic repository under `xet://<username>/titanic`.
This will create a private version of the titanic repository under `xet://xethub.com:<username>/titanic`.

Unlike typical blob stores, XetHub writes are *transactional*. This means the
entire write succeeds, or the entire write fails
@@ -229,35 +231,35 @@ The Xet Command line is the easiest way to interact with a Xet repository.
## Listing and time travel
You can browse the repository with:
```bash
xet ls xet://<username>/titanic/main
xet ls xet://xethub.com:<username>/titanic/main
```

You can even browse it at any point in history (say 5 minutes ago) with:
```bash
xet ls xet://<username>/titanic/main@{5.minutes.ago}
xet ls xet://xethub.com:<username>/titanic/main@{5.minutes.ago}
```

## Downloading
This syntax works everywhere, you can download files with `xet cp`
```bash
# syntax is similar to AWS CLI
xet cp xet://<username>/titanic/main/<path> <local_path>
xet cp xet://<username>/titanic/main@{5.minutes.ago}/<path> <local_path>
xet cp xet://xethub.com:<username>/titanic/main/<path> <local_path>
xet cp xet://xethub.com:<username>/titanic/main@{5.minutes.ago}/<path> <local_path>
```

And you can also use `xet cp` to upload files:

## Uploading
```bash
xet cp <file/directory> xet://<username>/titanic/main/<path>
xet cp <file/directory> xet://xethub.com:<username>/titanic/main/<path>
```
Of course, you cannot rewrite history, so uploading to `main@{5.minutes.ago}`
is prohibited.

## Branches
You can easily create branches for collaboration:
```bash
xet branch make xet://<username>/titanic main another_branch
xet branch make xet://xethub.com:<username>/titanic main another_branch
```
This is fast regardless of the size of the repo.

@@ -267,9 +269,9 @@ copy of a file which you accidentally overwrote:

```bash
# copying across branch
xet cp xet://<username>/titanic/branch/<file> xet://<username>/titanic/main/<file>
xet cp xet://xethub.com:<username>/titanic/branch/<file> xet://xethub.com:<username>/titanic/main/<file>
# copying from history to current
xet cp xet://<username>/titanic/main@{5.minutes.ago}/<file> xet://<username>/titanic/main/<file>
xet cp xet://xethub.com:<username>/titanic/main@{5.minutes.ago}/<file> xet://xethub.com:<username>/titanic/main/<file>
```

## S3, GCP, etc
2 changes: 1 addition & 1 deletion docs/index.rst
@@ -97,7 +97,7 @@ Read a CSV file:
import pyxet # make xet:// protocol available
import pandas as pd # assumes pip install pandas has been run
df = pd.read_csv('xet://XetHub/titanic/main/titanic.csv')
df = pd.read_csv('xet://xethub.com:XetHub/titanic/main/titanic.csv')
Checkout the rest of the documentation for detailed usage examples!

62 changes: 31 additions & 31 deletions docs/markdowns/cli.md
@@ -43,23 +43,23 @@ list files and folders
```bash

# list files and directories
$ xet ls xet://user/repo/main/path/to/file/or/dir
$ xet ls xet://xethub.com:user/repo/main/path/to/file/or/dir
# list repos
$ xet ls xet://user/
$ xet ls xet://xethub.com:user/

# list organisation users
$ xet ls
# list all available repos for current user + organisation
$ xet repo ls
# list all available branches for a project
$ xet branch ls xet://user/repo
$ xet branch ls xet://xethub.com:user/repo

# examples
$ xet ls xet://xdssio/gitease/giteas
$ xet ls xet://xethub.com:xdssio/gitease/gitease
xdssio/gitease/gitease/__init__.py 2 file
xdssio/gitease/gitease/cli.py 6146 file

$ xet ls xet://xdssio
$ xet ls xet://xethub.com:xdssio
name type
---------------------------- ------
xdssio/datasets repo
@@ -98,10 +98,10 @@ $ xet ls s3://<bucket>

```bash
# Copy files or directories
$ xet cp xet://user/repo/branch/path/to/source xet://user/repo/branch/path/to/target
$ xet cp xet://xethub.com:user/repo/branch/path/to/source xet://xethub.com:user/repo/branch/path/to/target

# examples
$ xet cp xet://xdssio/titanic/experiment-1/titanic.csv xet://xdssio/titanic/experiment-2/titanic.csv
$ xet cp xet://xethub.com:xdssio/titanic/experiment-1/titanic.csv xet://xethub.com:xdssio/titanic/experiment-2/titanic.csv
Copying xdssio/titanic/experiment-1/titanic.csv to xdssio/titanic/experiment-2/titanic.csv...
```

@@ -135,10 +135,10 @@ $ xet cp s3://... xet://...

```bash

$ xet mv xet://user/repo/branch/path/to/source xet://user/repo/branch/path/to/target
$ xet mv xet://xethub.com:user/repo/branch/path/to/source xet://xethub.com:user/repo/branch/path/to/target

# examples
$ xet mv xet://xdssio/titanic/experiment-1/titanic.csv xet://xdssio/titanic/experiment-1/titanic2.csv
$ xet mv xet://xethub.com:xdssio/titanic/experiment-1/titanic.csv xet://xethub.com:xdssio/titanic/experiment-1/titanic2.csv
```

## rm (delete)
@@ -159,10 +159,10 @@ $ xet mv xet://xdssio/titanic/experiment-1/titanic.csv xet://xdssio/titanic/expe
### Usage

```bash
xet rm xet://user/repo/branch/path/to/file/or/dir
xet rm xet://xethub.com:user/repo/branch/path/to/file/or/dir

# examples
$ xet rm xet://xdssio/titanic/experiment-2/titanic2.csv
$ xet rm xet://xethub.com:xdssio/titanic/experiment-2/titanic2.csv
Synchronizing with remote
```

@@ -185,10 +185,10 @@ Prints a file to stdout
### Usage

```bash
xet cat xet://user/repo/branch/path/to/file
xet cat xet://xethub.com:user/repo/branch/path/to/file

# examples
xet cat xet://xdssio/titanic/main/titanic.csv --limit=200
xet cat xet://xethub.com:xdssio/titanic/main/titanic.csv --limit=200
PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thaye%
@@ -201,10 +201,10 @@ Provide information about a project branch or file.
### Usage
```bash
$ xet info xet://user/repo/branch/path/to/file
$ xet info xet://xethub.com:user/repo/branch/path/to/file
# examples
$ xet info xet://xdssio/titanic/main/titanic.csv
$ xet info xet://xethub.com:xdssio/titanic/main/titanic.csv
------------------------------- ----- ----
xdssio/titanic/main/titanic.csv 61194 file
------------------------------- ----- ----
@@ -220,15 +220,15 @@ This is great for data exploration and analysis.
```bash
╭─ Arguments ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ * source TEXT Repository and branch of the form xet://user/repo/branch [default: None] [required]
│ * source TEXT Repository and branch of the form xet://xethub.com:user/repo/branch [default: None] [required] │
│ * path TEXT Path to mount to. (or a drive letter on windows) [default: None] [required] │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
```
### Usage
```bash
$ xet mount xet://user/repo/branch /path/to/local/dir
$ xet mount xet://xethub.com:user/repo/branch /path/to/local/dir
## examples
$ xet mount XetHub/Laion400M/main laion
@@ -258,21 +258,21 @@ The *branch* sub commands let you manage your project branches.
# Create a new branch
$ xet branch make repo source-branch new-branch
# List branches
$ xet branch ls xet://user/repo
$ xet branch ls xet://xethub.com:user/repo
# Delete a branch
$ xet branch delete xet://user/repo/new-branch
$ xet branch delete xet://xethub.com:user/repo/new-branch
# examples
$ xet branch make xet://xdssio/titanic main experiment-3
$ xet branch list xet://xdssio/titanic
$ xet branch make xet://xethub.com:xdssio/titanic main experiment-3
$ xet branch list xet://xethub.com:xdssio/titanic
name type
------------ ------
experiment-2 branch
experiment-1 branch
experiment-3 branch
main branch
$ xet branch delete xet://xdssio/titanic experiment-3 --yes
$ xet branch delete xet://xethub.com:xdssio/titanic experiment-3 --yes
---------------------------------------------------
WARNING
---------------------------------------------------
@@ -281,7 +281,7 @@ Any data which only exists on a branch will become irreversibly inaccessible
--yes is set. Issuing deletion
$ xet branch info xet://xdssio/titanic main
$ xet branch info xet://xethub.com:xdssio/titanic main
```
## repo
@@ -307,14 +307,14 @@ at https://xethub.com/<user>/<repo-name>/settings.
# Create a new repository
$ xet repo make repo-name
# List repositories
$ xet repo ls xet://user
$ xet repo ls xet://xethub.com:user
# fork a repository
$ xet repo fork xet://user/repo-name xet://user/new-repo-name --private
$ xet repo fork xet://xethub.com:user/repo-name xet://xethub.com:user/new-repo-name --private
# Rename a repository
$ xet repo rename xet://user/repo-name new-repo-name
$ xet repo rename xet://xethub.com:user/repo-name new-repo-name
examples
xet repo fork xet://xdssio/titanic xet://xdssio/titanic-fork -p
xet repo fork xet://xethub.com:xdssio/titanic xet://xethub.com:xdssio/titanic-fork -p
```
## sync
@@ -324,7 +324,7 @@ xet repo fork xet://xdssio/titanic xet://xdssio/titanic-fork -p
is provided, then a file whose size is the same will be copied if the modification time for the source is
*later* than the target. Note that this flag makes the sync significantly slower.
* Only non-xet sources (e.g. S3 or local filesystem) are allowed.
* Only XetHub targets are allowed (i.e. `xet://<user>/<repo>/<branch>`).
* Only XetHub targets are allowed (i.e. `xet://xethub.com:<user>/<repo>/<branch>`).
* Modifying source files while a sync is happening has undefined behavior for whether those files copy.
```bash
@@ -345,10 +345,10 @@ xet repo fork xet://xdssio/titanic xet://xdssio/titanic-fork -p
```bash
# Sync remote S3 bucket to repo
$ xet sync s3://bucket/path/to/source xet://user/repo/branch/path/to/target
$ xet sync s3://bucket/path/to/source xet://xethub.com:user/repo/branch/path/to/target
# Example sync from S3
$ xet sync s3://my-files xet://XetHub/import-test/my-files
$ xet sync s3://my-files xet://xethub.com:XetHub/import-test/my-files
Checking sync
Starting sync
Copying my-files/data.csv to XetHub/import-test/my-files/data.csv...
@@ -357,7 +357,7 @@ Copying my-files/data2.csv to XetHub/import-test/my-files/data2.csv...
Completed sync. Copied: 20 files, ignored: 277 files
# Example sync from local
$ xet sync . xet://XetHub/import-test/my-local-files
$ xet sync . xet://xethub.com:XetHub/import-test/my-local-files
Checking sync
Starting sync
Copying ./dir/data.csv to XetHub/import-test/my-local-files/data.csv...
10 changes: 6 additions & 4 deletions docs/markdowns/filesystem.md
@@ -7,9 +7,11 @@ library. Use it to access local files, remote files, and files in XetHub.

Xet URLs are in the form:
```sh
xet://<repo_owner>/<repo_name>/<branch>/<path_to_file>
xet://<endpoint>:<repo_owner>/<repo_name>/<branch>/<path_to_file>
```

Use our public `xethub.com` endpoint unless you're on a custom enterprise deployment.

The `<path_to_file>` argument is optional if the URL
refers to a repository and the `xet://` prefix is optional when using pyxet.XetFS.

@@ -35,10 +37,10 @@ Example usage of `pyxet.XetFS`:
fs = pyxet.XetFS()

# List files in the repository.
files = fs.ls('xet://XetHub/Flickr30k/main')
files = fs.ls('xet://xethub.com:XetHub/Flickr30k/main')

# Open a file from the repository.
f = fs.open('xet://XetHub/Flickr30k/main/results.csv')
f = fs.open('xet://xethub.com:XetHub/Flickr30k/main/results.csv')

# Read the contents of the file.
contents = f.read()
@@ -95,6 +97,6 @@ xet:// URLs must be used as file paths when interacting with these packages. For
import pyxet # make xet protocol available to fsspec
import pandas as pd

df = pd.read_csv('xet://XetHub/Flickr30k/main/results.csv')
df = pd.read_csv('xet://xethub.com:XetHub/Flickr30k/main/results.csv')
```

4 changes: 2 additions & 2 deletions docs/markdowns/model_versioning.md
@@ -81,8 +81,8 @@ Optionally, use [Git submodules](https://git-scm.com/book/en/v2/Git-Tools-Submod


```bash
xet://org/project (branch-prod/dev/ab-test-210322/ab-test-210323)
├── data/ xet://org/project/data (submodule)
xet://xethub.com:org/project (branch-prod/dev/ab-test-210322/ab-test-210323)
├── data/ xet://xethub.com:org/project/data (submodule)
│ ├── data.csv
├── models/
│ ├── model.pkl
8 changes: 4 additions & 4 deletions docs/markdowns/model_versioning_tutorial.md
@@ -69,7 +69,7 @@ fs = pyxet.XetFS()
# a transaction is needed for write
with fs.transaction as tr:
tr.set_commit_message("Adding data")
fs.cp("data/titanic.csv", "xet://${XET_USER_NAME}/kickstart_data/main/titanic.csv")
fs.cp("data/titanic.csv", "xet://xethub.com:${XET_USER_NAME}/kickstart_data/main/titanic.csv")
```

We can delete our local data file: `rm -rf data`.
@@ -94,7 +94,7 @@ Let’s adjust our Jupyter Notebook to load the data from “local” and not sa

```bash
...
# df = pd.read_csv("xet://xdssio/titanic/main/titanic.csv") <-- delete
# df = pd.read_csv("xet://xethub.com:xdssio/titanic/main/titanic.csv") <-- delete
df = pd.read_csv("../data/titanic.csv")
...
# df.to_csv('../data/data.csv', index=False) <-- delete
@@ -254,7 +254,7 @@ import pandas as pd
username = os.getenv('XET_USER_NAME')
results = []
for branch in ["prod", "experiment1"]:
results.append(pd.read_csv(pyxet.open(f"xet://{username}/kickstart_ml/{branch}/metrics/results.csv")))
results.append(pd.read_csv(pyxet.open(f"xet://xethub.com:{username}/kickstart_ml/{branch}/metrics/results.csv")))
df = pd.concat(results)
df = df[df['target']=='weighted avg']
@@ -287,7 +287,7 @@ import pyxet
fs = pyxet.XetFS()
with fs.transaction as tr:
tr.set_commit_message("Adding more data")
fs.cp("data/titanic.csv", "xet://${XET_USER_NAME}/kickstart_data/main/titanic2.csv")
fs.cp("data/titanic.csv", "xet://xethub.com:${XET_USER_NAME}/kickstart_data/main/titanic2.csv")
```
We can have any naming convention for our ”training-cycle-jobs” branches.
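
For reference, the URL form this commit documents, `xet://<endpoint>:<repo_owner>/<repo_name>/<branch>/<path_to_file>`, can be split into its parts with a few lines of standard-library Python. This is a hedged sketch rather than pyxet's own parser: `XetURL` and `parse_xet_url` are hypothetical names, and it assumes that branch names contain no `/` and that at least an owner and a repo are present (so owner-only listing URLs such as `xet://xethub.com:user/` are rejected here).

```python
from typing import NamedTuple, Optional


class XetURL(NamedTuple):
    """Components of an endpoint-qualified xet:// URL (illustrative type)."""
    endpoint: str
    owner: str
    repo: str
    branch: Optional[str]
    path: Optional[str]


def parse_xet_url(url: str) -> XetURL:
    """Split xet://<endpoint>:<owner>/<repo>[/<branch>[/<path>]] into parts."""
    prefix = "xet://"
    if not url.startswith(prefix):
        raise ValueError(f"not a xet:// URL: {url!r}")
    rest = url[len(prefix):]

    # The endpoint is everything before the first ":"; old-style URLs have none.
    endpoint, sep, remainder = rest.partition(":")
    if not sep or not endpoint or "/" in endpoint:
        raise ValueError(f"missing '<endpoint>:' prefix: {url!r}")

    # Split into at most owner, repo, branch, and the remaining file path.
    parts = remainder.split("/", 3)
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError(f"expected at least <owner>/<repo>: {url!r}")

    branch = parts[2] if len(parts) > 2 and parts[2] else None
    path = parts[3] if len(parts) > 3 and parts[3] else None
    return XetURL(endpoint, parts[0], parts[1], branch, path)


if __name__ == "__main__":
    print(parse_xet_url("xet://xethub.com:XetHub/titanic/main/titanic.csv"))
```

Rejecting URLs without an endpoint is a deliberate choice in this sketch: it surfaces any old-style URL that was missed during the update instead of silently guessing an endpoint.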