proposed functionality for loading/saving analyses #50

Closed
magland opened this issue Jun 10, 2024 · 8 comments

@magland
Collaborator

magland commented Jun 10, 2024

@WardBrian @jsoules

Below are several possible conventions for loading/saving Stan Playground (SP) analyses/projects.

Specify individual files in URL query parameters

https://stan-playground.vercel.app?...query...

main.stan - URL to remote main.stan file
data.json - URL to remote data.json file
sampling_opts.json - URL to remote sampling_opts.json file
t - title

FUTURE:
data.py - URL to remote data.py file for generating data.json
data.r - URL to remote data.r file for generating data.json
analyze.py (or similar) - URL to remote analyze.py file for downstream analysis of sampling output - generating figures, tables, etc
analyze.r (or similar) - same, but for the R language

POSSIBLE FUTURE:
samples.csv
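
As a rough illustration of this convention (a sketch only, not the playground's actual code), the query string could be assembled as below. The helper name `build_playground_url` and the `example.org` URLs are hypothetical; the parameter names `main.stan`, `data.json`, and `t` come from the list above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "https://stan-playground.vercel.app"


def build_playground_url(file_urls, title=None):
    """Return a playground URL whose query maps file names to remote URLs.

    `file_urls` is a dict such as {"main.stan": "https://..."}.
    This helper is hypothetical and only illustrates the proposed convention.
    """
    params = dict(file_urls)
    if title is not None:
        params["t"] = title  # "t" is the title parameter in this proposal
    # urlencode percent-encodes the remote URLs so they survive as values
    return f"{BASE_URL}?{urlencode(params)}"


url = build_playground_url(
    {
        "main.stan": "https://example.org/model/main.stan",
        "data.json": "https://example.org/model/data.json",
    },
    title="Example model",
)
print(url)
```

Percent-encoding the values matters here because each value is itself a URL that may carry its own `?` or `&` characters.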

Specify individual files in URL query parameters (different naming, from #39)

https://stan-playground.vercel.app?...query...

stanURL - URL to remote main.stan file
dataURL - URL to remote data.json file
samplingOptsURL - URL to remote sampling_opts.json file
t - title
project (or projectURL) - URL to a project.json file that contains any/all of the above

FUTURE:
dataPyURL
dataRURL
analyzePyURL
analyzeRURL

Load from a GitHub Gist

`https://stan-playground.vercel.app?project=https://gist.github.com/magland/da3d4143276827609ec9317bb3db8b04`

The Gist is a directory of files including main.stan, data.json, sampling_opts.json, etc.

The description of the Gist is the title
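
A minimal sketch of how a Gist URL might be turned into a project, assuming the GitHub REST endpoint `GET https://api.github.com/gists/{gist_id}`, whose response carries a `description` string and a `files` map keyed by file name. The function names and the shape of the returned project dict are hypothetical, and the HTTP request itself is left out so the example stays self-contained.

```python
import re

# Matches https://gist.github.com/<owner>/<hex id>, with an optional trailing slash
GIST_URL_PATTERN = re.compile(
    r"^https://gist\.github\.com/(?P<owner>[^/]+)/(?P<gist_id>[0-9a-f]+)/?$"
)


def gist_api_url(gist_url):
    """Map a Gist page URL to the corresponding GitHub API URL."""
    match = GIST_URL_PATTERN.match(gist_url)
    if match is None:
        raise ValueError(f"not a Gist URL: {gist_url}")
    return f"https://api.github.com/gists/{match.group('gist_id')}"


def project_from_gist(gist_json):
    """Build a project dict from an already-decoded Gist API response.

    The Gist description becomes the title; each file's content is kept
    under its file name (main.stan, data.json, ...).
    """
    files = {name: entry["content"] for name, entry in gist_json["files"].items()}
    return {"title": gist_json.get("description") or "", "files": files}
```

Note that the API may truncate the `content` of large files, in which case a real loader would need to follow each file's `raw_url` instead.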

Load from a single project file, reference file system

`https://stan-playground.vercel.app?project=[projectURL]`

The project parameter points to a remote JSON file that is a reference file system containing main.stan, etc.

More information about reference file system

https://fsspec.github.io/kerchunk/spec.html

Here's an example rfs

{
    "version": 1,
    "meta": {
        "title": "..."
    },
    "refs": {
      "main.stan": "<the STAN project text>",
      "data.json": "...",
      "sampling_opts.json": "..."
    }
}

Importantly you can also do this (this is why it's called a "reference file system")

{
    "version": 1,
    "meta": {
        "title": "..."
    },
    "refs": {
      "main.stan": [stanURL],
      "data.json": [dataURL],
      "sampling_opts.json": "..."
    }
}

So you can mix and match inline/external references
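
To make the inline/external distinction concrete, here is a hedged sketch of a resolver. It assumes the version 1 reference conventions from the kerchunk spec linked above: a string value is inline data (optionally prefixed with `base64:`), a one-element list is a URL to fetch whole, and a three-element list is `[url, offset, length]`. The name `resolve_refs` and the injected `fetch` callable (URL in, bytes out) are hypothetical.

```python
import base64


def resolve_refs(rfs, fetch):
    """Return {file name: bytes} for every entry in rfs["refs"].

    `fetch` is any callable that takes a URL and returns bytes; it is
    injected so that the sketch does not depend on a particular HTTP client.
    """
    resolved = {}
    for name, ref in rfs["refs"].items():
        if isinstance(ref, str):
            # Inline content: plain text, or base64-encoded binary
            if ref.startswith("base64:"):
                resolved[name] = base64.b64decode(ref[len("base64:"):])
            else:
                resolved[name] = ref.encode("utf-8")
        elif isinstance(ref, list) and len(ref) == 1:
            # External reference to a whole remote file
            resolved[name] = fetch(ref[0])
        elif isinstance(ref, list) and len(ref) == 3:
            # External reference to a byte range of a remote file
            url, offset, length = ref
            resolved[name] = fetch(url)[offset:offset + length]
        else:
            raise ValueError(f"unsupported reference for {name!r}: {ref!r}")
    return resolved
```

For simplicity the byte-range case downloads the whole file and slices it; a real implementation would more likely issue an HTTP range request.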

Save to new Gist

Save the current analysis/project as a new Gist with the above convention

Update an existing Gist

Save an analysis/project as an update to a Gist that has been loaded

Save to new remote project file

Save to a remote project file in the cloud (e.g., tempory.net - a service I am hosting for temporary storage of files)

This project file could be a reference file system, or other simpler structure.

If it is a reference file system, the user could choose which files stay as references (for example, to a GitHub repo).

Save individual files

In the case where the URL specifies individual independent files, allow the user to save one or more of these files (e.g., main.stan) to new locations. This offers a way of saving to the cloud while still retaining the query pointer to individual files.

Download to local machine

This could be a .zip file or a .json file with reference file system.

Load from uploaded file from local machine

This could be a .zip file or a .json file with reference file system.
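
For the .zip variant of download and upload, a round trip could look roughly like the sketch below. It assumes a project is represented as a flat dict of file name to text content; that representation and both function names are hypothetical.

```python
import io
import zipfile


def project_to_zip(files):
    """Pack {file name: text} into the bytes of a .zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, text in files.items():
            archive.writestr(name, text)
    return buffer.getvalue()


def project_from_zip(data):
    """Unpack the bytes of a .zip archive back into {file name: text}."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return {
            name: archive.read(name).decode("utf-8")
            for name in archive.namelist()
        }


project = {"main.stan": "parameters { real x; }", "data.json": "{}"}
restored = project_from_zip(project_to_zip(project))
```

A real loader would also want to reject unexpected file names and oversized archives before reading them.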

Other considerations

In the future we may want to support multiple stan, data, or analysis files in a single analysis/project. We'd need to figure out the naming conventions for this.

The above does not address storing/loading from local browser storage, or how to persist local edits.

@magland
Collaborator Author

magland commented Jun 12, 2024

From our meeting today, we decided on the following for URL query strings

query parameters: stan, data, sampling_opts, title

Optionally, instead of sampling_opts, individual options can be given directly as query parameters (num_chains=..., seed=..., etc.).
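
A small sketch of how such a query string might be read, assuming the four parameter names above plus the two individual options named in this comment (`num_chains` and `seed`), both treated as integers. The function name `parse_query`, the returned structure, and the `example.org` URL are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

URL_PARAMS = ("stan", "data", "sampling_opts", "title")
INLINE_OPTS = ("num_chains", "seed")  # only these two are named in the thread


def parse_query(url):
    """Split a playground URL into top-level parameters and inline options."""
    query = parse_qs(urlsplit(url).query)
    # parse_qs returns a list per key; keep the first value, or None if absent
    parsed = {key: query[key][0] if key in query else None for key in URL_PARAMS}
    parsed["inline_opts"] = {
        key: int(query[key][0]) for key in INLINE_OPTS if key in query
    }
    return parsed


example = parse_query(
    "https://stan-playground.vercel.app"
    "?stan=https://example.org/main.stan&title=Demo&num_chains=4&seed=1"
)
```

How inline options should interact with a sampling_opts file when both are present is not settled in this thread, so the sketch simply returns them side by side.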

Upon making local edits, the query string will disappear so that the user is not given the false impression that they can share their local edits by copy/pasting the URL.
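
The URL transformation involved is small; as an illustration only (in the browser this would presumably go through the History API rather than Python), dropping the query string could look like this. The helper name `without_query` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def without_query(url):
    """Return `url` with its query string removed and everything else kept."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", parts.fragment))


print(without_query("https://stan-playground.vercel.app?stan=https://example.org/main.stan"))
```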

We did not yet decide whether loading from a gist should be allowed like the following. I think it should:

https://stan-playground.vercel.app?project=https://gist.github.com/magland/da3d4143276827609ec9317bb3db8b04

@WardBrian
Collaborator

While we didn’t discuss it, I am on board with a “project” query that encapsulates all the others. I am fine with it being a bit magic and directly accepting a Gist URL, but I think it would be nice if it were additionally able to load a JSON file (hosted anywhere) that has a reference-style representation of all the other parameters.

@magland
Collaborator Author

magland commented Jun 12, 2024

@WardBrian What do you think of the reference file system proposed above? It has a number of advantages, including being able to mix and match inline content with external references.

@WardBrian
Collaborator

Yes, I think that is a reasonable choice for such a feature. It’s a bit more complicated than just having it directly align with the query parameters, but being able to specify things in-line rather than forcing them to be at some other link also seems like it would be beneficial.

In the case of the sampler options, which is also a JSON, would we accept a URL, stringified JSON, and also a direct sub-object? Or just the URL or string? It seems complicated to support all of them, but also like a user would reasonably expect each.

@magland
Collaborator Author

magland commented Jun 12, 2024

> In the case of the sampler options, which is also a JSON, would we accept a URL, stringified JSON, and also a direct sub-object? Or just the URL or string? It seems complicated to support all of them, but also like a user would reasonably expect each.

With this PR (which has been merged), the reference file system can also allow JSON file content to be specified as a "dict". So, one can do the following

{
    ...
    "refs": {
        ...,
        "sampling_opts.json": {
            "num_chains": 8,
            "seed": 1,
            ...
        }
    }
}

which is a lot better than having to include the content as escaped JSON text. It's still not as straightforward as being able to specify the parameters directly, without "sampling_opts.json", but that sort of thing does not fit into the rfs framework.
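
To illustrate the three forms discussed here (a direct dict, stringified JSON, or an external reference), a loader might normalize them as in the sketch below. The name `load_json_ref` and the injected `fetch` callable (URL in, text or bytes out) are hypothetical; this is not the code from the merged PR.

```python
import json


def load_json_ref(ref, fetch):
    """Return the parsed JSON content of a reference, whatever form it takes."""
    if isinstance(ref, dict):
        # Direct sub-object, as in the "sampling_opts.json" example above
        return ref
    if isinstance(ref, str):
        # Inline content given as JSON text
        return json.loads(ref)
    if isinstance(ref, list) and len(ref) == 1:
        # External reference: fetch the remote file, then parse it
        return json.loads(fetch(ref[0]))
    raise ValueError(f"unsupported JSON reference: {ref!r}")
```

One ambiguity remains under these rules: a plain string is always treated as inline JSON text, so a URL has to be wrapped in a list to be fetched.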

Another option is that we could ALSO support a simple JSON that is not a rfs, that just gives the query parameters. I would personally vote to only support the rfs.

@WardBrian modified the milestone: 1.0 (Jun 13, 2024)
@WardBrian
Collaborator

I believe after #114, this is complete (at least re: v1). Supporting a reference-file-system style project parameter can be a separate issue if we still want to revisit that idea after v1

@magland
Collaborator Author

magland commented Jul 3, 2024

We can postpone the rfs until after v1.
But I think saving to browser, as named projects, could be useful, and maybe even desirable for v1.

@WardBrian
Collaborator

I don't see an outline for such a feature in this issue, so it would certainly warrant a separate one at this point
