Functions that import sample GMT remote files #1436
Functions to load the file are nice, though I am not sure that we should export them or include them in the API docs, since there's no stability guarantee for those files (see https://docs.generic-mapping-tools.org/dev/datasets/remote-data.html#cache-file-updates).

xref GenericMappingTools/gmtserver-admin#100, which has some discussion of alternate options for data files used in examples.

It looks like the consensus is that cache files should be used only in examples. Should the PyGMT examples just call the remote files directly, or should there be a Python function to load the remote files first before they're used in the examples?
This is a good question. I am not sure of the best solution here, but my current thinking is that we should either have a separate namespace for the functions related to GMT's cache data, relative to the earth relief, age, etc. grids, or a single public function like …

I think functions that take a … I'm not familiar with what it means to move the functions to a separate namespace. Does this mean there would be a separate class that would have sub-functions to call different datasets?
100% agree
Not a new class, just a different module structure. So there would still be …
As far as I can tell with the seaborn model, it would look something like this:
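A plausible sketch of that seaborn-style layout (the file and function names below are assumptions, not the actual PyGMT package structure):

```
pygmt/
└── datasets/
    ├── __init__.py         # re-exports the public loader functions
    ├── earth_relief.py     # load_earth_relief(), ...
    └── samples.py          # load_japan_quakes(), load_sample_bathymetry(), ...
```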
Then the function would look something like:
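A minimal sketch of what such a function could look like. The name `load_sample_data`, the registry contents, and the `_which` resolver hook are assumptions for illustration, not actual PyGMT API; in PyGMT the resolver would be `pygmt.which(fname=url, download="c")`, which downloads and caches the remote file.

```python
import pandas as pd


def load_sample_data(name, _which=None):
    """Load a GMT sample dataset into a pandas.DataFrame (illustrative sketch).

    ``_which`` lets callers inject a cache resolver (normally ``pygmt.which``)
    that maps a remote-file name like ``@tut_quakes.ngdc`` to a local path.
    """
    registry = {
        "japan_quakes": dict(url="@tut_quakes.ngdc", header=1, sep=r"\s+"),
    }
    entry = dict(registry[name])  # copy so popping "url" doesn't mutate the registry
    url = entry.pop("url")        # the remaining keys are read_csv options
    fname = _which(url) if _which is not None else url
    return pd.read_csv(fname, **entry)
```

Popping the `url` key before calling `read_csv` matters, because `read_csv` has no `url` keyword.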
Does that look about right?
That looks right. It would also be necessary to decide whether to deprecate …
👍 for me. The seaborn `load_dataset` function is essentially:

```python
def load_dataset(name, **kwargs):
    path = f"https://raw.githubusercontent.com/mwaskom/seaborn-data/master/{name}.csv"
    df = pd.read_csv(path, **kwargs)
    if name == "flights":
        months = df["month"].str[:3]
        df["month"] = pd.Categorical(months, months.unique())
    if name == "titanic":
        df["class"] = pd.Categorical(df["class"], ["First", "Second", "Third"])
        df["deck"] = pd.Categorical(df["deck"], list("ABCDEFG"))
    return df
```

So we could probably get away with one big function with lots of if-statements, rather than have too many small functions. Alternatively, the datasets could be described in a dictionary:

```python
dataset_dict = {
    "japan_quakes": dict(url="@tut_quakes.ngdc", header=1, sep=r"\s+"),
    "sample_bathymetry": dict(
        url="@tut_ship.xyz",
        sep="\t",
        header=None,
        names=["longitude", "latitude", "bathymetry"],
    ),
}
fname = which(fname=dataset_dict[name]["url"], download="c")
dataframe = pd.read_csv(fname, **dataset_dict[name])
```

There's also …
I like the dictionary option: basically a switch/case for Python. I agree with just working on the table datasets for now.
Although the dictionary structure makes the code more compact, I feel that the if-statement option is easier to read, especially for new contributors who want to add examples. BTW, the above code doesn't work, because …
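For comparison, a minimal sketch of the if-statement style being discussed. The function name and the set of datasets are illustrative assumptions; `pygmt.which` with `download="c"` is the real PyGMT call that downloads and caches a remote file, and the imports are placed inside the branches so the dispatch itself can be exercised without fetching anything.

```python
import pandas as pd


def load_sample_table(name):
    """Illustrative if-statement dispatch for sample tables (not actual PyGMT API)."""
    if name == "japan_quakes":
        from pygmt import which

        fname = which(fname="@tut_quakes.ngdc", download="c")  # download and cache
        return pd.read_csv(fname, header=1, sep=r"\s+")
    if name == "sample_bathymetry":
        from pygmt import which

        fname = which(fname="@tut_ship.xyz", download="c")
        return pd.read_csv(
            fname,
            sep="\t",
            header=None,
            names=["longitude", "latitude", "bathymetry"],
        )
    raise ValueError(f"Unknown sample dataset: {name}")
```

Each branch spells out its own `read_csv` call, which is more verbose than the dictionary but leaves each dataset's options in one obvious place for new contributors to copy.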
I think the …

Please note that we already have a …

Can we close the issue?
I've recently opened some pull requests (#1386 and #1420) that import the sample remote files, and I've realized there are a lot of sample files on the GMT server that are used in the GMT examples. Should functions be added that import remote files (e.g. "ternary.txt" and "EGM96_to_36.txt" for `ternary` and `sph2grd`) whenever they are used in a GMT module example? I think it's more Pythonic to have the datasets available as a DataFrame/array than just the text file, and I think those sample files are a great resource when writing a test or making an example, but I could also see it as bad practice to add all of these functions that import little-used sample files.