Simplifying dependencies #23115
Comments
Not sure of the history but do wonder if it wouldn't make sense to combine |
@jreback can you help me understand the difference between dev dependencies and optional dependencies? Thanks! |
probably not worth the complexity.
I have a slight preference for providing both. But auto-generating is fine as well. Really, I think we should just have a code check that runs convert_deps.py and ensures that there's no change in the dependencies (to ensure we don't add dependencies to the auto-generated file).
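A check like the one described above could be sketched roughly as follows. This is only an illustration: the interface of convert_deps.py is not shown in this thread, so `hypothetical_convert` is a stand-in with a placeholder conversion rule, and `out_of_date_diff` is a made-up name. The idea is to regenerate the file contents and fail if they differ from what is checked in.

```python
import difflib
from typing import Callable, List


def hypothetical_convert(env_text: str) -> str:
    """Placeholder for the real conversion: copy every "- package" entry
    of the environment file onto its own line."""
    specs = [line.strip()[2:].strip()
             for line in env_text.splitlines()
             if line.strip().startswith("- ")]
    return "\n".join(specs) + "\n"


def out_of_date_diff(source_text: str, checked_in_text: str,
                     convert: Callable[[str], str] = hypothetical_convert
                     ) -> List[str]:
    """Return a unified diff between the checked-in generated file and a
    fresh regeneration. An empty list means the two are in sync."""
    regenerated = convert(source_text)
    return list(difflib.unified_diff(
        checked_in_text.splitlines(), regenerated.splitlines(),
        fromfile="checked-in", tofile="regenerated", lineterm=""))
```

In CI, the two texts would be read from the source environment file and the generated requirements file, and the job would exit non-zero whenever the returned diff is not empty.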
I value having a single |
The CI is already way simpler than before Tom refactored it; I am -1 on changing this any more. |
Thanks for the feedback. Besides generating the dependency files automatically, which, trusting your experience, may not be a good idea, it may still be worth cleaning things up a bit: maybe separating the yaml files from the scripts, and unifying the scripts in ci and scripts (the split seems somewhat arbitrary to me), making it simpler and more standard for users to set up pandas. I'll try to open a PR, so we don't need to discuss things in an abstract way. |
I've been taking a look at the dependencies, and I generated a spreadsheet with what we have in each file. I see a couple of things that I'm not sure are intentional:
Then, for me it's very difficult to tell what we install in every case (and why). After taking a look at all files, I'm more strongly in favor of generating the files:
@TomAugspurger @jreback @jorisvandenbossche thoughts? I attach the file I generated, in case you want to take a look: |
I think we should split the discussion in two parts: dependencies for CI vs dependencies for contributors (to set up their local dev environment). The question you raised above about "What is the reason for splitting into dev and optional? And are they enough to justify the increased complexity?" is as far as I know only about the contributor dependencies (and not CI) ? Your last exploration and your proposal to autogenerate those files is then only about the CI dependencies I think? |
OK, didn't see your second unnamed sheet .. ;) Because that is doing exactly that. You now did this as exploration, but would you consider something cleaned-up like that as basis for autogeneration? |
I agree on your points. I'm discussing (at least) two things here. For the contributors dependencies, I don't know of any case of users who install I'd personally not provide dependencies for pip. But if we do, I'd generate them automatically from This could be
name: pandas-dev
channels:
- defaults
- conda-forge
dependencies:
# required
- NumPy
- python-dateutil>=2.5.0
- pytz
# development
- Cython>=0.28.2
- flake8
- flake8-comprehensions
- hypothesis>=3.58.0
- isort
- moto
- pytest>=3.6
- python=3
- setuptools>=24.2.0
- sphinx
- sphinxcontrib-spelling
# optional
- beautifulsoup4>=4.2.1
- blosc
- bottleneck>=1.2.0
- fastparquet
- gcsfs
- html5lib
- ipython>=5.6.0
- ipykernel
- jinja2
- lxml
- matplotlib>=2.0.0
- nbsphinx
- numexpr>=2.6.1
- openpyxl
- pyarrow>=0.4.1
- pymysql
- pytables>=3.4.2 # pip: tables>=3.4.2
- pytest-cov
- pytest-xdist
- s3fs
- scipy>=0.18.1
- seaborn
- sqlalchemy
- statsmodels
- xarray
- xlrd
- xlsxwriter
- xlwt
Regarding the dependencies for CI: I think it's impossible to maintain what we have without incurring errors and inconsistencies. I need some more info to have an informed opinion and make a proposal, but something like this sounds like a much better option:
deps = ['numpy', 'pytz', 'python-dateutil', 'pytest']
if docs:
    deps += ['sphinx',
             'nbsphinx',  # used for the `.. ipython:: python` directive
             ...]
if slow:
    deps += ['pymysql', ...]
if py2:
    deps.remove('pytz')
    deps += ['pytz=2013b', ...]
This would make it very clear what we need in each case; we could have comments to document why a dependency is needed; changes (upgrading a version, for example) would be made in just one place, simplifying work and avoiding errors... And we wouldn't need the 10 or more files adding clutter to the Maybe I'm missing something, but it seems a much, much better approach to me. |
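To make the proposal concrete, here is a minimal runnable sketch of the same idea. It is an illustration under assumptions, not an agreed design: the function names `build_deps` and `render_environment` are made up, only the packages named in the comment are listed (the elided `...` entries are left out), and the channel list is copied from the environment file quoted earlier in the thread.

```python
from typing import List


def build_deps(docs: bool = False, slow: bool = False,
               py2: bool = False) -> List[str]:
    """Build the dependency list for one CI build from a few flags."""
    deps = ["numpy", "pytz", "python-dateutil", "pytest"]
    if docs:
        deps += ["sphinx", "nbsphinx"]
    if slow:
        deps += ["pymysql"]
    if py2:
        # Replace the unpinned pytz with an old pinned version.
        deps.remove("pytz")
        deps += ["pytz=2013b"]
    return deps


def render_environment(name: str, deps: List[str]) -> str:
    """Render a conda environment file as text (hypothetical helper)."""
    lines = ["name: " + name,
             "channels:",
             "  - defaults",
             "  - conda-forge",
             "dependencies:"]
    lines += ["  - " + dep for dep in deps]
    return "\n".join(lines) + "\n"
```

Calling `render_environment("pandas-dev", build_deps(docs=True))` would give the text of one environment file, so a version bump is made once in `build_deps` and propagates to every generated file.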
+1 on changing users to have a single file. The reason we had multiple files is to make it 'faster' to install, but I suspect it doesn't make much difference at the end of the day and avoids problems of having deps installed.
+0 on consolidating the CI dep files a little. It's already in the /ci directory, so a user shouldn't even be caring about this if they actually read the documentation.
-1 on trying to change the CI .yaml files even more. These are quite straightforward at the moment. They are meant to test very specific things; consolidating them has in the past caused us not to test what we want, in other words to test the oldest possible versions of things and/or separate the CI runs into logical groups (e.g. slow tests on older versions of Python and so on). |
I fail to see how unifying/automating things a bit in the generation of CI yaml files should cause more confusion on what is being tested than maintaining 14 independent files. But I don't have the experience you have with the CI. So if you and @TomAugspurger are sure about that, let's forget about it. Btw, did you check the points I mentioned in #23115 (comment) ? I guess at least the first one is an error, not sure about the others. Regarding the single
I'd start by 1, and would implement 3 only if users complain. :) |
I have a slight preference for many environment.yaml files, rather than generating them, since it makes the CI stuff a bit more explicit (smaller chance of us accidentally not testing something because of a bug in whatever we would use to generate them). I think seeing a matrix of exactly what deps we test where would be extremely valuable. I started to write a script to do this. IIRC I stopped since I have a slight preference for providing both an |
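As a rough sketch of the matrix idea above (not the script mentioned in the comment, which is not shown in this thread), the following reads the text of several environment files and records which version constraint each build uses for each package. The parser is deliberately simple and only handles a flat `dependencies:` list; the function names and the build names used in any example are hypothetical.

```python
import re
from typing import Dict, List


def parse_dependencies(env_text: str) -> List[str]:
    """Collect the entries of the top-level `dependencies:` list."""
    deps: List[str] = []
    in_deps = False
    for raw in env_text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop trailing comments
        if not line.strip():
            continue
        if not line.startswith((" ", "-")):
            # A top-level key either opens or closes the section.
            in_deps = line.strip() == "dependencies:"
        elif in_deps and line.strip().startswith("- "):
            deps.append(line.strip()[2:].strip())
    return deps


def dependency_matrix(env_files: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Map package name -> {build name: constraint ("" if unpinned)}."""
    matrix: Dict[str, Dict[str, str]] = {}
    for build, text in env_files.items():
        for spec in parse_dependencies(text):
            name = re.split(r"[=<>!]", spec, maxsplit=1)[0].strip()
            constraint = spec[len(name):].strip()
            matrix.setdefault(name.lower(), {})[build] = constraint
    return matrix
```

A package missing from a build's inner dictionary is simply not installed in that build, which is the kind of gap such a matrix is meant to expose.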
+1 on a ci/env_files/ directory or something.
+1 on Joris's suggestion to split the discussion along CI vs user-facing lines.
If this is not obvious to @datapythonista, this is prima facie evidence that things can be better-documented or otherwise clarified. |
CI deps:
We can still check in the autogenerated yaml files; then you don't have this concern?
Contributor deps:
+1
Isn't that what we already do? The pip requirement files are autogenerated. BTW, I am fine with providing a single environment file instead of the dev + optional files for contributors. But, I think it is still useful to distinguish those two "kinds" of dependencies (eg in the docs): the dev dependencies are those dependencies that are needed to build pandas + run the test suite. With just those, the test suite should pass, as we should skip all tests that use optional dependencies (which is not the case for test dependencies). I think this notion can be useful for people who want to do a minimal install of pandas but want to test their installation. |
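For readers unfamiliar with how such autogeneration can work, here is a minimal sketch. It is not the project's actual script: the function names are made up, and the rules are assumptions based on the `# pip: tables>=3.4.2` annotation in the environment file proposed earlier in the thread, where a trailing comment gives the pip name when it differs from the conda one.

```python
import re
from typing import List, Optional


def conda_to_pip(line: str) -> Optional[str]:
    """Convert one "- spec" line of an environment file to a pip requirement.

    Returns None for lines that are not dependency entries and for the
    Python interpreter itself, which pip cannot install.
    """
    entry = line.strip()
    if not entry.startswith("- "):
        return None
    spec, _, comment = entry[2:].partition("#")
    comment = comment.strip()
    if comment.startswith("pip:"):
        # Explicit override, e.g. "pytables>=3.4.2  # pip: tables>=3.4.2".
        return comment[len("pip:"):].strip()
    spec = spec.strip()
    name = re.split(r"[=<>!]", spec, maxsplit=1)[0].strip()
    if name.lower() == "python":
        return None
    # conda pins with a single "="; pip needs "==". This is an
    # approximation, since conda's "=" also matches longer versions.
    pinned = re.fullmatch(r"([A-Za-z0-9_.\-]+)=([^=].*)", spec)
    if pinned:
        return pinned.group(1) + "==" + pinned.group(2)
    return spec


def to_pip_requirements(dependency_lines: List[str]) -> List[str]:
    """Convert dependency lines, skipping anything without a pip equivalent."""
    converted = (conda_to_pip(line) for line in dependency_lines)
    return [req for req in converted if req is not None]
```

Keeping the pip name next to the conda name in a single file is what lets one source of truth serve both installers.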
OK, see that there are already PRs, will check those :) |
In case it's useful for anyone else, I'm uploading a summary of the builds we have. |
I think this was mainly addressed and no further action is expected, closing. |
I guess this has been previously discussed, but I'd personally appreciate understanding the dependencies better, and seeing if they can be simplified, as I think they often generate confusion (e.g. not installing the optional ones) and some errors (e.g. editing files that are automatically generated, or forgetting to run the script).
From what I know, pandas has 3 dependencies: numpy, dateutil and pytz. Those live in setup.py, so when packaging they are required. No question about this part.
Then, for the development environment, I think "ideally" we would like to have an environment.yml file in the root of the project, so setting up a pandas environment is as easy as conda env create, and maintaining the list of dependencies is as easy as updating that file.
The questions then are:
- What is the reason for splitting into dev and optional? And are they enough to justify the increased complexity?
- requirements.txt file, just provide the script that generates it (i.e. convert_deps.py)?
- ci/requirements/)
CC @jreback, @pandas-dev/pandas-core