build: discussion of workspace "extending" behavior #48

Closed
jbohren opened this issue May 19, 2014 · 33 comments

Comments

@jbohren
Contributor

jbohren commented May 19, 2014

Currently, catkin build performs the workspace auto-extension behavior that catkin_make also does by default. I think it would be really helpful to people to be able to create a clean workspace without having to messily unset various environment variables. This could be related to #47

@wjwwood
Member

wjwwood commented May 19, 2014

Could you explain what you mean by auto-extension?

@jbohren
Contributor Author

jbohren commented May 19, 2014

Whenever you call catkin build or catkin_make, it extends your current CMAKE_PREFIX_PATH (and other paths) with the new workspace, i.e.:

unset CMAKE_PREFIX_PATH
source /opt/ros/hydro/setup.bash
# calling catkin build will extend /opt/ros/hydro
mkdir -p ~/ws1/src
cd ~/ws1
catkin build
source ~/ws1/devel/setup.bash
# calling catkin build somewhere else will extend /opt/ros/hydro and ~/ws1
mkdir -p ~/ws2/src
cd ~/ws2
catkin build
source ~/ws2/devel/setup.bash 
# calling catkin build somewhere else will extend /opt/ros/hydro and ~/ws1 and ~/ws2

etc

@wjwwood
Member

wjwwood commented May 19, 2014

This is not something catkin build does, but instead something every catkin package does. This produces the same result:

unset CMAKE_PREFIX_PATH
source /opt/ros/hydro/setup.bash
# CMAKE_PREFIX_PATH will be "/opt/ros/hydro:"
cd ~/some_single_package
mkdir build
cd build
cmake ..
make
source ~/some_single_package/build/devel/setup.bash
# CMAKE_PREFIX_PATH will now be "/home/user/some_single_package/build/devel:/opt/ros/hydro"

@jbohren
Contributor Author

jbohren commented May 22, 2014

This is not something catkin build does, but instead something every catkin package does. This produces the same result:

Yeah, I know that it's a feature of the underlying catkin_make and catkin_make_isolated but since catkin build is a new tool, it has an opportunity to tell the underlying tools either to extend or not to extend. It might require adding such options to those tools, but I'm not sure about the extent to which the extension is configurable.

@jbohren
Contributor Author

jbohren commented May 22, 2014

@tkruse @jack-oquin You guys brought this up over in #47 and it might be good to focus that discussion here.

@jack-oquin

OK, @tkruse said it well there.

Workspace chaining should not depend on a bunch of poorly-understood variables lying around in the users' shell environments. The workflow should declare the chaining explicitly when creating a new workspace.

I don't know what that means for backwards compatibility or how to deal with all the documentation showing people various ways of making the current design work. In the absence of explicit chaining, I suppose one could fall back on implicit chaining via the shell environment, but that preserves one of the main causes for confusion and failure.

@tkruse
Contributor

tkruse commented May 22, 2014

Okay, so some other resources from the past on the topic:
https://groups.google.com/d/msg/ros-sig-buildsystem/NMGN9iYxTNI/GCWKSAEETIoJ
https://groups.google.com/d/msg/ros-sig-buildsystem/rOv4dJLdxjw/F7YVSHmUmIEJ

catkin currently uses the CMAKE_PREFIX_PATH when creating a new workspace to determine the workspaces the new one will depend on. An alternative approach would be that when creating a workspace, this variable is not used, but instead explicit user input is required, else the new workspace has no parents.

I will guess this would require changing catkin itself, as I believe it currently does not offer any high-level python API to change that behavior. I believe the code doing this is in catkin/cmake/templates/_setup_util.py.in, so even from the filename you can see that this would be a messy operation.
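To make the contrast concrete, here is a minimal Python sketch of the two ways a new workspace's parents could be decided. It is a toy model, not catkin's actual code: the function names `create_workspace` and `source_workspace` and the dictionary layout are assumptions made up for illustration.

```python
def split_paths(value):
    """Split a colon-separated path list, dropping empty entries."""
    return [p for p in value.split(":") if p]


def create_workspace(devel_space, env, explicit_parents=None):
    """Record the parents of a new workspace (toy model, hypothetical API).

    Implicit mode (explicit_parents is None): the parents are whatever is in
    CMAKE_PREFIX_PATH at creation time, as described for current catkin.
    Explicit mode: only the workspaces the user names are recorded; an empty
    list means the new workspace has no parents.
    """
    if explicit_parents is None:
        parents = split_paths(env.get("CMAKE_PREFIX_PATH", ""))
    else:
        parents = list(explicit_parents)
    return {"devel": devel_space, "parents": parents}


def source_workspace(workspace):
    """Return the CMAKE_PREFIX_PATH a setup file for this workspace would export."""
    return ":".join([workspace["devel"]] + workspace["parents"])
```

In explicit mode the result depends only on the arguments given when the workspace was created, so whatever happened to be in the shell at that time is never persisted.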

@wjwwood
Member

wjwwood commented May 22, 2014

Yeah, I know that it's a feature of the underlying catkin_make and catkin_make_isolated but since catkin build is a new tool

This behavior has nothing to do with catkin_make or catkin_make_isolated (I was trying to point this out with my example).

Workspace chaining should not depend on a bunch of poorly-understood variables

It depends on one variable, CMAKE_PREFIX_PATH. I'm not sure why we think it is poorly understood, maybe that's just a documentation problem. We should try to improve that reality rather than condemning the whole system because it is, in some people's opinion, poorly understood.

lying around in the users' shell environments.

The intention is that the user's environment drives the behavior of the build (sort of like CFLAGS, PKG_CONFIG_PATH, CMAKE_PREFIX_PATH, PYTHONPATH, etc...). You could argue that we should use something like CATKIN_WORKSPACE_PATH or something like that.

The workflow should declare the chaining explicitly when creating a new workspace.

I agree, being explicit is always better, though commonly people complain about that too, e.g. defining build and run depends separately in the package.xml and in a few places in the CMakeLists.txt is explicit (there is a very good reason to be able to include or exclude dep names in each of those places), and people clearly do not like that.

An alternative approach would be that when creating a workspace, this variable is not used, but instead explicit user input is required, else the new workspace has no parents.

Ok, so let's consider that for a moment. When I build a workspace I could pass the list of workspaces explicitly to catkin build. Immediately I can imagine some questions about the behavior:

  • What happens if I sourced workspace A and B, then build workspace E by specifying C and D?
    • Should catkin build clear the environment of A and B before building?
    • What about if I also manually modified the CFLAGS? Should that be captured, cleared, or ignored?
  • When specifying the workspaces to build against, do I specify the "leaf" workspaces or do I specify every recursive workspace explicitly?
    • Does the order matter?

I think the biggest problem conceptually is that the act of building (invoking cmake and make) relies on the state of the environment when run, whereas we are proposing to change catkin such that it ignores the environment when "creating" a workspace. What this means, as far as I can see, is that the catkin tool now needs to be able to sanitize the environment and then rebuild it based on the explicitly passed workspaces. This might be possible for catkin build to do, but I don't see any way that this could be done by a single package.
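One possible answer to the "leaf or every recursive workspace" and "does the order matter" questions can be sketched in Python. This assumes each workspace records its own parents somewhere; the `parents_of` mapping and the function name `resolve_chain` are hypothetical. The user names only the leaves, and the chain is expanded so that every workspace comes before the workspaces it overlays.

```python
def resolve_chain(leaves, parents_of):
    """Expand leaf workspaces into a full overlay chain, highest priority first.

    leaves: workspaces named by the user, highest priority first.
    parents_of: hypothetical mapping of workspace -> list of its parents
                (highest priority first), e.g. read from workspace metadata.
    Every workspace appears exactly once and always before its own parents.
    """
    order = []
    seen = set()

    def visit(ws):
        if ws in seen:
            return
        seen.add(ws)
        # Visit parents first (post-order); reversing at the end puts
        # children ahead of parents while keeping the caller's priorities.
        for parent in reversed(parents_of.get(ws, [])):
            visit(parent)
        order.append(ws)

    for leaf in reversed(leaves):
        visit(leaf)
    order.reverse()
    return order
```

With this scheme the order of the leaves does matter: swapping them changes which workspace wins when two of them provide the same package.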

@tkruse
Contributor

tkruse commented May 23, 2014

No tool for general audiences should ever remove anything from any environment variable (other than environment variables exclusively managed and used by that tool, i.e. 'private' env variables).
A tool like virtualenv may start a session with added values, and remove those later, but that's about the only exception.

Any sh file provided by ROS should only add to the environment. And the delta between the environment before and after sourcing must be fully predictable from just the command line used when creating the workspace and the current state of the workspace. No knowledge of the environment at workspace creation time should ever be required to predict the additions that any sh file will do to environment variables.

When you source two sh files in sequence, then they will perform their modifications in sequence, so the order obviously matters. Given that workspace chaining implies some kind of overlaying, only the last element of the chain should be sourced, and that source action should do what is necessary to provide overlaying respecting the chain order.

The CMAKE_PREFIX_PATH is intended to help resolve the location of packages for building against them, that's the only purpose it should ever be used for. I have always argued for that.

Explicit is better than implicit, but DRY still applies, as does convention over configuration. Convention means using fixed values when no explicit value is given, not reading values from the environment.

Catkin is based on implicit over explicit, makes people repeat themselves, and in some places does not provide useful conventions allowing to avoid explicit configuration.

The preference order for configuring stuff is (> means better than):
By Convention > explicit once in one place > explicit multiple times in one place > explicit multiple times in multiple places > implicit logic.

@jack-oquin

@wjwwood: Please do not interpret my comments as harshly critical of you or of the current implementation. I hold you and your work in the highest regard.

But, repeatability is the most important thing when building, and complex interactions with the shell environment work against that. I fully understand the need for some shell variables, and I know how tempting it can be to solve a tricky design problem by adding an extra variable. But they need to be simple, well-documented and relatively constant per-user. I did not like the excessive shell environment usage of rosbuild 5 years ago. With catkin, it's gotten even more complex. My personal experience with catkin over the last year and a half has been frustrating. Explaining catkin to intelligent lower-division undergraduates is very difficult. They waste a lot of time on the robots because they botched their shell environment, each in some different way.

I defer to your judgement to determine what, if anything, can be done to improve matters at this late date.

If there is no solution to this problem, then we'll need to document how it works better. But, that is difficult, given the present complexity. A cleaner design would be easier to document well.

@dirk-thomas
Contributor

Even if a "new" cleaner design might not be applicable in the near term I would consider it very valuable if we would come up with something like that.

Based on a clear and better vision we can always strive to get closer to it. But until now we have only identified some aspects which make the current approach difficult to use. I don't see any concrete idea for a cleaner design yet.

@wjwwood
Member

wjwwood commented May 24, 2014

Well, like @dirk-thomas said, I think if we feel that there is a problem we have to identify it, and come up with a concrete way to change it. Only then can we try to come to some consensus on if and how it should change.

So @jack-oquin you mentioned that you believe it:

  • Has poor repeatability
  • Is a complex interaction involving many environment variables
  • Shell complexity is worse than rosbuild's

In order to act on these points we need examples of how that is the case and suggestions on how it should/could be different.

They waste a lot of time on the robots because they botched their shell environment, each in some different way.

We cannot act on this, we need to know HOW it was botched and we need suggestions and insight on how it could be different to avoid that. I sympathize with the point, but it is simply not helpful in resolving the issues to simply state that they exist without describing them in detail.

I already tried to make progress on the "many, complex" environment variables discussion by pointing out that we only use the CMAKE_PREFIX_PATH and that most of the environment variables set by the setup files are not catkin specific and required by the toolchains on the system. I was hoping to get back some feedback on how you see it differently.

Additionally, in this thread we've already discussed having the build command take a list of workspaces explicitly (ignoring any workspaces in the environment) and I've raised some questions about how that affects certain workflows and the user experience which haven't been addressed since.

So, I'm trying to solicit concrete input from you guys and then discuss it, but I'm having trouble keeping us on track.

@jack-oquin

Nobody except you, Dirk or Tully knows or cares which parts of the environment are used by which tools.

As best I can tell, here's what I need in order to get useful work done on ROS:

CATKIN_TEST_RESULTS_DIR=/home/joq/ros/W/build/test_results
CMAKE_PREFIX_PATH=/home/joq/ros/ws/devel:/opt/ros/hydro
GAZEBO__DIR=/usr/share/gazebo
GAZEBO_MASTER_URI=http://localhost:11345
GAZEBO_MODEL_DATABASE_URI=http://gazebosim.org/models
GAZEBO_MODEL_PATH=/home/joq/ros/gazebo/gazebo_models:
GAZEBO_PLUGIN_PATH=/usr/lib/gazebo-1.9/plugins
GAZEBO_RESOURCE_PATH=/usr/share/gazebo-1.9:/usr/share/gazebo_models
LD_LIBRARY_PATH=/usr/lib/gazebo-1.9/plugins:/home/joq/ros/ws/devel/lib:/opt/ros/hydro/lib
PATH=/home/joq/ros/ws/devel/bin:/opt/ros/hydro/bin:/home/joq/bin:/usr/lib/lightdm/lightdm:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/home/joq/bin
PKG_CONFIG_PATH=/home/joq/ros/ws/devel/lib/pkgconfig:/opt/ros/hydro/lib/pkgconfig
PYTHONPATH=/home/joq/ros/ws/devel/lib/python2.7/dist-packages:/opt/ros/hydro/lib/python2.7/dist-packages
ROSCONSOLE_CONFIG_FILE=/home/joq/.ros/config/rosconsole.config
ROS_DISTRO=hydro
ROS_EMAIL=jack.oquin@gmail.com
ROS_ETC_DIR=/opt/ros/hydro/etc/ros
ROS_HOME=/home/joq/.ros
ROSLISP_PACKAGE_DIRECTORIES=/home/joq/ros/ws/devel/share/common-lisp
ROS_MASTER_URI=http://iron:11311
ROS_PACKAGE_PATH=/home/joq/ros/ws/src/image_common/polled_camera:/home/joq/ros/ws/src/image_common/camera_info_manager:/home/joq/ros/ws/src/image_common/image_transport:/home/joq/ros/ws/src/image_common/image_common:/home/joq/ros/ws/src/image_common/camera_calibration_parsers:/opt/ros/hydro/share:/opt/ros/hydro/stacks
ROS_ROOT=/opt/ros/hydro/share/ros
ROS_TEST_RESULTS_DIR=/home/joq/ros/ws/build/camera_info_manager/test_results
ROS_WORKSPACE=/home/joq/ros/ws

There's probably something I am forgetting, and maybe some of it is no longer necessary.

My ROS_PACKAGE_PATH is relatively short at the moment, because my current workspace does not have much in it. It was over 5KB long recently.

If our students get any of that wrong, they are in deep trouble. Most of them don't know how to deal with it.

They all have to share a common student account on the robots, so they each need to create a separate workspace. Disaster!

@tkruse
Contributor

tkruse commented May 25, 2014

There have been suggestions for a cleaner design already.

The first is that no part of the current CMAKE_PREFIX_PATH is stored in the generated setup files (currently in _setup_util.py).

So if a user wants chaining, he has to say explicitly which workspace he wants to chain to, and only this information will be used. So if there is a lot of rubbish in the current environment, the generated setup files still remain pristine. The rubbish is not persisted to reappear later on. Don't store rubbish in the workspace configuration, don't blindly mash together setup files.

The next suggestion is: Do not make any other modifications to the CMAKE_PREFIX_PATH or other env variables than appending values. No 'rollback_env_variable'. Again, no tool in the world dares to implicitly decide for the user that something in his current environment has to be removed. Consider the CMAKE_PREFIX_PATH as a write-only variable. If the user wants something out, it must be done explicitly by the user. If you want to provide extra convenience, then make it explicit, like the deactivate function in virtualenv.
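A virtualenv-style explicit deactivate could look roughly like the following Python sketch. The names `activate` and `deactivate` and the record format are assumptions for illustration only; entries are prepended here so that an overlay takes precedence, which is a design choice of the sketch rather than something settled in this thread.

```python
def activate(env, additions):
    """Add entries to path-like variables and return a record of what was added.

    env: dict of variable name -> colon-separated string (modified in place).
    additions: dict of variable name -> list of entries to add.
    Entries already present are left alone and not recorded.
    """
    record = {}
    for name, entries in additions.items():
        current = [p for p in env.get(name, "").split(":") if p]
        new = [e for e in entries if e not in current]
        env[name] = ":".join(new + current)
        record[name] = new
    return record


def deactivate(env, record):
    """Remove exactly the entries that activate() recorded; nothing else."""
    for name, added in record.items():
        remaining = [
            p for p in env.get(name, "").split(":") if p and p not in added
        ]
        if remaining:
            env[name] = ":".join(remaining)
        else:
            # The variable only held our own entries, so drop it again.
            env.pop(name, None)
```

Because removal is driven by the record and only happens when the user asks for it, entries the user added by hand are never touched.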

And given the way catkin was designed, do not expect the community to humbly report their problems to you. catkin has many flaws, and every flaw may have many symptoms, none of which have any useful error message pointing to the source of the problem, and the problems may appear in combinations. Users are happy enough if after yet another wasted day, their robot can start acting on the demo. Users have then no motivation at all to go back and find what was originally wrong and write a bug report. catkin has made it too difficult to debug problems and create clean reports.

Again, I do not get my hopes up here because the current catkin maintainer still thinks this is a great idea:

if __name__ == '__main__':
    try:
        sys.exit(main())
    except Exception as e:
        sys.exit(str(e))

(taken from catkin_make)

As long as you have this attitude to reporting failures, you cannot expect useful feedback. I think it is rather ironical that you throw such design at users and then say something like "We cannot act on this, we need to know HOW it was botched". If you need to know that, then write tools that generate useful failure reports. Duh.

@jbohren
Contributor Author

jbohren commented May 25, 2014

So, I'm trying to solicit concrete input from you guys and then discuss it, but I'm having trouble keeping us on track.

Let's focus on the use of $CMAKE_PREFIX_PATH for describing workspaces. Currently, as far as the catkin CMake macros are concerned, $CMAKE_PREFIX_PATH just affects how CMake works. It's only the catkin setup files which start manipulating it and overloading it to describe multiple workspaces. I know this is not a catkin_tools issue, but we're already talking about it here. Since there is no external interface to catkin_make to control chaining, maybe we should move it to catkin_pkg or catkin. I originally created this ticket here because I thought there was some way to control workspace chaining through command line arguments or the API.

Proposal 1: Use a Catkin-specific variable in place of $CMAKE_PREFIX_PATH

(inspired by comments in #47)

So I think the first thing is that catkin uses (and some argue abuses) the $CMAKE_PREFIX_PATH environment variable to manage the "sourced" workspaces. This is already a confusing point for new users. When using CMake directly, it's understandable that they would have to manipulate $CMAKE_PREFIX_PATH, but when using tools like catkin_make and catkin build, I usually get confused looks when I tell students that to reset their catkin environment they need to unset $CMAKE_PREFIX_PATH.

What if we add a catkin-specific environment variable, call it $CATKIN_PREFIX_PATH, which is added to $CMAKE_PREFIX_PATH either when you call find_package(catkin ...) or when you run catkin build? Catkin can add and remove whatever it wants in $CATKIN_PREFIX_PATH, without concern for colliding with users' own modifications to $CMAKE_PREFIX_PATH, and the new variable will be more intuitive for novices trying to understand the buildsystem.
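A rough Python sketch of how Proposal 1 could combine the two variables at build time follows. `CATKIN_PREFIX_PATH` is the proposed (not yet existing) variable, the function name `merge_prefix_paths` is made up, and putting the catkin entries first is an assumption of this sketch.

```python
def merge_prefix_paths(env):
    """Return the CMAKE_PREFIX_PATH value to hand to CMake (sketch).

    Entries from the proposed CATKIN_PREFIX_PATH come first so catkin
    workspaces take precedence (an assumption); the user's own
    CMAKE_PREFIX_PATH entries are kept, and duplicates are dropped.
    """
    merged = []
    for name in ("CATKIN_PREFIX_PATH", "CMAKE_PREFIX_PATH"):
        for entry in env.get(name, "").split(":"):
            if entry and entry not in merged:
                merged.append(entry)
    return ":".join(merged)
```

Catkin would then only ever write to its own variable, and the user's `CMAKE_PREFIX_PATH` would be read but never modified.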

Proposal 2: Add an optional argument to catkin_make_* which overrides chaining

Running catkin_make_* currently chains off of the current environment by default. This default behavior doesn't have to change. What if, however, we added an optional argument --extend or --chain or --inherit which extends a single other workspace? This would result in a new workspace which would be equivalent to (1) un-setting $CMAKE_PREFIX_PATH and other environment variables, (2) sourcing the other workspace's setup file, and (3) catkin_make-ing the new workspace.

This would enable repeatability at the catkin_make level because users could run catkin_make --extend /opt/ros/hydro to be sure that their environment would be clean.
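As a sketch of what the option might do internally, the following Python computes the environment a build would run in. The `--extend` flag is only the proposed name and does not exist in catkin_make as far as this thread establishes; `parse_args`, `extend_environment` and `build_environment` are hypothetical helpers. The sketch overrides only `CMAKE_PREFIX_PATH` and deliberately does not address other variables such as `PATH`.

```python
import argparse


def parse_args(argv):
    """Parse the hypothetical --extend option."""
    parser = argparse.ArgumentParser(prog="catkin_make_sketch")
    parser.add_argument(
        "--extend", metavar="WORKSPACE", default=None,
        help="build against exactly this workspace instead of the environment")
    return parser.parse_args(argv)


def extend_environment(base_env, extend_path):
    """Return a copy of base_env with CMAKE_PREFIX_PATH set to extend_path.

    The caller's environment is left unmodified.
    """
    env = dict(base_env)
    env["CMAKE_PREFIX_PATH"] = extend_path
    return env


def build_environment(base_env, argv):
    """Environment for the build: unchanged by default, overridden by --extend."""
    args = parse_args(argv)
    if args.extend is None:
        return dict(base_env)
    return extend_environment(base_env, args.extend)
```

The returned dictionary could be passed as the `env` argument of `subprocess.call` when invoking cmake and make, so the user's own shell is never altered.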

Proposal 3: Take advantage of Proposal 2 in catkin build

With this new interface to extension as described above, catkin build could have the following features:

  • A pass-through to the --extend feature, or whatever it ends up being called
  • A workspace to extend by default when no explicit arguments are given (either $CATKIN_PREFIX_PATH, or /opt/ros/$ROS_DISTRO, or ~/path/to/my/overlay). This could be stored in .config/catkin/workspaces or something.
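The default lookup in the last bullet could be sketched in Python as below. The precedence order (explicit argument, then a stored default, then `$CATKIN_PREFIX_PATH`, then `/opt/ros/$ROS_DISTRO`) is an assumption of this sketch; the proposal lists these only as alternatives. `default_extend_path` and its parameters are hypothetical names.

```python
def default_extend_path(env, explicit=None, stored_default=None):
    """Pick the workspace to extend, most explicit source first (sketch).

    explicit: value of a command line option such as the proposed --extend.
    stored_default: value read from a per-user config file, if any.
    Returns None when nothing is available, meaning "no parent workspace".
    """
    if explicit:
        return explicit
    if stored_default:
        return stored_default
    if env.get("CATKIN_PREFIX_PATH"):
        return env["CATKIN_PREFIX_PATH"]
    if env.get("ROS_DISTRO"):
        return "/opt/ros/" + env["ROS_DISTRO"]
    return None
```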

@dirk-thomas
Contributor

Thank you @jbohren for keeping this on track.

I completely agree with Proposal 1. Having a separate variable for the catkin workspaces is clearly the better choice.

I also agree that having an option as in Proposal 2 would make it easier. If I understand it correctly it should work the same way as when the following would be invoked manually CMAKE_PREFIX_PATH=/opt/ros/hydro catkin_make, right?

One potential problem with this is the following: if the user has sourced a different workspace before (e.g. /opt/ros/groovy) then the environment contains different variables (e.g. the PATH). As a consequence the CMake process will pick up e.g. binaries from there which it is not supposed to. I would assume that only from a "clean" environment (nothing sourced before) this approach guarantees a "correct" result. Is that an acceptable restriction? It implies that the CMAKE_PREFIX_PATH must not be set when using this option since that would indicate that a different environment was sourced before.
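The restriction could be enforced with a small guard like the Python sketch below. It uses the rule stated above, that a set `CMAKE_PREFIX_PATH` indicates a previously sourced environment; the function name and the error text are made up for illustration.

```python
def check_clean_environment(env):
    """Raise if a workspace appears to have been sourced already (sketch).

    A non-empty CMAKE_PREFIX_PATH is taken as evidence that some setup
    file was sourced earlier, so other variables may be polluted too.
    """
    value = env.get("CMAKE_PREFIX_PATH", "")
    if value:
        raise RuntimeError(
            "CMAKE_PREFIX_PATH is already set to %r; start from a clean "
            "shell before using the extend option." % value)
```

This is a heuristic only: a user who edited `PATH` by hand without touching `CMAKE_PREFIX_PATH` would pass the check.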

Regarding Proposal 3 I am not sure if I do see the need for having a default stored somewhere rather than just spelling it out explicitly. But let's not focus on that specific feature - as you described it can be added on top if found useful.

@jbohren
Contributor Author

jbohren commented May 25, 2014

I also agree that having an option as in Proposal 2 would make it easier. If I understand it correctly it should work the same way as when the following would be invoked manually CMAKE_PREFIX_PATH=/opt/ros/hydro catkin_make, right?

Exactly.

Regarding Proposal 3 I am not sure if I do see the need for having a default stored somewhere rather than just spelling it out explicitly. But let's not focus on that specific feature - as you described it can be added on top if found useful.

Yeah, we can return to this in #47

@dirk-thomas
Contributor

What about my remark and question regarding proposal 2? Should we only allow the usage when the environment is clean to guarantee that the result is correct?

@tfoote
Contributor

tfoote commented May 25, 2014

@tkruse Please focus on the issue at hand. It is inappropriate and damaging to our discussion to bring in separate topics and use hyperbole to make your point. Please speak for yourself and make your opinions heard while focusing on constructive solutions for the topic at hand.

@wjwwood
Member

wjwwood commented May 25, 2014

We are starting to have multiple conversations here, but hopefully that's ok.

@jack-oquin I wanted to address your feedback on the environment variables.

Nobody except you, Dirk or Tully knows or cares which parts of the environment are used by which tools.

In fairness I think that list includes other developers of things which are represented in your list and anyone who would like to understand the system (the people who would stand to benefit from this being simpler).

Let's take a closer look at the environment variables:

CATKIN_TEST_RESULTS_DIR=/home/joq/ros/W/build/test_results
CMAKE_PREFIX_PATH=/home/joq/ros/ws/devel:/opt/ros/hydro

These two variables are the only two variables catkin actually uses internally and these are the only two variables which we could remove and replace with some other functionality. I'm not sure how we would do that, maybe a marker file or something. Currently the discussion is to stop using CMAKE_PREFIX_PATH directly and use something like CATKIN_PREFIX_PATH, which seems like it is in the wrong direction to making your complaint better. I agree with the discussion that not reusing the CMAKE_PREFIX_PATH will be more straightforward for the user, but the original reason for using it like this was to keep the number of environment variables down, in fact we have worked pretty hard to minimize the use of environment variables in our new work.

GAZEBO__DIR=/usr/share/gazebo
GAZEBO_MASTER_URI=http://localhost:11345
GAZEBO_MODEL_DATABASE_URI=http://gazebosim.org/models
GAZEBO_MODEL_PATH=/home/joq/ros/gazebo/gazebo_models:
GAZEBO_PLUGIN_PATH=/usr/lib/gazebo-1.9/plugins
GAZEBO_RESOURCE_PATH=/usr/share/gazebo-1.9:/usr/share/gazebo_models

These are all Gazebo specific, you'd have to take up these with them, catkin neither can control this nor is it forcing them to do things this way.

LD_LIBRARY_PATH=/usr/lib/gazebo-1.9/plugins:/home/joq/ros/ws/devel/lib:/opt/ros/hydro/lib
PATH=/home/joq/ros/ws/devel/bin:/opt/ros/hydro/bin:/home/joq/bin:/usr/lib/lightdm/lightdm:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/home/joq/bin
PKG_CONFIG_PATH=/home/joq/ros/ws/devel/lib/pkgconfig:/opt/ros/hydro/lib/pkgconfig
PYTHONPATH=/home/joq/ros/ws/devel/lib/python2.7/dist-packages:/opt/ros/hydro/lib/python2.7/dist-packages

These, along with CMAKE_PREFIX_PATH if we move to CATKIN_PREFIX_PATH, are required to be set for our toolchain to work. Not setting these is not an option, we could have users set them manually or only install packages to a place which is already on those paths, but that doesn't sound very user friendly.

ROSCONSOLE_CONFIG_FILE=/home/joq/.ros/config/rosconsole.config
ROS_DISTRO=hydro
ROS_EMAIL=jack.oquin@gmail.com
ROS_ETC_DIR=/opt/ros/hydro/etc/ros
ROS_HOME=/home/joq/.ros
ROSLISP_PACKAGE_DIRECTORIES=/home/joq/ros/ws/devel/share/common-lisp
ROS_MASTER_URI=http://iron:11311
ROS_PACKAGE_PATH=/home/joq/ros/ws/src/image_common/polled_camera:/home/joq/ros/ws/src/image_common/camera_info_manager:/home/joq/ros/ws/src/image_common/image_transport:/home/joq/ros/ws/src/image_common/image_common:/home/joq/ros/ws/src/image_common/camera_calibration_parsers:/opt/ros/hydro/share:/opt/ros/hydro/stacks
ROS_ROOT=/opt/ros/hydro/share/ros
ROS_TEST_RESULTS_DIR=/home/joq/ros/ws/build/camera_info_manager/test_results
ROS_WORKSPACE=/home/joq/ros/ws

ROS sets quite a few variables, but the most mercurial of them is probably the ROS_PACKAGE_PATH. At this point the ROS_PACKAGE_PATH is a legacy PATH used by rosbuild and all of the ROS tools which reused it. We have already been working on ways to replace the ROS_PACKAGE_PATH with a system which uses the filesystem to look up resources and doesn't require recursive crawling like the ROS_PACKAGE_PATH does. Obviously a change like this would not be easy to push out into the current ROS environment, so it's unlikely we could use this to remove the ROS_PACKAGE_PATH in the current system. The rest of the variables are set for legacy reasons, but in general I wouldn't guess that they cause users much heartache.

I guess my point here is that all of the environment variables serve a purpose and most of them cannot be controlled by catkin.

My ROS_PACKAGE_PATH is relatively short at the moment, because my current workspace does not have much in it. It was over 5KB long recently.

That only occurs in the extreme case when you have every package installed into a different folder, so you have N workspaces chained, where N is the number of packages you are building.

If our students get any of that wrong, they are in deep trouble. Most of them don't know how to deal with it.

They shouldn't be setting these manually, they should only be sourcing a setup file, so I'm not sure what they would do to get it wrong, but I believe you that they do get in erroneous states and I take the point that it would be complicated to debug in that case.

So given that we want to improve this scenario, what do we do to make it simpler? We've already discussed making a custom CATKIN_PREFIX_PATH variable to make it semantically clearer. What other ideas do you have?

@wjwwood
Member

wjwwood commented May 25, 2014

There have been suggestions for a cleaner design already.

I didn't say that there had not been, I said that I had raised questions about how that change would behave in a few described workflows and I got no direct feedback on that.

And given the way catkin was designed, do not expect the community to humbly report their problems to you.

I'm not sure that makes sense, but as a user of software, I would never expect to go to an issue tracker and say "There was a problem and it sucked" and then have the developer be able to fix that.

I think it is far more unreasonable to expect that the developers can address something which hasn't been described in detail and/or made reproducible.

Furthermore, problems that users run into which have not been described cannot support any engineering decision to change the design because you have no idea what the actual problem was.

Also the Python snippet you pulled has nothing to do with any of the problems described here, though I agree that silently suppressing the traceback makes it harder for users to make good reports. However, catching and suppressing the traceback in catkin_make would not hide problems with setup files, cmake invocation, or make invocation.
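For reference, an entry point that keeps the short message but can also show the traceback might look like this Python sketch. It is not the code in catkin_make; the `run` helper and its `debug` switch are hypothetical.

```python
import sys
import traceback


def run(main, debug=False):
    """Call main() and turn failures into an exit status (sketch).

    Returns main()'s result on success. On an exception, prints the short
    message to stderr, plus the full traceback when debug is true, and
    returns 1 so the caller can pass it to sys.exit().
    """
    try:
        return main()
    except Exception as exc:
        if debug:
            traceback.print_exc()
        print("error: %s" % exc, file=sys.stderr)
        return 1
```

A script would end with something like `sys.exit(run(main, debug=True))`, where the way `debug` gets switched on (a flag, an environment variable) is left open here.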

If you need to know that, then write tools that generate useful failure reports. Duh.

Do you have any constructive suggestions on how we can generate better reports? (excluding the catkin_make Python traceback one)

@jbohren
Contributor Author

jbohren commented May 26, 2014

What about my remark and question regarding proposal 2? Should we only allow the usage when the environment is clean to guarantee that the result is correct?

It would be ideal if someone could run the command even when the environment isn't clean. When the user sources the generated setup file, then it should get cleaned. Even if it only manages to clean CATKIN_PREFIX_PATH, that would make it easier to recover from environment issues since they could source it from a clean shell. This also means they could have the source line in their bashrc, and any new shells would be clean.

@dirk-thomas
Contributor

Only cleaning the CATKIN_PREFIX_PATH would be incomplete. It needs to undo previous prepends of PATH, PYTHONPATH, LD_LIBRARY_PATH, etc. So I do think when sourcing a different environment it is mandatory to roll back changes introduced by previously sourcing a different file.

Currently that is done implicitly. Arguably it could be done with an explicit command instead. And if not done explicitly by the user it could error out telling the user the command to first rollback the previous environment.

Anyway this kind of rollback is always limited to the environment variable modified by the _setup_util.py file. For any user defined environment hook it is not possible since there is just not enough knowledge about it in order to roll it back.
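The kind of rollback described here can be sketched in Python as follows. This is not the code in `_setup_util.py`: the `ENV_SUBFOLDERS` table and the `rollback` function are assumptions, with the subfolders taken from the environment listing earlier in this thread. Only variables in the table are touched, which mirrors the limitation that arbitrary environment hooks cannot be undone.

```python
import posixpath

# Assumed table: which subfolder of a workspace ends up in which variable.
ENV_SUBFOLDERS = {
    "CMAKE_PREFIX_PATH": [""],
    "PATH": ["bin"],
    "LD_LIBRARY_PATH": ["lib"],
    "PKG_CONFIG_PATH": ["lib/pkgconfig"],
    "PYTHONPATH": ["lib/python2.7/dist-packages"],
}


def rollback(env, workspaces):
    """Return a copy of env without the entries contributed by workspaces.

    Entries that do not belong to one of the given workspaces (for example
    paths the user added by hand) are kept in their original order.
    """
    cleaned = dict(env)
    for name, subfolders in ENV_SUBFOLDERS.items():
        if name not in cleaned:
            continue
        owned = set()
        for ws in workspaces:
            for sub in subfolders:
                owned.add(posixpath.join(ws, sub) if sub else ws)
        kept = [p for p in cleaned[name].split(":") if p and p not in owned]
        cleaned[name] = ":".join(kept)
    return cleaned
```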

@jbohren
Contributor Author

jbohren commented May 26, 2014

Only cleaning the CATKIN_PREFIX_PATH would be incomplete. It needs to undo previous prepends of PATH, PYTHONPATH, LD_LIBRARY_PATH, etc. So I do think when sourcing a different environment it is mandatory to roll back changes introduced by previously sourcing a different file.

Yeah, I think that's fine as long as it's done with the current rollback functionality so that we don't blow away custom additions to things like $PATH and $LD_LIBRARY_PATH that people might add manually.

Currently that is done implicitly. Arguably it could be done with an explicit command instead. And if not done explicitly by the user it could error out telling the user the command to first rollback the previous environment.

Yeah, I think to determine which is better we'd need to try out both methods.

Anyway this kind of rollback is always limited to the environment variable modified by the _setup_util.py file. For any user defined environment hook it is not possible since there is just not enough knowledge about it in order to roll it back.

Of course, but I think (and I expect we're all in agreement that) those variables are the ones that are most likely to cause problems.

Is there any way to define a rollback function or script for an exported catkin env hook? If not, that could be a nice future feature to have so that other environment hooks can be supported.

@jack-oquin

CATKIN_TEST_RESULTS_DIR=/home/joq/ros/W/build/test_results
CMAKE_PREFIX_PATH=/home/joq/ros/ws/devel:/opt/ros/hydro

These two variables are the only two variables catkin actually uses internally and these are the only two variables which we could remove and replace with some other functionality. I'm not sure how we would do that, maybe a marker file or something.

Somehow using the file system seems like a huge step in the right direction. The fundamental design problem is that we are effectively using the equivalent of C++ global variables for things that should be per-object. Because the scope of these variables is inappropriate, people dealing with multiple "objects" (i.e. workspaces) get very confused.

Currently the discussion is to stop using CMAKE_PREFIX_PATH directly and use something like CATKIN_PREFIX_PATH, which seems like it is in the wrong direction to making your complaint better.

Yes, I did notice that. Perhaps I have not succeeded in convincing anyone that there is a problem.

I agree with the discussion that not reusing the CMAKE_PREFIX_PATH will be more straightforward for the user, but the original reason for using it like this was to keep the number of environment variables down. In fact, we have worked pretty hard to minimize the use of environment variables in our new work.

I would like to understand how adding yet another environment variable will "make things more straightforward". Maybe it's true, but I don't get it.

These are all Gazebo-specific; you'd have to take these up with them. Catkin can neither control this nor is it forcing them to do things this way.

The Gazebo variables have not caused much trouble, probably because they are relatively static, not changing for every workspace. I only mentioned them to remind you that actual users must keep track of a large number of messy details. Saying "catkin only adds two" tends to obscure that point.

LD_LIBRARY_PATH=/usr/lib/gazebo-1.9/plugins:/home/joq/ros/ws/devel/lib:/opt/ros/hydro/lib
PATH=/home/joq/ros/ws/devel/bin:/opt/ros/hydro/bin:/home/joq/bin:/usr/lib/lightdm/lightdm:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/home/joq/bin
PKG_CONFIG_PATH=/home/joq/ros/ws/devel/lib/pkgconfig:/opt/ros/hydro/lib/pkgconfig
PYTHONPATH=/home/joq/ros/ws/devel/lib/python2.7/dist-packages:/opt/ros/hydro/lib/python2.7/dist-packages

These, along with CMAKE_PREFIX_PATH if we move to CATKIN_PREFIX_PATH, are required to be set for our toolchain to work. Not setting these is not an option. We could have users set them manually or only install packages to a place which is already on those paths, but that doesn't sound very user-friendly.

Along with ROS_PACKAGE_PATH, they seem to be the root cause of most problems. Every one of them contains per-workspace components. Wrong scope! Maybe it's time to think outside of the box.

My ROS_PACKAGE_PATH is relatively short at the moment, because my current workspace does not have much in it. It was over 5KB long recently.

That only occurs in the extreme case when you have every package installed into a different folder, so you have N workspaces chained, where N is the number of packages you are building.

It occurs because I am using catkin build --merge-devel. With catkin_make it was much shorter.

If our students get any of that wrong, they are in deep trouble. Most of them don't know how to deal with it.

They shouldn't be setting these manually; they should only be sourcing a setup file, so I'm not sure what they would do to get it wrong. But I believe you that they do get into erroneous states, and I take the point that it would be complicated to debug in that case.

Each robot has a single student account which they must all share. So, we asked them to create a separate workspace for each project. If everybody were to edit the .bashrc, things would get mixed up in a hurry, one more problem for the next student to trip over. So, every time they open a new shell they need to remember to source the correct setup.bash. Even when they do that correctly, there can be problems with sourcing a new workspace in a shell that already had one.

I see the mess after it has happened. It's usually hard to tell exactly how it got that way. In many cases, they probably had cruft in their environment when creating the workspace initially. ROS is not very helpful in diagnosing these kinds of problems.

The students are intelligent, but inexperienced.

So given that we want to improve this scenario, what do we do to make it simpler? We've already discussed making a custom CATKIN_PREFIX_PATH variable to make it semantically clearer. What other ideas do you have?

I don't know how it works right now, nor how to fix it. If I were designing something like this from scratch, I would use the file system and not the shell environment for per-workspace customization. The variables set in .bashrc should be relatively static, not things that change when moving between workspaces.

I have been trying to document the fact that there is a problem. Not every user with a problem knows how to fix it. That does not make the report invalid.

@tfoote
Contributor

tfoote commented May 26, 2014

If you're sharing a single user account, I would suggest that putting anything in your .bashrc that relates to a user's workspace is inappropriate; that situation is very much like a power user's, where you're jumping between workspaces all the time. Many people I've seen who reuse workspaces just add an alias to source their workspace for convenience, often one letter or a very short word: "h" or "hydro" or "hsrc".

Stepping back I don't think that we want to reinvent the isolation of shell processes. If we want to leave a clean environment behind after running something we should use a child shell process and then terminate it. (Or a new terminal) The other solution is that we can do a lot of accounting and try to clean things up automatically. But there's always going to be a corner case where something was manually modified which was automatically set and it will get rolled back incorrectly.
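The child-process approach mentioned above can be illustrated with a short sketch: the modified environment exists only in a copy handed to the child, so nothing needs to be rolled back in the parent. This is only an illustration under assumed names (`run_in_child_env` and the `DEMO_WS` variable are hypothetical), not a proposed catkin feature.

```python
import os
import subprocess
import sys


def run_in_child_env(command, extra_env):
    """Run `command` in a child process with a modified copy of the environment."""
    env = dict(os.environ)  # copy; the parent's environment is left untouched
    env.update(extra_env)
    result = subprocess.run(
        command, env=env, capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    # The child sees DEMO_WS; the parent never does.
    child = [sys.executable, "-c", "import os; print(os.environ['DEMO_WS'])"]
    print(run_in_child_env(child, {"DEMO_WS": "/tmp/ws1"}).strip())
    print("DEMO_WS" in os.environ)
```

When the child exits, its environment disappears with it, which is the same isolation a new terminal provides.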

The other option we could pursue would be to make it more visible that you're in an environment and which environment you're in, in the same way that virtualenv already does by updating your PS1 display. We can probably even do this via a hooks package and allow customization. Virtualenv does not have any tools for rolling back env changes except quitting.

I believe this approach can solve the usability issues without requiring us to change how the tools we rely on underneath us work, such as the Linux runtime linker, gcc, and cmake. The workspaces sourced can be displayed, and you'll be able to see what is already on your path. And we don't need to try to support rolling back. If you want to roll back you will need to reconstruct from an empty environment.
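As a rough illustration of the prompt idea, the sketch below derives a virtualenv-style prefix from the value of CMAKE_PREFIX_PATH. The function name and the label format (last two path components, joined by " > ") are assumptions made for this example; an actual hook would set PS1 from a shell script.

```python
import os


def workspace_prompt(prefix_path, ps1="$ "):
    """Prefix a prompt string with short labels for the chained workspaces."""
    entries = [p for p in prefix_path.split(os.pathsep) if p]
    if not entries:
        return ps1  # clean shell: prompt is unchanged
    # Use the last two path components as a label, e.g. "ws/devel".
    labels = ["/".join(p.rstrip("/").split("/")[-2:]) for p in entries]
    return "(" + " > ".join(labels) + ") " + ps1
```

With the CMAKE_PREFIX_PATH shown earlier in this thread, the result would be `(ws/devel > ros/hydro) $ `, which makes an unexpectedly long chain visible at a glance.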

@jbohren
Contributor Author

jbohren commented May 26, 2014

CATKIN_TEST_RESULTS_DIR=/home/joq/ros/W/build/test_results
CMAKE_PREFIX_PATH=/home/joq/ros/ws/devel:/opt/ros/hydro

These two variables are the only two variables catkin actually uses internally and these are the only two variables which we could remove and replace with some other functionality. I'm not sure how we would do that, maybe a marker file or something.

Somehow using the file system seems like a huge step in the right direction. The fundamental design problem is that we are effectively using the equivalent of C++ global variables for things that should be per-object. Because the scope of these variables is inappropriate, people dealing with multiple "objects" (i.e. workspaces) get very confused.

I think maintaining context is really important, too, but I've found that tools that are based on file structure can get really messy, themselves. I think a marker file is important, and I even think putting configuration info into that marker file is really the way to go. In that case, we could change the way catkin determines the environment at configure/build time like the following:

_Given the following actions:_

# Load /opt/ros/hydro workspace
source /opt/ros/hydro/setup.bash
# Make a new workspace, called "ws1" 
mkdir -p ~/ws1/src
cd ~/ws1/src && catkin_create_pkg foo
# Build the new workspace, chaining it from /opt/ros/hydro
cd ~/ws1 && catkin_make

_Current Catkin Sticky Behavior:_ (correct me if I'm wrong)

  • /opt/ros/hydro is statically stored in the following places:
    • ~/ws1/build/catkin_generated/setup_cached.sh
    • ~/ws1/build/catkin_generated/generate_cached_setup.py
    • ~/ws1/devel/_setup_util.py
  • The only way to re-set the parent workspace is by removing the build directory
  • Aside from sourcing the environment there is no front-end tool for querying which workspace this workspace is chained against
  • There is no error-checking if the user's $CMAKE_PREFIX_PATH has changed since this workspace was built

_Idea for New Catkin Sticky Behavior:_

  • List of /opt/ros/hydro (and other prefixes) is stored in ~/.catkin_workspace catkin_pkg#95
  • Each time cmake is invoked, ~/.catkin_workspace is re-read
    • If the prefix list doesn't change and matches the current $CMAKE_PREFIX_PATH, then it builds normally
    • If the prefix list changes from the last build, then a warning is issued, all targets are rebuilt, and setup files are re-generated
    • If the prefix list is different from the current $CMAKE_PREFIX_PATH then an error or at least a warning should be reported
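The three cases in the list above can be sketched as a small decision function. This is only one reading of the proposal: the order of the checks, the returned labels, and the way the stored list is passed in are all assumptions, and the sketch ignores the fact that a sourced workspace also adds its own devel space to $CMAKE_PREFIX_PATH.

```python
import os


def check_prefixes(stored, last_built, env_value):
    """Return the action implied by the stored and current prefix lists.

    stored     -- prefixes read from the workspace marker file
    last_built -- prefixes recorded at the previous build (None if never built)
    env_value  -- current value of $CMAKE_PREFIX_PATH
    """
    current = [p for p in env_value.split(os.pathsep) if p]
    if stored != current:
        # The shell environment disagrees with the workspace configuration.
        return "mismatch"
    if last_built is not None and stored != last_built:
        # Configuration changed since the last build: warn, rebuild everything,
        # and regenerate the setup files.
        return "rebuild"
    return "build"
```

Checking the environment first means a stale shell is reported before any rebuild is triggered; the thread leaves open whether that case should be an error or only a warning.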

I agree with the discussion that not reusing the CMAKE_PREFIX_PATH will be more straightforward for the user, but the original reason for using it like this was to keep the number of environment variables down. In fact, we have worked pretty hard to minimize the use of environment variables in our new work.

I would like to understand how adding yet another environment variable will "make things more straightforward". Maybe it's true, but I don't get it.

It makes things more straightforward in three ways:

  1. It means that we can be more strict about how we interpret $CATKIN_PREFIX_PATH since only catkin should be modifying it, and a path on $CATKIN_PREFIX_PATH must be a catkin workspace. This strictness means we can more easily detect a broken environment automatically.
  2. It means that a user who knows about environment variables can more easily associate $CATKIN_PREFIX_PATH with catkin when trying to debug their environment.
  3. It means that people can add things to their $CMAKE_PREFIX_PATH without having to worry about it colliding with catkin use of the same variable. This is important on less-supported platforms like OS X and other systems where there might be non-standard install paths for libraries.
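To illustrate the first point, a strict check could look like the sketch below. It assumes that a catkin workspace can be recognized by a `.catkin` marker file in its root and that the proposed $CATKIN_PREFIX_PATH uses the usual path-separator syntax; the function name is hypothetical.

```python
import os
import tempfile


def broken_prefixes(prefix_path, marker=".catkin"):
    """Return the entries of a prefix path that do not look like catkin workspaces."""
    entries = [p for p in prefix_path.split(os.pathsep) if p]
    return [p for p in entries if not os.path.isfile(os.path.join(p, marker))]


if __name__ == "__main__":
    # Two throwaway directories: only the first one carries the marker file.
    with tempfile.TemporaryDirectory() as good, tempfile.TemporaryDirectory() as bad:
        open(os.path.join(good, ".catkin"), "w").close()
        print(broken_prefixes(os.pathsep.join([good, bad])) == [bad])
```

The same check on $CMAKE_PREFIX_PATH would produce false alarms for any non-catkin prefix a user added on purpose, which is why a dedicated variable makes the strictness practical.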

@tkruse
Contributor

tkruse commented May 26, 2014

Do you have any constructive suggestions on how we can generate better reports? (excluding the catkin_make Python traceback one)

  • Make catkin_lint a mandatory (opt-out) part of workspace actions.
  • Be more restrictive by default and fail early. As an example, only allow workspace chains of length 2 (underlay and overlay) by default, with an opt-out switch for expert users. This will eliminate plenty of situations where students had botched setups due to accidental longer chains.
    Or disallow chains where the same package name appears more than once in the chain, by default (opt-out). The more restrictive the defaults are, the better for average-user usability.

Such restrictions will give you better error reports because the circumstances of any failure will have less variation.
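The two restrictive defaults suggested above could be checked roughly as follows. This is a sketch under assumed inputs: the chain is given as a list of (prefix, package names) pairs with the overlay first, and both the function name and the message wording are hypothetical.

```python
def chain_violations(chain, max_length=2):
    """Check a workspace chain against the two restrictive defaults.

    chain -- list of (workspace_prefix, [package names]), overlay first
    """
    problems = []
    if len(chain) > max_length:
        problems.append(
            "chain has %d workspaces, limit is %d" % (len(chain), max_length)
        )
    seen = {}  # package name -> first workspace that provides it
    for prefix, packages in chain:
        for name in packages:
            if name in seen:
                problems.append(
                    "package '%s' is in both %s and %s" % (name, seen[name], prefix)
                )
            else:
                seen[name] = prefix
    return problems
```

An empty result means the chain passes; an expert opt-out switch would simply skip the call or raise `max_length`.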

  • Make information transparent, like what @jbohren says.
    Catkin was based on the "ideal" that a workspace is nothing but a huge cmake project (and builds are just volatile setups). Reject this ideal as a (predictable) total failure for the ROS community (though certain teams can be successful with it), and start to treat a workspace as a workspace, like any IDE treats a workspace. A workspace has metadata in the workspace root that describes the workspace to the user, and the meta-information does not easily get lost on any kind of 'clean' operation. The .rosinstall file in rosbuild did not merely contain VCS information, it gave users a rationale for their environment. In catkin, that is hidden in some complex generated files. Whatever changes sourcing a setup.sh makes to the environment should be easily deducible by a non-expert user from an obviously placed workspace metadata file.

As a result of all the hidden mechanics, when faced with any failure, users will start hacking away in unpredictable ways before asking for help, at which point their setup is too broken to reconstruct the original almost pristine state where the original failure occurred. And thus you would not get a useful error report from the original incident.

Make this test: Take some novice ROS user, give him a pencil, tell him a given CMAKE_PREFIX_PATH, give him a sequence of actions in the shell to imagine, and ask him to write down the resulting CMAKE_PREFIX_PATH. If he cannot get this right easily, the design is flawed.

  • Deprecate catkin_make, to get rid of it as soon as possible. As long as it is still around, some people will still use it, reducing the help team members can give each other, and "muddying the water" of workspaces. Expert users are those who will be most likely to report errors. If you tempt them to use catkin_make, they will not write reports for catkin_tools, reducing the number of useful error reports for catkin_tools.

@tkruse
Contributor

tkruse commented May 27, 2014

I would also again like to encourage people to consider that

$ source /opt/ros/hydro/setup.bash
...
$ "create workspace"

is not a case where the second command should act by default as if the user wanted to create a workspace chained to hydro. Between those two commands weeks may have passed, the user may have run a thousand commands in between, the first command may have come from a .bashrc, the user might not be aware that he returned to a shell where previously he ran the source command, etc.

Viable alternatives are explicit arguments or user interaction ("Do you really want to chain against ...(n):")
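Both alternatives can be combined in one small sketch: an explicit argument always wins, and an inherited environment is only used after the user confirms. The `--extend` option, the program name, and the function are hypothetical names for this illustration, not an existing catkin interface.

```python
import argparse


def resolve_parent(argv, env_prefix_path, ask=input):
    """Decide which workspace (if any) a new workspace should be chained to."""
    parser = argparse.ArgumentParser(prog="create-workspace")
    parser.add_argument(
        "--extend", metavar="PREFIX",
        help="explicitly chain the new workspace to PREFIX",
    )
    args = parser.parse_args(argv)

    if args.extend:
        return args.extend  # explicit argument always wins
    if not env_prefix_path:
        return None  # clean shell: nothing to chain against
    # Inherited environment: never chain silently, default answer is "no".
    answer = ask("Do you really want to chain against %s? (y/N): " % env_prefix_path)
    return env_prefix_path if answer.strip().lower() in ("y", "yes") else None
```

Passing `ask` as a parameter keeps the prompt testable and lets a non-interactive caller substitute a function that always answers "no".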

@wjwwood wjwwood added this to the untargeted milestone May 27, 2014
@jbohren
Contributor Author

jbohren commented May 30, 2014

Dirk's patch ros/catkin#641 fixes the rollback failure, and now I want to look at overriding the auto-chaining behavior.

I just did a little experimenting and looking into the catkin source. I'm going to continue discussing overriding $CMAKE_PREFIX_PATH in ros/catkin#643.

Once we figure out what the interface is to control that, I figure we can get back to discussing how catkin build should behave w.r.t. workspace chaining?

@dirk-thomas
Contributor

In order to advance this discussion I have proposed a conference call to talk with all interested people directly (https://groups.google.com/forum/#!topic/ros-sig-buildsystem/K7_QBJzFlVA). Please consider replying to the email on the mailing list.

@jbohren
Contributor Author

jbohren commented Jun 2, 2014

Here's a working example of how we can enable manual extension, make the CMAKE_PREFIX_PATH more obvious, and enable people to more easily manipulate their workspace hierarchies: #58

@jbohren
Contributor Author

jbohren commented Oct 2, 2014

I think we can close this issue. There was a lot of really helpful discussion, and I think the features in #58 and #80 enabled much greater control over workspace extension. I understand that there still aren't good guarantees when not starting with a clean shell, but at least there's more introspection into it, now.
