Skip to content

Conversation

@zizhong
Copy link
Member

@zizhong zizhong commented Oct 8, 2015

traffic_line -x doesn't reload when SSL certs change file contents without changing the file names.

@bgaff
Copy link
Member

bgaff commented Oct 8, 2015

@zwoop @jpeach @SolidWallOfCode mind doing a review for me?

@jpeach
Copy link
Contributor

jpeach commented Oct 14, 2015

I will review this. Hoping to get to it by the weekend of Oct 17th. There are some style changes to make, but I'm mainly concerned about the integration with the config files mechanics and the SSLConfig changes (especially statics). Doing it this way looks like it won't correctly update proxy.node.config.reconfigure_required.

@jpeach
Copy link
Contributor

jpeach commented Oct 24, 2015

ink_file_lmtime

  • Should be named "ink_file_time".
  • I don't see why we need to use ink_timezone() here, or anywhere else for that matter.
  • This should return ink_hrtime (using ink_hrtime_from_timespec) rather than time_t.

If you make these ink_file_mtime changes and update Rollback to use
ink_file_mtime, I'll merge that as a separate patch.

This change crashes traffic_manager every time for me, with the following
backtrace:

(lldb) bt
* thread #2: tid = 0xe4f28, 0x000000010003b7eb traffic_manager`LocalManager::signalFileChange(this=0x0000000100a26680, var_name="proxy.config.ssl.server.multicert.filename", incVersion=true) + 27 at LocalManager.cc:769, stop reason = breakpoint 1.1
* frame #0: 0x000000010003b7eb traffic_manager`LocalManager::signalFileChange(this=0x0000000100a26680, var_name="proxy.config.ssl.server.multicert.filename", incVersion=true) + 27 at LocalManager.cc:769
    frame #1: 0x0000000100004ad1 traffic_manager`fileUpdated(fname="ssl_multicert.config", incVersion=false) + 993 at traffic_manager.cc:941
    frame #2: 0x000000010003700a traffic_manager`FileManager::fileChanged(this=0x0000000100a270a0, baseFileName="ssl_multicert.config", incVersion=false) + 122 at FileManager.cc:189
    frame #3: 0x000000010004109e traffic_manager`Rollback::checkForUserUpdate(this=0x0000000100a27ce0, how=ROLLBACK_CHECK_ONLY) + 510 at Rollback.cc:928
    frame #4: 0x0000000100038458 traffic_manager`FileManager::isConfigStale(this=0x0000000100a270a0) + 120 at FileManager.cc:657
    frame #5: 0x000000010008a378 traffic_manager`sync_thr(data=0x0000000100a270a0) + 376 at RecLocal.cc:97
    frame #6: 0x00007fff915469b1 libsystem_pthread.dylib`_pthread_body + 131
    frame #7: 0x00007fff9154692e libsystem_pthread.dylib`_pthread_start + 168
    frame #8: 0x00007fff91544385 libsystem_pthread.dylib`thread_start + 13

It looks like this is a side-effect of always sending events.

listened_var_name is an array literal, so you can always calculate
its size at compile time using countof(). Use this for iteration
and in place of listened_var_name_size.

SSLCertificateConfig::need_reconfigure() should not have the
side-effect of triggerring a reconfiguration.

There are various problems with SSLCertificateConfig::FileInfo and VarInfo:

  • set_value() should not be a separate call, do it in the constructor
  • check_value() should be called "bool changed() const;"

We should not need the static data in SSLCertificateConfig.

In general, I don't like this approach. The additional config files for SSL
configuration (if they are handled at all), should be handled like other
configuration files by the Rollback mechanics. I certainly could see that you
don't want to create previous versions of keys, however, which makes me winder
whether these really should be treated like config files?

It looks like you handle the additional files for a SSL context can
be specified in a ssl_multicert.config entry by collecting all the
file names into the SSLCertificateConfig static data and then always
triggerring a reload event from traffic_manager. I think that this
approach adds complexity to the SSL subsystem, which is already far
to complex.

A better approach to explore is adding the capability for traffic_server
to send a message up to traffic_manager to tell it to watch a set
of files. traffic_manager is already doing this, so much of the
mechanism should be there. This capability could eventually be used
by other subsystems and even by plugins. One tricky aspect of this
approach is finding a clean way to tell traffic_manager to stop
watching a set of files.

@zizhong
Copy link
Member Author

zizhong commented Oct 25, 2015

Thanks @jpeach for the comment. I'll consider about another way to handle this.

@bgaff
Copy link
Member

bgaff commented Oct 26, 2015

@jpeach , the approach you're describing sounds incredibly complex given the problem we're trying to solve.

With respect to the following:

A better approach to explore is adding the capability for traffic_server
to send a message up to traffic_manager to tell it to watch a set
of files. traffic_manager is already doing this, so much of the
mechanism should be there. This capability could eventually be used
by other subsystems and even by plugins. One tricky aspect of this
approach is finding a clean way to tell traffic_manager to stop
watching a set of files.

@jpeach
Copy link
Contributor

jpeach commented Oct 27, 2015

I don't think it is that complex. All the machinery for doing this exists, it just needs to be tied together in a sightly different way. My view is that this approach adds complexity in the wrong place (though I definitely agree that the config file management code can be refactored and simplified). One approach would be to make an upcall to traffic_manager that installs a "temporary watch" (ie. watch the file until the next reconfigure event).

Note that both ssl_multicert.config and remap.config have this issue, so a general mechanism can be used to deal with both cases. Your proposed solution also fails to update proxy.node.config.reconfigure_required correctly.

@bgaff
Copy link
Member

bgaff commented Oct 27, 2015

Ok @jpeach : i'll bite on the proposal to allow server to send a message to manager, perhaps the most general purpose way to do this is to allow server to specify that file F is associated with config C and if F changes it means that C is also considered changed, the set of F associated with a given C will last until the config C is reloading next. I believe this is what you're suggesting.

My next question is, how do you think this should be implemented, what message passing mechanism should we use?

@jpeach
Copy link
Contributor

jpeach commented Oct 27, 2015

The local manager can be used to send a message from traffic_server up to traffic_manager. Look for RecMessageSend. It's unfortunate that RecMessage sends hand-marshalled data packets, but that could be refactored to use the standard marshaling routines in MgmtMarshall.h. One problem that could happen going down this path is that we end up in a big ball o' mud due to linker dependencies. I guess that could be refactored away, but it could be painful.

I think that you are correct that the temporary watches need to be associated with a top-level config file. When the parent (?) watch triggers, then we can automatically drop all the child watches.

@zizhong zizhong closed this Nov 13, 2015
@zizhong zizhong reopened this Nov 13, 2015
@zizhong
Copy link
Member Author

zizhong commented Nov 13, 2015

hey @jpeach , I updated the branch with your opinion. Would you mind taking another look at it?

@jpeach
Copy link
Contributor

jpeach commented Nov 17, 2015

  • Remove _strlen and _memcpy usage (see TS-4029).

  • In ConfigFileGroup(), use ats_strdup() rather than malloc+memcpy.

  • In Rollback::createPathStr(), use the Layout::relative_to() that returns an allocated string.

  • Layout::relative_to, correctly deals with the file argument being
    an absolute path. So in Rollback::createPathStr(), you don't need
    to treat absolute paths specially.

  • Rollback::isFileNameRooted should be a function, not a member variable.

      bool isFileNameRooted() const {
        return *fileName == '/';
      }
    
  • ConfigFileGroup should not be responsible for unmarshalling data.
    The data marshalling format is a contract internal to the local
    manager. ConfigFileGroup should just deal with paths.

  • Why did you need to introduce ConfigFileGroup? An alternative
    would be to organize Rollback objects in a tree. Then you don't
    need to do all the name lookups; you can just keep a pointer from
    child to parent.

  • You can simplify the local manager protocol a lot. Rename
    MGMT_SIGNAL_CONFIG_FILE_GROUP_UPDATE to MGMT_SIGNAL_CONFIG_FILE_CHILD.
    The payload to this message is the parent config name and the file
    file path, like this:

      struct {
        int options;
        const char * parent; // config name
        const char * child;  // child path (absolute)
      };
    

    I'm not sure whether passing options is needed, but consider whether
    Rollback versioning should make backup copies of SSL keys. Passing
    options would allow you to deal with this.

    You don't need to marshall this by hand, use the API in MgmtMarshall.h
    to marshall the fields to a memory buffer. This marshalling should
    be done in a helper method on ProcessManager() that has an API
    similar to:

      ProcessManager::signalConfigFileChild(/* int options? */ const char * name, const char * path)
    
  • This adds a binary dependency from SSL configuration on the local
    manager that didn't exist before. Obviously I am OK with this, but
    I wanted to make that explicit.

  • Please sit with Brian and Thomas and make a TSQA test for this
    (in a separate pull request is OK).

@zizhong
Copy link
Member Author

zizhong commented Nov 18, 2015

Thanks @jpeach for the helpful comments.
For the protocol you suggested,

  1. How to clean up the old watched children files correctly?
  2. In a parent with all children style, the whole information can be packed in one message instead of O(children) messages, which is more efficient on both time and space.
    The tree structure for this problem is too general. A parent with all children (a.k.a. ConfigFileGroup) is all that is need. Those groups are few foreseeably(ssl_multicerts.conf, remap.conf) and all those can be treated as configFileGroups(a special tree whose depth is always 2) instead of a general tree.

After considering about the pros and cons about applying a tree structure in this issue, I prefer ConfigFileGroup.

@jpeach
Copy link
Contributor

jpeach commented Nov 18, 2015

  1. How to clean up the old watched children files correctly?

If you have a tree of Rollback objects, then any object in the tree that is updated would trigger the root of the tree and remove all the children. You always know the parent of the watched children, so when you trigger on the group, you get the parent and remove the children. IIRC you are already doing this.

  1. In a parent with all children style, the whole information can be packed in one message
    instead of O(children) messages, which is more efficient on both time and space.

Since the message to traffic_manager is asynchronous and not performance sensitive, there's no need to try to optimize this. We need to optimize for simplicity. When you use a single item per message, it is easy to re-use the existing message marshaling code, and the code in the SSL layer that collects all the child files is no longer needed.

Although I agree that the tree is more general than strictly necessary, I think that the code for that will turn out to be fairly clean. I am OK with file groups, but please add a back pointer from the file group to the parent Rollback so that the parent lookups can be avoided. Note that if you make the ConfigFileGroup a member of Rollback then you basically have the tree structure already, which is why I'm suggesting it :)

@zizhong
Copy link
Member Author

zizhong commented Nov 20, 2015

@jpeach I made another pull request with your helpful commits.
About the marshalling using APIs from MgmtMarshall, the marshalling API uses va_lists which is not fit in this scenario.
cc @bgaff

@jpeach
Copy link
Contributor

jpeach commented Nov 20, 2015

Thanks, I'll review again next week.

I don't think I understand what you mean about the marshaling APIs being unsuitable. Here's how you would use it to send a local manager message:

static int
send_a_message(int flags, const char * parent, const char * child)
{
  static const MgmtMarshallType fields[] = {
    MGMT_MARSHALL_INT,
    MGMT_MARSHALL_STRING,
    MGMT_MARSHALL_STRING,
  };

  void * buffer;
  size_t len;

  len = mgmt_message_length(fields, COUNTOF(fields), &flags, &parent, &child);
  buffer = ats_malloc(buffer);

  mgmt_message_marshall(buffer, length, fields, COUNTOF(fields), &flags, &parent, &child);
  pmgmt->signalManager(MGMT_SIGNAL_CONFIG_FILE_GROUP_UPDATE, buffer, len);

  ats_free(buffer);
}

@zizhong
Copy link
Member Author

zizhong commented Nov 22, 2015

@jpeach Thanks for your reply.
When I said the marshaling APIs being unsuitable, I meant using the APIs marshalling the Vec<char*>(which is configFileGroup).
However, on second thought, I think you are right. What you suggested (using a back_ptr instead of the file group) makes code more tidier. So that I rewrote this feature on your advice.
cc @bgaff

@zizhong
Copy link
Member Author

zizhong commented Nov 24, 2015

@jpeach , after I rewrote the whole patch on your idea, I found it way more clear and concise than before. Thank you for your great suggestions.
Let me know if you have questions with the new request.

@jpeach
Copy link
Contributor

jpeach commented Nov 24, 2015

@zizhong Great! I don't see the new patch here though?

@zizhong
Copy link
Member Author

zizhong commented Nov 25, 2015

@jpeach ,Sorry! You can see the lastest version of patch in this pull request. I updated this old request rather than making a new one.

@jpeach
Copy link
Contributor

jpeach commented Nov 25, 2015

Ok the next thing we need to do is remove the cert_files_vec from all the functions in SSLUtils.cc. The way we can do this is to add another callback to SSLConfigParams that is triggered when any SSL-related file is loaded. This callback can be attached in proxy/Main.cc and will just call your new ProcessManager::signalConfigFileChild().

In FileManager::configFileChild() you need to hold the lock the whole time you are adding the new Rollback so that the parent pointer cannot get deleted out from under you. I think you should add an assertion that parent must not also have a parent in order to prevent construction of arbitrary Rollback trees.

You don't need ts/ink_string++.h in mgmt/FileManager.h.

In FileManager::rereadConfig(), you can't delete entries out of the hash table while you are iterating over it. The logic of this is pretty hard to follow and can be simplified. You don't need to track all the parents, just the changed files. For each changed file, you know it is a parent because getParentRollback will return NULL. So you just have to walk the set of changed files, check if each is a parent, then delete the children if it is.

I found the logic of which files trigger the change notification confusing because it is split between FileManager::fileChanged and Rollback::internalUpdate. You should clarify this either in comments or code.

In places where you check isFileNameRooted() to determine whether the Rollback is a child, use an explicit member function:

    bool isChildRollback() const { return getParentRollback() != NULL; }

I tested this and it writes backup copies of certificates:

    -rw-r--r--   1 root    nobody   2785 Nov 25 10:18 ssl_multicert.config_2
    -rw-r--r--   1 root    nobody   2785 Nov 25 10:18 ssl_multicert.config
    -rw-------   1 nobody  nobody  12859 Nov 25 10:20 records.config
    -rw-r--r--   1 root    nobody   3278 Nov 25 10:22 example.com.crt_1
    -rw-r--r--   1 root    nobody   3278 Nov 25 10:22 example.com.crt

Do we really want this to happen for certificates? What about for private keys?. This is why I suggested sending flags with the management message so that the rollback can turn this off.

It looks like this change only tracks certificate files. What about all the other files that contribute to the SSL context?

@zizhong
Copy link
Member Author

zizhong commented Dec 2, 2015

@jpeach you are right. Coping SSL related files can cause secure issues. In my opinion, all the child files should not be copied. what do you think handling it in this way?
FileManager::fileChanged is the function that directly triggers notifications. Rollback::internalUpdate backs up files, maintains the correct version and then calls fileChanged .
A new commit has been pushed on your advice. Would you mind taking another look? Thanks a lot!

@jpeach
Copy link
Contributor

jpeach commented Dec 4, 2015

This looks pretty good. in a previous comment I suggested sending a flags field with the child rollback registration. Maybe that could be a way to request that the child not be versioned?

@zizhong
Copy link
Member Author

zizhong commented Dec 14, 2015

@jpeach , the option is added in the last commit as you asked. Please let me know if you have any questions.

cc @bgaff

@bgaff
Copy link
Member

bgaff commented Dec 14, 2015

@jpeach I reviewed this latest commit and it lgtm. If you're cool w/ this please go ahead and land it, otherwise I'm happy to do it in the next day or so.

@zizhong
Copy link
Member Author

zizhong commented Dec 15, 2015

@jpeach Thanks for the useful comments. I've made a new commit on your suggestions. Would you mind having another look?

@asfgit asfgit closed this in 4d2c262 Dec 15, 2015
@zizhong
Copy link
Member Author

zizhong commented Dec 15, 2015

@jpeach , I noticed you have changed the some details of the last commit. Thanks for your help. However, there is a bug that you introduced with the changes.
The line below
if (rb->checkForUserUpdate(rb->isVersioned() ? ROLLBACK_CHECK_ONLY : ROLLBACK_CHECK_AND_UPDATE))
should be
if (rb->checkForUserUpdate(!rb->isVersioned() ? ROLLBACK_CHECK_ONLY : ROLLBACK_CHECK_AND_UPDATE)).

@jpeach
Copy link
Contributor

jpeach commented Dec 15, 2015

Thanks @zizhong

bneradt pushed a commit to bneradt/trafficserver that referenced this pull request Nov 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants