Packaging refactor #38
Comments
That'd be great!
|
Awesome! Any chance I could twist your arm to schedule a rebuild of the parsers? |
The current parser (2015.2.19) works against master.
|
Ohhh, interesting, ok I'll try it again, thanks |
I'd prefer anything with "Train" or "Pipeline" be left in core, along with Actually, ideally, I'd rather set it up in such a way that all of the main That seems like the best of all worlds? On Sat, Jul 25, 2015 at 1:11 PM, Brian Topping notifications@github.com
|
Ok, I'm pretty far along with this already, so let me take a look at it. I think you'd be surprised that the workflow is not disrupted by these kinds of changes. If you are importing via sbt, you'll get both projects transitively. If you are working in the IDE, you can still just right-click on the main method and run it, with the same results. Both projects will load into the IDE transparently. Maybe you could try it and give specific feedback? |
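For illustration, here is roughly what the downstream side could look like in sbt. This is a sketch only: the artifact name `epic-tools` and the version string are placeholders, since no split artifacts had been published at this point in the thread. The point is that a client declares only the module it links against directly, and sbt resolves whatever that module depends on.

```scala
// build.sbt of a hypothetical downstream project.
// ASSUMPTION: "epic-tools" and "x.y.z" are placeholders, not published coordinates.
scalaVersion := "2.11.7"

// Declare only the direct dependency. If epic-tools declares a dependency
// on a core module, sbt adds that core module to the classpath transitively.
libraryDependencies += "org.scalanlp" %% "epic-tools" % "x.y.z"
```

A project that needs only the core classes would name the core artifact instead and never pull in the tools module's dependencies.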
I mostly run as part of an assembly jar. So as long as there's a In addition, the train and pipeline main methods being in the core jar is Thanks! -- David On Sat, Jul 25, 2015 at 2:22 PM, Brian Topping notifications@github.com
|
Ok thanks for the feedback. Part of decomposing modules into a DAG of modules means users who are trying to work with the code can get a faster map of what they are doing and where to focus their energy (it's a lot easier to dismiss whole swaths of code inaccessible to a given module). That said, it's never a good idea to create multiple modules from code that is always used together. Developers have to make arbitrary choices where code should go and it becomes a PITA. I don't imagine generating models in production but building them in CI and deploying them wholesale in an upgrade process. Are you training in production? I'm starting to believe "tools" should be called "training", since that's what it seems most of the command line tools are oriented toward. But if all the training functionality needs to be a part of core... ? |
Okay, been nonstop on this, was useful to learn the code a bit more. Generated three separate trees, the last was as you suggested just kind of punted on the refactor. For the second one, modules were broken out as There's some issues that are common for a code base like this, for instance The other option is API changes, but that's usually not well received without discussion, if even then. In any event the PR that's generated has the build bits that will make it easier to start extracting modules when there's a better idea where to go with this. |
awesome thanks! I'll look now. I'd be curious to hear your recommendations about API refactoring. It's important to me that "core" always have the training code and everything needed for it. It helps researchers (esp. my labmates) get up and running quickly. And, especially because it doesn't add too much dependency weight, there's relatively little harm. Tika is a beast and it definitely needs to split out as much as possible. |
Hey sorry for the delay to get back to you. This week has been a blur. Totally understand what you're saying about the dependencies. Just curious though, is your crew using a build tool that deals with transitive dependencies well? If so, there shouldn't ever be a problem that a user needs to include a dependency in their build except one they have direct client linkages to. The build tool will work out the classpath such that the indirect dependencies are included automatically. In exchange for needing to know which library to link to initially, there's some positive tradeoffs:
As an occasional project owner, the reason I like modules is that they keep developers from making weird dependencies that are hard to unwind. It might be very useful that a debugging library knows how to print a tree (even though the tree printer imports a quarter of the universe), but it's far better if that's left to the deployer. It's the same problem as Tika dependencies in microcosm. I totally appreciate such a refactor is much easier to say "yes" to after it's done, but I can almost assure you that you'd choke on your lunch if you saw one of these PRs. It's just really a leap of faith to approve it and a bigger deal for everyone to update their builds. But if you plan on continuing to improve the library (and I see that with the recent Neural commits), the sooner you start on this path, the better. What can I do to help? |
hey, first, publish-signed went away and it doesn't ask for my p/w for my key anymore. what can i do? |
Oof, just got this. let me get on it. How much time do I have?
|
I mean, you replied in 10 min, that's pretty good. :) I just figured it out On Wed, Jul 29, 2015 at 8:09 PM, Brian Topping notifications@github.com
|
(I just added the sbt-pgp plugin again) On Wed, Jul 29, 2015 at 8:10 PM, David Hall david.lw.hall@gmail.com wrote:
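For anyone hitting the same thing, re-adding the plugin usually means one line in `project/plugins.sbt`. This is a sketch, and the version shown is an assumption; the thread does not say which version was used.

```scala
// project/plugins.sbt
// ASSUMPTION: 1.0.0 is an illustrative version, not necessarily the one used here.
// The plugin provides the publishSigned task (typed as publish-signed in sbt 0.13).
addSbtPlugin("com.jsuereth" % "sbt-pgp" % "1.0.0")
```

With the plugin loaded, running `publishSigned` signs the artifacts before publishing them.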
|
Whew! I was away from the computer for a while.
In the future, if you have an iPhone, you can reach me via messages on my email address...
|
i'm one of those pesky android people :) On Wed, Jul 29, 2015 at 8:12 PM, Brian Topping notifications@github.com
|
emailed...
|
Should the epic-parser-en-span_2.11.2015.7.29-SNAPSHOT work with master branch? I'm getting the following exception:
|
Meh, my bad. |
you're welcome! |
On Wed, Jul 29, 2015 at 12:09 PM, Brian Topping notifications@github.com
Except they need to develop Epic, and so it's useful if everything they
I go back and forth between splitting and merging. If you split too much
Clients can know that they just need to look at the classes in either -- David
|
Hiyas, I'd like to create a PR for packaging and wanted to see if it would be accepted before spending time on it. I'm still stuck on e0238ce given epic-parser-en-span_2.11/2015.2.19 being incompatible with anything newer, so I could just as easily make the changes I need locally and remain forked.
What I'm after at the minimum is to create two modules, one for "core" and one for "tools". The goal here is to get Tika out of the core dependencies, since it is used only in epic.preprocess.TextExtractor, which is really a command-line tool. I don't know how many other tools there are like this or what other effects it might have on the dependency closure, but I think it will be significant.
The reason I am even doing that is Tika depends on Apache POI and a kitchen sink of other detritus. POI has a split-package problem. Once everything is cleaned up, I should at least be able to make the core module into an OSGi bundle.
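As a rough sketch of the split described above, a multi-project `build.sbt` might look like the following. The directory layout, module names, and version numbers are assumptions made for illustration; this is not the build from the eventual PR.

```scala
// build.sbt -- hypothetical two-module layout (names and versions are assumptions)
lazy val commonSettings = Seq(
  organization := "org.scalanlp",
  scalaVersion := "2.11.7"
)

// "core" carries no Tika dependency, so its dependency closure stays small.
lazy val core = (project in file("core"))
  .settings(commonSettings: _*)
  .settings(name := "epic-core")

// "tools" holds command-line code such as epic.preprocess.TextExtractor
// and is the only module that pulls in Tika (and, through it, POI).
lazy val tools = (project in file("tools"))
  .dependsOn(core)
  .settings(commonSettings: _*)
  .settings(
    name := "epic-tools",
    // Placeholder version; use whichever Tika release the project targets.
    libraryDependencies += "org.apache.tika" % "tika-parsers" % "1.5"
  )

// The root project aggregates both, so `sbt compile` and `sbt test` still cover everything.
lazy val root = (project in file("."))
  .aggregate(core, tools)
```

Because `tools` depends on `core` and never the reverse, nothing in core can pick up a Tika import by accident, which is the property needed before attempting an OSGi bundle.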