Skip to content

Conversation

@cchepelov
Copy link
Contributor

A quick-n-dirty attempt at using Cascading3 in scalding, at the source level.

At this point, this will choke quickly based on a few API that either disappeared or moved within Cascading; specifically, FlowStep#getSources and FlowStep#getGraph.

Really only a quick RFC at this point, will need to investigate how to best to support the "missing" APIs
(suspecting this is really a matter of adapting to the new API made just to help building ExecutionContext.getDesc in a nicer way)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't really correct for scalding, the comment mentions the method is realted to the associated operations which wires down to https://github.com/twitter/scalding/blob/develop/scalding-core/src/main/scala/com/twitter/scalding/Operations.scala#L126 which uses the same key as cascading. Not sure we shouldn't just be keeping the method/key as it was just defining the constant in scalding instead

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to what @ianoc said. I'd rather us take the max value of the two keys in Operations.scala, and here set the non-deprecated internally, but not deprecate our API (instead explain what it means to the user, not necessarily in cascading terms).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not call the capacity variant?

@cchepelov
Copy link
Contributor Author

Thanks for the prompt feedback, @ianoc and @johnynek ! Will address the various points now, then proceed with the rest

@cchepelov
Copy link
Contributor Author

… Right now, busy updating quite a few dependencies to Cascading3 (cascading-parquet, cascading-avro, elephant-bird-cascading3, etc.
To be continued!

@cchepelov
Copy link
Contributor Author

@ianoc @johnynek Done for now. Trouble is, this requires that code from a few dependencies' pull requests be available at compile time, and I'm not quite sure how best to convince Travis to do this:

For that reason, the code currently fails to run on Travis.

@cchepelov
Copy link
Contributor Author

Now working off binaries dumped onto conjars.org for patched dependencies.

There is an outstanding problem at test stage: some Test Jobs seem to deadlock, apparently within the Cascading test driver. Investigating this week (hopefully). Already tried to run the tests single core, no behaviour change.

@cchepelov
Copy link
Contributor Author

synced with 0.15.1-RC1+ at 55f03f7

@cchepelov
Copy link
Contributor Author

The deadlock in SkewJoinPipeTest should hopefully be fixed by cwensel/cascading#47 (unless that one breaks more things elsewhere ☺).

@cchepelov
Copy link
Contributor Author

Update: some tests fixed with respect to .WithDescription("hi there")
(9896ed9 will not build on Travis for now, awaiting for successful integration of cwensel/cascading#47 )

cc @simondumas

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reasons for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, oversight. reverting.

@cchepelov cchepelov force-pushed the try_cascading3 branch 7 times, most recently from fc006e0 to d33b76d Compare January 21, 2016 20:20
Cyrille Chépélov (TP12) and others added 25 commits January 26, 2016 14:55
dfs-datastores' semantics have shifted since 1.3.4, and now it is of the consumer's responsibility to create
the empty version directory which will contain the version's data.
In this whitebox test, nobody was creating the empty directory anymore, so the test failed.

Added a "hint" and an explanation (not sure this tests much now, but keeps the original dance mostly in)
	+ cancel accidental downgrade of scala 2.10
@johnynek
Copy link
Collaborator

I think #1586 may actually be getting close to merge... I'm closing this in favor of that.

If that is not right. Reopen. Thanks a ton @cchepelov for kicking this all off. I'm sorry it has taken so long.

@johnynek johnynek closed this Jan 25, 2018
@cchepelov
Copy link
Contributor Author

cchepelov commented Jan 25, 2018 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants