Skip to content

Conversation

@phalodi
Copy link
Contributor

@phalodi phalodi commented Jul 8, 2016

What changes were proposed in this pull request?

Add Asynchronous Actions documentation inside action of programming guide

How was this patch tested?

check the documentation indentation and formatting with md preview.

@phalodi phalodi changed the title Add Asynchronous Actions documentation [SPARK-16438]Add Asynchronous Actions documentation Jul 8, 2016
@phalodi phalodi changed the title [SPARK-16438]Add Asynchronous Actions documentation [SPARK-16438] Add Asynchronous Actions documentation Jul 8, 2016
@phalodi
Copy link
Contributor Author

phalodi commented Jul 8, 2016

@AmplabJenkins I dont know i just create the issue on JIRA and make changes and make pull request for merge , I think it is valid pull request because in real world all application need non blocking execution. So if its is valid then merge it.

@srowen
Copy link
Member

srowen commented Jul 8, 2016

@phalodi it's a bot, not a person.

@srowen
Copy link
Member

srowen commented Jul 8, 2016

Jenkins test this please

@SparkQA
Copy link

SparkQA commented Jul 8, 2016

Test build #61980 has finished for PR 14104 at commit aba737f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 8, 2016

Test build #61979 has finished for PR 14104 at commit aba737f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Copy link
Contributor

rxin commented Jul 8, 2016

Can you paste a screenshot of the doc?

@phalodi
Copy link
Contributor Author

phalodi commented Jul 9, 2016

@rxin Below are screen shot of Asynchronous Actions inside actions and screen shot of index of programming guide.

screenshot from 2016-07-09 14-20-54

screenshot from 2016-07-09 14-21-17

</table>

#### Asynchronous Actions
Spark provide asynchronous actions to execute two or more actions concurrently, these actions are execute asynchronously without blocking each other.
Copy link
Member

@srowen srowen Jul 10, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"without blocking each other": actions never block each other anyway. They block the calling thread.

@srowen
Copy link
Member

srowen commented Jul 10, 2016

Mostly this just duplicates the non-async documentation. I think it's worth mentioning the existence of the async methods with a pointer to the API docs, but, I don't think this table adds value.

@phalodi
Copy link
Contributor Author

phalodi commented Jul 10, 2016

@srowen yeah you are right its not blocking calling thread but they execute sequentially right but in async action its return future so its running on different threads soo not run sequentially soo its fully non blocking example if i have action in that we have some transformation like map n inside map i have blocking code then how its work its block or not?

@phalodi
Copy link
Contributor Author

phalodi commented Jul 10, 2016

@srowen yeah i agree with you its same like non async actions but yeah we should list it because its more useful in real life application when single application running multiple jobs of different users. so we should list it so user will more familiar with it and also know which are the async actions. i think its helpful for users still if you think its not a valid point then just tell me i change pull request and add just reference for async actions but i want user should know because when i first time see it i m surprise that in whole documentation no where mention about async actions.

@srowen
Copy link
Member

srowen commented Jul 10, 2016

I fixed my comment above, sorry. They do block the calling thread, but actions triggered in different threads on one RDD do not block each other. The first sentence isn't correct. My suggestion is to omit the table, not all of the text you added.

@phalodi
Copy link
Contributor Author

phalodi commented Jul 10, 2016

@srowen ok no problem so what you suggest should i remove the table and just add the line below the actions table and give link to scala and java docs? And the line "Spark provide asynchronous actions to execute two or more actions concurrently, these actions are execute asynchronously without blocking each other." is not correct so should i add that to "return future to non-block calling thread?" even in above statement my intention is also talk about calling thread because when we call action it will always in calling thread so calling thread will block but when we use async actions it will not block so we are targeted here non-blocking for calling thread because actions are called in calling thread. so i make changes and push it again as u suggest just add single line at end of actions and give link for scala and java doc api. Its cool?

@phalodi
Copy link
Contributor Author

phalodi commented Jul 10, 2016

@srowen i make changes as you suggest i remove the table and add a single below actions table and also change statement to be more clear about non blocking.

@phalodi
Copy link
Contributor Author

phalodi commented Jul 10, 2016

@srowen @rxin Please review it and give your comments.

screenshot from 2016-07-11 01-08-21

</tr>
</table>

Spark provide asynchronous actions to execute two or more actions concurrently, these actions are execute asynchronously without blocking calling thread. Refer to the RDD API doc for asynchronous actions ([Scala](api/scala/index.html#org.apache.spark.rdd.AsyncRDDActions),[Java](api/java/org/apache/spark/rdd/AsyncRDDActions.html))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't think this is accurate. Spark can execute actions concurrently without this API. This merely makes the call non-blocking for the caller. It really adds very little beyond calling the normal API and wrapping in a Future (though it's more useful for the Java API where that's harder). Hence why I thought it was unofficially deprecated.

@phalodi
Copy link
Contributor Author

phalodi commented Jul 11, 2016

@srowen so what do you think we should not add this api reference in document?

@phalodi
Copy link
Contributor Author

phalodi commented Jul 11, 2016

@srowen i know its nothing major just call normal functions in future but still naive user first time who learn scala and spark dont know what are future and all so at least we should add reference of it. if you suggest i simple add reference link without adding any statement of blocking and non blocking. so atleast user know there is some api like this.

@srowen
Copy link
Member

srowen commented Jul 11, 2016

Although I don't think it's essential to mention this API (it's in the API doc) it wouldn't hurt to point to it here. The mention should be accurate though. I think it's correct to say that the Spark RDD API also exposes asynchronous versions of some actions like foreach, which return a handle immediately to the caller which can be used to wait for its completion.

@phalodi
Copy link
Contributor Author

phalodi commented Jul 11, 2016

@srowen sure i will make appropriate changes and push it again

@phalodi
Copy link
Contributor Author

phalodi commented Jul 11, 2016

@srowen now its perfect i think at last from table to one line but yeah you are right it good to just mention that spark provide it and not duplicate table of actions.

@phalodi
Copy link
Contributor Author

phalodi commented Jul 11, 2016

@srowen @rxin
screenshot from 2016-07-11 16-51-53

</tr>
</table>

Spark RDD API also exposes asynchronous versions of some actions like foreach, collect, count etc. which return a handle immediately to the caller which can be used to wait for its completion.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few edits to suggest so let me just propose some text directly:

The Spark RDD API also exposes asynchronous versions of some actions, like foreachAsync for foreach, which immediately return a FutureAction to the caller instead of blocking on completion of the action. This can be used to manage or wait for the asynchronous execution of the action.

@phalodi
Copy link
Contributor Author

phalodi commented Jul 11, 2016

@srowen Thanks man i know i am not giving correct language but my intention is also same i hope its final commit :) if its done can you merge it.

</tr>
</table>

The Spark RDD API also exposes asynchronous versions of some actions, like foreachAsync for foreach, which immediately return a FutureAction to the caller instead of blocking on completion of the action. This can be used to manage or wait for the asynchronous execution of the action.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last nit pick: back-tick-quote the code nouns like I did.

@phalodi
Copy link
Contributor Author

phalodi commented Jul 11, 2016

@srowen I put back-tick-quote for foreach, foreachAsync and FutureAction.

@phalodi
Copy link
Contributor Author

phalodi commented Jul 11, 2016

@srowen Below is final screen shot as you suggested i hope its looks good please review it and merge it.
screenshot from 2016-07-11 17-45-24

@phalodi
Copy link
Contributor Author

phalodi commented Jul 12, 2016

@srowen Is this correct?? if it is then please merge it otherwise give comment if needed.

@srowen
Copy link
Member

srowen commented Jul 12, 2016

@phalodi no need to ping. We don't merge things immediately, especially when non-essential. It gives more time for review.

@srowen
Copy link
Member

srowen commented Jul 13, 2016

Jenkins retest this please

@SparkQA
Copy link

SparkQA commented Jul 13, 2016

Test build #62231 has finished for PR 14104 at commit 845a123.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@asfgit asfgit closed this in bf107f1 Jul 13, 2016
asfgit pushed a commit that referenced this pull request Jul 13, 2016
## What changes were proposed in this pull request?

Add Asynchronous Actions documentation inside action of programming guide

## How was this patch tested?

check the documentation indentation and formatting with md preview.

Author: sandy <phalodi@gmail.com>

Closes #14104 from phalodi/SPARK-16438.

(cherry picked from commit bf107f1)
Signed-off-by: Sean Owen <sowen@cloudera.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants