#213: Added parallelism parameter examples to user guide #218
Co-authored-by: Christoph Pirkl <christoph.pirkl@exasol.com>
Amazing overview @morazow!
I left a few small comments.
doc/user_guide/user_guide.md
Outdated
In the import statement, we are importing data from many files. Using the user
provider parallelism number, we distribute these files into that many importer
processes. For example, simply by taking modulo of file hash by parallelism
Modulo?
I understand that this might be a simplification for better understanding.
But we already mentioned round robin above.
Or is there no conflict?
Good idea, I removed the sentence. It may confuse readers and does not add any new information.
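For readers following this thread, the two distribution schemes mentioned above (round robin and modulo of a file hash) can be compared with a minimal Python sketch. This is only an illustration and not the extension's actual implementation: the helper names `round_robin` and `hash_modulo`, the file names, and the use of CRC32 as the hash are all assumptions made for the example.

```python
import zlib


def round_robin(files, parallelism):
    """Assign files to processes 0..parallelism-1 in turn (hypothetical helper)."""
    buckets = {p: [] for p in range(parallelism)}
    for index, name in enumerate(files):
        buckets[index % parallelism].append(name)
    return buckets


def hash_modulo(files, parallelism):
    """Assign each file by the modulo of its name hash (CRC32 is an assumption)."""
    buckets = {p: [] for p in range(parallelism)}
    for name in files:
        digest = zlib.crc32(name.encode("utf-8"))
        buckets[digest % parallelism].append(name)
    return buckets


# Made-up file names, used only for illustration.
files = [f"part-{i:04d}.parquet" for i in range(10)]
rr = round_robin(files, 4)
hm = hash_modulo(files, 4)

# Round robin is always balanced; hash modulo depends on the hash values.
print([len(rr[p]) for p in range(4)])
print([len(hm[p]) for p in range(4)])
```

Both schemes place every file in exactly one of the `parallelism` buckets. They differ only in how evenly the files are spread, which may be why mentioning both in the guide could confuse readers.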
doc/user_guide/user_guide.md
Outdated
For example, to increase the exporter processes four times, set it as below:

```sql
PARALLELISM = 'iproc(), floor(random()*4)'
```
I once discussed with Torsten that
`'iproc(), mod(rownum,4)'`
should also work and have somewhat better performance (less calculation).
This is also a great option. I am changing to it as suggested; it does indeed need less calculation.
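To see why `mod(rownum,4)` multiplies the number of exporter groups by four, here is a small Python simulation. It is a sketch under stated assumptions: it assumes each distinct value of the `PARALLELISM` expression corresponds to one exporter process, and it treats `rownum` as a simple per-node counter, which simplifies how the database numbers rows. The function name `export_groups` is hypothetical.

```python
from collections import Counter


def export_groups(nodes, rows_per_node, factor):
    """Count rows per (iproc, mod(rownum, factor)) group (hypothetical helper)."""
    groups = Counter()
    for iproc in range(nodes):
        # Assumption: rownum is modelled as a 1-based counter on each node.
        for rownum in range(1, rows_per_node + 1):
            groups[(iproc, rownum % factor)] += 1
    return groups


groups = export_groups(nodes=2, rows_per_node=10, factor=4)

# Two nodes times four modulo values gives eight distinct groups.
print(len(groups))
# Sequential row numbers spread the rows almost evenly across groups.
print(min(groups.values()), max(groups.values()))
```

Unlike `floor(random()*4)`, the modulo variant needs no random number per row and yields group sizes that differ by at most one in this model.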
doc/user_guide/user_guide.md
Outdated
This will set the maximum number of parallel processes to `64` and each process
will have around `6 GiB (376 GiB / 64)` of RAM.
I'm not an expert in the memory model.
Are we sure that all RAM will be used by the UDF?
There is SQL process heap, memory for data blocks, OS memory ...
How about a more flexible formulation here?
How about?
This will set the maximum number of parallel processes to `64`. Additionally,
there is enough RAM `6 GiB (376 GiB / 64)` to use for importer/exporter and
other SQL processes.
Maybe just
"and each process will have up to 6 GiB (376 GiB / 64) of RAM"?
Fixed
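The arithmetic behind the agreed "up to 6 GiB" wording can be checked with a short Python sketch. The function name `ram_per_process_gib` is hypothetical. The result is only an upper bound per process because, as noted in the review, the SQL process heap, data blocks, and the operating system also use part of the RAM.

```python
def ram_per_process_gib(total_ram_gib, processes):
    """Even share of RAM per process, ignoring all other memory consumers."""
    return total_ram_gib / processes


# Values taken from the user guide text under review.
share = ram_per_process_gib(376, 64)
print(f"{share:.3f} GiB")  # 5.875 GiB, rounded to 6 GiB in the guide
```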
Kudos, SonarCloud Quality Gate passed!
Fixes #213