#213: Added parallelism parameter examples to user guide #218

Merged: morazow merged 11 commits into main from doc/#213-add-parallelism-examples on Sep 29, 2022

Conversation

morazow (Contributor) commented Sep 28, 2022

Fixes #213

doc/user_guide/user_guide.md: three outdated, resolved review threads
Co-authored-by: Christoph Pirkl <christoph.pirkl@exasol.com>

allipatev (Contributor) previously approved these changes Sep 29, 2022 and left a comment:

Amazing overview @morazow!
I left a few small comments.

doc/user_guide/user_guide.md (outdated, resolved)

In the import statement, we are importing data from many files. Using the
user-provided parallelism number, we distribute these files across that many
importer processes. For example, simply by taking the modulo of the file hash by parallelism

allipatev (Contributor), Sep 29, 2022

Modulo?
I understand that this might be a simplification for better understanding.
But we already mentioned round robin above.
Or is there no conflict?

morazow (Contributor, Author)

Good idea, I removed the sentence. It may confuse readers and does not add any new information.
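
For reference, the two assignment schemes mentioned in this thread (round robin, and modulo of a file hash) can be sketched as follows. This is only an illustrative Python sketch, not the project's implementation: the helper names `assign_round_robin` and `assign_by_hash_modulo`, the file names, and the choice of CRC32 as the hash are all assumptions made for the example.

```python
import zlib
from collections import defaultdict


def assign_round_robin(files, parallelism):
    """Give the i-th file to importer i % parallelism (hypothetical helper)."""
    buckets = defaultdict(list)
    for index, name in enumerate(files):
        buckets[index % parallelism].append(name)
    return dict(buckets)


def assign_by_hash_modulo(files, parallelism):
    """Give each file to importer hash(name) % parallelism (hypothetical helper)."""
    buckets = defaultdict(list)
    for name in files:
        # CRC32 is an assumption; it is stable across runs, unlike Python's hash().
        bucket = zlib.crc32(name.encode("utf-8")) % parallelism
        buckets[bucket].append(name)
    return dict(buckets)


# Hypothetical file names, for illustration only.
files = [f"part-{i:03d}.parquet" for i in range(10)]
round_robin = assign_round_robin(files, 4)
hashed = assign_by_hash_modulo(files, 4)
```

With round robin, bucket sizes differ by at most one file. With hash modulo, every file still lands in exactly one bucket, but the bucket sizes can be uneven, which may be one reason to describe only a single scheme in the guide.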

doc/user_guide/user_guide.md (outdated, resolved)

For example, to increase the exporter processes four times, set it as below:

```sql
PARALLELISM = 'iproc(), floor(random()*4)'
```

Contributor

I once discussed with Torsten that

`'iproc(), mod(rownum,4)'`

should also work and perform somewhat better (less calculation).

morazow (Contributor, Author)

This is also a great option. I am changing to this as suggested; it is indeed less calculation.
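
To illustrate why either expression multiplies the process count by four, here is a small Python simulation. It assumes, as the guide text implies, that one exporter process is started per distinct combination of the `PARALLELISM` expression values. The names `NODES`, `ROWS_PER_NODE`, and `distinct_groups` are hypothetical; `iproc()` is modelled as a node index and `rownum` as a simple running row counter.

```python
import random

NODES = 3             # hypothetical cluster size
ROWS_PER_NODE = 1000  # hypothetical number of rows held by each node


def distinct_groups(extra_key):
    """Count distinct (iproc, extra_key(rownum)) pairs over all simulated rows."""
    keys = set()
    rownum = 0
    for iproc in range(NODES):
        for _ in range(ROWS_PER_NODE):
            rownum += 1
            keys.add((iproc, extra_key(rownum)))
    return len(keys)


random.seed(0)
# Models floor(random()*4): one random draw per row.
by_random = distinct_groups(lambda rownum: int(random.random() * 4))
# Models mod(rownum, 4): plain integer arithmetic, no random draw.
by_rownum = distinct_groups(lambda rownum: rownum % 4)
print(by_random, by_rownum)
```

Under these assumptions, both variants yield `NODES * 4` groups once each node holds enough rows; the `mod(rownum, 4)` variant reaches that count without generating a random number per row.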

This will set the maximum number of parallel processes to `64` and each process
will have around `6 GiB (376 GiB / 64)` of RAM.

Contributor

I'm not an expert in the memory model.
Are we sure that all RAM will be used by the UDF?
There is the SQL process heap, memory for data blocks, OS memory ...
How about a more flexible formulation here?

morazow (Contributor, Author)

How about?

This will set the maximum number of parallel processes to `64`. Additionally,
there is enough RAM `6 GiB (376 GiB / 64)` to use for importer/exporter and
other SQL processes.

Contributor

Maybe just "and each process will have up to 6 GiB (376 GiB / 64) of RAM"?

morazow (Contributor, Author)

Fixed
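
The arithmetic behind the agreed wording can be checked with a short Python sketch. The function name `ram_per_process_gib` is hypothetical; the 376 GiB and 64 figures are the ones quoted in this thread. The result is an upper bound only, since, as noted above, the SQL process heap, data blocks, and the OS also consume memory.

```python
def ram_per_process_gib(total_ram_gib, parallelism):
    """Upper bound on RAM per process if the total were split evenly."""
    if parallelism <= 0:
        raise ValueError("parallelism must be positive")
    return total_ram_gib / parallelism


share = ram_per_process_gib(376, 64)
print(f"{share:.3f} GiB")  # 5.875 GiB, which the guide rounds to 6 GiB
```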

@morazow enabled auto-merge (squash) September 29, 2022 08:05
@morazow merged commit de13878 into main Sep 29, 2022
@morazow deleted the doc/#213-add-parallelism-examples branch September 29, 2022 08:05
@sonarqubecloud

Kudos, SonarCloud Quality Gate passed!

Bug: A, 0 Bugs
Vulnerability: A, 0 Vulnerabilities
Security Hotspot: A, 0 Security Hotspots
Code Smell: A, 0 Code Smells

No Coverage information
No Duplication information

Development

Successfully merging this pull request may close these issues.

Parallelism examples
3 participants