Add DML samples for BigQuery. #546
This generates a random MySQL database (tables defined in create_sample_db.sql) populated with random user actions. The data goes into a SQL database rather than directly into BigQuery because the sample will be used to show how to export from a SQL database into BigQuery. Hopefully BigQuery's limits on query size are large enough for this to work with larger databases. Change-Id: I446c3af72dab60d9ed79a2c814a68f05801ae17b
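A minimal sketch of the "random rows into a SQL database" idea, for readers who want to try it without a MySQL server. It swaps in the standard-library `sqlite3` module for MySQL and SQLAlchemy, and the `Users` schema below is a hypothetical stand-in for whatever create_sample_db.sql actually defines. The `populate_db` here is a simplified illustration that takes a connection, not the PR's SQLAlchemy-session version.

```python
import random
import sqlite3
import string

# Hypothetical schema; the real tables are defined in create_sample_db.sql.
SCHEMA = """
CREATE TABLE Users (
    UserID INTEGER PRIMARY KEY,
    UserName TEXT NOT NULL,
    Email TEXT NOT NULL
);
"""


def random_name(rng, length=8):
    """Return a random lowercase name of the given length."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def populate_db(conn, total_users, seed=None):
    """Insert total_users rows of random user data into the Users table."""
    rng = random.Random(seed)  # seeded so test runs are reproducible
    rows = []
    for user_id in range(1, total_users + 1):
        name = random_name(rng)
        rows.append((user_id, name, name + "@example.com"))
    # Parameterized executemany avoids building SQL strings by hand.
    conn.executemany(
        "INSERT INTO Users (UserID, UserName, Email) VALUES (?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    populate_db(conn, total_users=5, seed=42)
    print(conn.execute("SELECT COUNT(*) FROM Users").fetchone()[0])
```

Keeping `total_users` small keeps the sample fast; the same function works for larger values when testing query-size limits.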
This tests the SQL code to create the tables, as well as the Python code that creates the rows. Uses SQLAlchemy to abstract away differences between database engines. Change-Id: Id9e70eef56f5e203921b6c1f21708631f3a767f7
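A rough sketch of how such a test might check the table-creation SQL, assuming an in-memory SQLite database as the engine (the PR goes through SQLAlchemy instead). `CREATE_SQL` is a hypothetical stand-in for the contents of create_sample_db.sql.

```python
import sqlite3
import unittest

# Hypothetical stand-in for the contents of create_sample_db.sql.
CREATE_SQL = """
CREATE TABLE Users (
    UserID INTEGER PRIMARY KEY,
    UserName TEXT NOT NULL
);
"""


def table_names(conn):
    """Return the set of table names defined in a SQLite database."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return {name for (name,) in rows}


class CreateTablesTest(unittest.TestCase):
    def setUp(self):
        # A fresh in-memory database per test keeps tests isolated and fast.
        self.conn = sqlite3.connect(":memory:")

    def tearDown(self):
        self.conn.close()

    def test_schema_creates_users_table(self):
        self.conn.executescript(CREATE_SQL)
        self.assertIn("Users", table_names(self.conn))

    def test_rows_can_be_inserted(self):
        self.conn.executescript(CREATE_SQL)
        self.conn.execute("INSERT INTO Users (UserName) VALUES ('alice')")
        count = self.conn.execute("SELECT COUNT(*) FROM Users").fetchone()[0]
        self.assertEqual(count, 1)
```

Run it with `python -m unittest` (or pytest) from the directory containing the test module.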
This sample reads a SQL file (for example, one output by mysqldump) and executes each line as a query. At least in my mysqldump configuration, each INSERT statement was on a single line, so I was able to copy data from MySQL to BigQuery with this sample. Change-Id: Id14b648b0ce6bac651e436d402f480c56d80bd37
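The "one statement per line" approach can be sketched as below. The query runner is passed in as a callable so the file handling can be shown without a BigQuery dependency; `run_sql_file` and `iter_statements` are hypothetical names, and treating `--` lines as comments is an assumption about the input.

```python
def iter_statements(lines):
    """Yield each non-blank, non-comment line as one SQL statement.

    Assumes every statement fits on a single line, as described above
    for mysqldump output. Lines starting with "--" are treated as comments.
    """
    for line in lines:
        statement = line.strip()
        if not statement or statement.startswith("--"):
            continue
        yield statement


def run_sql_file(sql_file, run_query):
    """Pass every statement in sql_file to run_query; return how many ran.

    sql_file can be any iterable of lines, such as an open file object.
    run_query is a placeholder for whatever actually submits the query.
    """
    count = 0
    for statement in iter_statements(sql_file):
        run_query(statement)
        count += 1
    return count
```

For example, `with open("dump.sql") as f: run_sql_file(f, print)` does a dry run. With the current google-cloud-bigquery library, one option for `run_query` might be `lambda sql: client.query(sql).result()`, which waits for each job to finish before moving on.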
Also, adds a few explanatory comments for the docs. Change-Id: I623bf226839ab43f8da8297938223323a04e5838
Change-Id: Ie0d12ac9aedfdd83d2f6f533ad30265f75126e4f
I may have said this in my original review, but the populate_db.py script is a really complex sample. I'm curious as to the reasoning behind choosing such a complex sample. Is there a much simpler sample that could teach the same thing?
```python
# See the License for the specific language governing permissions and
# limitations under the License.

"""Sample to run line-separated SQL statements in Big Query from a file.
```
Awkward phrasing. Maybe: "Sample that runs a file containing line-separated SQL statements in Big Query."
Done. (I realized this sample was a bit more complicated than it needed to be, too.) I changed it to "Sample that runs a file containing INSERT SQL statements in Big Query." and modified the loop to look for lines that start with INSERT, to match the command-line sample I wrote for the docs.
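A small sketch of the "only lines that start with INSERT" loop. Matching the first word case-insensitively is an assumption on top of the description above, and `iter_insert_statements` is a hypothetical name. Everything else in a mysqldump file, such as comments, `DROP TABLE`/`CREATE TABLE`, and `LOCK TABLES` statements, is skipped.

```python
def iter_insert_statements(lines):
    """Yield only the lines whose first word is INSERT (any case)."""
    for line in lines:
        statement = line.strip()
        words = statement.split(None, 1)
        # Comparing the whole first word avoids matching e.g. "INSERTED".
        if words and words[0].upper() == "INSERT":
            yield statement
```

Filtering this way lets the sample accept an unmodified dump file: the schema and locking statements are ignored, and only the data rows are sent to BigQuery.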
```python
from __future__ import print_function

import argparse
# [START insert_sql]
```
Either include all imports, or include none of them.
Done. (Included all)
```python
    query.run()
    return
except exceptions.GCloudError as err:

```
This newline isn't needed. (A blank line is recommended between expressions and statements, but not between two statements.)
Thanks, good to know. I deleted this function entirely based on your previous feedback. (I made some command-line samples that do the same thing, and adding retries really does complicate them a lot. Retrying doesn't seem to help for most errors, anyway.)
```python
parser = argparse.ArgumentParser(
    description=__doc__,
    formatter_class=argparse.RawDescriptionHelpFormatter)
parser.add_argument('--project', help='Google Cloud project name')
```
Use positional command-line arguments for required items, and flags for items with sensible defaults.
Done.
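The positional-versus-flag convention from the review comment above can be sketched with `argparse` as follows. The `project` and `sql_file` positionals and the `--dry-run` flag are hypothetical examples, not the arguments the merged sample actually uses.

```python
"""Run the INSERT statements in a SQL file against BigQuery (sketch)."""

import argparse


def build_parser():
    """Build a CLI parser: required inputs positional, options as flags."""
    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter)
    # Required, with no sensible default: make them positional.
    parser.add_argument('project', help='Google Cloud project ID')
    parser.add_argument('sql_file', help='Path to a file of SQL statements')
    # Optional, with a sensible default (off): expose it as a flag.
    parser.add_argument(
        '--dry-run', action='store_true',
        help='Print the statements instead of running them')
    return parser
```

With this layout, `python insert_sql.py my-project dump.sql` fails fast with a usage message if either required value is missing, while `--dry-run` can simply be left off.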
```python
session = create_session(engine)

try:
    populate_db(session, total_users=100)
```
Can we use fewer than 10 users, for the sake of speed?
Done.
Removes the unnecessary UserActions table from populate_db.py. Removes retry logic and changes the sample to look only for INSERT lines in insert_sql.py. Change-Id: If8994c420cd95babf3c4673a3b87affbfca4f32a
I simplified the populate_db.py script. You are right that it was too complicated. The UserActions table was unnecessary for the docs I have written.
* chore(deps): update all dependencies
* 🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

* revert

Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>
Co-authored-by: Anthonios Partheniou <partheniou@google.com>