|
34 | 34 | "source": [
|
35 | 35 | "# Set up your environment\n",
|
36 | 36 | "\n",
|
37 |
| - "Please refer to the notebooks in the `getting_started` folder for instructions on setting up your environment. Once your environment is ready, run the following code to import the necessary packages for working with BigFrames arrays:" |
| 37 | + "To get started, follow the instructions in the notebooks within the `getting_started` folder to set up your environment. Once your environment is ready, you can import the necessary packages by running the following code:" |
38 | 38 | ]
|
39 | 39 | },
|
40 | 40 | {
|
41 | 41 | "cell_type": "code",
|
42 |
| - "execution_count": 17, |
| 42 | + "execution_count": 2, |
43 | 43 | "metadata": {},
|
44 | 44 | "outputs": [],
|
45 | 45 | "source": [
|
|
50 | 50 | },
|
51 | 51 | {
|
52 | 52 | "cell_type": "code",
|
53 |
| - "execution_count": 18, |
| 53 | + "execution_count": 3, |
54 | 54 | "metadata": {},
|
55 | 55 | "outputs": [],
|
56 | 56 | "source": [
|
57 | 57 | "REGION = \"US\" # @param {type: \"string\"}\n",
|
| 58 | + "\n", |
58 | 59 | "bpd.options.display.progress_bar = None\n",
|
59 |
| - "bpd.options.bigquery.location = REGION\n" |
| 60 | + "bpd.options.bigquery.location = REGION" |
60 | 61 | ]
|
61 | 62 | },
|
62 | 63 | {
|
|
65 | 66 | "source": [
|
66 | 67 | "# Array Data Types\n",
|
67 | 68 | "\n",
|
68 |
| - "In BigQuery, an [array](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#array_type), also referred to as a `repeated` column, is an ordered list of zero or more non-array elements. These elements must be of the same data type, and arrays cannot contain other arrays. Furthermore, query results cannot include arrays with `NULL` elements.\n", |
| 69 | + "In BigQuery, an [array](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#array_type) (also called a repeated column) is an ordered list of zero or more elements of the same data type. Arrays cannot contain other arrays or `NULL` elements.\n", |
69 | 70 | "\n",
|
70 |
| - "BigFrames DataFrames, inheriting these properties, map BigQuery array types to `pandas.ArrowDtype(pa.list_())`. This section provides code examples demonstrating how to effectively work with array columns within BigFrames DataFrames." |
| 71 | + "BigQuery DataFrames maps BigQuery array types to `pandas.ArrowDtype(pa.list_())`. The following code examples illustrate how to work with array columns in BigQuery DataFrames." |
71 | 72 | ]
|
72 | 73 | },
|
73 | 74 | {
|
74 | 75 | "cell_type": "markdown",
|
75 | 76 | "metadata": {},
|
76 | 77 | "source": [
|
77 |
| - "## Create DataFrames with array columns \n", |
| 78 | + "## Create DataFrames with array columns\n", |
78 | 79 | "\n",
|
79 |
| - "Let's create a sample BigFrames DataFrame where the `Scores` column holds array data of type `list<int64>[pyarrow]`:" |
| 80 | + "Create a DataFrame in BigQuery DataFrames from local sample data. Use a list of lists to create a column with the `list<int64>[pyarrow]` dtype, which corresponds to the `ARRAY<INT64>` type in BigQuery." |
80 | 81 | ]
|
81 | 82 | },
|
82 | 83 | {
|
|
178 | 179 | "cell_type": "markdown",
|
179 | 180 | "metadata": {},
|
180 | 181 | "source": [
|
181 |
| - "## CRUD operations for array data\n", |
182 |
| - "\n", |
183 |
| - "While Pandas offers vectorized operations and lambda expressions to manipulate array data, BigFrames leverages BigQuery's computational power. BigFrames introduces the [`bigframes.bigquery`](https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.bigquery) package to provide access to a variety of native BigQuery array operations, such as [array_agg](https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.bigquery#bigframes_bigquery_array_agg), [array_length](https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.bigquery#bigframes_bigquery_array_length), and others. This module allows you to seamlessly perform create, read, update, and delete (CRUD) operations on array data within your BigFrames DataFrames.\n", |
| 182 | + "## Operate on array data\n", |
184 | 183 | "\n",
|
185 |
| - "Let's delve into how you can utilize these functions to effectively manipulate array data in BigFrames." |
| 184 | + "While pandas offers vectorized operations and lambda expressions for array manipulation, BigQuery DataFrames leverages the computational power of BigQuery itself. You can access a variety of native BigQuery array operations, such as [`array_agg`](https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.bigquery#bigframes_bigquery_array_agg) and [`array_length`](https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.bigquery#bigframes_bigquery_array_length), through the [`bigframes.bigquery`](https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.bigquery) package (abbreviated as `bbq` in the following code samples)." |
186 | 185 | ]
|
187 | 186 | },
|
188 | 187 | {
|
|
205 | 204 | }
|
206 | 205 | ],
|
207 | 206 | "source": [
|
208 |
| - "# Find the length in each array\n", |
| 207 | + "# Find the length of each array.\n", |
209 | 208 | "bbq.array_length(df['Scores'])"
|
210 | 209 | ]
|
211 | 210 | },
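`bbq.array_length` runs inside BigQuery; as a plain-pandas analogue (an assumption for illustration, not the BigFrames API), counting elements per array gives the same result shape:

```python
import pandas as pd

# Count the elements of each array with len(); mirrors what
# bbq.array_length computes server-side.
scores = pd.Series([[95, 88, 92], [78, 81], [82, 89, 94, 100]])
lengths = scores.apply(len)
print(lengths.tolist())  # [3, 2, 4]
```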
|
|
235 | 234 | }
|
236 | 235 | ],
|
237 | 236 | "source": [
|
238 |
| - "# Explode array elements into rows\n", |
| 237 | + "# Transform array elements into individual rows, preserving the original order when in\n", |
| 238 | + "# ordering mode. If an array has multiple elements, exploded rows are ordered by the element's\n", |
| 239 | + "# index within its original array.\n", |
239 | 240 | "scores = df['Scores'].explode()\n",
|
240 | 241 | "scores"
|
241 | 242 | ]
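A local sketch of the explode step (plain pandas, which shares the `Series.explode` method name): the index value repeats for elements that came from the same original array, which is what later allows the elements to be regrouped.

```python
import pandas as pd

# Each array element becomes its own row; the original row index repeats.
scores = pd.Series([[95, 88, 92], [78, 81], [82, 89, 94, 100]], name="Scores")
exploded = scores.explode()
print(exploded.index.tolist())  # [0, 0, 0, 1, 1, 2, 2, 2, 2]
```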
|
|
248 | 249 | {
|
249 | 250 | "data": {
|
250 | 251 | "text/plain": [
|
251 |
| - "0 95.238095\n", |
252 |
| - "0 88.571429\n", |
253 |
| - "0 92.380952\n", |
254 |
| - "1 79.047619\n", |
255 |
| - "1 81.904762\n", |
256 |
| - "2 82.857143\n", |
257 |
| - "2 89.52381\n", |
258 |
| - "2 94.285714\n", |
259 |
| - "2 100.0\n", |
| 252 | + "0 100.0\n", |
| 253 | + "0 93.0\n", |
| 254 | + "0 97.0\n", |
| 255 | + "1 83.0\n", |
| 256 | + "1 86.0\n", |
| 257 | + "2 87.0\n", |
| 258 | + "2 94.0\n", |
| 259 | + "2 99.0\n", |
| 260 | + "2 105.0\n", |
260 | 261 | "Name: Scores, dtype: Float64"
|
261 | 262 | ]
|
262 | 263 | },
|
|
266 | 267 | }
|
267 | 268 | ],
|
268 | 269 | "source": [
|
269 |
| - "# Adjust the scores\n", |
270 |
| - "adj_scores = (scores + 5) / 105.0 * 100.0\n", |
| 270 | + "# Adjust the scores.\n", |
| 271 | + "adj_scores = scores + 5.0\n", |
271 | 272 | "adj_scores"
|
272 | 273 | ]
|
273 | 274 | },
|
|
279 | 280 | {
|
280 | 281 | "data": {
|
281 | 282 | "text/plain": [
|
282 |
| - "0 [95.23809524 88.57142857 92.38095238]\n", |
283 |
| - "1 [79.04761905 81.9047619 ]\n", |
284 |
| - "2 [ 82.85714286 89.52380952 94.28571429 100. ...\n", |
| 283 | + "0 [100. 93. 97.]\n", |
| 284 | + "1 [83. 86.]\n", |
| 285 | + "2 [ 87. 94. 99. 105.]\n", |
285 | 286 | "Name: Scores, dtype: list<item: double>[pyarrow]"
|
286 | 287 | ]
|
287 | 288 | },
|
|
291 | 292 | }
|
292 | 293 | ],
|
293 | 294 | "source": [
|
294 |
| - "# Aggregate adjusted scores back into arrays\n", |
| 295 | + "# Aggregate adjusted scores back into arrays.\n", |
295 | 296 | "adj_scores_arr = bbq.array_agg(adj_scores.groupby(level=0))\n",
|
296 | 297 | "adj_scores_arr"
|
297 | 298 | ]
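A local analogue of the re-aggregation step (an assumption for illustration, not the `bbq` API): `bbq.array_agg` over `groupby(level=0)` collects the exploded, adjusted scores back into one array per original row index, much as `agg(list)` does in plain pandas.

```python
import pandas as pd

# Regroup exploded values by their original row index into lists.
adj_scores = pd.Series(
    [100.0, 93.0, 97.0, 83.0, 86.0, 87.0, 94.0, 99.0, 105.0],
    index=[0, 0, 0, 1, 1, 2, 2, 2, 2],
    name="Scores",
)
adj_scores_arr = adj_scores.groupby(level=0).agg(list)
print(adj_scores_arr.tolist())
# [[100.0, 93.0, 97.0], [83.0, 86.0], [87.0, 94.0, 99.0, 105.0]]
```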
|
|
332 | 333 | " <th>0</th>\n",
|
333 | 334 | " <td>Alice</td>\n",
|
334 | 335 | " <td>[95 88 92]</td>\n",
|
335 |
| - " <td>[95.23809524 88.57142857 92.38095238]</td>\n", |
| 336 | + " <td>[100. 93. 97.]</td>\n", |
336 | 337 | " </tr>\n",
|
337 | 338 | " <tr>\n",
|
338 | 339 | " <th>1</th>\n",
|
339 | 340 | " <td>Bob</td>\n",
|
340 | 341 | " <td>[78 81]</td>\n",
|
341 |
| - " <td>[79.04761905 81.9047619 ]</td>\n", |
| 342 | + " <td>[83. 86.]</td>\n", |
342 | 343 | " </tr>\n",
|
343 | 344 | " <tr>\n",
|
344 | 345 | " <th>2</th>\n",
|
345 | 346 | " <td>Charlie</td>\n",
|
346 | 347 | " <td>[ 82 89 94 100]</td>\n",
|
347 |
| - " <td>[ 82.85714286 89.52380952 94.28571429 100. ...</td>\n", |
| 348 | + " <td>[ 87. 94. 99. 105.]</td>\n", |
348 | 349 | " </tr>\n",
|
349 | 350 | " </tbody>\n",
|
350 | 351 | "</table>\n",
|
351 | 352 | "<p>3 rows × 3 columns</p>\n",
|
352 | 353 | "</div>[3 rows x 3 columns in total]"
|
353 | 354 | ],
|
354 | 355 | "text/plain": [
|
355 |
| - " Name Scores \\\n", |
356 |
| - "0 Alice [95 88 92] \n", |
357 |
| - "1 Bob [78 81] \n", |
358 |
| - "2 Charlie [ 82 89 94 100] \n", |
359 |
| - "\n", |
360 |
| - " NewScores \n", |
361 |
| - "0 [95.23809524 88.57142857 92.38095238] \n", |
362 |
| - "1 [79.04761905 81.9047619 ] \n", |
363 |
| - "2 [ 82.85714286 89.52380952 94.28571429 100. ... \n", |
| 356 | + " Name Scores NewScores\n", |
| 357 | + "0 Alice [95 88 92] [100. 93. 97.]\n", |
| 358 | + "1 Bob [78 81] [83. 86.]\n", |
| 359 | + "2 Charlie [ 82 89 94 100] [ 87. 94. 99. 105.]\n", |
364 | 360 | "\n",
|
365 | 361 | "[3 rows x 3 columns]"
|
366 | 362 | ]
|
|
371 | 367 | }
|
372 | 368 | ],
|
373 | 369 | "source": [
|
374 |
| - "# Incorporate adjusted scores into the DataFrame\n", |
| 370 | + "# Add the adjusted scores to the DataFrame. This operation requires an implicit join\n", |
| 371 | + "# between the two tables, which necessitates a unique index in the DataFrame\n", |
| 372 | + "# (guaranteed in the default ordering and index mode).\n", |
375 | 373 | "df['NewScores'] = adj_scores_arr\n",
|
376 | 374 | "df"
|
377 | 375 | ]
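The implicit join described in the cell's comment behaves like index alignment in pandas: column assignment matches rows by index value, not by position. A small pandas sketch (hypothetical data, chosen only to make the alignment visible):

```python
import pandas as pd

# A deliberately reordered index shows that assignment aligns on index values.
df = pd.DataFrame({"Name": ["Alice", "Bob"]}, index=[0, 1])
new_col = pd.Series(["second", "first"], index=[1, 0])
df["Tag"] = new_col
print(df["Tag"].tolist())  # ['first', 'second']
```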
|
|
382 | 380 | "source": [
|
383 | 381 | "# Struct Data Types\n",
|
384 | 382 | "\n",
|
385 |
| - "In BigQuery, a [struct](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#struct_type) (also known as a `record`) is a collection of ordered fields, each with a defined data type (required) and an optional field name. BigFrames maps BigQuery struct types to the Pandas equivalent, `pandas.ArrowDtype(pa.struct())`. In this section, we'll explore practical code examples illustrating how to work with struct columns within your BigFrames DataFrames." |
| 383 | + "In BigQuery, a [struct](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#struct_type) (also known as a `record`) is a collection of ordered fields, each with a defined data type (required) and an optional field name. BigQuery DataFrames maps BigQuery struct types to the pandas equivalent, `pandas.ArrowDtype(pa.struct())`. This section provides practical code examples illustrating how to use struct columns with BigQuery DataFrames." |
386 | 384 | ]
|
387 | 385 | },
|
388 | 386 | {
|
|
391 | 389 | "source": [
|
392 | 390 | "## Create DataFrames with struct columns \n",
|
393 | 391 | "\n",
|
394 |
| - "Let's create a sample BigFrames DataFrame where the `Address` column holds struct data of type `struct<City: string, State: string>[pyarrow]`:" |
| 392 | + "Create a DataFrame with an `Address` struct column by using dictionaries for the data and setting the dtype to `struct<City: string, State: string>[pyarrow]`." |
395 | 393 | ]
|
396 | 394 | },
|
397 | 395 | {
|
|
403 | 401 | "name": "stderr",
|
404 | 402 | "output_type": "stream",
|
405 | 403 | "text": [
|
406 |
| - "/usr/local/google/home/chelsealin/src/bigframes2/venv/lib/python3.12/site-packages/google/cloud/bigquery/_pandas_helpers.py:537: UserWarning: Pyarrow could not determine the type of columns: bigframes_unnamed_index.\n", |
| 404 | + "/usr/local/google/home/chelsealin/src/bigframes/venv/lib/python3.12/site-packages/google/cloud/bigquery/_pandas_helpers.py:570: UserWarning: Pyarrow could not determine the type of columns: bigframes_unnamed_index.\n", |
407 | 405 | " warnings.warn(\n"
|
408 | 406 | ]
|
409 | 407 | },
|
|
509 | 507 | "cell_type": "markdown",
|
510 | 508 | "metadata": {},
|
511 | 509 | "source": [
|
512 |
| - "## CRUD operations for struct data\n", |
| 510 | + "## Operate on struct data\n", |
513 | 511 | "\n",
|
514 |
| - "Similar to Pandas, BigFrames provides a [`StructAccessor`](https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.operations.structs.StructAccessor) to streamline the manipulation of struct data. Let's explore how you can utilize this feature for efficient CRUD operations on your nested struct columns." |
| 512 | + "Similar to pandas, BigQuery DataFrames provides a [`StructAccessor`](https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.operations.structs.StructAccessor). Use the methods provided in this accessor to manipulate struct data." |
515 | 513 | ]
|
516 | 514 | },
|
517 | 515 | {
|
|