{"instance_id": "postgres_chinook001", "instruction": "How can we consolidate invoice data with customer and date details for comprehensive reporting, and how can we aggregate playlist tracks with their metadata for analysis? After data transformation, please convert the target tables into CSV files as the final answer.", "type": "Postgres"}
{"instance_id": "postgres_shopify001", "instruction": "Comprehensively analyze Shopify orders and transactions, incorporating incremental data processing, order adjustments, refunds, discounts, fulfillment details, and classifying customers as new or repeat. After data transformation, please convert the target tables into CSV files as the final answer.", "type": "Postgres"}
{"instance_id": "postgres_airport001", "instruction": "Calculate the distances between Malaysian airports and summarize their arrival flight counts. After data transformation, please convert the target tables into CSV files as the final answer.", "type": "Postgres"}
{"instance_id": "postgres_playbook001", "instruction": "How can we calculate customer attribution using various models to assign revenue from customer conversions to different touchpoints, and then aggregate this data with ad spend information to determine key marketing metrics such as Cost Per Acquisition (CPA) and Return on Advertising Spend (ROAS)? After data transformation, please convert the target tables into CSV files as the final answer.", "type": "Postgres"}
{"instance_id": "postgres_mrr001", "instruction": "How can we analyze customer subscription data to calculate the Monthly Recurring Revenue (MRR) for each customer by month\u2014including changes due to upgrades, downgrades, churns, and reactivations\u2014identify the churn month for customers by adding records for the months after their last active month with zero MRR, and generate monthly records for each customer between their first and last active months to track customer activity and MRR over time? After data transformation, please convert the target tables into CSV files as the final answer.", "type": "Postgres"}
{"instance_id": "postgres_tpch001", "instruction": "How can we identify the top 10 returned parts per month by suppliers, analyze low-cost brass suppliers in Europe including their supply costs, and specifically extract details of such suppliers in the United Kingdom? After data transformation, please convert the target tables into CSV files as the final answer.", "type": "Postgres"}
{"instance_id": "postgres_airbnb001", "instruction": "How can we aggregate user reviews by sentiment on a daily and monthly basis, including month-over-month comparisons, and combine listings with host information to generate unique keys for tracking the latest updates to both listings and hosts? After data transformation, please convert the target tables into CSV files as the final answer.", "type": "Postgres"}
{"instance_id": "postgres_maturity001", "instruction": "How can we create dimensional models for doctors and patients, and a fact model for billed patient claims, by joining claim information with diagnoses and charge amounts? After data transformation, please convert the target tables into CSV files as the final answer.", "type": "Postgres"}
{"instance_id": "postgres_tickit001", "instruction": "Analyze sales data to identify the top-grossing events and top sellers and to aggregate sales by category, providing detailed metrics such as total transactions, total tickets sold, total sales, total commissions, total earnings, and average ticket sale prices for each group. After data transformation, please convert the target tables into CSV files as the final answer.", "type": "Postgres"}
{"instance_id": "postgres_google_ads001", "instruction": "Generate daily performance reports for Google Ads at the account, ad group, and ad levels. After data transformation, please convert the target tables into CSV files as the final answer.", "type": "Postgres"}
{"instance_id": "ch001", "instruction": "Please help me identify the year with the highest average property prices for the entire UK and for London specifically, as well as finding the town and district with the highest average property prices and at least 100 transactions since 2020.", "type": "Clickhouse"}
{"instance_id": "ch006", "instruction": "Determine the daily trend (increase, decrease, or no change) and percentage change in new confirmed COVID-19 cases for the location identified by 'US_DC'.", "type": "Clickhouse"}
{"instance_id": "ch009", "instruction": "Please calculate and list the percentage of flights with departure delays greater than 10 minutes for each airline carrier, first for the year 2004 and then for the combined years 2004 and 2005, ordered by the highest delay percentage.", "type": "Clickhouse"}
{"instance_id": "ch010", "instruction": "Help me first calculate the yearly percentage of flights with departure delays greater than 10 minutes over all years and then identify airlines, between 1990 and 2009, excluding weekends and non-continental US flights, with more than 100,000 flights, ranking them by the rate of flights delayed over 30 minutes, listing up to 1000 airlines with the highest delay rates.", "type": "Clickhouse"}
{"instance_id": "ch005", "instruction": "Please calculate monthly power consumption metrics, including total and average usage, for different device categories (coffee machines, printers, projectors, and vending machines) based on hourly averages.", "type": "Clickhouse"}
{"instance_id": "ch003", "instruction": "Please help me find the top five ski resorts in the US with the highest recorded snowfalls from selected weather stations since 2017, where the stations are located within 20 kilometers of the resorts and are situated at elevations above 1800 meters.", "type": "Clickhouse"}
{"instance_id": "ch004", "instruction": "Could you help me identify distinct devices and their types and locations within a building where significant temperature variations, defined as differences of 25 degrees or more in hourly averages, occur during the winter (Dec 2018 - Feb 2019) or summer periods (Jun 2019 - Aug 2019).", "type": "Clickhouse"}
{"instance_id": "playbook001", "instruction": "Complete the project on this database to show the metrics of each traffic source. I believe every touchpoint in the conversion path is equally important, so please choose the most suitable attribution method.", "type": "DBT"}
{"instance_id": "provider001", "instruction": "How can I map Medicare specialties to NUCC taxonomy codes, prioritize the most specific one, and assign a primary taxonomy code to each provider in the NPI dataset? Additionally, how can I combine this with provider details, including their entity type, practice location, and specialty, while checking if their NPI is active or deactivated?", "type": "DBT"}
{"instance_id": "asana001", "instruction": "Can you describe the process used to aggregate task and project metrics for Asana teams and users, including open tasks, completed tasks, and average close times?", "type": "DBT"}
{"instance_id": "shopify001", "instruction": "Create two tables: one that pulls together product data like total sales, refunds, discounts, and taxes, and another that tracks daily shop performance, including orders, abandoned checkouts, and fulfillment statuses.", "type": "DBT"}
{"instance_id": "asset001", "instruction": "Calculate the average bid, ask, and mid prices for stock tickers on a daily basis, and then determine the total book value of shares based on these prices.", "type": "DBT"}
{"instance_id": "flicks001", "instruction": "How can I calculate the average IMDb and TMDb ratings for actors based on the movies they've appeared in, and also determine the number of movies each actor participated in by release year?", "type": "DBT"}
{"instance_id": "analytics_engineering001", "instruction": "Create a comprehensive report that combines customer, employee, product, and purchase order details, including order quantities, costs, and timestamps, all joined together for easy analysis.", "type": "DBT"}
{"instance_id": "xero_new001", "instruction": "You are tasked with completing a data transformation project to generate three financial report models: the Profit and Loss Report, the Balance Sheet Report, and the Invoice Line Items Report. These models primarily rely on data sourced from the General Ledger.", "type": "DBT"}
{"instance_id": "chinook001", "instruction": "Create a comprehensive invoice table that combines invoice details, customer information, and date attributes, linking each invoice with its customer and date dimensions", "type": "DBT"}
{"instance_id": "f1001", "instruction": "Generate views that rank Formula 1 drivers based on the number of fastest laps, podium finishes, and pole positions, while summarizing their race finishes and positions across all events.", "type": "DBT"}
{"instance_id": "netflix001", "instruction": "Combine Netflix original programs from various genres, clean up their titles, and standardize premiere dates while also categorizing them by genre and renewal status.", "type": "DBT"}
{"instance_id": "workday002", "instruction": "Create a table that aggregates job profile information along with job family and job family group details", "type": "DBT"}
{"instance_id": "pendo001", "instruction": "Generate daily metrics reports for Pendo guides and pages", "type": "DBT"}
{"instance_id": "synthea001", "instruction": "How can we aggregate financial data related to healthcare events, such as conditions, drug exposures, and procedures, to calculate total costs, charges, and payments for each event?", "type": "DBT"}
{"instance_id": "inzight001", "instruction": "Calculate the monthly peak electricity usage, along with its 12-month rolling average and percentage change compared to the previous month, using timestamps and date-time dimensions.", "type": "DBT"}
{"instance_id": "google_play001", "instruction": "Generate a Google Play country report and a Google Play device report based on aggregated install and rating metrics.", "type": "DBT"}
{"instance_id": "airbnb002", "instruction": "Complete the data transformation by aggregating review data over a 7-day rolling window, calculating week-over-week percentage changes in review totals by sentiment, and generating unique identifiers for each date and sentiment combination.", "type": "DBT"}
{"instance_id": "biketheft001", "instruction": "How can I combine current and archived theft report data, join it with geographical information for Berlin districts, and calculate the relevant crime metrics including damage in euros and crime location details like district IDs and names?", "type": "DBT"}
{"instance_id": "tickit002", "instruction": "Get detailed information about events and ticket listings, including venue details, event timing, categories, seller information, and pricing for each listing", "type": "DBT"}
{"instance_id": "activity001", "instruction": "How can I compare user activities to see how many users signed up and visited a page, using both the 'aggregate after' and 'aggregate all ever' methods for capturing the visit page activities?", "type": "DBT"}
{"instance_id": "scd001", "instruction": "Generate a report that aggregates corporate account metrics, including the number of gaggles, users, events, and orders associated with corporate emails, while also identifying the first user, most active user, and the user with the most orders for each corporate email", "type": "DBT"}
{"instance_id": "lever001", "instruction": "Pull together the data from multiple tables related to job postings and create a complete report that covers job applications, interviews, requisitions, tags, and the hiring manager details.", "type": "DBT"}
{"instance_id": "greenhouse001", "instruction": "Please generate a report on the enhanced job and application data.", "type": "DBT"}
{"instance_id": "app_reporting002", "instruction": "Please generate an overview report of the app that combines Apple Store and Google Play data.", "type": "DBT"}
{"instance_id": "mrr001", "instruction": "Complete the project on this database to calculate the monthly recurring revenue.", "type": "DBT"}
{"instance_id": "xero001", "instruction": "Create a balance sheet report that represents the balance sheet state for each account on a monthly basis.", "type": "DBT"}
{"instance_id": "movie_recomm001", "instruction": "Combine movie_recomm original programs from various genres, clean up their titles, and standardize premiere dates while also categorizing them by genre and renewal status.", "type": "DBT"}
{"instance_id": "quickbooks003", "instruction": "Pull a table with all balance sheet entries for asset, liability, and equity accounts. Make sure it includes account details, class, parent information, and the monthly period balances.", "type": "DBT"}
{"instance_id": "qualtrics001", "instruction": "Calculate key metrics for each directory, including distinct contacts, emails, phones, unsubscribed contacts, and new contacts from the last 30 days, along with survey engagement stats (sent, opened, started, completed) and the number of mailing lists.", "type": "DBT"}
{"instance_id": "recharge002", "instruction": "Calculate daily and running totals for customer transactions, including charges, discounts, taxes, refunds, and order quantities, and determine the number of active months for each customer.", "type": "DBT"}
{"instance_id": "atp_tour001", "instruction": "How can you generate a summary report of ATP tennis matches that includes details about the tournaments, players, match statistics (such as the winner, loser, aces, and scores), and their associated dates, utilizing dimensional tables for tournaments and players to enrich the report?", "type": "DBT"}
{"instance_id": "quickbooks002", "instruction": "Generate a table that includes bill and invoice transaction information, including supplier and customer details, payment status, balance, overdue days, and other financial information.", "type": "DBT"}
{"instance_id": "google_ads001", "instruction": "Generate reports for Google Ads campaigns and keywords, including spend, clicks, impressions, conversions, and related metrics, with data grouped by account, campaign, and ad group details.", "type": "DBT"}
{"instance_id": "airport001", "instruction": "Aggregate and summarize Malaysian airport arrival data, including flight counts, and calculate the distances between Malaysian airports in kilometers.", "type": "DBT"}
{"instance_id": "tpch001", "instruction": "Calculate the lifetime value of a customer by analyzing their total purchases and returns, categorize their status based on the percentage of returns, and combine this data with lost revenue information.", "type": "DBT"}
{"instance_id": "salesforce001", "instruction": "I need a daily report on key sales activities\u2014covering tasks completed, events held, leads generated, and the status of opportunities.", "type": "DBT"}
{"instance_id": "hubspot001", "instruction": "How can I merge HubSpot contact data with email metrics, engagement activities, and email campaign performance to provide a comprehensive view of each contact\u2019s interactions and the overall effectiveness of email campaigns?", "type": "DBT"}
{"instance_id": "shopify002", "instruction": "Generate a table that combines Shopify discount code data with price rules, order aggregates, and abandoned checkout aggregates. Include metrics like discount amounts, order counts, shipping costs, and customer data for each discount, handling cases where discount codes apply to both shipping and line items.", "type": "DBT"}
{"instance_id": "social_media001", "instruction": "Generate a comprehensive social media report that includes Facebook, Instagram, LinkedIn, and Twitter reports.", "type": "DBT"}
{"instance_id": "xero_new002", "instruction": "Generate a monthly balance sheet report, summarizing asset, equity, and liability accounts with their net amounts, and categorizing earnings as 'Retained Earnings' or 'Current Year Earnings' based on the financial year-end date", "type": "DBT"}
{"instance_id": "divvy001", "instruction": "Analyze bike trips by combining user data, trip duration, and geo-locational information for start and end stations, while filtering trips based on their duration and associating stations with specific neighborhoods", "type": "DBT"}
{"instance_id": "playbook002", "instruction": "Please assist me in completing the data transformation project of this database.", "type": "DBT"}
{"instance_id": "apple_store001", "instruction": "Please finish the data transformation project to generate source type and territory report for me.", "type": "DBT"}
{"instance_id": "jira001", "instruction": "Retrieve information about Jira projects, including project lead details, associated epics, components, and metrics like the average and median time for closing issues, both in days and seconds.", "type": "DBT"}
{"instance_id": "zuora001", "instruction": "Generate the daily account overview and the account overview for Zuora.", "type": "DBT"}
{"instance_id": "superstore001", "instruction": "How can I generate a dataset that associates sales transactions with their respective regional managers, including details about products, customers, shipping, and geographical data?", "type": "DBT"}
{"instance_id": "marketo001", "instruction": "How can I combine the most recent version of each email template with aggregated metrics for sends, opens, bounces, clicks, deliveries, and unsubscribes?", "type": "DBT"}
{"instance_id": "f1002", "instruction": "Summarize Formula 1 constructors' race performances, rank constructors based on driver championships, and track driver championships across seasons.", "type": "DBT"}
{"instance_id": "gitcoin001", "instruction": "Transform and clean the raw application, project, and application answer data by renaming fields, extracting metadata, and linking answers to their respective questions and projects.", "type": "DBT"}
{"instance_id": "shopify_holistic_reporting001", "instruction": "Combine daily customer order data from Shopify and user engagement metrics from Klaviyo, ensuring that records are merged based on email, date, and attribution-related details (such as campaign, flow, and variation IDs) to create a unified view of customer activity across both platforms, including last touch information and source-specific data.", "type": "DBT"}
{"instance_id": "hive001", "instruction": "Process raw COVID-19 case data, clean it, and join it with country codes to display the number of cases and deaths by country and report date.", "type": "DBT"}
{"instance_id": "workday001", "instruction": "Create a table that combines organization roles with worker positions.", "type": "DBT"}
{"instance_id": "f1003", "instruction": "Create data models to track Formula 1 drivers' podium finishes, fastest laps, and constructors' retirements per season", "type": "DBT"}
{"instance_id": "retail001", "instruction": "Which countries have the highest total revenue based on the number of invoices, and what are the top 10 countries by total revenue?", "type": "DBT"}
{"instance_id": "google_play002", "instruction": "Generate an overview report on Google Play app performance, including installs, uninstalls, crashes, ANRs, ratings, and store performance metrics over time.", "type": "DBT"}
{"instance_id": "sap001", "instruction": "Can you explain the process used to handle and aggregate SAP general ledger data for `0fi_gl_10` and `0fi_gl_14`?", "type": "DBT"}
{"instance_id": "airbnb001", "instruction": "Aggregate user reviews by sentiment on a daily and month-over-month basis, while also combining listings and host information to track the latest updates for each listing", "type": "DBT"}
{"instance_id": "app_reporting001", "instruction": "Please generate reports for app version and OS version.", "type": "DBT"}
{"instance_id": "mrr002", "instruction": "Please complete this data transformation project to analyze the trends in user subscription changes.", "type": "DBT"}
{"instance_id": "twilio001", "instruction": "Aggregate messaging data for Twilio, one at the phone number level and another at the account level, including details like inbound/outbound message counts, message statuses, and total spend.", "type": "DBT"}
{"instance_id": "intercom001", "instruction": "Can you explain the process used to calculate admin-specific and optionally team-specific metrics for closed conversations in Intercom, including total conversations, average ratings, and median response times?", "type": "DBT"}
{"instance_id": "tickit001", "instruction": "Generate a complete sales summary that includes buyer and seller details, event categories, and sales metrics.", "type": "DBT"}
{"instance_id": "reddit001", "instruction": "Clean and join Reddit post and comment data from paranormal subreddits to analyze metadata like post scores, comment counts, and timestamps, while maintaining the relationships between posts and comments.", "type": "DBT"}
{"instance_id": "recharge001", "instruction": "Create a model to combine charge data, including line items, discounts, taxes, shipping, and refunds, while ensuring each item is uniquely identified and linked to its charge.", "type": "DBT"}
{"instance_id": "maturity001", "instruction": "How can I retrieve detailed information about doctors, including their specialties, and patients, including their medical details, such as diabetes status, from the respective dimension tables?", "type": "DBT"}
{"instance_id": "tpch002", "instruction": "Find low-cost brass part suppliers located in the United Kingdom, including their part availability, retail prices, and contact details", "type": "DBT"}
{"instance_id": "nba001", "instruction": "Create comprehensive views summarizing NBA teams' regular season, playoff progress, and Elo ratings. Use data from season summaries, playoff simulations, and ratings, and include key playoff milestones like semifinals, finals, and championship wins. Keep it concise.", "type": "DBT"}
{"instance_id": "quickbooks001", "instruction": "Please create a table that unions all records from each model within the double_entry_transactions directory. The table should result in a comprehensive general ledger, ensuring each transaction has an offsetting debit and credit entry.", "type": "DBT"}
{"instance_id": "bq023", "instruction": "What are the average political donation amounts and median incomes for each census tract identifier in Kings County (Brooklyn), NY, using data from the 2018 ACS and 2020 FEC contributions?", "type": "Bigquery"}
{"instance_id": "bq229", "instruction": "Can you provide a count of how many image URLs are categorized as \u2018cat\u2019 (with label '/m/01yrx' and full confidence) and how many contain no such cat labels at all (categorized as \u2018other\u2019)?", "type": "Bigquery"}
{"instance_id": "bq024", "instruction": "For the year 2012, which top 10 evaluation groups have the largest subplot acres when considering only the condition with the largest subplot acres within each group? Please include the evaluation group, evaluation type, condition status code, evaluation description, state code, macroplot acres, and subplot acres.", "type": "Bigquery"}
{"instance_id": "bq220", "instruction": "Which states had the largest average subplot size and the largest average macroplot size respectively for each of the years 2015, 2016, and 2017, based on accessible forest land and current evaluations (EXPCURR)? Display the type of plot (subplot or macroplot), the year, the state, and the corresponding average size for each type and each specific year.", "type": "Bigquery"}
{"instance_id": "bq218", "instruction": "What are the names of the top 5 alcoholic beverages with the highest year-over-year growth percentage in total sales revenue for the year 2023?", "type": "Bigquery"}
{"instance_id": "bq015", "instruction": "Rank the top 10 most discussed tags on Stack Overflow questions that were mentioned on Hacker News since 2014.", "type": "Bigquery"}
{"instance_id": "bq227", "instruction": "Could you provide the annual percentage shares, rounded to two decimal places, of the top 5 minor crime categories from 2008 in London's total crimes, with each year displayed in one row?", "type": "Bigquery"}
{"instance_id": "bq287", "instruction": "What is the employment rate (only consider population over 16) in the Utah zip code that has the fewest number of bank locations based on American Community Survey data in 2017?", "type": "Bigquery"}
{"instance_id": "bq041", "instruction": "What are the monthly statistics for new StackOverflow users created in 2021, including the percentage of new users who asked questions and the percentage of those who asked questions and then answered questions within their first 30 days?", "type": "Bigquery"}
{"instance_id": "bq425", "instruction": "List all distinct molecules associated with the company 'SanofiAventis,' along with their trade name and approval date, retaining the most recent approval date for each molecule, using data from ChEMBL Release 23.", "type": "Bigquery"}
{"instance_id": "bq046", "instruction": "Find case barcodes and their corresponding GDC file URLs for female patients aged 30 or younger diagnosed with breast cancer, whose clinical history includes problematic prior treatments for other cancers or redacted annotations. Only consider relevant clinical and annotation data from TCGA with GDC archive release 14.", "type": "Bigquery"}
{"instance_id": "bq280", "instruction": "Please provide the display name of the user who has answered the most questions on Stack Overflow, considering only users with a reputation greater than 10.", "type": "Bigquery"}
{"instance_id": "bq079", "instruction": "Given the latest evaluations of timberland and forestland plots, which state within each category has the highest total acreage? Please provide the state code, the evaluation group, the state name, and the total acres for the top state in each category.", "type": "Bigquery"}
{"instance_id": "bq414", "instruction": "Retrieve the object id, title, and the formatted metadata date (as a string in 'YYYY-MM-DD' format) for objects in the \"The Libraries\" department where the cropConfidence is greater than 0.5 and the object's title contains the word \"book\".", "type": "Bigquery"}
{"instance_id": "bq413", "instruction": "Retrieve the venue titles of publications inserted from 2024 onwards, where the associated grid's city is 'Qianjiang', prioritizing journal titles first, then proceedings, book, or book series titles.", "type": "Bigquery"}
{"instance_id": "bq077", "instruction": "For each year from 2010 to 2016, what is the highest number of motor thefts in one month?", "type": "Bigquery"}
{"instance_id": "bq048", "instruction": "Which common complaint types have the strongest positive and negative correlation with wind speed respectively, given the data in NYC JFK Airport from year 2011 to 2020? Also, provide the corresponding correlation values (rounded to 4 decimals).", "type": "Bigquery"}
{"instance_id": "bq228", "instruction": "Please provide a list of the top three major crime categories in the borough of Barking and Dagenham, along with the number of incidents in each category.", "type": "Bigquery"}
{"instance_id": "bq025", "instruction": "Provide a list of the top 10 countries for the year 2020, ordered by the highest percentage of their population under 20 years old. For each country, include the total population under 20 years old, the total midyear population, and the percentage of the population that is under 20 years old.", "type": "Bigquery"}
{"instance_id": "bq441", "instruction": "Please help me compile the critical details on traffic accidents in 2015, as listed in the info document.", "type": "Bigquery"}
{"instance_id": "bq022", "instruction": "Given the taxi trip data in Chicago, partition the trips that last no more than 1 hour into 6 quantiles based on trip duration. Please provide the minimum/maximum trip duration (rounded-off to integer minutes), total trips, and average fare for each quantile.", "type": "Bigquery"}
{"instance_id": "bq076", "instruction": "Which month generally has the greatest number of motor vehicle thefts in 2016?", "type": "Bigquery"}
{"instance_id": "bq049", "instruction": "Display the monthly per capita Bourbon Whiskey sales in 2022 for the zip code with the third-highest total sales in Dubuque County, considering only the population aged 21 and over.", "type": "Bigquery"}
{"instance_id": "bq085", "instruction": "Could you provide the total number of confirmed COVID-19 cases and the number of cases per 100,000 people, based on the 2020 population, on April 20, 2020, for the US, France, China, Italy, Spain, Germany, and Iran?", "type": "Bigquery"}
{"instance_id": "bq288", "instruction": "What is the total number of all banking institutions in the state that has the highest sum of assets from banks established between January 1, 1900, and December 31, 2000, with institution names starting with 'Bank'?", "type": "Bigquery"}
{"instance_id": "bq281", "instruction": "What is the highest number of electric bike rides lasting more than 10 minutes taken by subscribers with 'Student Membership' in a single day, excluding rides starting or ending at 'Mobile Station' or 'Repair Shop'?", "type": "Bigquery"}
{"instance_id": "bq275", "instruction": "Can you provide a list of visitor IDs for those who made their first transaction on a mobile device on a different day than their first visit?", "type": "Bigquery"}
{"instance_id": "bq047", "instruction": "Could you help me analyze the relationship between each complaint type and daily temperature in New York city, focusing on data in airports LaGuardia and JFK over the 10 years starting from 2008? Calculate the total complaint count, the total day count, and the Pearson correlation coefficient (rounded to 4 decimals) between temperature and both the count and percentage of each common (>5000 occurrences) and strongly correlated (absolute value > 0.5) complaint type.", "type": "Bigquery"}
{"instance_id": "bq078", "instruction": "Retrieve the approved symbol of target genes with the highest overall score that are associated with the disease 'EFO_0000676' from the data source 'IMPC'.", "type": "Bigquery"}
{"instance_id": "bq040", "instruction": "For NYC yellow taxi trips between January 1-7, 2016, excluding pickups from 'EWR' and 'Staten Island', calculate the proportion of trips by tip category for each pickup borough. Show the borough, tip category, and proportion, ensuring trips where the dropoff occurs after the pickup, the passenger count is greater than 0, and trip distance, tip, tolls, MTA tax, fare, and total amount are non-negative.", "type": "Bigquery"}
{"instance_id": "bq424", "instruction": "List the top 10 countries with respect to the total amount of long-term external debt in descending order, excluding those without a specified region.", "type": "Bigquery"}
{"instance_id": "bq286", "instruction": "Can you tell me the name of the most popular female baby in Wyoming for the year 2021, based on the proportion of female babies given that name compared to the total number of female babies given the same name across all states?", "type": "Bigquery"}
{"instance_id": "ga018", "instruction": "I'd like to analyze the appeal of our products to users. Can you calculate the percentage of times users go from browsing the product list pages to clicking into the product detail pages during a single session on January 2nd, 2021?", "type": "Bigquery"}
{"instance_id": "bq300", "instruction": "What is the highest number of answers received for a single Python 2 specific question on Stack Overflow, excluding any discussions that involve Python 3?", "type": "Bigquery"}
{"instance_id": "bq338", "instruction": "Can you find the census tracts in the 36047 area that made both the top 20 lists for biggest population and median income increases from 2011 to 2018, and had over 1000 residents each year?", "type": "Bigquery"}
{"instance_id": "ga020", "instruction": "Which quickplay event type had the lowest user retention rate during the second week after their initial engagement, for users who first engaged between August 1 and August 15, 2018?", "type": "Bigquery"}
{"instance_id": "bq103", "instruction": "Generate summary statistics on genetic variants in the region between positions 55039447 and 55064852 on chromosome 1. This includes the number of variants, the total allele count, the total number of alleles, and distinct gene symbols (using Variant Effect Predictor, VEP, for gene annotation). Additionally, compute the density of mutations by dividing the length of the region by the number of variants. Using data from the gnomAD v3 version.", "type": "Bigquery"}
{"instance_id": "bq309", "instruction": "Show the top 10 longest Stack Overflow questions where the question has an accepted answer or an answer with a score-to-view ratio above 0.01, including the user's reputation, net votes, and badge count.", "type": "Bigquery"}
{"instance_id": "ga011", "instruction": "What is the highest number of page views for different pages under website \"shop.googlemerchandisestore.com\" in December 2020?", "type": "Bigquery"}
{"instance_id": "bq396", "instruction": "Which top 3 states had the largest differences in the number of traffic accidents between rainy and clear weather during weekends in 2016? Please also provide the respective differences for each state.", "type": "Bigquery"}
{"instance_id": "bq362", "instruction": "Which three companies had the largest increase in trip numbers between two consecutive months in 2018?", "type": "Bigquery"}
{"instance_id": "bq391", "instruction": "Could you find out which health conditions have the most types of medications per case, for living patients whose last names start with 'A' and have only one unique condition? I'd like to see the top eight conditions and their codes, ranked by the highest number of different meds prescribed to any single patient.", "type": "Bigquery"}
{"instance_id": "bq161", "instruction": "Calculate the net difference between the number of pancreatic adenocarcinoma (PAAD) patients in TCGA's dataset who are confirmed to have mutations in both KRAS and TP53 genes, and those without mutations in either gene. Utilize patient clinical and follow-up data alongside genomic mutation details from TCGA\u2019s cancer genomics database, focusing specifically on PAAD studies where the mutations have passed quality filters.", "type": "Bigquery"}
{"instance_id": "bq398", "instruction": "What are the top three debt indicators for Russia based on the highest debt values?", "type": "Bigquery"}
{"instance_id": "bq354", "instruction": "Could you provide the percentage of participants for standard acne, atopic dermatitis, psoriasis, and vitiligo defined by the International Classification of Diseases 10-CM(ICD-10-CM), including their subcategories? The ICD-10 codes are: Acne (L70), Atopic dermatitis (L20), Psoriasis (L40), and Vitiligo (L80). ", "type": "Bigquery"}
{"instance_id": "bq308", "instruction": "Show the number of Stack Overflow questions asked each day of the week in 2021, and find out how many and what percentage of those were answered within one hour.", "type": "Bigquery"}
{"instance_id": "bq105", "instruction": "How many traffic accidents per 100,000 people, specifically due to driver distraction, were recorded in each state in the years 2015 and 2016? Identify the top five states each year with the highest rates. Exclude accidents where the distraction status of the driver was recorded as 'Not Distracted,' 'Unknown if Distracted,' or 'Not Reported.' Use state population data from the 2010 census for calculating the rate.", "type": "Bigquery"}
{"instance_id": "ga010", "instruction": "Can you give me an overview of our website traffic for December 2020? I'm particularly interested in the channel with the fourth highest number of sessions.", "type": "Bigquery"}
{"instance_id": "ga028", "instruction": "Please perform a 7-day retention analysis for users who first used the app during the week starting on July 2, 2018. Provide the total number of these new users and the number of retained users for each week from Week 0 (the initial week) through Week 4.", "type": "Bigquery"}
{"instance_id": "ga017", "instruction": "How many distinct users viewed the most frequently visited page during January 2021?", "type": "Bigquery"}
{"instance_id": "bq102", "instruction": "Identify which start positions are associated with missense variants in the BRCA1 gene on chromosome 17, where the reference base is 'C' and the alternate base is 'T'.", "type": "Bigquery"}
{"instance_id": "bq330", "instruction": "Which Colorado zip code has the highest concentration of bank locations per block group, based on the overlap between zip codes and block groups?", "type": "Bigquery"}
{"instance_id": "bq339", "instruction": "Which month in 2017 had the largest absolute difference between cumulative bike usage minutes for customers and subscribers?", "type": "Bigquery"}
{"instance_id": "ga021", "instruction": "What is the retention rate for users two weeks after their initial quickplay event within the period from July 2, 2018, to July 16, 2018, calculated separately for each quickplay event type?", "type": "Bigquery"}
{"instance_id": "bq306", "instruction": "Identify the top 10 tags for user 1908967 by calculating a reputation score based on upvotes and accepted answers before June 7, 2018. The score is calculated as 10 times the upvotes plus 15 times the accepted answers.", "type": "Bigquery"}
{"instance_id": "ga019", "instruction": "Could you determine what percentage of users either did not uninstall our app within seven days or never uninstalled it after installing during August and September 2018?", "type": "Bigquery"}
{"instance_id": "bq301", "instruction": "Retrieve details of accepted answers related to JavaScript security topics such as XSS, cross-site scripting, exploits, and cybersecurity, for questions posted in January 2016 on Stack Overflow. For each accepted answer, include the answer's ID, the answerer's reputation, score, and comment count, along with the associated question's tags, score, answer count, the asker's reputation, view count, and comment count.", "type": "Bigquery"}
{"instance_id": "bq355", "instruction": "Please tell me the percentage of participants not using quinapril and related medications(Quinapril RxCUI: 35208).", "type": "Bigquery"}
{"instance_id": "bq352", "instruction": "Please list the average number of prenatal weeks in 2018 for counties in Wisconsin where more than 5% of the employed population had commutes of 45-59 minutes in 2017.", "type": "Bigquery"}
{"instance_id": "bq399", "instruction": "Which high-income country had the highest average crude birth rate respectively in each region, and what are their corresponding average birth rate, during the 1980s?", "type": "Bigquery"}
{"instance_id": "bq169", "instruction": "For cases in which:\n- Chromosome 13 has a loss of genetic material between positions 48,303,751 and 48,481,890,\n- Chromosome 17 has a loss of genetic material between positions 7,668,421 and 7,687,490, and\n- Chromosome 11 has a gain of genetic material between positions 108,223,067 and 108,369,102,\nretrieve the case information where all three conditions above are met. For each case, also return the chromosomal details for each region (chromosome number, start and end positions) and the corresponding karyotype information.", "type": "Bigquery"}
{"instance_id": "bq151", "instruction": "Using TCGA dataset, calculate the chi-squared statistic to evaluate the association between KRAS and TP53 gene mutations in patients diagnosed with pancreatic adenocarcinoma (PAAD). Incorporate clinical follow-up data and high-quality mutation annotations to accurately determine the frequency of patients with co-occurring KRAS and TP53 mutations compared to those with each mutation occurring independently. Ensure that patient records are meticulously matched based on unique identifiers to maintain data integrity. This analysis aims to identify and quantify potential correlations between KRAS and TP53 genetic alterations within the PAAD patient population.", "type": "Bigquery"}
{"instance_id": "bq363", "instruction": "For taxi trips with a duration rounded to the nearest minute, and between 1 and 50 minutes, if the trip durations are divided into 10 quantiles, what are the total number of trips and the average fare for each quantile?", "type": "Bigquery"}
{"instance_id": "bq397", "instruction": "Identify the country with the highest total transactions within each channel grouping, provided that the channel includes transactions from more than one country. What is the transaction total for that country?", "type": "Bigquery"}
{"instance_id": "bq379", "instruction": "Which target approved symbol has the overall association score closest to the mean score for psoriasis?", "type": "Bigquery"}
{"instance_id": "bq383", "instruction": "Could you provide the highest recorded precipitation, minimum temperature, and maximum temperature from the last 15 days of each year from 2013 to 2016 at weather station USW00094846? Ensure each value represents the peak measurement for that period, with precipitation in millimeters and temperatures in degrees Celsius, including only valid and high-quality data.", "type": "Bigquery"}
{"instance_id": "bq111", "instruction": "Follow the instruction documentation guide, please help me compute Pearson correlation for each chromosome comparing Mitelman DB frequencies with those computed from TCGA.", "type": "Bigquery"}
{"instance_id": "ga004", "instruction": "Can you figure out the average difference in pageviews between users who bought something and those who didn't in December 2020? Just label anyone who was involved in purchase events as a purchaser.", "type": "Bigquery"}
{"instance_id": "ga003", "instruction": "I'm trying to evaluate which board types were most effective on September 15, 2018. Can you find out the average scores for each board type from the quick play level completions on that day?", "type": "Bigquery"}
{"instance_id": "bq116", "instruction": "What was the highest annual revenue in billions of dollars reported by a U.S. state in 2016, across the main revenue categories and covering all four quarters?", "type": "Bigquery"}
{"instance_id": "bq120", "instruction": "What are the top 10 regions with the highest total SNAP participation, along with their respective ratios of households earning under $20,000 to SNAP households, as of 2017?", "type": "Bigquery"}
{"instance_id": "ga032", "instruction": "Can you pull up the sequence of pages our customer 1362228 visited on January 28th 2021, linking them with '>>' between each page? I want to see their navigation flow through our site. Please refer to the docs to convert the corresponding page title to \"PDP\" or \"PLP\" if necessary and merge adjacent identical page titles into one.", "type": "Bigquery"}
{"instance_id": "bq144", "instruction": "I would like to merge NCAA basketball historical tournament games outcomes with additional pace and efficiency performance metrics to enable comprehensive analysis of team and opponent dynamics from the 2014 season onwards (2018 included). Please refer to the Query Variable Guide to provide all the data.", "type": "Bigquery"}
{"instance_id": "bq376", "instruction": "For each neighborhood in San Francisco, list the number of bike share stations and the total number of crime incidents.", "type": "Bigquery"}
{"instance_id": "bq143", "instruction": "Use CPTAC proteomics and RNAseq data for Clear Cell Renal Cell Carcinoma to select 'Primary Tumor' and 'Solid Tissue Normal' samples. Join the datasets on sample submitter IDs and gene symbols. Calculate the correlation between protein abundance (log2 ratio) and gene expression levels (log-transformed+1 FPKM) for each gene and sample type. Filter out correlations with an absolute value greater than 0.5, and compute the average correlation for each sample type.", "type": "Bigquery"}
{"instance_id": "bq181", "instruction": "What percentage of weather stations recorded temperature data for at least 90% of the days in 2022?", "type": "Bigquery"}
{"instance_id": "bq186", "instruction": "Please help me calculate the first, last, highest, and lowest bike trip durations in minutes for each month.", "type": "Bigquery"}
{"instance_id": "bq172", "instruction": "For the drug with the highest total number of prescriptions in New York State during 2014, could you list the top five states with the highest total claim counts for this drug? Please also include their total claim counts and total drug costs. ", "type": "Bigquery"}
{"instance_id": "bq126", "instruction": "What are the titles, artist names, mediums, and original image URLs of objects with 'Photograph' in their names from the 'Photographs' department, created not by an unknown artist, with an object end date of 1839 or earlier?", "type": "Bigquery"}
{"instance_id": "bq119", "instruction": "Please show information of the hurricane with the third longest total travel distance in the North Atlantic during 2020, including its travel coordinates, the cumulative travel distance at each point, and the maximum sustained wind speed at those times.", "type": "Bigquery"}
{"instance_id": "ga002", "instruction": "Tell me the most purchased other products and their quantities by customers who bought the Google Red Speckled Tee each month for the three months starting from November 2020.", "type": "Bigquery"}
{"instance_id": "ga005", "instruction": "Please conduct a weekly cohort analysis for user retention starting July 9, 2018. Group users by their first week of app use and calculate the retention rates for each cohort over the next two weeks, showing the rate of the original cohort that returned each week. The data is available up to October 2, 2018.", "type": "Bigquery"}
{"instance_id": "bq110", "instruction": "What has been the change in the number of homeless veterans in each CoC region of New York between 2012 and 2018?", "type": "Bigquery"}
{"instance_id": "bq406", "instruction": "Please calculate the growth rates for Asians, Black people, Latinx people, Native Americans, White people, US women, US men, global women, and global men from 2014 to 2024 concerning the overall workforce.", "type": "Bigquery"}
{"instance_id": "bq096", "instruction": "Which year had the first day after January with more than 10 sightings of Sterna paradisaea north of 40 degrees latitude?", "type": "Bigquery"}
{"instance_id": "bq268", "instruction": "Identify the longest number of days between the first visit and the last recorded event (either the last visit or the first transaction) for a user where the last recorded event was associated with a mobile device.", "type": "Bigquery"}
{"instance_id": "bq098", "instruction": "For NYC yellow taxi trips between January 1-7, 2016, could you tell me the percentage of no tips in each borough. Ensure trips where the dropoff occurs after the pickup, the passenger count is greater than 0, and trip distance, tip, tolls, MTA tax, fare, and total amount are non-negative.", "type": "Bigquery"}
{"instance_id": "bq053", "instruction": "How has the number of trees of each fall color in New York City changed from 1995 to 2015, considering only trees that were still alive in 2015 and excluding those marked as dead in 1995?", "type": "Bigquery"}
{"instance_id": "bq054", "instruction": "Can you provide the top 10 tree species in New York with their species latin not empty, based on the growth in their population from 1995 to 2015, including the count of trees, the count of alive and dead trees for both years, and the respective growth values?", "type": "Bigquery"}
{"instance_id": "bq430", "instruction": "Find pairs of different molecules tested in the same assay and standard type, where both have 10\u201315 heavy atoms, fewer than 5 activities in that assay, fewer than 2 duplicate activities, non-null standard values, and pChEMBL values over 10. For each pair, report the maximum heavy atom count, the latest publication date (calculated based on the document's rank within the same journal and year, and map it to a synthetic month and day), the highest document ID, classify the change in standard values as 'increase', 'decrease', or 'no-change' based on their values and relations, and generate UUIDs from their activity IDs and canonical SMILES.", "type": "Bigquery"}
{"instance_id": "bq232", "instruction": "Could you provide the total number of 'Other Theft' incidents within the 'Theft and Handling' category for each year in the Westminster borough?", "type": "Bigquery"}
{"instance_id": "bq235", "instruction": "Can you tell me which healthcare provider incurs the highest combined average costs for both outpatient and inpatient services in 2014?", "type": "Bigquery"}
{"instance_id": "bq038", "instruction": "Can you list the top 10 station id with the highest proportion of group rides? The group rides are trips that start and end at the same station within a 2-minute window.", "type": "Bigquery"}
{"instance_id": "bq203", "instruction": "What percentage of subway stations in each New York borough have at least one ADA-compliant entrance?", "type": "Bigquery"}
{"instance_id": "bq031", "instruction": "Show me the daily weather data (temperature, precipitation, and wind speed) in Rochester for the first season of year 2019, converted to Celsius, centimeters, and meters per second, respectively. Also, include the moving averages (window size = 8) and the differences between the moving averages for up to 8 days prior (all values rounded to one decimal place, sorted by date in ascending order, and records starting from 2019-01-09).", "type": "Bigquery"}
{"instance_id": "bq452", "instruction": "Identify variants on chromosome 12, calculate their chi-squared scores using allele counts in cases and controls, and return the start, end, chi-squared score (after Yates's correction for continuity) of top variants where the chi-squared score is no less than 29.71679, ensuring that each group has expected counts of at least 5 for the chi-squared calculation.", "type": "Bigquery"}
{"instance_id": "bq204", "instruction": "Find the user with the highest total clicks across all records from all available photo collections.", "type": "Bigquery"}
{"instance_id": "bq009", "instruction": "Which traffic source receives the top revenue in 2017 and what is the difference (millions, rounded to two decimal places) between its highest and lowest revenue months?", "type": "Bigquery"}
{"instance_id": "bq293", "instruction": "What were the top 5 busiest pickup times and locations (by ZIP code) for yellow taxi rides in New York City on January 1, 2015? Additionally, provide detailed metrics for each of these top 5 records, including the count of rides, hourly, daily, and weekly lagged counts, as well as 14-day and 21-day average and standard deviation of ride counts.", "type": "Bigquery"}
{"instance_id": "bq055", "instruction": "Can you provide the top 3 races with the largest percentage differences between Google's 2021 hiring data and the average percentages from the 2021 BLS reports, along with their respective differences? Focus the analysis on the technology sectors specifically defined as Internet content broadcast, software publishing, data management, hosting, and associated services, or within the industry group of 'Computer systems design and related services.' ", "type": "Bigquery"}
{"instance_id": "bq269", "instruction": "Compute the average pageviews per visitor for non-purchase events and purchase events each month between June 1st and July 31st in 2017.", "type": "Bigquery"}
{"instance_id": "bq064", "instruction": "Could you calculate the population and average individual income (both rounded to 1 decimal) for each zip code based on U.S. census tract data in 2017? Only include those zip codes within a 5-mile radius of a specific geographic point (47.685833\u00b0N, -122.191667\u00b0W) in Washington and sort the results according to income in descending order.", "type": "Bigquery"}
{"instance_id": "bq400", "instruction": "What are the start and end times of trips from 'Clay St & Drumm St' to 'Sacramento St & Davis St' (one direction only), in the format of HH:MM:SS? I also want the trip headsign for each route.", "type": "Bigquery"}
{"instance_id": "bq090", "instruction": "How much higher is the average intrinsic value for trades using the feeling-lucky strategy compared to those using the momentum strategy under long-side trades?", "type": "Bigquery"}
{"instance_id": "bq097", "instruction": "What is the increasing amount of the average earnings per job between the years 2012 and 2017 for each geographic region in Massachusetts (indicated by \"MA\" at the end of GeoName)?", "type": "Bigquery"}
{"instance_id": "bq407", "instruction": "Find the top three counties with populations over 50,000, using the 2020 5-year census data, that had the highest COVID-19 case fatality rates on August 27, 2020. For these counties, provide the name, state, median age, total population, number of confirmed COVID-19 cases per 100,000 people, number of deaths per 100,000 people, and the case fatality rate as a percentage", "type": "Bigquery"}
{"instance_id": "bq453", "instruction": "What are the reference names, start positions, end positions, reference bases, alternate bases, variant types, chi-squared scores (calculated using Hardy-Weinberg equilibrium), and the observed and expected counts of homozygous reference, heterozygous, and homozygous alternate genotypes, including their allele frequencies and allele frequencies, for variants on chromosome 17 between positions 41196311 and 41277499?", "type": "Bigquery"}
{"instance_id": "bq008", "instruction": "What's the most common next page for visitors who were part of \"Data Share\" campaign and after they accessed the page starting with '/home' in January 2017. And what's the maximum duration time (in seconds) when they visit the corresponding home page?", "type": "Bigquery"}
{"instance_id": "bq030", "instruction": "As of May 10, 2020, which three countries with over 50,000 confirmed COVID-19 cases had the highest recovery rates? Please provide the list of these countries along with their respective recovery rates.", "type": "Bigquery"}
{"instance_id": "bq202", "instruction": "For the station with the most citibike trips in 2018, what are the peak day of the week (as a numeric value) and the peak hour of the day?", "type": "Bigquery"}
{"instance_id": "bq454", "instruction": "Identify the number of common autosomal variants (with an allele frequency \u2265 0.05) shared by different combinations of super populations, total population size for each super population, variant types, and sample counts. Exclude sex chromosomes (X, Y, MT) from the analysis.", "type": "Bigquery"}
{"instance_id": "bq462", "instruction": "Please provide the top five records since 2010 for NCAA basketball in four categories: largest venues (Date N/A), biggest championship margins since 2015, highest scoring games, and most three-pointers in a matchup (\"Team A vs Team B\").", "type": "Bigquery"}
{"instance_id": "bq006", "instruction": "What is the date with the second highest Z-score for daily counts of 'PUBLIC INTOXICATION' incidents in Austin for the year 2016? List the date in the format of '2016-xx-xx'.", "type": "Bigquery"}
{"instance_id": "bq234", "instruction": "What is the most prescribed medication in each state in 2014?", "type": "Bigquery"}
{"instance_id": "bq039", "instruction": "Which are the top 10 taxi trips in New York City from July 1 to July 7, 2016, with more than 5 passengers, a trip distance of at least 10 miles, and a positive fare, ranked by total fare amount? Display the pickup and dropoff zones, trip duration, driving speed in miles per hour, and tip rate. Note that you should avoid invalid items.", "type": "Bigquery"}
{"instance_id": "bq001", "instruction": "I wonder how many days there are between the first transaction and the first visit, both in February 2017, for each transacting visitor, along with the device used in the transaction.", "type": "Bigquery"}
{"instance_id": "bq277", "instruction": "Can you provide the name of the port that is most frequently within the geographical area of named tropical storms in the region of the code \u20186585\u2019 with winds of at least 35 knots in the North Atlantic basin, which is also located within a specific region and intersects with interstate roads?", "type": "Bigquery"}
{"instance_id": "bq045", "instruction": "Which weather stations in Washington State had more than 150 rainy days in 2023 but fewer rainy days than in 2022? Define a 'rainy day' as any day where the precipitation recorded is more than 0 millimeters.", "type": "Bigquery"}
{"instance_id": "bq284", "instruction": "Can you provide a breakdown of the total number of articles into different categories and the percentage of those articles that mention \"education\" within each category from the BBC News?", "type": "Bigquery"}
{"instance_id": "bq042", "instruction": "Help me analyze the weather conditions (including temperature, wind speed and precipitation) at NYC's airport LaGuardia for June 12, year over year, starting from 2011 to 2020.", "type": "Bigquery"}
{"instance_id": "bq270", "instruction": "What were the monthly add-to-cart and purchase conversion rates, calculated as a percentage of pageviews on product details, from January to March 2017?", "type": "Bigquery"}
{"instance_id": "bq089", "instruction": "Given the latest population estimates from the 2018 five-year American Community Survey, what is the number of vaccine sites per 1000 people for counties in California?", "type": "Bigquery"}
{"instance_id": "bq419", "instruction": "Which 5 states had the most storm events from 1980 to 1995, considering only the top 1000 states with the highest event counts each year? Please use state abbreviations.", "type": "Bigquery"}
{"instance_id": "bq074", "instruction": "Count the number of counties that experienced an increase in unemployment from 2015 to 2018, using 5-year ACS data, and a decrease in dual-eligible enrollee counts between December 1, 2015, and December 1, 2018.", "type": "Bigquery"}
{"instance_id": "bq279", "instruction": "Can you provide the number of distinct active and closed bike share stations for each year 2013 and 2014 in a chronological view?", "type": "Bigquery"}
{"instance_id": "bq087", "instruction": "Can you assess the collective percentage change in average search frequency for Anosmia symptoms across the five major boroughs of New York City from 2019 to 2020?", "type": "Bigquery"}
{"instance_id": "bq428", "instruction": "For the top five team markets with the highest number of distinct players who scored at least 15 points during the second period of games between 2010 and 2018, provide details of each game they played in NCAA basketball historical tournament matches during the same period, as specified in the data model document.", "type": "Bigquery"}
{"instance_id": "bq018", "instruction": "Which day in March and April had the highest COVID-19 confirmed case growth rate in the United States? The format is MM-DD.", "type": "Bigquery"}
{"instance_id": "bq011", "instruction": "How many pseudo users were active in the last 7 days but inactive in the last 2 days as of January 7, 2021?", "type": "Bigquery"}
{"instance_id": "bq086", "instruction": "What percentage of each country\u2019s population was confirmed to have COVID-19 as of June 30, 2020?", "type": "Bigquery"}
{"instance_id": "bq075", "instruction": "Can you provide a consolidated report that compares the racial and gender distribution across various data sources for the year 2021, focusing specifically on the overall workforce? This should include data from Google's hiring and workforce representation initiatives, as well as from BLS reports that target the technology sectors defined as Internet content broadcasting and 'Computer systems design and related services'.", "type": "Bigquery"}
{"instance_id": "bq081", "instruction": "Find the latest ride data for each region between 2014 and 2017. I want to know the name of each region, the trip ID of this ride, the ride duration, the start time, the starting station, and the gender of the rider.", "type": "Bigquery"}
{"instance_id": "bq278", "instruction": "Can you provide a detailed comparison of the solar potential for each state, distinguishing between postal code and census tract levels? Include the number of buildings available for solar installations, the percentage covered by Project Sunroof, the percentage suitable for solar, total potential panel count, total kilowatt capacity, energy generation potential, carbon dioxide offset, and the gap in potential installations.", "type": "Bigquery"}
{"instance_id": "bq427", "instruction": "Can you find the average x and y coordinates, the average number of shot attempts, and the average number of successful shots for the most frequent score delta interval in each shot type, considering only shots taken before March 15, 2018, and ensuring that the shots are on the correct side of the court based on the team's basket?", "type": "Bigquery"}
{"instance_id": "bq285", "instruction": "Could you provide me with the zip code of the location that has the highest number of bank institutions in Florida?", "type": "Bigquery"}
{"instance_id": "bq418", "instruction": "What are the counts of targets and non-targets within and outside the top 3 pathways with the highest chi-squared statistic for target species 'homo sapiens' associated with 'sorafenib' only? The targets should meet specific assay conditions (median assay value \u2264 100, with assay values below and above also \u2264 100 or NULL if not applicable), and only pathways with TAS evidence at the lowest level should be considered.", "type": "Bigquery"}
{"instance_id": "bq088", "instruction": "Can you provide the average levels of anxiety and depression symptoms from the weekly country data in the United States for the years 2019 and 2020, and calculate the percentage increase in these symptoms from 2019 to 2020?", "type": "Bigquery"}
{"instance_id": "bq282", "instruction": "Can you tell me the numeric value of the active council district in Austin which has the highest number of bike trips that start and end within the same district, but not at the same station?", "type": "Bigquery"}
{"instance_id": "bq010", "instruction": "Find the top-selling product among customers who bought 'Youtube Men\u2019s Vintage Henley' in July 2017, excluding itself.", "type": "Bigquery"}
{"instance_id": "bq445", "instruction": "Find the start and end positions of the BRCA1 gene, and retrieve the first missense variants based on their protein positions within this region. The variants must have a consequence type of \"missense_variant\". Using data from the gnomAD v2.1.1 version.", "type": "Bigquery"}
{"instance_id": "bq021", "instruction": "For the top 20 Citi Bike routes in 2016, which route is faster than yellow taxis and among those, which one has the longest average bike duration? Please provide the start station name of this route. The coordinates are rounded to three decimals.", "type": "Bigquery"}
{"instance_id": "bq019", "instruction": "For the most common inpatient diagnosis in the US in 2014, what was the citywise average payment respectively in the three cities that had the most cases?", "type": "Bigquery"}
{"instance_id": "bq442", "instruction": "Please collect the information for the top 6 trade reports with the highest closing prices. Refer to the document for all the information I want.", "type": "Bigquery"}
{"instance_id": "bq366", "instruction": "What are the top three most frequently associated labels with artworks from each historical period in The Met's collection, only considering labels linked to 50 or more artworks? Provide me with the period, label, and the associated count.", "type": "Bigquery"}
{"instance_id": "bq392", "instruction": "What are the top 3 dates in October 2009 with the highest average temperature for station number 723758, in the format YYYY-MM-DD?", "type": "Bigquery"}
{"instance_id": "bq395", "instruction": "Which 5 states had a percentage change in unsheltered homeless individuals from 2015 to 2018 closest to the national average? Please provide the state abbreviation.", "type": "Bigquery"}
{"instance_id": "bq198", "instruction": "What are the top 5 most successful college basketball teams over the seasons from 1900 to 2000, based on the number of times they had the maximum wins in a season?", "type": "Bigquery"}
{"instance_id": "bq165", "instruction": "Can you use CytoConverter genomic coordinates to calculate the frequency of chromosomal gains and losses across a cohort of breast cancer (morphology='3111') and adenocarcinoma (topology='0401') samples? Concretely, please include the number and frequency (2 decimals in percentage) of amplifications (gains of more than 1 copy), gains (1 extra copy), losses (1 copy) and homozygous deletions (loss of 2 copies) for each chromosomal band. And sort the result by the ordinal of each chromosome and the starting-ending base-pair position of each band in ascending order.", "type": "Bigquery"}
{"instance_id": "bq357", "instruction": "What are the latitude and longitude coordinates and dates between 2005 and 2015 with the top 5 highest daily average wind speeds, excluding records with missing wind speed values? Using data from tables start with prefix \"icoads_core\".", "type": "Bigquery"}
{"instance_id": "bq350", "instruction": "For the detailed molecule data, please display the drug id, drug type and withdrawal status for approved drugs with a black box warning and known drug type among 'Keytruda', 'Vioxx', 'Premarin', and 'Humira'.", "type": "Bigquery"}
{"instance_id": "bq162", "instruction": "According to the 5th revision (r5) of the HTAN data, list the different imaging assay types and their respective data levels (Level1, Level2, Level3, Level4) available at the HTAN WUSTL center. Exclude any records where the 'Component' is NULL or contains 'Auxiliary' or 'OtherAssay'. For each assay, provide the available data levels.", "type": "Bigquery"}
{"instance_id": "bq109", "instruction": "Find the average, variance, max-min difference, and the QTL source(right study) of the maximum log2(h4/h3) for data where right gene id is \"ENSG00000169174\", h4 > 0.8, h3 < 0.02, reported trait includes \"lesterol levels\", right biological feature is \"IPSC\", and the variant is '1_55029009_C_T'.", "type": "Bigquery"}
{"instance_id": "bq304", "instruction": "What are the top 50 most viewed 'how' questions for each of the following Android-related tags on StackOverflow: 'android-layout', 'android-activity', 'android-intent', 'android-edittext', 'android-fragments', 'android-recyclerview', 'listview', 'android-actionbar', 'google-maps', and 'android-asynctask'? Ensure that each tag has at least 50 questions and exclude any questions containing terms typically associated with troubleshooting, such as 'fail', 'problem', 'error', 'wrong', 'fix', 'bug', 'issue', 'solve', or 'trouble'.", "type": "Bigquery"}
{"instance_id": "bq303", "instruction": "What are the user IDs and tags for comments, answers, and questions posted by users with IDs between 16712208 and 18712208 on Stack Overflow during July to December 2019?", "type": "Bigquery"}
{"instance_id": "ga012", "instruction": "Find the transaction IDs, total item quantities, and purchase revenues for the item category with the highest tax rate on November 30, 2020.", "type": "Bigquery"}
{"instance_id": "bq356", "instruction": "What is the number of weather stations where the valid temperature record days in 2019 reached 90% or more of the maximum number of recorded days, and have had tracking back to 1/1/2000 or before and through at least 6/30/2019 according to the field 'begin' and 'end'?", "type": "Bigquery"}
{"instance_id": "bq360", "instruction": "Which of the top 10 most common healthcare provider specializations in Mountain View, CA, has a specialist count closest to the average of these ten specializations?", "type": "Bigquery"}
{"instance_id": "bq394", "instruction": "What are the top 3 months between 2010 and 2014 with the smallest sum of absolute differences between the average air temperature, wet bulb temperature, dew point temperature, and sea surface temperature, including respective years and sum of differences? Please present the year and month in numerical format.", "type": "Bigquery"}
{"instance_id": "bq199", "instruction": "Identify the top 10 liquor categories in Iowa by average price per liter in 2021, and provide their average prices per liter for 2019, 2020, and 2021.", "type": "Bigquery"}
{"instance_id": "bq393", "instruction": "Can you tell me the ID and corresponding month number of the user with the highest month number who became inactive after their last recorded activity month, considering data only up until September 10, 2024?", "type": "Bigquery"}
{"instance_id": "ga014", "instruction": "Please tell me the number of sessions for each website traffic channel in December 2020.", "type": "Bigquery"}
{"instance_id": "ga013", "instruction": "I want to know all the pages visited by user 1402138.5184246691 on January 2, 2021. Please show the names of these pages and adjust the names to PDP or PLP where necessary.", "type": "Bigquery"}
{"instance_id": "bq130", "instruction": "Analyze daily new COVID-19 case counts from March to May 2020, identifying the top five states by daily increases. Please compile a ranking based on how often each state appears in these daily top fives. Then, examine the state that ranks fourth overall and identify its top five counties based on their frequency of appearing in the daily top five new case counts.", "type": "Bigquery"}
{"instance_id": "bq302", "instruction": "What is the monthly proportion of Stack Overflow questions tagged with 'python' in the year 2022?", "type": "Bigquery"}
{"instance_id": "ga025", "instruction": "For all users who first opened the app in September 2018 and then uninstalled within seven days, I want to know what percentage of them experienced an app crash.", "type": "Bigquery"}
{"instance_id": "bq108", "instruction": "Calculate the percentage of traffic accidents in 2015 from January to August that involved multiple people and had multiple instances of severe injuries (injury severity 4).", "type": "Bigquery"}
{"instance_id": "ga022", "instruction": "Could you please help me get the weekly customer retention rate in September 2018 for new customers who first used our app within the first week starting from September 1st, 2018 (timezone in Shanghai)? The retention rates should cover the following 3-week period after the initial use and display them in column format.", "type": "Bigquery"}
{"instance_id": "bq305", "instruction": "Identify the top 10 users by the total view count of their associated questions. Include users who own a question, provide an accepted answer, have an answer with a score above 5, rank in the top 3 for a question, or have an answer with a score over 20% of the total answer score for that question. Use these criteria to determine the questions and answers to include in the view count calculation.", "type": "Bigquery"}
{"instance_id": "bq137", "instruction": "Find details about zip code areas within 10 kilometers of the coordinates (-122.3321, 47.6062), including their geographic polygons, land and water area in meters, latitude and longitude points, state code, state name, city, county, and population from the 2010 census data.", "type": "Bigquery"}
{"instance_id": "bq115", "instruction": "Which country has the highest percentage of population under the age of 25 in 2017?", "type": "Bigquery"}
{"instance_id": "bq327", "instruction": "How many debt indicators for Russia have a value of 0, excluding NULL values?", "type": "Bigquery"}
{"instance_id": "bq112", "instruction": "Did the increase on average annual wages for all industries in Allegheny County, Pittsburgh keep pace with inflation of all consumer items between 1998 and 2017? Tell me their growth rates respectively (2 decimals).", "type": "Bigquery"}
{"instance_id": "ga007", "instruction": "Please find out what percentage of the page views on January 2, 2021, were for PDP type pages.", "type": "Bigquery"}
{"instance_id": "bq124", "instruction": "Can you identify how many alive patients, currently managing chronic conditions such as diabetes or hypertension, are prescribed seven or more medications?", "type": "Bigquery"}
{"instance_id": "ga031", "instruction": "I want to know our user session conversion rate on January 2nd, 2021, calculated as the percentage ratio of user visits that reached both the Home and Checkout Confirmation page in one session to those landed on the Home page.", "type": "Bigquery"}
{"instance_id": "bq123", "instruction": "Which day of the week has the third highest percentage of questions answered within an hour? Please tell me the day along with the percentage.", "type": "Bigquery"}
{"instance_id": "ga009", "instruction": "I want to know the average number of engaged sessions per user of December 2020.", "type": "Bigquery"}
{"instance_id": "bq177", "instruction": "For the provider with the highest total inpatient service cost from 2011-2015, tell me its annual inpatient and outpatient revenues averaged by case for each year during that period.", "type": "Bigquery"}
{"instance_id": "bq389", "instruction": "Please calculate the monthly average levels of PM10, PM2.5 FRM, PM2.5 non-FRM, volatile organic emissions, SO2 (scaled by a factor of 10), and Lead (scaled by a factor of 100) air pollutants in California for the year 2020.", "type": "Bigquery"}
{"instance_id": "bq374", "instruction": "Calculate the percentage of new users who, between August 1, 2016, and April 30, 2017, both stayed on the site for more than 5 minutes during their initial visit and made a purchase on a subsequent visit at any later time, relative to the total number of new users in the same period.", "type": "Bigquery"}
{"instance_id": "bq310", "instruction": "What is the title of the most viewed \"how\" question related to Android development on StackOverflow, across specified tags such as 'android-layout', 'android-activity', 'android-intent', and others", "type": "Bigquery"}
{"instance_id": "ga008", "instruction": "Can you give me the average page views per buyer and total page views among those buyers for each day in November 2020?", "type": "Bigquery"}
{"instance_id": "ga030", "instruction": "Can you group users by the week they first used the app starting from July 2, 2018 and show which group has the most active users remained in the next four weeks, with each group named by the Monday date of that week? Please answer in the format of \" YYYY-MM-DD\".", "type": "Bigquery"}
{"instance_id": "bq328", "instruction": "Which region has the highest median GDP (constant 2015 US$) value?", "type": "Bigquery"}
{"instance_id": "ga006", "instruction": "Provide the IDs and the average purchase value (in USD) per session for users who were engaged in multiple purchase sessions in November 2020.", "type": "Bigquery"}
{"instance_id": "bq113", "instruction": "Which Utah county has witnessed the greatest percentage increase of construction jobs from 2000 to 2018? And what is the corresponding increase rate?", "type": "Bigquery"}
{"instance_id": "bq326", "instruction": "How many countries saw increases of more than 1% in both their population and per capita current health expenditures (adjusted for purchasing power parity) in 2018 based on the world bank health and global population data", "type": "Bigquery"}
{"instance_id": "bq114", "instruction": "What are the top three cities where the difference between the PM2.5 measurements in 1990 from the EPA and in 2020 from OpenAQ is the greatest, given that the locations are matched with latitude and longitude rounded to two decimal places?", "type": "Bigquery"}
{"instance_id": "ga001", "instruction": "I want to know the preferences of customers who purchased the Google Navy Speckled Tee in December 2020. What other product was purchased with the highest total quantity alongside this item?", "type": "Bigquery"}
{"instance_id": "bq185", "instruction": "What is the average valid trip duration (in minutes) for yellow taxi rides in Brooklyn with more than 3 passengers and a trip distance of at least 10 miles between February 1 and February 7, 2016?", "type": "Bigquery"}
{"instance_id": "bq004", "instruction": "What's the most popular other purchased product in July 2017 with consumers who bought products relevant to YouTube?", "type": "Bigquery"}
{"instance_id": "bq003", "instruction": "Compare the average pageviews per visitor between purchase and non-purchase sessions for each month from April to July in 2017.", "type": "Bigquery"}
{"instance_id": "bq451", "instruction": "Extract genotype data for single nucleotide polymorphisms (SNPs) from chromosome X, ensuring that the start positions are not between 59999 and 2699519 nor between 154931042 and 155260559. Output the sample ID, counts of homozygous reference alleles, homozygous alternate alleles, heterozygous alternate alleles, the total number of callable sites, the total number of SNVs, the percentage of heterozygous alternate alleles among all SNVs, and the percentage of homozygous alternate alleles among all SNVs.", "type": "Bigquery"}
{"instance_id": "bq035", "instruction": "What is the total distance traveled by each bike in the San Francisco Bikeshare program? Use data from bikeshare trips and stations to calculate this.", "type": "Bigquery"}
{"instance_id": "bq032", "instruction": "Can you provide the latitude of the final coordinates for the hurricane that traveled the second longest distance in the North Atlantic during 2020?", "type": "Bigquery"}
{"instance_id": "bq200", "instruction": "Show the full name of the fastest pitcher on each team with their maximum valid pitch speed, using both regular and post-season data", "type": "Bigquery"}
{"instance_id": "bq059", "instruction": "What is the highest average speed (rounded to 1 decimal, in metric m/s) for bike trips in Berkeley with trip distance greater than 1000 meters?", "type": "Bigquery"}
{"instance_id": "bq066", "instruction": "Could you assess the relationship between the poverty rates from the previous year's census data and the percentage of births without maternal morbidity for the years 2016 to 2018? Use only data for births where no maternal morbidity was reported and for each year, use the 5-year census data from the year before to compute the Pearson correlation coefficient", "type": "Bigquery"}
{"instance_id": "bq402", "instruction": "What is the conversion rate from unique visitors to purchasers, where purchasers are defined as visitors with at least one transaction? Additionally, what is the average number of transactions per purchaser?", "type": "Bigquery"}
{"instance_id": "bq061", "instruction": "Which census tract has witnessed the largest increase in median income between 2015 and 2018 in California? Tell me the tract code.", "type": "Bigquery"}
{"instance_id": "bq095", "instruction": "Generate a list of drugs from the table containing molecular details that have completed clinical trials for pancreatic endocrine carcinoma, disease ID EFO_0007416. Please include each drug's name, the target approved symbol, and links to the relevant clinical trials.", "type": "Bigquery"}
{"instance_id": "bq457", "instruction": "Get details of repositories that use specific feature toggle libraries. For each repository, include the full name with owner, hosting platform type, size in bytes, primary programming language, fork source name (if any), last update timestamp, the artifact and library names of the feature toggle used, and the library's programming languages. Include repositories that depend on the specified feature toggle libraries, defined by their artifact names, library names, platforms, and languages.", "type": "Bigquery"}
{"instance_id": "bq034", "instruction": "I want to know the IDs, names of weather stations within a 50 km straight-line distance from the center of Chicago (41.8319\u00b0N, 87.6847\u00b0W)", "type": "Bigquery"}
{"instance_id": "bq002", "instruction": "What's the maximum monthly, weekly, and daily product revenues (in millions) generated by the top-performing traffic source in the first half of 2017?", "type": "Bigquery"}
{"instance_id": "bq230", "instruction": "Find the total 2022 production figures in bushels for corn from the field crops category and mushrooms from the horticulture group for each U.S. state using the USDA NASS Agriculture Crops dataset? The data should include only records that show production as the statistic category, are measured at the state level, and have no missing values.", "type": "Bigquery"}
{"instance_id": "bq461", "instruction": "Could you provide a summary of the scoring plays for the Wildcats and the Fighting Irish in their 2014 season game?", "type": "Bigquery"}
{"instance_id": "bq208", "instruction": "Can you provide weather stations within a 20-mile radius of Chappaqua, New York (Latitude: 41.197, Longitude: -73.764), and tell me the number of valid temperature observations they have recorded from 2011 to 2020?", "type": "Bigquery"}
{"instance_id": "bq051", "instruction": "Get the average number of trips on rainy and non-rainy days in New York City during 2016, using data from the closest weather station located near the coordinates (longitude: -74.0060, latitude: 40.7128). Define a 'rainy day' as any day where the precipitation recorded is more than 0.5 millimeters.", "type": "Bigquery"}
{"instance_id": "bq290", "instruction": "Can you calculate the difference in maximum temperature, minimum temperature, and average temperature between US and UK weather stations for each day in October 2023, excluding records with missing temperature values?", "type": "Bigquery"}
{"instance_id": "bq432", "instruction": "Could you provide me with the cleansed data of food events in January 2015 as listed in the cleansing documentation?", "type": "Bigquery"}
{"instance_id": "bq094", "instruction": "Please display committees from 2016 that supported candidates and received small-dollar donations over $0 (under $200 each). For each qualifying committee, list the committee name, number of supported candidates, the candidates' names (in alphabetical order, separated by commas), and total small-dollar donations dollars.", "type": "Bigquery"}
{"instance_id": "bq060", "instruction": "Which top 3 countries had the highest net migration in 2017 among those with an area greater than 500 square kilometers? And what are their migration rates?", "type": "Bigquery"}
{"instance_id": "bq067", "instruction": "I want to build an ML model which can predict whether there will be more than one fatality in a crash involving 2 or more people. Construct a labelled (0 or 1) dataset for me, and the predictors include the state, vehicle type, the number of drunk drivers, day of the week, hour of the day and another two engineered features, whether the accident happened in the work zone and the average absolute difference between travel speed and speed limit. Please use numeric value for each predictor and categorize the speed difference into levels 0 to 4 based on 20MPH increments (lower bound inclusive while upper exclusive).", "type": "Bigquery"}
{"instance_id": "bq403", "instruction": "Which three years in 2012-2017 have the smallest absolute difference between median revenue and median functional expenses for organizations filing IRS 990 forms? Please output three years and respective differences.", "type": "Bigquery"}
{"instance_id": "sf_bq390", "instruction": "Please provide the study instance UIDs for studies that include both T2-weighted axial magnetic resonance imaging and anatomical structure segmentations of the peripheral zone, in prostate repeatability collection.", "type": "Snowflake"}
{"instance_id": "sf_bq156", "instruction": "Compute the t-score (rounded to 2 decimals) to compare the difference in mean expression levels of gene DRG2 between two groups (TP53 mutated vs. non-mutated) in the Lower Grade Glioma study. Note that, categorical groups with number of samples smaller than 10 or zero variance should be ignored. Refer to `t_score.md` about how to compute t-score.", "type": "Snowflake"}
{"instance_id": "sf_bq158", "instruction": "Which top five histological types of breast cancer (BRCA) in the PanCancer Atlas exhibit the highest percentage of CDH1 gene mutations?", "type": "Snowflake"}
{"instance_id": "sf_bq167", "instruction": "Please find the giver-and-recipient pair with the most Kaggle forum upvotes. Display their usernames and the respective number of upvotes they gave to each other.", "type": "Snowflake"}
{"instance_id": "sf_bq193", "instruction": "Help me retrieve the top 5 most frequently occurring non-empty, non-commented lines of text in `readme.md` files from GitHub repositories that primarily use Python for development.", "type": "Snowflake"}
{"instance_id": "sf_bq194", "instruction": "What is the second most frequently used module (imported library) across Python, R, and IPython script (.ipynb) files in the GitHub sample dataset?", "type": "Snowflake"}
{"instance_id": "sf_bq160", "instruction": "Please tell me the creation date, title, parent forum, reply count, distinct user count, upvotes, and total views for the earliest five forum topics belonging to the parent forum named \"general\". Any value that is None should be regarded as 0.", "type": "Snowflake"}
{"instance_id": "sf_bq331", "instruction": "Who are the top three users whose forum message scores are closest to the average score, based on the absolute difference between their scores and the average score across all forum topics?", "type": "Snowflake"}
{"instance_id": "sf_bq104", "instruction": "Identify which DMA had the highest search scores for the terms that were top rising one year ago", "type": "Snowflake"}
{"instance_id": "sf_bq135", "instruction": "Which date before 2022 had the highest total transaction amount in the Zilliqa blockchain data?", "type": "Snowflake"}
{"instance_id": "sf_bq307", "instruction": "Find the top 10 most common first gold badges on Stack Overflow, showing how many users earned each and the average days from account creation to earning the badge.", "type": "Snowflake"}
{"instance_id": "sf_bq195", "instruction": "What are the top 10 Ethereum addresses by balance, considering both value transactions and gas fees, before September 1, 2021? Only keep successful transactions with no call type or where the call type is 'call'.", "type": "Snowflake"}
{"instance_id": "sf_bq159", "instruction": "Calculate the chi-square value to assess the association between histological types and the presence of CDH1 gene mutations in BRCA patients using data from the PanCancer Atlas. Focus on patients with known histological types and consider only reliable mutation entries. Exclude any histological types or mutation statuses with marginal totals less than or equal to 10. Match clinical and mutation data using ParticipantBarcode", "type": "Snowflake"}
{"instance_id": "sf_bq192", "instruction": "Which repository that has a license of either \"artistic-2.0\", \"isc\", \"mit\", or \"apache-2.0\", contains Python files in the master branch, and has the highest combined count of forks, issues, and watch events in April 2022?", "type": "Snowflake"}
{"instance_id": "sf_bq166", "instruction": "Analyze the largest copy number of chromosomal aberrations including amplifications, gains, homozygous deletions, heterozygous deletions, and normal copy states across cytogenetic bands in TCGA-KIRC kidney cancer samples. Use segment allelic data to identify the maximum copy number aberrations within each chromosomal segment, and report their frequencies, sorted by chromosome and cytoband.", "type": "Snowflake"}
{"instance_id": "sf_bq150", "instruction": "Assess whether different genetic variants affect the log10-transformed TP53 expression levels in TCGA-BRCA samples using sequencing and mutation data. Provide the total number of samples, the number of mutation types, the mean square between groups, the mean square within groups, and the F-statistic.", "type": "Snowflake"}
{"instance_id": "sf_bq157", "instruction": "Please help me compute the T score to show the statistical difference in the expression of the DRG2 gene between LGG patients with and without TP53 mutation. You could refer to the markdown file for the formula.", "type": "Snowflake"}
{"instance_id": "sf_bq217", "instruction": "How many pull requests in total were created in repositories that include JavaScript as one of their languages, considering data from January 18, 2023?", "type": "Snowflake"}
{"instance_id": "sf_bq210", "instruction": "How many US B2 patents granted between 2008 and 2018 contain claims that do not include the word 'claim'?", "type": "Snowflake"}
{"instance_id": "sf_bq014", "instruction": "Can you help me figure out the revenue for the product category that has the highest number of customers making a purchase in their first order?", "type": "Snowflake"}
{"instance_id": "sf_bq226", "instruction": "Can you find me the complete url of the most frequently used sender's address on the Cronos blockchain since January 1, 2023, where transactions were made to non-null addresses and in blocks larger than 4096 bytes?", "type": "Snowflake"}
{"instance_id": "sf009", "instruction": "A real estate company is looking for a comparison of the building types in Amsterdam and Rotterdam. They need to know the total surface area and the number of buildings for each type of building in both cities. Can you provide the building class and subclass, along with the total surface area and the number of buildings for both Amsterdam and Rotterdam?", "type": "Snowflake"}
{"instance_id": "sf_bq219", "instruction": "Which two liquor categories, each contributing an average of at least 1% to monthly sales volume over 24 months, have the lowest Pearson correlation coefficient in their sales percentages?", "type": "Snowflake"}
{"instance_id": "sf_bq221", "instruction": "Identify the CPC technology areas with the highest exponential moving average of patent filings each year (smoothing factor 0.2), and provide the full title and the best year for each CPC group at level 5.", "type": "Snowflake"}
{"instance_id": "sf_bq423", "instruction": "Please provide the page URL of the image-type ad published by advertisers from region CY, on the topic of health, that has the highest upper bound of times shown. This ad should have demographic information, geo location, contextual signals, customer lists, and topics of interest. The ad should be shown in region Croatia (region code HR) and must have been displayed between January 1, 2023, and January 1, 2024. Additionally, the advertiser must be verified.", "type": "Snowflake"}
{"instance_id": "sf_bq272", "instruction": "Please provide me with the names of the top three most profitable products for each month between January 2019 and August 2022, excluding any products that were either canceled or returned.", "type": "Snowflake"}
{"instance_id": "sf_bq412", "instruction": "Please provide the page URLs, first shown time, last shown time, removal reason, violation category, and lower and upper bound shown times for the most recent five closed ads in the Croatia region which had shown higher than 10,000 and lower than 25,000, and used at least one audience criterion such as demographics, geographic location, contextual signals, customer lists, or interest topics. The region code of Croatia is HR.", "type": "Snowflake"}
{"instance_id": "sf_bq415", "instruction": "List the samples in the genome data that rank in the top 10 for the number of homozygous reference genotypes, considering only the primary reference allele, ordered in descending sequence.", "type": "Snowflake"}
{"instance_id": "sf_bq071", "instruction": "What are the zip codes of the areas in the United States along with the number of times they have been affected by the named hurricanes, ordered by the number of occurrences?", "type": "Snowflake"}
{"instance_id": "sf_bq012", "instruction": "What is the average balance of the top 10 addresses with the most balance on the Ethereum blockchain, considering both incoming and outgoing transactions with valid addresses, but only those recorded as used on receipt, as well as transaction fees? Only keep successful transactions with no call type or where the call type is 'call'. The average balance, expressed in quadrillions (10^15), is rounded to two decimal places.", "type": "Snowflake"}
{"instance_id": "sf008", "instruction": "Determine the percentage change in gross income inflow and the seasonally-adjusted purchase-only home price index for the Phoenix-Mesa-Scottsdale, AZ Metro Area from January 1, 2023, to December 31, 2023. Gross income inflow refers to the total adjusted gross income from all financial entities within the specified metro area", "type": "Snowflake"}
{"instance_id": "sf037", "instruction": "Calculate the shortest driving distance in miles between each 'The Home Depot' store, identified by its POI ID, and its nearest 'Lowe's Home Improvement' store", "type": "Snowflake"}
{"instance_id": "sf_bq211", "instruction": "Among patents granted between 2010 and 2023 in CN, how many of them belong to families that have a total of more than one distinct application?", "type": "Snowflake"}
{"instance_id": "sf001", "instruction": "Assuming today is April 1, 2024, I would like to know the daily snowfall amounts greater than 6 inches for each U.S. postal code during the week ending after the first two full weeks of the previous year. Show the postal code, date, and snowfall amount.", "type": "Snowflake"}
{"instance_id": "sf_bq216", "instruction": "Identify the top five patents filed in the same year as `US-9741766-B2` that are most similar to it based on technological similarities. Please provide the publication numbers.", "type": "Snowflake"}
{"instance_id": "sf006", "instruction": "Calculate the percentage change in the number of active financial branch entities for each state, comparing the counts on March 1, 2020, and December 31, 2021. Active entities are those operational on the specified dates.", "type": "Snowflake"}
{"instance_id": "sf_bq289", "instruction": "Can you find the shortest distance between any two amenities (either a library, place of worship, or community center) located within Philadelphia?", "type": "Snowflake"}
{"instance_id": "sf_bq070", "instruction": "Could you construct a structured clean dataset from `dicom_all` for me? It should retrieve digital slide microscopy (SM) images from the TCGA-LUAD and TCGA-LUSC datasets and meet the requirements in `dicom_dataset_selection.md`. The target labels are tissue type and cancer subtype.", "type": "Snowflake"}
{"instance_id": "sf_bq084", "instruction": "Please count the monthly transaction numbers and transactions per second for each month in 2023, and arrange them in descending order of monthly transaction count.", "type": "Snowflake"}
{"instance_id": "sf_bq083", "instruction": "What is the daily change in the total market value (formatted as a string in USD currency format) of the USDC token (with a target address of \"0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48\") in 2023, considering both Mint (the input starts with 0x42966c68) and Burn (the input starts with 0x40c10f19) transactions?", "type": "Snowflake"}
{"instance_id": "sf_bq273", "instruction": "Can you list the top 5 months from August 2022 to November 2023 where the profit from Facebook-sourced completed orders showed the largest month-over-month increase? Calculate profit as sales minus costs.", "type": "Snowflake"}
{"instance_id": "sf_bq422", "instruction": "What are the average series sizes in MiB for the top 3 patients with the highest slice interval difference tolerance and the top 3 patients with the highest maximum exposure difference, considering only CT images from the 'nlst' collection?", "type": "Snowflake"}
{"instance_id": "sf_bq256", "instruction": "Determine the Ether balance of the Ethereum address that initiated the highest number of successful transactions before September 1, 2021. Exclude specific contract call types and account for all relevant incoming and outgoing transactions. Present the balance in Ether by converting from its native unit.", "type": "Snowflake"}
{"instance_id": "sf_bq251", "instruction": "Could you find the GitHub URL of the Python package that has the highest number of downloads on PyPi and was updated most recently? Please ensure that only the main repository URL is provided, excluding specific subsections like issues, blobs, pull requests, or tree views.", "type": "Snowflake"}
{"instance_id": "sf_bq063", "instruction": "What is the GitHub URL of the latest released package from the NPM system that has the highest number of dependencies? Exclude packages whose names contain the character '@' or whose URL label is not 'SOURCE_REPO'.", "type": "Snowflake"}
{"instance_id": "sf041", "instruction": "Produce a report for ERCOT on October 1, 2022, that combines hourly data on day-ahead and real-time prices from node ID 10000697078, load forecasts (datatypeid 19060) and actual loads, plus wind (forecast datatypeid 9285, actual datatypeid 16) and solar (forecast datatypeid 662, actual datatypeid 650) generation forecasts and actuals from object ID 10000712973. This report should include time zone alignments, peak classifications, and net load calculations, providing insights into daily operational dynamics and efficiency.", "type": "Snowflake"}
{"instance_id": "sf_bq258", "instruction": "Generate a monthly report for each product category detailing the month-over-month percentage growth in revenue and orders, along with the monthly total cost, profit, and profit-to-cost ratio for orders that were completed and delivered before 2022", "type": "Snowflake"}
{"instance_id": "sf_bq052", "instruction": "I wonder which patents within CPC subsection 'C05' or group 'A01G' in the USA have at least one forward or backward citation within one month of their application dates. Give me the IDs, titles, application dates, forward/backward citation counts and summary texts.", "type": "Snowflake"}
{"instance_id": "sf_bq260", "instruction": "Find the total number of youngest and oldest users separately for each gender in the e-commerce platform created from January 1, 2019, to April 30, 2022.", "type": "Snowflake"}
{"instance_id": "sf_bq294", "instruction": "Can you provide the details of the top 5 longest bike share trips that started during the second half of 2017, including the trip ID, duration in seconds, start date, start station name, route (start station to end station), bike number, subscriber type, member's birth year, age, age classification, gender, and the region name of the start station? Please exclude trips where the start station name, member's birth year, or member's gender is not specified.", "type": "Snowflake"}
{"instance_id": "sf_bq099", "instruction": "For patent class A01B3, I want to analyze the information of the top 3 assignees based on the total number of applications. Please provide the following five pieces of information: the name of this assignee, total number of applications, the year with the most applications, the number of applications in that year, and the country code with the most applications during that year.", "type": "Snowflake"}
{"instance_id": "sf_bq233", "instruction": "Can you find the imported Python modules and R libraries from the GitHub sample files and list them along with their occurrence counts? Please sort the results by language and then by the number of occurrences in descending order.", "type": "Snowflake"}
{"instance_id": "sf_bq037", "instruction": "About the refined human genetic variations collected in phase 3 on 2015-02-20, I want to know the minimum and maximum start positions as well as the proportions of these two respectively for reference bases 'AT' and 'TA'.", "type": "Snowflake"}
{"instance_id": "sf012", "instruction": "What were the total amounts of building and contents damage reported under the National Flood Insurance Program in the City of New York for each year from 2010 to 2019?", "type": "Snowflake"}
{"instance_id": "sf_bq295", "instruction": "Among the repositories from the GitHub Archive which include a Python file with less than 15,000 bytes in size and a keyword 'def' in the content, find the top 3 that have the highest number of watch events in 2017?", "type": "Snowflake"}
{"instance_id": "sf_bq261", "instruction": "What are the total cost and profit of the products whose profit ranks first in each month, sorted chronologically? Only consider data before 2024.", "type": "Snowflake"}
{"instance_id": "sf_bq266", "instruction": "Can you provide me with the names of the products that had the lowest profit margin each month throughout the year 2020, excluding any months where this data isn't available? Please list them in chronological order based on the month.", "type": "Snowflake"}
{"instance_id": "sf_bq292", "instruction": "Please analyse the monthly aggregated statistics and percentages of potential CoinJoin transactions within the Bitcoin network since July 1, 2023 as below. How much percent of monthly Bitcoin txs that were CoinJoins and what about the Bitcoin utxos? Also please show me the percent of monthly Bitcoin volume that took place in CoinJoined transactions.", "type": "Snowflake"}
{"instance_id": "sf_bq259", "instruction": "Can you provide the percentage of users who made a purchase in the first, second, third and fourth months after their initial purchase, organized by the month of their first purchase, using data up to the end of 2022?", "type": "Snowflake"}
{"instance_id": "sf_bq062", "instruction": "What is the most frequently used license by packages in each system?", "type": "Snowflake"}
{"instance_id": "sf_bq250", "instruction": "What is the total population living on the geography grid which is the farthest from any hospital in Singapore, based on the most recent population data before 2023? Note that geographic grids and distances are calculated based on geospatial data and GIS related functions.", "type": "Snowflake"}
{"instance_id": "sf040", "instruction": "Find the top 10 northernmost addresses in Florida's largest zip code area. What are their address numbers, street names, and types?", "type": "Snowflake"}
{"instance_id": "sf_bq091", "instruction": "In which year did the assignee with the most applications in the patent category 'A61' file the most?", "type": "Snowflake"}
{"instance_id": "sf_bq065", "instruction": "Provide the most recent 10 results of symbols and their corresponding rates, adjusted for the multiplier, from oracle requests with the script ID 3.", "type": "Snowflake"}
{"instance_id": "sf013", "instruction": "Determine the total length of roads for each class and subclass in Amsterdam and Rotterdam, based on specific QUADKEY segments '12020210' and '12020211'. Show the class, subclass, and total road lengths for both cities.", "type": "Snowflake"}
{"instance_id": "sf_bq455", "instruction": "Find the top 5 CT scan series ID, including their series number, patient ID, and series size (in MiB), where the series are not classified as 'LOCALIZER' or have the specific JPEG compressed transfer syntaxes '1.2.840.10008.1.2.4.70' or '1.2.840.10008.1.2.4.51'. The series must have consistent slice intervals, exposure levels, image orientation, pixel spacing, image positions, and pixel dimensions. Additionally, the z-axis of the image orientation must align with the expected plane (dot product between 0.99 and 1.01).", "type": "Snowflake"}
{"instance_id": "sf014", "instruction": "What is the New York State ZIP code with the highest number of commuters traveling over one hour, according to 2021 ACS data? Include the zip code, the total commuters, state benchmark for this duration, and state population.", "type": "Snowflake"}
{"instance_id": "sf_bq036", "instruction": "What was the average number of GitHub commits made per month in 2016 for repositories containing Python code?", "type": "Snowflake"}
{"instance_id": "sf_bq007", "instruction": "Identify the top 10 U.S. states with the highest vulnerable population, calculated based on a weighted sum of employment sectors using 2017 ACS 5-Year data, and determine their average median income change from 2015 to 2018 using zip code data. ", "type": "Snowflake"}
{"instance_id": "sf_bq175", "instruction": "Identify cytoband names on chromosome 1 in the TCGA-KIRC segment allelic dataset where the frequency of amplifications, gains, and heterozygous deletions each rank within the top 11. Calculate these rankings based on the maximum copy number observed across various genomic studies of kidney cancer, reflecting the severity of genetic alterations.", "type": "Snowflake"}
{"instance_id": "sf_bq347", "instruction": "Which modality has the highest count of SOP instances, including MR series with SeriesInstanceUID = \"1.3.6.1.4.1.14519.5.2.1.3671.4754.105976129314091491952445656147\" and all associated segmentation data, along with the total count of instances?", "type": "Snowflake"}
{"instance_id": "sf_bq340", "instruction": "Which six Ethereum addresses, excluding '0x0000000000000000000000000000000000000000', have the largest absolute differences between their previous and current balances from the tokens at addresses '0x0d8775f648430679a709e98d2b0cb6250d2887ef0' and '0x1e15c05cbad367f044cbfbafda3d9a1510db5513'?", "type": "Snowflake"}
{"instance_id": "sf_bq349", "instruction": "Which OpenStreetMap ID from the planet features corresponds to the administrative boundary, represented as multipolygons, whose total number of 'amenity'-tagged Points of Interest (POIs) is closest to the median count among all such boundaries?", "type": "Snowflake"}
{"instance_id": "sf_bq371", "instruction": "What is the difference between the maximum and minimum average invoice values across the quarters in the year 2013?", "type": "Snowflake"}
{"instance_id": "sf_bq188", "instruction": "What is the average time in minutes that users spend per visit on the product category with the highest total quantity purchased?", "type": "Snowflake"}
{"instance_id": "sf_bq128", "instruction": "Tell me the patent title and abstract, as well as the publication date, the backward citation and forward citation count within 5 years for those published in January 2014. The detailed requirements are provided in `forward_backward_citation.md`.", "type": "Snowflake"}
{"instance_id": "sf_bq325", "instruction": "Can you tell me which genes have the strongest links to traits or conditions in each study? I need the names of the top 10 genes that stand out because they have the lowest p-values in their studies", "type": "Snowflake"}
{"instance_id": "sf_bq117", "instruction": "What is the total number of severe storm events that occurred in the most affected month over the past 15 years according to NOAA records, considering only the top 100 storm events with the highest property damage?", "type": "Snowflake"}
{"instance_id": "sf_bq121", "instruction": "How do the average reputation and number of badges vary among Stack Overflow users based on the number of complete years they have been members, considering only those who joined on or before October 1, 2021?", "type": "Snowflake"}
{"instance_id": "sf_bq370", "instruction": "How many customers have an equal number of orders and invoices, and where the total value of their orders matches the total value of their invoices?", "type": "Snowflake"}
{"instance_id": "sf_bq189", "instruction": "What is the average monthly revenue growth rate for the product category with the highest average monthly order growth rate based on completed orders?", "type": "Snowflake"}
{"instance_id": "sf_bq377", "instruction": "Extract and count the frequency of all package names listed in the require section of JSON-formatted content", "type": "Snowflake"}
{"instance_id": "sf_bq348", "instruction": "What are the top 3 usernames with the largest number of historical nodes of hospitals, clinics, or doctors' offices tagged as amenities within the geographic area bounded by the geogpoints `(31.1798246, 18.4519921)`, `(54.3798246, 18.4519921)`, `(54.3798246, 33.6519921)`, and `(31.1798246, 33.6519921)` that are no longer present in the latest dataset (`planet nodes`)?", "type": "Snowflake"}
{"instance_id": "sf_bq341", "instruction": "Which Ethereum address has the top 3 smallest positive balance from transactions involving the token at address \"0xa92a861fc11b99b24296af880011b47f9cafb5ab\"?", "type": "Snowflake"}
{"instance_id": "sf_bq187", "instruction": "What is the total circulating supply balance of the 'BNB' token for all addresses (excluding the zero address), based on the amount they have received (converted by dividing by 10^18) minus the amount they have sent?", "type": "Snowflake"}
{"instance_id": "sf_bq180", "instruction": "Please help me retrieve the top 5 most frequently used module names from Python and R scripts.", "type": "Snowflake"}
{"instance_id": "sf_bq346", "instruction": "Which five segmentation categories appear most frequently in publicly accessible DICOM SEG data, where the modality is \"SEG\" and the SOPClassUID is \"1.2.840.10008.5.1.4.1.1.66.4\"?", "type": "Snowflake"}
{"instance_id": "sf_bq118", "instruction": "How much higher is the average number of white people dying from discharges (excluding urethral discharge, firework discharge, and legal intervention involving firearm discharge) compared to vehicle-related incidents averaged across different ages?", "type": "Snowflake"}
{"instance_id": "sf_bq127", "instruction": "For each publication family whose earliest publication was first published in January 2015, please provide the earliest publication date, the distinct publication numbers, their country codes, the distinct CPC and IPC codes, distinct families (namely, the ids) that cite and are cited by this publication family. Please present all lists as comma-separated values, sorted by the first letter of the code for clarity.", "type": "Snowflake"}
{"instance_id": "sf_bq323", "instruction": "What is the combined overall average from Repetition Time, Echo Time, and Slice Thickness for MRI sequences labeled as t2w_prostateX (when the series description contains t2_tse_tra) and adc_prostateX (when the series description contains ADC) within the prostatex collection, ensuring the modality is MR?", "type": "Snowflake"}
{"instance_id": "sf_bq324", "instruction": "What is the total number of frames for whole slide microscopy images categorized under the 'TCGA-BRCA' collection that contain eosin staining during specimen preparation steps?", "type": "Snowflake"}
{"instance_id": "sf_bq152", "instruction": "Identify the biological pathway that shows the most significant change in gene expression between TCGA-UCEC samples with non-synonymous mutations in the PARP1 gene and those without such mutations. Analyze differences using t-statistics for each gene on log10(1 + expression data). Aggregate these t-statistics across pathways to determine the most affected pathway. Please provide the pathway name and its corresponding score", "type": "Snowflake"}
{"instance_id": "sf_bq358", "instruction": "Can you tell me which bike trip in New York City on July 15, 2015, started and ended in ZIP Code areas with the highest average temperature for that day, as recorded by the Central Park weather station '94728'? If there's more than one trip that meets these criteria, I'd like to know about the one that starts in the smallest ZIP Code and ends in the largest ZIP Code.", "type": "Snowflake"}
{"instance_id": "sf_bq155", "instruction": "Help me calculate the t-statistic based on the Pearson correlation coefficient between all possible pairs of gene `SNORA31` in the RNAseq data (Log10 transformation) and unique identifiers in the microRNA data available in TCGA. The cohort for this analysis consists of BRCA patients that are 80 years old or younger at the time of diagnosis and Stage I,II,IIA as pathological state. And only consider samples of size more than 25 and with absolute Pearson correlation at least 0.3, and less than 1.0.", "type": "Snowflake"}
{"instance_id": "sf_bq197", "instruction": "What is the top-selling product by sales volume and revenue for June 2024 and each month before, considering only completed orders?", "type": "Snowflake"}
{"instance_id": "sf_bq163", "instruction": "Identify the top 20 genes with the largest expression disparities between male and female 74-year-old epithelial cells in cluster 41 of MSK-SCLC patients, comparing average X_values by sex.", "type": "Snowflake"}
{"instance_id": "sf_bq164", "instruction": "Consolidate metadata from spatial transcriptomics and scRNAseq datasets\u2014including levels 1 through 4 and auxiliary files\u2014for the run ID 'HT264P1-S1H2Fc2U1Z1Bs1-H2Bs2-Test'. Include Filename, HTAN Parent Biospecimen ID, Component, File Format, Entity ID, and Run ID.", "type": "Snowflake"}
{"instance_id": "sf_bq190", "instruction": "What is the count of the youngest and oldest users respectively for male and female users from January 2019 to April 2022?", "type": "Snowflake"}
{"instance_id": "sf_bq101", "instruction": "Identify the top 10 most frequently imported packages and their counts in Java source files.", "type": "Snowflake"}
{"instance_id": "sf_bq333", "instruction": "Which are the top 3 browsers with the shortest average session duration, and what are their average session times? Only include browsers with more than 10 sessions.", "type": "Snowflake"}
{"instance_id": "sf_bq334", "instruction": "In my Bitcoin database, there are discrepancies in transaction records. Can you determine the annual differences in average output values calculated from separate input and output records versus a consolidated transactions table, focusing only on the years common to both calculation methods?", "type": "Snowflake"}
{"instance_id": "sf_bq191", "instruction": "Find the top 2 repositories from 2017, which have more than 30 unique users watching them, that also contain the text 'Copyright (c)'.", "type": "Snowflake"}
{"instance_id": "sf_bq359", "instruction": "List the repository names and commit counts for the top two GitHub repositories with JavaScript as the primary language and the highest number of commits.", "type": "Snowflake"}
{"instance_id": "sf_bq154", "instruction": "What is the Kruskal-Wallis score (H-score) among groups of LGG patients, where the IGF2 gene expression is calculated by first applying a log10 transformation to the normalized counts, then averaging them, and the groups are based on ICD-O-3 histology codes?", "type": "Snowflake"}
{"instance_id": "sf_bq361", "instruction": "For the user cohort with a first purchase date in January 2020, what proportion of users returned in the subsequent months of 2020?", "type": "Snowflake"}
{"instance_id": "sf_bq153", "instruction": "Calculate the average log10(normalized_count + 1) expression level of the IGF2 gene for each histology type among LGG patients. Include only patients with valid IGF2 expression data and histology types not enclosed in square brackets. Match gene expression and clinical data using ParticipantBarcode.", "type": "Snowflake"}
{"instance_id": "sf_bq107", "instruction": "What is the variant density of the cannabis reference with the longest reference length? Pay attention that a variant is present if there is at least one variant call with a genotype greater than 0.", "type": "Snowflake"}
{"instance_id": "sf_bq335", "instruction": "Which address, among those with the most recent transaction in October 2017, had the highest total transaction value, considering both inputs and outputs?", "type": "Snowflake"}
{"instance_id": "sf_bq100", "instruction": "Find out the most frequently used package in all Go source files.", "type": "Snowflake"}
{"instance_id": "sf_bq136", "instruction": "Tell me all 2-hop transaction paths on the Zilliqa blockchain from the address zil1jrpjd8pjuv50cfkfr7eu6yrm3rn5u8rulqhqpz to the address zil19nmxkh020jnequql9kvqkf3pkwm0j0spqtd26e. Exclude intermediary addresses with over 50 outgoing transactions to avoid exchanges and active wallets. Display each path in the following format: <from_address> --(tx <first5_chars_of_transaction_id>..)--> <intermediate_address> --(tx <first5_chars_of_transaction_id>..)--> <end_address>", "type": "Snowflake"}
{"instance_id": "sf_bq131", "instruction": "What is the number of bus stops for the bus network with the most stops within the multipolygon boundary of Denmark (as defined by Wikidata ID 'Q35')?", "type": "Snowflake"}
{"instance_id": "sf_bq043", "instruction": "What are the RNA expression levels of the genes MDM2, TP53, CDKN1A, and CCNE1, along with associated clinical information, in bladder cancer patients with CDKN2A mutations in the 'TCGA-BLCA' project? Use clinical data from the Genomic Data Commons Release 39, data about somatic mutations derived from the hg19 human genome reference in Feb 2017.", "type": "Snowflake"}
{"instance_id": "sf_bq271", "instruction": "Could you generate a report that, for each month in 2021, provides the number of orders, number of unique purchasers, and profit (calculated as total product retail price minus total cost) grouped by country, product department, and product category?", "type": "Snowflake"}
{"instance_id": "sf_bq249", "instruction": "Please provide a report on the number of files from the GitHub repository, categorized by the presence of specific line types. Categorize a file as 'trailing' if any line ends with a blank character, as 'Space' if any line starts with a space, and as 'Other' if it meets neither condition.", "type": "Snowflake"}
{"instance_id": "sf_bq276", "instruction": "Can you provide me with the details of all ports affected by tropical storms in region number 6585, including the port name, storm names, and average storm categories? Please consider only named storms in the North Atlantic basin with wind speeds of at least 35 knots and at least minimal tropical storm strength on the SSHS scale. Additionally, ensure that each port is located within a U.S. state boundary.", "type": "Snowflake"}
{"instance_id": "sf_bq044", "instruction": "For bladder cancer patients who have mutations in the CDKN2A (cyclin-dependent kinase inhibitor 2A) gene, using clinical data from the Genomic Data Commons Release 39, what types of mutations are they, what is their gender, vital status, and days to death - and for four downstream genes (MDM2 (MDM2 proto-oncogene), TP53 (tumor protein p53), CDKN1A (cyclin-dependent kinase inhibitor 1A), and CCNE1 (Cyclin E1)), what are the gene expression levels for each patient?", "type": "Snowflake"}
{"instance_id": "sf_bq420", "instruction": "Can you identify the top 5 patents that were initially rejected under section 101 with no allowed claims, based on the length of their granted claims? The patents should have been granted in the US between 2010 and 2023. Additionally, make sure to select the first office action date for each application.", "type": "Snowflake"}
{"instance_id": "sf_bq429", "instruction": "What are the top 5 states with the highest average median income difference from 2015 to 2018? Also provide the average number of vulnerable employees across various industries for these states, using data from the ACS 5-Year Estimates for 2017.", "type": "Snowflake"}
{"instance_id": "sf_bq416", "instruction": "Please list the block numbers, source addresses, destination addresses (both in TronLink address format), and transfer amounts for the three largest USDT transactions on the TRON blockchain. The code numbers for USDT contract and transfer event are \"0xa614f803b6fd780986a42c78ec9c7f77e6ded13c\" and \"0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef\" respectively.", "type": "Snowflake"}
{"instance_id": "sf_bq072", "instruction": "Please tell me the total and Black deaths due to vehicle-related incidents and firearms separately, for each age from 12 to 18.", "type": "Snowflake"}
{"instance_id": "sf_bq247", "instruction": "For the top 6 valid families with the most publications, provide their family IDs and non-empty publication abstracts.", "type": "Snowflake"}
{"instance_id": "sf_bq411", "instruction": "Please identify the top two Google Trends search terms for each weekday over the first two weeks in Sept. 2024, and list them by date from most recent to oldest.", "type": "Snowflake"}
{"instance_id": "sf_bq213", "instruction": "What is the most common 4-digit IPC code among US B2 utility patents granted from June to August in 2022?", "type": "Snowflake"}
{"instance_id": "sf003", "instruction": "From 2015 to 2020, which zip code in Census Zip Code Tabulation Areas had the second-highest annual population growth rate, given a minimum estimate of 25,000 people over a 5-year period? Include the zip code, state abbreviation, and growth rate.", "type": "Snowflake"}
{"instance_id": "sf_bq026", "instruction": "For the assignee who has been the most active in the patent category 'A61', I'd like to know the five patent jurisdiction codes where they filed the most patents during their busiest year, separated by commas.", "type": "Snowflake"}
{"instance_id": "sf_bq214", "instruction": "For United States utility patents under the B2 classification granted between 2010 and 2014, find the one with the most forward citations within a month of its filing date, and identify the most similar patent from the same filing year, regardless of its type.", "type": "Snowflake"}
{"instance_id": "sf_bq222", "instruction": "Find the CPC technology areas in Germany with the highest exponential moving average of patent filings each year (smoothing factor 0.1) for patents granted in December 2016. Show me the full title, CPC group and the best year for each CPC group at level 4.", "type": "Snowflake"}
{"instance_id": "sf_bq225", "instruction": "What are the top 10 most widely used languages according to file counts?", "type": "Snowflake"}
{"instance_id": "sf_bq017", "instruction": "What are the five longest types of highways within the multipolygon boundary of Denmark (as defined by Wikidata ID 'Q35') by total length?", "type": "Snowflake"}
{"instance_id": "sf_bq028", "instruction": "Considering only the latest release versions of NPM packages, which packages are the top 8 most popular based on the GitHub star count, as well as their versions?", "type": "Snowflake"}
{"instance_id": "sf035", "instruction": "How many unique users started sessions each day within each app group between June 1, 2023, and June 7, 2023? Also show the app group ID and the start day of the session.", "type": "Snowflake"}
{"instance_id": "sf_bq080", "instruction": "Can you provide a daily summary showing the cumulative number of Ethereum smart contracts created by users and other contracts, from August 30, 2018, to September 30, 2018, based on Ethereum transaction traces?", "type": "Snowflake"}
{"instance_id": "sf_bq246", "instruction": "Can you figure out the number of forward citations within 1 year from the application date for the patent that has the most backward citations within 1 year from application among all U.S. patents?", "type": "Snowflake"}
{"instance_id": "sf_bq410", "instruction": "Please tell me the top three states with the fewest people not in the labor force, the increase in their total median income from 2015 to 2018, the number of people not in the labor force, and the proportion they represent according to the censustract data in 2015 and 2018.", "type": "Snowflake"}
{"instance_id": "sf_bq417", "instruction": "Please provide identification and descriptive data, including the storage details and size metrics of medical images for male patients over 60 years old, examined in the mediastinum, after September 1, 2014.", "type": "Snowflake"}
{"instance_id": "sf_bq073", "instruction": "List states in order of the total number of vulnerable workers, including the state name, the number of vulnerable workers in wholesale trade (38% of workers in that sector), and the number of vulnerable workers in manufacturing (41% of workers in that sector), based on 2015-2018 median income differences by ZIP code.", "type": "Snowflake"}
{"instance_id": "sf_bq248", "instruction": "What is the proportion of files whose paths include 'readme.md' that contain the phrase 'Copyright (c)', among all repositories that do not use any programming language with 'python' in its name?", "type": "Snowflake"}
{"instance_id": "sf_bq421", "instruction": "Can you list all unique pairs of embedding medium and staining substance code meanings, along with the number of occurrences for each pair, based on distinct embedding medium and staining substance codes from the 'SM' modality in the DICOM dataset's un-nested specimen preparation sequences, ensuring that the codes are from the SCT coding scheme?", "type": "Snowflake"}
{"instance_id": "sf_bq283", "instruction": "Identify the top 15 active stations based on the number of trip starts. For each of these stations, provide the station ID, the total number of trips that start there, the percentage this represents out of all trips from active stations, and the average duration of trips starting from the station.", "type": "Snowflake"}
{"instance_id": "sf_bq426", "instruction": "What user type recorded the highest average temperature for trips starting and ending in New York City's zip code 10019 during 2018? Include average precipitation, wind speed, and temperature for that user type based on weather data from the New York Central Park station.", "type": "Snowflake"}
{"instance_id": "sf_bq016", "instruction": "Considering only the highest release versions of NPM packages, which one and its version has the most dependent packages?", "type": "Snowflake"}
{"instance_id": "sf_bq224", "instruction": "Which repository with an approved license in `licenses.md` had the highest combined total of forks, issues, and watches in April 2022?", "type": "Snowflake"}
{"instance_id": "sf_bq029", "instruction": "Get the number of patent publications and the average number of inventors per patent in CA every five years from 1960 to 2020, based on when the patents were filed. Focus only on patents with inventor details.", "type": "Snowflake"}
{"instance_id": "sf_bq223", "instruction": "Which assignees, excluding DENSO CORP itself, have cited patents assigned to DENSO CORP, and what are the titles of the primary CPC subclasses associated with these citations? Provide the name of each citing assignee, the full title of the CPC subclass, and the count of citations grouped by the assignee and the CPC subclass title. Please focus specifically on the main categories of the CPC codes.", "type": "Snowflake"}
{"instance_id": "sf_bq215", "instruction": "What is the publication number of the US patent under the B2 classification granted from 2015 to 2018 with the highest originality score, based on the diversity of 4-digit IPC codes from its backward citations?", "type": "Snowflake"}
{"instance_id": "sf_bq027", "instruction": "For patents granted between 2010 and 2018, provide the publication number of each patent and the number of backward citations it has received in the SEA category.", "type": "Snowflake"}
{"instance_id": "sf_bq020", "instruction": "What is the name of the reference sequence with the highest variant density in the given cannabis genome dataset?", "type": "Snowflake"}
{"instance_id": "sf_bq212", "instruction": "For United States utility patents under the B2 classification granted between June and September of 2022, identify the most frequent 4-digit IPC code for each patent. Then, list the publication numbers and IPC4 codes of patents where this code appears 10 or more times.", "type": "Snowflake"}
{"instance_id": "sf_bq444", "instruction": "Can you pull the blockchain timestamp, block number, and transaction hash for the first five mint and burn events from Ethereum logs for the address '0x8ad599c3a0ff1de082011efddc58f1908eb6e6d8'? Please include mint events identified by the topic '0x7a53080ba414158be7ec69b987b5fb7d07dee101fe85488f0853ae16239d0bde' and burn events by '0x0c396cd989a39f4459b5fa1aed6a9a8dcdbc45908acfd67e028cd568da98982c', and order them by block timestamp from the oldest to the newest.", "type": "Snowflake"}
{"instance_id": "sf002", "instruction": "As of December 31, 2022, list the top 10 active large banks, each with assets over $10 billion, that have the highest percentage of uninsured assets based on quarterly estimates. Provide the names of these banks and their respective percentages of uninsured assets.", "type": "Snowflake"}
{"instance_id": "sf_bq459", "instruction": "I would like to process articles by tokenizing and normalizing word vectors to compute cosine similarity scores and weighting these vectors by the 0.4th root of word frequency, in order to identify the top 10 most relevant articles related to the phrase \"Epigenetics and cerebral organoids: promising directions in autism spectrum disorders\". Please show me the id, date, title and cosine similarity scores of them.", "type": "Snowflake"}
{"instance_id": "sf_bq005", "instruction": "What are the average block intervals for Bitcoin blocks mined in 2023, broken down by date, and can you provide the first ten dates with their corresponding average intervals?", "type": "Snowflake"}
{"instance_id": "sf018", "instruction": "Examine user engagement with push notifications within a specified one-hour window on June 1, 2023.", "type": "Snowflake"}
{"instance_id": "sf011", "instruction": "Determine the population distribution within each block group relative to its census tract in New York State using 2021 ACS data. Include block group ID, census value, state county tract ID, total tract population, and the population ratio of each block group.", "type": "Snowflake"}
{"instance_id": "sf_bq033", "instruction": "How many U.S. publications related to IoT (where the abstract includes the phrase 'internet of things') were filed each month from 2008 to 2022, including months with no filings?", "type": "Snowflake"}
{"instance_id": "sf029", "instruction": "Generate a daily detailed sales report for each product under the 'Manufacturing' distributor view, covering the 30 days leading up to February 6, 2022. The report should include the total and average values for sales units, revenue, average selling price (ASP), glance views, conversion rate, shipped units, shipped revenue, net profit margin (PPM), and inventory details.", "type": "Snowflake"}
{"instance_id": "sf_bq450", "instruction": "Comprehensively analyze Ethereum blockchain addresses by calculating their transaction activities, balances, token exchanges, and other related metrics up to January 1, 2017, to provide detailed insights into their behaviors and interactions.", "type": "Snowflake"}
{"instance_id": "sf_bq252", "instruction": "Could you please find the name of the repository that contains the most copied non-binary Swift file in the dataset, ensuring each file is uniquely identified by its ID?", "type": "Snowflake"}
{"instance_id": "sf_bq255", "instruction": "How many commit messages are there in repositories that use the 'Shell' programming language and 'apache-2.0' license, where the length of the commit message is more than 5 characters but less than 10,000 characters, and the messages do not start with the word 'merge', 'update' or 'test'?", "type": "Snowflake"}
{"instance_id": "sf_bq093", "instruction": "Tell me the maximum and minimum net changes in balances for Ethereum Classic addresses on October 14, 2016, considering debits, credits, and gas fees, while excluding internal calls like 'delegatecall', 'callcode', and 'staticcall'.", "type": "Snowflake"}
{"instance_id": "sf_bq058", "instruction": "Retrieve all finalized deposits into Optimism at block 29815485 using the Optimism Standard Bridge, including transaction hash, an Etherscan link (the complete URL), L1 and L2 token addresses, sender and receiver addresses (with leading zeroes stripped), and the deposited amount (converted from hex to decimal). Ensure data is properly formatted and parsed according to Optimism's address and token standards. Note that, the keccak-256 hash of the Ethereum event signature for DepositFinalized is \"0x3303facd24627943a92e9dc87cfbb34b15c49b726eec3ad3487c16be9ab8efe8\".", "type": "Snowflake"}
{"instance_id": "sf_bq263", "instruction": "Produce a 2023 monthly report for the 'Sleep & Lounge' category detailing total sales, costs, completed order counts, profits, and profit margins, ensuring accurate cost alignment with sales data.", "type": "Snowflake"}
{"instance_id": "sf_bq056", "instruction": "How many different pairs of roads classified as motorway, trunk, primary, secondary, or residential in California overlap each other without sharing nodes and do not have a bridge tag, where these roads are tagged with 'highway'?", "type": "Snowflake"}
{"instance_id": "sf_bq264", "instruction": "Identify the difference in the number of the oldest and youngest users registered between January 1, 2019, and April 30, 2022, from our e-commerce platform data.", "type": "Snowflake"}
{"instance_id": "sf_bq069", "instruction": "Could you help me generate a CT Image Series report that excludes the NLST study and series that do not conform to some geometrical checks, and also filters out series that require additional decompression steps before being passed to dcm2niix for conversion to NIFTI format? The detailed assumptions and requirements are provided in `nonNlstCohort.md`.", "type": "Snowflake"}
{"instance_id": "sf_bq207", "instruction": "Can you provide the initial publication numbers for our top 100 independent patent claims with the highest word count?", "type": "Snowflake"}
{"instance_id": "sf010", "instruction": "What are the cumulative percentages of mortgages near default in California for each recorded date in 2023, including those 90 to 180 days past due, in forbearance, or in the process of foreclosure, bankruptcy, or deed in lieu?", "type": "Snowflake"}
{"instance_id": "sf_bq456", "instruction": "Can you retrieve the PatientID, StudyInstanceUID, StudyDate, and FindingSite for each patient, along with the maximum values for Elongation, Flatness, Least Axis in 3D Length, Major Axis in 3D Length, Maximum 3D Diameter of a Mesh, Minor Axis in 3D Length, Sphericity, Surface Area of Mesh, Surface to Volume Ratio, Volume from Voxel Summation, and Volume of Mesh, in 2001?", "type": "Snowflake"}
{"instance_id": "sf_bq209", "instruction": "Can you find how many utility patents granted in 2010 have exactly one forward citation within the ten years following their application date?", "type": "Snowflake"}
{"instance_id": "sf_bq236", "instruction": "What are the top 5 zip codes of the areas in the United States that have experienced the most hail storm events in the past 10 years?", "type": "Snowflake"}
{"instance_id": "sf_bq460", "instruction": "Please process articles by creating normalized word vectors, weighting these vectors by the 0.4th root of word frequency, and computing cosine similarity scores to identify the top 10 articles most similar to the article with ID \"8a78ef2d-d5f7-4d2d-9b47-5adb25cbd373\". I want the id, date, title and cosine similarity scores of them.", "type": "Snowflake"}
{"instance_id": "sf_bq458", "instruction": "Please help me calculate normalized document vectors for each article by tokenizing the body text into words, obtaining word vectors, and weighting these vectors by the 0.4th root of word frequency. Then, aggregate these vectors to form an article vector, and normalize them to unit length. Finally, retrieve the ID, date, title, and the computed article vector for each entry.", "type": "Snowflake"}
{"instance_id": "sf_bq265", "instruction": "Can you provide me with the emails of the top 10 users who have the highest average order value, considering only those users who registered in 2019 and made purchases within the same year?", "type": "Snowflake"}
{"instance_id": "sf_bq057", "instruction": "Which month (e.g., 3) in 2021 witnessed the highest percent of Bitcoin volume that took place in CoinJoin transactions? Also give me the percentage of CoinJoins transactions, the average input and output UTXOs ratio, and the proportion of CoinJoin transaction volume for that month (all 1 decimal).", "type": "Snowflake"}
{"instance_id": "sf_bq291", "instruction": "Can you provide a daily weather summary for July 2019 within a 5 km radius of latitude 26.75 and longitude 51.5? I need the maximum, minimum, and average temperatures; total precipitation; average cloud cover between 10 AM and 5 PM; total snowfall (when average temperature is below 32\u00b0F); and total rainfall (when average temperature is 32\u00b0F or above) for each forecast date. The data should correspond to forecasts created in July 2019 for the following day.", "type": "Snowflake"}
{"instance_id": "sf_bq068", "instruction": "What are the maximum and minimum balances across all addresses for different address types on Bitcoin Cash during March 2014?", "type": "Snowflake"}
{"instance_id": "sf_bq050", "instruction": "Help me look at the total number of bike trips, average trip duration (in minutes), average daily temperature, wind speed, and precipitation when trip starts (rounded to 1 decimal), as well as the month with the most trips (e.g., `4`), categorized by different starting and ending neighborhoods in New York City for the year 2014.", "type": "Snowflake"}
{"instance_id": "sf_bq262", "instruction": "Help me generate a monthly analysis report on e-commerce sales in the second half of 2019, which should contain the total sum of order count/revenue/profit as well as their growth rates for each product category monthly. Please sort the results by months (e.g., 2019-07) and product categories in ascending order.", "type": "Snowflake"}
{"instance_id": "sf_bq092", "instruction": "Tell me the highest and lowest net changes among all addresses and types on Bitcoin Cash as of April, 2023?", "type": "Snowflake"}
{"instance_id": "sf_bq254", "instruction": "Can you find the names of the multipolygons with valid ids that rank in the top two in terms of the number of points within their boundaries, among those multipolygons that do not have a Wikidata tag but are located within the same geographic area as the multipolygon associated with Wikidata item Q191?", "type": "Snowflake"}
{"instance_id": "sf044", "instruction": "What was the percentage change in post-market close prices for the Magnificent 7 tech companies from January 1 to June 30, 2024?", "type": "Snowflake"}
{"instance_id": "sf_bq253", "instruction": "Find the name of the OpenStreetMap relation that encompasses the most features within the same geographic area as the multipolygon tagged with Wikidata item 'Q1095'. The relation should have a specified name without a 'wikidata' tag, and at least one of its included features must have a 'wikidata' tag. Return the name of this relation.", "type": "Snowflake"}
{"instance_id": "sf_bq321", "instruction": "How many unique StudyInstanceUIDs are there from the DWI, T2 Weighted Axial, Apparent Diffusion Coefficient series, and T2 Weighted Axial Segmentations in the 'qin_prostate_repeatability' collection?", "type": "Snowflake"}
{"instance_id": "sf_bq171", "instruction": "Whose Forum message upvotes are closest to the average in 2019? If there\u2019s a tie, tell me the one with the alphabetically first username.", "type": "Snowflake"}
{"instance_id": "sf_bq176", "instruction": "Identify the case barcodes from the TCGA-LAML study with the highest weighted average copy number in cytoband 15q11 on chromosome 15, using segment data and cytoband overlaps from TCGA's genomic and Mitelman databases.", "type": "Snowflake"}
{"instance_id": "sf_bq182", "instruction": "Which primary programming languages, determined by the highest number of bytes in each repository, have the sum of over 100 pull requests on January 18, 2023 in all its repositories?", "type": "Snowflake"}
{"instance_id": "sf_bq372", "instruction": "Which customer category has the maximum lost order value that is closest to the average maximum loss across all categories?", "type": "Snowflake"}
{"instance_id": "sf_bq147", "instruction": "Can you find which TCGA breast cancer cases include both normal and other types of tissue samples, focusing on protein-coding genes?", "type": "Snowflake"}
{"instance_id": "sf_bq375", "instruction": "Determine which file type among Python (.py), C (.c), Jupyter Notebook (.ipynb), Java (.java), and JavaScript (.js) in the GitHub codebase has the most files with a directory depth greater than 10, and provide the file count.", "type": "Snowflake"}
{"instance_id": "sf_bq320", "instruction": "What is the total count of StudyInstanceUIDs that have a segmented property type of '15825003' and belong to the 'Community' or 'nsclc_radiomics' collections?", "type": "Snowflake"}
{"instance_id": "sf_bq380", "instruction": "What are the usernames of the top three users with the most upvotes received in the Kaggle forum, along with the number of upvotes they received and the number of upvotes they gave to others?", "type": "Snowflake"}
{"instance_id": "sf_bq141", "instruction": "Using the TCGA-KIRP dataset, predict the clinical stage of patients based on the expression levels of genes 'MT-CO3', 'MT-CO1', and 'MT-CO2'. Select patients with non-null clinical stages (clinical_stage not null) where disease_code is 'KIRP'. Retrieve their gene expression data for the specified genes. Randomly split these patients into a training set (90%) and a test set (10%) based on their case_barcode. For each clinical stage in the training set, calculate the average expression of each gene. For each patient in the test set, compute the Euclidean distance between their gene expressions and the stage-specific averages from the training set. Assign each test patient to the clinical stage with the minimum distance (i.e., the closest stage average) as their predicted stage. Output the case_barcode along with the predicted clinical stage.", "type": "Snowflake"}
{"instance_id": "sf_bq373", "instruction": "What's the median of the average monthly spending across all customers for the year 2014?", "type": "Snowflake"}
{"instance_id": "sf_bq345", "instruction": "How large are the DICOM image files with SEG or RTSTRUCT modalities and the SOP Class UID \"1.2.840.10008.5.1.4.1.1.66.4\", when grouped by collection, study, and series IDs, if they have no references to other series, images, or sources? Can you also provide a viewer URL formatted as \"https://viewer.imaging.datacommons.cancer.gov/viewer/\" followed by the study ID, and list these sizes in kilobytes, sorted from largest to smallest?", "type": "Snowflake"}
{"instance_id": "sf_bq148", "instruction": "Could you list the top five protein-coding genes with the highest expression variability in 'Solid Tissue Normal' samples from multi-tissue TCGA-BRCA cases?", "type": "Snowflake"}
{"instance_id": "sf_bq342", "instruction": "What is the difference between the average hourly changes in transaction values for the Ethereum token 0x68e54af74b22acaccffa04ccaad13be16ed14eac, involving the addresses 0x8babf0ba311aab914c00e8fda7e8558a8b66de5d and 0xfbd6c6b112214d949dcdfb1217153bc0a742862f as either sender or receiver, between the years 2019 and 2020?", "type": "Snowflake"}
{"instance_id": "sf_bq170", "instruction": "For breast cancer cases (TCGA-BRCA) from REL 23 of the active GDC archive, identify and categorize copy number variations (CNVs) across different cytobands on all chromosomes. CNVs include amplifications, gains, homozygous deletions, heterozygous deletions, and normal diploid states. For each cytoband, tell me its name and start/end position, and calculate the frequency of each CNV type as a percentage of the total number of cases (rounded to 2 decimals).", "type": "Snowflake"}
{"instance_id": "sf_bq184", "instruction": "I want to compute and compare the cumulative count of Ethereum smart contracts created by users versus created by other contracts. Please list out the daily cumulative tallies between 2017 and 2021.", "type": "Snowflake"}
{"instance_id": "local024", "instruction": "Can you help me find the top 5 countries with the highest average runs per match for all players across all seasons, and also include their batting averages?", "type": "Local"}
{"instance_id": "local229", "instruction": "Find the IDs of players who scored the highest number of partnership runs for each match. The output should include the IDs of two players, each with their individual scores and the total partnership score. There can be multiple rows for a single match.", "type": "Local"}
{"instance_id": "local023", "instruction": "Please help me find the names of top 5 players with the highest average runs per match in season 5, along with their batting averages.", "type": "Local"}
{"instance_id": "local015", "instruction": "Help me calculate, respectively, the percentage of motorcycle accident fatalities involving riders who were wearing helmets and the percentage involving those who weren't.", "type": "Local"}
{"instance_id": "local218", "instruction": "Can you calculate the median from the highest season goals of each team?", "type": "Local"}
{"instance_id": "local220", "instruction": "Who is the player with the most wins?", "type": "Local"}
{"instance_id": "local274", "instruction": "Which products were picked for order 421, and what is the average number of units picked for each product, using FIFO (First-In, First-Out) method?", "type": "Local"}
{"instance_id": "local041", "instruction": "What percentage of trees in the Bronx have a health status of Good?", "type": "Local"}
{"instance_id": "local273", "instruction": "What is the average pick percentage for each product (by name), considering the quantity picked from inventory locations that are ordered by the earliest purchase date and smallest quantity, while ensuring that the picked quantity matches the overlapping range between the order quantity and the available inventory?", "type": "Local"}
{"instance_id": "local077", "instruction": "Please review our interest data from September 2018 to August 2019. I need to know the max average composition value for each month, as well as the three-month rolling average. Ensure the output includes the date, the interest name, the max index composition for that month, the rolling average, and the top-ranking interests from the one month ago and two months ago with their names.", "type": "Local"}
{"instance_id": "local070", "instruction": "Please examine our records for Chinese cities in July 2021 and identify both the shortest and longest streaks of consecutive date entries. List the dates along with their corresponding city names, capitalizing the first letter of each city name, for these streaks.", "type": "Local"}
{"instance_id": "local221", "instruction": "Tell me the top 10 teams with the most wins across the league.", "type": "Local"}
{"instance_id": "local219", "instruction": "Which single team has the fewest wins in each league?", "type": "Local"}
{"instance_id": "local022", "instruction": "Show me the names of strikers who scored no less than 100 runs in a match, but their team lost the game?", "type": "Local"}
{"instance_id": "local210", "instruction": "Can you identify the hubs that saw more than a 20% increase in finished orders from February to March?", "type": "Local"}
{"instance_id": "local025", "instruction": "Please calculate the average of the highest runs conceded in a single over for each match.", "type": "Local"}
{"instance_id": "local228", "instruction": "Identify the top three batsmen with the most runs and the top three bowlers with the most wickets in each season, displaying them in the same row for each season. In case of ties, prioritize players with lower player_ids. Exclude 'run out', 'hit wicket', and 'retired hurt' as out_types for bowlers.", "type": "Local"}
{"instance_id": "local071", "instruction": "Could you review our records in June 2022 and identify which countries have the longest streak of consecutive inserted city dates? Please list the 2-letter country codes of these countries.", "type": "Local"}
{"instance_id": "local085", "instruction": "Can you tell me the IDs of the top 3 employees who have the highest percentage of orders delivered late, considering only those with more than 50 total orders? Also provide their respective number of late orders and the percentage.", "type": "Local"}
{"instance_id": "local049", "instruction": "Can you help me calculate the average number of new unicorn companies per year in the top industry from 2019 to 2021?", "type": "Local"}
{"instance_id": "local244", "instruction": "Calculate the duration of each track, classify them as short, medium, or long, output the minimum and maximum time for each kind (in minutes) and the total revenue for each category, group by the category.", "type": "Local"}
{"instance_id": "local286", "instruction": "Prepare a comprehensive performance report on our sellers, focusing on total sales, average item price, average review scores, and packing times. Ensure that the report includes only those sellers who have sold a quantity of more than 100 products and highlight the product category names in English with the highest sales volume.", "type": "Local"}
{"instance_id": "local272", "instruction": "Which product ID, aisle, and position should be selected to pick the highest quantity for order 423, ensuring the picked quantity does not exceed the available inventory in warehouse 1, and calculate the quantity to be picked while prioritizing locations with earlier dates and smaller quantities?", "type": "Local"}
{"instance_id": "local040", "instruction": "Which three boroughs have the highest number of trees, and what is the average mean income for each, considering only areas where both median and mean income estimates are greater than zero, and using the available ZIP code income data when tree ZIP codes are missing?", "type": "Local"}
{"instance_id": "local078", "instruction": "Identify the top 10 and bottom 10 interest categories based on their highest composition values across all months. For each category, display the time (MM-YYYY), interest name, and the composition value.", "type": "Local"}
{"instance_id": "local275", "instruction": "Which products (by name) had a seasonality-adjusted sales ratio consistently above 2 for the entire year of 2017, based on monthly sales data from January 2016?", "type": "Local"}
{"instance_id": "local300", "instruction": "Could you calculate the highest daily balance each customer had within each month? Treat any negative daily balances as zero. Then, for each month, add up these maximum daily balances across all customers to get a monthly total.", "type": "Local"}
{"instance_id": "local132", "instruction": "Show entertainer and customer pairs where both the first and second style preferences of customers match the first and second strengths of entertainers (or vice versa), displaying only the entertainer's stage name and the customer's last name.", "type": "Local"}
{"instance_id": "local336", "instruction": "How many overtakes of each type occurred during the first five laps of the race?", "type": "Local"}
{"instance_id": "local309", "instruction": "For each year, which driver and which constructor scored the most points? I want the full name of each driver.", "type": "Local"}
{"instance_id": "local331", "instruction": "List the three most common third actions users take after visiting the `/detail` page twice in a row, including each action's occurrence count.", "type": "Local"}
{"instance_id": "local168", "instruction": "What is the average salary for remote Data Analyst jobs requiring the top three most in-demand skills?", "type": "Local"}
{"instance_id": "local157", "instruction": "For our upcoming meeting, please provide the daily percentage change in trading volume for all tickers from August 1 to August 10, 2021. This trend analysis is crucial for our strategic planning.", "type": "Local"}
{"instance_id": "local354", "instruction": "Which Formula 1 drivers, during the 1950s, had seasons in which they did not change their constructors at the beginning and end of the year and participated in at least two different race rounds within those seasons?", "type": "Local"}
{"instance_id": "local195", "instruction": "Please find out how widespread the appeal of our top five actors is. What percentage of our customers have rented films featuring these actors?", "type": "Local"}
{"instance_id": "local330", "instruction": "For each web page, how many unique user sessions either start or end there?", "type": "Local"}
{"instance_id": "local133", "instruction": "In a scoring system where the first preference in musical styles receives 3 points, the second 2 points, and the third 1 point, calculate the total weighted score for each style ranked by at least one user. Determine the absolute differences between each style's weighted score and the average score across all styles.", "type": "Local"}
{"instance_id": "local301", "instruction": "I need an analysis of our sales performance around mid-June for the years 2018, 2019, and 2020. Specifically, calculate the percentage change in sales between the four weeks leading up to June 15 and the four weeks following June 15 for each year.", "type": "Local"}
{"instance_id": "local194", "instruction": "Please provide a list of the top three revenue-generating films for each actor, along with the average revenue per actor in those films, calculated by dividing the total film revenue equally among the actors for each film.", "type": "Local"}
{"instance_id": "local193", "instruction": "Could you find out the average percentage of the total lifetime sales (LTV) that occur in the first 7 and 30 days after a customer's initial purchase? Also, include the average total lifetime sales (LTV). Please exclude customers with zero lifetime sales. The percentage should be shown with %, and the 7- and 30-day periods should be based on the exact number of hours-minutes-seconds, not calendar days.", "type": "Local"}
{"instance_id": "local355", "instruction": "Calculate the average first and last rounds of races missed by drivers each year. Only include drivers who missed fewer than three races annually and switched teams between their first and last missed races.", "type": "Local"}
{"instance_id": "local167", "instruction": "Which state has the highest number of female legislators whose term end dates fall on December 31st, and what is that count? Please provide state name abbreviation.", "type": "Local"}
{"instance_id": "local169", "instruction": "What is the annual retention rate for Colorado legislators who started their first term between 1917 and 1999, tracked up to 20 years later?", "type": "Local"}
{"instance_id": "local156", "instruction": "Can you analyze the yearly average cost of Bitcoin purchases by region, excluding the first year's data? Rank the regions based on these averages each year and calculate the annual percentage change in cost.", "type": "Local"}
{"instance_id": "local128", "instruction": "List the bowlers, match number, game number, handicap score, tournament date, and location for only those bowlers who won their game with a handicap score of 190 or less at Thunderbird Lanes, Totem Lanes, and Bolero Lanes.", "type": "Local"}
{"instance_id": "local065", "instruction": "Calculate the total income from Meat Lovers pizzas priced at $12 and Vegetarian pizzas at $10. Include any extra toppings charged at $1 each. Ensure that canceled orders are filtered out. How much money has Pizza Runner earned in total?", "type": "Local"}
{"instance_id": "local096", "instruction": "I'm interested in knowing the proportion of films that had exclusively female actors for each year. Show the proportion of female-actor-only films and the total number of all films for each of those years.", "type": "Local"}
{"instance_id": "local062", "instruction": "Can you segment Italian customers into ten profitability buckets for December 2021, using equal profit intervals, and calculate the following for each bucket in December 2021: the number of customers, maximum profit, and minimum profit?", "type": "Local"}
{"instance_id": "local054", "instruction": "Could you tell me the first names of customers who spent less than $1 on albums by the best-selling artist, along with the amounts they spent?", "type": "Local"}
{"instance_id": "local259", "instruction": "For each player, list their ID, name, most frequent role across all matches, batting hand, bowling skill, total runs scored, total matches played, total dismissals, batting average, highest score in a single match, number of matches where their score exceeded 30, 50, and 100, total balls faced in their career, strike rate, total wickets taken, economy rate, and their best performance in a single match (most wickets taken, in the format \"wickets taken-runs given\"). Ignore the extra runs data.", "type": "Local"}
{"instance_id": "local098", "instruction": "I'd like to know how many actors have managed to avoid long breaks in their careers. Could you check our records to see how many actors haven't been out of work for more than three years at any point?", "type": "Local"}
{"instance_id": "local038", "instruction": "Could you help me find the actor who appeared most in English G or PG-rated children's movies no longer than 2 hours, released between 2000 and 2010\uff1fGive me a full name.", "type": "Local"}
{"instance_id": "local007", "instruction": "Could you help me calculate the average single career span value in years for all baseball players? Please give the result as a float number. If it's a full year, we count it as one year. If it's less than a full year but full months, we consider 12 months as one year. If it's less than a month, we consider 365 days as one year.", "type": "Local"}
{"instance_id": "local009", "instruction": "What is the distance of the longest route where Abakan is either the departure or destination city (in kilometers)?", "type": "Local"}
{"instance_id": "local031", "instruction": "What is the highest monthly delivered orders volume in the year with the lowest annual delivered orders volume among 2016, 2017, and 2018?", "type": "Local"}
{"instance_id": "local099", "instruction": "I need you to look into the actor collaborations and tell me how many actors have made more films with Yash Chopra than with any other director. This will help us understand his influence on the industry better.", "type": "Local"}
{"instance_id": "local055", "instruction": "What is the difference in average spending between customers who bought albums from the best-selling artist and those who bought from the least-selling artist? If there is a tie for either best-selling or lowest-selling, choose the artist whose name comes first alphabetically.", "type": "Local"}
{"instance_id": "local258", "instruction": "What are the total wickets taken by each bowler, their economy rate, their strike rate, and their best performance in a single match (most wickets taken, in the format \"wickets-runs\")? Ignore the extra runs data.", "type": "Local"}
{"instance_id": "local063", "instruction": "Which product has the smallest change in sales share for each product from the top 20% of products by total sales between Q4 in 2019 and 2020 in US without any promotion?", "type": "Local"}
{"instance_id": "local097", "instruction": "Could you analyze our data and identify which consecutive ten-year period had the largest number of films? Only output the start year and the total count for that specific period.", "type": "Local"}
{"instance_id": "local064", "instruction": "What is the difference in average month-end balance between the month with the most and the month with the fewest customers having a positive balance in 2020?", "type": "Local"}
{"instance_id": "local269", "instruction": "What is the average total quantity across all final packaging combinations, considering all items contained within each combination?", "type": "Local"}
{"instance_id": "local202", "instruction": "For alien data, how many of the top 10 states by alien population have a higher percentage of friendly aliens than hostile aliens, with an average alien age exceeding 200?", "type": "Local"}
{"instance_id": "local030", "instruction": "Can you find the average payments and order counts for the five cities with the lowest total payments from delivered orders?", "type": "Local"}
{"instance_id": "local008", "instruction": "I would like to know the given names of baseball players who have achieved the highest value of games played, runs, hits, and home runs, with their corresponding score values.", "type": "Local"}
{"instance_id": "local037", "instruction": "What are the top three product categories with the highest number of payments in single payment type, and how many payments were made in each category?", "type": "Local"}
{"instance_id": "local039", "instruction": "Please help me find the film category with the highest total rental hours in cities where the city's name either starts with \"A\" or contains a hyphen. ", "type": "Local"}
{"instance_id": "local270", "instruction": "Which packaging containers include items in quantities greater than 500, considering all items contained within each container?", "type": "Local"}
{"instance_id": "local284", "instruction": "Can you generate a summary of our items' loss rates? Include the average loss rate, and also break down the count of items that are below, above, and within one standard deviation from this average.", "type": "Local"}
{"instance_id": "local283", "instruction": "Analyze our match data to identify the name, leagues, and countries of the champion team for each season. Include the total points accumulated by each team.", "type": "Local"}
{"instance_id": "local277", "instruction": "What is the average forecasted annual sales for products 4160 and 7790 for 2018? Use a weighted regression model based on sales data from January 2016, focusing on the first 36 months, with sales adjusted for seasonality during time steps 7 to 30.", "type": "Local"}
{"instance_id": "local073", "instruction": "Let's generate a report for each pizza order that lists the pizza name followed by \": \", then all the ingredients in alphabetical order. If any ingredient is ordered more than once, indicate it with '2x' directly in front of the ingredient without a space.", "type": "Local"}
{"instance_id": "local279", "instruction": "For each product, provide the product_id, month in 2019, and the smallest difference between its ending inventory and minimum required level, based on a monthly inventory adjustment model that includes restocking when levels fall below the minimum.", "type": "Local"}
{"instance_id": "local074", "instruction": "Please generate a summary of the closing balances at the end of each month for each customer transactions, show the monthly changes and monthly cumulative bank account balances. Ensure that even if a customer has no account activity in a given month, the balance for that month is still included in the output.", "type": "Local"}
{"instance_id": "local020", "instruction": "Which bowler has the lowest bowling average per wicket taken?", "type": "Local"}
{"instance_id": "local212", "instruction": "Can you find 5 delivery drivers with the highest average number of daily deliveries?", "type": "Local"}
{"instance_id": "local018", "instruction": "For the most common cause of traffic accidents in 2021, how much did its share (percentage in the annual road incidents) decrease compared to 10 years earlier?", "type": "Local"}
{"instance_id": "local029", "instruction": "Please calculate the average payment value, city, and state for the top 3 customers with the most delivered orders.", "type": "Local"}
{"instance_id": "local081", "instruction": "How many customers were in each spending group in 1998, and what percentage of the total customer base does each group represent?", "type": "Local"}
{"instance_id": "local075", "instruction": "Can you provide a breakdown of how many times each product was viewed, how many times they were added to the shopping cart, and how many times they were left in the cart without being purchased? Also, give me the count of actual purchases for each product. Ensure that products with a page id in (1, 2, 12, 13) are filtered out.", "type": "Local"}
{"instance_id": "local072", "instruction": "Identify the country with data inserted on nine different days in January 2022. Then, find the longest consecutive period with data insertions for this country during January 2022, and calculate the proportion of entries that are from its capital city within this longest consecutive insertion period.", "type": "Local"}
{"instance_id": "local285", "instruction": "For veg whsle data, can you analyze our financial performance over the years 2020 to 2023? I need insights into the average wholesale price, maximum wholesale price, minimum wholesale price, wholesale price difference, total wholesale price, total selling price, average loss rate, total loss, and profit for each category within each year. Round all calculated values to two decimal places.", "type": "Local"}
{"instance_id": "local017", "instruction": "In which year were the two most common causes of traffic accidents different from those in other years?", "type": "Local"}
{"instance_id": "local028", "instruction": "Could you generate a report that shows the number of delivered orders for each month in the years 2016, 2017, and 2018? Each column represents a year, and each row represents a month", "type": "Local"}
{"instance_id": "local010", "instruction": "Distribute all the unique city pairs into the distance ranges 0, 1000, 2000, 3000, 4000, 5000, and 6000+, based on their average distance of all routes between them. Then how many pairs are there in the distance range with the fewest unique city pairs?", "type": "Local"}
{"instance_id": "local026", "instruction": "Please help me find the top 3 bowlers who conceded the maximum runs in a single over, along with the corresponding matches.", "type": "Local"}
{"instance_id": "local019", "instruction": "For the NXT title that had the shortest match (excluding titles with \"title change\"), what were the names of the two wrestlers involved?", "type": "Local"}
{"instance_id": "local021", "instruction": "Could you show me the average total score of strikers who have scored more than 50 runs in at least one match?", "type": "Local"}
{"instance_id": "local198", "instruction": "Using the sales data, what is the median value of total sales made in countries where the number of customers is greater than 4?", "type": "Local"}
{"instance_id": "local196", "instruction": "For the ratings of the first movie rented by customers, please provide the average total spend and the average number of subsequent rentals for each rating category.", "type": "Local"}
{"instance_id": "local131", "instruction": "Could you list each musical style with the number of times it appears as a 1st, 2nd, or 3rd preference in a single row per style?", "type": "Local"}
{"instance_id": "local100", "instruction": "Can you investigate our database and find out how many actors have a 'Shahrukh number' of 2? This means they acted in a film with someone who acted with Shahrukh Khan, but not directly with him.", "type": "Local"}
{"instance_id": "local335", "instruction": "Which five constructors have had the most seasons in the 21st century where their drivers scored the fewest points in a Formula 1 season?", "type": "Local"}
{"instance_id": "local356", "instruction": "Can you tell me the full names of drivers who have been overtaken more times than they have performed overtakes?", "type": "Local"}
{"instance_id": "local163", "instruction": "Which university faculty members' salaries are closest to the average salary for their respective ranks? Please provide the ranks, first names, last names, and salaries.university", "type": "Local"}
{"instance_id": "local197", "instruction": "Can you determine which of our top 10 paying customers had the highest payment difference in any given month? I\u2019d like to know the highest payment difference for this customer, with the result rounded to two decimal places.", "type": "Local"}
{"instance_id": "local358", "instruction": "How many users are there in each age category (20s, 30s, 40s, 50s, and others)?", "type": "Local"}
{"instance_id": "local199", "instruction": "Can you identify the year and month with the highest rental orders created by the store's staff for each store? Please list the store ID, the year, the month, and the total rentals for those dates.", "type": "Local"}
{"instance_id": "local152", "instruction": "Can you provide the top 9 directors by movie count, including their ID, name, number of movies, average inter-movie duration (rounded to the nearest integer), average rating (rounded to 2 decimals), total votes, minimum and maximum ratings, and total movie duration? Sort the output first by movie count in descending order and then by total movie duration in descending order.", "type": "Local"}
{"instance_id": "local360", "instruction": "Identify the sessions with the fewest events lacking both '/detail' clicks and '/complete' conversions, considering only non-empty search types. If multiple sessions share the lowest count, include all of them. For each session, display the associated paths and search types.", "type": "Local"}
{"instance_id": "local302", "instruction": "Analyze the average sales performance impact 12 weeks before and after June 15, 2020, across various attributes like regions, platforms, age bands, demographics, and customer types. Identify and provide the attribute with the highest negative impact on sales, detailing the average percentage change in sales for that attribute.", "type": "Local"}
{"instance_id": "local130", "instruction": "Could you provide a list of last names for all students who completed English courses, including their quintile ranks based on their grades, and sorted from the highest to the lowest grade quintile?", "type": "Local"}
{"instance_id": "local311", "instruction": "Which constructors had the top 3 combined points from their best driver and team, and in which years did they achieve them?", "type": "Local"}
{"instance_id": "local329", "instruction": "How many unique sessions visited the /regist/input page and then the /regist/confirm page, in that order?", "type": "Local"}
{"instance_id": "local170", "instruction": "Which states have a consistently non-zero retention rate for legislators of each gender across every two-year interval (0, 2, 4, 6, 8, 10) during the first 10 years after they begin serving? Please provide state name abbreviation.", "type": "Local"}
{"instance_id": "local141", "instruction": "How did each salesperson's annual total sales compare to their annual sales quota? Provide the difference between their total sales and the quota for each year, organized by salesperson and year.", "type": "Local"}
{"instance_id": "local310", "instruction": "List the three years where the sum of the highest points achieved by any driver and any constructor was the lowest", "type": "Local"}
{"instance_id": "local114", "instruction": "Provide a detailed web sales report for each region, including the number of orders, total sales amount, and the name and sales amount of the top-selling sales representative in each region", "type": "Local"}
{"instance_id": "local344", "instruction": "How many times has each type of overtake occurred in Formula 1?", "type": "Local"}
{"instance_id": "local171", "instruction": "What is the number of male legislators from Louisiana who have served more than 30 years since their first term, grouped by their years of service, for periods less than 50 years?", "type": "Local"}
{"instance_id": "local003", "instruction": "According to the RFM definition document, how much is the average sales per order for each customer within distinct RFM segments, considering only 'delivered' orders? Please rank the customers into segments to analyze differences in average sales across these segments", "type": "Local"}
{"instance_id": "local209", "instruction": "What is the ratio of completed orders to total orders for the store with the highest number of orders?", "type": "Local"}
{"instance_id": "local004", "instruction": "Could you tell me the number of orders, average payment per order and customer lifespan in weeks of the 3 customers with the highest average payment per order. Attention: I want the lifespan in float number if it's longer than one week, otherwise set it to be 1.0.", "type": "Local"}
{"instance_id": "local032", "instruction": "Could you help me find the sellers respectively with the highest number of distinct customers, highest profit, highest number of distinct orders, and most 5-star ratings, in delivered orders, along with their corresponding values? ", "type": "Local"}
{"instance_id": "local035", "instruction": "Please help me find two adjacent cities with the greatest distance between them.", "type": "Local"}
{"instance_id": "local253", "instruction": "For the salary dataset, create a detailed SQL report that compares the top 5 companies by average salary in Mumbai, Pune, New Delhi, and Hyderabad to the national average salary. The report should include the following columns: Location, Company Name, Average Salary in State, and Average Salary in Country.", "type": "Local"}
{"instance_id": "local061", "instruction": "What is the average monthly projected sales in USD for France in 2021? Please use data from 2019 and 2020 for projection. Ensure all values are converted to USD based on the 2021 exchange rates.", "type": "Local"}
{"instance_id": "local298", "instruction": "For each month, calculate the total balance from all users for the previous month (measured as of the 1st of each month), replacing any negative balances with zero. Ensure that data from the first month is used only as a baseline for calculating previous total balance, and exclude it from the final output. Sort the results in ascending order by month. ", "type": "Local"}
{"instance_id": "local066", "instruction": "Based on our customer pizza order information, summarize the total quantity of each ingredient used in the pizzas we delivered. Output the name and quantity for each ingredient.", "type": "Local"}
{"instance_id": "local059", "instruction": "For the calendar year 2021, what is the overall average quantity sold of the top three best-selling hardware products (by total quantity sold) in each division?", "type": "Local"}
{"instance_id": "local050", "instruction": "What is the median value from average monthly projected sales in USD for France in 2021? Please use data from 2019 and 2020 for projection.", "type": "Local"}
{"instance_id": "local262", "instruction": "Which models have more instances where traditional models perform worse than the Stack model than the total times these models were evaluated across all steps and versions?", "type": "Local"}
{"instance_id": "local068", "instruction": "Calculate the number of new cities inserted each April, May, and June, along with the year-over-year growth percentage for each month from 2021-2023. List the year, month, the total number of cities added, the running cumulative total, and the year-over-year growth percentage (including \"%\") for both the monthly total and the running total. Ensure that 2021 data is used only as a baseline for calculating growth rates, and exclude it from the final output.", "type": "Local"}
{"instance_id": "local034", "instruction": "Could you help me calculate the average of the total payment count for the most preferred payment method in each product category?", "type": "Local"}
{"instance_id": "local201", "instruction": "Identify the first 10 words (of length 4 to 5, starting with 'r') sorted alphabetically that have at least one anagram. Provide the count of anagrams for each word.", "type": "Local"}
{"instance_id": "local230", "instruction": "Determine the top three genres with the most movies rated above 8, and then identify the top four directors who have directed the most films rated above 8 within those genres. List these directors and their respective movie counts.", "type": "Local"}
{"instance_id": "local002", "instruction": "Can you calculate the 5-day symmetric moving average of predicted toy sales for December 5 to 8, 2018, using daily sales data from January 1, 2017, to August 29, 2018, with a simple linear regression model? Provide the total of the moving averages for those four days.", "type": "Local"}
{"instance_id": "local056", "instruction": "Which customer has the highest average monthly change in payment amounts? Provide the customer's full name.", "type": "Local"}
{"instance_id": "local264", "instruction": "Which model category (L1_model) appears the most frequently across all steps and versions when comparing traditional models to the Stack model, and what is the total count of its occurrences?", "type": "Local"}
{"instance_id": "local297", "instruction": "I\u2019d like to know the percentage of users whose closing balances showed a growth rate of more than 5% in the most recent month (measured as of the 1st of each month). If the previous month\u2019s balance is zero, calculate the growth rate by multiplying the current balance by 100.", "type": "Local"}
{"instance_id": "local263", "instruction": "Which L1_model has the highest occurrence for each status ('strong,' where the maximum test score for non-'Stack' models is less than the 'Stack' score, and 'soft,' where it equals the 'Stack' score), and how many times does it occur?", "type": "Local"}
{"instance_id": "local067", "instruction": "Can you provide the highest and lowest profits for Italian customers segmented into ten evenly divided tiers based on their December 2021 sales profits?", "type": "Local"}
{"instance_id": "local058", "instruction": "Can you provide a list of hardware product segments along with their unique product counts for 2020 in the output, ordered by the highest percentage increase in unique fact sales products from 2020 to 2021?", "type": "Local"}
{"instance_id": "local060", "instruction": "What is the change in each product\u2019s share of total sales in the top 20% of products (by sales), between Q4 of 2019 and 2020, in the US? Include only products with no promotions in Q4 2019 or Q4 2020, and provide results in decreasing order of the change in sales share.", "type": "Local"}
{"instance_id": "local299", "instruction": "Could you calculate each user\u2019s average balance over the past 30 days, computed daily? Then, for each month (based on the 1st of each month), find the highest of these daily averages for each user. Add up these maximum values across all users for each month as the final result. Please use the first month as a baseline for previous balances and exclude it from the output.", "type": "Local"}