This repository provides a wide range of datasets and queries, drawn from open data and from our own practice (desensitized where necessary).
The datasets cover many typical domains and have diverse data characteristics (e.g., different numbers of columns and tuples).
The queries are real SQL statements covering various workloads, such as feature extraction, transactions, and analytical queries (coming soon).

name | description | table number | column number | SQL | source |
---|---|---|---|---|---|
GEF2012-wind-forecasting | Hourly power generation at 7 wind farms | 10 | 61 | | kaggle |
electric-power-consumption | Per capita energy consumption in Morocco | 1 | 9 | | kaggle |
energydata_complete | | 2 | 59 | | |
ashrae-energy-prediction | Energy usage from over 1,000 buildings over a three-year timeframe | 5 | 32 | | kaggle |

name | description | table number | column number | SQL | source |
---|---|---|---|---|---|
recruit-restaurant-visitor-forecasting | The browsing statistics of two restaurant websites | 8 | 28 | | kaggle |
santander-customer-satisfaction | Hundreds of anonymized features that could reflect whether a customer is satisfied with their banking experience | 1 | 372 | | kaggle |
GiveMeSomeCredit | Credit features of 250,000 borrowers in a banking scenario | 1 | 13 | | kaggle |
daily-financial-news | Daily financial news for over 6,000 stocks | 2 | 12 | | tianchi |
restaurant-revenue-prediction | Demographic, real estate, and commercial data for the investments of new restaurant sites | 2 | 85 | | kaggle |
homesite-quote-conversion | An anonymized database of information on customer and sales activity | 2 | 597 | | kaggle |
allstate-claims-severity | Insurance claims for worry-free customer experiences | 3 | 265 | | kaggle |
tiantian | Price-related features constructed from fund market data downloaded from the TianTian Fund website | 1 | 332 | | tianchi |
sberbank-russian-housing-market | Information about overall conditions in the country's economy and finance sector | 4 | 685 | | kaggle |
dow_jones_index | | 1 | 16 | | |
robinhood-stock-data | The historical stock price of Robinhood (ticker symbol HOOD) | 1 | 6 | | kaggle |
porto-seguro-safe-driver-prediction | Features that affect whether an auto insurance policyholder files a claim | 1 | 60 | | kaggle |
amex-default-prediction | | 4 | 384 | | |
house-rent-prediction-dataset | Information on roughly 4,700 houses, apartments, and flats available for rent | 1 | 12 | | kaggle |

name | description | table number | column number | SQL | source |
---|---|---|---|---|---|
big-data-derby-2022 | A wealth of collected data, including measurements of heart rate, EKG, longitudinal movement, etc. | 3 | 24 | | kaggle |
predict-west-nile-virus | Weather, location, testing, and spraying data | 5 | 51 | | kaggle |
covid19-global-forecasting-week-2 | Statistics of COVID-19 cases in various locations across the world | 1 | 6 | | kaggle |
covid19-global-forecasting-week-5 | Statistics of COVID-19 cases in various locations across the world | 1 | 9 | | kaggle |
covid19-global-forecasting-week-4 | Statistics of COVID-19 cases in various locations across the world | 1 | 6 | | kaggle |
covid19-global-forecasting-week-1 | Statistics of COVID-19 cases in various locations across the world | 1 | 8 | | kaggle |
covid19-global-forecasting-week-3 | Statistics of COVID-19 cases in various locations across the world | 1 | 6 | | kaggle |

name | description | table number | column number | SQL | source |
---|---|---|---|---|---|
facebook-v-predicting-check-ins | | 3 | 13 | | |
telstra-recruiting-network | | 7 | 18 | | |
twitter-threads | Thread functionality in Twitter | 5 | 35 | | tianchi |
spotify-app-reviews-2022 | Spotify reviews on Google Play Store | 1 | 6 | | kaggle |

name | description | table number | column number | SQL | source |
---|---|---|---|---|---|
PRSA2017_Data_20130301-20170228 | | 12 | 216 | | |
AirQualityUCI | The responses of a gas multisensor device deployed in the field in an Italian city | 1 | 1 | | UCI_ML |
historicalweatherdataforindiancities | Temperature data (minimum, average, maximum) in degrees Centigrade and precipitation data | 7 | 34 | | kaggle |

name | description | table number | column number | SQL | source |
---|---|---|---|---|---|
store-sales-time-series-forecasting | Dates, store and product information | 5 | 22 | | kaggle |
coupon-purchase-prediction | A year of transactional data for 22,873 users on the site ponpare.jp | 9 | 80 | | kaggle |
grupo-bimbo-inventory-demand | 9 weeks of sales transactions in Mexico | 6 | 28 | | kaggle |
rossmann-store-sales | Historical sales data for 1,115 Rossmann stores | 2 | 19 | | kaggle |
favorita-grocery-sales-forecasting | Dates, store and item information, whether that item was being promoted, as well as the unit sales | 6 | 26 | | kaggle |
walmart-recruiting-store-sales-forecasting | | 5 | 26 | | |
walmart-recruiting-sales-in-stormy-weather | Sales data for 111 products whose sales may be affected by the weather (such as milk, bread, umbrellas, etc.) | 4 | 28 | | kaggle |
ecommerce-customerssales-record | Order statistics | 1 | 41 | | kaggle |
competitive-data-science-predict-future-sales | Daily historical sales data | 5 | 16 | | kaggle |
m5-forecasting-accuracy | Item sales at stores in various locations for two 28-day time periods | 3 | 1965 | | kaggle |
goods | Public production introduction information | 41 | 807 | | |
material | Historical inventory statistics | 79 | 1265 | | |
orders | Historical order details | 35 | 809 | | |
shopmall | Comments and shelf status of goods | 35 | 809 | | |
transaction | Order details (query only) | 50 | 1069 | | |

name | description | table number | column number | SQL | source |
---|---|---|---|---|---|
pkdd-15-taxi-trip-time-prediction-ii | | 4 | 24 | | kaggle |
nyc-taxi-trip-duration | NYC Yellow Cab trip record data | 3 | 22 | | kaggle |
taxi-trajectory | A complete year (from 01/07/2013 to 30/06/2014) of the trajectories for all the 442 taxis running in the city of Porto | 1 | 9 | | tianchi |
pkdd-15-predict-taxi-service-trajectory-i | | 4 | 25 | | kaggle |

name | description | table number | column number | SQL | source |
---|---|---|---|---|---|
talkingdata-mobile-user-demographics | | 8 | 34 | | kaggle |
sf-crime | Incidents derived from the SFPD Crime Incident Reporting system | 3 | 57 | | tianchi |
detecting-insults-in-social-commentary | Detect social spam, account hacking, bot attacks, and more | 1 | 5 | | kaggle |
expedia-hotel-recommendations | Customer behavior | 2 | 174 | | kaggle |
nfl-big-data-bowl-2022 | | 7 | 113 | | |
airbnb-recruiting-new-user-bookings | Users along with their demographics, web session records, and some summary statistics | 6 | 51 | | kaggle |
unimelb | Information on the investigators who are applying for the grant | 1 | 251 | | kaggle |
Ipin2016Dataset | | 8 | 314 | | |
dspp1 | | 4 | 19 | | |
lish-moa | | 4 | 1488 | | |
foursquare-location-matching | | 2 | 38 | | |
bike-sharing-demand | The duration of travel, departure location, arrival location, and time elapsed | 1 | 12 | | kaggle |
web-traffic-time-series-forecasting | | 6 | 1363 | | |
web-traffic-time-series-forecasting-1 | | 2 | 553 | | |
korean-baseball-pitching-data-1982-2021 | Team pitching data from every season of KBO Baseball | 1 | 34 | | kaggle |
RSSI_dataset | RSSIs obtained on smartphones | 2 | 12 | | UCI_ML |
DontGetKicked | Car information | 2 | 67 | | kaggle |
cyclistic-bike-share-user-dataset-1-year | Cyclistic bikes | 1 | 18 | | kaggle |
data-science-job-salaries | | 1 | 12 | | |
Hybrid_Indoor_Positioning | | 1 | 67 | | UCI_ML |