-
Notifications
You must be signed in to change notification settings - Fork 705
SQL for machine learning
Yi Wang edited this page Oct 19, 2018
·
3 revisions
Let us start with a very simple case.
Suppose that we want to regress the salary with respect the age and the gender, we'd train a model using the following SQL statement:
SELECT age, gender, salary
FROM engineer_info, engineer_payment
WHERE engineer_info.id = engieer_payment.id
TRAIN DNNRegressor
WITH hidden_units = [10, 30]
COLUMN clip(age, 18, 65), gender, cross(clip(age, 18, 65), gender)
LABEL salary
INTO my_first_model
;
This generates a table my_first_model
, which encode
- inputs: age, gender
- columns: clipped age, gender, and the cross of clipped age and gender
- the label: salary
We see that we need both SELECT to specify the fields to retrieve and COLUMN for fields-to-feature mapping.
Given this model, we can infer the salary for any other group of people. For example, the execution of the following statement
SELECT id, age, sex
FROM another_company_employee_info
INFER my_first_model
COLUMN age, vocab(sex, ["Female", "Male"])
LABEL expected_salary
INTO a_new_table
should generate a new table a_new_table
with fields:
- id, age, sex from SELECT, and
- expected_salary from LABEL
Again, we need COLUMN in addition to SELECT to map different field names, and even field values, to the features acceptable by the model.