Commit
fix typo + rm dedicated section for df creation
etiennekintzler committed Apr 28, 2024
1 parent 1970274 commit b37edb2
Showing 1 changed file with 17 additions and 70 deletions.
87 changes: 17 additions & 70 deletions python/docs/source/tutorials/python_simplified_dftovw_tuto.ipynb
@@ -10,7 +10,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": null,
"id": "b9a21a43-39ad-4213-9c7f-814bbafd8a54",
"metadata": {},
"outputs": [],
@@ -22,23 +22,23 @@
},
{
"cell_type": "markdown",
"id": "40e311d2-9ae1-4fde-8f13-36e9e0a76510",
"id": "fc831353-b5aa-4bb0-a928-c47b340397a5",
"metadata": {},
"source": [
"### Dataframe definition"
"### Building the example using `DFtoVW.from_column_names`"
]
},
{
"cell_type": "markdown",
"id": "0323cff0-3474-4c4f-8fff-e92d6f14197a",
"id": "c60089f1-ce41-49ee-a3a9-74f0fb2cb34f",
"metadata": {},
"source": [
"Let's create the following pandas dataframe:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": null,
"id": "a31118c2-b315-4129-b28a-2ea37d2dae50",
"metadata": {},
"outputs": [],
@@ -80,43 +80,20 @@
")"
]
},
{
"cell_type": "markdown",
"id": "fc831353-b5aa-4bb0-a928-c47b340397a5",
"metadata": {},
"source": [
"### Building the example using `DftoVW.from_column_names`"
]
},
{
"cell_type": "markdown",
"id": "473e5c72-ab6c-4d72-a466-7352ec604393",
"metadata": {},
"source": [
"The user build the examples using the class method `DftoVW.from_column_names`. The method is called using the dataframe object (`df`) and its various column names. The conversion to vowpal wabbit examples is then performed by calling the `convert_df` method."
"The user builds the examples using the class method `DFtoVW.from_column_names`, which takes the dataframe object (`df`) and the names of its label and feature columns. The conversion to Vowpal Wabbit examples is then performed by calling the `convert_df` method:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": null,
"id": "2be83f6c-ecaa-45cb-bb3f-2f47827d6016",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['0 | age:27 marital-status=Separated education=HS-grad occupation=Handlers-cleaners hours-per-week:25',\n",
" '1 | age:34 marital-status=Married-civ-spouse education=Bachelors occupation=Prof-specialty hours-per-week:40',\n",
" '0 | age:44 marital-status=Never-married education=Assoc-voc occupation=Priv-house-serv hours-per-week:25',\n",
" '1 | age:38 marital-status=Married-civ-spouse education=Bachelors occupation=Prof-specialty hours-per-week:60',\n",
" '0 | age:34 marital-status=Married-civ-spouse education=HS-grad occupation=Other-service hours-per-week:36']"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"converter = DFtoVW.from_column_names(\n",
" df=df, y=\"income\", x=[\"age\", \"marital-status\", \"education\", \"occupation\", \"hours-per-week\"], \n",
@@ -137,7 +114,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": null,
"id": "c0269980-78b3-4123-84eb-27e0fba929b4",
"metadata": {},
"outputs": [],
@@ -162,11 +139,11 @@
"id": "30a526a6-7f8f-48e4-8dca-f9058a0d87fb",
"metadata": {},
"source": [
"The class method `DFtoVW.from_column_names` represents a quick and simple way to build the examples, but if the user needs more control over the way the examples are created, she or he can either the class `Feature` or the class `Namespace` for building features and any of the label class based on the nature of the task (see below). \n",
"The class method `DFtoVW.from_column_names` is a quick and simple way to build the examples. If the user needs more control over how the examples are created, they can use either the `Feature` class or the `Namespace` class to build the features, and any of the available label classes (see below), depending on the nature of the task.\n",
"\n",
"- When using `Namespace` class (see https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Namespaces for the meaning) the user specify the name of the namespace with the `name` field, and will pass one or a list of `Feature` object to the `features` field.\n",
"- When using the `Namespace` class (see https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Namespaces for the meaning), the user specifies the name of the namespace with the `name` field and passes one `Feature` object, or a list of them, to the `features` field.\n",
"\n",
"- The `Feature` class has a `value` field, which is the name of the column. One can also rename the feature using the `rename_feature` field or choose to enforce a type (`\"numerical\"` or `\"categorical\"`) using `as_type` field.\n",
"- The `Feature` class has a `value` field, which is the name of the column. The user can also rename the feature using the `rename_feature` field or enforce a specific type (`\"numerical\"` or `\"categorical\"`) using the `as_type` field.\n",
"\n",
"Regarding the labels, multiple classes are available:\n",
"- `SimpleLabel` for regression\n",
@@ -178,25 +155,10 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": null,
"id": "90a69d90-a0a6-42d4-8867-5d1b0e73f4ec",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['0 |ns_sociodemo age:27 marital-status=Separated education=HS-grad |ns_job occupation=Handlers-cleaners hours-per-week:25',\n",
" '1 |ns_sociodemo age:34 marital-status=Married-civ-spouse education=Bachelors |ns_job occupation=Prof-specialty hours-per-week:40',\n",
" '0 |ns_sociodemo age:44 marital-status=Never-married education=Assoc-voc |ns_job occupation=Priv-house-serv hours-per-week:25',\n",
" '1 |ns_sociodemo age:38 marital-status=Married-civ-spouse education=Bachelors |ns_job occupation=Prof-specialty hours-per-week:60',\n",
" '0 |ns_sociodemo age:34 marital-status=Married-civ-spouse education=HS-grad |ns_job occupation=Other-service hours-per-week:36']"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"from vowpalwabbit.dftovw import SimpleLabel, Namespace, Feature\n",
"\n",
@@ -219,7 +181,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": null,
"id": "f0ed661f-d9a0-4ebb-93b8-f5747347c7b4",
"metadata": {},
"outputs": [],
@@ -245,25 +207,10 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": null,
"id": "06aabeab-2365-4f86-bf60-7043b0e59190",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[('ns_job', 'occupation', 0.0),\n",
" ('ns_job', 'hours-per-week', 0.0019117757910862565),\n",
" ('ns_sociodemo', 'age', 0.001858704723417759),\n",
" ('ns_sociodemo', 'marital-status', 0.0),\n",
" ('ns_sociodemo', 'education', 0.0)]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"[\n",
" (ns.name, feature.name, model_advanced.get_weight_from_name(feature.name, ns.name))\n",
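Note on the cleared outputs: this commit removes the stored cell outputs, so the Vowpal Wabbit text lines no longer appear in the notebook. The sketch below is a pure-Python stand-in, not the `vowpalwabbit` implementation, that shows how the namespace and feature layout in those lines is put together. The helper names `format_feature` and `to_vw_line` are hypothetical, and the sample row is reconstructed from the first line of the removed output; it assumes numerical values are written as `name:value` and categorical values as `name=value`, as in that output.

```python
# Illustrative sketch only: mimics the text layout of the removed outputs.
# It does not call vowpalwabbit; both helper names are hypothetical.

def format_feature(name, value):
    """Write numbers as `name:value` and everything else as `name=value`."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return f"{name}:{value}"
    return f"{name}={value}"


def to_vw_line(row, label, namespaces):
    """Build one line of the form `<label> |<ns> <features> |<ns> <features>`."""
    parts = [str(row[label])]
    for ns_name, columns in namespaces.items():
        features = " ".join(format_feature(col, row[col]) for col in columns)
        parts.append(f"|{ns_name} {features}")
    return " ".join(parts)


# First row, reconstructed from the removed cell output.
row = {
    "income": 0,
    "age": 27,
    "marital-status": "Separated",
    "education": "HS-grad",
    "occupation": "Handlers-cleaners",
    "hours-per-week": 25,
}

# Same grouping as the two namespaces seen in the removed output.
namespaces = {
    "ns_sociodemo": ["age", "marital-status", "education"],
    "ns_job": ["occupation", "hours-per-week"],
}

line = to_vw_line(row, "income", namespaces)
print(line)
```

Passing a single namespace with an empty name, for example `{"": ["age", "hours-per-week"]}`, yields the unnamed-namespace layout (`0 | age:27 hours-per-week:25`) that matches the shape of the `from_column_names` output.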