Tutorial
This tutorial is split into two parts.
- The first part explains how to use minifold primitives on a list of dictionaries. This part is especially useful if you are not familiar with SQL.
- The second part illustrates how to build minifold pipelines using connectors. Such pipelines separate the user's needs (the Query) from the processing required to obtain the dictionaries. For instance, through a single pipeline, you can query several endpoints and aggregate their results for a given query.
This section starts with a simple example to present minifold primitives. You can run ipython3 and copy/paste the following lines of code to try them yourself.
Minifold primitives process a list of dictionaries that are expected to share the same set of keys, and return a list of dictionaries.
users = [
    {
        "firstname" : "John",
        "lastname" : "Doe"
    }, {
        "firstname" : "John",
        "lastname" : "Connor"
    }, {
        "firstname" : "Peter",
        "lastname" : "Parker"
    }
]
Now, let's see some minifold primitives.
Suppose you want to fetch last names. Run:
from pprint import pprint
from minifold.select import select
pprint(select(users, ["lastname"]))
Result:
[{'lastname': 'Doe'}, {'lastname': 'Connor'}, {'lastname': 'Parker'}]
Similarly, you could get firstnames as follows:
pprint(select(users, ["firstname"]))
Result:
[{'firstname': 'John'}, {'firstname': 'John'}, {'firstname': 'Peter'}]
Suppose you want to get only distinct firstnames:
from pprint import pprint
from minifold.select import select
from minifold.unique import unique
pprint(
    unique(
        ["firstname"],
        select(users, ["firstname"])
    )
)
Result:
[{'firstname': 'John'}, {'firstname': 'Peter'}]
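If you think in plain Python rather than SQL, the two primitives used above can be approximated as follows. This is only a sketch of the behaviour observed in this tutorial, not minifold's implementation: `select_sketch` and `unique_sketch` are hypothetical names, and keeping the first occurrence of each duplicate is an assumption inferred from the result shown above.

```python
def select_sketch(entries: list, attributes: list) -> list:
    """Keep only the requested keys of each dictionary (hypothetical helper)."""
    return [{key: entry[key] for key in attributes} for entry in entries]


def unique_sketch(attributes: list, entries: list) -> list:
    """Drop entries whose values for `attributes` were already seen.

    Assumption: the first occurrence is the one that is kept.
    """
    seen = set()
    kept = []
    for entry in entries:
        signature = tuple(entry[key] for key in attributes)
        if signature not in seen:
            seen.add(signature)
            kept.append(entry)
    return kept


users = [
    {"firstname": "John", "lastname": "Doe"},
    {"firstname": "John", "lastname": "Connor"},
    {"firstname": "Peter", "lastname": "Parker"},
]

firstnames = unique_sketch(["firstname"], select_sketch(users, ["firstname"]))
print(firstnames)  # [{'firstname': 'John'}, {'firstname': 'Peter'}]
```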
Suppose you only want to keep users having the firstname "John":
- Using a dedicated function:
from minifold.where import where
def my_filter(user: dict) -> bool:
    return user["firstname"] == "John"
pprint(where(users, my_filter))
- Using a lambda function:
from minifold.where import where
pprint(where(users, lambda user: user["firstname"] == "John"))
Result:
[{'firstname': 'John', 'lastname': 'Doe'},
{'firstname': 'John', 'lastname': 'Connor'}]
Suppose you want to add a key is_spiderman to each dictionary, whose value is True if and only if the record is related to Peter Parker:
from minifold.lambdas import lambdas
pprint(
    lambdas(
        {
            "is_spiderman" : lambda user: user["firstname"] == "Peter" \
                and user["lastname"] == "Parker"
        },
        users
    )
)
Result:
[{'firstname': 'John', 'is_spiderman': False, 'lastname': 'Doe'},
{'firstname': 'John', 'is_spiderman': False, 'lastname': 'Connor'},
{'firstname': 'Peter', 'is_spiderman': True, 'lastname': 'Parker'}]
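To make the role of each lambda explicit, here is a plain-Python approximation: every function receives the whole dictionary and its return value is stored under the associated key. `lambdas_sketch` is a hypothetical helper, not minifold's code. In particular, this sketch returns copies, whereas whether minifold copies the dictionaries or updates them in place is not covered by this tutorial.

```python
def lambdas_sketch(functions: dict, entries: list) -> list:
    """Return copies of `entries`, each extended with computed keys.

    `functions` maps a new key to a callable taking the whole entry.
    """
    return [
        {**entry, **{key: function(entry) for key, function in functions.items()}}
        for entry in entries
    ]


users = [
    {"firstname": "John", "lastname": "Doe"},
    {"firstname": "John", "lastname": "Connor"},
    {"firstname": "Peter", "lastname": "Parker"},
]

flagged = lambdas_sketch(
    {
        "is_spiderman": lambda user: (
            user["firstname"] == "Peter" and user["lastname"] == "Parker"
        )
    },
    users,
)
print([user["is_spiderman"] for user in flagged])  # [False, False, True]
```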
To discover other primitives, visit the Framework page.
We will start with the simplest connector: EntriesConnector. This is just a wrapper around a collection of dictionaries. Let's start from the previous example:
from minifold.entries_connector import EntriesConnector
users = [
    {
        "firstname" : "John",
        "lastname" : "Doe"
    }, {
        "firstname" : "John",
        "lastname" : "Connor"
    }, {
        "firstname" : "Peter",
        "lastname" : "Parker"
    }
]
connector = EntriesConnector(users)
You can now query this connector using the query method. As usual, it returns a list of dictionaries. By default, a Query fetches everything.
from pprint import pprint
from minifold.query import Query
q = Query()
entries = connector.query(q)
pprint(entries)
Result:
[{'firstname': 'John', 'lastname': 'Doe'},
{'firstname': 'John', 'lastname': 'Connor'},
{'firstname': 'Peter', 'lastname': 'Parker'}]
A Query object can carry "instructions" indicating which dictionaries you are interested in. For example, you can request a subset of keys (using the attributes parameter) or only the dictionaries matching some constraints (using the filters parameter).
from pprint import pprint
from minifold.query import Query
q = Query(
    attributes = ["lastname"],
    filters = lambda user: user["firstname"] == "John"
)
entries = connector.query(q)
pprint(entries)
Result:
[{'lastname': 'Doe'}, {'lastname': 'Connor'}]
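One way to picture the query above: the filter has to be evaluated while firstname is still available, and only then are the dictionaries reduced to lastname. The sketch below mimics that order in plain Python. `QuerySketch` and `run_query` are hypothetical names used for illustration, and the order of operations inside minifold is an assumption inferred from the result shown above.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class QuerySketch:
    """Hypothetical stand-in for a query: which keys, and which entries."""
    attributes: Optional[list] = None                  # None: keep every key
    filters: Optional[Callable[[dict], bool]] = None   # None: keep every entry


def run_query(entries: list, query: QuerySketch) -> list:
    # 1) Filter first, while every key is still present.
    if query.filters is not None:
        entries = [entry for entry in entries if query.filters(entry)]
    # 2) Then keep only the requested keys.
    if query.attributes is not None:
        entries = [{key: entry[key] for key in query.attributes} for entry in entries]
    return entries


users = [
    {"firstname": "John", "lastname": "Doe"},
    {"firstname": "John", "lastname": "Connor"},
    {"firstname": "Peter", "lastname": "Parker"},
]

result = run_query(
    users,
    QuerySketch(
        attributes=["lastname"],
        filters=lambda user: user["firstname"] == "John",
    ),
)
print(result)  # [{'lastname': 'Doe'}, {'lastname': 'Connor'}]
```

Swapping the two steps would fail here, because the filter reads a key that the projection has already removed.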
Suppose we now want to build a pipeline in charge of returning the set of distinct firstnames appearing in two collections. To this end:
- We need to wrap those two collections, using EntriesConnector.
- We need to merge them, using UnionConnector.
- We can keep only firstnames, using SelectConnector, depending on whether we want to keep lastnames or not.
- We can remove duplicates, using UniqueConnector.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from pprint import pprint
from minifold.entries_connector import EntriesConnector
from minifold.query import Query
from minifold.union import UnionConnector
from minifold.unique import UniqueConnector
boys = [
    {
        "firstname" : "John",
        "lastname" : "Doe"
    }, {
        "firstname" : "John",
        "lastname" : "Connor"
    }, {
        "firstname" : "Peter",
        "lastname" : "Parker"
    }
]
girls = [
    {
        "firstname" : "Sarah",
        "lastname" : "Connor"
    }, {
        "firstname" : "Jane",
        "lastname" : "Doe"
    }
]
pipeline = UniqueConnector(
    ["firstname"],
    UnionConnector([
        EntriesConnector(boys),
        EntriesConnector(girls)
    ])
)
Let's run a simple query:
q = Query()
entries = pipeline.query(q)
pprint(entries)
Results:
[{'firstname': 'John', 'lastname': 'Doe'},
{'firstname': 'Peter', 'lastname': 'Parker'},
{'firstname': 'Sarah', 'lastname': 'Connor'},
{'firstname': 'Jane', 'lastname': 'Doe'}]
Let's run a more elaborate query:
q = Query(attributes = ["firstname"])
entries = pipeline.query(q)
pprint(entries)
Results:
[{'firstname': 'John'},
{'firstname': 'Peter'},
{'firstname': 'Sarah'},
{'firstname': 'Jane'}]
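The pipeline above follows a simple pattern: each connector wraps one or more child connectors, asks them for their entries, and post-processes what they return. The plain-Python sketch below reproduces that pattern for this specific example. All class names are hypothetical and the code is not minifold's implementation. For simplicity, every wrapper asks its children for complete entries and only applies the attribute selection at the end, which is an assumption of this sketch.

```python
def project(entries, attributes):
    """Keep only `attributes` in each entry (all keys if None)."""
    if attributes is None:
        return entries
    return [{key: entry[key] for key in attributes} for entry in entries]


class EntriesSketch:
    """Leaf of the pipeline: wraps an in-memory list of dictionaries."""
    def __init__(self, entries):
        self.entries = entries

    def query(self, attributes=None):
        return project(self.entries, attributes)


class UnionSketch:
    """Concatenates the results of several child connectors."""
    def __init__(self, children):
        self.children = children

    def query(self, attributes=None):
        merged = [entry for child in self.children for entry in child.query()]
        return project(merged, attributes)


class UniqueSketch:
    """Keeps the first entry seen for each combination of `keys`."""
    def __init__(self, keys, child):
        self.keys = keys
        self.child = child

    def query(self, attributes=None):
        seen, kept = set(), []
        for entry in self.child.query():  # complete entries from the child
            signature = tuple(entry[key] for key in self.keys)
            if signature not in seen:
                seen.add(signature)
                kept.append(entry)
        return project(kept, attributes)


boys = [
    {"firstname": "John", "lastname": "Doe"},
    {"firstname": "John", "lastname": "Connor"},
    {"firstname": "Peter", "lastname": "Parker"},
]
girls = [
    {"firstname": "Sarah", "lastname": "Connor"},
    {"firstname": "Jane", "lastname": "Doe"},
]

sketch_pipeline = UniqueSketch(
    ["firstname"],
    UnionSketch([EntriesSketch(boys), EntriesSketch(girls)]),
)
firstnames = sketch_pipeline.query(["firstname"])
print(firstnames)
```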
Now, suppose you want to build another pipeline which adds a gender key on top of those two collections. This can be done using LambdasConnector. Here, we assume that boys and girls are well-separated:
from minifold.lambdas import LambdasConnector
pipeline = UnionConnector([
    LambdasConnector(
        {"gender" : lambda boy: "male"},
        EntriesConnector(boys)
    ),
    LambdasConnector(
        {"gender" : lambda girl: "female"},
        EntriesConnector(girls)
    )
])
q = Query()
entries = pipeline.query(q)
pprint(entries)
Results:
[{'firstname': 'John', 'gender': 'male', 'lastname': 'Doe'},
{'firstname': 'John', 'gender': 'male', 'lastname': 'Connor'},
{'firstname': 'Peter', 'gender': 'male', 'lastname': 'Parker'},
{'firstname': 'Sarah', 'gender': 'female', 'lastname': 'Connor'},
{'firstname': 'Jane', 'gender': 'female', 'lastname': 'Doe'}]
Of course, if your collections mix men and women, you need a more elaborate lambda:
from minifold.lambdas import LambdasConnector
def gender(user: dict) -> str:
    return (
        "male" if user["firstname"] in {"John", "Peter"} else
        "female" if user["firstname"] in {"Jane", "Sarah"} else
        "?"
    )
pipeline = UnionConnector([
    LambdasConnector(
        {"gender" : gender},
        EntriesConnector(boys)
    ),
    LambdasConnector(
        {"gender" : gender},
        EntriesConnector(girls)
    )
])
q = Query()
entries = pipeline.query(q)
pprint(entries)
Results:
[{'firstname': 'John', 'gender': 'male', 'lastname': 'Doe'},
{'firstname': 'John', 'gender': 'male', 'lastname': 'Connor'},
{'firstname': 'Peter', 'gender': 'male', 'lastname': 'Parker'},
{'firstname': 'Sarah', 'gender': 'female', 'lastname': 'Connor'},
{'firstname': 'Jane', 'gender': 'female', 'lastname': 'Doe'}]
The principle remains the same. Instead of using EntriesConnector, you just rely on other connectors, depending on the nature of the data source.
- If the data source is remote, it is a good idea to use CacheConnector. This way, you avoid sending too many queries to an API that could blacklist you, and you improve the performance of your application. Browse this page to discover the full list of connectors.
- If the data source requires credentials, it is a good idea to configure a template using Config. This way, your credentials are not hard-coded in your script. Browse this page to discover how to configure templates.
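To illustrate why caching helps with a remote source, here is a minimal plain-Python sketch of the idea: the expensive call is made once per distinct request and the stored answer is reused afterwards. `CachedSource` and `fake_remote_fetch` are hypothetical names. This does not reflect how CacheConnector stores, keys, or expires its results.

```python
class CachedSource:
    """Remembers the answer for each key, so the source is hit once per key."""
    def __init__(self, fetch):
        self.fetch = fetch   # callable performing the (expensive) remote call
        self.cache = {}      # key -> previously fetched entries

    def query(self, key):
        if key not in self.cache:
            self.cache[key] = self.fetch(key)
        return self.cache[key]


calls = []  # records every call that actually reaches the "remote" side


def fake_remote_fetch(lastname):
    """Stand-in for a remote API call (hypothetical, returns canned data)."""
    calls.append(lastname)
    return [{"firstname": "Sarah", "lastname": lastname}]


source = CachedSource(fake_remote_fetch)
first = source.query("Connor")
second = source.query("Connor")   # served from the cache, no remote call
print(len(calls))  # 1
```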
- I advise you to start with a simple connector, e.g. EntriesConnector, to see a minimal example.
- Then, as an exercise, copy this file and try to reimplement JsonConnector using the json package.
- Once you're satisfied, compare your implementation with the minifold one. If everything is clear, feel free to look at how more complex connectors are implemented.
Good luck!