- Setup: how to set up this repo
- Importers: how to create LifeLog entries from several data sources.
- Sample Dataset: a sampled set of anonymized data for testing
- Data Visualization: a ReactJS-based visualization frontend of the personal timeline
- Question Answering: an LLM-based QA engine over the personal timeline
- TimelineQA: a synthetic benchmark for evaluating personal timeline QA systems
- Install Docker Desktop from this link.
- Follow the install steps and use the Desktop app to start the Docker engine.
- Install git-lfs and clone the repo. You may need a conda env to do that:
conda create -n personal-timeline python=3.10
conda activate personal-timeline
conda install -c conda-forge git-lfs
git lfs install
git clone https://github.com/facebookresearch/personal-timeline
cd personal-timeline
- Run init script (needs python)
sh src/init.sh
This creates the files, folders, and symlinks needed to run the app.
It also creates a new directory, ~/personal-data, under your home folder; this is where your personal data will reside.
Ingestion is configured through the parameters in the conf/ingest.conf file. The defaults are tuned for optimized processing and don't need to be changed.
You can adjust these parameters to run the importers with a different configuration.
- To set up a Google Maps API key (free), follow these instructions.
Copy the following line to env/frontend.env.list:
GOOGLE_MAP_API=<the API key goes here>
- To embed Spotify, you need to set up the Spotify API (free) by following the instructions here. You need to log in with a Spotify account, create a project, and show the secret.
Copy the following lines to env/frontend.env.list:
SPOTIFY_TOKEN=<the token goes here>
SPOTIFY_SECRET=<the secret goes here>
- Set up an OpenAI API key following these instructions.
Copy the following line to env/frontend.env.list:
OPENAI_API_KEY=<the API key goes here>
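Before starting the containers, it can help to confirm that the keys listed above were actually filled in. The following is a minimal sketch, not part of the repo: it assumes env/frontend.env.list holds plain KEY=VALUE lines, and the helper names `parse_env_list` and `missing_keys` are hypothetical.

```python
from pathlib import Path

# Keys named in the setup steps above.
EXPECTED_KEYS = ["GOOGLE_MAP_API", "SPOTIFY_TOKEN", "SPOTIFY_SECRET", "OPENAI_API_KEY"]


def parse_env_list(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments (assumed format)."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values, expected=EXPECTED_KEYS):
    """Return expected keys that are absent, empty, or still a <placeholder>."""
    missing = []
    for key in expected:
        value = values.get(key, "")
        if not value or (value.startswith("<") and value.endswith(">")):
            missing.append(key)
    return missing


if __name__ == "__main__":
    # Path taken from the steps above, relative to the repo root.
    env_path = Path("env/frontend.env.list")
    text = env_path.read_text() if env_path.exists() else ""
    print("Missing or placeholder keys:", missing_keys(parse_env_list(text)))
```

Run it from the repo root. The Google Maps and Spotify keys only matter if you want those features, so a missing key there may be intentional.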
We currently support 9 data sources. Here is a summary table:
Digital Services | Instructions | Destinations | Use cases |
---|---|---|---|
Apple Health | Link | personal-data/apple-health | Exercise patterns, calorie counts |
Amazon | Link | personal-data/amazon | Product recommendation, purchase history summarization |
Amazon Kindle | Link | personal-data/amazon-kindle | Book recommendation |
Spotify | Link | personal-data/spotify | Music / streaming recommendation |
Venmo | Link | personal-data/venmo | Monthly spend summarization |
Libby | Link | personal-data/libby | Book recommendation |
Google Photos | Link | personal-data/google_photos | Food recommendation, object detection, and more |
Google Location | Link | personal-data/google-timeline/Location History/Semantic Location History | Location tracking / visualization |
Facebook posts | Link | personal-data/facebook | Question-Answering over FB posts / photos |
If you have a different data source not listed above, follow the instructions here to add this data source to the importer.
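As a quick sanity check after copying your exports, the sketch below counts the files found in each destination folder from the table above. It is an illustrative helper rather than part of the repo; the function names are hypothetical, and it assumes the destinations live under ~/personal-data.

```python
from pathlib import Path

# Destination folders copied from the summary table above (relative to ~/personal-data).
DESTINATIONS = {
    "Apple Health": "apple-health",
    "Amazon": "amazon",
    "Amazon Kindle": "amazon-kindle",
    "Spotify": "spotify",
    "Venmo": "venmo",
    "Libby": "libby",
    "Google Photos": "google_photos",
    "Google Location": "google-timeline/Location History/Semantic Location History",
    "Facebook posts": "facebook",
}


def count_files(folder):
    """Count regular files under folder, recursively; 0 if the folder does not exist."""
    folder = Path(folder)
    if not folder.is_dir():
        return 0
    return sum(1 for p in folder.rglob("*") if p.is_file())


def summarize(root):
    """Map each data source to the number of files found in its destination folder."""
    root = Path(root)
    return {name: count_files(root / rel) for name, rel in DESTINATIONS.items()}


if __name__ == "__main__":
    for name, n in summarize(Path.home() / "personal-data").items():
        status = f"{n} file(s)" if n else "empty or missing"
        print(f"{name:16s} {status}")
```

A source reported as "empty or missing" is not an error by itself; it only means there is no data in that folder for that source.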
- You can download your Google Photos and location data (as well as Gmail, Maps, and Google Calendar data) from Google Takeout.
- The Google Takeout download comes as multiple zip files. Unzip all of them.
- For Google Photos, move all the unzipped folders into ~/personal-data/google_photos/. There can be any number of sub-folders under google_photos.
- For Google locations, move the unzipped files to personal-data/google-timeline/Location History/Semantic Location History.
- Go to Facebook Settings
- Click on Download your information and download FB data in JSON format
- Unzip the downloaded file and copy the posts sub-folder to ~/personal-data/facebook. The posts folder should sit directly under the facebook folder.
- Go to the Apple Health app on your phone and export your data. This creates a file called iwatch.xml, which is the input file to the importer.
- Move the exported file to ~/personal-data/apple-health.
- Request your data from Amazon here: https://www.amazon.com/gp/help/customer/display.html?nodeId=GXPU3YPMBZQRWZK2
They say it can take up to 30 days, but in our experience it took about 2 days. They'll email you when it's ready.
They separate Amazon purchases from Kindle purchases into two different directories.
The file you need for Amazon purchases is Retail.OrderHistory.1.csv. The file you need for Kindle purchases is Digital Items.csv.
- Move the Amazon purchase data to the ~/personal-data/amazon folder and the Kindle data to the ~/personal-data/amazon-kindle folder.
- Download your data from Venmo here: https://help.venmo.com/hc/en-us/articles/360016096974-Transaction-History
- Move the data into the ~/personal-data/venmo folder.
- Download your data from Libby here: https://libbyapp.com/timeline/activities (click on Actions, then Export Timeline).
- Move the data into the ~/personal-data/libby folder.
- Download your data from Spotify here: https://support.spotify.com/us/article/data-rights-and-privacy-settings/
They say it can take up to 30 days, but in our experience it took about 2 days. They'll email you when it's ready.
- Move the data into the ~/personal-data/spotify folder.
Now that we have all the data and settings in place, we can either run individual steps or the end-to-end system. This will import your photo data to SQLite (this is what will go into the episodic database), build summaries, and make the data available for visualization and search.
Running the ingestion container adds two types of files to the ~/personal-data/app_data folder:
- An SQLite DB named raw_data.db, into which your data is imported
- CSV exports of your personal data, such as books.csv, exercise.csv, etc.
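To see what ingestion produced, you can inspect both outputs with the Python standard library. The sketch below makes no assumption about the schema of raw_data.db: it only lists whatever tables exist, with their row counts, and the CSV file names next to it. The function names are hypothetical.

```python
import sqlite3
from pathlib import Path


def table_row_counts(db_path):
    """Return {table_name: row_count} for every user table in the SQLite file."""
    conn = sqlite3.connect(str(db_path))
    try:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
        counts = {}
        for name in names:
            # Quote the identifier so unusual table names are handled safely.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
    finally:
        conn.close()


def list_csv_exports(app_data_dir):
    """Return the sorted names of the CSV files directly inside app_data_dir."""
    return sorted(p.name for p in Path(app_data_dir).glob("*.csv"))


if __name__ == "__main__":
    app_data = Path.home() / "personal-data" / "app_data"
    db_file = app_data / "raw_data.db"
    if db_file.exists():
        print("Tables:", table_row_counts(db_file))
        print("CSV exports:", list_csv_exports(app_data))
    else:
        print(f"{db_file} not found; run the ingestion container first.")
```

An empty table or a missing CSV usually means the corresponding source folder had no data when ingestion ran.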
To run the pipeline end-to-end (with frontend and QA backend), simply run
docker-compose up -d --build
You can also run ingestion, visualization, and the QA engine separately. To start data ingestion, use
docker-compose up -d backend --build
Once the docker command has run, you can see the running containers for the backend and frontend in the Docker Desktop UI. Copy the container ID of the ingestion container and view its logs by running the following command:
docker logs -f <container_id>
To start the visualization frontend:
docker-compose up -d frontend --build
Running the frontend starts a ReactJS UI at http://localhost:3000. See here for more details.
We provide an anonymized sample dataset for testing the UI and QA system; see here for more details.
The QA engine is based on PostText, a QA system for answering queries that require computing aggregates over personal data.
PostText reference (https://arxiv.org/abs/2306.01061):
@article{tan2023posttext,
title={Reimagining Retrieval Augmented Language Models for Answering Queries},
author={Wang-Chiew Tan and Yuliang Li and Pedro Rodriguez and Richard James and Xi Victoria Lin and Alon Halevy and Scott Yih},
journal={arXiv preprint arXiv:2306.01061},
year={2023},
}
To start the QA engine, run:
docker-compose up -d qa --build
The QA engine runs on a Flask server inside a Docker container at http://localhost:8085.
See here for more details.
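If a page does not load, a first step is to check whether anything is listening on the ports mentioned above (3000 for the frontend, 8085 for the QA engine). The sketch below only tests TCP reachability and makes no assumption about the HTTP routes either service exposes; the helper name `is_port_open` is hypothetical.

```python
import socket

# Ports taken from the text above.
SERVICES = {"frontend": 3000, "qa": 8085}


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, port in SERVICES.items():
        state = "reachable" if is_port_open("localhost", port) else "not reachable"
        print(f"{name} (http://localhost:{port}): {state}")
```

A port that is not reachable typically means the container is still building or has exited; the container logs are the next place to look.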
There are 3 options for the QA engine.
- ChatGPT: uses OpenAI's gpt-3.5-turbo API without the personal timeline as context. It answers world-knowledge questions such as "what is the GDP of US in 2021" but not personal questions.
- Retrieval-based: answers questions by retrieving the top-k most relevant episodes from the personal timeline as the LLM's context. It can answer questions over the personal timeline such as "show me some plants in my neighborhood".
- View-based: translates the input question to a (customized) SQL query over tabular views (e.g., books, exercise, etc.) of the personal timeline. This QA engine is good at answering aggregate queries ("how many books did I purchase?") and min/max queries ("when was the last time I traveled to Japan?").
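To make the view-based option more concrete, here is a small illustration of the kind of SQL such an engine might produce for an aggregate query and a max query. It is a sketch under stated assumptions, not the PostText implementation: the `books` view name comes from the text above, but the columns (`title`, `purchase_date`), the rows, and the hand-written SQL are all hypothetical. In the real system, the LLM generates the query.

```python
import sqlite3


def build_demo_view():
    """Create an in-memory 'books' table with made-up rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE books (title TEXT, purchase_date TEXT)")
    conn.executemany("INSERT INTO books VALUES (?, ?)", [
        ("Book A", "2023-03-28"),
        ("Book B", "2023-04-02"),
        ("Book C", "2023-04-19"),
    ])
    return conn


# "How many books did I purchase in April?" as an aggregate query.
COUNT_APRIL = "SELECT COUNT(*) FROM books WHERE strftime('%m', purchase_date) = '04'"

# "When was the last time I purchased a book?" as a max query.
LAST_PURCHASE = "SELECT MAX(purchase_date) FROM books"


def answer(conn, sql):
    """Run a single-value SQL query and return its scalar result."""
    return conn.execute(sql).fetchone()[0]


if __name__ == "__main__":
    conn = build_demo_view()
    print("Books purchased in April:", answer(conn, COUNT_APRIL))
    print("Most recent purchase:", answer(conn, LAST_PURCHASE))
    conn.close()
```

Counting and min/max are exact in SQL, whereas a retrieval-based engine sees only the top-k retrieved episodes; this is why the view-based engine suits aggregate questions.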
Example questions you may try:
Show me some photos of plants in my neighborhood
Which cities did I visit when I traveled to Japan?
How many books did I purchase in April?
TimelineQA is a synthetic benchmark for accelerating progress on querying personal timelines. TimelineQA generates lifelogs of imaginary people. The episodes in the lifelog range from major life episodes such as high school graduation to those that occur on a daily basis such as going for a run. We have evaluated SOTA models for atomic and multi-hop QA on the benchmark.
Please check out the TimelineQA GitHub repo and the TimelineQA paper (https://arxiv.org/abs/2306.01069):
@article{tan2023timelineqa,
title={TimelineQA: A Benchmark for Question Answering over Timelines},
author={Tan, Wang-Chiew and Dwivedi-Yu, Jane and Li, Yuliang and Mathias, Lambert and Saeidi, Marzieh and Yan, Jing Nathan and Halevy, Alon Y},
journal={arXiv preprint arXiv:2306.01069},
year={2023}
}
The codebase is licensed under the Apache 2.0 license.
See contributing and the code of conduct.
We'd like to thank the following contributors for their contributions to this project:
- Tripti Singh
- Design and implementation of the sqlite DB backend
- Designing a pluggable data import and enrichment layer and building the pipeline orchestrator.
- Importers for all six data sources
- Generic csv and json data sources importer with instructions
- Dockerization
- Contributions to the documentation
- Wang-Chiew Tan
- Implementation of the PostText query engine
- Pierre Moulon for providing open-sourcing guidelines and suggestions