Basic Usage
- Install the fbTREX web extension;
- Have a valid Facebook account, and use it with the browser where the fbTREX extension is installed;
- Click on the "Your data" section; this will open a URL in your browser;
- The token is part of the URL; for a simple copy-paste, click on the "Control your data" tab.
You have to create the config files manually. They live in the `config` folder and are passed to each script with the option `-c config/filename`. The default format is the ArgParse format, one `--arg value` pair per line, for example `--name JohnDoe`. A config file should contain at least a value for `--token`. Create a separate file for each user; it can contain the values of `name`, `token` and `path` (the default save path).
If you create the config files yourself, you can refer to this page for the full description of the valid syntax: https://pypi.org/project/ConfigArgParse/
Example:
```
--name JohnDoe
--token abfbc56478cfa6857af7857899c86f8f99f960a96
--path /home/ubuntu/dashboard/outputs/johnny/
```
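For reference, this is a minimal sketch of how a script can wire those files up with ConfigArgParse; the exact argument set lives in each `src/` script, so treat the names here as illustrative only:
```python
# Sketch: reading a config/ file with ConfigArgParse (pip install configargparse).
import configargparse

parser = configargparse.ArgParser()
parser.add("-c", "--config", is_config_file=True, help="config file path")
parser.add("-n", "--name", help="name for your facebook profile")
parser.add("-t", "--token", required=True, help="token of your fbtrex user")
parser.add("-p", "--path", default="outputs", help="path to save to")

options = parser.parse_args()  # e.g. run with: -c config/JohnDoe
print(options.name, options.token, options.path)
```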
The script download_facebook.py uses the fbTREX APIs to download a specified subset of the data available for a token, and saves a csv or json in the outputs folder. The default number of posts retrieved is 400. This dataset is then reused by the IPython notebooks, so you don't have to download your data each time. There is a 10-minute cache on the API, so if you don't see the data you are looking for, just wait or delete the `.apicache` folder. The script creates both a facebook_personal and a facebook_labels csv, which can be used within the respective notebooks. You can optionally generate a wordcloud with the texts for that timeframe. A quick loading sketch follows the options below.
```
python3 src/download.py --token 00a0a0a000a0a000a0a0a0a0a0a0a0a
```
or
```
python3 src/download.py -c config/JohnDoe -a 1000
```
```
  -h, --help            show this help message and exit
  -n NAME, --name NAME  name for your facebook profile
  -t TOKEN, --token TOKEN
                        token of your fbtrex user
  -c CONFIG, --config CONFIG
                        config file path
  -p PATH, --path PATH  path to save to (default "outputs")
  -s START, --start START
                        start date for harmonizer. default is a week ago
  -e END, --end END     end date for harmonizer. default is today.
  --no-csv              do not create a csv, creates a json instead
  -a AMOUNT, --amount AMOUNT
                        amount of entries to fetch from api
  --skip SKIP           amount of entries to skip
  --no-labels           do not create a csv with labels
  --wordcloud           generate a wordcloud
```
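Once downloaded, the csv can also be inspected outside the notebooks. A minimal sketch with pandas; the file path is an assumption, so point it at wherever the script actually saved your csv:
```python
# Sketch: quick look at a downloaded dataset before opening the notebooks.
import pandas as pd

df = pd.read_csv("outputs/johnny/facebook_personal.csv")  # assumed path
print(df.shape)
print(df.head())
```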
The script status.py logs the status of data collection for a token. It is meant to be used in the context of synthetic data extraction with Facebook profiles that use the autoscroll script. It should be run from crontab to produce .log files in the output folder; it logs parser errors, downtime and other issues, such as low post collection. A sample crontab entry is sketched after the options below.
```
python3 src/status.py --token yourtokenhere
```
or
```
python3 src/status.py -c config/JohnDoe
```
```
  -h, --help            show this help message and exit
  -n NAME, --name NAME  name for your facebook profile
  -t TOKEN, --token TOKEN
                        token of your fbtrex user
  -c CONFIG, --config CONFIG
                        config file path
  -p PATH, --path PATH  path to save to (default "outputs")
```
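A sample crontab entry (edit with `crontab -e`) that runs the check hourly; the install path and config name are assumptions about your own setup:
```
# Sketch: run the status check at minute 0 of every hour.
0 * * * * cd /home/ubuntu/dashboard && python3 src/status.py -c config/JohnDoe
```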
The script info.py prints useful information about your own user and your usage of Facebook. It retrieves the time-frame of the data you have pulled from the API, the total time you spent on Facebook during that time-frame, your top sources of information, the percentage of sponsored posts, an estimate of the time you spent watching ads, and the most seen posts. Optionally, it produces a wordcloud based on the text of the posts you have seen. If you want to save the output as a text file, just add `> info.txt` after the command you are launching (e.g. `python3 src/info.py --token tokenhere > outputs/info.txt`). A hand-computed sketch of one of these statistics follows the options below.
```
python3 src/info.py --token yourtokenhere
```
or
```
python3 src/info.py -c config/JohnDoe --top 20 --wordcloud -a 3600
```
```
  -h, --help            show this help message and exit
  -n NAME, --name NAME  name for your facebook profile
  -t TOKEN, --token TOKEN
                        token of your fbtrex user
  -c CONFIG, --config CONFIG
                        config file path
  -p PATH, --path PATH  path to save to (default "outputs")
  -a AMOUNT, --amount AMOUNT
                        amount of entries to fetch from api
  --skip SKIP           amount of entries to skip
  --top TOP             number of top sources to retrieve
  --wordcloud           creates a wordcloud and opens it
```
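As a rough idea of the kind of aggregate info.py reports, the sponsored-post percentage can be recomputed by hand from a downloaded csv. The "nature" column and the "sponsored" value here are assumptions, so check the actual header of your file first:
```python
# Sketch: share of sponsored posts in a downloaded dataset.
import pandas as pd

df = pd.read_csv("outputs/johnny/facebook_personal.csv")  # assumed path
share = (df["nature"] == "sponsored").mean() * 100  # assumed column and value
print(f"sponsored posts: {share:.1f}%")
```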
In the context of the EU19 initiative, we wanted to analyze how the posts of two main sources of information (for example, two politically opposite mainstream media pages in a country) are served by the algorithm and propagate across different users. The script combine.py outputs a "combined" csv file that can be used in this specific analysis. Please remember that this script needs:
- a folder (`--users`) with different download.py outputs;
- another folder (`--sources`) with the outputs of fbCRAWL (you can find it on GitHub) for the specific sources.

The csv files are aggregated automatically, but you should make sure that the timeframe of each user and each source is consistent, then trim the data with the `--start` and `--end` arguments. For the script to work properly, it is also mandatory to pass the exact name of the sources as displayed on Facebook (not the @username) with the `-s1` and `-s2` arguments. See the example for more details; a sketch for checking the timeframes follows the options below.
```
python3 src/combine.py --sources dataset/sources/ --users dataset/users/ --start 2019-05-06 --end 2019-05-14 -s1 ABC.es -s2 eldiario.es
```
```
  -h, --help            show this help message and exit
  -p PATH, --path PATH  path to save to (default "outputs")
  --start START         start date for harmonizer. default is a week ago
  -e END, --end END     end date for harmonizer. default is tomorrow.
  --sources SOURCES     directory containing ONLY csv files from FBcrawl, used
                        to merge with user data
  --users USERS         directory containing ONLY csv files from fbtrex, used
                        to merge with user data
  -s1 SOURCE1, --source1 SOURCE1
                        string of the exact displayname for the first source
  -s2 SOURCE2, --source2 SOURCE2
                        string of the exact displayname for the second source
```
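Before running combine.py, it can help to eyeball the date range covered by each input file. A small sketch, assuming a "date" column in the csvs (check the real header):
```python
# Sketch: print the date range of each user csv, to pick sensible
# --start and --end values.
import glob
import pandas as pd

for path in sorted(glob.glob("dataset/users/*.csv")):
    df = pd.read_csv(path)
    print(path, df["date"].min(), "->", df["date"].max())  # assumed column
```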
In the context of the EU19 initiative, we want to be able to compare the differences between user bubbles or groups of user bubbles. The script venn.py generates venn diagrams in svg format. Remember to make sure that the amount of data you pull from the API using download_facebook.py with `-a`/`--amount` is enough to cover the same timeframe as the fbCRAWL output files in `--sources`. It outputs a venn diagram image as an svg file in the outputs folder. A rough sketch of the underlying idea follows the options below.
```
python3 src/venn.py --sources dataset/sources/ --start 2019-05-15 --end 2019-05-20 -s1 ABC.es -s2 eldiario.es --user1 path_to_csv1 --user2 path_to_csv2
```
```
  -h, --help            show this help message and exit
  -p PATH, --path PATH  path to save to (default "outputs")
  --start START         start date for harmonizer. default is since beginning
  -e END, --end END     end date for harmonizer. default is today.
  --sources SOURCES     directory containing ONLY csv files from FBcrawl, used
                        to merge with user data
  --user1 USER1         summary.csv file for user 1 that you want to compare
  --user2 USER2         summary.csv file for user 2 that you want to compare
  -s1 SOURCE1, --source1 SOURCE1
                        string of the exact displayName for the first source
  -s2 SOURCE2, --source2 SOURCE2
                        string of the exact displayName for the second source
```
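For intuition, a two-user venn boils down to a set intersection. A sketch using the matplotlib-venn package; the "postId" column is an assumption about the summary.csv layout, and venn.py itself may work differently:
```python
# Sketch: overlap between the posts seen by two users, saved as svg.
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib_venn import venn2

user1 = set(pd.read_csv("path_to_csv1")["postId"])  # assumed column name
user2 = set(pd.read_csv("path_to_csv2")["postId"])
venn2([user1, user2], set_labels=("user1", "user2"))
plt.savefig("outputs/venn_sketch.svg")
```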
The script youtube.py downloads data from the ytTREX APIs: personal, last, video and related. `--last` returns the last 60 contributions received by the community; the result is cached for 2 minutes, and the first request after the cache expires defines the next 60. `--personal` downloads personal data with your ID; you must specify a token for your ytTREX user, which can be retrieved by clicking on your extension icon. `--video` takes the id from a YouTube url, e.g. (https://www.youtube.com/watch?v=)4-VLmhQSzr8, and returns data containing all the different related videos observed when the same videoId has been watched by ytTREX contributors; a maximum of 110 evidences is returned. `--related` takes the id from a YouTube url and retrieves data about all the times a videoId was in the related list of other videos, so you can see where the videoId was suggested. A small helper sketch for extracting the id follows the options below.
```
python3 src/youtube.py --personal 7498595af879785b0c50875c087f5a50f50a
python3 src/youtube.py --last
python3 src/youtube.py --video dQw4w9WgXcQ --related dQw4w9WgXcQ --last
```
```
  -h, --help            show this help message and exit
  -p PATH, --path PATH  path to save to (default "outputs")
  --no-csv              do not create a csv, creates a json instead
  --personal PERSONAL   download personal data with your ID, must specify a
                        token for your ytTREX user which can be retrieved by
                        clicking on your extension icon.
  --last                the last 60 contributions received by the community.
                        It is cached for 2 minutes, the first request after
                        cache expires would define the next 60.
  --video VIDEO         it takes the id from a YouTube url, e.g.
                        (https://www.youtube.com/watch?v=)4-VLmhQSzr8, and
                        returns data containing all the different related
                        videos observed when the same videoId have been
                        watched by ytTREX contributors. Maximum 110 evidences
                        are returned.
  --related RELATED     it takes the id from a YouTube url, and retrieves data
                        about all the times a videoId was in the related list
                        of other videos, so you can see where videoId was
                        suggested.
```
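Since `--video` and `--related` expect the bare id rather than the full url, this hypothetical helper (not part of the repo) shows one way to extract it:
```python
# Sketch: pull the v= parameter out of a youtube.com/watch url.
from urllib.parse import urlparse, parse_qs

def video_id(url: str) -> str:
    return parse_qs(urlparse(url).query)["v"][0]

print(video_id("https://www.youtube.com/watch?v=4-VLmhQSzr8"))  # 4-VLmhQSzr8
```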
Once you have installed Jupyter Notebook (included in requirements.txt), you should be able to navigate to the dashboard folder and start it (if you installed the dependencies in a virtualenv, remember to activate it first!):
```
cd path/to/dashboard/
jupyter notebook
```
This command opens a web page in your browser, from which you can open the notebooks.
You can start here. This notebook gives an overview of what can be done with the fbTREX data and a rapid introduction to the functions developed in the package. To run it properly, enter the path to your csv file in the empty variable of the first cell, then press "Run all". If a graph or output is not shown properly, you can re-run the single cell. Add your own cells and make your own experiments!
Works with the output of facebook_labels.py and takes up to two users to make comparisons.
Topical content modeling. The original dataset is not included, but an example dataset gives a hint of how it works. It takes time to produce the outputs, so be patient when you run it.
Works with the output of download_youtube.py and compares the related videos of different users based on a video.
This notebook visualizes the output of combine.py, so you will need to run that script first and copy the path of the output file. To run it, enter the path to the combined.csv file in the empty variable of the first cell. An interactive visualization will pop up. You can add cells and make your own experiments with Altair's declarative visualizations.
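As a starting point for your own cells, here is a minimal Altair sketch over combined.csv; the "date" and "source" column names are assumptions, so adapt them to the actual header:
```python
# Sketch: posts per day, split by source, from the combine.py output.
import altair as alt
import pandas as pd

df = pd.read_csv("outputs/combined.csv")
chart = alt.Chart(df).mark_bar().encode(
    x="date:T",        # assumed date column
    y="count():Q",     # posts per day
    color="source:N",  # assumed source column
)
chart.save("outputs/combined_sketch.html")
```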