Basic Usage
- Install the fbTREX web extension;
- Have a valid Facebook account, and use it with the browser where the fbTREX extension is installed;
- Click on the "Your data" section; this will open a URL in your browser;
- The token is part of the URL; for a simple copy-paste, click on the "Control your data" tab.
You have to create the config files manually. They live in the `config` folder and are passed to each script with the option `-c config/filename`. The default format is the ArgParse format, one `--arg value` pair per line, for example `--name JohnDoe`. A config file should contain at least a value for `--token`. Create a separate file for each user; it can contain the values of `name`, `token` and `path` (the default save path).
If you create the config files yourself, you can refer to this page for the full description of the valid syntax: https://pypi.org/project/ConfigArgParse/
Example:
```
--name JohnDoe
--token abfbc56478cfa6857af7857899c86f8f99f960a96
--path /home/ubuntu/dashboard/outputs/johnny/
```
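For reference, this is a minimal sketch of how a script can wire those files up with ConfigArgParse; the exact argument set lives in each `src/` script, so treat the names here as illustrative only:
```python
# Sketch: reading a config/ file with ConfigArgParse (pip install configargparse).
import configargparse

parser = configargparse.ArgParser()
parser.add("-c", "--config", is_config_file=True, help="config file path")
parser.add("-n", "--name", help="name for your facebook profile")
parser.add("-t", "--token", required=True, help="token of your fbtrex user")
parser.add("-p", "--path", default="outputs", help="path to save to")

options = parser.parse_args()  # e.g. run with: -c config/JohnDoe
print(options.name, options.token, options.path)
```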
The script download_facebook.py uses the fbTREX APIs to download a specified subset of the data available for a token, and saves a csv or json in the outputs folder. The default number of posts retrieved is 400. This dataset is then reused by the IPython notebooks, so you don't have to download your data each time. There is a 10-minute cache on the API, so if you don't see the data you are looking for, just wait or delete the `.apicache` folder. The script creates both a facebook_personal and a facebook_labels csv, which can be used within the respective notebooks. You can optionally generate a wordcloud with the texts for that timeframe. A quick loading sketch follows the options below.
```
python3 src/download.py --token 00a0a0a000a0a000a0a0a0a0a0a0a0a
```
or
```
python3 src/download.py -c config/JohnDoe -a 1000
```
```
  -h, --help            show this help message and exit
  -n NAME, --name NAME  name for your facebook profile
  -t TOKEN, --token TOKEN
                        token of your fbtrex user
  -c CONFIG, --config CONFIG
                        config file path
  -p PATH, --path PATH  path to save to (default "outputs")
  -s START, --start START
                        start date for harmonizer. default is a week ago
  -e END, --end END     end date for harmonizer. default is today.
  --no-csv              do not create a csv, creates a json instead
  -a AMOUNT, --amount AMOUNT
                        amount of entries to fetch from api
  --skip SKIP           amount of entries to skip
  --no-labels           do not create a csv with labels
  --wordcloud           generate a wordcloud
```
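Once downloaded, the csv can also be inspected outside the notebooks. A minimal sketch with pandas; the file path is an assumption, so point it at wherever the script actually saved your csv:
```python
# Sketch: quick look at a downloaded dataset before opening the notebooks.
import pandas as pd

df = pd.read_csv("outputs/johnny/facebook_personal.csv")  # assumed path
print(df.shape)
print(df.head())
```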
The script status.py logs the status of data collection for a token. It is meant to be used in the context of synthetic data extraction with Facebook profiles that use the autoscroll script. It should be run from crontab to produce .log files in the output folder; it logs parser errors, downtime and other issues, such as low post collection. A sample crontab entry is sketched after the options below.
```
python3 src/status.py --token yourtokenhere
```
or
```
python3 src/status.py -c config/JohnDoe
```
```
  -h, --help            show this help message and exit
  -n NAME, --name NAME  name for your facebook profile
  -t TOKEN, --token TOKEN
                        token of your fbtrex user
  -c CONFIG, --config CONFIG
                        config file path
  -p PATH, --path PATH  path to save to (default "outputs")
```
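A sample crontab entry (edit with `crontab -e`) that runs the check hourly; the install path and config name are assumptions about your own setup:
```
# Sketch: run the status check at minute 0 of every hour.
0 * * * * cd /home/ubuntu/dashboard && python3 src/status.py -c config/JohnDoe
```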
The script info.py prints useful information about your own user and your usage of Facebook. It retrieves the time-frame of the data you have pulled from the API, the total time you spent on Facebook during that time-frame, your top sources of information, the percentage of sponsored posts, an estimate of the time you spent watching ads, and the most seen posts. Optionally, it produces a wordcloud based on the text of the posts you have seen. If you want to save the output as a text file, just add `> info.txt` after the command you are launching (e.g. `python3 src/info.py --token tokenhere > outputs/info.txt`). A hand-computed sketch of one of these statistics follows the options below.
```
python3 src/info.py --token yourtokenhere
```
or
```
python3 src/info.py -c config/JohnDoe --top 20 --wordcloud -a 3600
```
```
  -h, --help            show this help message and exit
  -n NAME, --name NAME  name for your facebook profile
  -t TOKEN, --token TOKEN
                        token of your fbtrex user
  -c CONFIG, --config CONFIG
                        config file path
  -p PATH, --path PATH  path to save to (default "outputs")
  -a AMOUNT, --amount AMOUNT
                        amount of entries to fetch from api
  --skip SKIP           amount of entries to skip
  --top TOP             number of top sources to retrieve
  --wordcloud           creates a wordcloud and opens it
```
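As a rough idea of the kind of aggregate info.py reports, the sponsored-post percentage can be recomputed by hand from a downloaded csv. The "nature" column and the "sponsored" value here are assumptions, so check the actual header of your file first:
```python
# Sketch: share of sponsored posts in a downloaded dataset.
import pandas as pd

df = pd.read_csv("outputs/johnny/facebook_personal.csv")  # assumed path
share = (df["nature"] == "sponsored").mean() * 100  # assumed column and value
print(f"sponsored posts: {share:.1f}%")
```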
In the context of the EU19 initiative, we wanted to analyze how the posts of two main sources of information (for example, two politically opposite mainstream media pages in a country) are served by the algorithm and propagate across different users. The script combine.py outputs a "combined" csv file that can be used in this specific analysis. Please remember that this script needs:
- a folder (`--users`) with different download.py outputs;
- another folder (`--sources`) with the outputs of fbCRAWL (you can find it on GitHub) for the specific sources.

The csv files are aggregated automatically, but you should make sure that the timeframe of each user and each source is consistent, then trim the data with the `--start` and `--end` arguments. For the script to work properly, it is also mandatory to pass the exact name of the sources as displayed on Facebook (not the @username) with the `-s1` and `-s2` arguments. See the example for more details; a sketch for checking the timeframes follows the options below.
```
python3 src/combine.py --sources dataset/sources/ --users dataset/users/ --start 2019-05-06 --end 2019-05-14 -s1 ABC.es -s2 eldiario.es
```
```
  -h, --help            show this help message and exit
  -p PATH, --path PATH  path to save to (default "outputs")
  --start START         start date for harmonizer. default is a week ago
  -e END, --end END     end date for harmonizer. default is tomorrow.
  --sources SOURCES     directory containing ONLY csv files from FBcrawl, used
                        to merge with user data
  --users USERS         directory containing ONLY csv files from fbtrex, used
                        to merge with user data
  -s1 SOURCE1, --source1 SOURCE1
                        string of the exact displayname for the first source
  -s2 SOURCE2, --source2 SOURCE2
                        string of the exact displayname for the second source
```
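Before running combine.py, it can help to eyeball the date range covered by each input file. A small sketch, assuming a "date" column in the csvs (check the real header):
```python
# Sketch: print the date range of each user csv, to pick sensible
# --start and --end values.
import glob
import pandas as pd

for path in sorted(glob.glob("dataset/users/*.csv")):
    df = pd.read_csv(path)
    print(path, df["date"].min(), "->", df["date"].max())  # assumed column
```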
In the context of the EU19 initiative, we want to be able to compare the differences between user bubbles or groups of user bubbles. The script venn.py generates venn diagrams in svg format. Remember to make sure that the amount of data you pull from the API using download_facebook.py with `-a`/`--amount` is enough to cover the same timeframe as the fbCRAWL output files in `--sources`. It outputs a venn diagram image as an svg file in the outputs folder. A rough sketch of the underlying idea follows the options below.
```
python3 src/venn.py --sources dataset/sources/ --start 2019-05-15 --end 2019-05-20 -s1 ABC.es -s2 eldiario.es --user1 path_to_csv1 --user2 path_to_csv2
```
```
  -h, --help            show this help message and exit
  -p PATH, --path PATH  path to save to (default "outputs")
  --start START         start date for harmonizer. default is since beginning
  -e END, --end END     end date for harmonizer. default is today.
  --sources SOURCES     directory containing ONLY csv files from FBcrawl, used
                        to merge with user data
  --user1 USER1         summary.csv file for user 1 that you want to compare
  --user2 USER2         summary.csv file for user 2 that you want to compare
  -s1 SOURCE1, --source1 SOURCE1
                        string of the exact displayName for the first source
  -s2 SOURCE2, --source2 SOURCE2
                        string of the exact displayName for the second source
```
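For intuition, a two-user venn boils down to a set intersection. A sketch using the matplotlib-venn package; the "postId" column is an assumption about the summary.csv layout, and venn.py itself may work differently:
```python
# Sketch: overlap between the posts seen by two users, saved as svg.
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib_venn import venn2

user1 = set(pd.read_csv("path_to_csv1")["postId"])  # assumed column name
user2 = set(pd.read_csv("path_to_csv2")["postId"])
venn2([user1, user2], set_labels=("user1", "user2"))
plt.savefig("outputs/venn_sketch.svg")
```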
The script youtube.py downloads data from the ytTREX APIs: personal, last, video and related. `--last` returns the last 60 contributions received by the community; the result is cached for 2 minutes, and the first request after the cache expires defines the next 60. `--personal` downloads personal data with your ID; you must specify a token for your ytTREX user, which can be retrieved by clicking on your extension icon. `--video` takes the id from a YouTube url, e.g. (https://www.youtube.com/watch?v=)4-VLmhQSzr8, and returns data containing all the different related videos observed when the same videoId has been watched by ytTREX contributors; a maximum of 110 evidences is returned. `--related` takes the id from a YouTube url and retrieves data about all the times a videoId was in the related list of other videos, so you can see where the videoId was suggested. A small helper sketch for extracting the id follows the options below.
```
python3 src/youtube.py --personal 7498595af879785b0c50875c087f5a50f50a
python3 src/youtube.py --last
python3 src/youtube.py --video dQw4w9WgXcQ --related dQw4w9WgXcQ --last
```
```
  -h, --help            show this help message and exit
  -p PATH, --path PATH  path to save to (default "outputs")
  --no-csv              do not create a csv, creates a json instead
  --personal PERSONAL   download personal data with your ID, must specify a
                        token for your ytTREX user which can be retrieved by
                        clicking on your extension icon.
  --last                the last 60 contributions received by the community.
                        It is cached for 2 minutes, the first request after
                        cache expires would define the next 60.
  --video VIDEO         it takes the id from a YouTube url, e.g.
                        (https://www.youtube.com/watch?v=)4-VLmhQSzr8, and
                        returns data containing all the different related
                        videos observed when the same videoId have been
                        watched by ytTREX contributors. Maximum 110 evidences
                        are returned.
  --related RELATED     it takes the id from a YouTube url, and retrieves data
                        about all the times a videoId was in the related list
                        of other videos, so you can see where videoId was
                        suggested.
```
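Since `--video` and `--related` expect the bare id rather than the full url, this hypothetical helper (not part of the repo) shows one way to extract it:
```python
# Sketch: pull the v= parameter out of a youtube.com/watch url.
from urllib.parse import urlparse, parse_qs

def video_id(url: str) -> str:
    return parse_qs(urlparse(url).query)["v"][0]

print(video_id("https://www.youtube.com/watch?v=4-VLmhQSzr8"))  # 4-VLmhQSzr8
```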
Once you have installed Jupyter Notebook (included in requirements.txt), you should be able to navigate to the dashboard folder and start it (if you installed the dependencies in a virtualenv, remember to activate it first!):
```
cd path/to/dashboard/
jupyter notebook
```
This command opens a web page in your browser, from which you can open the notebooks.
You can start here. This notebook gives an overview of what can be done with the fbTREX data and a rapid introduction to the functions developed in the package. To run it properly, enter the path to your csv file in the empty variable of the first cell, then press "Run all". If a graph or output is not shown properly, you can re-run the single cell. Add your own cells and make your own experiments!
Works with the output of facebook_labels.py and takes up to two users to make comparisons.
Topical content modeling. The original dataset is not included, but an example dataset gives a hint of how it works. It takes time to produce the outputs, so be patient when you run it.
Works with the output of download_youtube.py and compares the related videos of different users based on a video.
This notebook visualizes the output of combine.py, so you will need to run that script first and copy the path of the output file. To run it, enter the path to the combined.csv file in the empty variable of the first cell. An interactive visualization will pop up. You can add cells and make your own experiments with Altair's declarative visualizations.
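As a starting point for your own cells, here is a minimal Altair sketch over combined.csv; the "date" and "source" column names are assumptions, so adapt them to the actual header:
```python
# Sketch: posts per day, split by source, from the combine.py output.
import altair as alt
import pandas as pd

df = pd.read_csv("outputs/combined.csv")
chart = alt.Chart(df).mark_bar().encode(
    x="date:T",        # assumed date column
    y="count():Q",     # posts per day
    color="source:N",  # assumed source column
)
chart.save("outputs/combined_sketch.html")
```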