This webpage describes the data accompanying the CHI 2021 paper Auditing E-Commerce Platforms for Algorithmically Curated Vaccine Misinformation. The data was collected during two sets of audit experiments: the Unpersonalized audit and the Personalized audit. Through these experiments, we investigate the role of the search and recommendation algorithms employed by Amazon in surfacing and amplifying vaccine misinformation. In the Unpersonalized audit, we determine the amount of health misinformation users are exposed to when searching for vaccine-related queries. In particular, we examine the search results of 48 search queries belonging to 10 popular vaccine-related topics, without logging in to Amazon, to eliminate the influence of personalization. The Unpersonalized audit ran for 15 consecutive days, sorting the search results by 5 different Amazon filters each day: featured, price low to high, price high to low, average customer review, and newest arrivals. This first audit resulted in 36,000 search results and 16,815 product page recommendations, which we later annotated for their stance on health misinformation: promoting, neutral, or debunking.
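As a quick sanity check on the figures above, the counts of queries, filters, and days imply a number of result pages, and the 36,000 total then implies a number of results per page. The sketch below assumes every query/filter/day combination contributed the same number of results, which the description does not state explicitly.

```python
# Figures taken from the description of the Unpersonalized audit.
queries = 48
filters = 5
days = 15
total_results = 36_000

# One result page per (query, filter, day) combination.
result_pages = queries * filters * days

# Assumption: results are spread evenly over all result pages.
results_per_page = total_results // result_pages
remainder = total_results % result_pages

print(result_pages, results_per_page, remainder)
```

Under that assumption the total divides evenly, which is consistent with a fixed number of top results being kept per page.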
In our second set of audits, the Personalized audit, we determine the impact of personalization due to user history on the amount of health misinformation returned in search results, recommendations, and auto-complete suggestions. User history is built progressively over 7 days by performing several real-world actions: search, search + click, search + click + add to cart, search + click + mark top-rated all positive review as helpful, follow contributor, and search on third party website. The second audit resulted in search results and recommendations. The audit data is organized into four groups of files. Each file is described below, along with its download link.
1. Queries file. Filename: queries.csv (download). The file contains the complete list of 48 search queries used in the audit study. It has the following fields:
query: name of the search query
topic: name of the vaccine-related search topic
A snippet:
| query | topic |
| --- | --- |
| vaccination book | vaccination |
2. Unpersonalized Audit
- Unpersonalized search results. Filename: unpersonalized_seach_results.csv (download). The file contains 36,000 search results and their metadata, collected over 15 days during our Unpersonalized audit run. It consists of the following fields:
query: name of the query searched
topic: name of the vaccine-related search topic
date_exp_run: date on which the search was performed
filter: name of the Amazon filter used to sort search results
search_result_rank: rank of the search result in the Search Engine Results Page (SERP)
URL: URL of the Amazon product
url_code: URL code of the Amazon product, extracted from the product URL
title: title of the Amazon product
category: category of the Amazon product
is_prime: indicates whether the product had an Amazon Prime badge
price: price of the Amazon product
is_sponsored: indicates whether the product is sponsored on Amazon
reviews: number of reviews received by the Amazon product
rating: star rating of the Amazon product
date_of_publishing: date of publishing of the Amazon product
bestseller: indicates whether the Amazon product is a best-seller
annotation: annotation value assigned to the Amazon product. For details on the annotation scheme, please refer to the paper.
A snippet:
| query | topic | date_exp_run | filter | search_result_rank | URL | url_code | title | category | is_prime | price | is_sponsored | reviews | rating | date_of_publishing | bestseller | annotation |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| andrew wakefield | andrew wakefield | 5/2/2020 | featured | 2 | http://www.amazon.com/Vaccine-Court-Americas-Compensation-Program/dp/1629144525/ref=sr_1_2?dchild=1&keywords=andrew+wakefield&qid=1588435229&sr=8-2 | 1629144525 | The Vaccine Court: The Dark Truth of America's Vaccine Injury Compensation Program | Books | Y | $24.49 | N | 28 ratings | 5.0 out of 5 stars | 11-Nov-14 | N | 1 |
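Several fields in this file are stored as display strings (for example "$24.49", "28 ratings", "5.0 out of 5 stars"), so they need converting before any numeric analysis. The following is a minimal sketch of such a conversion, using the values from the snippet above. The function name `parse_search_result` is hypothetical, and the sketch assumes that dates are month/day/year, that flags are always "Y" or "N", and that no field is empty; rows that break these assumptions would need extra handling.

```python
import re
from datetime import datetime


def parse_search_result(row):
    """Convert selected string fields of one search-result row to typed values."""
    return {
        # Assumption: dates are written month/day/year.
        "date_exp_run": datetime.strptime(row["date_exp_run"], "%m/%d/%Y").date(),
        "search_result_rank": int(row["search_result_rank"]),
        # Assumption: flags are stored as "Y" / "N", as in the snippet.
        "is_prime": row["is_prime"] == "Y",
        "is_sponsored": row["is_sponsored"] == "Y",
        "bestseller": row["bestseller"] == "Y",
        # "$24.49" -> 24.49 (also tolerates thousands separators).
        "price": float(row["price"].lstrip("$").replace(",", "")),
        # "28 ratings" -> 28 (keep digits only).
        "reviews": int(re.sub(r"\D", "", row["reviews"])),
        # "5.0 out of 5 stars" -> 5.0 (first token).
        "rating": float(row["rating"].split()[0]),
        "annotation": int(row["annotation"]),
    }


# Values copied from the snippet row (only the fields parsed here).
sample_row = {
    "date_exp_run": "5/2/2020",
    "search_result_rank": "2",
    "is_prime": "Y",
    "price": "$24.49",
    "is_sponsored": "N",
    "reviews": "28 ratings",
    "rating": "5.0 out of 5 stars",
    "bestseller": "N",
    "annotation": "1",
}

parsed = parse_search_result(sample_row)
print(parsed)
```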
- Unpersonalized recommendations. Filename: unpersonalized_recommendations.csv (download). The file contains the product page recommendations of the first three Amazon products present in the search results. The product page recommendations can be of five types: Frequently bought together, What other items customers buy after viewing this item, Customers who viewed this item also viewed, Sponsored products related to this item, and Customers who bought this item also bought. We extracted the first product present in each recommendation type for analysis. The file contains the following fields:
query: name of the query searched
topic: name of the vaccine-related search topic
date_exp_run: date on which the search was performed
filter: name of the Amazon filter used to sort search results
search_result_rank: rank of the search result in the Search Engine Results Page (SERP)
URL: URL of the Amazon product
url_code: URL code of the Amazon product, extracted from the product URL
title: title of the Amazon product
annotation: annotation value assigned to the Amazon product present in the search results
type_of_recommendation: indicates the type of product page recommendation
destination_url: URL of the Amazon product present in the recommendation
destination_url_code: URL code of the Amazon product present in the recommendation
recommendation_annotation: annotation value assigned to the recommended Amazon product
A snippet:
| query | topic | date_exp_run_x | filter | search_result_rank | URL | url_code | title | source_annotation | type_of_recommendation | destination_url | destination_url_code | recommendation_annotation |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| andrew wakefield | andrew wakefield | 5/2/2020 | featured | 1 | http://www.amazon.com/Callous-Disregard-Autism-Vaccines-Tragedy-ebook/dp/B004N62HRQ/ref=sr_1_1?dchild=1&keywords=andrew+wakefield&qid=1588435229&sr=8-1 | B004N62HRQ | Callous Disregard: Autism and Vaccines: The Truth Behind a Tragedy | 1 | customer_view_after_viewing | http://www.amazon.com/Vaccine-Illusion-Tetyana-Obukhanych-ebook/dp/B007AW2CLG/ref=pd_sbs_351_1/138-6517699-9726254?_encoding=UTF8&pd_rd_i=B007AW2CLG&pd_rd_r=aa42a33a-515e-4a68-9e04-4e59632333be&pd_rd_w=BE30s&pd_rd_wg=MlD9i&pf_rd_p=d13bb895-21d3-4e96-94a7-553aaae51224&pf_rd_r=QXT0TP71K2BNMZSXVGB5&psc=1&refRID=QXT0TP71K2BNMZSXVGB5 | B007AW2CLG | 1 |
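The url_code fields are described as being extracted from the product URL. The exact extraction procedure is not given here, but in the snippets on this page the code is the 10-character token following "/dp/". The sketch below illustrates one way to recover it. The function name `extract_url_code` and the regular expression are assumptions, not the authors' method, and the sample URLs are shortened versions of URLs that appear in the snippets on this page.

```python
import re
from urllib.parse import unquote

# Assumption: the code is the 10-character alphanumeric token after "/dp/".
_DP_PATTERN = re.compile(r"/dp/([A-Z0-9]{10})")


def extract_url_code(url):
    """Return the product code found after '/dp/' in the URL, or None."""
    # Some URLs carry the product path percent-encoded in a query
    # parameter (e.g. "%2Fdp%2F..."), so decode before searching.
    match = _DP_PATTERN.search(unquote(url))
    return match.group(1) if match else None


source_url = (
    "http://www.amazon.com/Callous-Disregard-Autism-Vaccines-Tragedy-ebook"
    "/dp/B004N62HRQ/ref=sr_1_1"
)
destination_url = (
    "http://www.amazon.com/Vaccine-Illusion-Tetyana-Obukhanych-ebook"
    "/dp/B007AW2CLG/ref=pd_sbs_351_1"
)
encoded_url = (
    "http://www.amazon.com/gp/upsell-widgets/click-logger.html"
    "?clickType=Title&url=%2Fdp%2F1441321659%3Fpsc%3D1"
)

codes = [extract_url_code(u) for u in (source_url, destination_url, encoded_url)]
print(codes)
```

A URL without a "/dp/" segment yields `None`, which makes unexpected URL shapes easy to spot rather than silently producing a wrong code.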
3. Personalized Audit
- Account details. Filename: account_details.csv (download). The file contains the details of the accounts set up in the Personalized audit. Each account builds up its history by performing real-world actions on products that were all annotated with the same stance: promoting misinformation, neutral, or debunking. The file contains the following fields:
code: code assigned to the account
action: real-world action performed by the sock-puppet account
account_history_built_by_performising_action_on_product_type: type of product on which the sock-puppet account performs actions. The field can take one of three values: promoting misinformation, neutral, or debunking.
search_filter1: name of the first Amazon filter used to sort results by the account
search_filter2: name of the second Amazon filter used to sort results by the account
A snippet:
| code | action | account_history_built_by_performising_action_on_product_type | search_filter1 | search_filter2 |
| --- | --- | --- | --- | --- |
| p3 | search+click+add_to_cart | neutral | featured | average customer review |
- Personalized search results. Filename: personalization_search_results.csv (download). The file contains 268,800 search results collected over 7 days during our Personalized audit run. It consists of the following fields:
topic: name of the vaccine-related search topic
query_with_underscore: name of the query searched, with words separated by underscores
query: name of the query searched
folder: code of the sock-puppet account. The attributes of the sock-puppet accounts are listed in account_details.csv
filter: name of the Amazon filter used to sort search results
date: date on which the data collection occurred
rank: rank of the search result in the Search Engine Results Page (SERP)
url_code: URL code of the Amazon product, extracted from the product URL
URL: URL of the Amazon product
title: title of the Amazon product
A snippet:
| topic | query_with_underscore | query | folder | filter | date | rank | url_code | url | title |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| vaccination | vaccination | vaccination | p22 | search_results_priceLtoH | 8/12/2020 | 3 | B00NS42D28 | http://www.amazon.com/Vaccine-Injuries-Documented-Reactions-Vaccines-ebook/dp/B00NS42D28/ref=sr_1_3?dchild=1&keywords=vaccination&qid=1597219049&sr=8-3 | Vaccine Injuries: Documented Adverse Reactions to Vaccines |
- Pre-purchase recommendations. Filename: pre_purchase_recommendations_.csv (download). The file contains the pre-purchase recommendations collected during the Personalized audit run. Pre-purchase recommendations are product suggestions presented to users after they add one or more products to the cart. They can be of types such as Frequently bought together, Customers also bought these highly rated items, Related to items you've viewed, etc. The file consists of the following fields:
folder: code of the sock-puppet account. The attributes of the sock-puppet accounts are listed in account_details.csv
date: date on which the data collection occurred
type_of_recommendation: indicates the type of pre-purchase recommendation
rank: rank of the Amazon product in the recommendation list of the type given in the type_of_recommendation field
url_code: URL code of the Amazon product, extracted from the product URL
URL: URL of the Amazon product
annotation: annotation value assigned to the Amazon product
A snippet:
| folder | date | type_of_recommendation | rank | url_code | url | annotation |
| --- | --- | --- | --- | --- | --- | --- |
| p3 | 8/12/2020 | Customers who shopped | 1 | 1441321659 | http://www.amazon.com/gp/upsell-widgets/click-logger.html?widgetName=desktop-huc-carousels_huc-semantic-session-sims-scf&column=1&row=1&clickType=Title&url=%2Fdp%2F1441321659%3Fpsc%3D1%26pf_rd_p%3D995e9308-9761-4a71-9419-82fd033b88fd%26pf_rd_r%3DPSKTKDF89JM86WTTRXFX%26pd_rd_wg%3DNn0ES%26pd_rd_i%3D1441321659%26pd_rd_w%3D3ilS9%26pd_rd_r%3D39af52b4-1699-4cd5-924b-a0a89f52c26d%26ref_%3Dpd_luc_rh_crh_rh_sbs_sem_01_01_t_img_lh/ | 0 |
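Because the folder field holds the sock-puppet account code, each recommendation row can be linked to the account's attributes in account_details.csv. The sketch below shows one way to do that join with the standard library, using the two snippet rows for account p3. It assumes the files are comma-separated with the header names shown in the snippets. The function name `attach_account_details` and the output key `history_type` are hypothetical, and the url column is left out of the sample for brevity.

```python
import csv
import io

# Sample rows copied from the snippets; real code would open the CSV files.
ACCOUNTS_CSV = (
    "code,action,account_history_built_by_performising_action_on_product_type,"
    "search_filter1,search_filter2\n"
    "p3,search+click+add_to_cart,neutral,featured,average customer review\n"
)
RECOMMENDATIONS_CSV = (
    "folder,date,type_of_recommendation,rank,url_code,annotation\n"
    "p3,8/12/2020,Customers who shopped,1,1441321659,0\n"
)

HISTORY_FIELD = "account_history_built_by_performising_action_on_product_type"


def attach_account_details(recommendation_rows, account_rows):
    """Add each account's action and history type to its recommendation rows."""
    accounts = {row["code"]: row for row in account_rows}
    joined = []
    for rec in recommendation_rows:
        account = accounts.get(rec["folder"])
        if account is None:
            continue  # no matching account code; skip rather than guess
        joined.append(
            {
                **rec,
                "action": account["action"],
                "history_type": account[HISTORY_FIELD],  # hypothetical key name
            }
        )
    return joined


account_rows = list(csv.DictReader(io.StringIO(ACCOUNTS_CSV)))
recommendation_rows = list(csv.DictReader(io.StringIO(RECOMMENDATIONS_CSV)))
joined = attach_account_details(recommendation_rows, account_rows)
print(joined[0]["folder"], joined[0]["action"], joined[0]["history_type"])
```

The same lookup applies to any file that carries a folder column.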
- Homepage recommendations. Filename: homepage_recommendations_.csv (download). The file contains the homepage recommendations collected during the Personalized audit run. These recommendations are present on the homepage of a user's Amazon account. The homepage recommendations can be of three types: Related to items you've viewed, Inspired by your shopping trends, and Recommended items other customers often buy again. The file consists of the following fields:
folder: code of the sock-puppet account. The attributes of the sock-puppet accounts are listed in account_details.csv
date: date on which the data collection occurred
type_of_recommendation: indicates the type of homepage recommendation
rank: rank of the Amazon product in the recommendation list of the type given in the type_of_recommendation field
url_code: URL code of the Amazon product, extracted from the product URL
URL: URL of the Amazon product
annotation: annotation value assigned to the Amazon product
A snippet:
| folder | date | type_of_recommendation | rank | url_code | url | annotation |
| --- | --- | --- | --- | --- | --- | --- |
| p14 | 8/12/2020 | Related to items you've viewed | 1 | 188121740X | http://www.amazon.com/Millers-Review-Critical-Vaccine-Studies/dp/188121740X/ | 1 |
- Product page recommendations. Filename: product_page_recommendations_.csv (download). The file contains the product page recommendations collected during the Personalized audit run. These are the recommendations present on the product page. They can be of five types: Frequently bought together, What other items customers buy after viewing this item, Customers who viewed this item also viewed, Sponsored products related to this item, and Customers who bought this item also bought. The file consists of the following fields:
folder: code of the sock-puppet account. The attributes of the sock-puppet accounts are listed in account_details.csv
date: date on which the data collection occurred
type_of_recommendation: indicates the type of product page recommendation
rank: rank of the Amazon product in the recommendation list of the type given in the type_of_recommendation field
url_code: URL code of the Amazon product, extracted from the product URL
URL: URL of the Amazon product
annotation: annotation value assigned to the Amazon product
A snippet:
| folder | date | type_of_recommendation | rank | url_code | url | annotation |
| --- | --- | --- | --- | --- | --- | --- |
| p29 | 8/12/2020 | frequently bought together | 1 | 803657668 | http://www.amazon.com/New-Leadership-Challenge-Creating-Nursing/dp/0803657668/ref=pd_bxgy_img_2/131-1077905-1219437?_encoding=UTF8&pd_rd_i=0803657668&pd_rd_r=28c31f7a-ded6-4eea-8751-0ad00c46aabc&pd_rd_w=ELi0X&pd_rd_wg=UzbeC&pf_rd_p=ce6c479b-ef53-49a6-845b-bbbf35c28dd3&pf_rd_r=N9NDDMNCKY5Y8TB43V89&psc=1&refRID=N9NDDMNCKY5Y8TB43V89 | 0 |
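In the snippet above, the url_code value is 803657668 while the product URL contains 0803657668. This is consistent with a leading zero being dropped when an all-digit code is read as a number, although the page does not say so. Anyone matching url_code values across files may therefore want to read the column as text and normalize it first. The sketch below assumes that product codes are always 10 characters long; the function name `normalize_url_code` is hypothetical.

```python
def normalize_url_code(code):
    """Restore leading zeros on all-digit codes shorter than 10 characters."""
    code = str(code).strip()
    # Assumption: every product code has exactly 10 characters, so a
    # shorter all-digit value has lost its leading zero(s).
    if code.isdigit() and len(code) < 10:
        return code.zfill(10)
    return code


# Shortened version of the product URL from the snippet.
url = "http://www.amazon.com/New-Leadership-Challenge-Creating-Nursing/dp/0803657668/"

normalized = normalize_url_code("803657668")
print(normalized, normalized in url)
```

Codes that contain letters, such as B004ULLOIC or 188121740X, are returned unchanged.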
4. Annotations. Filename: all_unique_products.csv (download). The file contains 4,997 unique Amazon products collected and annotated for health misinformation during our first and second audit data collection. It contains the following fields:
url_code: URL code of the Amazon product
url: URL of the Amazon product
annotation: annotation value assigned to the Amazon product
A snippet:
| url_code | url | annotation |
| --- | --- | --- |
| B004ULLOIC | http://www.amazon.com/dp/B004ULLOIC | 1 |