Commit

docs(wip): add robots.txt information
Jonathan-Zollinger committed Feb 29, 2024
1 parent 8be8a19 commit ecd1030
Showing 9 changed files with 179,054 additions and 0 deletions.
131 changes: 131 additions & 0 deletions src/test/resources/sitemap/README.md
@@ -0,0 +1,131 @@
# Sitemap
Some sites are transparent about their data: documents called `sitemaps` can give us some of the information we want without scraping for it.
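
As a rough sketch of what reading a sitemap can look like, the snippet below parses a sitemap index in the sitemaps.org XML format and lists the child sitemaps it points to. The `example.com` URLs, the `SITEMAP_INDEX` sample, and the `child_sitemaps` helper are made up for illustration and are not part of this project.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap index; a real one would be downloaded from the site.
SITEMAP_INDEX = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/sitemap-stores.xml.gz</loc></sitemap>
  <sitemap><loc>https://example.com/sitemap-products.xml.gz</loc></sitemap>
</sitemapindex>
"""


def child_sitemaps(xml_text: str) -> list[str]:
    """Return the <loc> URL of every <sitemap> entry in a sitemap index."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:sitemap/sm:loc", NS)
        if loc.text
    ]


for url in child_sitemaps(SITEMAP_INDEX):
    print(url)
```

Child sitemaps are often gzip-compressed (`.xml.gz`), so a real reader would likely need to decompress them before parsing.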

## Robots.txt
The root of a domain will sometimes include `robots.txt`, a file that [tells search engine crawlers which URLs they are permitted to access](https://developers.google.com/search/docs/crawling-indexing/robots/intro). It can also list the site's sitemaps in `Sitemap:` lines.

<details><summary>Example</summary>

```text
Sitemap: https://www.target.com/sitemap_keywords-index.xml.gz
Sitemap: https://www.target.com/sitemap_stores-index.xml.gz
Sitemap: https://www.target.com/sitemap_taxonomy-categories-index.xml.gz
Sitemap: https://www.target.com/sitemap_pdp-index.xml.gz
Sitemap: https://www.target.com/sitemap_taxonomy-brand-index.xml.gz
Sitemap: https://www.target.com/sitemap_facet-categories-index.xml.gz

User-agent: *
Disallow: /*/Ntk
Disallow: /*/Ntt
Disallow: /*/Ntx
Disallow: /*%7Cd_
Disallow: /*/schoollist/
Disallow: /*BTWN
Disallow: /[path]/
Disallow: /7078046/
Disallow: /7079046/
Disallow: /AddToList
Disallow: /AddToRegistry
Disallow: /admin
Disallow: /advancedGiftRegistrySearchView
Disallow: /AjaxSearchNavigationView
Disallow: /Allons_voter
Disallow: /bp/c/
Disallow: /bp/guest_mfg_brand
Disallow: /bp/p/
Disallow: /CallToActionModalView
Disallow: /cgi-bin
Disallow: /cgi-local
Disallow: /Checkout
Disallow: /CheckoutEditItemsDisplayView
Disallow: /CheckoutOrderBillingView
Disallow: /CheckoutOrderShippingView
Disallow: /CheckoutSignInView
Disallow: /co-
Disallow: /common
Disallow: /coupons.
Disallow: /custom-reviews/
Disallow: /data
Disallow: /database/philboard.mdb
Disallow: /dir_on_server/
Disallow: /EmailCartView
Disallow: /EnlargedImageView
Disallow: /ESPDisplayOptionsViewCmd
Disallow: /ESPModal
Disallow: /ExitCheckoutCmd
Disallow: /FeaturedShowMoreOverlay
Disallow: /FetchProdRefreshContent
Disallow: /fiats
Disallow: /FiatsCmd
Disallow: /file
Disallow: /FreeGiftDisplayView
Disallow: /gam-
Disallow: /GenericRegistryPortalView
Disallow: /gc?k
Disallow: /GiftRegistrySearchViewCmd
Disallow: /gp/
Disallow: /GuestAsAnonymous
Disallow: /guestEmailNotificationView
Disallow: /HelpContent
Disallow: /igp
Disallow: /index.jhtml
Disallow: /keyword=
Disallow: /legal-contact-us/
Disallow: /list.id=1
Disallow: /LogonForm
Disallow: /m/
Disallow: /ManageOrder
Disallow: /ManageReturns
Disallow: /MediaDisplayView
Disallow: /mm/
Disallow: /moreinfo.cfm
Disallow: /news
Disallow: /np/
Disallow: /OpenZoomLayer
Disallow: /OrderItemDisplay
Disallow: /OtherDisplayView
Disallow: /p/premium-registry
Disallow: /PhotoUpload
Disallow: /pl/
Disallow: /ProductComparisonCmd
Disallow: /ProductDetailsTabView
Disallow: /PromotionDetailsDisplayView
Disallow: /PromotionDisplayView
Disallow: /qi/
Disallow: /QuickInfoView
Disallow: /ready_sit_read/index.jhtml
Disallow: /RegistryPortalCmd
Disallow: /ReportAbuse
Disallow: /reviewVote
Disallow: /script
Disallow: /SearchNavigationView
Disallow: /shop/
Disallow: /SingleShipmentOrderSummaryView
Disallow: /SOImapPriceDisplayView
Disallow: /SpecificationDefinitionView
Disallow: /splitOrderItem
Disallow: /store-locator/search-results-print
Disallow: /supertarget/index.jhtml
Disallow: /target_baby/
Disallow: /target_group
Disallow: /targetdirect_group/
Disallow: /TargetListPortalView
Disallow: /TargetStoreLocatorCmd
Disallow: /tdir/p/kids-back-to-school/
Disallow: /tsa/
Disallow: /VariationSelectionView
Disallow: /webapp
Disallow: /winnt/
Disallow: /WriteComments
Disallow: /WriteReviews
Disallow: /XCSA/
Disallow: /yr
Disallow: /s?
Disallow: /cart
Disallow: /account/
Disallow: /tracking
Disallow: /config
```

</details>
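
One possible way to read a file like this is Python's standard-library `urllib.robotparser`; the sketch below pulls out the `Sitemap:` entries and checks whether a path is disallowed. `ROBOTS_TXT` is a shortened excerpt of the example above and `load_robots` is a hypothetical helper. The standard-library parser matches rules by plain path prefix and does not interpret wildcard rules such as `/*/Ntk`, so the excerpt keeps only prefix rules.

```python
from urllib.robotparser import RobotFileParser

# Shortened excerpt of the example robots.txt (plain-prefix rules only).
ROBOTS_TXT = """\
Sitemap: https://www.target.com/sitemap_stores-index.xml.gz
Sitemap: https://www.target.com/sitemap_pdp-index.xml.gz

User-agent: *
Disallow: /admin
Disallow: /cart
Disallow: /account/
"""


def load_robots(text: str) -> RobotFileParser:
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = load_robots(ROBOTS_TXT)

# site_maps() returns None when no Sitemap: lines are present.
sitemaps = parser.site_maps() or []
for url in sitemaps:
    print("sitemap:", url)

# can_fetch() reports whether the given user agent may request the URL.
print("cart allowed:", parser.can_fetch("*", "https://www.target.com/cart"))
```

Parsing text that was fetched separately keeps the example free of network calls; `RobotFileParser` can also download the file itself via `set_url()` and `read()`.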