opengraph.Fetch returns nothing for a few domains #33
Thank you, @Pancham97
Hey @otiai10, sorry, I forgot to add them. Here are a couple that I couldn't get working:
I am not sure. Maybe they don't want unnecessary website scraping or something. Plus, some websites serve their content via JavaScript, which could be an issue too? 🤷
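One way to test the JavaScript theory is to check whether the Open Graph tags are present in the raw HTML the server sends. If they are missing, they are probably injected by JavaScript, or the server returned a bot-challenge page instead of the article. Below is a rough sketch using a regular expression. It assumes the tags use the `property` attribute, and `hasOGTags` is a hypothetical helper, not part of the opengraph package. A robust check would use a real HTML parser.

```go
package main

import (
	"fmt"
	"regexp"
)

// ogMetaPattern loosely matches <meta ... property="og:..."> tags in raw HTML.
// It is a heuristic, not a full HTML parser.
var ogMetaPattern = regexp.MustCompile(`(?i)<meta[^>]+property\s*=\s*["']og:`)

// hasOGTags reports whether server-rendered HTML already contains Open Graph
// meta tags. A false result suggests the tags are added client-side by
// JavaScript, or that the response is not the real article page.
func hasOGTags(html string) bool {
	return ogMetaPattern.MatchString(html)
}

func main() {
	article := `<html><head><meta property="og:title" content="Example"></head></html>`
	challenge := `<html><head><title>Access denied</title></head></html>`
	fmt.Println(hasOGTags(article))   // true
	fmt.Println(hasOGTags(challenge)) // false
}
```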
This works:

```go
package main

import (
	"compress/gzip"
	"encoding/json"
	"io"
	"log"
	"net/http"
	"os"
	"strings"

	"github.com/otiai10/opengraph"
)

func main() {
	target := "https://www.fastcompany.com/90945102/ai-chatbots-health-medicine-chatgpt-webmd-self-diagnosis-misinformation"

	// 1) Necessary headers: make the request look like it comes from a browser.
	headers := map[string]string{
		"Accept":          "text/html",
		"Accept-Encoding": "gzip",
		"User-Agent":      "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36",
	}
	req, err := http.NewRequest(http.MethodGet, target, nil)
	if err != nil {
		log.Println(err)
		return
	}
	for k, v := range headers {
		req.Header.Set(k, v)
	}

	// 2) Necessary cookies (set by geo.captcha-delivery.com)
	req.AddCookie(&http.Cookie{
		Name:  "datadome", // See your browser's cookie with this name
		Value: "2ZnfSBOvZs1C2ZURicdpZAkZ-86xXY_RyRG-D6E8CjiNpgopXq7byBj5KmkCtLmcjRGjeGpzkBmP0JvFmKwUxazBMrGTkpY8-K9mJdGxD8WYobZ5QmI76Uqdhgf6Wvdi",
	})

	res, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Println(1001, err)
		return
	}
	defer res.Body.Close()

	if res.StatusCode != http.StatusOK {
		log.Println(1005, "Status code is not 200")
		log.Println("Status:", res.StatusCode)
		log.Println("Content-Type:", res.Header.Get("Content-Type"))
		log.Println("Content-Encoding:", res.Header.Get("Content-Encoding"))
		return
	}

	// Because "Accept-Encoding" is set by hand, net/http does not decompress
	// the body for us; gunzip it only if the server actually used gzip.
	var body io.Reader = res.Body
	if strings.EqualFold(res.Header.Get("Content-Encoding"), "gzip") {
		reader, err := gzip.NewReader(res.Body)
		if err != nil {
			log.Println(1002, err)
			return
		}
		defer reader.Close()
		body = reader
	}

	// Use "Parse" for the io.Reader
	ogp := opengraph.New(target)
	if err := ogp.Parse(body); err != nil {
		log.Println(1004, err)
		return
	}

	// Then let's check it out!
	enc := json.NewEncoder(os.Stdout)
	enc.SetIndent("", "  ")
	if err := enc.Encode(ogp); err != nil {
		log.Println(err)
	}
}
```
There might be various reasons that this package (not holistic)
Then, in your case with
Hey @otiai10, thanks, but for some reason I can't get it working. I am a bit new to Go, so I might be missing something obvious, but I am getting error 1005. I have replaced the value of the …

```
2023/08/29 23:24:11 1005 Status code is not 200
2023/08/29 23:24:11 Status: 403
2023/08/29 23:24:11 Content-Type: text/html;charset=utf-8
2023/08/29 23:24:11 Content-Encoding:
```
I have been using this package to fetch Open Graph info about websites and articles, but for a few websites, e.g. FastCompany, the `Fetch()` method returns nothing. After some research, I found that some websites block bots from scraping their content. However, when I try Raycast preview, or even macOS preview, it successfully fetches the metadata with the image and title. How can I achieve that? Here's how my code looks: