Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add thumbnail support #76

Merged
merged 7 commits into from
Jun 17, 2024
Merged

Add thumbnail support #76

merged 7 commits into from
Jun 17, 2024

Conversation

isZumpo
Copy link
Contributor

@isZumpo isZumpo commented Mar 23, 2024

"A picture is worth a thousand words"

I though it could be nice to have some images of the detected birds since I am really bad at bird names... I saw that the BirdWeather GraphQL API offers endpoints to receive thumbnail for a given bird. In this pull request, I have started on a very crude implementation for fetching these thumbnails and adding them in some of the html views.

Daily summary view:
image

Recent detections view:
image

Of course the images could be placed better inside the html view, I just did not spend too much time on it.

In addition to only fetching the urls, a in memory caching mechanism is utilized. The cache saves the result for each bird after each request, such that the next request for that bird will be received from the cache instead. In addition, during startup, the cache will be initialized with every detected bird in the database.

I am a bit unsure of where to place these functions. Maybe it would be better to put these inside the birdweather internal/birdweather folder instead? Please give me some advice, and I will do my best to update the pull request accordingly.

@tphakala
Copy link
Owner

I like idea of having images for each species, that has been on my mind to implement, but I don't want it to be dependent on any online service. Using external resources is often against Terms of Service, and with BirdWeather there is also availability issues.

So I am all in to have thumbnails for each species but they must be locally served, and for that we need to find a royalty free source for photos or have them done using generative AI like Midjourney or Dall-E.

@isZumpo
Copy link
Contributor Author

isZumpo commented Mar 23, 2024

Yes those are some really good points! I am also quite curious about them. Especially the Terms of Service aspect... I suppose using the birdweather API to post things is different from utilizing their images. I can see that birdnet-pi is grabbing their images from flickr so that might be a more viable solution?

However, does everything really need to be served locally? That mostly increases the size of the binary, as well as the bandwidth requirements. Sure you get better availability since there is less external dependencies, and have an easier time in air-gapped environments, but is it always necessary? Perhaps some sort of hybrid approach is also a viable option?

@rytilahti
Copy link

Regarding ToS issues, perhaps wikidata could be a potential source?

@MueJosh
Copy link

MueJosh commented Apr 3, 2024

This!

However, does everything really need to be served locally? That mostly increases the size of the binary, as well as the bandwidth requirements. Sure you get better availability since there is less external dependencies, and have an easier time in air-gapped environments, but is it always necessary? Perhaps some sort of hybrid approach is also a viable option?

To be honest, I really enjoy the "simple" view right now.
This birdnet-Program is really lightweight and draws much less energy then for example the birdnet-pi. Which makes it suitable for battery powered applications (I know there are better solutions, like the AudioMoth or other diy ARU)

Me personally would prefer pictures from an online source, so it's possible to turn this feature off and to save storage. I know how most birds in my area look like- (I get it why pictures are great tho!) -

If needed I can provide some Bird-Pictures (photography brought me to birdnet ;) )
As long as it's not commercially used. (And this feature can be turned off before it downloads XX-hundreds of pictures)

Obviously I don't have ALL birds on my camera
(If you're interested let me know, even when you don't want to implement it right now)

@matthew73210
Copy link

Emailed tim about terms of service etc.. here's what he has to say "We're using images from Wiki Commons - which we credit as provided in their API."

@isZumpo
Copy link
Contributor Author

isZumpo commented Apr 29, 2024

Emailed tim about terms of service etc.. here's what he has to say "We're using images from Wiki Commons - which we credit as provided in their API."

That is nice! In theory we could then create our own set of pictures from the same source and either bundle in the git repository itself or host somewhere. Though, I would prefer to use birdweather directly since their api providers some really cool features. Not sure how we should move forward on this topic...

@matthew73210
Copy link

Emailed tim about terms of service etc.. here's what he has to say "We're using images from Wiki Commons - which we credit as provided in their API."

That is nice! In theory we could then create our own set of pictures from the same source and either bundle in the git repository itself or host somewhere. Though, I would prefer to use birdweather directly since their api providers some really cool features. Not sure how we should move forward on this topic...

I created an issue as a feature tracker for stuff from birdweather. Probs discuss stuff there and see what other people think too.

@isZumpo
Copy link
Contributor Author

isZumpo commented May 6, 2024

Emailed tim about terms of service etc.. here's what he has to say "We're using images from Wiki Commons - which we credit as provided in their API."

That is nice! In theory we could then create our own set of pictures from the same source and either bundle in the git repository itself or host somewhere. Though, I would prefer to use birdweather directly since their api providers some really cool features. Not sure how we should move forward on this topic...

I created an issue as a feature tracker for stuff from birdweather. Probs discuss stuff there and see what other people think too.

Awesome :) Do you mind linking to the issue?

@matthew73210
Copy link

Emailed tim about terms of service etc.. here's what he has to say "We're using images from Wiki Commons - which we credit as provided in their API."

That is nice! In theory we could then create our own set of pictures from the same source and either bundle in the git repository itself or host somewhere. Though, I would prefer to use birdweather directly since their api providers some really cool features. Not sure how we should move forward on this topic...

I created an issue as a feature tracker for stuff from birdweather. Probs discuss stuff there and see what other people think too.

Awesome :) Do you mind linking to the issue?

#130

isZumpo added 2 commits May 16, 2024 22:01
Makes use of the birdweather graphql api to fetch
a given bird's thumbnail.
Pre-fetching the thumbnails during server boot makes
it very responsive when the user accesses the webpage
for the first time. Since all the birds' thumbnails
have already been fetched.
Copy link
Contributor

coderabbitai bot commented May 16, 2024

Walkthrough

The recent updates encompass significant enhancements to the system's image handling and caching functionalities. Key additions include methods to fetch and cache bird thumbnails, improved database retrieval functions, and new image API integrations. These updates enhance the user interface by displaying bird images and attribution details in tables, leveraging asynchronous cache initialization for efficient image handling.

Changes

Files/Groups Summary
internal/datastore/interfaces.go, internal/datastore/... Added new methods for retrieving detected species and expanding bird data retrieval.
internal/httpcontroller/routes.go, internal/httpcontroller/utils.go Introduced image caching, Wikimedia querying, and thumbnail display functions, along with route and utility updates for these features.
views/fragments/birdsTableHTML.html, views/fragments/recentDetections.html Added thumbnail images and attribution details to bird species tables.
go.mod Updated dependencies to support new features.
internal/httpcontroller/imageprovider/... Implemented image fetching, caching mechanisms, and Wikipedia interactions for bird images.
internal/httpcontroller/init.go Introduced bird image cache initialization in the server setup.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Server
    participant Datastore
    participant WikimediaAPI

    User->>Server: Request bird species data
    Server->>Datastore: Query for detected species
    Datastore-->>Server: Return detected species list
    Server->>WikimediaAPI: Request bird thumbnail for species
    WikimediaAPI-->>Server: Return bird thumbnail and attribution
    Server->>User: Display bird species with thumbnails
Loading

Poem

Whiskers twitch in coder's delight,
Bird thumbnails now take their flight,
Cached and loaded, swift as breeze,
With attribution notes, sure to please.
Datastore hums, steady and true,
Images fetched, a splendid view,
In rabbit's realm, code dreams anew.


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share
Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits Files that changed from the base of the PR and between 8fe2171 and fdd07d1.
Files ignored due to path filters (2)
  • go.mod is excluded by !**/*.mod
  • go.sum is excluded by !**/*.sum
Files selected for processing (5)
  • internal/datastore/interfaces.go (3 hunks)
  • internal/httpcontroller/routes.go (4 hunks)
  • internal/httpcontroller/utils.go (3 hunks)
  • views/fragments/birdsTableHTML.html (1 hunks)
  • views/fragments/recentDetections.html (2 hunks)
Files skipped from review due to trivial changes (2)
  • views/fragments/birdsTableHTML.html
  • views/fragments/recentDetections.html
Additional Context Used
GitHub Check Runs (1)
lint failure (2)

internal/httpcontroller/routes.go: [failure] 47-47:
Error return value is not checked (errcheck)

Additional comments not posted (4)
internal/httpcontroller/routes.go (1)

64-64: LGTM! The thumbnail function is correctly added to the function map.

internal/httpcontroller/utils.go (2)

250-268: LGTM! The queryGraphQL function is well-implemented and handles errors appropriately.


271-299: LGTM! The thumbnail function handles caching and concurrency appropriately.

internal/datastore/interfaces.go (1)

315-325: LGTM! The GetAllDetectedSpecies function retrieves all detected species correctly.

Comment on lines 45 to 47
go func(note string) {
defer wg.Done()
thumbnail(note)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check the error returned by the thumbnail function.

-  thumbnail(note)
+  if _, err := thumbnail(note); err != nil {
+    log.Printf("Error fetching thumbnail for %s: %v", note, err)
+  }

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
go func(note string) {
defer wg.Done()
thumbnail(note)
go func(note string) {
defer wg.Done()
if _, err := thumbnail(note); err != nil {
log.Printf("Error fetching thumbnail for %s: %v", note, err)
}

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits Files that changed from the base of the PR and between fdd07d1 and 295913f.
Files selected for processing (1)
  • internal/httpcontroller/routes.go (4 hunks)
Files skipped from review as they are similar to previous changes (1)
  • internal/httpcontroller/routes.go

@PovilasID
Copy link

PovilasID commented May 17, 2024

Regarding of internet dependency I think it would be OK to have as an option.
whoBird has a toggle that pulls down images from Macaulay Library
https://www.birds.cornell.edu/home/ebird-api-terms-of-use/
Looks to permit use but way I am reading it could not be packaged inside of the container, could have same setup as birdweather.

@tphakala
Copy link
Owner

I am sorry but I am not going to accept this PR, I am not okay with consuming any resources from BirdWeather.com without prior agreement with Tim & Co., and even then I don't see any point of accessing resources from them if original source is publicly available at somewhere else.

@isZumpo
Copy link
Contributor Author

isZumpo commented May 17, 2024

I am sorry but I am not going to accept this PR, I am not okay with consuming any resources from BirdWeather.com without prior agreement with Tim & Co., and even then I don't see any point of accessing resources from them if original source is publicly available at somewhere else.

I am completely on board with this thought too.

Was rebasing the thumbnail branch yesterday and planning on soon replacing the birdweather api calls for wikimedia ones. Though it might make more sense to close this PR and make a new for that?

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits Files that changed from the base of the PR and between 295913f and dbf7717.
Files ignored due to path filters (1)
  • go.mod is excluded by !**/*.mod
Files selected for processing (1)
  • internal/httpcontroller/utils.go (2 hunks)
Additional comments not posted (3)
internal/httpcontroller/utils.go (3)

13-13: Ensure proper usage of the sync package.

The import of the sync package is necessary for the new thumbnail caching mechanism. Good to see it added here.


16-16: Verify the new dependency on cgt.name/pkg/go-mwclient.

Ensure that this package is properly integrated and maintained, as it is crucial for the new functionality.

Verification successful

The new dependency on cgt.name/pkg/go-mwclient is properly listed in the go.mod file and is actively used in the codebase.

  • The dependency is listed in go.mod as an indirect dependency.
  • It is imported and utilized in internal/httpcontroller/utils.go.
Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Verify the dependency on `cgt.name/pkg/go-mwclient` is properly managed.

# Test: Check if the dependency is listed in the project's dependency management file.
cat go.mod | grep 'cgt.name/pkg/go-mwclient'

Length of output: 89



Script:

#!/bin/bash
# Description: Verify the usage of `cgt.name/pkg/go-mwclient` in the codebase.

# Test: Search for the usage of `cgt.name/pkg/go-mwclient` in the codebase.
rg 'cgt.name/pkg/go-mwclient'

Length of output: 142



Script:

#!/bin/bash
# Description: Verify the usage of `mwclient` in `internal/httpcontroller/utils.go`.

# Test: Search for the usage of `mwclient` in `internal/httpcontroller/utils.go`.
rg 'mwclient' internal/httpcontroller/utils.go

Length of output: 147


244-245: Use of sync.Map for caching.

The use of sync.Map for thumbnailMap and thumbnailMutexMap is appropriate given the concurrent access patterns expected with the thumbnail caching feature.

Comment on lines 248 to 283
func queryWikiMedia(scientificName string) (string, error) {
w, err := mwclient.New("https://wikipedia.org/w/api.php", "Birdnet-Go")
if err != nil {
return "", err
}

// Specify parameters to send.
parameters := map[string]string{
"action": "query",
"prop": "pageimages",
"piprop": "thumbnail",
"pilicense": "free",
"titles": scientificName,
"pithumbsize": "400",
"redirects": "",
}

// Make the request.
resp, err := w.Get(parameters)
if err != nil {
return "", err
}

// Print the *jason.Object
pages, err := resp.GetObjectArray("query", "pages")
if err != nil {
return "", err
}

thumbnail, err := pages[0].GetString("thumbnail", "source")
if err != nil {
return "", err
}

return string(thumbnail), nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactor queryWikiMedia function for error handling and readability.

func queryWikiMedia(scientificName string) (string, error) {
	w, err := mwclient.New("https://wikipedia.org/w/api.php", "Birdnet-Go")
	if err != nil {
		return "", fmt.Errorf("failed to create mwclient: %w", err)
	}

	parameters := map[string]string{
		"action":      "query",
		"prop":        "pageimages",
		"piprop":      "thumbnail",
		"pilicense":   "free",
		"titles":      scientificName,
		"pithumbsize": "400",
		"redirects":   "",
	}

	resp, err := w.Get(parameters)
	if err != nil {
		return "", fmt.Errorf("failed to get response: %w", err)
	}

	pages, err := resp.GetObjectArray("query", "pages")
	if err != nil {
		return "", fmt.Errorf("failed to get pages object array: %w", err)
	}

	thumbnail, err := pages[0].GetString("thumbnail", "source")
	if err != nil {
		return "", fmt.Errorf("failed to get thumbnail source: %w", err)
	}

	return string(thumbnail), nil
}

Refactor the function to improve error messages and overall readability.


Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
func queryWikiMedia(scientificName string) (string, error) {
w, err := mwclient.New("https://wikipedia.org/w/api.php", "Birdnet-Go")
if err != nil {
return "", err
}
// Specify parameters to send.
parameters := map[string]string{
"action": "query",
"prop": "pageimages",
"piprop": "thumbnail",
"pilicense": "free",
"titles": scientificName,
"pithumbsize": "400",
"redirects": "",
}
// Make the request.
resp, err := w.Get(parameters)
if err != nil {
return "", err
}
// Print the *jason.Object
pages, err := resp.GetObjectArray("query", "pages")
if err != nil {
return "", err
}
thumbnail, err := pages[0].GetString("thumbnail", "source")
if err != nil {
return "", err
}
return string(thumbnail), nil
}
func queryWikiMedia(scientificName string) (string, error) {
w, err := mwclient.New("https://wikipedia.org/w/api.php", "Birdnet-Go")
if err != nil {
return "", fmt.Errorf("failed to create mwclient: %w", err)
}
parameters := map[string]string{
"action": "query",
"prop": "pageimages",
"piprop": "thumbnail",
"pilicense": "free",
"titles": scientificName,
"pithumbsize": "400",
"redirects": "",
}
resp, err := w.Get(parameters)
if err != nil {
return "", fmt.Errorf("failed to get response: %w", err)
}
pages, err := resp.GetObjectArray("query", "pages")
if err != nil {
return "", fmt.Errorf("failed to get pages object array: %w", err)
}
thumbnail, err := pages[0].GetString("thumbnail", "source")
if err != nil {
return "", fmt.Errorf("failed to get thumbnail source: %w", err)
}
return string(thumbnail), nil
}

Comment on lines 286 to 314
func thumbnail(scientificName string) (string, error) {
// Check if thumbnail is already cached
if thumbnail, ok := thumbnailMap.Load(scientificName); ok {
log.Printf("Bird: %s, Thumbnail (cached): %s\n", scientificName, thumbnail)
return thumbnail.(string), nil
}

// Use a per-item mutex to ensure only one GraphQL query is performed per item
mu, _ := thumbnailMutexMap.LoadOrStore(scientificName, &sync.Mutex{})
mutex := mu.(*sync.Mutex)

mutex.Lock()
defer mutex.Unlock()

// Check again if thumbnail is cached after acquiring the lock
if thumbnail, ok := thumbnailMap.Load(scientificName); ok {
log.Printf("Bird: %s, Thumbnail (cached): %s\n", scientificName, thumbnail)
return thumbnail.(string), nil
}

thumbn, err := queryWikiMedia(scientificName)
if err != nil {
log.Printf("error querying wikimedia endpoint: %v", err)
}

thumbnailMap.Store(scientificName, thumbn)
log.Printf("Bird: %s, Thumbnail (fetched): %s\n", scientificName, thumbn)
return thumbn, nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optimize thumbnail caching logic.

func thumbnail(scientificName string) (string, error) {
	if thumbnail, ok := thumbnailMap.Load(scientificName); ok {
		log.Printf("Bird: %s, Thumbnail (cached): %s\n", scientificName, thumbnail)
		return thumbnail.(string), nil
	}

	mu, _ := thumbnailMutexMap.LoadOrStore(scientificName, &sync.Mutex{})
	mutex := mu.(*sync.Mutex)

	mutex.Lock()
	defer mutex.Unlock()

	if thumbnail, ok := thumbnailMap.Load(scientificName); ok {
		log.Printf("Bird: %s, Thumbnail (cached): %s\n", scientificName, thumbnail)
		return thumbnail.(string), nil
	}

	thumbn, err := queryWikiMedia(scientificName)
	if err != nil {
		log.Printf("error querying wikimedia endpoint: %v", err)
		return "", err
	}

	thumbnailMap.Store(scientificName, thumbn)
	log.Printf("Bird: %s, Thumbnail (fetched): %s\n", scientificName, thumbn)
	return thumbn, nil
}

Consider optimizing the thumbnail function to reduce redundancy and improve efficiency. The double-check locking pattern is correctly implemented, but ensure that the error from queryWikiMedia is properly handled.


Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
func thumbnail(scientificName string) (string, error) {
// Check if thumbnail is already cached
if thumbnail, ok := thumbnailMap.Load(scientificName); ok {
log.Printf("Bird: %s, Thumbnail (cached): %s\n", scientificName, thumbnail)
return thumbnail.(string), nil
}
// Use a per-item mutex to ensure only one GraphQL query is performed per item
mu, _ := thumbnailMutexMap.LoadOrStore(scientificName, &sync.Mutex{})
mutex := mu.(*sync.Mutex)
mutex.Lock()
defer mutex.Unlock()
// Check again if thumbnail is cached after acquiring the lock
if thumbnail, ok := thumbnailMap.Load(scientificName); ok {
log.Printf("Bird: %s, Thumbnail (cached): %s\n", scientificName, thumbnail)
return thumbnail.(string), nil
}
thumbn, err := queryWikiMedia(scientificName)
if err != nil {
log.Printf("error querying wikimedia endpoint: %v", err)
}
thumbnailMap.Store(scientificName, thumbn)
log.Printf("Bird: %s, Thumbnail (fetched): %s\n", scientificName, thumbn)
return thumbn, nil
}
func thumbnail(scientificName string) (string, error) {
if thumbnail, ok := thumbnailMap.Load(scientificName); ok {
log.Printf("Bird: %s, Thumbnail (cached): %s\n", scientificName, thumbnail)
return thumbnail.(string), nil
}
mu, _ := thumbnailMutexMap.LoadOrStore(scientificName, &sync.Mutex{})
mutex := mu.(*sync.Mutex)
mutex.Lock()
defer mutex.Unlock()
if thumbnail, ok := thumbnailMap.Load(scientificName); ok {
log.Printf("Bird: %s, Thumbnail (cached): %s\n", scientificName, thumbnail)
return thumbnail.(string), nil
}
thumbn, err := queryWikiMedia(scientificName)
if err != nil {
log.Printf("error querying wikimedia endpoint: %v", err)
return "", err
}
thumbnailMap.Store(scientificName, thumbn)
log.Printf("Bird: %s, Thumbnail (fetched): %s\n", scientificName, thumbn)
return thumbn, nil
}

@isZumpo
Copy link
Contributor Author

isZumpo commented May 17, 2024

Ok finally got around to creating some sort of PoC of the wikimedia api. Thanks to @PovilasID for making me aware of the query feature in the wikipedia api in the discussion thread yesterday. Was quite lost on how to get the correct image for a given bird in some automatic way otherwise :)

Image proof that it still works as expected:
image

However, I am still not quite sure how we are actually supposed to give the correct credit for the images. I can see that the birdweather graphql api returns these fields along with the image:

imageCredit - Credited author of image
imageLicense - Name of image license
imageLicenseUrl - URL to image license page

I have never used birdweather directly myself, so I don't know how these fields are displayed inside the application itself? Do you have any idea @matthew73210

@PovilasID
Copy link

@isZumpo np. I do not think we can really get good way to get author info I think for now you can probably just split the string of the image url on the dash after 300px and replace everything before it with https://en.m.wikipedia.org/wiki/File: aka https://en.m.wikipedia.org/wiki/File:Blackcap_(Sylvia_atricapilla)_male.jpg and stick in hover over or meta data or in full page with Source Wikipedia

macaulaylibrary.org has en embed view https://macaulaylibrary.org/asset/211672961/embed to get pics from them you need to know image ID... whobird just made 3 60k arrays... to be as lookup table lol. So Sylvia atricapilla_Eurasian Blackcap
in https://github.com/woheller69/whoBIRD/blob/master/app/src/main/assets/labels_en.txt corresponds to blackc1 in https://github.com/woheller69/whoBIRD/blob/master/app/src/main/assets/taxo_code.txt and 211672961 in https://github.com/woheller69/whoBIRD/blob/master/app/src/main/assets/assets.txt this permits them build a quick image URL: https://cdn.download.ams.birds.cornell.edu/api/v1/asset/211672961/900 This is a very fragile way of doing things but there is probe a way to update it and this way is very fragile but I have to admit I do prefer their pictures.

@matthew73210
Copy link

Ok finally got around to creating some sort of PoC of the wikimedia api. Thanks to @PovilasID for making me aware of the query feature in the wikipedia api in the discussion thread yesterday. Was quite lost on how to get the correct image for a given bird in some automatic way otherwise :)

Image proof that it still works as expected: image

However, I am still not quite sure how we are actually supposed to give the correct credit for the images. I can see that the birdweather graphql api returns these fields along with the image:

imageCredit - Credited author of image
imageLicense - Name of image license
imageLicenseUrl - URL to image license page

I have never used birdweather directly myself, so I don't know how these fields are displayed inside the application itself? Do you have any idea @matthew73210

Re credit, probs the best way is to have it mentioned on the github page and under options (if that's where you put the toggle). If memory serves they can be part of the metadata of the image itself. Not sure if this helps.

@tomlut
Copy link

tomlut commented Jun 11, 2024

I have never used birdweather directly myself, so I don't know how these fields are displayed inside the application itself?

On the birdweather website clicking on the 🛈 icon next to the bird thumbnail takes you to a page with information about the species, a larger image and image credit link at the bottom of the page. e.g. https://app.birdweather.com/species/golden-whistler

In the birdnet iOS app clicking on the thumbnail takes you to the Wikipedia article.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between dbf7717 and 4679418.

Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
Files selected for processing (2)
  • go.mod (2 hunks)
  • views/fragments/recentDetections.html (2 hunks)
Additional context used
golangci-lint
Additional comments not posted (2)
go.mod (2)

28-28: The addition of github.com/antonholmquist/jason as an indirect dependency is appropriate for JSON parsing in API interactions.


27-27: Ensure the license of cgt.name/pkg/go-mwclient is compatible with your project's license.

Verification successful

The cgt.name/pkg/go-mwclient package is licensed under the Creative Commons CC0 1.0 Universal license, which is a public domain dedication. This license is generally compatible with most other licenses, including open-source licenses.

  • Ensure that the CC0 1.0 Universal license is compatible with your project's license.
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Check the license of the newly added dependency.

# Test: Fetch the license file from the repository.
curl -s https://raw.githubusercontent.com/cgt/go-mwclient/master/LICENSE

Length of output: 89



Script:

#!/bin/bash
# Description: Search for any license-related files in the repository.

# Test: List all files in the repository and filter for license-related files.
curl -s https://api.github.com/repos/cgt/go-mwclient/contents | jq -r '.[] | select(.type == "file") | .name' | grep -i 'license\|copying\|readme'

Length of output: 166



Script:

#!/bin/bash
# Description: Fetch the contents of the COPYING file to determine the license type.

# Test: Fetch the COPYING file from the repository.
curl -s https://raw.githubusercontent.com/cgt/go-mwclient/master/COPYING

Length of output: 7122

@@ -24,6 +24,8 @@
hx-trigger="click"
hx-push-url="true">
{{ .CommonName}}</a>
<!-- Bird thumbnail -->
<img loading="lazy" width="40" src="{{thumbnail .ScientificName}}" alt="Bird Image" class="h-auto rounded-md">
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding a title attribute to the <img> tags for better accessibility and to provide additional context on hover.

- <img loading="lazy" width="40" src="{{thumbnail .ScientificName}}" alt="Bird Image" class="h-auto rounded-md">
+ <img loading="lazy" width="40" src="{{thumbnail .ScientificName}}" alt="Bird Image" title="{{.CommonName}}" class="h-auto rounded-md">

Also applies to: 71-71

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
<img loading="lazy" width="40" src="{{thumbnail .ScientificName}}" alt="Bird Image" class="h-auto rounded-md">
<img loading="lazy" width="40" src="{{thumbnail .ScientificName}}" alt="Bird Image" title="{{.CommonName}}" class="h-auto rounded-md">

@isZumpo isZumpo marked this pull request as draft June 11, 2024 10:26
@isZumpo
Copy link
Contributor Author

isZumpo commented Jun 11, 2024

I have never used birdweather directly myself, so I don't know how these fields are displayed inside the application itself?

On the birdweather website clicking on the 🛈 icon next to the bird thumbnail takes you to a page with information about the species, a larger image and image credit link at the bottom of the page. e.g. https://app.birdweather.com/species/golden-whistler

In the birdnet iOS app clicking on the thumbnail takes you to the Wikipedia article.

Ok that looks great! Did some work today and managed to parse out the owner of the image and license itself:
image

will have to do some more code improvement and management of edge cases before I update the pr :)

@tphakala
Copy link
Owner

looks good

@isZumpo
Copy link
Contributor Author

isZumpo commented Jun 17, 2024

Finally managed to get something kind of ready. Let me tell you, getting the author information out of Wikipedia is very tricky since everyone is using different formats for the author block. Some just have their names, others have URLs, and then there are even some crazy people with tables and lots of links... I believe what I have now covers most of the different scenarios quite well!

Recent detections view
image

Daily summary view
image

I am a bit unsure about the daily summary view since the credits do not really fit into the table itself. Any suggestions on how we could format it in a nicer way there?

Anyways for those of you who want to try it out, feel free to either build this PR yourself. Or grab the image itself from my fork:

docker run ghcr.io/iszumpo/birdnet-go:f68715d97dddf310e05c525d4eba281f52f06dec ...

@tphakala
Copy link
Owner

Is this ready for merge? We can tweak things further later

@isZumpo
Copy link
Contributor Author

isZumpo commented Jun 17, 2024

Is this ready for merge? We can tweak things further later

Yeah, you are right let's not try to get everything perfect on the first try. There are observed issues for certain birds where Wikipedia does not return any "machine-readable" data. I believe that should be fixable without too much trouble in a future PR. Furthermore, it does not cause much more trouble than a missing image.

Since there are mentions of birdweather etc in the previous commits of this PR. Perhaps it is best to merge it by squashing?

@isZumpo isZumpo marked this pull request as ready for review June 17, 2024 17:34
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Outside diff range and nitpick comments (3)
views/fragments/birdsTableHTML.html (1)

Line range hint 8-36: The integration of the thumbnail column in the HTML table is well done, with appropriate attributes for lazy loading and responsive design. However, there's a TODO comment about adding credits back in. It's important to address this and ensure that proper attribution is displayed for the images.

internal/httpcontroller/init.go (2)

8-8: Consider using a more specific import alias for sync if it's only used in limited contexts, to improve readability.


36-36: The addition of BirdImageCache to the Server struct is a significant architectural change. Ensure that its lifecycle is properly managed across server operations.

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 4679418 and f68715d.

Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
Files selected for processing (9)
  • go.mod (2 hunks)
  • internal/httpcontroller/imageprovider/imageprovider.go (1 hunks)
  • internal/httpcontroller/imageprovider/imageprovider_test.go (1 hunks)
  • internal/httpcontroller/imageprovider/wikipedia.go (1 hunks)
  • internal/httpcontroller/init.go (4 hunks)
  • internal/httpcontroller/routes.go (2 hunks)
  • internal/httpcontroller/utils.go (2 hunks)
  • views/fragments/birdsTableHTML.html (2 hunks)
  • views/fragments/recentDetections.html (3 hunks)
Files skipped from review as they are similar to previous changes (2)
  • go.mod
  • views/fragments/recentDetections.html
Additional comments not posted (7)
internal/httpcontroller/imageprovider/imageprovider_test.go (1)

16-54: The test implementation for TestBirdImageCache is well-structured and effectively checks the caching behavior. Good job ensuring that the cache avoids unnecessary fetches and handles duplicate requests correctly.

internal/httpcontroller/routes.go (1)

37-47: The updates to funcMap in the initRoutes method are well implemented, correctly integrating the thumbnail and thumbnailAttribution functions for use in templates. This is a crucial step for enabling dynamic content rendering based on image data.

internal/httpcontroller/init.go (2)

16-16: Importing the imageprovider package is crucial for the new functionality. Ensure that it is well integrated and tested.


109-109: Initializing the bird image cache in a separate goroutine is a good practice for non-blocking operations. Ensure that all potential race conditions are handled.

internal/httpcontroller/utils.go (2)

242-250: The thumbnail method is straightforward and correctly retrieves the image URL from the cache. Ensure that empty strings are handled appropriately in the calling context.


252-273: The thumbnailAttribution method is well-implemented with checks for missing author or license information. Consider adding more detailed logging for debugging purposes.
[REFACTOR_SUGGESTion]

- log.Printf("Error getting thumbnail info for %s: %v", scientificName, err)
+ log.Printf("Error getting thumbnail attribution for %s: %v", scientificName, err)
internal/httpcontroller/imageprovider/wikipedia.go (1)

25-33: The constructor for wikiMediaProvider is correctly setting up the client. Ensure that the user-agent string "Birdnet-Go" is appropriate and accepted by Wikipedia's API.

Comment on lines +44 to +47
cache, err := imageprovider.CreateDefaultCache()
if err != nil {
log.Fatal(err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error handling during cache initialization is critical. Consider logging the error for better diagnostics before terminating the application.

- log.Fatal(err)
+ log.Fatalf("Failed to create default cache: %v", err)
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
cache, err := imageprovider.CreateDefaultCache()
if err != nil {
log.Fatal(err)
}
cache, err := imageprovider.CreateDefaultCache()
if err != nil {
log.Fatalf("Failed to create default cache: %v", err)
}

Comment on lines +156 to +177
func (s *Server) initBirdImageCache() {
speciesList, err := s.ds.GetAllDetectedSpecies()
if err != nil {
s.Echo.Logger.Fatal(err)
}

var wg sync.WaitGroup

for _, species := range speciesList {
wg.Add(1)

go func(speciesName string) {
defer wg.Done()
_, err := s.BirdImageCache.Get(speciesName)
if err != nil {
s.Echo.Logger.Error(err)
}
}(species.ScientificName)
}

wg.Wait()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method initBirdImageCache efficiently initializes the cache with concurrency. However, consider handling errors more gracefully instead of just logging them, to avoid silent failures in cache initialization.

- s.Echo.Logger.Error(err)
+ if err != nil {
+   s.Echo.Logger.Errorf("Failed to initialize cache for species %s: %v", speciesName, err)
+   return
+ }
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func (s *Server) initBirdImageCache() {
speciesList, err := s.ds.GetAllDetectedSpecies()
if err != nil {
s.Echo.Logger.Fatal(err)
}
var wg sync.WaitGroup
for _, species := range speciesList {
wg.Add(1)
go func(speciesName string) {
defer wg.Done()
_, err := s.BirdImageCache.Get(speciesName)
if err != nil {
s.Echo.Logger.Error(err)
}
}(species.ScientificName)
}
wg.Wait()
}
func (s *Server) initBirdImageCache() {
speciesList, err := s.ds.GetAllDetectedSpecies()
if err != nil {
s.Echo.Logger.Fatal(err)
}
var wg sync.WaitGroup
for _, species := range speciesList {
wg.Add(1)
go func(speciesName string) {
defer wg.Done()
_, err := s.BirdImageCache.Get(speciesName)
if err != nil {
s.Echo.Logger.Errorf("Failed to initialize cache for species %s: %v", speciesName, err)
return
}
}(species.ScientificName)
}
wg.Wait()
}

Comment on lines +54 to +76
// fetch retrieves the bird image for a given scientific name.
func (l *wikiMediaProvider) fetch(scientificName string) (BirdImage, error) {
// Query for the thumbnail image URL and source file name
thumbnailUrl, thumbnailSourceFile, err := l.queryThumbnail(scientificName)
if err != nil {
return BirdImage{}, fmt.Errorf("failed to query thumbnail of bird: %s : %w", scientificName, err)
}

// Query for the image author information
authorInfo, err := l.queryAuthorInfo(thumbnailSourceFile)
if err != nil {
return BirdImage{}, fmt.Errorf("failed to query thumbnail credit of bird: %s : %w", scientificName, err)
}

// Return the bird image struct with the image URL and author information
return BirdImage{
Url: thumbnailUrl,
AuthorName: authorInfo.name,
AuthorUrl: authorInfo.url,
LicenseName: authorInfo.licenseName,
LicenseUrl: authorInfo.licenseUrl,
}, nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fetch method is a key part of this implementation, integrating thumbnail and author queries. Consider adding retries or fallback mechanisms in case of API failures.

- return BirdImage{}, fmt.Errorf("failed to query thumbnail of bird: %s : %w", scientificName, err)
+ // Implement retry logic here

Committable suggestion was skipped due to low confidence.

Comment on lines +162 to +221
// extractArtistInfo tries to extract the author information as best as possible
// from the given input which may consist of nested html tags.
func extractArtistInfo(htmlStr string) (href, text string, err error) {
// Parse the HTML string into an HTML document
doc, err := html.Parse(strings.NewReader(htmlStr))
if err != nil {
return "", "", err
}

// Find all the links in the document
links := findLinks(doc)

// If no links are found, extract the inner text and return it
if len(links) == 0 {
return "", html2text.HTML2Text(htmlStr), nil
}

// If there is only one link, extract the href and inner text and return them
if len(links) == 1 {
link := links[0]
href = extractHref(link)
text = extractText(link)
return href, text, nil
}

// Look for a Wikipedia user link and extract the href and inner text
wikipediaUserLinks := findWikipediaUserLinks(links)

if len(wikipediaUserLinks) == 0 {
return "", "", fmt.Errorf("failed to extract link from HTML: %s", htmlStr)
}

if len(wikipediaUserLinks) == 1 {
// Return the href and inner text of the Wikipedia user link
wikipediaLink := wikipediaUserLinks[0]
href = extractHref(wikipediaLink)
text = extractText(wikipediaLink)
return href, text, nil
}

// Check if all the links have the same href value
firstHref := extractHref(wikipediaUserLinks[0])
allSameHref := true
for _, link := range wikipediaUserLinks[1:] {
if extractHref(link) != firstHref {
allSameHref = false
break
}
}

if allSameHref {
// Return the href and inner text of the first Wikipedia user link
wikipediaLink := wikipediaUserLinks[0]
href = extractHref(wikipediaLink)
text = extractText(wikipediaLink)
return href, text, nil
}

return "", "", fmt.Errorf("multiple Wikipedia user links found in HTML: %s", htmlStr)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The extractArtistInfo function is complex and handles multiple scenarios. Refactor this to simplify the logic and improve maintainability.

- if len(links) == 0 {
+ if len(links) <= 1 {
+   // Simplify handling for one or no links
}

Committable suggestion was skipped due to low confidence.

Comment on lines +20 to +83
type BirdImageCache struct {
dataMap sync.Map
dataMutexMap sync.Map
birdImageProvider ImageProvider
nonBirdImageProvider ImageProvider
}

type emptyImageProvider struct {
}

func (l *emptyImageProvider) fetch(scientificName string) (BirdImage, error) {
return BirdImage{}, nil
}

func initCache(e ImageProvider) *BirdImageCache {
return &BirdImageCache{
birdImageProvider: e,
nonBirdImageProvider: &emptyImageProvider{}, // TODO: Use a real image provider for non-birds
}
}

func CreateDefaultCache() (*BirdImageCache, error) {
provider, err := NewWikiMediaProvider()
if err != nil {
return nil, err
}
return initCache(provider), nil
}

func (c *BirdImageCache) Get(scientificName string) (info BirdImage, err error) {
// Check if the bird image is already in the cache
birdImage, ok := c.dataMap.Load(scientificName)
if ok {
return birdImage.(BirdImage), nil
}

// Use a per-item mutex to ensure only one query is performed per item
mu, _ := c.dataMutexMap.LoadOrStore(scientificName, &sync.Mutex{})
mutex := mu.(*sync.Mutex)

mutex.Lock()
defer mutex.Unlock()

// Check again if bird image is cached after acquiring the lock
birdImage, ok = c.dataMap.Load(scientificName)
if ok {
return birdImage.(BirdImage), nil
}

// Fetch the bird image from the image provider
fetchedBirdImage, err := c.fetch(scientificName)
if err != nil {
// TODO for now store a empty result in the cache to avoid future queries that would fail.
// In the future, look at the error and decide if it was caused by networking and is recoverable.
// And if it was, do not store the empty result in the cache.
c.dataMap.Store(scientificName, BirdImage{})
return BirdImage{}, err
}

// Store the fetched image information in the cache
c.dataMap.Store(scientificName, fetchedBirdImage)

return fetchedBirdImage, nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation of BirdImageCache uses modern concurrency patterns effectively with sync.Map and per-item mutexes. However, consider improving the error handling in the fetch method to differentiate between recoverable and non-recoverable errors before deciding to cache an empty result.

Comment on lines +85 to +109
func (c *BirdImageCache) fetch(scientificName string) (info BirdImage, err error) {
var imageProviderToUse ImageProvider

// Determine the image provider based on the scientific name
if slices.Contains([]string{
"Dog",
"Engine",
"Environmental",
"Fireworks",
"Gun",
"Human non-vocal",
"Human vocal",
"Human whistle",
"Noise",
"Power tools",
"Siren",
}, scientificName) {
imageProviderToUse = c.nonBirdImageProvider
} else {
imageProviderToUse = c.birdImageProvider
}

// Fetch the image from the image provider
return imageProviderToUse.fetch(scientificName)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fetch method's logic for determining the image provider based on the scientific name is a smart design choice. Consider adding more comments to explain why certain categories use the non-bird image provider, enhancing the maintainability and understandability of the code.

@tphakala tphakala merged commit 7ce09d5 into tphakala:main Jun 17, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants