Quick fix for community all stars script #5798

Merged
blackgirlbytes merged 4 commits into main from fix-community-stars-script
Nov 18, 2025

@blackgirlbytes
Contributor

This pull request significantly improves the community_stars.py script by automating contributor data fetching, enhancing Block employee detection, and simplifying contributor categorization. The script now fetches contributor data directly from the GitHub API with retry logic, checks public organization memberships and company fields to identify Block employees, and removes the "unknown" contributor category for clearer reporting.

Automation and Data Fetching

  • The script now automatically fetches contributor data from the GitHub API if the local cache is missing or invalid, with retry logic to handle temporary API issues. [1] [2]
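A minimal sketch of what such retry logic might look like, not the script's actual implementation. The name `fetch_with_retry` and its parameters are assumptions for illustration; `fetch` stands in for whatever callable performs the GitHub API request and returns the raw JSON text.

```python
import json
import time


def fetch_with_retry(fetch, retries=3, delay=1.0, sleep=time.sleep):
    """Call fetch() until it yields a non-empty JSON list or retries run out.

    `fetch` is a hypothetical zero-argument callable returning JSON text.
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            data = json.loads(fetch())
            if isinstance(data, list) and data:
                return data
            # Valid JSON but not usable contributor data: treat as a failure
            last_error = ValueError("empty or unexpected payload")
        except (OSError, ValueError) as exc:
            # OSError covers network errors; ValueError covers invalid JSON
            last_error = exc
        if attempt < retries:
            sleep(delay)
    raise RuntimeError(f"giving up after {retries} attempts: {last_error}")
```

Passing the fetcher and `sleep` as arguments keeps the retry loop testable without network access.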

Block Employee Detection

  • Added the is_block_employee function, which checks a contributor's public organization memberships and company field to determine Block employment, using a cache to avoid redundant API calls and reduce rate limiting. [1] [2]
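A rough sketch of the matching step on its own, with no network calls. The helper name `looks_like_block` and the contents of `BLOCK_ORGS` are assumptions; the keyword list is the one that appears in the reviewed code. It treats a missing company field as empty, since the GitHub API returns null when the field is unset.

```python
# Hypothetical org set; the script's real BLOCK_ORGS values are not shown here.
BLOCK_ORGS = {"block", "squareup"}
BLOCK_KEYWORDS = ["block", "square", "cash app", "cashapp", "tidal"]


def looks_like_block(company, org_logins, block_orgs=BLOCK_ORGS, keywords=BLOCK_KEYWORDS):
    """Return True if public org logins or the company field suggest Block."""
    # Org logins are compared case-insensitively against the known set
    user_orgs = {login.lower() for login in org_logins}
    if user_orgs & block_orgs:
        return True
    # `company` may be None, so fall back to an empty string before lowering
    company_lower = (company or "").lower()
    return any(keyword in company_lower for keyword in keywords)
```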

Contributor Categorization and Reporting

  • Contributors are now categorized strictly as either "Block" or "External," removing the "unknown" category and related reporting logic for simplicity. [1] [2] [3]

Documentation and Requirements

  • Updated script documentation to reflect the new automation, requirements, and team list file location.

Copilot AI review requested due to automatic review settings November 18, 2025 19:13
@blackgirlbytes blackgirlbytes requested a review from a team as a code owner November 18, 2025 19:13
Contributor

Copilot AI left a comment

Pull Request Overview

This PR modernizes the community_stars.py script by automating contributor data fetching from GitHub's API and enhancing Block employee detection through organization membership checks.

Key Changes:

  • Adds automatic fetching of contributor data from GitHub API with retry logic and local caching
  • Implements is_block_employee() function to detect Block employees via public org memberships and company field
  • Removes "unknown" contributor category, now strictly categorizing as "Block" or "External"

Comment on lines 42 to 72
def is_block_employee(username):
    """Check if a user is a Block employee by checking their public org memberships."""
    try:
        # Check public org memberships
        url = f"https://api.github.com/users/{username}/orgs"
        with urllib.request.urlopen(url) as response:
            orgs = json.loads(response.read().decode('utf-8'))

        # Check if any org matches Block orgs (case-insensitive)
        user_orgs = {org['login'].lower() for org in orgs}
        if user_orgs & BLOCK_ORGS:
            return True

        # Also check the user's company field
        url = f"https://api.github.com/users/{username}"
        with urllib.request.urlopen(url) as response:
            user_data = json.loads(response.read().decode('utf-8'))

        company = user_data.get('company', '').lower()
        if company:
            # Check for Block-related keywords in company field
            block_keywords = ['block', 'square', 'cash app', 'cashapp', 'tidal']
            if any(keyword in company for keyword in block_keywords):
                return True

        return False

    except Exception as e:
        # If we can't check (rate limit, network error, etc.), return False
        # This means we'll default to treating them as external
        return False

Copilot AI Nov 18, 2025

The is_block_employee function makes two sequential API calls for every user check (orgs then user profile), which could be slow and hit rate limits when processing many contributors. Consider fetching both in parallel or combining the user profile check (which includes org data if public) into a single call.

Copilot uses AI. Check for mistakes.
Comment on lines +290 to +295
if username not in checked_orgs:
    checked_orgs[username] = is_block_employee(username)
    # Add a small delay to avoid rate limiting
    time.sleep(0.1)

if checked_orgs[username]:

Copilot AI Nov 18, 2025

The cache check uses username but the categorization check at line 286 uses username_lower. If a username differs only in case, it could bypass the cache and make redundant API calls. Use username_lower consistently for cache keys.

Suggested change
- if username not in checked_orgs:
-     checked_orgs[username] = is_block_employee(username)
-     # Add a small delay to avoid rate limiting
-     time.sleep(0.1)
- if checked_orgs[username]:
+ if username_lower not in checked_orgs:
+     checked_orgs[username_lower] = is_block_employee(username)
+     # Add a small delay to avoid rate limiting
+     time.sleep(0.1)
+ if checked_orgs[username_lower]:

if username not in checked_orgs:
    checked_orgs[username] = is_block_employee(username)
    # Add a small delay to avoid rate limiting
    time.sleep(0.1)

Copilot AI Nov 18, 2025

The 0.1 second delay applies to every uncached contributor check, even if they're already in the block_non_goose list (line 286). Move this delay inside the is_block_employee function or only apply it when actually making API calls to avoid unnecessary delays.
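One way this could be arranged, as a sketch under assumptions rather than the script's real code: consult the team list first, then fall back to a cached lookup that sleeps only after a real API check. The names `make_cached_checker` and `categorize` are hypothetical, and `lookup` stands in for `is_block_employee`.

```python
import time


def make_cached_checker(lookup, delay=0.1, sleep=time.sleep):
    """Wrap `lookup` with a cache keyed by lowercase username."""
    cache = {}

    def check(username):
        key = username.lower()
        if key not in cache:
            cache[key] = lookup(username)
            sleep(delay)  # pause only after a real lookup, never on a cache hit
        return cache[key]

    return check


def categorize(username, block_team, check):
    """Return "block" or "external"; the team list short-circuits the lookup."""
    if username.lower() in block_team:
        return "block"  # known team member: no API call, no delay
    return "block" if check(username) else "external"
```

Keying the cache on the lowercase username also addresses the case-mismatch concern raised in the earlier comment.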

"""Check if a user is a Block employee by checking their public org memberships."""
try:
    # Check public org memberships
    url = f"https://api.github.com/users/{username}/orgs"

Copilot AI Nov 18, 2025

The username is directly interpolated into the URL without validation or encoding. If a username contains special characters, this could break the URL or cause unexpected behavior. Use urllib.parse.quote() to safely encode the username.
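A small sketch of the encoding step, assuming a hypothetical helper `user_url` (not part of the script). Passing `safe=""` to `urllib.parse.quote` also percent-encodes `/`, so a username cannot introduce extra path segments.

```python
from urllib.parse import quote

API_ROOT = "https://api.github.com"


def user_url(username, suffix=""):
    """Build a users endpoint URL with the username percent-encoded."""
    # safe="" means no character is exempt from encoding, including "/"
    return f"{API_ROOT}/users/{quote(username, safe='')}{suffix}"
```

For example, `user_url("octocat", "/orgs")` leaves an ordinary username unchanged while still guarding against unusual input.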

    return True

# Also check the user's company field
url = f"https://api.github.com/users/{username}"

Copilot AI Nov 18, 2025

Same issue as line 46 - the username should be URL-encoded using urllib.parse.quote() to handle special characters safely.

@github-actions
Contributor

github-actions bot commented Nov 18, 2025

PR Preview Action v1.6.0
Preview removed because the pull request was closed.
2025-11-18 22:10 UTC

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Contributor

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

Comment on lines 43 to 68
"""Check if a user is a Block employee by checking their public org memberships."""
try:
    # Check public org memberships
    url = f"https://api.github.com/users/{username}/orgs"
    with urllib.request.urlopen(url) as response:
        orgs = json.loads(response.read().decode('utf-8'))

    # Check if any org matches Block orgs (case-insensitive)
    user_orgs = {org['login'].lower() for org in orgs}
    if user_orgs & BLOCK_ORGS:
        return True

    # Also check the user's company field
    url = f"https://api.github.com/users/{username}"
    with urllib.request.urlopen(url) as response:
        user_data = json.loads(response.read().decode('utf-8'))

    company = user_data.get('company', '').lower()
    if company:
        # Check for Block-related keywords in company field
        block_keywords = ['block', 'square', 'cash app', 'cashapp', 'tidal']
        if any(keyword in company for keyword in block_keywords):
            return True

    return False

Copilot AI Nov 18, 2025

The function makes two separate API calls (orgs then user profile) even when org check succeeds. Fetch the user profile first which includes both company field and can be used to get orgs if needed, reducing API calls by ~50% when org membership is detected.

Suggested change
- """Check if a user is a Block employee by checking their public org memberships."""
- try:
-     # Check public org memberships
-     url = f"https://api.github.com/users/{username}/orgs"
-     with urllib.request.urlopen(url) as response:
-         orgs = json.loads(response.read().decode('utf-8'))
-     # Check if any org matches Block orgs (case-insensitive)
-     user_orgs = {org['login'].lower() for org in orgs}
-     if user_orgs & BLOCK_ORGS:
-         return True
-     # Also check the user's company field
-     url = f"https://api.github.com/users/{username}"
-     with urllib.request.urlopen(url) as response:
-         user_data = json.loads(response.read().decode('utf-8'))
-     company = user_data.get('company', '').lower()
-     if company:
-         # Check for Block-related keywords in company field
-         block_keywords = ['block', 'square', 'cash app', 'cashapp', 'tidal']
-         if any(keyword in company for keyword in block_keywords):
-             return True
-     return False
+ """Check if a user is a Block employee by checking their company field and public org memberships."""
+ try:
+     # First, check the user's company field from their profile
+     url = f"https://api.github.com/users/{username}"
+     with urllib.request.urlopen(url) as response:
+         user_data = json.loads(response.read().decode('utf-8'))
+     company = user_data.get('company', '')
+     if company:
+         company_lower = company.lower()
+         block_keywords = ['block', 'square', 'cash app', 'cashapp', 'tidal']
+         if any(keyword in company_lower for keyword in block_keywords):
+             return True
+     # If company field is not sufficient, check public org memberships
+     url = f"https://api.github.com/users/{username}/orgs"
+     with urllib.request.urlopen(url) as response:
+         orgs = json.loads(response.read().decode('utf-8'))
+     user_orgs = {org['login'].lower() for org in orgs}
+     if user_orgs & BLOCK_ORGS:
+         return True
+     return False

…es, optimize API calls

- Add automatic GitHub data fetching with retry logic and validation
- Default unknown contributors to external (eligible for Community All-Stars)
- Add automatic Block employee detection via public org membership and company field
- Optimize is_block_employee() to check company field first, reducing API calls by ~30-40%
- Add caching for org checks to avoid redundant API calls
- Improve error handling with clear messages for API failures
- Update documentation to reflect automatic fetching capabilities

This eliminates the need for manual curl commands and prevents recipe failures
from empty/invalid GitHub data files.
The 'External' section in community_stars_teams.txt was never actually used
by the script - contributors default to 'external' unless they're in the
block_non_goose list or have public Block org membership.

Changes:
- Remove unused 'external' set from script
- Remove External section from team list file (39 usernames removed)
- Add documentation explaining automatic external detection
- Simplify load_team_lists() function

This reduces maintenance burden - no need to track external contributors
manually since they're auto-detected by default.
Contributor

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

Comment on lines +63 to +64
url = f"https://api.github.com/users/{username}/orgs"
with urllib.request.urlopen(url) as response:

Copilot AI Nov 18, 2025

Missing timeout parameter on urlopen call. This could hang indefinitely if the GitHub API is unresponsive. Add a timeout parameter: urllib.request.urlopen(url, timeout=30)
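A sketch of a fetch helper with an explicit timeout, offered as one possible shape rather than the script's code. The name `fetch_json` and the injectable `opener` parameter are assumptions; `timeout` is a real parameter of `urllib.request.urlopen`.

```python
import json
import urllib.request


def fetch_json(url, timeout=30, opener=urllib.request.urlopen):
    """Fetch and decode JSON, returning None on timeout or network error."""
    try:
        with opener(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses
        return None
```

Returning `None` here mirrors the reviewed function's choice to treat any lookup failure as "not a Block employee"; a caller that needs to distinguish failures would want to raise instead.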

"""
try:
    # First check the user's profile (single API call)
    url = f"https://api.github.com/users/{username}"

Copilot AI Nov 18, 2025

Username is inserted directly into URL without validation or encoding. If a username contains special characters, this could break the URL or potentially be exploited. Use urllib.parse.quote() to encode the username: url = f"https://api.github.com/users/{urllib.parse.quote(username)}"

    return True

# Only check orgs if company field didn't match (second API call only when needed)
url = f"https://api.github.com/users/{username}/orgs"

Copilot AI Nov 18, 2025

Username is inserted directly into URL without validation or encoding. Use urllib.parse.quote() to encode the username: url = f"https://api.github.com/users/{urllib.parse.quote(username)}/orgs"

Comment on lines +42 to +77
def is_block_employee(username):
    """Check if a user is a Block employee by checking their profile and org memberships.

    Makes a single API call to get user profile (includes company field),
    then only calls orgs endpoint if company field doesn't match.
    """
    try:
        # First check the user's profile (single API call)
        url = f"https://api.github.com/users/{username}"
        with urllib.request.urlopen(url) as response:
            user_data = json.loads(response.read().decode('utf-8'))

        # Check company field first (no additional API call needed)
        company = user_data.get('company', '').lower() if user_data.get('company') else ''
        if company:
            # Check for Block-related keywords in company field
            block_keywords = ['block', 'square', 'cash app', 'cashapp', 'tidal']
            if any(keyword in company for keyword in block_keywords):
                return True

        # Only check orgs if company field didn't match (second API call only when needed)
        url = f"https://api.github.com/users/{username}/orgs"
        with urllib.request.urlopen(url) as response:
            orgs = json.loads(response.read().decode('utf-8'))

        # Check if any org matches Block orgs (case-insensitive)
        user_orgs = {org['login'].lower() for org in orgs}
        if user_orgs & BLOCK_ORGS:
            return True

        return False

    except Exception as e:
        # If we can't check (rate limit, network error, etc.), return False
        # This means we'll default to treating them as external
        return False

Copilot AI Nov 18, 2025

The is_block_employee function makes unauthenticated GitHub API calls without checking rate limits. For unauthenticated requests, GitHub's rate limit is only 60 requests per hour. With the 0.1s delay on line 291, processing 60+ contributors will exceed this limit and cause failures. Consider adding a GitHub token for authentication (5000 req/hr) or implementing exponential backoff when rate limit errors occur.
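A sketch of the two mitigations named above: an authenticated request and an exponential backoff schedule. The helper names `build_request` and `backoff_delays`, and the use of a `GITHUB_TOKEN` environment variable, are assumptions for illustration and do not appear in the script.

```python
import os
import urllib.request


def build_request(url, token=None):
    """Build a GitHub API request, adding a bearer token when one is available."""
    headers = {"Accept": "application/vnd.github+json"}
    # Assumed variable name; the script may source its token differently
    token = token or os.environ.get("GITHUB_TOKEN")
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, headers=headers)


def backoff_delays(attempts, base=1.0, cap=60.0):
    """Seconds to wait before each retry: base, 2*base, 4*base, ... up to cap."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]
```

The returned request object can be passed to `urllib.request.urlopen` in place of a bare URL, and the delay list can drive a retry loop that sleeps between attempts when the API answers with a rate-limit error.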

Comment on lines +50 to +51
url = f"https://api.github.com/users/{username}"
with urllib.request.urlopen(url) as response:

Copilot AI Nov 18, 2025

Missing timeout parameter on urlopen call. This could hang indefinitely if the GitHub API is unresponsive. Add a timeout parameter like line 219: urllib.request.urlopen(url, timeout=30)

@blackgirlbytes blackgirlbytes merged commit 2393a1f into main Nov 18, 2025
18 checks passed
michaelneale added a commit that referenced this pull request Nov 19, 2025
* main:
  feat/fix Re-enabled WAL with commit transaction management (Linux Verification Requested) (#5793)
  chore: remove autopilot experimental feature (#5781)
  Read paths from an interactive & login shell (#5774)
  docs: acp clients (#5800)
  Provider error proxy for simulating various types of errors (#5091)
  chore: Add links to maintainer profiles (#5788)
  Quick fix for community all stars script (#5798)
  Document Mistral AI provider (#5799)
  docs: Add Community Stars recipe script and txt file (#5776)
wpfleger96 added a commit that referenced this pull request Nov 20, 2025
* main: (33 commits)
  fix: support Gemini 3's thought signatures (#5806)
  chore: Add Adrian Cole to Maintainers (#5815)
  [MCP-UI] Proxy and Better Message Handling (#5487)
  Release 1.15.0
  Document New Window menu in macOS dock (#5811)
  Catch cron errors (#5707)
  feat/fix Re-enabled WAL with commit transaction management (Linux Verification Requested) (#5793)
  chore: remove autopilot experimental feature (#5781)
  Read paths from an interactive & login shell (#5774)
  docs: acp clients (#5800)
  Provider error proxy for simulating various types of errors (#5091)
  chore: Add links to maintainer profiles (#5788)
  Quick fix for community all stars script (#5798)
  Document Mistral AI provider (#5799)
  docs: Add Community Stars recipe script and txt file (#5776)
  chore: incorporate LF feedback (#5787)
  docs: quick launcher (#5779)
  Bump auto scroll threshold (#5738)
  fix: add one-time cleanup for linux hermit locking issues (#5742)
  Don't show update tray icon if GOOSE_VERSION is set (#5750)
  ...
BlairAllan pushed a commit to BlairAllan/goose that referenced this pull request Nov 29, 2025
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Blair Allan <Blairallan@icloud.com>