Quick fix for community all stars script #5798
Pull Request Overview
This PR modernizes the community_stars.py script by automating contributor data fetching from GitHub's API and enhancing Block employee detection through organization membership checks.
Key Changes:
- Adds automatic fetching of contributor data from GitHub API with retry logic and local caching
- Implements `is_block_employee()` function to detect Block employees via public org memberships and company field
- Removes "unknown" contributor category, now strictly categorizing as "Block" or "External"
```python
def is_block_employee(username):
    """Check if a user is a Block employee by checking their public org memberships."""
    try:
        # Check public org memberships
        url = f"https://api.github.com/users/{username}/orgs"
        with urllib.request.urlopen(url) as response:
            orgs = json.loads(response.read().decode('utf-8'))

        # Check if any org matches Block orgs (case-insensitive)
        user_orgs = {org['login'].lower() for org in orgs}
        if user_orgs & BLOCK_ORGS:
            return True

        # Also check the user's company field
        url = f"https://api.github.com/users/{username}"
        with urllib.request.urlopen(url) as response:
            user_data = json.loads(response.read().decode('utf-8'))

        company = user_data.get('company', '').lower()
        if company:
            # Check for Block-related keywords in company field
            block_keywords = ['block', 'square', 'cash app', 'cashapp', 'tidal']
            if any(keyword in company for keyword in block_keywords):
                return True

        return False

    except Exception as e:
        # If we can't check (rate limit, network error, etc.), return False
        # This means we'll default to treating them as external
        return False
```
The is_block_employee function makes two sequential API calls (orgs, then user profile) for every user whose org check doesn't match, which could be slow and hit rate limits when processing many contributors. Consider fetching both in parallel, or checking the user profile first (it includes the company field, though only an organizations_url rather than the org memberships themselves) and calling the orgs endpoint only when the company field doesn't match.
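The parallel option from this comment could look roughly like the sketch below, assuming the standard-library `concurrent.futures` thread pool. `fetch_json`, `fetch_profile_and_orgs` and the injectable `fetch` parameter are hypothetical names, not part of community_stars.py. Running the two requests concurrently cuts latency, but it does not reduce the number of requests counted against the rate limit.

```python
import json
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor

API_ROOT = "https://api.github.com"


def fetch_json(url, timeout=30):
    """GET a URL and decode the JSON body (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def fetch_profile_and_orgs(username, fetch=fetch_json):
    """Fetch /users/{username} and /users/{username}/orgs concurrently.

    `fetch` is injectable so the function can be exercised without network access.
    Returns a (profile, orgs) tuple.
    """
    user = urllib.parse.quote(username, safe="")
    profile_url = f"{API_ROOT}/users/{user}"
    orgs_url = f"{API_ROOT}/users/{user}/orgs"
    # Two worker threads: both requests are in flight at the same time.
    with ThreadPoolExecutor(max_workers=2) as pool:
        profile_future = pool.submit(fetch, profile_url)
        orgs_future = pool.submit(fetch, orgs_url)
        return profile_future.result(), orgs_future.result()
```

The caller would then apply the same company-keyword and `BLOCK_ORGS` checks to the two results.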
```python
if username not in checked_orgs:
    checked_orgs[username] = is_block_employee(username)
    # Add a small delay to avoid rate limiting
    time.sleep(0.1)

if checked_orgs[username]:
```
The cache check uses username but the categorization check at line 286 uses username_lower. If a username differs only in case, it could bypass the cache and make redundant API calls. Use username_lower consistently for cache keys.
```diff
-if username not in checked_orgs:
-    checked_orgs[username] = is_block_employee(username)
-    # Add a small delay to avoid rate limiting
-    time.sleep(0.1)
-if checked_orgs[username]:
+if username_lower not in checked_orgs:
+    checked_orgs[username_lower] = is_block_employee(username)
+    # Add a small delay to avoid rate limiting
+    time.sleep(0.1)
+if checked_orgs[username_lower]:
```
```python
if username not in checked_orgs:
    checked_orgs[username] = is_block_employee(username)
    # Add a small delay to avoid rate limiting
    time.sleep(0.1)
```
The 0.1 second delay applies to every uncached contributor check, even if they're already in the block_non_goose list (line 286). Move this delay inside the is_block_employee function or only apply it when actually making API calls to avoid unnecessary delays.
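One way to combine this suggestion with the case-insensitive cache key is sketched below: the known-employee list is checked first, the cache is keyed by the lowercased username, and the delay only follows a real lookup. It assumes `block_non_goose` holds lowercase usernames and that the GitHub lookup (for example `is_block_employee`) is passed in as `check_employee`; `categorize` and its parameters are hypothetical names.

```python
import time


def categorize(username, block_non_goose, checked_orgs, check_employee,
               delay=0.1, sleep=time.sleep):
    """Return "Block" or "External" for a contributor (hypothetical helper).

    block_non_goose: set of lowercase usernames known to be Block employees.
    checked_orgs: cache dict keyed by lowercase username.
    check_employee: callable doing the GitHub lookup, e.g. is_block_employee.
    """
    username_lower = username.lower()
    # Known Block employees need no API call, so no delay either.
    if username_lower in block_non_goose:
        return "Block"
    if username_lower not in checked_orgs:
        checked_orgs[username_lower] = check_employee(username)
        # Pause only after an actual lookup.
        sleep(delay)
    return "Block" if checked_orgs[username_lower] else "External"
```

Passing `sleep` as a parameter keeps the function testable without real delays.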
```python
    """Check if a user is a Block employee by checking their public org memberships."""
    try:
        # Check public org memberships
        url = f"https://api.github.com/users/{username}/orgs"
```
The username is directly interpolated into the URL without validation or encoding. If a username contains special characters, this could break the URL or cause unexpected behavior. Use urllib.parse.quote() to safely encode the username.
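A minimal sketch of the encoding fix using the standard-library `urllib.parse.quote`. Passing `safe=''` also encodes "/", so a malformed name cannot add extra path segments; `user_url` is a hypothetical helper name. Valid GitHub usernames contain only alphanumerics and hyphens, so for them the encoding is a no-op and the change is purely defensive.

```python
import urllib.parse

API_ROOT = "https://api.github.com"


def user_url(username, suffix=""):
    """Build a GitHub users-API URL with the username percent-encoded.

    safe="" makes quote() encode "/" as well, keeping the username
    confined to a single path segment.
    """
    return f"{API_ROOT}/users/{urllib.parse.quote(username, safe='')}{suffix}"
```

Both call sites would then use `user_url(username)` and `user_url(username, "/orgs")`.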
```python
            return True

        # Also check the user's company field
        url = f"https://api.github.com/users/{username}"
```
Same issue as line 46 - the username should be URL-encoded using urllib.parse.quote() to handle special characters safely.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
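The function calls the orgs endpoint first and returns early only when an org matches, so every other user costs two API calls (orgs, then user profile). Fetch the user profile first instead: it includes the company field, and the orgs endpoint then only needs to be called when the company field doesn't match. This saves one of the two calls for every user whose company field identifies them as Block.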
```diff
-    """Check if a user is a Block employee by checking their public org memberships."""
-    try:
-        # Check public org memberships
-        url = f"https://api.github.com/users/{username}/orgs"
-        with urllib.request.urlopen(url) as response:
-            orgs = json.loads(response.read().decode('utf-8'))
-        # Check if any org matches Block orgs (case-insensitive)
-        user_orgs = {org['login'].lower() for org in orgs}
-        if user_orgs & BLOCK_ORGS:
-            return True
-        # Also check the user's company field
-        url = f"https://api.github.com/users/{username}"
-        with urllib.request.urlopen(url) as response:
-            user_data = json.loads(response.read().decode('utf-8'))
-        company = user_data.get('company', '').lower()
-        if company:
-            # Check for Block-related keywords in company field
-            block_keywords = ['block', 'square', 'cash app', 'cashapp', 'tidal']
-            if any(keyword in company for keyword in block_keywords):
-                return True
-        return False
+    """Check if a user is a Block employee by checking their company field and public org memberships."""
+    try:
+        # First, check the user's company field from their profile
+        url = f"https://api.github.com/users/{username}"
+        with urllib.request.urlopen(url) as response:
+            user_data = json.loads(response.read().decode('utf-8'))
+        company = user_data.get('company', '')
+        if company:
+            company_lower = company.lower()
+            block_keywords = ['block', 'square', 'cash app', 'cashapp', 'tidal']
+            if any(keyword in company_lower for keyword in block_keywords):
+                return True
+        # If company field is not sufficient, check public org memberships
+        url = f"https://api.github.com/users/{username}/orgs"
+        with urllib.request.urlopen(url) as response:
+            orgs = json.loads(response.read().decode('utf-8'))
+        user_orgs = {org['login'].lower() for org in orgs}
+        if user_orgs & BLOCK_ORGS:
+            return True
+        return False
```
…es, optimize API calls

- Add automatic GitHub data fetching with retry logic and validation
- Default unknown contributors to external (eligible for Community All-Stars)
- Add automatic Block employee detection via public org membership and company field
- Optimize is_block_employee() to check company field first, reducing API calls by ~30-40%
- Add caching for org checks to avoid redundant API calls
- Improve error handling with clear messages for API failures
- Update documentation to reflect automatic fetching capabilities

This eliminates the need for manual curl commands and prevents recipe failures from empty/invalid GitHub data files.
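The validation step mentioned in this commit can be sketched as a sanity check on the raw contributors payload before the script relies on it. This assumes the payload comes from GitHub's list-repository-contributors endpoint, whose entries carry a "login" field; `validate_contributors` is a hypothetical helper, not the script's actual function. A GitHub error response (for example a rate-limit message) is a JSON object rather than an array, so the type check rejects it as well.

```python
import json


def validate_contributors(raw):
    """Parse and sanity-check a contributors JSON payload (hypothetical helper).

    Returns the parsed list, or raises ValueError with a clear message if the
    payload is empty, is not a non-empty JSON array, or has entries without "login".
    """
    if not raw or not raw.strip():
        raise ValueError("contributor data is empty")
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"contributor data is not valid JSON: {exc}") from exc
    if not isinstance(data, list) or not data:
        raise ValueError("expected a non-empty JSON array of contributors")
    for entry in data:
        if not isinstance(entry, dict) or "login" not in entry:
            raise ValueError(f"contributor entry missing 'login': {entry!r}")
    return data
```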
The 'External' section in community_stars_teams.txt was never actually used by the script: contributors default to 'external' unless they're in the block_non_goose list or have public Block org membership.

Changes:
- Remove unused 'external' set from script
- Remove External section from team list file (39 usernames removed)
- Add documentation explaining automatic external detection
- Simplify load_team_lists() function

This reduces maintenance burden: there is no need to track external contributors manually, since they're auto-detected by default.
```python
        url = f"https://api.github.com/users/{username}/orgs"
        with urllib.request.urlopen(url) as response:
```
Missing timeout parameter on urlopen call. This could hang indefinitely if the GitHub API is unresponsive. Add a timeout parameter: urllib.request.urlopen(url, timeout=30)
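A sketch of a request helper that always passes a timeout, using only the standard library. It also sends an optional token in an Authorization header, which the rate-limit comment on this PR suggests for getting past the unauthenticated limit; reading it from a `GITHUB_TOKEN` environment variable is an assumption, as are the `build_request` and `github_get_json` names. Callers are expected to pass an already URL-encoded path.

```python
import json
import os
import urllib.request

API_ROOT = "https://api.github.com"


def build_request(path, token=None):
    """Build a GitHub API request, adding a bearer token header when one is given."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(f"{API_ROOT}{path}", headers=headers)


def github_get_json(path, token=None, timeout=30):
    """GET an API path and decode the JSON body, giving up after `timeout` seconds."""
    if token is None:
        token = os.environ.get("GITHUB_TOKEN")  # assumed environment variable name
    request = build_request(path, token)
    # The timeout prevents the call from hanging if GitHub stops responding.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```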
```python
    """
    try:
        # First check the user's profile (single API call)
        url = f"https://api.github.com/users/{username}"
```
Username is inserted directly into URL without validation or encoding. If a username contains special characters, this could break the URL or potentially be exploited. Use urllib.parse.quote() to encode the username: url = f"https://api.github.com/users/{urllib.parse.quote(username)}"
```python
            return True

        # Only check orgs if company field didn't match (second API call only when needed)
        url = f"https://api.github.com/users/{username}/orgs"
```
Username is inserted directly into URL without validation or encoding. Use urllib.parse.quote() to encode the username: url = f"https://api.github.com/users/{urllib.parse.quote(username)}/orgs"
```python
def is_block_employee(username):
    """Check if a user is a Block employee by checking their profile and org memberships.

    Makes a single API call to get user profile (includes company field),
    then only calls orgs endpoint if company field doesn't match.
    """
    try:
        # First check the user's profile (single API call)
        url = f"https://api.github.com/users/{username}"
        with urllib.request.urlopen(url) as response:
            user_data = json.loads(response.read().decode('utf-8'))

        # Check company field first (no additional API call needed)
        company = user_data.get('company', '').lower() if user_data.get('company') else ''
        if company:
            # Check for Block-related keywords in company field
            block_keywords = ['block', 'square', 'cash app', 'cashapp', 'tidal']
            if any(keyword in company for keyword in block_keywords):
                return True

        # Only check orgs if company field didn't match (second API call only when needed)
        url = f"https://api.github.com/users/{username}/orgs"
        with urllib.request.urlopen(url) as response:
            orgs = json.loads(response.read().decode('utf-8'))

        # Check if any org matches Block orgs (case-insensitive)
        user_orgs = {org['login'].lower() for org in orgs}
        if user_orgs & BLOCK_ORGS:
            return True

        return False

    except Exception as e:
        # If we can't check (rate limit, network error, etc.), return False
        # This means we'll default to treating them as external
        return False
```
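The is_block_employee function makes unauthenticated GitHub API calls without checking rate limits. Unauthenticated requests are limited to 60 per hour, and each uncached contributor costs one or two requests, so a run over roughly 30 to 60 new contributors can exhaust the limit; the 0.1s delay on line 291 spaces requests out but does not change the hourly budget. Because the function catches every exception and returns False, rate-limited contributors would silently be categorized as External. Consider adding a GitHub token for authentication (5,000 requests per hour) or implementing exponential backoff when rate limit errors occur.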
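The exponential-backoff option could be sketched as a wrapper around any request function. The retryable status codes, attempt count and base delay below are assumptions, and `with_backoff` is a hypothetical name. Short backoff mainly helps with transient errors and secondary rate limits; once the hourly primary limit is exhausted, waiting a few seconds will not help, so authentication remains the more effective fix.

```python
import time
import urllib.error

# Assumed set of statuses worth retrying: rate limits (403/429) and server errors.
RETRYABLE_STATUS = {403, 429, 500, 502, 503, 504}


def with_backoff(request_fn, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call request_fn(), retrying with exponential backoff on transient errors.

    Other HTTP errors (e.g. 404) are re-raised immediately, as is the last
    error once max_attempts is reached.
    """
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except urllib.error.HTTPError as err:
            # HTTPError subclasses URLError, so it must be handled first.
            if err.code not in RETRYABLE_STATUS or attempt == max_attempts - 1:
                raise
        except urllib.error.URLError:
            if attempt == max_attempts - 1:
                raise
        # Wait base_delay, then 2x, 4x, ... before the next attempt.
        sleep(base_delay * (2 ** attempt))
```

A call site might wrap the lookup as `with_backoff(lambda: is_block_employee_raw(username))`, where `is_block_employee_raw` is a hypothetical variant that lets HTTP errors propagate instead of returning False.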
```python
        url = f"https://api.github.com/users/{username}"
        with urllib.request.urlopen(url) as response:
```
Missing timeout parameter on urlopen call. This could hang indefinitely if the GitHub API is unresponsive. Add a timeout parameter like line 219: urllib.request.urlopen(url, timeout=30)
* main: feat/fix Re-enabled WAL with commit transaction management (Linux Verification Requested) (#5793) chore: remove autopilot experimental feature (#5781) Read paths from an interactive & login shell (#5774) docs: acp clients (#5800) Provider error proxy for simulating various types of errors (#5091) chore: Add links to maintainer profiles (#5788) Quick fix for community all stars script (#5798) Document Mistral AI provider (#5799) docs: Add Community Stars recipe script and txt file (#5776)
* main: (33 commits) fix: support Gemini 3's thought signatures (#5806) chore: Add Adrian Cole to Maintainers (#5815) [MCP-UI] Proxy and Better Message Handling (#5487) Release 1.15.0 Document New Window menu in macOS dock (#5811) Catch cron errors (#5707) feat/fix Re-enabled WAL with commit transaction management (Linux Verification Requested) (#5793) chore: remove autopilot experimental feature (#5781) Read paths from an interactive & login shell (#5774) docs: acp clients (#5800) Provider error proxy for simulating various types of errors (#5091) chore: Add links to maintainer profiles (#5788) Quick fix for community all stars script (#5798) Document Mistral AI provider (#5799) docs: Add Community Stars recipe script and txt file (#5776) chore: incorporate LF feedback (#5787) docs: quick launcher (#5779) Bump auto scroll threshold (#5738) fix: add one-time cleanup for linux hermit locking issues (#5742) Don't show update tray icon if GOOSE_VERSION is set (#5750) ...
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Blair Allan <Blairallan@icloud.com>
This pull request significantly improves the `community_stars.py` script by automating contributor data fetching, enhancing Block employee detection, and simplifying contributor categorization. The script now fetches contributor data directly from the GitHub API with retry logic, checks public organization memberships and company fields to identify Block employees, and removes the "unknown" contributor category for clearer reporting.

Automation and Data Fetching

Block Employee Detection

The `is_block_employee` function checks a contributor's public organization memberships and company field to determine Block employment, using a cache to avoid redundant API calls and reduce the risk of hitting GitHub's rate limits. [1] [2]

Contributor Categorization and Reporting

Documentation and Requirements