fix: add retry logic for OpenAI embedding API failures #272
steipete merged 6 commits into openclaw:main
Conversation
Fixes openclaw#149

When importing or uploading skills, the OpenAI embedding API call could fail with transient errors (rate limits, timeouts, network issues), causing the entire import to fail with a generic "Server Error".

This adds retry logic with exponential backoff (1s, 2s, 4s delays):
- Retries on 429 (rate limit) and 5xx server errors
- Retries on network/fetch errors
- Logs warnings for debugging
- Max 3 retries before failing with clear error message

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@superlowburn is attempting to deploy a commit to the Amantus Machina Team on Vercel. A member of the Team first needs to authorize it.
convex/lib/embeddings.ts
Outdated
    for (let attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        const response = await fetch('https://api.openai.com/v1/embeddings', {
          method: 'POST',
          headers: {
Off-by-one retry loop
`for (let attempt = 0; attempt <= maxRetries; attempt++)` with `maxRetries = 3` performs 4 total fetch attempts (0..3), but the warning log prints `attempt ${attempt + 1}/${maxRetries}`, which reports only `1/3..3/3` and never reflects the final attempt. This also makes the actual backoff sequence longer than intended under persistent failures.
Valid catch. The loop runs 4 times (0..3) but the log only reflects 3. Will fix to `attempt < maxRetries` and update the log to `${attempt + 1}/${maxRetries}` consistently.
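The counting issue in this thread can be illustrated with a small sketch. The helper names `backoffDelays` and `countAttempts` are hypothetical and do not appear in the PR; the numbers (`maxRetries = 3`, base delay of 1000 ms) are taken from the PR description.

```typescript
// Hypothetical helper, not from the PR: lists the delay that precedes each retry.
function backoffDelays(maxRetries: number, baseDelayMs: number): number[] {
  const delays: number[] = []
  for (let retry = 0; retry < maxRetries; retry++) {
    // Same formula as the diff: baseDelay * 2^attempt
    delays.push(baseDelayMs * Math.pow(2, retry))
  }
  return delays
}

// Hypothetical helper: counts how many times the loop body runs
// for the inclusive (<=) and exclusive (<) loop conditions.
function countAttempts(maxRetries: number, inclusive: boolean): number {
  let count = 0
  for (let attempt = 0; inclusive ? attempt <= maxRetries : attempt < maxRetries; attempt++) {
    count++
  }
  return count
}

console.log(backoffDelays(3, 1000)) // [ 1000, 2000, 4000 ]
console.log(countAttempts(3, true)) // 4 (one initial attempt plus three retries)
console.log(countAttempts(3, false)) // 3
```

With the inclusive condition, a log of the form `attempt + 1` out of `maxRetries` can never show the fourth iteration, which is the mismatch the review comment points out.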
convex/lib/embeddings.ts
Outdated
    } catch (error) {
      if (attempt < maxRetries && error instanceof Error && error.message.includes('fetch')) {
        const delay = baseDelay * Math.pow(2, attempt)
        console.warn(`Network error, retrying in ${delay}ms (attempt ${attempt + 1}/${maxRetries})`)
Network retries may not trigger
The retry path for thrown errors is gated on `error instanceof Error && error.message.includes('fetch')`. Many real network/timeouts (e.g., socket errors) won't include `'fetch'` in the message (or may not be an `Error`), so transient network failures can still fail immediately without retrying, undermining the stated goal of handling network/fetch errors.
Good catch: `error.message.includes('fetch')` is too narrow. Socket hangups, DNS failures, and timeouts won't include that string. Will broaden to catch any `Error` on retry attempts instead of filtering by message content.
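A rough sketch of the difference between the two checks discussed in this thread. The function names are hypothetical and not taken from `convex/lib/embeddings.ts`; the broadened check follows the reply above (any `Error`, as long as retries remain), and the status check follows the PR description (429 and 5xx).

```typescript
// Hypothetical helper: HTTP statuses the PR description treats as transient.
function isRetryableStatus(status: number): boolean {
  return status === 429 || (status >= 500 && status <= 599)
}

// The message-based check criticized in the review.
function narrowShouldRetry(error: unknown): boolean {
  return error instanceof Error && error.message.includes('fetch')
}

// Broadened check as described in the reply: retry on any Error while retries remain.
// Assumption: thrown non-Error values are still not retried in this sketch.
function broadShouldRetry(error: unknown, attempt: number, maxRetries: number): boolean {
  return attempt < maxRetries && error instanceof Error
}

const socketError = new Error('socket hang up')
console.log(narrowShouldRetry(socketError)) // false
console.log(broadShouldRetry(socketError, 0, 3)) // true
console.log(broadShouldRetry(socketError, 3, 3)) // false
```

The narrow check skips the retry for a socket hang-up because the message does not contain `fetch`, while the broadened check retries it until the retry budget is used up.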
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ginal error Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: return proper HTTP status codes for delete/undelete errors The delete and undelete handlers for skills and souls were catching all errors and returning 401 Unauthorized, even for errors like: - 'Skill not found' (should be 404) - 'Forbidden' (should be 403) - Other validation errors (should be 400) This change updates the error handling to return appropriate status codes: - 401 Unauthorized: authentication failures - 403 Forbidden: authorization failures (not owner/admin/moderator) - 404 Not Found: skill/soul/user not found - 400 Bad Request: other errors with descriptive message Fixes #34 * fix(cli): use proper Error objects in abort timeouts When AbortController.abort() receives a string instead of an Error, the string itself is thrown. pRetry then wraps it in a confusing message: 'Non-error was thrown: Timeout' Changed all 3 occurrences in http.ts: - apiRequest (line 57) - apiRequestForm (line 106) - downloadZip (line 141) Now timeouts will surface as proper Error objects with clear messages. * test: add e2e test for delete error handling Verifies that deleting a non-existent skill returns a proper 'not found' error instead of a generic 'Unauthorized' message. * fix: use Error for timeout abort in e2e helper * feat: add skill file viewer * fix: prevent file viewer state updates after unmount * feat: add ban reasons to moderation * chore: release 0.6.0 * docs: reset changelog for next release * feat: add LLM security evaluation at publish time Add OpenClaw LLM-based security evaluator that runs alongside VirusTotal when skills are published. Reads SKILL.md prose, metadata, install specs, and file manifest, then assesses coherence across 5 dimensions to catch social engineering vectors that VT/regex miss (e.g. instruction-only skills with no code files). 
- convex/lib/securityPrompt.ts: system prompt, message assembly, response parsing, injection pattern detection - convex/llmEval.ts: evaluateWithLlm action, evaluateBySlug convenience action, backfillLlmEval for existing skills - convex/schema.ts: llmAnalysis field on skillVersions - convex/skills.ts: updateVersionLlmAnalysisInternal mutation, getActiveSkillBatchForLlmBackfillInternal query, defense-in-depth multi-scanner flag merging in approveSkillByHashInternal - convex/lib/skillPublish.ts: schedule LLM eval alongside VT scan - SkillDetailPage.tsx: OpenClaw row, LlmAnalysisDetail expandable component with 5 dimension rows, guidance panel, findings section - styles.css: analysis detail styles from mockup * fix: collapse OpenClaw analysis by default, fix row spacing, switch to gpt-5-mini * fix: add retry with backoff for OpenAI rate limits, fix JSON mode requirement * fix: increase max_output_tokens for reasoning model, fix backfill error retry * feat: recognize metadata.openclaw as valid frontmatter namespace * fix: eval assembler falls back to metadata.openclaw for requirements * feat: evaluator reads all file contents, not just SKILL.md Reads all files from storage and includes their full source in the eval prompt so the LLM can detect malicious code hidden behind clean READMEs. Injection detection now scans all content. Per-file cap 10K chars, total cap 50K chars. 
* feat: add skill metadata docs, suspicious appeal banner for owners - Document full frontmatter metadata reference in docs/skill-format.md - Add metadata section + quick example to README - Show appeal message on suspicious skills (owner-only) linking to GitHub issues - Accept metadata.openclaw alias in README docs - Re-evaluate all skills with full file content reading (backfill in progress) * fix: trailing comma tolerance in JSON metadata, tone down persistence flags - Strip trailing commas in frontmatter JSON before parsing (silent failure fix) - Stop flagging disable-model-invocation default as a concern (it's the normal default) - Stop flagging skills configuring themselves as privilege escalation - Add MITRE ATLAS AML.T0051 context for when autonomous invocation actually matters - Show actual defaults in assembled eval message instead of "not set" * chore: fix lint issues (#213) * perf: lazy-load diff viewer (Monaco) (#212) * chore: fix review comments * fix: VT scan sync race condition + LLM-first moderation model VT no longer overwrites LLM moderation verdicts. LLM is the primary moderation authority; VT only escalates (hides + flags) for malicious/ suspicious content via new escalateByVtInternal mutation. Stale VT polls write vtAnalysis marker instead of overwriting moderationReason. Query pools expanded to include LLM-evaluated skills awaiting VT results. Ban message now references malicious skills and security@openclaw.ai. * fix: handle GitHub API rate limits in account age check (#246) * fix: handle GitHub API rate limits in account age check The GitHub account lookup uses unauthenticated requests (60 req/hr per IP). Since this runs server-side in Convex, all users share the same IP and quickly exhaust the rate limit, causing "GitHub account lookup failed" errors during skill publish. 
- Detect 403/429 responses and surface a clear rate-limit message - Support optional GITHUB_TOKEN env var for authenticated requests (5,000 req/hr) Fixes #155 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: stabilize GitHub account gate tests and docs --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Peter Steinberger <steipete@gmail.com> * docs: thank @superlowburn for PR #246 * fix: prioritize relevant skills in search * fix: add lexical fallback for skill search recall * test: add search fallback coverage * test: fix search test handler typing * fix(http): remove allowH2 from undici Agent — causes fetch failed on Node.js 22+ (#245) * Remove allowH2 option from global dispatcher fix/remove-allowH2-undici-node22-compat * fix(http): remove allowH2 from e2e dispatcher --------- Co-authored-by: Peter Steinberger <steipete@gmail.com> * docs: add 0.6.1 unreleased changelog from post-0.6.0 commits * fix: allow soft-deleted users to re-authenticate Fixes Issue #32 where users who soft-deleted their accounts were unable to sign back in because the re-auth logic was only triggering when an existingUserId was passed by the auth provider, which doesn't happen during a standard fresh login flow. 
* test: update auth tests for direct deletedAt check * fix: restore existingUserId check for type safety * fix: update tests to include required existingUserId parameter * fix: resolve final lint error in auth tests * fix: ensure reactivation only matches soft-deleted user (prevents bypass) * fix: allow re-auth when existingUserId is null * fix: use valid crons.interval and set to 1 minute * test: add missing coverage for fresh-login reactivation and identity mismatch guard * fix: scope reauth fix; keep banned users blocked (#177) (thanks @tanujbhaud) * fix: include comment deltas in action-based stat processing & add stats reconciliation (#194) Bug 1: applyAggregatedStatsAndUpdateCursor was missing 'comments' in both the guard condition and the applySkillStatDeltas call. This caused comment count deltas to be silently dropped during cron-based event processing, while stars/downloads/installs were processed correctly. Bug 2: No reconciliation mechanism existed. If events were missed due to cursor issues or processing errors, skill stats (stars, comments) would remain stale with no way to recover. Added reconcileSkillStarCounts maintenance mutation that counts actual records in the stars and comments tables and patches any out-of-sync skill stats. Fixes #193 Co-authored-by: Limitless2023 <limitless@users.noreply.github.com> * fix: prevent horizontal overflow from long code blocks in skill pages (#183) * Fix: Prevent horizontal overflow from long code blocks in skill pages - Add max-width: 100% to .file-list-body and .file-row - Prevents page-wide overflow when skills contain long code examples - Markdown pre blocks already have overflow-x: auto, but parent containers were expanding infinitely - Fixes issue where skills with 400+ char lines (e.g. browser automation commands) cause horizontal scrolling Affected: Skills with long inline code in markdown (browser act commands, etc.) 
* fix: add max-width to .file-list container to prevent overflow - Also ensures .file-list-body constraint is inherited properly - Prevents long code blocks from expanding file list container * fix: add max-width to all markdown containers and pre tags - Add max-width: 100% to .markdown, .tab-body, .markdown pre - Ensures code blocks are constrained and show horizontal scrollbar - Prevents content from expanding parent containers beyond viewport * fix: add overflow-x to parent containers for horizontal scroll Adds overflow-x: auto to .skill-detail-stack, .tab-card, and .tab-body to ensure long code blocks are scrollable within the content area instead of causing page-wide horizontal overflow. Fixes horizontal overflow issue on skill pages with long code examples (e.g., browser automation commands with 400+ character lines). Tested on zepto skill page - page now stays within viewport (1200px) and code blocks are accessible via horizontal scrollbar in tab area. * docs: note code-block overflow fix in changelog (#183) (thanks @bewithgaurav) --------- Co-authored-by: Peter Steinberger <steipete@gmail.com> * chore(release): 0.6.1 * fix: prevent infinite loading loop on skills page (#90) * fix: prevent infinite loading loop on skills pageAdd isLoadingMore guard to IntersectionObserver useEffect to preventcontinuous WebSocket queries when user is idle at bottom of page.The observer now won't set up while a request is in progress, breakingthe infinite loop cycle.Fixes: Related to #89 * fix: prevent repeated skills auto-load requests (#90) (thanks @xcqtnr) * fix: resolve PR merge conflicts and keep observer regression test (#90) (thanks @xcqtnr) --------- Co-authored-by: Peter Steinberger <steipete@gmail.com> * fix(cli): secure config file permissions (#164) * fix(cli): secure config file permissions and reduce duplication Security: - Config files now created with 0600 permissions (owner read/write only) - Config directories created with 0700 permissions - Protects API 
tokens from other users on shared systems Maintainability: - Extract resolveConfigPath() helper to reduce code duplication - Same legacy fallback logic (clawhub -> clawdhub) now in one place * fix(cli): tolerate unsupported chmod errors for config --------- Co-authored-by: Peter Steinberger <steipete@gmail.com> * fix: make /search host-aware in SSR (#257) * fix: make /search mode-aware Notes:\n- Medium: /search now depends on getSiteMode() during beforeLoad. On server-side routing, if VITE_SITE_MODE isn’t set and VITE_SOULHUB_SITE_URL is set (as in .env.local), getSiteMode() will resolve to souls and redirect /search to / even on the ClawdHub deployment. This is a regression risk vs the old always-/skills redirect. Confirm deployment envs guarantee correct mode. src/routes/search.tsx:9-31 * fix: make /search host-aware in SSR * chore: fix lint and route tree for /search route --------- Co-authored-by: Sash Zats <sash@zats.io> * fix(vt): explicit return types and missing undici dependency (#255) * fix(vt): explicit return types and missing undici dependency Refactor action handlers in convex/vt.ts to use explicit return types, resolving circular type inference (TS7022). Also add undici to devDependencies for E2E tests. 
* fix: add root undici devDependency for e2e (#255) (thanks @tanujbhaud) --------- Co-authored-by: Peter Steinberger <steipete@gmail.com> * Fix initial skill sorting (#92) * fix: initial skill sorting * chore: update unit test * fix: use correct indexes for skill sorting * chore: cleanup * fix: land skill sorting update (#92) (thanks @bpk9) --------- Co-authored-by: Peter Steinberger <steipete@gmail.com> * fix: harden download rate limiting and dedupe (#43) (thanks @regenrek) - add download-specific rate limit tier\n- add per-IP/day dedupe + daily pruning\n- keep moderation gating + deterministic zips\n- add optional forwarded-IP trust via TRUST_FORWARDED_IPS * fix: harden skill listing and rate limiting under load * fix: replace skill report prompt with modal * fix: add skill publish anti-spam caps and quarantine * docs: add git local-branch cleanup fallback * fix: enforce quality gate and trust-tier spam checks * fix: prevent autobanned users from self-reactivating * test: expand reauth ban regression coverage * feat: add empty-skill cleanup backfill with ban nominations * fix: make empty-skill cleanup resumable * feat: add non-suspicious skills filter toggle * style: polish selected states in skills toolbar * feat: default skills sort to downloads * fix: enforce downloads as canonical default skills sort * fix: force canonical downloads sort in skills browse mode * fix: bypass suspicious flags for privileged owners and polish comment delete UI * fix: add privileged-owner suspicious flag reconciler * fix: force auth redirects and registry to canonical clawhub host * feat: auto-generate missing skill summaries * fix: make skill summary backfill resumable * feat: add self-scheduling skill summary backfill job * perf: short-circuit empty skill summary generation * style: polish upload page layout and actions * feat: show popular non-suspicious skills on homepage * fix: normalize legacy skill stats to prevent homepage crash * fix: render homepage popular cards from 
nested skill entries * style: refine global UI theme, borders, and spacing * fix: resolve search timeout and improve skills page UI alignment (#53) * fix: resolve search timeout and improve skills page UI alignment - Added a 10s timeout to OpenAI embedding requests to prevent hanging searches. - Fixed a TypeScript error in search.ts regarding entry hydration. - Restructured skills page layout and CSS to ensure consistent alignment between the search toolbar and skill cards. * fix: resolve search timeout and improve skills page UI alignment - Added a 10s timeout to OpenAI embedding requests to prevent hanging searches. - Fixed a TypeScript error in search.ts regarding entry hydration. - Restructured skills page layout and CSS to ensure consistent alignment between the search toolbar and skill cards. * style: format skills index layout block --------- Co-authored-by: Peter Steinberger <steipete@gmail.com> * docs: thank @GhadiSaab for #53 * style: shift UI palette to cool blue tones * style: remove remaining warm accent literals * style: darken hero primary CTA in dark mode * fix: show stars in popular skill cards * fix: simplify skills CTA label * fix: dedupe download metrics hourly by user-or-ip identity (#278) * style: restore brown palette and dark-mode CTA tone * fix(comments): stop updating skills.updatedAt on comment add/remove (#55) * fix(comments): stop updating skills.updatedAt on comment add/remove Comments are not content changes, so they shouldn't invalidate skill list queries that depend on updatedAt. This reduces query invalidation when users add or remove comments. 
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * test(comments): add updatedAt invalidation regression coverage --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: Peter Steinberger <steipete@gmail.com> * refactor(comments): extract handlers and harden mutation tests * feat: make account deletion irreversible and migrate lint to oxlint * chore: add oxfmt config * fix(cli): throw Error for all timeout aborts (#283) * fix(cli): throw Error on timeout aborts Users have seen an elevated number of:\n clawdhub search image\n ✖ Non-error was thrown: "Timeout". You should only throw errors.\n\nInvestigation shows we were aborting with a string instead of an Error. Switching to controller.abort(new Error('Timeout')) makes retries/formatting treat it as a real error and clears the message.\n\nExample after change:\n clawdhub search image\n table-image v1.0.0 Table Image (0.332)\n nano-banana-pro v1.0.1 Nano Banana Pro (0.319)\n vap-media v1.0.1 AI media generation API - Flux2pro, Veo3.1, Suno Ai (0.281)\n clawdbot-meshyai-skill v0.1.0 Meshy AI (0.276)\n venice-ai-media v1.0.0 Venice AI Media (0.274)\n daily-recap v1.0.2 Daily Recap (0.260)\n openai-image-gen v1.0.1 Openai Image Gen (0.260)\n bible-votd v1.0.1 Bible Verse of the Day (0.248)\n orf v1.0.1 ORF (0.224)\n smalltalk v1.0.1 Smalltalk (0.161) * fix(http): wrap fetch calls in try-finally to prevent timer leaks Addresses Vercel review comment: clearTimeout was not called on error paths when fetch throws an exception. 
* fix(cli): unify timeout abort handling --------- Co-authored-by: Sash Zats <sash@zats.io> * refactor(cli): centralize HTTP status errors and timeout tests (#286) * fix: keep new skill versions pending until VT verdict * style: remove residual blue accents and warm base palette * fix: add retry logic for OpenAI embedding API failures (#272) * fix: add retry logic for OpenAI embedding API failures Fixes #149 When importing or uploading skills, the OpenAI embedding API call could fail with transient errors (rate limits, timeouts, network issues), causing the entire import to fail with a generic "Server Error". This adds retry logic with exponential backoff (1s, 2s, 4s delays): - Retries on 429 (rate limit) and 5xx server errors - Retries on network/fetch errors - Logs warnings for debugging - Max 3 retries before failing with clear error message Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: correct retry count and broaden network error catch Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address retry loop off-by-one, broaden error catch, preserve original error Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: harden embeddings retry semantics * style: format embeddings retry changes --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Peter Steinberger <steipete@gmail.com> * fix: sync handle on user ensure * fix: sync handle on user ensure (#293) (thanks @christianhpoe) * feat: improve moderation/admin UX + language-aware quality gate - API: owner-visible responses for hidden/soft-deleted skills\n- Admin: add unban user mutations + docs\n- Quality: Intl.Segmenter tokenization + CJK signal to reduce false rejects\n- Jobs: skill-stat-events interval 15m -> 5m\n- Tests: add coverage for owner-visible states + non-Latin docs\n- Changelog: add Unreleased entry * refactor: simplify user ensure updates * fix(cors): complete CORS + tokenized CLI reads (#296) * fix(cors): add 
Access-Control-Allow-Origin headers to API and downloads * fix: add CORS to error/raw paths & add CLI install auth * fix: add OPTIONS handler for CORS preflight * fix(cors): complete CORS + tokenized CLI reads * test(cli): fix config mock typing --------- Co-authored-by: Grenghis-Khan <63885013+Grenghis-Khan@users.noreply.github.com> * refactor: centralize CORS + CLI auth token (#297) * refactor(convex): centralize CORS headers * refactor(cli): centralize auth token lookup * fix(skills): keep global sorting across pagination (#98) * fix: initial skill sorting * chore: update unit test * fix: use correct indexes for skill sorting * chore: cleanup * fix(skills): preserve server order for paginated sorting * chore(lint): apply biome formatting fixes * chore(convex): bump tsconfig lib to ES2022 * fix(skills): add deterministic tie-breaker for search sorting * fix(skills): stable sorting across pagination (#98) (thanks @CodeBBakGoSu) --------- Co-authored-by: Brian Kasper <bkasperr@gmail.com> Co-authored-by: knox-glorang <knox@glorang.com> Co-authored-by: Peter Steinberger <steipete@gmail.com> * chore: drop convex-helpers (#302) * perf: batch tag resolution to reduce action→query round-trips - Add getVersionsByIds batch query to skills.ts and souls.ts - Replace per-item tag resolution with batch resolution in httpApiV1.ts - Reduces N action→query round-trips to 1 for list endpoints Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix: add null guard and short-circuit for empty tags - Short-circuit when no version IDs to resolve - Add null coalescing for runQuery response - Fixes potential crash when tags are empty or query returns null Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * refactor: batch resolve tags in v1 API (#112) (thanks @mkrokosz) * fix: handle duplicate Convex Auth user records in publish ownership check (#180) * fix: handle duplicate user records in publish ownership check * fix: heal publish ownership via GitHub auth identity 
--------- Co-authored-by: Emmet Brown <emmet@Emmets-Mac-mini.local> Co-authored-by: Peter Steinberger <steipete@gmail.com> * fix: gate publish by immutable GitHub account ID * refactor: simplify GitHub age gate cache * fix(api): centralize v1 soft-delete error mapping * chore(cli): align http client with main * test(api): cover v1 soft-delete error mapping * test(api): reposition soft-delete mapping test * fix: default to CF-only client IP parsing * docs: changelog credit + v1 delete status codes * fix(cli): clarify logout only affects local config (#166) * fix(cli): clarify logout only affects local config Users may assume 'clawhub logout' revokes their token everywhere. In reality, the token remains valid on the server until explicitly revoked in the web UI. This could be a security concern on shared machines. Update the message to set correct expectations. * fix(cli): clarify logout revocation scope (#166) (thanks @aronchick) * chore: sync changelog for merge (#166) (thanks @aronchick) --------- Co-authored-by: Peter Steinberger <steipete@gmail.com> * feat: anti-squatting protection, backup restore, and ban flow improvements (#298) * feat: anti-squatting protection, backup restore, and ban flow improvements - Add `reservedSlugs` table with 90-day cooldown to prevent slug squatting after skill deletion. Hard-delete finalize phase reserves slugs for the original owner; `insertVersion` blocks non-owners during cooldown. - Change ban flow from hard-delete to soft-delete: `banUserWithActor` now sets `moderationReason: 'user.banned'` and syncs embedding visibility. `unbanUserWithActor` restores all ban-hidden skills and releases slug reservations automatically. - Align `autobanMalwareAuthorInternal` with the same soft-delete + embedding visibility pattern so unban recovery works uniformly. - Add admin `reclaimSlug` / `reclaimSlugInternal` mutations for reclaiming squatted slugs, with audit logging. 
- Add GitHub backup restore system (`githubRestore.ts`, `githubRestoreMutations.ts`, `githubRestoreHelpers.ts`) that reads from the `clawdbot/skills` backup repo and re-creates skill records. Squatter eviction runs synchronously in the same transaction as restore to avoid async race conditions. - Add `POST /api/v1/users/restore` and `POST /api/v1/users/reclaim` admin HTTP endpoints for bulk operations. - Add `trustedPublisher` flag on users; trusted publishers bypass the `pending.scan` auto-hide for new skill publishes. - Add `setTrustedPublisher` / `setTrustedPublisherInternal` admin mutations. Addresses: slug squatting prevention, skill backup/restore, ban recovery, and trusted publisher workflow improvements. Co-authored-by: Cursor <cursoragent@cursor.com> * fix: harden restore/reclaim + ban flow (#298) (thanks @autogame-17) --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Peter Steinberger <steipete@gmail.com> * refactor: post-#298 cleanup (#313) * refactor: consolidate slug + embedding helpers * refactor: batch ban/unban skill updates * refactor: report batched ban/unban scheduling * fix: unblock package typecheck * refactor: split httpApiV1 + consolidate moderation batches (#315) * refactor: dedupe v1 file response + unify embedding patches (#316) * Devin/1771112524 skill metadata update (#312) * fix: sync GitHub profile on login to handle username renames (#303) When a user renames their GitHub account, the stored username becomes stale and causes 'GitHub account lookup failed' errors during skill publishing. This fix: - Adds syncGitHubProfile function that fetches current profile using the immutable GitHub numeric ID - Adds syncGitHubProfileInternal mutation to update user's name, handle, displayName, and image when they change - Schedules the sync as a background action on every login via afterUserCreatedOrUpdated callback The sync is best-effort (silently fails if GitHub API unavailable) since it's not on the critical path. 
It only updates fields if the username has actually changed. Fixes #303 Co-Authored-By: Ian Alloway <adapter_burners.1y@icloud.com> * fix: allow updating skill summary/description on subsequent publishes (#301) Previously, the skill summary was only extracted from metadata.description in the SKILL.md frontmatter. This change also checks for a direct 'description' field in the frontmatter, ensuring that users can update their skill description by modifying either location. The fix prioritizes the new description from the current publish over the existing skill summary, allowing updates to be reflected correctly. Fixes #301 Co-Authored-By: Ian Alloway <adapter_burners.1y@icloud.com> * fix: throttle GitHub profile sync * feat: show skill owner avatars * fix: avoid nested owner links * refactor: centralize profile sync + owner lookup * docs: changelog for #312 (thanks @ianalloway) --------- Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: Peter Steinberger <steipete@gmail.com> * style: polish markdown code blocks * feat: show skill owner avatars on home + lists * feat: sync GitHub profile name * feat: improve skill card meta layout * fix: make ghost buttons look like buttons * fix: match skill hero cta widths * fix: prefer $HOME over os.homedir() for path resolution (#299) * fix: prefer $HOME over os.homedir() for path resolution os.homedir() reads from /etc/passwd which can return a stale path after a Linux user rename (usermod -l). Prefer the $HOME environment variable which reflects the current session. 
Closes #82 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: normalize resolveHome output --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Peter Steinberger <steipete@gmail.com> * UI: allow copying security scan summary text (#322) * fix(ui): prevent analysis toggle when selecting summary (#324) * feat: add uninstall command for skills (#241) * feat: add uninstall command for skills Implements `clawhub uninstall <slug>` to properly remove installed skills. Changes: - Added cmdUninstall function in skills.ts - Validates skill is installed before removal - Removes skill directory and lockfile entry - Supports --yes flag to skip confirmation prompt - Added comprehensive test coverage Closes #221 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: require --yes in non-interactive mode and update lockfile before rm Address review feedback: - Fail with "Pass --yes (no input)" when running non-interactively without --yes flag, matching delete/star/unstar/moderation commands - Update lockfile before removing directory to avoid inconsistent state if rm succeeds but writeLockfile fails Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: harden skill uninstall flow (#241) (thanks @superlowburn) * docs: document uninstall CLI command (#241) (thanks @superlowburn) * test: fix cmdUninstall mock typing (#241) (thanks @superlowburn) --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Peter Steinberger <steipete@gmail.com> * feat: add skill file viewer * fix: prevent file viewer state updates after unmount * fix: lazy-load skill file viewer (#44) (thanks @regenrek) --------- Co-authored-by: Sergiy Dybskiy <s@serg.tech> Co-authored-by: Peter Steinberger <steipete@gmail.com> Co-authored-by: theonejvo <theonejvo@users.noreply.github.com> Co-authored-by: Vignesh <vigneshnatarajan92@gmail.com> Co-authored-by: Steve <superlowburn@gmail.com> Co-authored-by: Claude Opus 4.6 
<noreply@anthropic.com> Co-authored-by: DColl <david.coll.78@gmail.com> Co-authored-by: Tanuj Bhaud <tanujbhaud@gmail.com> Co-authored-by: Limitless <127183162+Limitless2023@users.noreply.github.com> Co-authored-by: Limitless2023 <limitless@users.noreply.github.com> Co-authored-by: Gaurav Sharma <sharmag@microsoft.com> Co-authored-by: xcqtnr <xcqtnr0.0@gmail.com> Co-authored-by: David Aronchick <aronchick@gmail.com> Co-authored-by: Sash Zats <sash@zats.io> Co-authored-by: Tanuj Bhaud <128238320+tanujbhaud@users.noreply.github.com> Co-authored-by: Brian Kasper <brian@bkasper.com> Co-authored-by: ghadi saab <ghadisaab21@gmail.com> Co-authored-by: sethconvex <seth@convex.dev> Co-authored-by: ChristianHPoe <chpoensgen@me.com> Co-authored-by: Grenghis-Khan <63885013+Grenghis-Khan@users.noreply.github.com> Co-authored-by: CodeBBakGoSu <127713112+CodeBBakGoSu@users.noreply.github.com> Co-authored-by: Brian Kasper <bkasperr@gmail.com> Co-authored-by: knox-glorang <knox@glorang.com> Co-authored-by: Matthew Krokosz <mattkrokosz@gmail.com> Co-authored-by: emmet-bot <emmet@universaleverything.io> Co-authored-by: Emmet Brown <emmet@Emmets-Mac-mini.local> Co-authored-by: autogame-17 <166480271+autogame-17@users.noreply.github.com> Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Ian Alloway <adapter_burners.1y@icloud.com> Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: CleanApp <165804662+borisolver@users.noreply.github.com>
Fixes #149
Problem
Users encountered "Server Error" when importing or uploading skills. The root cause was transient failures in the OpenAI embedding API call (rate limits, timeouts, network issues) causing the entire import to fail with an unhelpful error message.
Solution
Added retry logic with exponential backoff to generateEmbedding():
Testing
Notes
This is a defensive fix for a transient issue. The comment from @swairshah on Feb 6 ("This seems fixed now") suggests the underlying API issue may have resolved, but this adds resilience to prevent future occurrences.
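The retry behaviour described in this PR could look roughly like the sketch below. It is not the merged code: `requestWithRetry`, `RetryOptions`, `MinimalResponse`, and the injected `sleep` are hypothetical names, the exact error text is an assumption, and the sketch reads "max 3 retries" as one initial attempt plus up to three retries, which may differ from how the final implementation counts.

```typescript
type RetryOptions = {
  maxRetries: number
  baseDelayMs: number
  // Injected so the sketch can run without real waiting (assumption, not from the PR).
  sleep: (ms: number) => Promise<void>
}

type MinimalResponse = { ok: boolean; status: number }

async function requestWithRetry<T extends MinimalResponse>(
  doRequest: () => Promise<T>,
  options: RetryOptions,
): Promise<T> {
  const { maxRetries, baseDelayMs, sleep } = options
  let lastError: unknown = undefined
  // One initial attempt plus up to maxRetries retries.
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    let response: T | undefined = undefined
    try {
      response = await doRequest()
    } catch (error) {
      // Network-level failure: remember it and fall through to the backoff.
      lastError = error
    }
    if (response !== undefined) {
      if (response.ok) return response
      const retryable = response.status === 429 || response.status >= 500
      // Other statuses (for example 400 or 401) will not succeed on retry.
      if (!retryable) throw new Error(`Embedding failed: HTTP ${response.status}`)
      lastError = new Error(`HTTP ${response.status}`)
    }
    if (attempt < maxRetries) {
      const delay = baseDelayMs * Math.pow(2, attempt)
      console.warn(`Retrying in ${delay}ms (retry ${attempt + 1}/${maxRetries})`)
      await sleep(delay)
    }
  }
  // Preserve the original failure reason in the final message.
  const reason = lastError instanceof Error ? lastError.message : String(lastError)
  throw new Error(`Embedding failed after ${maxRetries} retries: ${reason}`)
}
```

In real use, `doRequest` would wrap the `fetch` call to the embeddings endpoint and `sleep` would be a `setTimeout`-based promise. Injecting both keeps the loop easy to test with a fake request that fails a fixed number of times.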
Greptile Overview
Greptile Summary
This PR adds retry logic with exponential backoff to `convex/lib/embeddings.ts`'s `generateEmbedding()` to make skill import/upload more resilient to transient OpenAI embeddings API failures.
The change wraps the embeddings `fetch()` in a retry loop, retrying on 429/5xx responses and certain thrown errors, logging warnings and ultimately surfacing a clearer `Embedding failed: ...` error that upstream publish flows already map into user-friendly messages.
Confidence Score: 3/5
Last reviewed commit: 87cb005
Context used:
dashboard- AGENTS.md (source)