Few Typo fixes #1539

Closed
wants to merge 2 commits into from
2 changes: 1 addition & 1 deletion simclusters-ann/README.md
@@ -15,7 +15,7 @@ SimClusters from the Linear Algebra Perspective discussed the difference between
 However, calculating the cosine similarity between two Tweets is pretty expensive in Tweet candidate generation. In TWISTLY, we scan at most 15,000 (6 source tweets * 25 clusters * 100 tweets per clusters) tweet candidates for every Home Timeline request. The traditional algorithm needs to make API calls to fetch 15,000 tweet SimCluster embeddings. Consider that we need to process over 6,000 RPS, it’s hard to support by the existing infrastructure.


-## SimClusters Approximate Cosine Similariy Core Algorithm
+## SimClusters Approximate Cosine Similarity Core Algorithm

 1. Provide a source SimCluster Embedding *SV*, *SV = [(SC1, Score), (SC2, Score), (SC3, Score) …]*

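For readers skimming this hunk, the README paragraph above can be illustrated with a small sketch. It shows the exact (not the approximate) cosine similarity between two sparse embeddings of the form *[(SC1, Score), (SC2, Score), …]*, plus the candidate-count arithmetic from the text. The function name, the dict-based representation, and the constant name are assumptions made for illustration; they are not taken from the repository, and the approximate algorithm itself is not shown in this hunk.

```python
import math

# Hypothetical representation: an embedding maps cluster id -> score.
Embedding = dict[int, float]


def cosine_similarity(a: Embedding, b: Embedding) -> float:
    """Exact cosine similarity between two sparse embeddings."""
    if not a or not b:
        return 0.0
    # Iterate over the smaller embedding so the dot product touches fewer keys.
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    dot = sum(score * large.get(cluster_id, 0.0) for cluster_id, score in small.items())
    norm_a = math.sqrt(sum(score * score for score in a.values()))
    norm_b = math.sqrt(sum(score * score for score in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Upper bound on candidates per request, as stated in the README text:
# 6 source tweets * 25 clusters * 100 tweets per cluster.
MAX_CANDIDATES_PER_REQUEST = 6 * 25 * 100
```

Computing this exactly requires the full embedding of every candidate, which is the per-request fetch cost the README paragraph describes.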
@@ -306,7 +306,7 @@ struct ThriftFacetRankingOptions {
 // penalty for keyword stuffing
 60: optional i32 multipleHashtagsOrTrendsPenalty

-// Langauge related boosts, similar to those in relevance ranking options. By default they are
+// Language related boosts, similar to those in relevance ranking options. By default they are
 // all 1.0 (no-boost).
 // When the user language is english, facet language is not
 11: optional double langEnglishUIBoost = 1.0
10 changes: 5 additions & 5 deletions src/thrift/com/twitter/search/earlybird/thrift/earlybird.thrift
@@ -728,7 +728,7 @@ struct ThriftSearchResultMetadata {
 29: optional double parusScore

 // Extra feature data, all new feature fields you want to return from Earlybird should go into
-// this one, the outer one is always reaching its limit of the nubmer of fields JVM can
+// this one, the outer one is always reaching its limit of the number of fields JVM can
 // comfortably support!!
 86: optional ThriftSearchResultExtraMetadata extraMetadata

@@ -831,7 +831,7 @@ struct ThriftSearchResult {
 12: optional list<hits.ThriftHits> cardTitleHitHighlights
 13: optional list<hits.ThriftHits> cardDescriptionHitHighlights

-// Expansion types, if expandResult == False, the expasions set should be ignored.
+// Expansion types, if expandResult == False, the expansions set should be ignored.
 8: optional bool expandResult = 0
 9: optional set<expansions.ThriftTweetExpansionType> expansions

@@ -971,7 +971,7 @@ struct ThriftTermStatisticsResults {
 // The binIds will correspond to the times of the hits matching the driving search query for this
 // term statistics request.
 // If there were no hits matching the search query, numBins binIds will be returned, but the
-// values of the binIds will not meaninfully correspond to anything related to the query, and
+// values of the binIds will not meaningfully correspond to anything related to the query, and
 // should not be used. Such cases can be identified by ThriftSearchResults.numHitsProcessed being
 // set to 0 in the response, and the response not being early terminated.
 3: optional list<i32> binIds
@@ -1097,8 +1097,8 @@ struct ThriftSearchResults {
 // Superroots' schema merge/choose logic when returning results to clients:
 // . pick the schema based on the order of: realtime > protected > archive
 // . because of the above ordering, it is possible that archive earlybird schema with a new flush
-// verion (with new bit features) might be lost to older realtime earlybird schema; this is
-// considered to to be rare and accetable because one realtime earlybird deploy would fix it
+// version (with new bit features) might be lost to older realtime earlybird schema; this is
+// considered to to be rare and acceptable because one realtime earlybird deploy would fix it
 21: optional features.ThriftSearchFeatureSchema featureSchema

 // How long it took to score the results in earlybird (in nanoseconds). The number of results
4 changes: 2 additions & 2 deletions src/thrift/com/twitter/simclusters_v2/abuse.thrift
@@ -29,8 +29,8 @@ struct AdhocSingleSideClusterScores {
 * we implement will use search abuse reports and impressions. We can build stores for new values
 * in the future.
 *
-* The consumer creates the interactions which the author recieves. For instance, the consumer
-* creates an abuse report for an author. The consumer scores are related to the interation creation
+* The consumer creates the interactions which the author receives. For instance, the consumer
+* creates an abuse report for an author. The consumer scores are related to the interaction creation
 * behavior of the consumer. The author scores are related to the whether the author receives these
 * interactions.
 *
2 changes: 1 addition & 1 deletion src/thrift/com/twitter/simclusters_v2/embedding.thrift
@@ -70,7 +70,7 @@ struct TweetTopKTweetsWithScore {
 /**
 * The generic SimClustersEmbedding for online long-term storage and real-time calculation.
 * Use SimClustersEmbeddingId as the only identifier.
-* Warning: Doesn't include modelversion and embedding type in the value struct.
+* Warning: Doesn't include model version and embedding type in the value struct.
 **/
 struct SimClustersEmbedding {
 1: required list<SimClusterWithScore> embedding
2 changes: 1 addition & 1 deletion src/thrift/com/twitter/simclusters_v2/evaluation.thrift
@@ -50,7 +50,7 @@ struct CandidateTweets {
 }(hasPersonalData = 'true')

 /**
-* An encapuslated collection of reference tweets
+* An encapsulated collection of reference tweets
 **/
 struct ReferenceTweets {
 1: required i64 targetUserId(personalDataType = 'UserId')
8 changes: 4 additions & 4 deletions src/thrift/com/twitter/simclusters_v2/identifier.thrift
@@ -33,12 +33,12 @@ enum EmbeddingType {
 Pop10000RankDecay11Tweet = 31,
 OonPop1000RankDecayTweet = 32,

-// [Experimental] Offline generated produciton-like LogFavScore-based Tweet Embedding
+// [Experimental] Offline generated production-like LogFavScore-based Tweet Embedding
 OfflineGeneratedLogFavBasedTweet = 40,

 // Reserve 51-59 for Ads Embedding
-LogFavBasedAdsTweet = 51, // Experimenal embedding for ads tweet candidate
-LogFavClickBasedAdsTweet = 52, // Experimenal embedding for ads tweet candidate
+LogFavBasedAdsTweet = 51, // Experimental embedding for ads tweet candidate
+LogFavClickBasedAdsTweet = 52, // Experimental embedding for ads tweet candidate

 // Reserve 60-69 for Evergreen content
 LogFavBasedEvergreenTweet = 60,
@@ -104,7 +104,7 @@ enum EmbeddingType {
 //Reserved 401 - 500 for Space embedding
 FavBasedApeSpace = 401 // DEPRECATED
 LogFavBasedListenerSpace = 402 // DEPRECATED
-LogFavBasedAPESpeakerSpace = 403 // DEPRCATED
+LogFavBasedAPESpeakerSpace = 403 // DEPRECATED
 LogFavBasedUserInterestedInListenerSpace = 404 // DEPRECATED

 // Experimental, internal-only IDs