Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Firebase hangs indefinitely when getting token in bad network #8490

Closed
amrdraz opened this issue Sep 9, 2024 · 12 comments
Closed

Firebase hangs indefinitely when getting token in bad network #8490

amrdraz opened this issue Sep 9, 2024 · 12 comments

Comments

@amrdraz
Copy link

amrdraz commented Sep 9, 2024

Operating System

Across all OS in this order Android, IOS, Window, OSX, LInux etc..

Environment (if applicable)

Web, PWA and Webview inside react native app

Firebase SDK Version

10.7.2

Firebase SDK Product(s)

Auth

Project Tooling

I have a NextJS app served and built on vercel
I also have a react native app built on Expo.dev, but the login flow happen inside a webview right now

Detailed Problem Description

I had many reports of users not being able to login (I use custom token login) or waiting forever when re visiting the site (when app is refreshing token) most users aren't willing to wait for the full 30 seconds on desktop and the larger 60 seconds on mobile firebase takes to send a network-request error (as seen in code).

So I added an SLA of 10 seconds as I observed that in general when login is working fine it takes less then 2 seconds ( plan on adding a general timer to better check this variability)

The result looks something like this

Screenshot 2024-09-09 at 7 14 00 PM

Which when comparing successful requests on the same first endpoint called, we get a failure rate of 12.5%, it's important to note that in general the internet is working ok, the page did load after all, the JS got downloaded, and other API calls (like the logs ,tracking events and api calls to my server to get the admin token) are working

Screenshot 2024-09-09 at 7 17 25 PM

The issue is with firebase taking too long sometimes to get refresh the token (specifically more then 10 seconds)

Since therumble app has a smaller user base then our main app (thndr app) where we didn't hear many reports of this issue

I figured to also test how often this issue happens (it's only been a day so the code push is still rolling out) but it seems to happen to 0.6% of our sessions that got the code push

Screenshot 2024-09-09 at 7 25 23 PM

With this result I'm tempted to try and move auth on the rumble app to the react native side instead of the web view.
But it doesn't make much sense how bad the web experience would be compared to the app.

I thought maybe it's Egypt we have bad internet, but since thndr.app has a more global user base the issue seems to also occure outside though at a much lesser proportion (which can be attributed to our user base being primarily egyptian)

Screenshot 2024-09-09 at 7 30 09 PM

I'm kind of stumped on how to proceed with therumble.app, I didn't plan on building my own auth 😃
I can try to move to react native, but the issue still happens on desktop (though to a lesser degree of around 7%)

Screenshot 2024-09-09 at 7 32 45 PM

This issue mainly affects refreshing token

The get custom token has fail ratio of ~0.35% (163/(163+4370))
Screenshot 2024-09-09 at 7 38 45 PM

Which is why our users while uncomfortable that they need to login again at least can on average login again... though 0.3% of them get stuck

I can't reproduce the issue in development but as demonstrated it happens daily in production
the data seems to span devices and ip addresses, and even countries

The disproportionate difference in Egypt is likely due to our user base more then it being an Egypt thing and I will say that in general I don't personally experience the issue a lot and many users seem to never experience it, while others seem to get it all the time (likely due to network conditions) when I ran on datadog with the thndr app I also noted the issue spans carriers

Screenshot 2024-09-09 at 7 53 09 PM

Any idea how to proceed

This issue has been ongoing for months but I recently decided to take the time and try to resolve it as we plan on pushing therumble app more

thndr app has been ignoring the issue it seems generally users just logout and login again when stuck

I can't just say it's bad network, because other apis work fine it's google

so I hope you have any ideas

Steps and code to reproduce issue

My get token code is


export async function getIdTokenInSLA(forceRefresh?: boolean) {
  return resolveOrFailIn(getIdToken(forceRefresh), {
    time: 15_000, // recently decided to increase the SLA to see if it will make a difference though based on thndr it won't
    code: "FIREBASE_GET_TOKEN_TIMEOUT",
  });
}
export async function getIdToken(forceRefresh?: boolean) {
  if (process.env.NEXT_PUBLIC_USE_FAKE_FIREBASE_TOKEN) {
    return process.env.NEXT_PUBLIC_USE_FAKE_FIREBASE_TOKEN;
  }
  const user = getAuth(getInternalApp()).currentUser;
  if (user) {
    try {
      return user.getIdToken(forceRefresh);
    } catch (error) {
      throw new AlgebraicEffect("FIREBASE_GET_TOKEN_FAILED", { error });
    }
  } else {
    return waitForIdToken(forceRefresh);
  }
}

export function waitForIdToken(forceRefresh?: boolean): Promise<string> {
  return new Promise((resolve, reject) => {
    const unsubscribe = onAuthStateChanged(
      getAuth(getInternalApp()),
      (user) => {
        unsubscribe();

        if (user) {
          user
            .getIdToken(forceRefresh)
            .then((idToken) => {
              resolve(idToken);
            })
            .catch((error) => {
              let code = "FIREBASE_GET_TOKEN_FAILED";
              if (error.code === "auth/network-request-failed") {
                code = "FIREBASE_NETWORK_FAILED";
              }
              reject(new AlgebraicEffect(code, { error }));
            });
        } else {
          // user is signed out
          reject(new AlgebraicEffect("FIREBASE_GET_TOKEN_FAILED_NO_USER"));
        }
      },
      (error) => {
        reject(
          new AlgebraicEffect("FIREBASE_AUTH_STATE_CHANGE_ERROR", { error }),
        );
      },
    );
  });
}

My get custom token code is


export async function getNewUserToken(verifiedToken: string) {
  try {
    const userCred = await resolveOrFailIn(
      signInWithCustomToken(getAuth(getInternalApp()), verifiedToken),
      { time: 15_000, code: "FAILED_FIREBASE_CUSTOM_LOGIN" },
    );
    clientLogger.debug("signed user in with custom token", {
      fn: "signOutAndGetNewUserToken",
    });
    const token = await resolveOrFailIn(userCred.user.getIdToken(), {
      time: 15_000,
      code: "FAILED_FIREBASE_GET_TOKEN_AFTER_LOGIN",
    });
    clientLogger.debug("got token", { fn: "signOutAndGetNewUserToken" });
    return {
      user: userCred.user,
      token,
    };
  } catch (error: any) {
    let message = error.message;
    if (error.code === "auth/network-request-failed") {
      error = new AlgebraicEffect("FIREBASE_NETWORK_FAILED", { error });
    }
    clientLogger.error("getNewUserToken Failed", {
      code: error.code,
      message,
      extras: [{ error }],
    });
    clientLogger.flush();
    throw error;
  }
}

I have an axios interceptor on my API calls like so


export async function rumbleRequestAuthInterceptor(
  config: InternalAxiosRequestConfig,
) {
  try {
    let params = new URL(location.href).searchParams;
    let firebase_token = params.get("firebase_token");
    if (!firebase_token) {
      firebase_token = await getIdTokenInSLA();  // <=============== [Here i get token or let you refresh]
    }
    config.headers.Authorization = `Bearer ${firebase_token}`;
    if (useAuthStore.getState().firebase_token !== firebase_token) {
      useAuthStore.getState().decodeToken({ firebase_token });
    }
  } catch (error: any) {
    let log: "error" | "warn" = "error"
    if (error.code === "FIREBASE_GET_TOKEN_FAILED_NO_USER") {
      log = "warn"
    }

    clientLogger[log]("Failed to get auth token", {
      config: {
        baseURL: config.baseURL,
        env: config.env,
        url: config.url,
        responseType: config.responseType,
        method: config.method,
      },
      code: error.code,
      message: error.message,
      extras: [{ error }, { ...config }],
    });
    redirectToLogin({ redirect_reason: error.code ?? error.message });
    throw error;
  }
  return config;
}
export async function rumbleResponseErrorAuthInterceptor(
  error: AxiosError<{ error: string }>,
) {
  if (error.response?.status === 401) {
    redirectToLogin({ redirect_reason: error.response.data.error });
  } else if (error.response?.status === 403) {
    redirectToAccountPage({ redirect_reason: error.response.data.error });
  }
  clientLogger.flush();
  return Promise.reject(error);
}

function redirectToLogin({ redirect_reason }: { redirect_reason: string }) {
  clientLogger.info("redirected to login page", {
    redirect_reason,
  });
  useAuthStore.getState().logout();
  const searchParams = new URL(window.location.href).searchParams;
  window.location.href = `/auth/login?${new URLSearchParams(
    removeNullUndefined({
      redirect: window.location.pathname,
      redirect_reason,
      source: searchParams.get("source"),
    }),
  )}`;
}

When I ran my test on the larger thndr app since we didn't check the SLA before I just had the app log when it took longer then 10, 15, and 20 seconds to get the token.


export async function getFirebaseToken() {
  const clearTimers = startSLATimers(GET_FIRST_TOKEN_SLAs);
  const user = getFirebaseUser();
  if (user) {
    const token = await user.getIdToken();
    clearTimers();
    return token;
  }
  return new Promise<string | null>((res, rej) => {
    try {
      const unsubscribe = FirebaseAuth().onAuthStateChanged(usr => {
        unsubscribe();
        try {
          const token = usr?.getIdToken() || null;
          clearTimers();
          res(token);
        } catch (error: any) {
          clearTimers();
          rej(error);
          logger.recordError(error);
        }
      });
    } catch (error) {
      clearTimers();
      rej(error);
      logger.recordError(error);
    }
  });
}

const GET_FIRST_TOKEN_SLAs = [
  10_000, // 10 seconds
  15_000, // 15 seconds,
  20_000, // 20 seconds
];

function startSLATimers(times: number[]) {
  const timers = times.map(sla_time => {
    return setTimeout(() => {
      logger.warn('FIREBASE_GET_TOKEN_BROKE_SLA', { sla_time });
    }, sla_time);
  });

  return () => {
    timers.forEach(timer => clearTimeout(timer));
  };
}

the rumble app uses axiom for logs
thndr app uses datadog for logs

@amrdraz amrdraz added new A new issue that hasn't be categoirzed as question, bug or feature request question labels Sep 9, 2024
@google-oss-bot
Copy link
Contributor

I couldn't figure out how to label this issue, so I've labeled it for a human to triage. Hang tight.

@jbalidiong jbalidiong added api: auth needs-attention and removed needs-triage new A new issue that hasn't be categoirzed as question, bug or feature request labels Sep 9, 2024
@AmilKey
Copy link

AmilKey commented Sep 13, 2024

I get this error in my App "FirebaseError: Missing or insufficient permissions." after two hours of using the app. It seems that the token is not updated automatically. The error is fixed after refreshing the page or forcibly requesting a new token.
firebase 10.13.1

look like related to #7590

@dlarocque
Copy link
Contributor

I get this error in my App "FirebaseError: Missing or insufficient permissions." after two hours of using the app. It seems that the token is not updated automatically. The error is fixed after refreshing the page or forcibly requesting a new token. firebase 10.13.1

look like related to #7590

Hi @AmilKey, is your issue related to the original posted issue? If not, could you please open a new one with more information?

@DellaBitta
Copy link
Contributor

DellaBitta commented Sep 20, 2024

Hi @mrdraz,

so I hope you have any ideas

My first suggestion is to restructure the way the code tracks if a user is signed in or not. Because the Auth SDK goes through an asynchronous initialization flow, it could be the case that the value for the current user is null shortly after Auth is initialized but then is set to a valid user object shortly afterwards.

Therefore I highly recommend adding an onAuthStateChanged listener and using it to track User status, as documented here.

Additionally you can await auth.authStateReady() just after initializing Auth.

You should then write code within the onAuthStateChanged callback that stores the state of the User, and reuse that state around your code instead of calling getAuth().currentUser everywhere.

That might help with some of the flow in sign in.

Beyond that I recommend narrowing down which Auth operation is failing within your code. If you can create an SLA call around specific auth token retrieval operations then that would be a helpful way to narrow down the issue that you're encountering.

@google-oss-bot
Copy link
Contributor

Hey @amrdraz. We need more information to resolve this issue but there hasn't been an update in 5 weekdays. I'm marking the issue as stale and if there are no new updates in the next 5 days I will close it automatically.

If you have more information that will help us get to the bottom of this, just add a comment!

@amrdraz
Copy link
Author

amrdraz commented Sep 27, 2024

Hey there
What more information do you need me to provide?

@amrdraz
Copy link
Author

amrdraz commented Sep 27, 2024

Hi @mrdraz,

so I hope you have any ideas

My first suggestion is to restructure the way the code tracks if a user is signed in or not. Because the Auth SDK goes through an asynchronous initialization flow, it could be the case that the value for the current user is null shortly after Auth is initialized but then is set to a valid user object shortly afterwards.

Therefore I highly recommend adding an onAuthStateChanged listener and using it to track User status, as documented here.

If you notice the code does wait and I setup a promise wrapping the subscription (as in the documentation) so I initially try to get the current user if they're present and if they're not then I listen on auth state change until it fires

If the user doesn't exist I throw my effect that the user doesn't exist (that's not my error)

The issue is that the user does exist but that I listen indefinitely

The issue seems more pronounced in the current user getIdToken and if not then onAuthChange get user then getIdToken

I see how that SLA (which happens 12.5% ) has a wider surface area

But it still happens on 0.3% of specifically signInWithCustomToken where I get a time out

Additionally you can await auth.authStateReady() just after initializing Auth.

You should then write code within the onAuthStateChanged callback that stores the state of the User, and reuse that state around your code instead of calling getAuth().currentUser everywhere.

That might help with some of the flow in sign in.

I can rewrite the code with readyState method to check state then continue to use onAuthStateChanged since I didn't log on it but I'm skeptical this is where the issue is (after all most of the time users signup fine) but who knows maybe there is something there and I wasn't aware if this method so I'll give it a try

Beyond that I recommend narrowing down which Auth operation is failing within your code. If you can create an SLA call around specific auth token retrieval operations then that would be a helpful way to narrow down the issue that you're encountering.

The auth operation is getIdToken
Or getIdToken after I subscribe to onAuthState change

I can't narrow that a bit more for clarity

I know for a fact as I mentioned that signInWithCustomToken which is as narrow as I can get doesn't meet the SLA 0.3% of the time

@DellaBitta
Copy link
Contributor

Hi @amrdraz,

Unfortunately I'm not sure how to move forward on our end with this issue. We would need a reproducible test case for us to debug locally. Is there any way you can isolate the problem in a simple test application, reproduce it, and send it to us?

@amrdraz
Copy link
Author

amrdraz commented Oct 10, 2024

I'll see if I can do something about that
my next attempt was going to observe how auth.authStateReady() affects production

Then try using firebase login on the mobile app 🤷

it is a very strange situation as I mentioned since it happens under some bad network conditions
but at the same time I had users report it happening more frequently then others (maybe they live in bad network areas)

all in all it hasn't been develitating, we just keep getting requests by some users that we keep them logged in and not log them out everytme 😅

@DellaBitta
Copy link
Contributor

Ok, thanks. Keep us posted!

@google-oss-bot
Copy link
Contributor

Hey @amrdraz. We need more information to resolve this issue but there hasn't been an update in 5 weekdays. I'm marking the issue as stale and if there are no new updates in the next 5 days I will close it automatically.

If you have more information that will help us get to the bottom of this, just add a comment!

@google-oss-bot
Copy link
Contributor

Since there haven't been any recent updates here, I am going to close this issue.

@amrdraz if you're still experiencing this problem and want to continue the discussion just leave a comment here and we are happy to re-open this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants