Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

oic.utils.SessionDB is not conducive to a shared DB with multiple SessionDB instances #146

Closed
gobengo opened this issue Nov 23, 2015 · 12 comments
Assignees
Milestone

Comments

@gobengo
Copy link
Contributor

gobengo commented Nov 23, 2015

It is common in my world to deploy multiple processes of an application server (like an oidc server) all sharing a database cluster. In this way, if one process or host crashes, the other instances can continue to serve requests.

I use oic.utils.sdb with a db object that reads/writes to a Redis instance on the network. I then run 2+ servers, each with a SessionDB instance that can read/write from that Redis.

However, I noticed that SessionDB().gets_sids_for_sub() does not return the same values on all hosts. It appears this is because this and a few other SessionDB methods make use of an in-memory .uid2sid cache that is only updated by writes to that one SessionDB instance. Writes to the underlying .db don't update that cache.

Here is a test suite that fails, illustrating that:

import unittest

from oic.oic.message import AuthorizationRequest
from oic.utils.sdb import AuthnEvent

from oic.utils.sdb import SessionDB


def create_auth_request():
    areq = AuthorizationRequest(response_type="code", client_id="client1",
                                redirect_uri="http://example.com/authz",
                                scope=["openid"], state="state000")
    return areq


def setup_session(session_db, uid):
    aevent = AuthnEvent(uid, 'salt')
    areq = create_auth_request()
    sid = session_db.create_authz_session(aevent=aevent,
                                          areq=areq)
    session_db.do_sub(sid, 'client_salt')
    return sid


class TestSessionDB(unittest.TestCase):

    def setUp(self):
        self.db = dict()
        self.sdb = SessionDB('https://1.com', db=self.db)

    def test_exists(self):
        self.assertTrue(bool(self.sdb))

    def test_get_sids_from_uid(self):
        # This passes
        uid = 'uid1'
        sid = setup_session(self.sdb, uid)
        # Now try to look it up
        sids_by_sub = self.sdb.get_sids_from_uid(uid)
        self.assertEqual(len(sids_by_sub), 1)
        self.assertIn(sid, sids_by_sub)

    def test_distributed_get_sids_from_uid(self):
        # This fails
        # Like the former test, but with multiple sdb instances
        # Sharing one db
        uid = 'uiduid'
        db = self.db
        # First
        sdb1 = SessionDB('https://1.com', db=db)
        sid1 = setup_session(sdb1, uid)
        # Second
        sdb2 = SessionDB('https://2.com', db=db)
        sid2 = setup_session(sdb2, uid)
        # Find both
        # From sdb1
        sdb1sids = sdb1.get_sids_from_uid(uid)
        sdb2sids = sdb2.get_sids_from_uid(uid)
        self.assertEqual(set(sdb1sids), set(sdb2sids))


if __name__ == "__main__":
    unittest.main()

I believe this can be fixed without too much effort, via

I'll be implementing a Subclass of SessionDB that tries to implement it, and will share my progress. But I wanted to share my intentions early in case others have solved this problem or thought of a better way.

@gobengo
Copy link
Contributor Author

gobengo commented Nov 24, 2015

An initial implementation is something like

"""
A better version of oic.utils.sdb:SessionDB.

The real one is not designed to afford for sessions shared between multiple
processes/hosts, because it uses an in-memory cache of .uid2sid to find sessions
for a given user.
"""

from oic.utils.sdb import SessionDB as OICSessionDB

from collections import Mapping
import itertools
import logging

LOG = logging.getLogger(__name__)


def uid_from_session(session):
    return session['authn_event'].uid


class SessionStorageUid2Sid(Mapping):
    """
    Implement object required by oic.oic.provider:Provider.uid2sid,
    except it's not just a dict. It's a Mapping that reads from a Mapping
    of Session IDs (sids) to Sessions (as created by
    oic.oic.provider:Provider.create_authz_session)
    https://github.com/rohe/pyoidc/issues/146
    """

    def __init__(self, sessions_by_id, uid_from_session=uid_from_session):
        self._sessions_by_id = sessions_by_id
        self._uid_from_session = uid_from_session

    def __getitem__(self, uid):
        session_items_for_uid = itertools.ifilter(
            lambda i: uid == self._uid_from_session(i[1]),
            self._sessions_by_id.items())
        # Would be neat if oic.utils.sdb explicitly cast to list() bcuz lazy
        return list(sid for (sid, session) in session_items_for_uid)

    def __iter__(self):
        sessions_sorted_by_uuid = sorted(self._sessions_by_id.values(),
                                         key=self._uid_from_session)
        for uid, user_sessions in itertools.groupby(sessions_sorted_by_uuid,
                                                    self._uid_from_session):
            yield uid

    def __len__(self):
        return len(iter(self))


def create_uid2sid_from_db(sessions_by_id):
    """
    :param db: `db` like that passed to SessionDB constructor
    :returns: Object that reads from db to provide a usable uid2sid map,
      which currently means:
      try:
          self.uid2sid[uid].append(sid)
      except KeyError:
          self.uid2sid[uid] = [sid]
      self.uid2sid.items()
      self.uid2sid.values()
      self.uid2sid[sid] = sid
    """
    LOG.debug('create_uid2sid_from_db %s', sessions_by_id)
    return SessionStorageUid2Sid(sessions_by_id)


class SessionDB(OICSessionDB):

    def __init__(self, *args, **kwargs):
        """
        :param uid2sid: Mapping of UID to session IDs
        """
        uid2sid = kwargs.pop('uid2sid', None)
        super(SessionDB, self).__init__(*args, **kwargs)
        # https://github.com/rohe/pyoidc/pull/147
        kwarg_db = kwargs.get('db')
        if kwarg_db is not None:
            self._db = kwarg_db
        self.uid2sid = uid2sid or create_uid2sid_from_db(self._db)

    def __delitem__(self, sid):
        # Exactly like super(), except don't re-assing .uid2sid
        del self._db[sid]

    def do_sub(self, sid, client_salt, sector_id="", subject_type="public"):
        """
        Construct a sub (subject identifier) and add it to the sdb[sid]

        This is exactly like super(), except it doesn't try to write
        to .uid2sid.
        (see https://github.com/Livefyre/lfoidc/pull/62/files#r43441487)

        :param sid: Session identifier
        :param sector_id: Possible sector identifier
        :param subject_type: 'public'/'pairwise'
        :param client_salt: client specific salt - used in pairwise
        :return:
        """
        uid = self._db[sid]["authn_event"].uid
        user_salt = self._db[sid]["authn_event"].salt

        if subject_type == "public":
            sub = hashlib.sha256("{}{}".format(uid, user_salt).encode("utf-8")).hexdigest()
        else:
            sub = pairwise_id(uid, sector_id, "{}{}".format(client_salt, user_salt))

        logger.debug("uid2sid: %s" % self.uid2sid)
        self.update(sid, 'sub', sub)

        return sub

I don't mind subclassing that much, except for having to override all of do_sub just to prevent the self.uid2sid[uid].append(sid) mutation. += [sid] would give the uid2sid object a chance to interpret as __setitem__ (I have proposed this as #148)

@rohe
Copy link
Contributor

rohe commented Feb 11, 2016

24 nov 2015 kl. 02:22 skrev Benjamin Goering notifications@github.com:

An initial implementation is something like

"""
A better version of oic.utils.sdb:SessionDB.

The real one is not designed to afford for sessions shared between multiple
processes/hosts, because it uses an in-memory cache of .uid2sid to find sessions
for a given user.
”""

The code looks ok to me.
Could you provide a couple of tests that covers the new code ?

— Roland

”Everybody should be quiet near a little stream and listen.”
From ’Open House for Butterflies’ by Ruth Krauss

@bjmc
Copy link
Contributor

bjmc commented May 10, 2016

👍 Has this been merged yet? Or is subclassing still needed currently?

@bjmc
Copy link
Contributor

bjmc commented Jun 9, 2016

@gobengo Did you want to add tests for this and see if we can get it merged?

@bjmc
Copy link
Contributor

bjmc commented Feb 1, 2017

Has there been any progress on this issue?

@gobengo
Copy link
Contributor Author

gobengo commented Feb 1, 2017

No progress.

I no longer work for the same organization using pyoidc, so don't have access to the tests I wrote back then. :/

Maybe best to close for now until someone else can spend time with it.

@rohe
Copy link
Contributor

rohe commented Feb 4, 2017

OK, I'll close it.

@rohe rohe closed this as completed Feb 4, 2017
@bjmc
Copy link
Contributor

bjmc commented Feb 4, 2017

Can I ask that we keep this open? If the only thing lacking are tests, I'll write some myself. Being able to run multiple instances is critical for production use in a high availability system.

@rohe
Copy link
Contributor

rohe commented Feb 4, 2017

Sure

@tpazderka
Copy link
Collaborator

OK, I had a look, and the easiest solution seems to be storing a reverse mappings ("uid" and "sub") in the same db object.

Other solution would be to create an ABC class that would force users to implement such reverse lookups based on their database implementation.

@schlenk I sort of like the second solution better, what about you?

@schlenk
Copy link
Collaborator

schlenk commented May 4, 2019

I think you are right, that an ABC would be nicer. Especially as the mapping between uid<-> sub is kind of temporary, while the client registration is more permanent. So it would not be total strange to have different storage strategies there.

@tpazderka
Copy link
Collaborator

Closed via #660

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants