
Remove usage of global attributes #75

Closed
nineinchnick wants to merge 2 commits from the avoid-globals branch

Conversation

@nineinchnick (Collaborator) commented Feb 26, 2020

Using global attributes when building models makes it really hard to share assets between different models. Multiple models have to be stored in separate sources, cannot be processed together and unused assets are still drawn in diagrams.

This MR removes all usages of global/static attributes and requires elements to be explicitly passed to a model. Note that only Dataflows have to be passed. See the newly added tests for the difference in usage.

Also note that previously the sequence diagram relied on the order in which assets were defined (not dataflows); now dataflows are used to order assets. This is still not consistent with being able to manually order dataflows, but that can be improved in future MRs.

@nineinchnick nineinchnick requested a review from izar as a code owner February 26, 2020 13:32
@ghost commented Feb 26, 2020

Congratulations 🎉. DeepCode analyzed your code in 2.376 seconds and we found no issues. Enjoy a moment of no bugs ☀️.


@nineinchnick nineinchnick force-pushed the avoid-globals branch 2 times, most recently from df4e9a4 to fff68fd Compare March 4, 2020 11:03
@nineinchnick nineinchnick force-pushed the avoid-globals branch 2 times, most recently from 1780df7 to b598528 Compare March 4, 2020 23:28
@nineinchnick nineinchnick force-pushed the avoid-globals branch 3 times, most recently from eb437fe to a4f6be3 Compare March 16, 2020 12:12
@izar (Collaborator) left a comment

Not sure I like this one. The idea behind pytm is to have the code describe the system, transparently, without making it a whole "big thing": write the code that describes the system you're working on right here, and keep it with the code being described.
From a software engineering point of view, the separation and reuse you propose is great, but I think here it will necessarily complicate the 80% of cases pytm is used for in order to simplify the other 20%.

@nineinchnick (Collaborator Author)

I knew this one was controversial and would spawn some discussion :-)

If you take a look at the modified README and tm.py, the only thing that changes is that you have to specify which dataflows form the model when creating it, and because of that you have to define the model at the end. This makes the process of building a model a bit more explicit. I don't think it makes the model any less transparent.

We could retain the same order of first defining and describing the model if we could assign the model as a property to all assets and dataflows, but that would add a lot more overhead, so I went with the current solution.

The 80/20 argument depends on how much more complicated it would be to build small models. I'm not sure that being explicit means complicating things. You already have to be explicit when defining Dataflows by pointing to a source and a sink. This fits nicely into the following workflow (a sketch follows the list):

  • define assets
  • define dataflows by connecting assets
  • collect dataflows into a model

So maybe it only seems like a complication because it reorders the current workflow?
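
For illustration, a minimal sketch of that ordering, mirroring the example further down in this thread (the names and the elements attribute are illustrative of the proposed API, not a final interface):

from pytm import TM, Actor, Server, Dataflow

# 1. define assets
user = Actor("User")
server = Server("Web server")

# 2. define dataflows by connecting assets
request = Dataflow(user, server, "Request")

# 3. collect dataflows into a model, defined at the end
tm = TM("Some model")
tm.elements = [request]
tm.process()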

@nineinchnick (Collaborator Author)

Note that, before we can complete the discussion here, #84 tries to address the issue of assets shared between multiple models.

@izar (Collaborator) commented Mar 17, 2020 via email

@nineinchnick (Collaborator Author)

I understand, but it sounds like a trap. The kinds of changes needed to make the library more extensible are breaking changes. Making them early will help keep the user base.

What if I didn't move creating the model to the end in the examples?

from pytm import TM, Actor, Server, Dataflow

tm = TM("Some model")

user = Actor("User")
server = Server("Web server")

tm.elements = [
    Dataflow(user, server, "Request"),
]

tm.process()

@izar (Collaborator) commented Mar 18, 2020 via email

@nineinchnick nineinchnick force-pushed the avoid-globals branch 6 times, most recently from 00b20a8 to 4ba2fae Compare March 31, 2020 18:44
@nineinchnick nineinchnick force-pushed the avoid-globals branch 2 times, most recently from ca014ee to a459d66 Compare June 16, 2020 20:03
@izar (Collaborator) commented Jun 23, 2020

@shambho @colesmj what do you guys think? Is it worth moving this way and breaking the "simplicity" of the approach? @nineinchnick correctly points out that the way it has been so far creates a problem in handling multiple models together, but I offer that it is a marginal case compared to how people have been working so far. I mean, I see the value of the change from a development point of view, but I am thinking of the users here. I think it breaks "anyone can write a model in 5 minutes" by adding a layer of syntax that most people will probably not be using.

@nineinchnick (Collaborator Author)

As a last resort I can create a default context and keep the reset() method. That would retain the old behavior without breaking backward compatibility. The only question is which approach we should recommend as the example in the README and tm.py.

@colesmj (Collaborator) commented Jun 24, 2020

@izar I can see your argument on this, and @nineinchnick's as well. This PR contains more than one change, however.

To be honest, the suggested change of adding the context is, on its face and when used with simple TMs, not much of a change (two lines?). The bigger change is the paradigm shift that models are defined by their dataflows: it takes the act of creating a threat model from asset-centric to flow-centric, and is potentially an area where this change would be both a benefit and a pitfall.

I think our core audience for pytm is experienced developers, not experienced Python developers, but the added requirement that the user initiate a model does not seem onerous.

What may be strange is how things might be shared across models: elements are defined within the context (if I read the test and sample correctly) and then dataflows are specified to build the model, again within the context. At this point nothing is shared, but I can see the benefit of not including assets and elements unrelated to the dataflows in the analysis. This may catch some people off guard, but it sort of makes sense if you consider that the model may be incomplete at the time of rendering (i.e. in the course of conversation the author identifies elements but does not yet identify how they are used, so there is nothing to connect them).

I would have liked to see a sample of two models sharing assets or flows, to highlight the benefit of using a context manager, before merging this PR.

@izar (Collaborator) commented Aug 26, 2021

Where do we go with this one? @colesmj @nineinchnick

@nineinchnick (Collaborator Author) commented Aug 30, 2021

I was supposed to provide more examples, especially with multiple models sharing assets. Other than that, there are no changes requested in the PR.
I should rebase the branch and can also make sure this is backward compatible.

@raphaelahrens (Contributor) commented Aug 31, 2021

Hi,

I would like to see this feature merged.

Why would this feature help me?

I work on a research project with ~10 teams, where each team builds their own services.
But all the teams share a common infrastructure; for example, they all use a common data format, a registration service, a message broker, and others.

To make the modelling easier and to have common names and attributes in the threat models, I would like to create libraries like the following company.py and data.py:

# company.py
from pytm.pytm import ExternalEntity
from pytm.pytm import Boundary
from pytm.pytm import Actor

boundary = Boundary('Unternehmen')
mes = ExternalEntity('MES', inBoundary=boundary)
ceo = Actor('CEO')

# data.py
from pytm.pytm import Data
from pytm.pytm import Classification

efdm = Data("EFDM")
efdm.classification = Classification.RESTRICTED
efdm.format = "JSON"

efdm_signal = Data("Umsetzungssignal")
efdm_signal.classification = Classification.RESTRICTED

time = Data("Zeit")
time.classification = Classification.PUBLIC
machine_info = Data("Anlangendaten")
machine_info.classification = Classification.SECRET

These can then be used by the other teams to create their threat models like this:

from pytm.pytm import Server
from pytm.pytm import Dataflow as D

import company
import data

service = Server('Example', inBoundary=company.boundary)

D(
   service,
   company.mes,
   'Push machine info',
   data=data.machine_info
)

But with the current global state this will also add the CEO to the drawings and the reports, even though the CEO is not relevant for this threat model.
With the data module this only affects the reports, so machine_info is listed as well as time and the others.

I think it would also help in creating small per-feature threat models, since the users of the software could create their own reusable libraries of elements which they can reuse in their next feature's threat models.

I hope this example was helpful.

@nineinchnick (Collaborator Author)

@raphaelahrens until this gets merged, as a workaround, you should (see the sketch after this list):

  • set the ignoreUnused property in the model (TM object)
  • call TM.reset() before building each model
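
A minimal sketch of that workaround, assuming pytm's current global registry and the ignoreUnused / TM.reset() features mentioned above (all element names are illustrative):

from pytm import TM, Actor, Server, Dataflow

# First model: ignoreUnused drops elements that appear in no dataflow.
tm1 = TM("Model A", ignoreUnused=True)
user = Actor("User")
web = Server("Web server")
Dataflow(user, web, "Request")
tm1.process()

# Clear the global registry before building the next model,
# so the second model does not pick up the first model's elements.
TM.reset()

tm2 = TM("Model B", ignoreUnused=True)
admin = Actor("Admin")
api = Server("API server")
Dataflow(admin, api, "Admin request")
tm2.process()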

@izar (Collaborator) commented Aug 31, 2021

Thanks @raphaelahrens, that was a very useful example.

I definitely see the use case. I also continue to think it falls into the small side of an 80/20 split.

A naive question, but one that might (if I am not dreamwalking) be a useful compromise: what if we added functionality to add one TM object to another? This way company.py and data.py would be included, their TMs joined (smartly) with the new TM, and everything would be accessible, as in the example.

The simple syntax is maintained, and we have minimal change. Would that work?
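
A hypothetical sketch of what that could look like; neither TM.merge() nor the company/data modules carrying their own TM exist in pytm today, so every name here is illustrative only:

from pytm import TM, Server, Dataflow

import company   # would ship its own TM plus the shared elements
import data

tm = TM("Feature model")
tm.merge(company.tm)   # hypothetical API: join the shared model into this one

service = Server("Example", inBoundary=company.boundary)
Dataflow(service, company.mes, "Push machine info", data=data.machine_info)

tm.process()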

@raphaelahrens (Contributor)

We seem to agree that composable threat models are useful, and I also agree that the first model must be easy to create with minimal effort.
But regarding the 80/20 concern, I also believe there must be a way to easily grow the threat modelling effort.
As mentioned earlier, it is helpful to model different features in separate models, and then it helps if one can reuse parts of an earlier model.
Currently I create one model for one feature and then copy it for the next, and updates to one model stay in that model.
The benefits I see in having the ability to import models are:

  1. Less typing for later models
  2. Older models might get more attention, because you have to read them if you want to reuse them
  3. Updates are not only applied to one model; they propagate to all models.

As soon as your repository has more than one model, this feature will be helpful.

The question then is: what is the desired/recommended workflow for pytm?

Currently it is simpler to have one big threat model, because reuse is much easier.
But large models are hard to reason about.
Alternatively, you can copy the threat models, which makes updates harder, and older models will age more quickly.

For me it made sense to put elements which I would copy from other models into a separate Python module for reuse, as I have done in my example.
This felt natural to me since these are Python objects, so I can write functions and classes that give me what I need, and I can keep constants in my data module (by the way, this is what made me use pytm).
Using these functions, classes and constants like normal members of a module felt straightforward, since I can write something like ... inBoundary=company.boundary).

I agree that not all users will be Python programmers, and something like with tm.build(): and the need to indent all lines would drive people mad.

I have started a branch, raphaelahrens:object_context, where I experimented with the idea of putting the context inside the TM object instead of keeping it in the class object.
All other element classes then use TM.context() to get a reference to the current threat model object.
TM.context() returns the object only if the call was not made via an import.
It relies on this function:

import inspect


def _check_if_imported():
    # Get the stack of the caller
    stack = inspect.stack()
    # Get the module of the second-to-last frame, i.e. the original caller
    module = inspect.getmodule(stack[-2][0])
    # Return True only if the call came from a real module,
    # not from some import
    return module is not None

This check runs whenever TM() and TM.context() are called.

This has the benefit that when TM() is called inside an import, TM._context() is not changed.
So you can combine two threat models like this:

#!/usr/bin/env python3

from pytm import (
    TM,
    Dataflow,
    ExternalEntity,
)


t = TM(
    "my test tm",
    ignoreUnused=True,
)

test = ExternalEntity('Test')

print("import")
import tm as example

Dataflow(
    test,
    example.user,
    'flow',
)


if __name__ == "__main__":
    t.process()

Currently the merging is done by setting ignoreUnused=True, since it looks at all dataflows and rebuilds the element list.
The next step would be to write the merging process and to see if it even makes sense to reuse the model via t.merge(example.tm) or so.
Also, the function _check_if_imported() needs to be verified for safety; I guess there are still some corner cases where it returns a wrong value.

@izar (Collaborator) commented Sep 9, 2021 via email

@raphaelahrens (Contributor)

@izar yes, the dataflow will be part of t.
I added the weird-looking import in the middle to make sure the solution would still work with non-PEP 8 code.
As mentioned, there will be corner cases of Python's import behavior that I have missed.

@raphaelahrens (Contributor)

I created pull request #175 with the implementation and some small test cases. It would be nice if someone could verify that I have not interfered with the normal behavior.
