Take this repo public #284

Closed
5 tasks done
capnrefsmmat opened this issue Sep 26, 2020 · 9 comments

@capnrefsmmat
Contributor

capnrefsmmat commented Sep 26, 2020

Before our Fellows start, we must make this repo public. Checklist:

  • Verify there is no confidential information here (credentials, restricted data, partner names...)
  • Apply the MIT license via LICENSE file and note in README
  • Tidy up README and repo metadata to include appropriate links so members of the public who stumble on this will know what it is
  • Press the big red button
  • Update public docs to point here so people see how indicators work
@korlaxxalrok
Contributor

Do we need/want to think about obfuscating our deployment mechanism? My instincts around this are to keep as much knowledge about private infrastructure as guarded as possible. I'm not sure it will really matter, but I wanted to mention it as something to possibly put on the list for review.

@krivard
Contributor

krivard commented Sep 29, 2020

Brian and I went through the details offline.

The potentially-sensitive infrastructure information falls into roughly two categories:

  1. the CI/CD system that automatically tests, packages, and deploys indicators from deploy-* branches
  2. the documentation of the CI/CD system

(1) cannot be removed entirely without causing an enormous disruption to how we fix bugs in the deployed versions of all automated indicators. We could reduce the risk posed by a public CI/CD configuration by moving as many of the functional details as possible out of this repository, so that the public stuff just calls scripts whose exact implementations remain private.

(2) can be removed, but doing so will make onboarding backend engineers more difficult.

The URL of the Jenkins server must remain in the public repo for automatic build tests to work. The Jenkins server does not accept public network traffic, so that minimizes risk there.

The URL of the runtime host must remain in the public repo for automatic deployment to work. The runtime host is currently on the public internet, and this line of the README discloses its OS and cloud platform. That is probably a bit too much information to make public in combination.

In the interest of expediency and kindness to our new members I'd prefer to leave as much of the current system in place as possible without exposing us to ridiculous amounts of risk. Some changes I would consider are:

  1. Drop the OS from the README -- we should do this regardless
  2. Shift to a CMU-private runtime host -- currently discussing with Brian

@capnrefsmmat
Contributor Author

What threat are we guarding against by not disclosing OS or host details? Many projects publicly share their CI configuration, and I'm not sure what risks that entails.

@krivard
Contributor

krivard commented Sep 29, 2020

Verify there is no confidential information here (credentials, restricted data, partner names...)

  • No mention of partner names
  • No credentials (checked for patterns: token['"]*:, key['"]*:, secret)
  • No restricted data (synthetic source files only for doctor-visits, fb-survey, and hospital-admissions unit tests)
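
For anyone who wants to repeat the credential check, here is a minimal sketch of a pattern scan. It is not the exact procedure used for this audit: the regular expressions are only modeled on the patterns listed above, and the function names `scan_text` and `scan_tree` are hypothetical. It also looks only at the current working tree, not at the commit history.

```python
import re
from pathlib import Path

# Hypothetical patterns modeled on the ones listed above; adjust as needed.
CREDENTIAL_PATTERNS = [
    re.compile(r"token['\"]*:", re.IGNORECASE),
    re.compile(r"key['\"]*:", re.IGNORECASE),
    re.compile(r"secret", re.IGNORECASE),
]


def scan_text(text):
    """Return (line_number, line) pairs that match any credential pattern."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in CREDENTIAL_PATTERNS):
            hits.append((number, line.strip()))
    return hits


def scan_tree(root):
    """Scan every readable text file under root, skipping the .git directory."""
    results = {}
    for path in Path(root).rglob("*"):
        if not path.is_file() or ".git" in path.parts:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file, nothing to scan
        hits = scan_text(text)
        if hits:
            results[str(path)] = hits
    return results
```

Calling `scan_tree(".")` from the repository root returns a dict mapping file paths to the matching lines. Expect false positives (for example, any line that mentions "secret" in a comment), so each hit still needs a manual look.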

@korlaxxalrok
Contributor

@capnrefsmmat We expose a little more visibility into our deployment mechanism. It doesn't need to be a thing that holds us up, but maybe something to address when we have the chance.

@krivard
Contributor

krivard commented Sep 29, 2020

@capnrefsmmat the usual "somebody gets mad at scientists and sets a bunch of hooligans on us" scenario. DDoS, script-based attacks, etc. Anyone motivated enough will be able to take down or corrupt any of our publicly-accessible resources, and the more details we make public about the configuration of those resources (such as OS information), the less motivated they'll have to be. We don't want to make it easy for someone to take out the Automation server, since that would be a pain to clean up.

On conferring with Brian, we'll stick with just removing the OS details. The runtime host is already disclosed through back versions of epidata, so switching to a private host won't keep anything hidden.

@krivard
Contributor

krivard commented Sep 29, 2020

Once #288 is merged we'll need to

@krivard
Contributor

krivard commented Sep 30, 2020

It turns out that git filter-branch does not edit commits in place; it replaces them with new versions, which should have been obvious in hindsight. There are two ways to propagate the resulting forced changes to all the clones:

  • reset --hard for each branch in remote
  • rebase for each branch in remote

The first is no good because it will trash all changes that haven't been pushed up to the server.

The second is no good because it will attempt to replay the original versions of each replaced commit, and each of those will generate a conflict that has to be resolved manually. There have been hundreds of such commits, and I'm not going to make everyone do that.
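
To illustrate why rewritten commits count as new commits, here is a rough sketch of how Git derives object IDs. The header-plus-SHA-1 scheme is Git's real object format, but everything else is a simplification for illustration: blob hashes stand in for tree IDs, the author and committer lines are made up, and `commit_id` is a hypothetical helper.

```python
import hashlib


def git_object_id(kind, body):
    """Compute a Git object ID: SHA-1 over '<kind> <size>' + NUL + body."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_id(tree, parent, message):
    """Hash a simplified commit object that records its tree and parent."""
    lines = [f"tree {tree}"]
    if parent:
        lines.append(f"parent {parent}")
    # Fixed, made-up identity lines so the example is deterministic.
    lines.append("author Example <ex@example.com> 0 +0000")
    lines.append("committer Example <ex@example.com> 0 +0000")
    body = "\n".join(lines) + "\n\n" + message + "\n"
    return git_object_id("commit", body.encode())


# Simplification: blob hashes stand in for real tree IDs.
old_tree = git_object_id("blob", b"README with OS details\n")
new_tree = git_object_id("blob", b"README without OS details\n")
later_tree = git_object_id("blob", b"some later, unrelated state\n")

# Original history versus history after rewriting the first commit.
old_first = commit_id(old_tree, None, "add README")
new_first = commit_id(new_tree, None, "add README")

# The second commit has the same content in both histories, but it records
# its parent's ID, so rewriting the parent changes the child's ID too.
old_second = commit_id(later_tree, old_first, "unrelated change")
new_second = commit_id(later_tree, new_first, "unrelated change")

print(old_first != new_first, old_second != new_second)
```

Because each commit embeds its parent's ID, rewriting one early commit gives every descendant a new ID as well. That is why every clone sees the rewritten branch as diverged history rather than as an update.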

There might be a way to have folks generate patches, but that would require more coordination than we're prepared to do today.

I've removed the disclosure from the contribution guide that's been merged to main, and we'll just cope with it remaining in the history.

@capnrefsmmat we're ready to link to this repo from the public docs.

@capnrefsmmat
Contributor Author

See cmu-delphi/delphi-epidata#229

@krivard krivard closed this as completed Oct 1, 2020