Clarify what "All activity is now stored only in the cloud..." actually means #665

Closed
jb510 opened this issue Nov 22, 2014 · 14 comments

@jb510

jb510 commented Nov 22, 2014

I'm still running 1.4.9 because I have no idea what the heck "All activity is now stored only in the cloud over SSL, local MySQL storage dependence is over!" in the change log, website and elsewhere actually means.

What cloud? My cloud? I have lots of clouds... Your cloud? How does that work?

Googling extensively and stopping just short of diff'ing all the code changes between 1.4.9 and 2.0.0 didn't give me any better idea of what the heck that means. From wp-stream.com: "Activity logs are stored securely in the cloud over SSL making your data immune to any intruder and making Stream the most robust WordPress audit trail solution on the web" doesn't help either...

Many of us have pretty strict data protection policies to consider, and while local MySQL storage was blowing up databases, at least it was our own database that we owned and controlled and whose privacy policies we knew.

@blobaugh

blobaugh commented Dec 3, 2014

@jb510 I had this same conversation at WCSF 2014. It is stored on the stream servers.

While I understand and respect the ideas behind the cloud stats storage I too have clients with privacy issues that require a local storage of their stats. I am currently using 1.4.9 to avoid offsite data, however I am concerned about lack of updates to that version.

I heard mention by @topher1kenobe that an idea for decoupling where storage lives could be a possibility. Is there any traction on that currently? Any way we can help forward that along?

FYI: the clients I have run into so far that do not want to store data offsite are govt, big corp, private intranets, and sites with lots of PII.

@frankiejarrett
Contributor

Hey @jb510 and @blobaugh, thanks for the comments.

We are working on an enterprise plugin specifically for these use cases that complements Stream 2.x to store and query records in a custom Elasticsearch cluster.

The aim is to bypass the Stream API and cloud storage while still keeping records out of the WP database. Record transfer can then take place within a closed network, and record storage can be self-hosted.

Self-hosted Elasticsearch will be our first approach for this, but I would also be interested to know of other JSON doc databases that you would find acceptable use cases for. For instance, would a MongoDB solution be of interest to you or your clients?

But I do want to reiterate that we don't have any plans to bring back storing Stream logs in the WP content database since that is precisely what we've moved away from given all the risk factors associated.

@blobaugh

@fjarrett what are the risk factors? Can it be built in a modular way so that if we decide we do want to store it in MySQL instead of running another server we can simply create a MySQL module for it? Where will the user go to view Stream stats once they are decoupled from the Stream cloud?
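
To make the "modular" question concrete, here is a minimal sketch of what a pluggable storage layer could look like. Nothing here is Stream's actual code or API: `StorageDriver`, `InMemoryDriver`, `RecordLogger` and the record fields are all hypothetical names, and Python is used only for illustration. The idea is that the logger talks to an interface, so a MySQL or Elasticsearch backend could be swapped in by implementing the same two methods.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class StorageDriver(ABC):
    """Hypothetical interface that every storage backend would implement."""

    @abstractmethod
    def store(self, record: Dict[str, Any]) -> None:
        """Persist a single activity record."""

    @abstractmethod
    def query(self, **filters: Any) -> List[Dict[str, Any]]:
        """Return the records whose fields equal the given filters."""


class InMemoryDriver(StorageDriver):
    """Stand-in backend for the sketch; a MySQL or Elasticsearch driver
    would expose the same two methods (assumption, not Stream's design)."""

    def __init__(self) -> None:
        self._records: List[Dict[str, Any]] = []

    def store(self, record: Dict[str, Any]) -> None:
        # Copy the record so later changes by the caller do not leak in.
        self._records.append(dict(record))

    def query(self, **filters: Any) -> List[Dict[str, Any]]:
        return [
            r for r in self._records
            if all(r.get(key) == value for key, value in filters.items())
        ]


class RecordLogger:
    """Logs activity through whichever driver it was constructed with."""

    def __init__(self, driver: StorageDriver) -> None:
        self.driver = driver

    def log(self, user: str, action: str, summary: str) -> None:
        self.driver.store({"user": user, "action": action, "summary": summary})


# Usage: swapping the backend means changing only this constructor argument.
logger = RecordLogger(InMemoryDriver())
logger.log("admin", "updated", "Edited post 42")
logger.log("editor", "created", "New page")
admin_records = logger.driver.query(user="admin")
```

With a layout like this, where the records live becomes a configuration decision rather than something baked into the plugin.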

@frankiejarrett
Contributor

@blobaugh I'll take a moment here just to explain the vision behind why we are wanting to shun MySQL 😄

Stream is different from most other plugins because its needs are beyond the scope of what the WordPress database is designed to handle. Stream needs to store very large amounts of data, all of the time, and retain that data as long as it can. The more data it's storing, the more meaningful results someone can get out of it.

As a general rule, it's not wise to be storing large amounts of activity logs in a content database anyway. The primary job of the content database should just be for serving up content to visitors as fast as possible. Bloating it with logs only creates more for it to deal with.

So our aim with Stream is to be steering the product in a better direction as much as possible. The point being that this isn't really a database-agnostic situation. MySQL has major performance limitations when it comes to running complex queries on large data sets, while NoSQL JSON document stores are designed and built for exactly this sort of thing (hence the ELK stack).

I'm sure there will be other tracking plugins that store logs in the WP database, but they will end up having performance limitations that Stream won't have.

To answer your question about seeing stats, that will all happen in the Stream plugin as normal.

@jb510
Author

jb510 commented Dec 12, 2014

I don't think anyone has issue with the technical reasons behind this change. The issue is with storing data offsite in someone else's cloud that we have no knowledge or control of (security, privacy, backups, etc).

@frankiejarrett
Contributor

Hey @jb510! Not sure if you read my earlier comment above, but yes, this is why we are making a direct to Elasticsearch adapter plugin for enterprise use which will bypass our cloud storage and keep data on-premise.

I do think technical details were relevant to the conversation above where @blobaugh was asking specifically about Stream using MySQL, which may have caused the scope of this issue to grow past where you originally intended 😸

@jb510
Author

jb510 commented Dec 12, 2014

I read Ben's comment to be more "what if we want to keep ownership/control", but that might be me reading my issues into his comment. Regardless, I'm sure for some people the hope would be the option to keep things in MySQL, I get that's an awful idea, and for others it would mean another DB self-hosted local to WordPress (same server) or remote (cloud).

It all goes back to my original issue though: this change was made and all that was said is "data is now hosted in the cloud, yay!" without ANY details whatsoever as to what that cloud-based storage actually is. What are the security policies? Is that data stored encrypted or in clear text? In what legal jurisdiction is that data physically stored? Who is liable for data breach? etc... etc...

@frankiejarrett
Contributor

@jb510 I really appreciate your feedback on this subject. After carefully reviewing the Privacy Policy and Terms of Service documents on our website, we agree with you that there was too much to be left to the imagination. So today we improved the verbiage in these a great deal to be more specific across the board, especially about some of the issues you've raised here in this thread.

While privacy statements are all well and good, we also realize the value in 3rd party auditing and verification of standard privacy practices. So today we've started working with TRUSTe in order to do just that. With their assistance, we are seeking their Cloud Data Privacy and US-EU/US-Swiss Safe Harbor certifications. We will be including those seals in our Privacy Policy once the processes are complete.

@jb510
Author

jb510 commented Dec 13, 2014

Thanks Frankie. Look forward to that update. TRUSTe Cloud certification would be a great step and I suspect EU based websites will be happy with the US-EU/US-Swiss Safe Harbor cert (although in my experience those EU folks are never really happy with anything when it comes to data privacy, so good luck!)

FWIW, I personally would also love to see the privacy policy and TOS for the stream website separated from those for application data or better delineated. It's hard to read as it now stands with the important bits down at #18. The privacy and security issues between the stream website (login, personal profile info) and the application data (change logs of every bit of data on a site) to me seem two wholly different beasts, but IANAL.

@frankiejarrett
Contributor

Thank you, @jb510!

And nice idea about making the website and the product more distinct in the privacy policy, we will certainly do that.

@jb510
Author

jb510 commented Dec 13, 2014

👍

@quasel

quasel commented Feb 13, 2015

any idea when the option to self host the data storage will be available?

@frankiejarrett
Contributor

@quasel We are about to go into private beta with a few enterprise customers. I imagine there will be several months of feedback from them before we release it publicly.

@quasel

quasel commented Feb 16, 2015

@fjarrett thx for your quick reply, looking forward to the release 👍


4 participants