[Feature] RFC: Telemetry #1250
Comments
I'm hesitant about it, but as long as the project is very transparent about it (perhaps notifying users on initial install of yarn?) then I think most will be okay (myself included) |
The way I see it, the first install would print something like this:
Since we would only send the data once every week, you'd then have seven days to disable it before the first (anonymous) payload is sent. And if you stop using Yarn before the seven days, it won't send any data at all. |
Would the "once-a-week" thing be a general setup -- i.e. would EVERY install of yarn send data on say Saturday morning, or would it be individual from the first day of use? e.g. I use it on Monday the first, so by NEXT Monday the 8th, etc? |
That would be the day of first install + 7. I find it fairer, and it's also better for us since we would get gradual information over the course of the week rather than all at once. |
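The per-user schedule described above (first install + 7 days, then weekly) could be sketched roughly as follows. This is only an illustration: `TelemetryState`, `nextUploadTime`, and `isUploadDue` are hypothetical names, and the assumption that a single timestamp is persisted locally is not taken from the thread.

```typescript
// Hypothetical sketch: names and state layout are assumptions, not Yarn's code.
const DAY_MS = 24 * 60 * 60 * 1000;
const UPLOAD_INTERVAL_MS = 7 * DAY_MS;

interface TelemetryState {
  // Time (ms since epoch) of the first install, then of each later upload.
  lastUpload: number;
}

// The first payload only becomes due a full interval after the first install.
function nextUploadTime(state: TelemetryState): number {
  return state.lastUpload + UPLOAD_INTERVAL_MS;
}

function isUploadDue(state: TelemetryState, now: number): boolean {
  return now >= nextUploadTime(state);
}

// First use on 1 June 2020: nothing is due before 8 June 2020.
const telemetryState: TelemetryState = { lastUpload: Date.UTC(2020, 5, 1) };
const dueOnDay6 = isUploadDue(telemetryState, Date.UTC(2020, 5, 7));
const dueOnDay7 = isUploadDue(telemetryState, Date.UTC(2020, 5, 8));
```

Because each machine counts from its own first install, uploads would naturally spread across the week instead of arriving all at once.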
Ok. I can appreciate that; the server hit that you mentioned was going to be my next remark. |
Why? Isn't it better for the data to be owned by the Yarn project rather than a third party? |
My original line of thought was "if users know for sure that there's no chance we could ever get access to the raw data, they might trust us more". I'm not sure it would really change many things in practice though 🤔 |
I feel like people would trust it more if the server-side portion was open source, compared to using a third party closed-source system. |
I've examined "enumerators": {
"projectCount": [
"/tmp/webpack-virtual-modules"
]
}, I find that unacceptable and it makes me want to opt-out. Also it contradicts with
|
It doesn't send the directory. As you mentioned, that would be really unacceptable. Instead, we simply keep the list of project paths locally for bookkeeping purposes (to avoid counting the same path over and over again), but before emission we turn them into a number (note the field name: So in your example, we would send |
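As a rough illustration of the bookkeeping described above, the sketch below keeps deduplicated values locally and replaces each list with its length before anything is emitted. The type and function names (`LocalEnumerators`, `recordEnumerator`, `toEmittedPayload`) are hypothetical and not taken from Yarn's source. Only the `enumerators` / `projectCount` shape comes from the snippet quoted earlier in the thread.

```typescript
// Hypothetical sketch: types and function names are assumptions, not Yarn's code.
type LocalEnumerators = { [field: string]: string[] };
type EmittedEnumerators = { [field: string]: number };

// Remember a value locally, ignoring values that were already counted.
function recordEnumerator(local: LocalEnumerators, field: string, value: string): void {
  let seen = local[field];
  if (seen === undefined) {
    seen = [];
    local[field] = seen;
  }
  if (seen.indexOf(value) === -1) {
    seen.push(value);
  }
}

// Before emission, each list is reduced to a count; the raw values are dropped.
function toEmittedPayload(local: LocalEnumerators): EmittedEnumerators {
  const emitted: EmittedEnumerators = {};
  for (const field of Object.keys(local)) {
    emitted[field] = (local[field] || []).length;
  }
  return emitted;
}

const localEnumerators: LocalEnumerators = {};
recordEnumerator(localEnumerators, "projectCount", "/tmp/webpack-virtual-modules");
recordEnumerator(localEnumerators, "projectCount", "/tmp/webpack-virtual-modules");
const emittedPayload = toEmittedPayload(localEnumerators);
```

In this sketch the path stays in the local structure only, and the emitted object carries nothing but numbers.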
I am not entirely sure, but the information proposed for collection could be enough to identify individual entities. This should be tested and/or checked with someone who has expertise in this area. If the information is sufficient to identify individuals, the GDPR probably applies to users inside the EU. In that case, the telemetry would need to be opt-in, a privacy policy would have to be published and kept up to date, and a few other things would need to be managed. Whether this ends up being less work overall depends on the outcome. 🤔 (Personally I think opt-in telemetry is also a friendlier approach. It is more like: "Hey, if you want to help us, you just need to enable telemetry for us to understand how you work with yarn!" instead of "If you do not want to help us, you can stop it anytime". But I also see that the opt-out approach has its appeal, in that there might be more data to analyse.)
There are a few more options which might be more trustworthy than Google Analytics and worth considering: |
We have a thing called documentation:
|
Describe the user story
As a maintainer, it's sometimes difficult to know what we should prioritize. Are large monorepos the most common situation our users encounter? What packageExtensions are the most common? How many people opted out to the nm linker? Etc.
Because of the lack of analytics, some projects also have trouble taking us seriously. A thread in the Node docker image recently suggested removing Yarn from the Docker image, citing Yarn as a fringe tool. I don't have time to spend collecting the various polls from the surface of the earth.
Describe the solution you'd like
I propose we implement opt-out telemetry.
Homebrew is an OSX package manager with some level of analytics (they actually log more than what I have in mind for us: https://docs.brew.sh/Analytics).
Users would be anonymous. We wouldn't implement "client IDs".
Data would be stored on a third-party service we don't own. In our case, something like Google Analytics would be perfect.
On this point, I've looked into Google Analytics a bit and I'm not sure it's an option. The dashboards are very bare, and it doesn't seem to have good support for arrays, which would be necessary to support plugin and command names, unless we split it across dozens of calls. Perhaps Datadog would be a better fit after all.
Events would be aggregated, and sent weekly. We wouldn't be able to track anything with a lower granularity. As a result, telemetry wouldn't have any effect on CI.
Information about telemetry would be displayed on first install, together with a link explaining it in more details. Documentation would include a new page describing it.
A new `yarn analytics off` command would disable it for all projects on the machine (`on` would re-enable it). Running `yarn analytics show` would print the information that would be sent.
The payload would be sent only during installs (not during `run` or anything else), in parallel with the regular install workflow (so it shouldn't have any significant overhead). Connectivity failures would be ignored and not cause installs to fail.
The information I propose we would track:
`packageExtensions` field (name of the extended package + name of the extra dependency)
Describe the drawbacks of your solution
Telemetry is seen with an understandable amount of caution. Not helping, the project was once associated with Facebook, and it will be important to remind users that we don't have any particular link with it anymore. Using a third-party provider (such as Google Analytics) will also be a good way to guarantee that we don't collect unlisted data (such as IP, etc).
Describe alternatives you've considered
We could do without telemetry. Unfortunately, I think the lack of consideration we get from some entities is caused at least in part by the lack of metrics we can show them (helping us will have an impact on X thousands of developers). Not having those tools requires us to put more work into convincing them, which is exhausting.