-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Faker-ruby has too many generators to maintain #2689
Comments
I think there are two types of generators we have:
My opinion on the generators: We should have as many yaml generators as possible. Infact, I try to review most of the new Yaml based generators PR that are added. I also find that it increases the value of faker-ruby. Regarding third party API, I have no strong opinions and I don't mind removing them since we don't have control over them anyway. Historically, these web API's have gone down which has affected faker-ruby and it's stability.
I very strongly believe we should not remove any yaml based generators. Infact, the Haskell package which I have authored (https://hackage.haskell.org/package/fakedata) depends on faker for it and removing any yaml would be a breaking change there. |
We can also potentially revisit the idea of splitting Faker up into multiple gems, which was explored a bit in #1539 |
I still think we could have a rubric of some sort to evaluate a generator before adding it. Simply because the generator is Yaml based does not seem like a strong enough point to accept any PR that is adding a generator. To prove my point, the more generators there are, the harder it is to know if one exists already, especially because sometimes the names are not so easy to search for. It's easy to end up with duplicated generators, for example.
I agree. I am going to open a new issue with a list of them for to have a sense of how many there are.
Fair. As a heads up, even if we want to remove yaml based generators, the Changing logs clarify the breaking changes before bumping the new version. That said, it seems a good starting point is to list the generators that rely on external services for now. I'm not suggesting we remove all generators. My point with this discussion is to bring awareness to the fact that simply adding any new generators without a rubric, and focusing only on generators, makes it harder to improve the existing ones. |
I believe @vbrazo has shared that with me. It looks like the discussion there didn't go anywhere, and it wasn't clear to me what difference it would make in maintaining faker. I'm thinking of starting with the generators that rely on 3rd parties, and see how it goes from there. What do you think? |
Baring in mind the speed issues reported in #2719. Is there an appetite for an approach where you might have a minimal (or zero) faker instantiation and then bring in generators as required by the end user? This would be similar to the a configuration pattern used in Rails. i.e. require "rails/all" vs. require "rails"
# Pick the frameworks you want:
require "active_model/railtie"
require "active_job/railtie"
require "active_record/railtie"
# require "active_storage/engine"
require "action_controller/railtie"
require "action_mailer/railtie"
# require "action_mailbox/engine"
# require "action_text/engine"
require "action_view/railtie"
# require "action_cable/engine"
require "rails/test_unit/railtie" ..in the Folks who just want to "require & forget" can use Something like... gem "faker", require: "faker/minimal" Then some sort of initialiser. Thoughts? |
hi @andycroll (I watch a few of you talks on youtube :D, thanks). That look like a smart suggestion to configurable this way. From my testing requiring all the files is just a small postion of the time(100ms) to load. Still ig could be speedup, and it creates more constans then whay you might want. Most of the time is spent parsing yaml(1400ms), so the biggest gain would be if we could reduce the amount of yaml being parsed. I don't know if it make any sence to try to eager load and parse all yaml files if the gem is eager loaded. The alternative is to parse yaml files needed on demand and cache in memory. It would make it alot simplier to only load on demand since then we don't need to load all locales. Just want to have it said: I think there is very little to gain by splitting it up into multiple gems. |
This is what I do in fakedata (The haskell version of faker, but uses faker's yaml file directly) - Do it on demand and cache it in memory so that the same yaml file is never read again. I wrote a more detailed report about it here: https://psibi.in/posts/2019-09-23-fakedata-perf.html |
Some YAML-based generators may be sort of generators which someone may find redundant. But some other people may actually find them useful. What I would suggest is to look at each generator and think about their use case. If there is actual practical use case, then the generator can be kept. What comes to generators which are relying on third party API's, I would strongly suggest we deprecate those. Reason is simple: Faker should be as much self-contained as possible. Faker developers (and those using Faker) should have full control of the library. Therefore, I don't view code relying on third party API's, or any other web services for that matter, a good thing. |
I have just begun using Faker, pitching in here on the off-chance that the perspective of someone new to the gem is useful in this discussion. When browsing through the generators available, I was surprised to find extremely niche data like Myst quotes and World Cup stadiums. In my experience, gems that provide some form of content are usually split into a core where the main functionality is found and other gems that provide said content. (Take My blind guess is that the vast majority of people using Faker only use a very small subset of its generators to create usernames, emails, addresses and other common test data. However, as @jremes-foss mentioned, noone can know for certain which generators that look redundant at first may actually be the very reason why someone is using Faker. Looking over the generators I would say there are a few main categories:
I understand how splitting up faker into multiple gems adds complexity and I agree that one gem per namespace as discussed in #1539 is probably overkill. However, let's say categories of less common generators were moved into gems like Additionally, only having a few of the most common generators in the core faker gem would almost certainly improve the performance of using faker (#2719) for most people. And people that need niche cultural content would at least know that they are going to include a lot of additional data to load each test when they include |
There is of course also the option of removing all generators from the core repo and having it depend on a base content gem like A base content gem like this could be useful as a template for anyone who wants to create additional generator gems. |
This issue hasn't seen any update for a long while, but I'd like to add two cents not as a maintainer but as a long-time user of the library in favour of breaking up the gem or at very least namespacing things a bit more frequently. Every time I use Faker, I wind up coming to the list of generators and it's a bit of a search to find what I need. Sometimes, the generators that produce the data that I need don't map directly to name (looking at you in particular, Moreover, I feel like the discoverability of important and highly useful generators is significantly harmed by the presence of so many others without some basic level of categorization. (If there is such a categorized list, I would be in your debt if you'd link it to me.) I understand that taxonomy is difficult, but even a poor first attempt would make it significantly easier to find the right data (and to understand what's available). Thank you all for your ongoing work on the project. :) |
For reference: in #2852 we verified there are no generators relying on external services. |
It looks like this conversation was continued in https://github.com/orgs/faker-ruby/discussions/2877. |
Hi, all. Sorry for the radio silence and thank you all for your input. At first, I was thinking on how we could make any changes in a compatible manner but after talking with @stympy at RailsConf this year, and he gave me the idea of not focusing on that that much. I haven't gotten to this yet, but the idea is to split out the generators between core and other plugins. One challenge is to decide on what generators are the core. Maybe a community survey would help with that? I'm going to focus on this when I am back from vacation. |
Survey to get more context on what generators the community is using, and other questions to help with this work is live here: https://app.polling.com/forms/3103cabf-0fad-4669-b753-a5a4a4598eaa/share (edited: had to move to another platform since typeform only allows 10 submissions in the free plan xD). Please share with your colleagues! Thank you. |
faker-ruby has a ton of generators. Although it can be an easy way for new contributors to help, it adds too much work for maintainers.
Here are some problems that I've encountered for the past few months:
a) generators relying on 3rd party APIs. Those services get shut down, or could even return unexpected values. It requires us to deprecate and remove generators often. We should avoid relying on those services as as much as we can.
b) generators being added without having a measurement of its need. It's hard to know which generators are the most popular and which ones aren't used at all. But some of them are not worth of being maintained because they are so niche.
c) there are so many generators, it's hard to keep track of them, and know about them. I've spent time reviewing PRs only to realize there is one already, and it wasn't straightforward to find it.
d) there are lots of generators, but the translators are not up to date. In my perspective, it makes sense for us to focus on translating the generators that we want to keep.
This is not a popular issue, I'd assume. But to keep faker in a stable manner, I'd propose some changes related to the points above:
The reason why I'm opening this issue is because working on new generators is where most of us spend our time. It's time I'd like to be using for solving critical bugs, and stabilizing the gem with the generators we want to keep, with the translations up to date.
What are your thoughts? How has your experience been? What do you agree, or disagree?
cc @vbrazo @stympy @koic @thdaraujo @psibi @Zeragamba @connorshea @Newman101
The text was updated successfully, but these errors were encountered: