Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Beef from Gemini/MySQL on CentOS #828

Closed
ajs6f opened this issue May 10, 2018 · 56 comments
Closed

Beef from Gemini/MySQL on CentOS #828

ajs6f opened this issue May 10, 2018 · 56 comments

Comments

@ajs6f
Copy link

ajs6f commented May 10, 2018

TASK [Islandora-Devops.crayfish : Install Gemini db schema (mysql)] ************************************************************
Thursday 10 May 2018  14:34:42 -0400 (0:00:01.104)       0:21:13.837 **********
fatal: [default]: FAILED! => {"changed": false, "msg": "ERROR 1709 (HY000) at line 1: Index column size too large. The maximum column size is 767 bytes.\n"}

I'm going to check into whether this is a problem with CentOS7's stock version of MySQL, but I'm trying with PostgreSQL for the moment.

@whikloj
Copy link
Member

whikloj commented May 14, 2018

Yes, it is. I'll have to dig it up but we've run into this because the stock Centos7 is a MariaDB 5.5 whereas Ubuntu is 5.7. Therein lives the issue. That or the huge possible indexes from our Gemini table schema.

@seth-shaw-unlv
Copy link
Contributor

Linking ansible-role-crawyfish issue #2

@ajs6f
Copy link
Author

ajs6f commented May 21, 2018

Ok, so bumping MySQL on the fly should get past this, eh? Trying....

@ajs6f
Copy link
Author

ajs6f commented May 21, 2018

Anyone know if 5.6 solves this, or if I have to go ATW to 5.7?

@seth-shaw-unlv
Copy link
Contributor

Looks like it is still a problem with 5.6. HOWEVER, it looks like you can get around the limitation with some configuration flags.

@ajs6f
Copy link
Author

ajs6f commented May 21, 2018

Nice catch, @seth-shaw-unlv ! I am trying with MariaDB 10.0, and if I can't bang it out, I'll try with 10.1.

@ajs6f
Copy link
Author

ajs6f commented May 21, 2018

I think I just hit this because MariaDB 10.0 uses SysV init on CentOS7, trying their workaround.

@ajs6f
Copy link
Author

ajs6f commented May 21, 2018

Aaaaand, it turns out to actually be easier to just upgrade past 10.1.8, when MariaDB started using systemd.

@ajs6f
Copy link
Author

ajs6f commented May 21, 2018

Okay, using MariaDB 10.2 and with a fix for a problem with the package I was able to get past this. So far. I'm not sure what to contribute back to CLAW, though-- I guess we have to be honest and say you just can't do this on stock CentOS7? Or should we try to mung the Gemini schema when installing to CentOS (which sounds horrible and unmaintainable)? Or will some kind of new hash-based design (as mentioned in the abovelinked ticket) obviate the need to upgrade?

@seth-shaw-unlv
Copy link
Contributor

@ajs6f @DigitLib

It doesn't look like anyone is tackling a Gemini revision to resolve this issue anytime soon and @ajs6f moved forward with a Maria 10.2 install. @DigitLib, would you have any objections to the claw-playbook installing Maria 10.2 by default on CentOS?

I think I have a revised playbook working with Maria 10.2 that I'm putting through a full rebuild right now. I can issue a PR for more community review/discussion once I confirm it works and if you two think this is a good idea.

@ajs6f
Copy link
Author

ajs6f commented May 23, 2018

It sounds reasonable to me. TBC, I may have some trouble selling 10.2 locally (I know, but we're pretty slow-moving here, and often for good reason) but that's my problem, not CLAW's problem.

But I feel like we might be losing sight of an idea (not sure whose, maybe @DiegoPino ?) to move to a hashed form of the URIs (which could let us shrink the column length)? Not saying we should prefer that, just that I don't want to lose that thread because of my site's conservative deployment policies.

@ajs6f
Copy link
Author

ajs6f commented May 23, 2018

TB even more C, definitely let's have the PR ( I will test using CentOS Vagrant).

@DiegoPino
Copy link
Contributor

@seth-shaw-unlv @ajs6f since i did not get any positive feedback (or a postcard or a beer) to do so when I suggested the index alter and hash I though it was not a thing you all wanted. But i can for sure refactor that, i wrote a service that does that (in node.js, rocksdb long story) so can just copy that logic if you all care. Personally, i think refactor is needed even if this all passes, such long indexes are never fast when you reach a few Mill. urls.

@ajs6f
Copy link
Author

ajs6f commented May 23, 2018

It would be nice both to keep the field size shorter (for speed) and also to have a more well-distributed index (because of the hashing).

@DigitLib
Copy link

@seth-shaw-unlv I don't have MariaDB 10 I tried a @DiegoPino scheme https://gist.github.com/DiegoPino/825d33da1a376b72cf0b0c06ff66bed1 and worked OK. I can try to install MariaDB 10...

@seth-shaw-unlv
Copy link
Contributor

Arg! Maria 10.2 is still complaining that the field is too long because varchar is 3 bytes per char and 3*2048 is bigger than the new max of 3072. It looks like we need to update Gemini after all.

@ajs6f
Copy link
Author

ajs6f commented May 23, 2018

Hm, that's weird-- Maria 10.2 worked for me.

@seth-shaw-unlv
Copy link
Contributor

@ajs6f Can you send me your server config to see if I'm missing something?

@ajs6f
Copy link
Author

ajs6f commented May 23, 2018

Sure, but what exactly do you want-- the MariaDB config?

@seth-shaw-unlv
Copy link
Contributor

@ajs6f yes, please. I should have been more clear.

@ajs6f
Copy link
Author

ajs6f commented May 23, 2018

@seth-shaw-unlv No prob, here ya go.

@seth-shaw-unlv
Copy link
Contributor

That looks consistent with my test. Hmmm, perhaps there is something deeper I'm missing.

@ajs6f Can you log into your db and run show variables; and desc gemini.Gemini; for me?

@ajs6f
Copy link
Author

ajs6f commented May 23, 2018

Oh, WOW. Something is really off here:

MariaDB [(none)]> desc gemini.Gemini;
ERROR 1146 (42S02): Table 'gemini.Gemini' doesn't exist
MariaDB [(none)]> use gemini;
Database changed
MariaDB [gemini]> desc gemini.Gemini;
ERROR 1146 (42S02): Table 'gemini.Gemini' doesn't exist
MariaDB [gemini]> show tables;
Empty set (0.00 sec)

And yet the -playbook install finished successfully and CLAW appears to be running fine.

@ajs6f
Copy link
Author

ajs6f commented May 23, 2018

Yep, further checking reveals that Gemini is not being used, Fedora is not receiving updates (except API-X's setup) long story short, that -playbook finished without error but without correctly installing. :(

@ajs6f
Copy link
Author

ajs6f commented May 23, 2018

What's more, from the Drupal side, everything appears to be working well. There's no indication that the various CLAW components are not installed/connecting/correct.

@DiegoPino
Copy link
Contributor

DiegoPino commented May 24, 2018

@ajs6f @whikloj @seth-shaw-unlv @DigitLib I started refactoring Gemini last night. Will make a pull tonight at some point. I went for total abstraction via doctrine-ORM and should work out of the box for you all. Also, if you are Ok with this, I will move table creation to Gemini, so no more SQL script that runs independently first/DevOps tasks, Gemini will provide its own method for doing so probably will follow with an ansible related pull too (next week)

Lastly, for your own thoughts, since the changes are based on my own Nodejs based mapping/routing service, different thing but similar need, I could leave "space"/plan for the following: A Source Server/Destination Server field. Means, same Drupal could write to multiple Fedora's (round robin?, balance, random? via rules?), Same Fedora could be synced to another Drupal if needed (i speak only about the mapping, not the logic really, CLAW is a different beast than what I have). That means, Gemini in the future would/Could become a router, not only a mapper. Probably worth another issue in the future or out of scope.

Thoughts? hopes?

@ajs6f
Copy link
Author

ajs6f commented May 24, 2018

@DiegoPino Thoughts:

  1. I'm agnostic to ORM or other method, but putting the schema into Gemini itself sounds excellent, and I would be grateful for it. At this point I am pivoting to avoid Ansible entirely for SI (for us it has produced enormous overhead with no accompanying value), so keeping component initialization with each component sounds great.
  2. Routing vs. simple identifier mapping: At least worth another issue, at which time I will inflict upon everyone my opinion that such a feature would be very unRESTful. The stronger move here is to distribute the repository (a la @acoburn's Trellis), so that Gemini can fire requests at a single address. Let middleware do middleware and let backend do backend, and keep the contract in between as simple and narrow as can be.

@DigitLib
Copy link

@DiegoPino Goood idea to move SQL scripts to Gemini! Waiting for testing in claw-playbook...

@DiegoPino
Copy link
Contributor

@ajs6f @seth-shaw-unlv, so, i have this stuff working. I see folks are fighting with composer and stuff so really don't want to drive them crazy right now. When it is right to pull this? Seems like there is some close due date i don't want to interfere with. Anyway, code works.

@seth-shaw-unlv
Copy link
Contributor

@DiegoPino why don't you go ahead and issue the PR. Between us and perhaps another committer that isn't fighting with composer we can take care of it, I think. If nothing else it gives @ajs6f and @DigitLib something to target their centos builds against.

@DiegoPino
Copy link
Contributor

Ok, cool! Will deal with this once day job is over. Thanks!

@DigitLib
Copy link

@seth-shaw-unlv @DiegoPino I agree!! Go ahead...

@ajs6f
Copy link
Author

ajs6f commented May 29, 2018

@seth-shaw-unlv @DiegoPino Go for it!

@DiegoPino
Copy link
Contributor

DiegoPino commented May 29, 2018

@ajs6f @seth-shaw-unlv @DigitLib @Islandora-CLAW/committers I don't want to go into architecture weirdness but while making the formal pull (cleaning my mess) I just found out that 1) http://symfony.com/blog/the-end-of-silex 😞 and that makes me nervous, maybe I should refactor this totally into Symfony 4 and do a single Gemini repo with no dependencies to the rest of crayfish ecosystem at all. I mean, move one piece at the time and do it full symfony. I feel Gemini here is the key to the other services and maybe should be "upgraded independently". Would love to know your thoughts. It seems also more relevant to the future of this project that just patch it.

if you decided to stay on Silex, to be less rogue (i'm using an independent silex service provider for ORM right now), I could need to A)
update crayfish-commons https://github.com/Islandora-CLAW/Crayfish-Commons/blob/master/src/Provider/IslandoraServiceProvider.php
or
B) simply not use dbal at all in Geminy and make direct use of the ORM service provider without requiring crayfish commons there. Which leads to a question to you folks about why the dependency here and if the other tiny crayfishes need ORM? or want to stay with dbal?

In any case, I can make a few pulls with different options, nothing here is life changing. But please let me know your thoughts. Also, if i'm speaking non-sense too.

@seth-shaw-unlv
Copy link
Contributor

My usual inclination is to get something that resolves the issue in now and then have the conversation about the larger issue afterward; although, I hesitate to isolate Gemini as this is a larger architecture issue.

That stated, I don't really feel qualified to have an opinion on this one since I'm not familiar with Silex in the first place. Also, while considering this issue I noticed that @whikloj already has a PR on Crayfish commons to pull Gemini into Crayfish Commons which I think reinforces your statement that "Gemini ... is the key to the other services" so I'd be much more interested in his opinion on the topic, as well as @dannylamb since Travis has been appeased for the time being.

@whikloj
Copy link
Member

whikloj commented May 29, 2018

Yeah I needed to get at Gemini for the Recast Service, but that is probably out of date now. I can see that Gemini as the linkage between Drupal and Fedora could be needed from a lot of potential services. My concern with Symfony 4 is that we immediate place a requirement of PHP 7.1.3 or greater, I'm not sure if that is a deal breaker for some.

@seth-shaw-unlv
Copy link
Contributor

It seems all roads lead to killing PHP 5. I think this is the third time in less than 6 months where one of the possible solutions to an issue involved not using PHP 5.

@DiegoPino
Copy link
Contributor

DiegoPino commented May 29, 2018 via email

@dannylamb
Copy link
Contributor

I definitely am in favor of going with straight symfony for the long term. If given a chance to do it all over again, I think symfony probably would've been the better move from the start (oh hindsight).

I also am in favor of killing 5.6. It's just going to be build issue after build issue as time goes on.

But for now, can we just hash? This issue can be resolved with a lot less work. Switching to symfony seems more like something we'd do in a sprint.

@DiegoPino
Copy link
Contributor

@dannylamb ok, np, thanks for the feedback. Symfony 4 implementation was really straight forward and I was already doing the backport to symfony 3.2 (tiny changes for 5.6 folks), but I can just hash using the silex version, of course, I would have to drop ORM support and auto deployment of the schema to make this just a patch and not full-blown rewrite. Let me finish this stuff today in during the morning and I will share the Symfony rewritten piece, pull a quick fix for silex stuff, and other contributing related actions before the end of the day.

@dannylamb
Copy link
Contributor

I mean, if you already have it done in Symfony...

I'll take what you give me. Just don't want to send you down a rabbit hole if you don't have to.

@DiegoPino
Copy link
Contributor

Just heads up (needs testing, having second thoughts about dateCreated, fields). Silex version (for those 5.6 folks) Islandora/Crayfish#46

@seth-shaw-unlv
Copy link
Contributor

I just merged the PR but I neglected to test Postgres first. 👎
The current version's migrate command only works for MySQL.

So, either we need to do a short-term fix and update the claw-playbook to use a valid postgres schema (I'm testing this option now) OR generate a migration for postgres (can you do that, @DiegoPino?) and update claw-playbook to issue the migration:migrate command.

@DiegoPino
Copy link
Contributor

DiegoPino commented May 31, 2018

@seth-shaw-unlvI can issue a new pull for Postgres, can you manually test the SQL i wrote there to see if it works on Postgres? Since DBAL is pretty engine dependent, means yeah, you can swap engines in your config, but the moment you start writing SQL queries manually like CRAYFISH did you could end in an issue.

What about this??

https://symfony.com/doc/current/doctrine/multiple_entity_managers.html (see the multiple ones)

This will all be solved once we move to Symfony 4. For now, I can patch it up! (tomorrow sadly, kinda late with my OR2018 slides and 7.x release stuffd)

@DiegoPino
Copy link
Contributor

@seth-shaw-unlv or do you prefer a quicker fix and just do?

if  ('mysql' == $this->connection->getDatabasePlatform()->getName())  {
 $this->DoSQLthatone;
}
elseif 
('postgress' == $this->connection->getDatabasePlatform()->getName())  {
 $this->DoSQLother;
}

@seth-shaw-unlv
Copy link
Contributor

I think I was overreacting with urgency. claw-playbook is tagged against Crayfish 0.0.1, so the recent merge doesn't impact anyone CentOS folks who are building in their own work-arounds at the moment anyway.

So, rather than a quick fix I think we should focus on getting #722 resolved (making sure it works for both mysql and postgres).

@DiegoPino
Copy link
Contributor

See #722 (comment)

@seth-shaw-unlv
Copy link
Contributor

Once islandora-deprecated/ansible-role-crayfish/pull/7 gets merged we can cut a new version of the role (0.0.2) and update the claw-playbook. That should resolve this issue.

@seth-shaw-unlv
Copy link
Contributor

Oh, we also need a new version of Islandora-Devops/ansible-role-karaf, too.

@seth-shaw-unlv
Copy link
Contributor

That (Islandora-Devops/islandora-playbook/pull/66) should do it.

@ajs6f, anything else before we close this issue?

@ajs6f
Copy link
Author

ajs6f commented Jun 21, 2018

@seth-shaw-unlv I don't know-- I'm waiting on the full content-modelling epic to land (and for a go-ahead on that account from @dannylamb) to do any more testing. In that light, I guess I'm cool to close this?

@dannylamb
Copy link
Contributor

dannylamb commented Jun 21, 2018

@ajs6f You're probably good to close this ticket since the season-long story arc of content modeling has landed, however you'll only be truly sassified when Islandora-Devops/islandora-playbook#68 lands.

Should be as easy as ISLANDORA_DISTRO=centos/7 vagrant up then.

@ajs6f
Copy link
Author

ajs6f commented Jun 21, 2018

  1. "sassified" is now officially the word of the week. Thank you @dannylamb, I look forward to being truly sassified.
  2. Sadly, no-- remember my destructive rampage through IRC trying to install the content-modelling stuff over top of an existing install? That hasn't changed. It would still take a long, long time to blank that machine, and I'm still going to need to try doing it there (for the specific purpose of demoing CLAW locally). That, however, is a story for another time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants