Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Display keywords on book page (as per issue #210) #212

Closed

Conversation

peterjdann
Copy link
Contributor

The immediate purpose of this proposed change is to allow users of the Librivox catalog to see what keywords have been applied to an audiobook, to see how many audiobooks in all have that keyword, and, where one or more other audiobooks have that same keyword, to list them by clicking the keyword concerned.

In the longer term, if/when we can improve the functioning of the Advanced Search capability, a better understanding of what keywords are commonly applied to our audiobooks (facilitated by this change) should enable users to conduct more sophisticates searches (for example, to find audiobooks that have both "love" and "French Revolution" as keywords, to take a purely hypothetical example).

Included in this change is code which I propose should be run as a cron job (to update keyword usage statistics) and a call to a quick-and-dirty update function that will allow the keyword usage stats for a newly catalogued project to show in whatever interval may exist before the next cron job will run,.

@peterjdann
Copy link
Contributor Author

I am an almost complete newbie at using GitHub fork-and-clone, and must admit I have not the following idea what, if anything, I should be doing about the "All checks have failed" message I'm currently seeing for this pull request. Please advise!

@redrun45
Copy link
Collaborator

redrun45 commented Apr 1, 2024

The problem

I believe the problem is: while your new code is in the pull request, the modifications you made to the database are not.

Looking at the log ('Details' link on our "Deploy localdev (PR)" check task), it seems that when GitHub makes a test copy of the web site using your new code, that test copy returns a '500' internal server error message. I pulled up a copy myself, and here's what I see when I access a given page:

An uncaught Exception was encountered
Type: mysqli_sql_exception
Message: Unknown column 'keywords.instances' in 'field list'
Filename: /librivox/www/librivox.org/catalog/system/database/drivers/mysqli/mysqli_driver.php
Line Number: 307

As for what to do about it, this is new to me too. I've never had to update the database schema in tandem with a code change.

What I'd suggest, is to first take a snapshot of your current state, then re-run Ansible to reset everything (including the database) to the shared version we're working off of. Then see what it takes to get things working from there.

Housekeeping

You'll notice that our 'librivox/master' version of the code has been updated since you started, so you can go to your "fork" page and you'll have an option to merge those changes into your fork. The line that should have this button for you, currently shows me:

This branch is 14 commits ahead of, 6 commits behind LibriVox/librivox-catalog:master.

You'll want to merge those 6 commits into your fork, and then go back to your dev machine to git pull the resulting, updated branch.

Next steps (Updated per Artom's post)

Now, with the database reset and your code branch up to date, you'll be seeing the error code that I'm seeing. From there, you can create a database migration file, with all the steps necessary to get the database working with your code.

notartom added a commit to notartom/librivox-catalog that referenced this pull request Apr 1, 2024
We've already done a few schema changes "out of band", but with
LibriVox#212 we now have a
situation where a schema change is needed in a PR for the code to
work, and since we have no way to supply that, our CI (continuout
integration, not Code Ingiter), basic such as it is, fails because the
schema that the code relies on is not deployed in the CI VM.

In addition, tracking schema changes inline with migrations is just
good standard practice.

This patch starts making use of Code Igniter's DB migrations framework
(https://codeigniter.com/userguide3/libraries/migration.html) to allow
PRs to include schema changes. There will be a librivox-ansible
follow-up patch to run the DB migrate script on every deployment.
@notartom
Copy link
Member

notartom commented Apr 1, 2024

#213 is the correct "industry standard" way to do these things. If that passes the checks, I'll merge it, then propose a patch to librivox-ansible to run the DB migrate command on every deployment, and from then on, PRs will be able to add their schema changes inline.

notartom added a commit that referenced this pull request Apr 1, 2024
We've already done a few schema changes "out of band", but with
#212 we now have a
situation where a schema change is needed in a PR for the code to
work, and since we have no way to supply that, our CI (continuout
integration, not Code Ingiter), basic such as it is, fails because the
schema that the code relies on is not deployed in the CI VM.

In addition, tracking schema changes inline with migrations is just
good standard practice.

This patch starts making use of Code Igniter's DB migrations framework
(https://codeigniter.com/userguide3/libraries/migration.html) to allow
PRs to include schema changes. There will be a librivox-ansible
follow-up patch to run the DB migrate script on every deployment.
@notartom
Copy link
Member

notartom commented Apr 1, 2024

OK, @peterjdann you should be good to create our first migration under application/migrations, see https://codeigniter.com/userguide3/libraries/migration.html#create-a-migration for an example. That should not only make our CI pass, but also create a record in-code in-tree of the schema changes required for your PR.

…nces in catalog and keyword ID for each keyword associated with project, sorted with most used keywords first
…hyperlink only when there are two or more books in the catalog with this keyword). After each keyword, display in brackets the number of audiobooks in the catalog that have this as one of their keywords.
…oid any impression that what will be entered or appear in this field will necessarily be a single word
…' to be treated as two separate keyterms. Changed to 'Keyterms' the field label that was previously 'Keywords' (as 'Keywords' tends to imply, wrongly, that only single-word search terms can be entered here).
…irectly from new column 'instances' that I've added to table 'keywords', rather than calculating this number dynamically at run time. This change should reduce (by how much??) server load each time user displays a catalog page. For now, I have run the SQL that updates keywords.instances directly against MariaDB. It's possible in future we'll figure out a way of running that SQL as part of the current cron job that updates some project metadata.
… update of the keyword usage statistics for the newly added item (preventing a situation arising where a newly catalogued item might appear with its keyword statistics missing until such time as the next keyword statistics updating cron job may run).
…to the wishes of those who have offered comments on this mooted code change proposal.
…t it was previously pending a more thorough investigation of the issues involved. Don't want to solve one bug by introducing another unintentionally.
@peterjdann peterjdann force-pushed the display_keyterms_on_book_page branch from b94f465 to 22e2a9f Compare April 2, 2024 03:20
@peterjdann
Copy link
Contributor Author

peterjdann commented Apr 2, 2024

Since we last corresponded, I have rebased the development branch in my local environment so that it now includes, I believe, all the latest changes present in the master at this site. This involved resolving a couple of conflicts flagged in the rebase operation. I have also had a crack at creating the kind of configuration settings and code which I believe are required automatically to migrate a database schema change in this environment, working off the documentation at https://codeigniter.com/userguide3/libraries/migration.html#create-a-migration and https://codeigniter.com/userguide3/database/forge.html#adding-a-column-to-a-table.

@notartom
Copy link
Member

notartom commented Apr 2, 2024

Playing with this PR locally, I kept getting this error:

The migration class "Migration_Add_keywords_instances_field" is missing an "up" method.

Looks like there's a bug with CodeIgniter itself: https://stackoverflow.com/questions/65405139/no-working-up-method-in-migrate-codeigniter-3-with-php8

In system/libraries/Migration.php, the bit where it checks for the presence of the up() method looks like this:

elseif ( ! is_callable(array($class, $method)))
{
    $this->_error_string = sprintf($this->lang->line('migration_missing_'.$method.'_method'), $class);
    return FALSE;
}

However, per https://www.php.net/manual/en/function.is-callable.php, is_callable()'s first argument is an object, not a string name of a class, so we should be doing something like:

$instance = new $class();
<snip>
elseif ( ! is_callable(array($instance, $method)))
{
    $this->_error_string = sprintf($this->lang->line('migration_missing_'.$method.'_method'), $class);
    return FALSE;
}

@peterjdann
Copy link
Contributor Author

peterjdann commented Apr 2, 2024 via email

@redrun45
Copy link
Collaborator

redrun45 commented Apr 4, 2024

The silence is not due to a lack of enthusiasm on my part - I'm afraid we've got outages in my neck of the woods. I'll have a bit of a backlog when my PC is back online. 😅

@peterjdann
Copy link
Contributor Author

peterjdann commented Apr 4, 2024 via email

@notartom
Copy link
Member

notartom commented Apr 4, 2024

This is a really big PR (naturally), so reviewing it as a whole is difficult. In cases like this, I normally fall back on reviewing one commit at a time (really, at my day job, one commit at a time is the only way we review code). The assumption is, every commit can stand on its own, adding small bits of the overall feature; and every commit builds on the previous one, until the last one in the PR ties it all together. Splitting a feature into such commits is an art/science in and of itself.

I don't think it's fair to arbitrarily hold volunteer LibriVox contributors to the same standards as my day job colleagues, but I do think that it's valuable to promote (insist on?) quality maintainable and readable code, with a clean git history.

I think I can guess how you got to the commit history in this PR: you started with a POC of your feature, then kept adding/changing things until you beat it into shape, and ended up with the final result you had initially designed. The problem that this introduces is that subsequent commits end up undoing much of what previous commits had done. That makes it hard to review, and muddies the git log history.

Do you think you'd be able to take the final result, and starting from this end state, split it into commits that build the feature from the ground up? The usual way of doing these things is to start with the database/model changes at the bottom, then the controllers, then the views - at least, that's what I would do, keeping in mind I'm not an expert in web stuff.

Ideally, if you could also add unit tests where appropriate (newly added functions that stand on their own, with clear input -> output correctness that can be verified by tests), I would be over the moon! :)

@peterjdann
Copy link
Contributor Author

peterjdann commented Apr 4, 2024 via email

@notartom
Copy link
Member

notartom commented Apr 5, 2024

That's pretty spot on - git has various commands that can help, so let me know if you need pointers.

@peterjdann
Copy link
Contributor Author

I am concerned that I am pursuing an ever-receding mirage here. No sooner than I have got past one set of difficulties than I find myself faced with still further challenges, not one of which I had the faintest inkling of at the outset.

I should say, by way of background, that I worked for many years as the support manager of a company that developed and sold an SAP change management solution to many major companies around the world, including, by the time I finished up, Shell, CocaCola, Apple, IBM and pharmaceutical GSK. We tested every release of our product against a defined set of agreed high-criticality and moderate-criticality capabilities, but we never performed a single 'unit test', and only ever tested the complete product, even though much of the code underlying an SAP system can readily be envisaged as a series of 'units' with defined inputs and outputs.

I am now being asked to create a set of tests for models, controllers and views in an environment where, as far as I can see, with only one possible exception, only the most trivial set of model tests currently exists. (If I'm wrong, I apologise, and do please correct me, but that appears to me to be the situation at the moment.) I am also very concerned that it may prove difficult to provide meaningful, non-trivial tests of the functions I have developed without having a known, frozen database to work off — and it's not clear to me at all that the anonymised database I am currently working off is, in any sense, 'frozen'. For all I know, the keywords data in different instances of this development database may be in a more or less constant state of flux, and may be quite different by the time these tests come to be run on a machine other than my own. In short, I wonder what the possible lifespan of any tests I do develop may be, and if they will even still be valid by the time I get all finally created (which, incidentally, is looking like the middle of 2025 at the moment, from where I sit).

I can see, from the test documentation I have managed to peruse so far, that there are all sorts of way of mocking up data sources for 'unit testing' purposes, but the approaches I have read about (eg, writing pretend DB data into an array), frankly strike me as idiotic in the context in which we're working here, where what really counts (no pun intended) is whether the results of SQL queries look realistic.

If you do insist on my developing a full set of unit tests developed to go with a new, git-sanitised version of this pull request, can you please point me to example code for model, controller and view unit tests developed under the framework(s) we're using here that work with realistic-looking data? I don't see any under either PHPUnit documentation or ci-phpunit-test documentation that strike me as fitting that bill.

I am willing to give this a go if I can find some reasonable guidance, but right now nothing that I'm looking at by way of examples strikes me as any more appropriate a way of testing the change I'm proposing than simply installing it, playing with it for twenty or so minutes, then doing your darnedest to try to break it.

@redrun45
Copy link
Collaborator

redrun45 commented Apr 6, 2024

@peterjdann,
Thanks for your efforts so far. Sounds like this has gotten to feel like a rabbit hole, but I think a big chunk of that is a mismatch of expectations.

We do want to look at your final code in a more modular fashion, rather than either tracking the history of decisions across the versions, or looking at all of the changes at once.

We do not expect, or even ask, that you mock up controllers and models and a fake database for testing everything. It would be pretty cool if we had a complete set of models for unit tests, but we only recently started using any! You're not on the hook to create the whole set of mock models from scratch. Far from it.

The actual request here is much more pragmatic:

Ideally, if you could also add unit tests where appropriate (newly added functions that stand on their own, with clear input -> output correctness that can be verified by tests), I would be over the moon! :)

If you happen to add some code that is purely "functional" in the sense that its output naturally depends entirely on its input, not on any database queries, then it would be nice if you added a few test-cases for it. Stuff like this "helper" program, which could be tested without too much fuss:
https://github.com/LibriVox/librivox-catalog/pull/189/files
https://github.com/LibriVox/librivox-catalog/pull/192/files

Absolutely no database or model mockups required, requested or expected!

@notartom
Copy link
Member

notartom commented Apr 6, 2024

Yep, very much what @redrun45 said. Looking at the code in the PR again, honestly nothing jumps out as being super easy to write a unit test for. Anything that does SQL stuff is out of the question. Some of the other get_<foo>-type methods could be candidates, but it looks like mocking/stubbing a bunch of model stuff would be necessary for those.

Really, the unit tests are a bonus. I'd like to see us move in that direction, but given how much of a mess the existing code base is, it's hard not to feel like a hypocrite when asking for unit tests for new stuff.

The more important part is a cleaner sequence of commits. I just don't want to have a commit that adds method X, only for the subsequent commit to remove it and replace it with Y and Z.

@peterjdann
Copy link
Contributor Author

Thank you both. I've clearly succumbed to a bad case of not reading a suggestion that's actually laid out with perfect clarity, and for that I must apologise. And I can understand completely that what you're asking for makes sense.
As it happens, right now I've got some other work on my hands of an urgent nature that is probably going to take me around 5 weeks to complete. Once I've got that done, I'll be in a good position to give this my undivided attention.

@redrun45
Copy link
Collaborator

This one continues as a new PR, #225. 🎉

@redrun45 redrun45 closed this May 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants