Display keywords on book page (as per issue #210) #212

peterjdann · 2024-04-01T06:28:57Z

The immediate purpose of this proposed change is to allow users of the Librivox catalog to see what keywords have been applied to an audiobook, to see how many audiobooks in all have that keyword, and, where one or more other audiobooks have that same keyword, to list them by clicking the keyword concerned.

In the longer term, if/when we can improve the functioning of the Advanced Search capability, a better understanding of what keywords are commonly applied to our audiobooks (facilitated by this change) should enable users to conduct more sophisticates searches (for example, to find audiobooks that have both "love" and "French Revolution" as keywords, to take a purely hypothetical example).

Included in this change is code which I propose should be run as a cron job (to update keyword usage statistics) and a call to a quick-and-dirty update function that will allow the keyword usage stats for a newly catalogued project to show in whatever interval may exist before the next cron job will run,.

peterjdann · 2024-04-01T06:53:23Z

I am an almost complete newbie at using GitHub fork-and-clone, and must admit I have not the following idea what, if anything, I should be doing about the "All checks have failed" message I'm currently seeing for this pull request. Please advise!

redrun45 · 2024-04-01T12:00:56Z

The problem

I believe the problem is: while your new code is in the pull request, the modifications you made to the database are not.

Looking at the log ('Details' link on our "Deploy localdev (PR)" check task), it seems that when GitHub makes a test copy of the web site using your new code, that test copy returns a '500' internal server error message. I pulled up a copy myself, and here's what I see when I access a given page:

An uncaught Exception was encountered
Type: mysqli_sql_exception
Message: Unknown column 'keywords.instances' in 'field list'
Filename: /librivox/www/librivox.org/catalog/system/database/drivers/mysqli/mysqli_driver.php
Line Number: 307

As for what to do about it, this is new to me too. I've never had to update the database schema in tandem with a code change.

What I'd suggest, is to first take a snapshot of your current state, then re-run Ansible to reset everything (including the database) to the shared version we're working off of. Then see what it takes to get things working from there.

Housekeeping

You'll notice that our 'librivox/master' version of the code has been updated since you started, so you can go to your "fork" page and you'll have an option to merge those changes into your fork. The line that should have this button for you, currently shows me:

This branch is 14 commits ahead of, 6 commits behind LibriVox/librivox-catalog:master.

You'll want to merge those 6 commits into your fork, and then go back to your dev machine to git pull the resulting, updated branch.

Next steps (Updated per Artom's post)

Now, with the database reset and your code branch up to date, you'll be seeing the error code that I'm seeing. From there, you can create a database migration file, with all the steps necessary to get the database working with your code.

We've already done a few schema changes "out of band", but with LibriVox#212 we now have a situation where a schema change is needed in a PR for the code to work, and since we have no way to supply that, our CI (continuout integration, not Code Ingiter), basic such as it is, fails because the schema that the code relies on is not deployed in the CI VM. In addition, tracking schema changes inline with migrations is just good standard practice. This patch starts making use of Code Igniter's DB migrations framework (https://codeigniter.com/userguide3/libraries/migration.html) to allow PRs to include schema changes. There will be a librivox-ansible follow-up patch to run the DB migrate script on every deployment.

notartom · 2024-04-01T14:59:37Z

#213 is the correct "industry standard" way to do these things. If that passes the checks, I'll merge it, then propose a patch to librivox-ansible to run the DB migrate command on every deployment, and from then on, PRs will be able to add their schema changes inline.

We've already done a few schema changes "out of band", but with #212 we now have a situation where a schema change is needed in a PR for the code to work, and since we have no way to supply that, our CI (continuout integration, not Code Ingiter), basic such as it is, fails because the schema that the code relies on is not deployed in the CI VM. In addition, tracking schema changes inline with migrations is just good standard practice. This patch starts making use of Code Igniter's DB migrations framework (https://codeigniter.com/userguide3/libraries/migration.html) to allow PRs to include schema changes. There will be a librivox-ansible follow-up patch to run the DB migrate script on every deployment.

notartom · 2024-04-01T15:17:36Z

OK, @peterjdann you should be good to create our first migration under application/migrations, see https://codeigniter.com/userguide3/libraries/migration.html#create-a-migration for an example. That should not only make our CI pass, but also create a record in-code in-tree of the schema changes required for your PR.

…rom the DB

…nces in catalog and keyword ID for each keyword associated with project, sorted with most used keywords first

…hyperlink only when there are two or more books in the catalog with this keyword). After each keyword, display in brackets the number of audiobooks in the catalog that have this as one of their keywords.

…urred in master in past few weeks.

…oid any impression that what will be entered or appear in this field will necessarily be a single word

…' to be treated as two separate keyterms. Changed to 'Keyterms' the field label that was previously 'Keywords' (as 'Keywords' tends to imply, wrongly, that only single-word search terms can be entered here).

…rtial I had previously failed to notice.

…irectly from new column 'instances' that I've added to table 'keywords', rather than calculating this number dynamically at run time. This change should reduce (by how much??) server load each time user displays a catalog page. For now, I have run the SQL that updates keywords.instances directly against MariaDB. It's possible in future we'll figure out a way of running that SQL as part of the current cron job that updates some project metadata.

…tps://librivox.org/cron/keywords_stats/update_keywords_stats

… update of the keyword usage statistics for the newly added item (preventing a situation arising where a newly catalogued item might appear with its keyword statistics missing until such time as the next keyword statistics updating cron job may run).

…to the wishes of those who have offered comments on this mooted code change proposal.

…t it was previously pending a more thorough investigation of the issues involved. Don't want to solve one bug by introducing another unintentionally.

…d in 'keywords' table

peterjdann · 2024-04-02T03:38:55Z

Since we last corresponded, I have rebased the development branch in my local environment so that it now includes, I believe, all the latest changes present in the master at this site. This involved resolving a couple of conflicts flagged in the rebase operation. I have also had a crack at creating the kind of configuration settings and code which I believe are required automatically to migrate a database schema change in this environment, working off the documentation at https://codeigniter.com/userguide3/libraries/migration.html#create-a-migration and https://codeigniter.com/userguide3/database/forge.html#adding-a-column-to-a-table.

notartom · 2024-04-02T04:36:13Z

Playing with this PR locally, I kept getting this error:

The migration class "Migration_Add_keywords_instances_field" is missing an "up" method.

Looks like there's a bug with CodeIgniter itself: https://stackoverflow.com/questions/65405139/no-working-up-method-in-migrate-codeigniter-3-with-php8

In system/libraries/Migration.php, the bit where it checks for the presence of the up() method looks like this:

elseif ( ! is_callable(array($class, $method)))
{
    $this->_error_string = sprintf($this->lang->line('migration_missing_'.$method.'_method'), $class);
    return FALSE;
}

However, per https://www.php.net/manual/en/function.is-callable.php, is_callable()'s first argument is an object, not a string name of a class, so we should be doing something like:

$instance = new $class();
<snip>
elseif ( ! is_callable(array($instance, $method)))
{
    $this->_error_string = sprintf($this->lang->line('migration_missing_'.$method.'_method'), $class);
    return FALSE;
}

peterjdann · 2024-04-02T04:51:01Z

I had just found that Stackoverflow post when you posted this and was looking at the code concerned. I am not confident I understand where or how to implement the line you describe as “$instance - new $class(); Would you be willing to put that through as a change in the master which I can then rebase from? Edit: All good. It's pretty obvious how to patch this. I've done it now as you suggested. Peter

…

On 2 Apr 2024, at 3:36 pm, Artom Lifshitz ***@***.***> wrote: Playing with this PR locally, I kept getting this error: The migration class "Migration_Add_keywords_instances_field" is missing an "up" method. Looks like there's a bug with CodeIgniter itself: https://stackoverflow.com/questions/65405139/no-working-up-method-in-migrate-codeigniter-3-with-php8 In system/libraries/Migration.php, the bit where it checks for the presence of the up() method looks like this: elseif ( ! is_callable(array($class, $method))) { $this->_error_string = sprintf($this->lang->line('migration_missing_'.$method.'_method'), $class); return FALSE; } However, per https://www.php.net/manual/en/function.is-callable.php, is_callable()'s first argument is an object, not a string name of a class, so we should be doing something like: $instance = new $class(); <snip> elseif ( ! is_callable(array($instance, $method))) { $this->_error_string = sprintf($this->lang->line('migration_missing_'.$method.'_method'), $class); return FALSE; } — Reply to this email directly, view it on GitHub <#212 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/APUJ53HYU7AXXSNEMRZ72GDY3IYVHAVCNFSM6AAAAABFREOGHOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAMZRGA3DSMJVHE>. You are receiving this because you were mentioned.

redrun45 · 2024-04-04T00:22:03Z

The silence is not due to a lack of enthusiasm on my part - I'm afraid we've got outages in my neck of the woods. I'll have a bit of a backlog when my PC is back online. 😅

peterjdann · 2024-04-04T00:28:31Z

Not a problem in the least. I’m not expecting any instantaneous response at all, and am only sorry you’re having to wade through such difficulties. All in good time.

…

On 4 Apr 2024, at 11:22 AM, redrun45 ***@***.***> wrote: The silence is not due to a lack of enthusiasm on my part - I'm afraid we've got outages in my neck of the woods. I'll have a bit of a backlog when my PC is back online. 😅 — Reply to this email directly, view it on GitHub <#212 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/APUJ53B5YH6C6GN6ZTEBNCLY3SMMDAVCNFSM6AAAAABFREOGHOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAMZVHA2DKNZRGQ>. You are receiving this because you were mentioned.

notartom · 2024-04-04T01:03:28Z

This is a really big PR (naturally), so reviewing it as a whole is difficult. In cases like this, I normally fall back on reviewing one commit at a time (really, at my day job, one commit at a time is the only way we review code). The assumption is, every commit can stand on its own, adding small bits of the overall feature; and every commit builds on the previous one, until the last one in the PR ties it all together. Splitting a feature into such commits is an art/science in and of itself.

I don't think it's fair to arbitrarily hold volunteer LibriVox contributors to the same standards as my day job colleagues, but I do think that it's valuable to promote (insist on?) quality maintainable and readable code, with a clean git history.

I think I can guess how you got to the commit history in this PR: you started with a POC of your feature, then kept adding/changing things until you beat it into shape, and ended up with the final result you had initially designed. The problem that this introduces is that subsequent commits end up undoing much of what previous commits had done. That makes it hard to review, and muddies the git log history.

Do you think you'd be able to take the final result, and starting from this end state, split it into commits that build the feature from the ground up? The usual way of doing these things is to start with the database/model changes at the bottom, then the controllers, then the views - at least, that's what I would do, keeping in mind I'm not an expert in web stuff.

Ideally, if you could also add unit tests where appropriate (newly added functions that stand on their own, with clear input -> output correctness that can be verified by tests), I would be over the moon! :)

peterjdann · 2024-04-04T01:57:40Z

Artom, you’re quite right in your assumptions about my method of development here. I imagine the easiest (ha!) way for me to do what you are suggesting is: - Create a new branch off master - Review everything I changed as part of this PR, and group these changes into discreet, testable chunks, presumably following the order you have suggested - DB/model changes, controller changes, then view changes - Make myself acquainted with whatever is going on in application/tests and create tests capabie of verifying the validity of each of the small ‘chunks’ of changes I am about to make, wherever possible - Implement and commit these new changes and their accompanying tests in a stepwise fashion, in the order you suggest Is that correct? I’d much rather you be over the moon than under it, so I’m certainly willing to have a go at what you’re suggesting. I do want to make sure I understand what you’re suggesting, though.

…

On 4 Apr 2024, at 12:03 PM, Artom Lifshitz ***@***.***> wrote: This is a really big PR (naturally), so reviewing it as a whole is difficult. In cases like this, I normally fall back on reviewing one commit at a time (really, at my day job, one commit at a time is the only way we review code). The assumption is, every commit can stand on its own, adding small bits of the overall feature; and every commit builds on the previous one, until the last one in the PR ties it all together. Splitting a feature into such commits is an art/science in and of itself. I don't think it's fair to arbitrarily hold volunteer LibriVox contributors to the same standards as my day job colleagues, but I do think that it's valuable to promote (insist on?) quality maintainable and readable code, with a clean git history. I think I can guess how you got to the commit history in this PR: you started with a POC of your feature, then kept adding/changing things until you beat it into shape, and ended up with the final result you had initially designed. The problem that this introduces is that subsequent commits end up undoing much of what previous commits had done. That makes it hard to review, and muddies the git log history. Do you think you'd be able to take the final result, and starting from this end state, split it into commits that build the feature from the ground up? The usual way of doing these things is to start with the database/model changes at the bottom, then the controllers, then the views - at least, that's what I would do, keeping in mind I'm not an expert in web stuff. Ideally, if you could also add unit tests where appropriate (newly added functions that stand on their own, with clear input -> output correctness that can be verified by tests), I would be over the moon! :) — Reply to this email directly, view it on GitHub <#212 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/APUJ53GRZJHVK4NNUIXRMRLY3SRHNAVCNFSM6AAAAABFREOGHOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAMZVHA4TINZVGQ>. You are receiving this because you were mentioned.

notartom · 2024-04-05T18:27:38Z

That's pretty spot on - git has various commands that can help, so let me know if you need pointers.

peterjdann · 2024-04-06T06:59:10Z

I am concerned that I am pursuing an ever-receding mirage here. No sooner than I have got past one set of difficulties than I find myself faced with still further challenges, not one of which I had the faintest inkling of at the outset.

I should say, by way of background, that I worked for many years as the support manager of a company that developed and sold an SAP change management solution to many major companies around the world, including, by the time I finished up, Shell, CocaCola, Apple, IBM and pharmaceutical GSK. We tested every release of our product against a defined set of agreed high-criticality and moderate-criticality capabilities, but we never performed a single 'unit test', and only ever tested the complete product, even though much of the code underlying an SAP system can readily be envisaged as a series of 'units' with defined inputs and outputs.

I am now being asked to create a set of tests for models, controllers and views in an environment where, as far as I can see, with only one possible exception, only the most trivial set of model tests currently exists. (If I'm wrong, I apologise, and do please correct me, but that appears to me to be the situation at the moment.) I am also very concerned that it may prove difficult to provide meaningful, non-trivial tests of the functions I have developed without having a known, frozen database to work off — and it's not clear to me at all that the anonymised database I am currently working off is, in any sense, 'frozen'. For all I know, the keywords data in different instances of this development database may be in a more or less constant state of flux, and may be quite different by the time these tests come to be run on a machine other than my own. In short, I wonder what the possible lifespan of any tests I do develop may be, and if they will even still be valid by the time I get all finally created (which, incidentally, is looking like the middle of 2025 at the moment, from where I sit).

I can see, from the test documentation I have managed to peruse so far, that there are all sorts of way of mocking up data sources for 'unit testing' purposes, but the approaches I have read about (eg, writing pretend DB data into an array), frankly strike me as idiotic in the context in which we're working here, where what really counts (no pun intended) is whether the results of SQL queries look realistic.

If you do insist on my developing a full set of unit tests developed to go with a new, git-sanitised version of this pull request, can you please point me to example code for model, controller and view unit tests developed under the framework(s) we're using here that work with realistic-looking data? I don't see any under either PHPUnit documentation or ci-phpunit-test documentation that strike me as fitting that bill.

I am willing to give this a go if I can find some reasonable guidance, but right now nothing that I'm looking at by way of examples strikes me as any more appropriate a way of testing the change I'm proposing than simply installing it, playing with it for twenty or so minutes, then doing your darnedest to try to break it.

redrun45 · 2024-04-06T08:34:12Z

@peterjdann,
Thanks for your efforts so far. Sounds like this has gotten to feel like a rabbit hole, but I think a big chunk of that is a mismatch of expectations.

We do want to look at your final code in a more modular fashion, rather than either tracking the history of decisions across the versions, or looking at all of the changes at once.

We do not expect, or even ask, that you mock up controllers and models and a fake database for testing everything. It would be pretty cool if we had a complete set of models for unit tests, but we only recently started using any! You're not on the hook to create the whole set of mock models from scratch. Far from it.

The actual request here is much more pragmatic:

Ideally, if you could also add unit tests where appropriate (newly added functions that stand on their own, with clear input -> output correctness that can be verified by tests), I would be over the moon! :)

If you happen to add some code that is purely "functional" in the sense that its output naturally depends entirely on its input, not on any database queries, then it would be nice if you added a few test-cases for it. Stuff like this "helper" program, which could be tested without too much fuss:
https://github.com/LibriVox/librivox-catalog/pull/189/files
https://github.com/LibriVox/librivox-catalog/pull/192/files

Absolutely no database or model mockups required, requested or expected!

notartom · 2024-04-06T14:14:36Z

Yep, very much what @redrun45 said. Looking at the code in the PR again, honestly nothing jumps out as being super easy to write a unit test for. Anything that does SQL stuff is out of the question. Some of the other get_<foo>-type methods could be candidates, but it looks like mocking/stubbing a bunch of model stuff would be necessary for those.

Really, the unit tests are a bonus. I'd like to see us move in that direction, but given how much of a mess the existing code base is, it's hard not to feel like a hypocrite when asking for unit tests for new stuff.

The more important part is a cleaner sequence of commits. I just don't want to have a commit that adds method X, only for the subsequent commit to remove it and replace it with Y and Z.

peterjdann · 2024-04-07T06:58:26Z

Thank you both. I've clearly succumbed to a bad case of not reading a suggestion that's actually laid out with perfect clarity, and for that I must apologise. And I can understand completely that what you're asking for makes sense.
As it happens, right now I've got some other work on my hands of an urgent nature that is probably going to take me around 5 weeks to complete. Once I've got that done, I'll be in a good position to give this my undivided attention.

redrun45 · 2024-05-16T16:13:51Z

This one continues as a new PR, #225. 🎉

notartom mentioned this pull request Apr 1, 2024

Start using CodeIgniter DB migrations #213

Merged

peterjdann added 15 commits April 2, 2024 11:36

Book page now displays keywords in lowercase, as they come straight f…

7abb2c1

…rom the DB

book page now displays (not nicely formatted) text, number of occurre…

6aaddb1

…nces in catalog and keyword ID for each keyword associated with project, sorted with most used keywords first

Fixed error occurring when project had no keywords

49dbe1d

Added hyperlinks to list of keywords appearing on book page (but add …

d7e45fc

…hyperlink only when there are two or more books in the catalog with this keyword). After each keyword, display in brackets the number of audiobooks in the catalog that have this as one of their keywords.

Rebased local development branch to incorporate changes that have occ…

147b8ef

…urred in master in past few weeks.

For GUI (only) changed references from 'keywords' to 'keyterms' to av…

bcf3fb9

…oid any impression that what will be entered or appear in this field will necessarily be a single word

Fix advanced search bug that caused a keyterm like 'political fiction…

cf2b958

…' to be treated as two separate keyterms. Changed to 'Keyterms' the field label that was previously 'Keywords' (as 'Keywords' tends to imply, wrongly, that only single-word search terms can be entered here).

Including in rebased development branch a fixed conflict in footer pa…

e16e726

…rtial I had previously failed to notice.

Resolved conflict in footer partial during rebase

a8d816d

Enable cron job update of keywords stats by performing GET against ht…

c0032f6

…tps://librivox.org/cron/keywords_stats/update_keywords_stats

Changed GUI references to 'keyterms' back to 'keywords' in deference …

852af6e

…to the wishes of those who have offered comments on this mooted code change proposal.

Restore treatment of keywords explosion at Librivox_search.php to wha…

9b7921e

…t it was previously pending a more thorough investigation of the issues involved. Don't want to solve one bug by introducing another unintentionally.

Added (coded) migration instructions for creation of 'instances' fiel…

22e2a9f

…d in 'keywords' table

peterjdann force-pushed the display_keyterms_on_book_page branch from b94f465 to 22e2a9f Compare April 2, 2024 03:20

Changed migration-related timestamp values from 12 digit to 14 digits

7b3dd55

Patch to Migration.php required for compatibility with PHP 8

4a0e2ce

redrun45 closed this May 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Display keywords on book page (as per issue #210) #212

Display keywords on book page (as per issue #210) #212

peterjdann commented Apr 1, 2024

peterjdann commented Apr 1, 2024

redrun45 commented Apr 1, 2024 •

edited

Loading

notartom commented Apr 1, 2024 •

edited

Loading

notartom commented Apr 1, 2024

peterjdann commented Apr 2, 2024 •

edited

Loading

notartom commented Apr 2, 2024 •

edited

Loading

peterjdann commented Apr 2, 2024 via email •

edited

Loading

redrun45 commented Apr 4, 2024

peterjdann commented Apr 4, 2024 via email

notartom commented Apr 4, 2024

peterjdann commented Apr 4, 2024 via email

notartom commented Apr 5, 2024

peterjdann commented Apr 6, 2024

redrun45 commented Apr 6, 2024

notartom commented Apr 6, 2024

peterjdann commented Apr 7, 2024

redrun45 commented May 16, 2024

Display keywords on book page (as per issue #210) #212

Display keywords on book page (as per issue #210) #212

Conversation

peterjdann commented Apr 1, 2024

peterjdann commented Apr 1, 2024

redrun45 commented Apr 1, 2024 • edited Loading

The problem

Housekeeping

Next steps (Updated per Artom's post)

notartom commented Apr 1, 2024 • edited Loading

notartom commented Apr 1, 2024

peterjdann commented Apr 2, 2024 • edited Loading

notartom commented Apr 2, 2024 • edited Loading

peterjdann commented Apr 2, 2024 via email • edited Loading

redrun45 commented Apr 4, 2024

peterjdann commented Apr 4, 2024 via email

notartom commented Apr 4, 2024

peterjdann commented Apr 4, 2024 via email

notartom commented Apr 5, 2024

peterjdann commented Apr 6, 2024

redrun45 commented Apr 6, 2024

notartom commented Apr 6, 2024

peterjdann commented Apr 7, 2024

redrun45 commented May 16, 2024

redrun45 commented Apr 1, 2024 •

edited

Loading

notartom commented Apr 1, 2024 •

edited

Loading

peterjdann commented Apr 2, 2024 •

edited

Loading

notartom commented Apr 2, 2024 •

edited

Loading

peterjdann commented Apr 2, 2024 via email •

edited

Loading