handle multiple middle names of authors #81

Closed
only1chunts opened this issue Nov 7, 2016 · 9 comments

Comments

@only1chunts
Member

only1chunts commented Nov 7, 2016

This should be considered in conjunction with #80 & #82

Many authors have more than one middle name, e.g. Thomas Gilbert = M Thomas P Gilbert, or Irene Wing Shan Chik, Steve Kwan Hok Tong, Kwok Wing Stephen Tsui, etc. We need a way to make the display of names slightly more configurable. Perhaps the easiest method is to add a new "display name" column to the database, which auto-populates with the usual first initial of each name but can be overwritten if required (in the admin pages). So Steve Kwan Hok Tong auto-populates to "Tong SK" but can be updated to "Tong SKH" if required. The webpage then uses the display name to generate the web view.
In GigaScience papers, citations list authors as surname followed by initials, e.g. Hebsgaard MB, Gilbert MTP, Arneborg J

@only1chunts
Member Author

This is the same issue as ticket #8 but this solution is better so I have closed #8.
Some of our authors have multiple middle initials,
e.g. "Teo A S M", is currently stored in the database as:
Surname = "Teo"
middle name = "SM"
first name = "Audrey"
but displays on the dataset as "Teo, A S" (http://gigadb.org/dataset/100165) instead of "Teo A S M".
The website should display the first letter of each word in the middle_name column. We then need to store names in the database with spaces between multiple middle initials, e.g.
Surname = "Teo"
middle name = "S M"
first name = "Audrey"

@rija
Contributor

rija commented Jan 11, 2018

Hi,

As the goal of this change request is to accommodate the diversity of names,
a standardised format (GigaScience journal), and the ability to tweak the displayed information,
I suggest implementing the following display algorithm:

  1. extract surname
  2. extract first letter of every word in first_name and middle_name field
  3. If a word in the first_name or middle_name field consists only of capital letters, return each of its capital letters before moving on to the next word.
  4. A period, space or comma is treated as a word separator, and the algorithm is applied to the words before and after it
  5. merge all initials into one string
  6. return the resulting display name in the order: Surname, initials string, separated by a space
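
The steps above can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the code on the branch: the function names `initials` and `display_name` are hypothetical, and treating any all-uppercase word as a run of initials is one reading of step 3.

```python
import re

# Step 4: a period, space or comma separates words.
WORD_SEPARATORS = re.compile(r"[.\s,]+")


def initials(field):
    """Return the initials for one name field (first_name or middle_name)."""
    parts = []
    for word in WORD_SEPARATORS.split(field or ""):
        if not word:
            continue
        if word.isupper():
            # Step 3: an all-capitals word such as "SM" is already a run of initials.
            parts.append(word)
        else:
            # Step 2: otherwise keep only the first letter of the word.
            parts.append(word[0])
    return "".join(parts)


def display_name(surname, first_name, middle_name=""):
    """Steps 1, 5 and 6: surname, a space, then all initials merged."""
    merged = initials(first_name) + initials(middle_name)
    return f"{surname} {merged}".strip()


print(display_name("Teo", "Audrey", "SM"))
print(display_name("Gilbert", "M.Thomas", "P"))
print(display_name("Tong", "Steve", "Kwan.Hok"))
```

Because a hyphen is not a separator in this sketch, a first name such as "Yong-Yi" contributes a single initial.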

Below, I take a sample of author records from the database and apply the algorithm to each.

Given the author has surname <surname>
And the author has first name <first_name>
And the author has middle name <middle_name>
When the GigaDB web site shows a paper by the author
Then the author's name should be displayed as <display_name>

Examples:

| surname          | first_name | middle_name | display_name        |
| Teo              | Audrey     | SM          | Teo ASM             |
| Gilbert          | M.Thomas   | P           | Gilbert MTP         |
| Muñoz            | Ángel      | GG          | Muñoz ÁGG           |
| Martinez-Cruzado | Juan       | Carlos      | Martinez-Cruzado JC |
| Shen             | Yong-Yi    |             | Shen Y              |
| Tong             | Steve      | KwanHok     | Tong SK             |
| Tong             | Steve      | Kwan.Hok    | Tong SKH            |
| Tong             | Steve      | Kwan,Hok    | Tong SKH            |
| Tong             | Steve      | Kwan Hok    | Tong SKH            |
| Tong             | Kwan Hok   | Steve       | Tong KHS            |

In the example where the middle name has been entered as "KwanHok" in the database, the display name is "Tong SK". If the author changes the middle name
to "Kwan Hok" or "Kwan.Hok" or "Kwan,Hok",
then the resulting display name becomes: Tong SKH

It would help to have a preview area in the author edit form, so that the form user can see how the name will be displayed on the web site.

Thus, with this predictable algorithm and a preview area, authors (or admins) have the flexibility to experiment in the form's existing name fields to influence how the name is displayed.

Let me know if the above examples and suggestion meet your goal for this ticket.

@only1chunts
Member Author

only1chunts commented Jan 11, 2018 via email

rija referenced this issue in rija/gigadb-website Jan 12, 2018
create the initial behat config file in tests.
rija referenced this issue in rija/gigadb-website Jan 12, 2018
Wrote test scenarios for issue #81 using examples from the production database.
Feature Context file boilerplate created.
Behat framework installed.
rija referenced this issue in rija/gigadb-website Jan 13, 2018
…science story

Implemented the step definitions for the scenario to implement the story for issue #81 from the GigaScience perspective.

Made use of Behat tables and real author names from the database.

That means that before each run of the suite, the sample of the production database needs to be loaded. The database dump is called "author-names-80-81-82.pgdmp" and it differs from the 2016 dump only by having an Attribute record for "urlredirect".
rija referenced this issue in rija/gigadb-website Jan 13, 2018
It was tricky to get PHPUnit working with Yii 1.1 and gigadb-website running in Vagrant.
Among the several problems: recent versions of PHPUnit and its extensions cause issues (4.8 fails, but 4.1 is OK).
Also, the PHPUnit Selenium extension is required even though we don't use it, because the Yii framework references it!
Also, avoid setting minimum-stability to 'dev' in composer.json; prefer 'stable' and override with @dev on individual packages if necessary.
rija referenced this issue in rija/gigadb-website Jan 14, 2018
Set up the unit tests for the necessary routines. Currently failing, as expected, because the implementation is incomplete.
rija referenced this issue in rija/gigadb-website Jan 14, 2018
The display format for authors on the dataset page is now the same as the references in the GigaScience journal, and all tests (unit and acceptance) are passing.

However, I came across two names that were entered in a way that may make me revise one of my initial assumptions.
rija referenced this issue in rija/gigadb-website Jan 15, 2018
The display format for authors on the dataset page now is the same as the references on Gigascience Journal and all tests (unit and acceptance) are passing.
rija referenced this issue in rija/gigadb-website Jan 15, 2018
Installed php-pecl-xdebug in the Vagrant Chef recipe, as it is needed by PHPUnit to calculate test coverage.
Set up the creation of a test database in the gigadb Chef recipe, so that unit tests can run without disrupting manual and automated user testing by wiping out the development database.
Created a db_test.json config for the test database.
Modified the Yii test config to use the test database.
Ran the test coverage for the gigadb-website codebase and generated a comprehensive HTML report.
rija referenced this issue in rija/gigadb-website Jan 16, 2018
I didn't realise the test.php config for Yii was also pulling in main.php and therefore overriding the $dbConfig variable, causing the unit tests to use the main database. Corrected it by loading the test database config into a distinct variable.
rija referenced this issue in rija/gigadb-website Jan 16, 2018
Figured out how to organise Behat FeatureContext's step definitions in multiple files. The answer is to use subContexts.
Created a feature and a scenario for the preview functionality.
rija referenced this issue in rija/gigadb-website Jan 17, 2018
Refactored the Author's displayName function to be more flexible.
Added a field in the author view to show display name.
Updated the login form to be actionable by automated tests (the submit button needs to be within the <form></form> element).

Moved the snapshot-after-failure hook to GigadbWebsite so it is automatically available to all subcontexts.

Got all acceptance test scenarios passing.
rija referenced this issue in rija/gigadb-website Jan 17, 2018
part of attempt at setting functional testing with phpunit
rija referenced this issue in rija/gigadb-website Jan 17, 2018
make sure test coverage reports are not committed
rija referenced this issue in rija/gigadb-website Jan 17, 2018
All tests passing.

Just made sure the admin login comes from env variables.
Also added test_users* env variables with test login/password to .gitignore.
rija referenced this issue in rija/gigadb-website Jan 19, 2018
@rija
Contributor

rija commented Jan 30, 2018

Hi @only1chunts,

I've made a first implementation of this functionality on my branch:
https://github.com/rija/gigadb-website/tree/author-names-81

You can see the stories and acceptance criteria on the page below (click the [+] all button to expand and see them)

https://rija.github.io/gigadb-website/scenarios_report/author-names-80-81-82.html

It implements the algorithm I described (see the examples on the page), but it scales down the idea of a preview area in the form a little:
The form is so small and simple (only 4 fields, matching the database columns of the author table) that it is just as quick and convenient to add a display name preview field to the author view page, which the curator sees after updating an author form.
(Adding an "edit" button to the author view for admin users might make further corrections even easier.)

A faithful preview area in the form, by contrast, would require a round trip to the server for an AJAX call, which is not resource-efficient.

Other observations I made during my implementation, shown in the examples on the above acceptance criteria page:

(1) Some abbreviated names are meant to stay lowercase. See the example of "Hekkert BtL": the "t" should stay lowercase, as shown in the PubMed article linked from the dataset: https://www.ncbi.nlm.nih.gov/pubmed/21743474
Therefore the algorithm only abbreviates; it does not capitalise. Authors have to capitalise appropriately for the case to be retained (which is what is happening anyway, judging by the sample from the database).

(2) Accented characters (issue #82) are now represented correctly, as seen in the examples "Muñoz ÁGG" and "Schiøtt M".

(3) Some trailing special characters are filtered out automatically, as seen in "Schiøtt, ".

(4) Some names have a "Jr" attached. This should not be abbreviated, hence the resulting display name "Loughran TPJr"; the corresponding GigaScience manuscript preserves the "Jr" in its list of references: https://academic.oup.com/gigascience/article/2/1/1/2656134
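
To make observations (1), (3) and (4) concrete, here is a rough Python sketch of how they could fit together. It is an assumption-laden illustration rather than the code on the branch: the names `KEPT_SUFFIXES`, `clean`, `initials` and `display_name` are hypothetical, and so are the first_name and middle_name values used for Hekkert, Loughran and Schiøtt, since the thread only gives their display names.

```python
import re

WORD_SEPARATORS = re.compile(r"[.\s,]+")
# Hypothetical set of suffixes that are kept whole instead of being abbreviated.
KEPT_SUFFIXES = {"Jr"}


def initials(field):
    """Abbreviate each word to its first letter without changing its case."""
    parts = []
    for word in WORD_SEPARATORS.split(field or ""):
        if not word:
            continue
        if word in KEPT_SUFFIXES or word.isupper():
            # "Jr" is preserved; all-capitals words are already initials.
            parts.append(word)
        else:
            # No capitalisation is applied, so "te" contributes a lowercase "t".
            parts.append(word[0])
    return "".join(parts)


def clean(text):
    """Drop leading/trailing whitespace and stray punctuation, as in "Schiøtt, "."""
    return (text or "").strip(" ,.;")


def display_name(surname, first_name, middle_name=""):
    merged = initials(first_name) + initials(middle_name)
    return f"{clean(surname)} {merged}".strip()


# Field values below are assumed for illustration only.
print(display_name("Hekkert", "Bas", "te Lintel"))
print(display_name("Loughran", "Thomas", "P Jr"))
print(display_name("Schiøtt, ", "Morten"))
```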

Let me know what you think.

@only1chunts
Member Author

I think that all sounds great. I need to see it in action to understand whether it does what I need, so I will ask @jessesiu to get it up and running on his machine for me to see.

rija referenced this issue in rija/gigadb-website Feb 2, 2018
It's best practice to use "composer install" instead of keeping the Composer vendorised subtree of dependencies.
rija referenced this issue in rija/gigadb-website Feb 2, 2018
Downgraded PHPUnit to 4.1.1 so it works with the Yii framework (even 4.3.* doesn't work with Yii 1.1).
Switched to minimum-stability: stable, so I can remove the @stable suffixes.
Added the @dev suffix to mink-wunit-driver, as that's the only branch available for that package.
Removed the Google API client, as it's already in the GigaDB codebase.

Because of the PHPUnit downgrade, namespaces cannot be used with PHPUnit's Assert
(PHPUnit\Framework\Assert had to be replaced by PHPUnit_Framework_Assert)

Controlled which Behat hooks run with which scenario using tags on features and on hooks,
so that, for example, the OAuth revocation hooks are only run in the affiliate login scenario

Added a scenario to ensure the test environment is loaded with all the test data in the author name display and name display preview tests.

Updated one SQL test data script to drop the not null constraint on the affiliation column of the gigadb_user table.

Updated the test runner so it runs all the acceptance tests and unit tests.
rija referenced this issue in rija/gigadb-website Feb 2, 2018
…cumentation

After all tests have run, the previous state of the database is restored.
This is now done in the test runner, as it is a concern that spans all tests.
Updated the TESTING docs with info on running unit tests and the above database setup.
rija referenced this issue in rija/gigadb-website Feb 6, 2018
Fixed the syntax of pg_dump to ensure the initial state of the database is saved and restored.
Increased the sleep time after terminating pg backend processes.
Made the test runner logs visible in protected/runtime/
Moved printCurrentUrl inside the try{} block, as it blows up when a step fails and no web session has been started yet (with visit)
rija referenced this issue in rija/gigadb-website Feb 7, 2018
…e deterministic

By default, CDbFixtureManager.php loads the fixtures using readdir(), which returns files in the order in which they are stored by the filesystem. It "seems" that this order varies from system to system, resulting in the following error:

CDbException: CDbCommand failed to execute the SQL statement: SQLSTATE[23503]: Foreign key violation: 7 ERROR:  insert or update on table "dataset_author" violates foreign key constraint "dataset_author_author_id_fkey"
DETAIL:  Key (author_id)=(1) is not present in table "author".

because the join table fixture is loaded before the data table.

I've created an init.php file in the fixtures directory to customise the fixture loading behaviour, in this case, ensuring the fixtures are loaded  in the order that won't violate the foreign key constraint.
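
As a language-neutral illustration of the idea (the actual fix lives in the PHP init.php script), here is a small Python sketch that orders fixtures explicitly instead of relying on directory order. The `LOAD_ORDER` list and the `ordered_fixtures` helper are hypothetical, and the `dataset` table is assumed from the name of the `dataset_author` join table.

```python
# Hypothetical explicit load order: referenced tables first, join tables last.
LOAD_ORDER = ["author", "dataset", "dataset_author"]


def ordered_fixtures(found):
    """Return fixture table names in a deterministic, constraint-safe order.

    Tables listed in LOAD_ORDER come first, in that order. Any other table is
    loaded afterwards in alphabetical order, so the result never depends on
    the order in which the filesystem happens to list the files.
    """
    rank = {name: position for position, name in enumerate(LOAD_ORDER)}
    return sorted(found, key=lambda name: (rank.get(name, len(LOAD_ORDER)), name))


# The filesystem may list the join table first; the explicit order fixes that.
print(ordered_fixtures(["dataset_author", "author", "dataset"]))
```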
@only1chunts
Member Author

Hi @rija , I've had a look at the version @jessesiu got running on his machine, it appears to be mostly fine but when I look at the admin page of the authors (e.g. http://127.0.0.1:9170/adminAuthor/update/id/3789 ) I have no way to update the display name?

@rija
Contributor

rija commented Feb 8, 2018

Hi @only1chunts,

The list of examples in the acceptance test scenarios was to show how the display format of an author’s name can be tweaked in a variety of ways while adhering to Gigascience Journal’s reference standards.

As implemented, there is no explicit editable display name field for you to type in.
The different ways an author's name can be displayed are obtained by editing the existing fields (Surname, First name, and Middle name) under the new formatting rules. Additionally, the author's view screen has a display field showing how the name would appear on a dataset page.

What I understand from your last comment is that, irrespective of those new rules and how the elements of an author's name can be edited in the existing fields, there is a need for an additional free-form, editable display name field on that form.
Could you confirm that my understanding is correct?

@only1chunts
Member Author

only1chunts commented Feb 8, 2018 via email

@rija
Contributor

rija commented Feb 8, 2018

Thank you @only1chunts,

In that case, two things:

  1. We will need a new varchar column (custom_name) in the author database table
  2. We need to add the following two acceptance criteria:

Scenario: If custom display name field is empty, save calculated value 
	Given I sign in as an admin
	And I am on "/adminAuthor/update/id/19"
	When I fill in "Author_surname" with "Poe"
	And I fill in "Author_first_name" with "Edgar"
	And I fill in "Author_middle_name" with "Allan"
	And I press "Save"
	Then I should see "Poe EA"


Scenario: when display name edited, save it instead of calculated value
	Given I sign in as an admin
	And I am on "/adminAuthor/update/id/19"
	When I fill in "Author_surname" with "Poe"
	And I fill in "Author_first_name" with "Edgar"
	And I fill in "Author_middle_name" with "Allan"
	And I fill in "Author_custom_name" with "PEA"
	And I press "Save"
	Then I should see "PEA"

rija referenced this issue in rija/gigadb-website Feb 8, 2018
getDisplayName on Author model used the third form for generateDisplayName (as intended) instead of making the concatenation itself
rija referenced this issue in rija/gigadb-website Feb 8, 2018
…a custom name

Updated the database schema to add a new varchar column, custom_name, to the author table.
Updated the author form and model to edit/save the new column.
Updated getDisplayName to display the custom_name field if it is not null (otherwise it displays the calculated display name).
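
The fallback described in this commit can be pictured with a short Python sketch. It is only an illustration of the rule, not the PHP model code: the function name `get_display_name` and its parameters are hypothetical, and treating a blank string like a null value is an assumption based on the "field is empty" scenario above.

```python
def get_display_name(custom_name, calculated_name):
    """Prefer the curator-entered custom name, else the calculated one.

    Assumption: a blank custom name (e.g. an empty form field) counts as
    "not set", just like None.
    """
    if custom_name is not None and custom_name.strip():
        return custom_name
    return calculated_name


print(get_display_name(None, "Poe EA"))   # no custom name stored
print(get_display_name("PEA", "Poe EA"))  # custom name takes precedence
```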
rija referenced this issue in rija/gigadb-website Feb 9, 2018
pli888 pushed a commit that referenced this issue May 14, 2018
…ure for issue #81 that keeps crashing since adding the feature for #49