Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

z/VM 7.2 doesn't work with compressed DASDs #464

Closed
giacobbis opened this issue Jan 3, 2022 · 10 comments
Closed

z/VM 7.2 doesn't work with compressed DASDs #464

giacobbis opened this issue Jan 3, 2022 · 10 comments
Assignees
Labels
BUG The issue describes likely incorrect product functionality that likely needs corrected.

Comments

@giacobbis
Copy link

giacobbis commented Jan 3, 2022

Hi all,

This my environment: Windows 10, Hercules 4.4 pre-built for Windows: Hercules version 4.4.0.10631-SDL-gbf377f63 (4.4.0.10631).

With my great satisfaction I could run z/VM 7.2 under Hercules (*), but ONLY with non-compressed DASDs. If I use compressed DASDs for Page and Spool (I tried both CCKD and CCKD64 formats), I get some nasty errors:

......IPL z/VM 7.2 ......
01:41:33 Start ((Warm|Force|COLD|CLEAN) (DRain) (DIsable)  (NODIRect)          
01:41:33       (NOAUTOlog)) or (SHUTDOWN)                                       
01:41:38 COLD DRAIN                                                             
01:41:38 NOW 01:41:38 EST SATURDAY 2022-01-01                                   
01:41:38 Change TOD clock (Yes|No)                                              
01:41:41 NO                                                                     
01:41:41 The directory on volume M01RES at address 3012 has been brought online.
01:41:41 HCPERP517I  DASD  3012 AN OPERATION WAS TERMINATED BECAUSE AN         
01:41:41 HCPERP517I  UNRECOGNIZED ERROR OCCURRED                               
01:41:41 HCPERP6303I SENSE = INVALID                                           
01:41:41 HCPERP6304I IRB = 00C04017 3FB3D008 00000000 00800000                 
01:41:41 HCPERP6305I USERID = SYSTEM                                           
01:41:41 HCPERP2216I CHANNEL PATH ID = 30                                      
01:41:41 HCPPAW415E Six continuous paging errors have occurred on DASD 3012 vol
.....
01:41:41 HCPPAW415E Six continuous paging errors have occurred on DASD 3012 volu
me M01RES.                                                                      
01:41:41 HCPPAW415E Six continuous paging errors have occurred on DASD 3012 volu
me M01RES.                                                                      
01:41:41 HCPPAW415E Six continuous paging errors have occurred on DASD 3012 volu
me M01RES.                                                                     

(NOTE: These errors appeared only on VM 3270 console. In hercules log, no errors at all.)

Also, z/VM 7.1 with compressed DASDs gave the same kind of errors, but I could 'cure' them with the PAGING63 IPL option.

If I IPL a z/VM 6.3 nucleus with the same DASDs, everything works OK.

I'm aware that starting from z/VM 7.1 there were some modifications in Page and Spool access. It looks to me as though the Hercules compressed DASD logic in some way doesn't support this new kind of access.

(*) In order to IPL z/VM 7.2, I had to enable the FACILITY bit 55.

Regards.

Paul

@Fish-Git
Copy link
Member

Fish-Git commented Jan 3, 2022

Is your z/VM 7.2 ADCD? Or not?

@Fish-Git
Copy link
Member

Fish-Git commented Jan 3, 2022

Never mind!

CONFIRMED!

I was able to recreate your problem!

It works just fine with UN-compressed dasd (e.g. CKD64), but fails with compressed (e.g. CCKD64) dasd!

Very weird! (and very concerning too!!)

I will begin looking into this right away, as it clearly indicates a problem in CCKD and/or CCKD64 logic!

(Yikes!)   8-o

@Fish-Git Fish-Git self-assigned this Jan 3, 2022
@Fish-Git Fish-Git added BUG The issue describes likely incorrect product functionality that likely needs corrected. IN PROGRESS... I'm working on it! (Or someone else is!) Researching... The issue is being looked into or additional information is being gathered/located. labels Jan 3, 2022
@Fish-Git
Copy link
Member

Fish-Git commented Jan 3, 2022

I was able to recreate your problem!

Well... I wasn't able to recreate your exact problem (I did not experience your z/VM errors, i.e. the HCPERP... and HCPPAW415E messages; I did not experience those), but I WAS able to recreate the general "doesn't work with compressed dasd images but works just fine with uncompressed dasd images" problem. That I was able to recreate.

In my case, the failure that occurs when compressed dasds are using is a very tight hard CPU loop (checking some bit somewhere) immediately after the "brought online" startup message. When regular (uncompressed) CKD64 dasds are used, no loop occurs and the system comes up just fine.

Maybe it's because mine is an ADCD distro? Is yours an ADCD distro too? I'm guessing it's not, and that's probably why our respective failures are different. But that's just a guess.

In any case, I'm going to be digging into this problem right away! Top priority!

@giacobbis
Copy link
Author

giacobbis commented Jan 4, 2022

Hi Fish,

Excuse me for my late reply. :-)

I confirm: my z/VM7.2 is an ADCD distribution.

The different behaviour you're experiencing is easily explained: If I COLD Start VM, I get all the weird I/O errors on VM console. If I WARM Start VM, I go in a tight loop.

Usually ADCD systems have WARM start as a default in System Config file, so you get a tight loop without having any chance to modify.

I'm glad you're looking into the problem, since it is very important for me to be able to use compressed DASD.

Thank you again!

(and BTW: Happy New Year too!)

Paul

@Fish-Git
Copy link
Member

Fish-Git commented Jan 4, 2022

I confirm: my z/VM7.2 is an ADCD distribution.

The different behaviour you're experiencing is easily explained: If I COLD Start VM, I get all the weird I/O errors on VM console. If I WARM Start VM, I go in a tight loop.

Usually ADCD systems have WARM start as a default in System Config file, so you get a tight loop without having any chance to modify.

Okay, that makes sense. Thanks!

I'm glad you're looking into the problem, since it is very important for me to be able to use compressed DASD.

It's important for me too!   (And a lot of other Hercules users as well!)

And guess what?

I FOUND THE BUG!   :-D

It's late though, so I'll probably commit the fix sometime tomorrow.

@giacobbis
Copy link
Author

giacobbis commented Jan 4, 2022

Great!

I knew you're one of the smartest guys I ever met - and I met quite a few, having worked in IBM Italy as a System Engineer for about 20 years.

But anyhow your performance astonished me.

Take it easy.. I know there are 9 hours of difference in time zone between Seattle and Milan (Italy). Just a kind request, if possible.

About 10 years ago I was able to perform some mods to Hercules; I remember that I was the first to run z/VM 6.3 in Hercules 3.08 after modifying the code to support topology instructions. (and Roger Bowler was so kind to mention me as a contributor).

But now I'm not willing to deal with all the procedure to rebuild Hercules (Git clone and so on); so, please, please, if you can share with me the new (corrected) prebuilt version of Hercules for Windows, I'll be very grateful to you.:-)

Regards

Paul

@Fish-Git
Copy link
Member

Fish-Git commented Jan 4, 2022

But now I'm not willing to deal with all the procedure to rebuild Hercules (Git clone and so on); so, please, please, if you can share with me the new (corrected) prebuilt version of Hercules for Windows, I'll be very grateful to you.:-)

Normally I hate handing out private builds to people, but this might be an exception. I'll see what I can do.

But in all honesty, this bug is IMHO serious enough to warrant a 4.4.1 "fix" release. That is to say, even though 4.4.0 was only officially released just 2.5 weeks ago, due to the seriousness of this particular bug, we might need to make a new official release of Hercules: version 4.4.1.

But I need to discuss it with my fellow developers first.

(How urgent is this fix to you? How long can you comfortably wait for it?)

@Fish-Git
Copy link
Member

Fish-Git commented Jan 4, 2022

Okay, here's the bug and, further below, the patch that fixes it (which I have already committed):

In module ckddasd.c's 0xE7 ccw LOCATE RECORD logic, after seeking to the requested track (ckd_seek), it sets unitstat to CE+DE = normal status (line 4868) and then performs a search according to specified orientation, and then afterwards, checks the completion status (unitstat) to make sure it succeeded (line 4907).

For some unknown reason, between those two points in the logic flow, the unitstat changed from normal completion X'0C' = CE+DE, to X'00'(!), thereby causing the success/failure test on line 4907 to become true (i.e. "an error occurred"), thereby causing the critical logic to set data orientation (lines 4910 - 4924) to be skipped!  (Oops!)

hyperion/ckddasd.c

Lines 4861 to 4925 in bf377f6

/* Seek to the required track */
rc = ckd_seek(dev, cyl, head, &trkhdr, unitstat);
if (rc < 0)
break;
/* Set normal status */
*unitstat = CSW_CE | CSW_DE;
/* Perform search according to specified orientation */
switch ((dev->ckdloper & CKDOPER_ORIENTATION)) {
case CKDOPER_ORIENT_HOME:
/* For home orientation, compare the search CCHH
with the CCHH in the track header */
if (memcmp(&(trkhdr.cyl), cchhr, 4) != 0)
{
ckd_build_sense(dev, 0, SENSE1_NRF, 0, 0, 0);
*unitstat = CSW_CE | CSW_DE | CSW_UC;
}
break;
case CKDOPER_ORIENT_COUNT:
case CKDOPER_ORIENT_DATA:
/* For count or data orientation, search the track
for a count field matching the specified CCHHR */
while (1)
{
/* Read next count field and exit at end of track
with sense data indicating no record found */
rc = ckd_read_count(dev, code, &rechdr, unitstat);
if (rc < 0) break;
/* Turn off track overflow flag */
if (dev->ckdcyls < 32768)
rechdr.cyl[0] &= 0x7F;
/* Compare the count field with the search CCHHR */
if (memcmp(&rechdr, cchhr, 5) == 0)
break;
} /* end while */
} /* end switch(CKDOPER_ORIENTATION) */
/* Exit if search ended with error status */
if (*unitstat != (CSW_CE | CSW_DE))
break;
/* Reorient past data if data orientation is specified */
if ((dev->ckdloper & CKDOPER_ORIENTATION)
== CKDOPER_ORIENT_DATA)
{
/* Skip past key and data fields */
dev->bufoff += dev->ckdcurkl + dev->ckdcurdl;
/* Set the device orientation fields */
dev->ckdrem = 0;
dev->ckdorient = CKDORIENT_DATA;
}
/* Set locate record flag and return normal status */
dev->ckdlocat = 1;
break;
}

I managed to eventually track down the problem to a single "rogue" (erroneous/accidental/unintentional) statement in the cckd_read_track function that was added by the original author back on May 1, 2002:

Revision: 812a557
Author: Greg Smith
Date: 5/1/2002 8:40:58 PM
Message: 01 May 2002 3390-9 and large file support - Greg Smith

(Yep! This bug has been in Hercules for almost 20 years now!)

Anyway, the fix was to remove the rogue (erroneous/accidental/unintentional) statement that was resetting unitstat to zero in the cckd_read_track function:

*unitstat = 0;

and associated functions (the bug existed not only in our CCKD logic but also in our CFBA logic too, and was carried forward into our CCKD64/CFBA64 logic as well).

As soon as that is done, everything started working just fine, and z/VM 7.2 came right up, clean as a whistle!  :)

Here's the patch which very clearly shows the actual bug and how it was fixed (which, as I said, has already been committed):

From: "User Name" <user@example.com>
Date: Tue, 04 Jan 2022 09:54:07 -0800
Subject: [PATCH] Short description

Details of change, perhaps spanning...
...multiple...
...lines.

diff -r -a -x .git -x 'msvc.AMD64.*' -x 'msvc.dllmod.*' -x 'msvc.debug.*' -x '*.suo' -x '*.ncb' -x '*.user' -x '*.htm' -x WORK -x DICTS -x FILES -x 'allTests.*' -x '*.rej' -x '*.orig' -x AutoBuildCount.h -x '*.cmp*' -x '*.comp*' -Nu hyperion-1/cckddasd.c hyperion-0/cckddasd.c
--- hyperion-1/cckddasd.c	2021-03-06 11:03:35.570094100 -0800
+++ hyperion-0/cckddasd.c	2022-01-04 01:12:25.048317400 -0800
@@ -861,7 +861,6 @@
 
     /* read the new track */
     dev->bufupd = 0;
-    *unitstat = 0;
     cache = cckd_read_trk (dev, trk, 0, unitstat);
     if (cache < 0)
     {
@@ -1046,7 +1045,6 @@
 
     /* Read the new blkgrp */
     dev->bufupd = 0;
-    *unitstat = 0;
     cache = cckd_read_trk (dev, blkgrp, 0, unitstat);
     if (cache < 0)
     {
diff -r -a -x .git -x 'msvc.AMD64.*' -x 'msvc.dllmod.*' -x 'msvc.debug.*' -x '*.suo' -x '*.ncb' -x '*.user' -x '*.htm' -x WORK -x DICTS -x FILES -x 'allTests.*' -x '*.rej' -x '*.orig' -x AutoBuildCount.h -x '*.cmp*' -x '*.comp*' -Nu hyperion-1/cckddasd64.c hyperion-0/cckddasd64.c
--- hyperion-1/cckddasd64.c	2021-03-02 23:17:47.835962300 -0800
+++ hyperion-0/cckddasd64.c	2022-01-04 01:08:57.708353200 -0800
@@ -621,7 +621,6 @@
 
     /* read the new track */
     dev->bufupd = 0;
-    *unitstat = 0;
     cache = cckd64_read_trk (dev, trk, 0, unitstat);
     if (cache < 0)
     {
@@ -806,7 +805,6 @@
 
     /* Read the new blkgrp */
     dev->bufupd = 0;
-    *unitstat = 0;
     cache = cckd64_read_trk (dev, blkgrp, 0, unitstat);
     if (cache < 0)
     {

Here's the same thing as a downloadable patch file:

Fixed by commit 120c28b.

Closing this issue as resolved.

Please let me know if I've missed something and it still doesn't work for you, and I'll gladly reopen it.

Thanks.

@Fish-Git Fish-Git closed this as completed Jan 4, 2022
@Fish-Git Fish-Git removed IN PROGRESS... I'm working on it! (Or someone else is!) Researching... The issue is being looked into or additional information is being gathered/located. labels Jan 4, 2022
@giacobbis
Copy link
Author

giacobbis commented Jan 5, 2022

Hi Fish,

With great efforts I could build the Hyperion corrected version, and...

It works like a charm!

BTW, I didn't deal with Git, clone and all that crap; I simply downloaded the version 4.4 source and modified by hand the four 'guilty' instructions. Then a makefile and voilà: a perfectly working version.

Thanks for all.

Now I'm waiting for the future (but not yet scheduled) z/OS 2.5 ADCD....

Paul

@Fish-Git
Copy link
Member

Fish-Git commented Jan 5, 2022

With great efforts I could build the Hyperion corrected version, and...

It works like a charm!

Great! Glad to hear it. Well done!

FYI: after virtually zero debate, we have all agreed there will be a new 4.4.1 hot fix release. It should hopefully be ready "very soon" (within days I'm guessing).

Now I'm waiting for the future (but not yet scheduled) z/OS 2.5 ADCD....

Aye. Me too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
BUG The issue describes likely incorrect product functionality that likely needs corrected.
Projects
None yet
Development

No branches or pull requests

2 participants