CD-i Compatibility #1

Open
ogarvey opened this issue Mar 3, 2024 · 23 comments
@ogarvey

ogarvey commented Mar 3, 2024

Hi there,

Just came across this repository while researching the BOLT engine in relation to CD-i asset extraction.

I've tried your tool on the BOLT file used in the CD-i version of Tetris, but it repeatedly fails on the line:

err_msg("lookbehind too far", bytevalue);

I've also attempted to port the functionality to C# to fit with some of my existing experiments: https://github.com/ogarvey/ExtractCLUT/blob/main/Helpers/BoltFileHelper.cs

My code in the ExtractBoltData method seems to split the various blobs correctly, with the initial data files containing the offset data and the secondary data files containing the encoded data, but every attempt to decode the secondary files has hit a brick wall.

So, on the off chance you're willing to take a look, this is the binary of the BOLT file I've been looking at: BTetris0.rtb_1_16_0.zip

If my progress so far is at least somewhat on track, I believe there are 19 folders containing a combined total of 138 files.

Any assistance or guidance appreciated, and if further examples of CD-i BOLT files would help, please let me know and I can provide some.

Regards,

OGarvey

@heinermann heinermann self-assigned this Mar 4, 2024
@heinermann
Owner

heinermann commented Mar 4, 2024

Hmm I have no idea. This seems to be an anomaly with the format and not something I've seen in any of the other games I've tested it on. For the record, I noted checking the following games:

  • Starcraft 64 (N64)
  • Namco Museum 64 (N64)
  • Bassmasters 2000 (N64)
  • Ms. Pac-Man - Maze Madness (N64)
  • Power Rangers - Lightspeed Rescue (N64)
  • Shrek Super Party (XBox)
  • Pac-Man Collection (GBA)
  • The Lost Vikings (GBA)
  • Rock n' Roll Racing (GBA)
  • Blackthorne (GBA)
  • Namco Museum (GBA)
  • Ms. Pac-Man - Maze Madness (Dreamcast, Track 19)
  • Namco Museum (Dreamcast, Track 5)
  • The Game of Life (Windows 95)
  • Mystic Midway: Rest in Pieces (DOS)
  • Voyeur (DOS)
  • 3-D TableSports (DOS)

Reference topic (when I was discovering its uses): http://www.staredit.net/topic/18209/

The CD-i archive doesn't look fundamentally different at first glance, but there's something fishy going on. The first byte in each run doesn't appear to be valid for the decompressor, which then tries to do a negative lookup, and that triggers the error. So it could be any of the following:

  1. I'm missing some kind of edge case or mechanism in the format that only this version uses.
  2. This version uses a slightly different format.
  3. There are legitimate negative lookups to pull data from the header (extremely unlikely).

I tried messing around with it but didn't really see anything obvious. I'll investigate other games for now and get back.
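
To illustrate the failure mode described above, here is a minimal sketch of a bounds-checked lookbehind copy. It is not the tool's actual code: the helper name `copy_lookbehind` and its parameters are hypothetical, and it only assumes a generic LZ-style scheme in which a back-reference copies bytes that were already written to the output.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: copy `length` bytes starting `distance` bytes behind
// the current end of `out`. Returns false rather than reading before the
// start of the output, which is the "lookbehind too far" condition.
bool copy_lookbehind(std::vector<std::uint8_t>& out,
                     std::size_t distance,
                     std::size_t length) {
    if (distance == 0 || distance > out.size()) {
        return false;  // the reference points outside the data written so far
    }
    const std::size_t src = out.size() - distance;
    for (std::size_t i = 0; i < length; ++i) {
        // Copy byte by byte so that overlapping references repeat data.
        const std::uint8_t byte = out[src + i];
        out.push_back(byte);
    }
    return true;
}
```

If the very first opcode of a run is read as a back-reference, `out` is still empty, so any distance fails this check. That would be consistent with either a misread opcode or a format variant.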

@heinermann
Owner

heinermann commented Mar 4, 2024

I should have been more thorough when checking. It seems the list is just games that have BOLT archives, not games the tool actually works on.

Confirmed games with the same problem as Tetris:

  • 3-D TableSports (DOS)
  • Voyeur (DOS)
  • Presumably all CD-i games, based on testing

Confirmed working without issues:

  • The Game of Life (Windows)
  • Starcraft 64 (N64)

@ogarvey
Author

ogarvey commented Mar 4, 2024

Thanks for taking the time to have a look.

It looks like the format was first used on the CD-i, so it's certainly possible there was a change in format between that and the later releases, although there do seem to be some 'cross platform' releases among the CD-i/DOS titles.

CD-iBOLTs.zip

I've attached a bunch more BOLTs from the files I had on my HDD for reference. Labyrinth Of Crete must have at least some uncompressed data, I think, because if I throw the whole file at one of my image parsing tools, I can see the following RLE images (palette is incorrect...):

[image]

Similarly for Mystic Midway, I was able to extract some audio from the directory and files starting at offset 0x1C442 of the binary file.

Given the difficulty in decompiling CD-i applications correctly, I think my next step will be to compare a file from one of the games that exists on both CD-i and DOS. If the structure is similar enough, hopefully the DOS version of the game will be easier to reverse engineer the format from.

@heinermann
Owner

I think none of the DOS games work with this program either, but all of the N64 games and later stuff seems to work.

@ogarvey
Author

ogarvey commented Mar 4, 2024

I've compared the files from the DOS and CD-i version of Mystic Midway - Phantom Express and there's enough similarities that I think it's worth me investigating the DOS version some more.

Thanks for your time, and if I find anything useful, I'll provide an update :)

@ogarvey ogarvey closed this as completed Mar 4, 2024
@heinermann
Owner

Thanks. I went and started checking stuff in this spreadsheet: https://docs.google.com/spreadsheets/d/1uwPyACqCSqkVOeA5vrEsOaZVY9TY4EF0HKrv5iC4nwQ/edit?usp=sharing

@ogarvey
Author

ogarvey commented Mar 4, 2024

I've made some limited progress at identifying some of the flags and their associated file types:

[image]

I've also attached some image examples and the associated binary files (one compressed and one uncompressed), as well as the results of parsing the BOLT file they came from:

UNCOMPRESSED: Merlins Apprentice_1603436_1614862_Secondary

COMPRESSED: Merlins Apprentice_1794156_1796776_Secondary

Image Examples.zip

MerlinsApprentice_.blt_parsedData.txt

@heinermann
Owner

In The Game of Life it seems to have the same problem with MMSqInf0.BLT, but none of the other archives.

For a file, flags & 0xFF seems to denote a file type, as you've noted, but it's not clear how much it differs from game to game. I started documenting each type in that spreadsheet just in case, though.
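
As a minimal sketch of the masking described above, assuming the flags are held in a 32-bit value (the width and the name `file_type_from_flags` are illustrative and not taken from the tool):

```cpp
#include <cstdint>

// Hypothetical helper: the low byte of an entry's flags is treated as the
// file type. What each type value means may differ from game to game.
constexpr std::uint8_t file_type_from_flags(std::uint32_t flags) {
    return static_cast<std::uint8_t>(flags & 0xFF);
}
```

For example, a flags value of 0x0A000109 would yield file type 9 under this assumption.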

@heinermann heinermann reopened this Mar 5, 2024
@heinermann
Owner

heinermann commented Mar 5, 2024

I am reverse engineering the implementation in the Game of Life and it is much more complex than it appears. I'll make various improvements in the coming days.

The important thing is that it uses a different algorithm per file type. Most of them just happen to share the same function, but it's possible some file types use a different one. It can also differ on a per-archive basis.

@heinermann
Owner

heinermann commented Mar 6, 2024

I implemented one of the DOS versions which is working better for the DOS games but it's not entirely correct. I think the CD-i version is probably similar to the DOS version with some tweaks, but I don't know how to look at the code for it so it'll require some analysis and debugging.

File type 9 (sometimes type 8) is usually the one that is treated differently than others too.

@heinermann
Owner

Regarding the BOLTs you attached in CD-iBOLTs.zip: I think the ones you provided are not valid/contiguous BOLT archives and are separated by sector headers or something on the disc. I used IsoBuster to extract a BOLT archive from one of the CD-i discs and it had similar results to the DOS outputs. So I think the DOS algorithm might work once it is fixed.

@ogarvey
Author

ogarvey commented Mar 6, 2024

Yeah, looks like in two of the files I provided I forgot to strip the sector header data.

Interestingly, it seems those two (Labyrinth Of Crete and Merlin's Apprentice) weren't even developed by Cinemaware/Philips P.O.V Entertainment/Mass Media but still used the BOLT format. That might account for some of the differences in those two.

@ogarvey
Author

ogarvey commented Mar 6, 2024

This is a decompiled version of the Tetris "Decompress" method: https://gist.github.com/ogarvey/67b255c4c355e18d84d639aea6015aca

Unfortunately I'm still a beginner at deciphering this in the context of how the CD-i operates, so some bits are unclear, but given your existing knowledge of the other systems, you may understand it better than I do.

@heinermann
Owner

heinermann commented Mar 6, 2024

EDIT: Second pass: https://gist.github.com/heinermann/146c89cc78f2236c131827c4699516fd

@heinermann
Owner

heinermann commented Mar 8, 2024

I pushed my interpretation of that function, but it's not quite working.

Are there different functions for the other file types? In Caesars World of Boxing the 6th file immediately uses opcode 0xC which is like a reversed lookbehind in the snippet (invalid in the context).

I also think my interpretation is off because I'm getting slightly larger data than the targets. Can you explain how you ripped the code so I can look into it better?

EDIT: I fixed it. Works for Tetris without errors (but not Caesars, haven't tried others yet).

@ogarvey
Author

ogarvey commented Mar 8, 2024

I've been using Ghidra and the attached plugin/extension for decompiling the CD-i code.

If you want to give it a go, you have to extract the disc image using IsoBuster. Then find the CD-i module, usually prefixed cdi_ followed by the game name.

That has to be stripped of the header and ECC bytes in each sector:

[image]

Then it can be imported into Ghidra and analysed using the instructions in the zip's readme file.

Ghidra Setup v0_1.zip

The Green Book is also a useful reference: http://www.icdia.co.uk/docs/funcspec.html
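
For the sector-stripping step, here is a rough sketch rather than the exact procedure used in this thread. It assumes the module was extracted as raw 2352-byte Mode 2 sectors (12 sync bytes, 4 header bytes, 8 subheader bytes, then user data followed by EDC/ECC) and that the Form 2 flag is bit 5 of the submode byte. The function name `strip_sectors` and the constants are illustrative.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kRawSector = 2352;     // full raw sector size
constexpr std::size_t kDataOffset = 24;      // 12 sync + 4 header + 8 subheader
constexpr std::size_t kSubmodeOffset = 18;   // third byte of the subheader
constexpr std::size_t kForm1Data = 2048;     // Form 1 user data (EDC/ECC follow)
constexpr std::size_t kForm2Data = 2324;     // Form 2 user data (no ECC)
constexpr std::uint8_t kForm2Bit = 0x20;     // submode bit 5 selects Form 2

// Hypothetical helper: keep only the user data of each complete raw sector.
std::vector<std::uint8_t> strip_sectors(const std::vector<std::uint8_t>& raw) {
    std::vector<std::uint8_t> out;
    for (std::size_t pos = 0; pos + kRawSector <= raw.size(); pos += kRawSector) {
        const bool form2 = (raw[pos + kSubmodeOffset] & kForm2Bit) != 0;
        const std::size_t count = form2 ? kForm2Data : kForm1Data;
        const std::uint8_t* data = raw.data() + pos + kDataOffset;
        out.insert(out.end(), data, data + count);
    }
    return out;  // a trailing partial sector, if any, is ignored
}
```

Program modules are normally stored in Form 1 sectors, so for a cdi_ module the result should be a multiple of 2048 bytes. If the file was extracted as plain 2048-byte user data, no stripping is needed.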

@ogarvey
Author

ogarvey commented Mar 9, 2024

Had a chance to convert the C++ to C# (https://github.com/ogarvey/ExtractCLUT/blob/f59a4f577e4fabf8ce0b6bdb1ca5381499f662ab/Helpers/BoltFileHelper.cs#L311) and got some promising results :D

The 91 KB files are CLUT-encoded images and can now be parsed just fine by my conversion tools.

I'm going to check my attempt at converting the C++ again, but there does seem to be a potential issue somewhere, as the colours are incorrect in places once combined with the palette.

As you can see from this short YouTube video of the beach level: https://www.youtube.com/watch?v=pXZ5_j1lqhE

Beach Still

The obvious issues are the clouds, and also the Tetris "board".

The black areas are where the animated section is overlaid and aren't an issue with the extractor.
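
When colours come out wrong after combining indices with the palette, it can help to check whether the decoded indices even fall inside the palette. The sketch below assumes one byte per pixel index and a palette of RGB triples. The names `Rgb`, `ClutResult` and `apply_clut` are hypothetical and unrelated to the ExtractCLUT code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Rgb {
    std::uint8_t r, g, b;
};

struct ClutResult {
    std::vector<Rgb> pixels;
    std::size_t out_of_range = 0;  // indices that had no palette entry
};

// Hypothetical helper: map each pixel index through the palette. Indices
// beyond the palette are counted and drawn as magenta so they stand out.
ClutResult apply_clut(const std::vector<std::uint8_t>& indices,
                      const std::vector<Rgb>& palette) {
    ClutResult result;
    result.pixels.reserve(indices.size());
    for (std::uint8_t index : indices) {
        if (index < palette.size()) {
            result.pixels.push_back(palette[index]);
        } else {
            result.pixels.push_back(Rgb{255, 0, 255});
            ++result.out_of_range;
        }
    }
    return result;
}
```

A non-zero `out_of_range` count would point at the decompression or at a mismatched palette rather than at the image parser.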

@ogarvey
Author

ogarvey commented Mar 9, 2024

Also confirmed "working" so far. I'm missing palettes for these, so I'm not sure yet whether they suffer from the same colour issue as Tetris:

Zombie Dinos: [image]

Defender Of The Crown: [image]

Mystic Midway - Rest In Pieces: [image]

Not Working:

Lords Of The Rising Sun: [image]

@heinermann
Owner

heinermann commented Mar 10, 2024

Fixed the DOS algorithm, which now works on both Voyeur (DOS) and Voyeur (CD-i).

Also works on

  • Mystic Midway: Rest in Pieces
  • Mystic Midway: Phantom Express
  • Caesars World of Boxing
  • NFL Hall of Fame Football

I also implemented a later format. The N64 algorithm persists through the GameCube, PS2, and Xbox era, though newer versions have a tweaked data structure.

Compatibility noted in the doc: https://docs.google.com/spreadsheets/d/1uwPyACqCSqkVOeA5vrEsOaZVY9TY4EF0HKrv5iC4nwQ/edit#gid=0

@ogarvey
Author

ogarvey commented Mar 21, 2024

Been somewhat busy of late unfortunately, but I've been looking at the Ghidra disassembly again, and found the following string in the cdi application for Lords Of The Rising Sun: "BOLT Library Version 3.0 1-6-91 (Not ready for production)\r" which gave me quite the chuckle 😅

I haven't found the decompression methods yet, but I'll compare them to Tetris once I do and report back.

@heinermann
Owner

The method should be similar, but there's also a data structure associated with it: two pointers to function arrays followed by four null pointers. The array index corresponds to the file type value noted above.

(Varies slightly between versions but still appears in all of them)

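// Forward declarations so the snippet stands alone; the real definitions live elsewhere.
struct BOLTArchive;
struct BOLTEntry;

typedef void* (__cdecl* t_openfn)(BOLTArchive*);
typedef BOLTEntry* (__cdecl* t_closefn)(BOLTArchive*);
typedef void (__cdecl* t_unknownfn)(void); // placeholder: actual signature unknown

// Each function array is indexed by the file type value.
struct BOLTAccess {
  t_openfn* open_fns;
  t_closefn* close_fns;
  t_unknownfn* field_8; // rest are always null
  t_unknownfn* field_C;
  t_unknownfn* field_10;
  t_unknownfn* field_14;
};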
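
To show how a table like this could be used, here is a simplified sketch of dispatching on the file type. The types `Archive`, `OpenFn` and `AccessTable`, the handler functions, and the explicit `count` field are stand-ins invented for illustration. The real structure stores no length, and the real handlers are not reproduced here.

```cpp
#include <cstddef>
#include <cstdint>

// Simplified stand-ins for illustration only.
struct Archive {
    int opened_with = -1;  // records which handler ran
};
using OpenFn = void* (*)(Archive*);

struct AccessTable {
    const OpenFn* open_fns;  // array indexed by file type
    std::size_t count;       // assumption: added here to allow bounds checks
};

void* open_default(Archive* archive) { archive->opened_with = 0; return archive; }
void* open_type9(Archive* archive)   { archive->opened_with = 9; return archive; }

// Look up the handler for a file type; returns nullptr when none is registered.
void* open_entry(const AccessTable& table, std::uint8_t file_type, Archive* archive) {
    if (file_type >= table.count || table.open_fns[file_type] == nullptr) {
        return nullptr;
    }
    return table.open_fns[file_type](archive);
}
```

With a table in which most slots share one function and slot 9 points elsewhere, this mirrors the behaviour described earlier in the thread, where one file type is treated differently from the rest.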

@ogarvey
Author

ogarvey commented Mar 22, 2024

Interestingly, after finding and comparing the Decompress methods: despite claiming to use the same version of the library as Tetris, the LotRS method starts off the same but then seems to diverge from the Tetris one (around line 37):

https://gist.github.com/ogarvey/3da6912bc081de4c1f6054406ef98678

It may just be that Ghidra didn't do as good a job decompiling it, I guess, given the unrecovered jumptable it mentions.

@heinermann
Owner

Thanks I hate it. 😅
