Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Two V3000 export issues: implict H cause wrong index on s groups, and reversed order for COLLECTON / SGroup Blocks. #2109

Open
sapiosciences-dev opened this issue Jul 9, 2024 · 2 comments

Comments

@sapiosciences-dev
Copy link

Summary
Inside molfile_saver.cpp, function: void MolfileSaver::_writeCtab(Output& output, BaseMolecule& mol, bool query):

  1. Issue in V3000 implicit h toggle causing the the S group total count referenced in memory to be modified, causing a failure in parsing the S group when reading from RDKit as it thinks the file is corrupted.

  2. Issue in V3000 export of M V30 BEGIN COLLECTION block occuring before M V30 BEGIN SGROUP block, causing RDKit failing to parse the section.

Steps to Reproduce
Import the RxN . Export and import to RDKit. See Failures. The test file had been provided under attachment section.

Expected behavior/Actual Behavior

Issue 1

First on the issue of improper S group counting:

else if (add_implicit_h && Molecule::shouldWriteHCount(mol.asMolecule(), i) && hcount > 0)
        {
            int sg_idx;

            sg_idx = mol.sgroups.addSGroup(SGroup::SG_TYPE_DAT);
            DataSGroup& sgroup = (DataSGroup&)mol.sgroups.getSGroup(sg_idx);

            sgroup.atoms.push(i);

            QS_DEF(Array<char>, tmp_buf);
            ArrayOutput tmp_out(tmp_buf);
            tmp_out.printf("IMPL_H%d", hcount);
            tmp_buf.push(0);
            sgroup.data.readString(tmp_buf.ptr(), true);

            sgroup.name.readString("MRV_IMPLICIT_H", true);
            sgroup.detached = true;
        }

The section of the code above adds new s groups to mol.sgroups object while the execution is still in progress exporting the data, and more importantly, this was added AFTER the following line was already exported to the text file I/O:

    output.writeStringCR("M  V30 BEGIN CTAB");
    output.printfCR("M  V30 COUNTS %d %d %d 0 0", mol.vertexCount(), mol.edgeCount(), mol.countSGroups());
    output.writeStringCR("M  V30 BEGIN ATOM");

So, when implicit H is added, total S group counter written already in an earlier line had become incorrect.

Issue 2

Per this technical manual:
https://www.daylight.com/meetings/mug05/Kappler/ctfile.pdf
In page 85 to page 90, the order of BEGIN SGROPU and BEGIN COLLECTION is defined. RDKit follows this strict ordering while Indigo writes it in reverse. It causes toolkit compatibility issue when read by another tool.

The following block of code:

 MoleculeStereocenters& stereocenters = mol.stereocenters;

    if (stereocenters.begin() != stereocenters.end() || mol.hasHighlighting())
    {
        output.writeStringCR("M  V30 BEGIN COLLECTION");

        QS_DEF(Array<int>, processed);

        processed.clear_resize(mol.vertexEnd());
        processed.zerofill();

        for (i = mol.vertexBegin(); i != mol.vertexEnd(); i = mol.vertexNext(i))
        {
            if (processed[i])
                continue;

            ArrayOutput out(buf);
            int j, type = stereocenters.getType(i);

            if (type == MoleculeStereocenters::ATOM_ABS)
                out.writeString("MDLV30/STEABS ATOMS=(");
            else if (type == MoleculeStereocenters::ATOM_OR)
                out.printf("MDLV30/STEREL%d ATOMS=(", stereocenters.getGroup(i));
            else if (type == MoleculeStereocenters::ATOM_AND)
                out.printf("MDLV30/STERAC%d ATOMS=(", stereocenters.getGroup(i));
            else
                continue;

            QS_DEF(Array<int>, list);

            list.clear();
            list.push(i);

            for (j = mol.vertexNext(i); j < mol.vertexEnd(); j = mol.vertexNext(j))
                if (stereocenters.sameGroup(i, j))
                {
                    list.push(j);
                    processed[j] = 1;
                }

            out.printf("%d", list.size());
            for (j = 0; j < list.size(); j++)
                out.printf(" %d", _atom_mapping[list[j]]);
            out.writeChar(')');

            _writeMultiString(output, buf.ptr(), buf.size());
        }

        if (mol.hasHighlighting())
        {
            if (mol.countHighlightedBonds() > 0)
            {
                ArrayOutput out(buf);

                out.printf("MDLV30/HILITE BONDS=(%d", mol.countHighlightedBonds());

                for (i = mol.edgeBegin(); i != mol.edgeEnd(); i = mol.edgeNext(i))
                    if (mol.isBondHighlighted(i))
                        out.printf(" %d", _bond_mapping[i]);
                out.writeChar(')');

                _writeMultiString(output, buf.ptr(), buf.size());
            }
            if (mol.countHighlightedAtoms() > 0)
            {
                ArrayOutput out(buf);
                out.printf("MDLV30/HILITE ATOMS=(%d", mol.countHighlightedAtoms());
                for (i = mol.vertexBegin(); i != mol.vertexEnd(); i = mol.vertexNext(i))
                    if (mol.isAtomHighlighted(i))
                        out.printf(" %d", _atom_mapping[i]);
                out.writeChar(')');

                _writeMultiString(output, buf.ptr(), buf.size());
            }
        }
        if (mol.custom_collections.size() > 0)
        {
            for (i = mol.custom_collections.begin(); i != mol.custom_collections.end(); i = mol.custom_collections.next(i))
            {
                ArrayOutput out(buf);
                out.printf("%s", mol.custom_collections.at(i));
                _writeMultiString(output, buf.ptr(), buf.size());
            }
        }

        output.writeStringCR("M  V30 END COLLECTION");
    }

I think, should be moved to right AFTER

        output.writeStringCR("M  V30 END SGROUP");
    }

Attachments
stereo_reaction.zip

If applicable, add attachment files to reproduce the issue.

Additional context
Add any other context about the problem here.

@sapiosciences-dev
Copy link
Author

P.S.: I have confirmed with unit testing code that I made changes in locally tests/tests/molecule.cpp, that the code will export correctly when implicit H is force-disabled and order of export is reversed.

#include <unistd.h>
#include <fstream>
#include "molecule/molfile_saver.h"
#include "reaction/reaction.h"
#include "reaction/reaction_auto_loader.h"
#include "reaction/reaction_automapper.h"

TEST_F(IndigoCoreMoleculeTest, Reaction)
{
    Reaction reaction;
    {
        char dir[250];
        getcwd(dir, 250);
        cout << "Current directory: " << dir << "\n";
        ifstream reactionFile("../../data/reactions/other/stereo_reaction.rxn");
        string content;
        string line;
        while (getline(reactionFile, line))
        {
            content += line;
            content.push_back('\n');
        }
        reactionFile.close();
        //        cout << "The content is: \n" << content << "\n";
        cout << "Loading reaction...";
        loadReaction(content.c_str(), reaction);

        reaction.aromatize(AromaticityOptions(AromaticityOptions::GENERIC));
        ReactionAutomapper ram(reaction);
        ram.automap(0);

        for (int i = reaction.productBegin(); i < reaction.productEnd(); i = reaction.productNext(i))
        {
            cout << " *** Current Product Index is: " << i << " *** \n";
            Molecule& mol = reaction.getMolecule(i);
            string molOutStr;
            StringOutput molOut(molOutStr);
            MolfileSaver molSaver(molOut);
            molSaver.saveMolecule(mol);
            cout << molOutStr;
        }
    }
}
#include "reaction/reaction.h"
#include "reaction/reaction_auto_loader.h"

void IndigoCoreTest::loadReaction(const char* buf, Reaction& reaction)
{
    BufferScanner scanner(buf);
    ReactionAutoLoader loader(scanner);
    loader.loadReaction(reaction);
}

@sapiosciences-dev
Copy link
Author

sapiosciences-dev commented Jul 9, 2024

Actually Issue 1 seems to be fixed in 1.20 release. (We were on 1.12 branch, makes sense to upgrade now 👍 )
Issue #2 is still present but it was an easy fix on my side in C++ code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant