
number validation #277

Merged

Conversation

@Eyobyb (Collaborator) commented Jan 18, 2024

No description provided.

change prompt when number checker fail
@20001LastOrder self-requested a review January 24, 2024 05:14
@20001LastOrder (Collaborator) left a comment:

Thanks for the PR. Please see the comments below.

src/sherpa_ai/agents/qa_agent.py (resolved)
src/sherpa_ai/memory/belief.py (outdated; resolved)
src/sherpa_ai/memory/belief.py (outdated; resolved)
src/sherpa_ai/output_parsers/number_validation.py (outdated; resolved)
```python
@pytest.mark.parametrize(
    "objective , input_data, expected_numbers",
    [
        # (
```
@20001LastOrder (Collaborator) commented:

Are these commented inputs active? If not, they should be removed. I think they make the test case very hard to understand.

@Eyobyb (Collaborator, Author) replied:

They are active, but running them all at the same time causes them to fail, @20001LastOrder. So if you have a suggestion for a way to keep them, since they each cover a different aspect of the test, that would be nice.

@20001LastOrder replied:

Hmm, this is quite strange, because each test case is meant to be isolated. I'll try to allocate some time to see if I can figure out why this happens.
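
One way to keep all the cases while isolating the ones that only fail when run together is pytest's built-in `pytest.param` marks — a minimal sketch, using placeholder values rather than the PR's actual test data:

```python
import pytest

@pytest.mark.parametrize(
    "objective, input_data, expected_numbers",
    [
        # A case that passes reliably (placeholder values).
        ("count the items", "there are 3 items", ["3"]),
        # A case that fails only in the full suite: keep it in the file and
        # mark it xfail instead of commenting it out, so it still documents
        # the scenario without breaking the build.
        pytest.param(
            "sum the totals",
            "totals are 4 and 5",
            ["4", "5"],
            marks=pytest.mark.xfail(reason="fails when run with the full suite"),
        ),
    ],
)
def test_number_extraction(objective, input_data, expected_numbers):
    ...
```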

@20001LastOrder commented:

It seems the failed tests are related to #287, and the commits I pushed could probably fix this issue as well. However, there is still one test failing:

```python
            1,
            "on june how much cash does Sabio Delivers had?",
            (
                """Second Quarter 2023 Financial Highlights for Sabio Delivers
                Sabio delivered revenues of US$8.0M in Q2-2023, up 11% from US$7.2M in Q2-2022.
                CTV/OTT sales as a category increased by 57% to US$5.0 million, compared to US$3.2 million in the prior year's quarter. CTV/OTT sales accounted for 62% of the Company's sales mix, compared with 44% in the prior year's quarter.
                Mobile display revenues of US$2.9million in Q2-2023, down 24%, from US$3.9 million in Q2-2022, as our legacy mobile display campaigns continued to shift their spend with Sabio from mobile display to higher-margin mobile OTT streaming, which is recognized under the Company's CTV/OTT revenue category.
                Gross Profit of US$4.8 million in Q2-2023, up from US$4.3 million in Q2-2022. Gross Margin improved on a year-over-year basis, from 59% in Q2-2022 to 60% in the completed quarter. The increase is attributable to several efficiency and direct sales improvements within the CTV/OTT channel as well as our App Science business.
                Adjusted EBITDA1 loss of US$1.7 million in Q2-2023 compared to a loss of US$1.4 million in Q2-2022. The loss was primarily driven by overhead added during and subsequent to the second quarter of 2022, which included the continued expansion of our sales and marketing apparatus in the prior year and costs associated with transitioning our workforce back to the office. On a sequential basis, second quarter operating expenses, normalized for commissions, were flat in comparison to the first quarter of 2023 as cost efficiencies implemented by management offset incremental headcount additions to our salesforce to position ourselves for the 2024 U.S. elections.
                As of June 30, 2023, the Company had cash of US$1.7 million, as compared to US$2.4 million on June 30, 2022.`
                As of June 2023, the Company had US$6 million outstanding under its credit facility with Avidbank.""",
                [
                    {
                        "Document": "Sabio Delivers 11% Q2-2023 Revenue Growth, Led by 57% Increase in Connected TV/OTT Sales",
                        "Source": "https://www.sabioholding.com/press-releases/sabio-delivers-11-q2-2023-revenue-growth-led-by-57-increase-in-connected-tv-ott-sales",
                    }
                ],
            ),
            ["2.4", "1.4", "30", "2022", "2023", "1.7"],
        ),
```

It seems to be related to the way numbers are extracted. I got the following from the QA agent's response:

```
In June, Sabio Delivers had cash of US$1.7 million. This information is mentioned in the provided action-result history: "As of June 30, 2023, the Company had cash of US$1.7 million, as compared to US$2.4 million on June 30, 2022." [source](https://sabio-delivers.com/financial-highlights-q2-2023)
```

And these numbers were extracted:

```
['1.7', '30', '2023', '1.7', '2.4', '30', '2022', '2', '2023']
```

The error was that 2 was not in the resources. We should either find a way to fix this error or exclude this case from the tests if we are not handling it right now.
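
For illustration only — this is a plausible stand-in, not the extractor actually implemented in src/sherpa_ai/output_parsers/number_validation.py — a naive digit regex reproduces the stray 2 because it also matches inside the source URL's q2-2023:

```python
import re

# Plausible stand-in for the extraction step: match integers and decimals
# anywhere in the text, including inside URLs.
NUMBER_PATTERN = re.compile(r"\d+(?:\.\d+)?")

response = (
    "In June, Sabio Delivers had cash of US$1.7 million. This information "
    'is mentioned in the provided action-result history: "As of June 30, '
    "2023, the Company had cash of US$1.7 million, as compared to US$2.4 "
    'million on June 30, 2022." '
    "[source](https://sabio-delivers.com/financial-highlights-q2-2023)"
)

print(NUMBER_PATTERN.findall(response))
# ['1.7', '30', '2023', '1.7', '2.4', '30', '2022', '2', '2023']
# The '2' (and the final '2023') come from "q2-2023" in the URL, which is
# how a number can appear in the response without ever being in the resources.
```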

There were other errors I had to fix. Most of them arose because the number validation changed the flow of other tests. In general, for new features, we should parameterize whether the feature runs and default it to off, to avoid too much influence on other test cases. It is always good to run the whole suite with pytest to check whether a feature implementation broke other tests.
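
A minimal sketch of that gating pattern (the config object, flag name, and helper functions are hypothetical, not sherpa_ai's actual API):

```python
from dataclasses import dataclass

@dataclass
class AgentConfig:
    # New features default to off so they cannot silently change the flow
    # of existing tests; only tests covering the feature opt in.
    enable_number_validation: bool = False

def answer_question(question: str) -> str:
    # Placeholder for the existing QA flow.
    return f"stub answer to: {question}"

def validate_numbers(answer: str) -> str:
    # Placeholder for the new, opt-in validation step.
    return answer

def run_agent(question: str, config: AgentConfig) -> str:
    answer = answer_question(question)
    if config.enable_number_validation:
        answer = validate_numbers(answer)
    return answer

# Existing tests keep the default (validation off); the feature's own
# tests pass AgentConfig(enable_number_validation=True).
print(run_agent("how much cash?", AgentConfig()))
```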

@20001LastOrder linked an issue Feb 5, 2024 that may be closed by this pull request
@20001LastOrder self-requested a review February 7, 2024 04:00
@20001LastOrder (Collaborator) left a comment:

LGTM now. Let's get this merged as it blocks the hydra configuration and contains some other fixes.
