Augur 0.76.1 Release #2869

Merged · 47 commits · Jul 23, 2024
Commits
3ccd69d
Proper Gunicorn bringup detection
Ulincsys Jul 1, 2024
2be3c4d
Write controlling process ID to disk for retrieval
Ulincsys Jul 1, 2024
c43ca39
Merge pull request #2842 from chaoss/backend-detect-gunicorn-up
sgoggins Jul 2, 2024
0f114fe
Remove nulls from pr review message ref
ABrain7710 Jul 3, 2024
42fba8e
Add metadata to sub process exceptions
ABrain7710 Jul 3, 2024
2a30cbe
Simplify metadata exception
ABrain7710 Jul 3, 2024
f9557c5
Add github graphql data access'
ABrain7710 Jul 3, 2024
ce08e9d
Log required ouput in exception when dependency task fails
ABrain7710 Jul 4, 2024
65fbcfe
Migrate pr reviews to github data access
ABrain7710 Jul 4, 2024
3667071
Migrate pr review comments to github data access
ABrain7710 Jul 4, 2024
02fcff0
Simplify pr reviews
ABrain7710 Jul 4, 2024
94e1405
Fix some log levels since we are in the file
ABrain7710 Jul 4, 2024
67f95a9
Improve paginate_resource logic
ABrain7710 Jul 4, 2024
30350fc
Remove unused method
ABrain7710 Jul 4, 2024
9819296
Change log statements in contributor resolution to match new policy
IsaacMilarky Jul 8, 2024
48ad40b
Merge pull request #2860 from chaoss/add_metadata_to_subprocess_errors
IsaacMilarky Jul 8, 2024
204ebd0
Fix pylint
IsaacMilarky Jul 9, 2024
2316efe
Merge pull request #2859 from chaoss/pr_review_comments_fix
Ulincsys Jul 9, 2024
4e3ba93
Merge pull request #2864 from chaoss/github_data_access_pr_migration
ABrain7710 Jul 9, 2024
321ef0c
Merge pull request #2863 from chaoss/fix-pylint
ABrain7710 Jul 9, 2024
5e2b747
Merge pull request #2862 from chaoss/isaac-logging-changes
ABrain7710 Jul 9, 2024
d4fec6a
Move repo_info to new graphql endpoint
ABrain7710 Jul 13, 2024
e460b2c
Migrate pr files to github_graphql_data_access
ABrain7710 Jul 13, 2024
21e8700
Indent client'
ABrain7710 Jul 13, 2024
09207b6
Remove strip
ABrain7710 Jul 13, 2024
ce322ba
Define keys as list
ABrain7710 Jul 13, 2024
05195a9
Return data properly
ABrain7710 Jul 13, 2024
c107786
Add self
ABrain7710 Jul 15, 2024
68aa4ce
Fix syntax error
ABrain7710 Jul 15, 2024
afd1a19
Fix syntax error
ABrain7710 Jul 15, 2024
c92e1ef
Fixes
ABrain7710 Jul 15, 2024
526f085
Raise from original exception
ABrain7710 Jul 15, 2024
03fa6c6
Make annoying info log a debug
ABrain7710 Jul 15, 2024
1cfcd38
Merge pull request #2861 from chaoss/graphql_refactor
sgoggins Jul 16, 2024
ea8d0a4
updating version
sgoggins Jul 17, 2024
fc664fc
Merge pull request #2870 from chaoss/dev
sgoggins Jul 17, 2024
ccbc822
Address release bugs
ABrain7710 Jul 22, 2024
25cc00c
Merge pull request #2872 from chaoss/release-fixes
sgoggins Jul 22, 2024
0205e13
Raise exceptions on graphql errors
ABrain7710 Jul 22, 2024
1c8839f
Remove comment
ABrain7710 Jul 23, 2024
a951d13
Merge pull request #2874 from chaoss/release-fixes
sgoggins Jul 23, 2024
208148c
Fix syntax error
ABrain7710 Jul 23, 2024
009687e
Catch not found for prs
ABrain7710 Jul 23, 2024
0b40bbb
Throw exception when data is none
ABrain7710 Jul 23, 2024
4529a91
try this
ABrain7710 Jul 23, 2024
54fd6fa
Add variables
ABrain7710 Jul 23, 2024
0210296
Handle case where api returns None
ABrain7710 Jul 23, 2024
1 change: 1 addition & 0 deletions .gitignore
@@ -9,6 +9,7 @@ augur_export_env.sh
!docker.config.json
config.yml
reports.yml
*.pid

node_modules/
.idea/
36 changes: 3 additions & 33 deletions .pylintrc
@@ -3,16 +3,6 @@
# go here to check pylint codes if not explained
#https://vald-phoenix.github.io/pylint-errors/

#doc string checkers
#enable=C0112,C0114,C0115,C0116

# checks for black listed names being used
#enable=C0102

#refactoring checker
#enable=R

disable=E0611,E1101,W1203,R0801,W0614,W0611,C0411,C0103,C0301,C0303,C0304,C0305,W0311,E0401,C0116


# Analyse import fallback blocks. This can be used to support both Python 2 and
@@ -150,29 +140,9 @@ confidence=HIGH,
INFERENCE_FAILURE,
UNDEFINED

# Disable the message, report, category or checker with the given id(s). You
# can either give multiple identifiers separated by comma (,) or put this
# option multiple times (only on the command line, not in the configuration
# file where it should appear only once). You can also use "--disable=all" to
# disable everything first and then re-enable specific checks. For example, if
# you want to run only the similarities checker, you can use "--disable=all
# --enable=similarities". If you want to run only the classes checker, but have
# no Warning level messages displayed, use "--disable=all --enable=classes
# --disable=W".
disable=raw-checker-failed,
bad-inline-option,
locally-disabled,
file-ignored,
suppressed-message,
useless-suppression,
deprecated-pragma,
use-symbolic-message-instead

# Enable the message, report, category or checker with the given id(s). You can
# either give multiple identifier separated by comma (,) or put this option
# multiple time (only on the command line, not in the configuration file where
# it should appear only once). See also the "--disable" option for examples.
enable=c-extension-no-member
# Only enable specific messages
disable=all
enable=unused-import,redefined-outer-name,E1206,E1205,E0704,E0107,E4702,E1101,E0211,E0213,E0103,E1133,E1120,E3102,E0602,E1123,E0001,W0702,W1404,W0706,W0101,W0120,W0718,R1737,R1705,R1720,R1724,R1723,R0401,R1701,C1802,C0200,C0501,C0201,W1001,E1102,R0923


[LOGGING]
4 changes: 2 additions & 2 deletions README.md
@@ -1,4 +1,4 @@
# Augur NEW Release v0.76.0
# Augur NEW Release v0.76.1

Augur is primarily a data engineering tool that makes it possible for data scientists to gather open source software community data. Less data carpentry for everyone else!
The primary way of looking at Augur data is through [8Knot](https://github.com/oss-aspen/8knot) ... A public instance of 8Knot is available at https://metrix.chaoss.io ... That is tied to a public instance of Augur at https://ai.chaoss.io
@@ -10,7 +10,7 @@ The primary way of looking at Augur data is through [8Knot](https://github.com/o
## NEW RELEASE ALERT!
### [If you want to jump right in, updated docker build/compose and bare metal installation instructions are available here](docs/new-install.md)

Augur is now releasing a dramatically improved new version to the main branch. It is also available here: https://github.com/chaoss/augur/releases/tag/v0.76.0
Augur is now releasing a dramatically improved new version to the main branch. It is also available here: https://github.com/chaoss/augur/releases/tag/v0.76.1

- The `main` branch is a stable version of our new architecture, which features:
- Dramatic improvement in the speed of large scale data collection (100,000+ repos). All data is obtained for 100k+ repos within 2 weeks.
9 changes: 8 additions & 1 deletion augur/api/routes/util.py
@@ -5,13 +5,20 @@
import sqlalchemy as s
import pandas as pd
import json
from flask import Response, current_app
from flask import Response, current_app, jsonify

from augur.application.db.lib import get_value
from augur.application.logs import AugurLogger

logger = AugurLogger("augur").get_logger()

@app.route("/api")
def get_api_version():
return jsonify({
"status": "up",
"route": AUGUR_API_VERSION
})

@app.route('/{}/repo-groups'.format(AUGUR_API_VERSION))
def get_all_repo_groups(): #TODO: make this name automatic - wrapper?
repoGroupsSQL = s.sql.text("""
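The `/api` route added above gives the CLI a cheap, unauthenticated readiness probe that reports the API version. A stdlib-only sketch of the same endpoint (using `http.server` in place of Flask, so the shape of the JSON is the point; the version string here is a placeholder, not Augur's real value):

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

# Placeholder: the real value comes from Augur's own constants.
AUGUR_API_VERSION = "api/unstable"

class ApiHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/api":
            # Same payload shape as the new Flask route: status + route.
            body = json.dumps({"status": "up", "route": AUGUR_API_VERSION}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        # Silence the default per-request logging for this sketch.
        pass

if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), ApiHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    with urlopen(f"http://127.0.0.1:{port}/api") as resp:
        print(json.load(resp)["status"])  # prints "up"
    server.shutdown()
```

Because the route needs no auth and touches no database, it is safe to poll it in a tight loop during startup, which is exactly what the CLI change below relies on.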
29 changes: 24 additions & 5 deletions augur/application/cli/backend.py
@@ -10,9 +10,10 @@
import logging
import psutil
import signal
from redis.exceptions import ConnectionError as RedisConnectionError
import uuid
import traceback
import requests
from redis.exceptions import ConnectionError as RedisConnectionError
from urllib.parse import urlparse

from augur.tasks.start_tasks import augur_collection_monitor, create_collection_status_records
@@ -38,14 +39,17 @@
@cli.command("start")
@click.option("--disable-collection", is_flag=True, default=False, help="Turns off data collection workers")
@click.option("--development", is_flag=True, default=False, help="Enable development mode, implies --disable-collection")
@click.option("--pidfile", default="main.pid", help="File to store the controlling process ID in")
@click.option('--port')
@test_connection
@test_db_connection
@with_database
@click.pass_context
def start(ctx, disable_collection, development, port):
def start(ctx, disable_collection, development, pidfile, port):
"""Start Augur's backend server."""

with open(pidfile, "w") as pidfile:
pidfile.write(str(os.getpid()))

try:
if os.environ.get('AUGUR_DOCKER_DEPLOY') != "1":
raise_open_file_limit(100000)
@@ -75,9 +79,25 @@
gunicorn_command = f"gunicorn -c {gunicorn_location} -b {host}:{port} augur.api.server:app --log-file gunicorn.log"
server = subprocess.Popen(gunicorn_command.split(" "))

time.sleep(3)
logger.info("awaiting Gunicorn start")
while not server.poll():
try:
api_response = requests.get(f"http://{host}:{port}/api")
except requests.exceptions.ConnectionError as e:
time.sleep(0.5)
continue

if not api_response.ok:
logger.critical("Gunicorn failed to start or was not reachable. Exiting")
exit(247)
break
else:
logger.critical("Gunicorn was shut down abnormally. Exiting")
exit(247)

logger.info('Gunicorn webserver started...')
logger.info(f'Augur is running at: {"http" if development else "https"}://{host}:{port}')
logger.info(f"The API is available at '{api_response.json()['route']}'")

processes = start_celery_worker_processes(float(worker_vmem_cap), disable_collection)

[pylint] reported by reviewdog: W0621: Redefining name 'processes' from outer scope (line 386) (redefined-outer-name)
@@ -91,7 +111,6 @@
celery_beat_process = subprocess.Popen(celery_command.split(" "))

if not disable_collection:

with DatabaseSession(logger, engine=ctx.obj.engine) as session:

clean_collection_status(session)
@@ -201,7 +220,7 @@
"""
Sends SIGTERM to all Augur server & worker processes
"""
logger = logging.getLogger("augur.cli")

[pylint] augur/application/cli/backend.py:223:4: W0621: Redefining name 'logger' from outer scope (line 31) (redefined-outer-name)

augur_stop(signal.SIGTERM, logger, ctx.obj.engine)

@@ -214,11 +233,11 @@
"""
Sends SIGKILL to all Augur server & worker processes
"""
logger = logging.getLogger("augur.cli")

[pylint] augur/application/cli/backend.py:236:4: W0621: Redefining name 'logger' from outer scope (line 31) (redefined-outer-name)
augur_stop(signal.SIGKILL, logger, ctx.obj.engine)


def augur_stop(signal, logger, engine):

[pylint] augur/application/cli/backend.py:240:15: W0621: Redefining name 'signal' from outer scope (line 12) (redefined-outer-name)
[pylint] augur/application/cli/backend.py:240:23: W0621: Redefining name 'logger' from outer scope (line 31) (redefined-outer-name)
"""
Stops augur with the given signal,
and cleans up collection if it was running
@@ -234,7 +253,7 @@
cleanup_after_collection_halt(logger, engine)


def cleanup_after_collection_halt(logger, engine):

[pylint] augur/application/cli/backend.py:256:34: W0621: Redefining name 'logger' from outer scope (line 31) (redefined-outer-name)
clear_redis_caches()

connection_string = get_value("RabbitMQ", "connection_string")
@@ -383,7 +402,7 @@
pass
return augur_processes

def _broadcast_signal_to_processes(processes, broadcast_signal=signal.SIGTERM, given_logger=None):

[pylint] augur/application/cli/backend.py:405:35: W0621: Redefining name 'processes' from outer scope (line 386) (redefined-outer-name)
if given_logger is None:
_logger = logger
else:
4 changes: 2 additions & 2 deletions augur/application/db/lib.py
@@ -17,7 +17,7 @@

logger = logging.getLogger("db_lib")

def convert_type_of_value(config_dict, logger=None):

[pylint] augur/application/db/lib.py:20:39: W0621: Redefining name 'logger' from outer scope (line 18) (redefined-outer-name)


data_type = config_dict["type"]
@@ -177,7 +177,7 @@

try:
working_commits = fetchall_data_from_sql_text(query)
except:

[pylint] augur/application/db/lib.py:180:4: W0702: No exception type(s) specified (bare-except)
working_commits = []

return working_commits
@@ -197,7 +197,7 @@
return session.query(CollectionStatus).filter(getattr(CollectionStatus,f"{collection_type}_status" ) == CollectionState.COLLECTING.value).count()


def facade_bulk_insert_commits(logger, records):

[pylint] augur/application/db/lib.py:200:31: W0621: Redefining name 'logger' from outer scope (line 18) (redefined-outer-name)

with get_session() as session:

@@ -239,7 +239,7 @@
raise e


def bulk_insert_dicts(logger, data: Union[List[dict], dict], table, natural_keys: List[str], return_columns: Optional[List[str]] = None, string_fields: Optional[List[str]] = None, on_conflict_update:bool = True) -> Optional[List[dict]]:

[pylint] augur/application/db/lib.py:242:22: W0621: Redefining name 'logger' from outer scope (line 18) (redefined-outer-name)

if isinstance(data, list) is False:

@@ -249,15 +249,15 @@
data = [data]

else:
logger.info("Data must be a list or a dict")
logger.error("Data must be a list or a dict")
return None

if len(data) == 0:
# self.logger.info("Gave no data to insert, returning...")
return None

if isinstance(data[0], dict) is False:
logger.info("Must be list of dicts")
logger.error("Must be list of dicts")
return None

# remove any duplicate data
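The `bulk_insert_dicts` tweak above promotes bad-input messages from `info` to `error`, so rejected payloads stay visible at default log levels. The surrounding normalize-then-validate pattern can be sketched on its own (the function name is illustrative; `bulk_insert_dicts` does this inline before deduplicating and inserting):

```python
import logging
from typing import List, Optional, Union

logger = logging.getLogger("db_lib")

def normalize_records(data: Union[List[dict], dict]) -> Optional[List[dict]]:
    """Coerce a single dict to a one-element list; reject anything else.

    Returns None when the payload is unusable, logging at ERROR rather
    than INFO so the rejection is not silently filtered out.
    """
    if not isinstance(data, list):
        if isinstance(data, dict):
            data = [data]
        else:
            logger.error("Data must be a list or a dict")
            return None
    if len(data) == 0:
        # Nothing to insert; callers treat None as "skip".
        return None
    if not isinstance(data[0], dict):
        logger.error("Must be list of dicts")
        return None
    return data
```

Returning `None` instead of raising keeps collection tasks alive when one record batch is malformed, while the ERROR-level log still surfaces the problem.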
72 changes: 35 additions & 37 deletions augur/tasks/git/dependency_tasks/core.py
@@ -7,6 +7,7 @@
from augur.tasks.util.worker_util import parse_json_from_subprocess_call
from augur.tasks.git.util.facade_worker.facade_worker.utilitymethods import get_absolute_repo_path
from augur.tasks.github.util.github_random_key_auth import GithubRandomKeyAuth

[pylint] reported by reviewdog: W0611: Unused GithubRandomKeyAuth imported from augur.tasks.github.util.github_random_key_auth (unused-import)
from augur.tasks.util.metadata_exception import MetadataException


def generate_deps_data(logger, repo_git):
@@ -94,50 +95,47 @@ def generate_scorecard(logger, repo_git):

try:
required_output = parse_json_from_subprocess_call(logger,['./scorecard', command, '--format=json'],cwd=path_to_scorecard)
except Exception as e:
logger.error(f"Could not parse required output! Error: {e}")
raise e

# end

logger.info('adding to database...')
logger.debug(f"output: {required_output}")
logger.info('adding to database...')
logger.debug(f"output: {required_output}")

if not required_output['checks']:
logger.info('No scorecard checks found!')
return

#Store the overall score first
to_insert = []
overall_deps_scorecard = {
'repo_id': repo_id,
'name': 'OSSF_SCORECARD_AGGREGATE_SCORE',
'scorecard_check_details': required_output['repo'],
'score': required_output['score'],
'tool_source': 'scorecard_model',
'tool_version': '0.43.9',
'data_source': 'Git',
'data_collection_date': datetime.now().strftime('%Y-%m-%dT%H:%M:%SZ')
}
to_insert.append(overall_deps_scorecard)
# bulk_insert_dicts(overall_deps_scorecard, RepoDepsScorecard, ["repo_id","name"])

#Store misc data from scorecard in json field.
for check in required_output['checks']:
repo_deps_scorecard = {
if not required_output['checks']:
logger.info('No scorecard checks found!')
return

#Store the overall score first
to_insert = []
overall_deps_scorecard = {
'repo_id': repo_id,
'name': check['name'],
'scorecard_check_details': check,
'score': check['score'],
'name': 'OSSF_SCORECARD_AGGREGATE_SCORE',
'scorecard_check_details': required_output['repo'],
'score': required_output['score'],
'tool_source': 'scorecard_model',
'tool_version': '0.43.9',
'data_source': 'Git',
'data_collection_date': datetime.now().strftime('%Y-%m-%dT%H:%M:%SZ')
}
to_insert.append(repo_deps_scorecard)

bulk_insert_dicts(logger, to_insert, RepoDepsScorecard, ["repo_id","name"])

logger.info(f"Done generating scorecard for repo {repo_id} from path {path}")
to_insert.append(overall_deps_scorecard)
# bulk_insert_dicts(overall_deps_scorecard, RepoDepsScorecard, ["repo_id","name"])

#Store misc data from scorecard in json field.
for check in required_output['checks']:
repo_deps_scorecard = {
'repo_id': repo_id,
'name': check['name'],
'scorecard_check_details': check,
'score': check['score'],
'tool_source': 'scorecard_model',
'tool_version': '0.43.9',
'data_source': 'Git',
'data_collection_date': datetime.now().strftime('%Y-%m-%dT%H:%M:%SZ')
}
to_insert.append(repo_deps_scorecard)

bulk_insert_dicts(logger, to_insert, RepoDepsScorecard, ["repo_id","name"])

logger.info(f"Done generating scorecard for repo {repo_id} from path {path}")

except Exception as e:

raise MetadataException(e, f"required_output: {required_output}")
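`generate_scorecard` now wraps its whole body so any failure is re-raised as a `MetadataException` carrying the partial `required_output` for debugging. A sketch of that wrap-with-context pattern (this `MetadataException` is a simplified stand-in for the class the PR adds, and `run_with_context` is an illustrative helper, not Augur code):

```python
class MetadataException(Exception):
    """Simplified stand-in: wraps an original error plus the context
    (e.g. partial task output) that was in hand when it was raised."""

    def __init__(self, original: Exception, metadata: str):
        self.original = original
        self.metadata = metadata
        super().__init__(f"{original} | context: {metadata}")

def run_with_context(task, context_label):
    # Mirrors the pattern in generate_scorecard: run the task, and on any
    # failure re-raise with whatever partial result exists attached.
    result = None
    try:
        result = task()
        return result
    except Exception as e:
        raise MetadataException(e, f"{context_label}: {result}") from e
```

Chaining with `raise ... from e` (also added earlier in this PR, commit 526f085) preserves the original traceback while the metadata string tells the operator what the task had produced before it failed.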