Temporary PR to integrate PDL's mass of bug fixes and enhancements #148

Open: wants to merge 132 commits into base: master
c7f5664
When considering a vm for a job, ask aws if it is still running.
Aug 11, 2017
4dcbbb4
When number of jobs is larger than the number of available vms, N,
Aug 11, 2017
6de4692
Add pid in the logs of the modules used by jobManager.
Aug 11, 2017
e2afe8a
When job manager restarts, it empties its vm "total" pool and "free"
Aug 11, 2017
97c22e3
Add ability to pull amis from aws and to extract a tag "Name" as the i…
Aug 11, 2017
e66551a
Remove DEFAULT_AMI since amis are automatically loaded from aws now.
Aug 11, 2017
94656b7
Add script to drive lab submissions into tango.
Aug 15, 2017
253cb8e
Add scripts to access ec2 and redis.
Aug 15, 2017
45d5fc1
remove trailing lines.
Aug 15, 2017
12139e1
Check if output file exists before comparing modification time.
Aug 16, 2017
d8d0f65
resetTango should only be called from jobManager. Remove the call fro…
Aug 18, 2017
9043e3a
Add and subtract some logging.
Aug 18, 2017
4a6bba9
Fix a condition for running all students' jobs.
Aug 18, 2017
b92ffba
Use tangoHostPort to distinguish multiple tango containers on the sam…
Aug 22, 2017
75ca36d
Add logging in destroyVM.
Aug 22, 2017
93e60ad
Modified pool allocation logic to 1) not to allocate all vms allowed
Aug 23, 2017
76157c3
When job manager restarts, it now destroys the vm instances that not
Aug 30, 2017
7674945
Improve tool script that exercises job manager and the code beneath.
Aug 30, 2017
cbe01c2
Add ability to submit jobs for a given list of students.
Aug 30, 2017
46ceb59
Fix incomplete test script
xyzisinus Aug 30, 2017
7fef985
Check if the vm still exists before terminating.
Aug 31, 2017
0febf75
Check output file for the missing "scores:" line and
Aug 31, 2017
5b2bda8
Fix typos that prevent job manager from starting.
Sep 1, 2017
4ce9534
Improvements to run_jobs: ability to run failed submissions, to dry r…
Sep 5, 2017
0296460
Add a separate config file for run_jobs to separate config settings
Sep 5, 2017
b1aa4ba
Separate "class Config" from util.py.
Sep 5, 2017
9be308f
Move student submission range into config file.
Sep 5, 2017
a9e2983
Correct a typo.
Sep 5, 2017
e0b5253
Better check for output files with missing scores.
Sep 6, 2017
79b7ead
remove trailing spaces.
Sep 6, 2017
2176dfe
Move logging init in TangoServer to the beginning to capture all
Sep 7, 2017
a5bc482
1. Add ability to shrink preallocated pool. 2. preliminary code to at…
Sep 14, 2017
b3a076f
Add ability to test decrementing pool size
Sep 14, 2017
d896b36
Add vm pool low water mark.
Sep 15, 2017
7805577
test for existence of the new config variables for backward compati…
Sep 15, 2017
aaa07eb
Fix a condition that checks low_water_mark config variable.
Oct 11, 2017
965b281
A useful suggestive setting for pool low water mark.
Oct 11, 2017
e2acaaa
Merge branch 'master' of https://github.com/xyzisinus/Tango
Oct 11, 2017
7432c4b
Add -j flag to list all jobs. Exam output file in next iteration of
Oct 12, 2017
d2c86e3
When low water mark is zero, the free_destroy vm op works now.
Oct 12, 2017
9736236
Add option to direct query to different redis servers with port number.
Oct 13, 2017
56c71c1
Sort the lists of vms and pool items.
Oct 13, 2017
484f435
Wait a bit after requesting an instance from aws. also allows a redis
Oct 13, 2017
95ad616
Add comments.
Oct 13, 2017
04e8176
Fix a problem when failed job is last finished, it's not recognized.
Nov 7, 2017
2544f1c
checkpoint. 1. fix a bug in finding the last submission. 2. scan subm…
Nov 13, 2017
caefb9a
report inconsistency between current results and existing.
Nov 16, 2017
906bfa5
Recover the old content of the config file after a mistaken commit.
Nov 16, 2017
3e0cd38
When a job is done, report inconsistency between existing result and …
Nov 16, 2017
b4e7123
Now about to print interesting attributes of each instance.
Nov 17, 2017
9f18890
Only look at running instances.
Nov 17, 2017
7580354
print instance launch time in local zone.
Nov 18, 2017
27fe73b
Save the code of tag changing. Its mission is finished now the
Nov 18, 2017
14251e9
Add ability to KEEP_VM_AFTER_FAILURE.
Nov 18, 2017
1b1ef28
disable a testing statement in the ec2 tool.
Nov 27, 2017
ebafb72
Move timeout report to the right place. add duration reporting.
Nov 29, 2017
39c02fc
temporarily commit experimental file before moving its useful parts …
Dec 4, 2017
c1de4c0
Add config for autodriver logging.
Dec 12, 2017
a715788
checkpoint before adding timestamp insertion into user output.
Dec 12, 2017
fb0d78b
checkpoint. Add timekeeping thread. change printing macros.
Dec 18, 2017
364ae40
checkpoint. code complete but need to move file creation to parent p…
Dec 22, 2017
3c0dd2f
checkpoint autodriver, after file creation being moved to parent process
Jan 17, 2018
0cefe52
Cleanup in autodriver
Jan 23, 2018
6f429ef
autodriver should not exit for minor errors
Jan 24, 2018
95f8574
Ready to be made into an image.
Jan 26, 2018
dc22fd5
Add an output generator to help test autodriver.
Jan 26, 2018
4b1439a
Add pthread lib into build.
Jan 26, 2018
bd17429
remove old experiment file.
Jan 26, 2018
daaa912
Add autodriver timestamp interval in tango config.
Jan 30, 2018
9812622
Remove redundant wording in error messages.
Feb 1, 2018
7351917
chown output file to autograde. notify child thread job finish.
Feb 2, 2018
1972a17
Add new arguments for autodriver. Remove the confusing term autograder.
Feb 2, 2018
caac9b4
Streamline logging in worker's important run() function.
Feb 9, 2018
c47d889
worker's job exec (copyin/run/copyout) logic flow cleanup complete.
Feb 14, 2018
9565275
jobs with big ids starve after job id wraps. Fix with a timestamp.
Feb 22, 2018
c80a5d3
Disable wrap-around of vm ids.
Feb 22, 2018
6c03416
Move timestamping into appendTrace function and use local time.
Feb 26, 2018
b5730f3
Remove unused function descrementPoolSize
Feb 28, 2018
f8d3800
Cleanup script and add ability to terminate aws instances.
Feb 28, 2018
c4b98b2
Change boolean variable name to avoid the word "not".
Mar 12, 2018
72c29d7
Remove reference to aws auto scaling group, also fix a problem with t…
Mar 12, 2018
ae1751e
make autodriver config variables optional for backward compatibility.
Mar 12, 2018
2c1eac8
Add missing config variable and comments.
Mar 12, 2018
c8fafd0
Add a couple of git ignore files.
Mar 12, 2018
fcc8777
Better comments for run_job scripts.
Mar 20, 2018
c58c87b
ec2Read can delete vms with name tag matching at the beginning or the…
Apr 9, 2018
70c1ed1
more comments, isolate timestamp insertion for easy understanding of …
Apr 17, 2018
d27fb52
Add info about the HostPort config variables.
Apr 18, 2018
c2e1a61
Better error report in case of write failure.
Apr 24, 2018
b880160
Change redis port name ec2Read to match the config variable.
Apr 26, 2018
c1aa0ce
Fix a buffer over-read problem in autodriver.c.
May 14, 2018
7580ccf
Tougher autodriver unit test -- random length long lines.
May 14, 2018
9c9b5e5
Add -l (list instances and pools) and -e (empty pools) to ec2Read.
Jun 12, 2018
4cd4f0a
First round of converting all ec2 code to using boto3
Jun 22, 2018
559f7aa
Clean up the tool that experiments on aws instances and tango vms.
Jun 29, 2018
9c1acfa
Cleanup the ec2 tool script.
Jul 2, 2018
13f7363
ec2 tool script uses boto3 API consistent with Tango code.
Jul 5, 2018
788ee1d
consolidate boto3 connection naming, make aws access id/key work agai…
Jul 9, 2018
2b0ea16
Use boto3 resource API to process images for consistent use of boto3 …
Jul 10, 2018
ce3f9a4
In detachVM, remove the replace_vm feature which allows a "bad" vm (os
Jul 13, 2018
721d6c2
destroyVM doesn't return a value for other vmms modules. remove from…
Jul 16, 2018
1ffa4ae
Rework exception catch for vmms/ec2 module.
Jul 16, 2018
2ba0f33
reindent to use 4 spaces.
Jul 17, 2018
9028de0
Tell autolab web server Tango's timezone and offset from UTC.
Jul 31, 2018
a068e15
Add OVERRIDE_INST_TYPE to force the aws instance type when necessary.
Aug 6, 2018
da04503
Add ability to create aws instance with specified instance type.
Aug 13, 2018
cdf8811
rework tango's reset code. It fails to remove existing vms on clean …
Aug 16, 2018
d6ee44d
getVMs for ec2 should return only vms that belong to Tango (with pr…
Aug 16, 2018
fe00e97
Rename TangoMachine's name property to pool. vmms/ec2 is done.
Aug 22, 2018
f9e03c2
remove bad references to locations.
Aug 28, 2018
88ac6e1
log file copied from vm is now readable to all.
Aug 30, 2018
77a26fa
Add script to check the health of Tango.
xyzisinus Nov 15, 2018
09bca0a
check in a working version of check_jobs
xyzisinus Nov 19, 2018
d24fdb4
fix: timezone issue, default instance type not observed, undefined va…
Nov 20, 2018
6b6de4c
Fix check_jobs (cron job in production Autolab): comment its purpose,
xyzisinus Aug 14, 2019
8f0ed66
Add README for autodriver
Aug 20, 2019
7a18dbc
time.localtime().tm_isdst gives the correct answer to daylight saving.
xyzisinus Sep 3, 2019
48568e9
Deal with exception inside exception handler of initializeVM.
xyzisinus Sep 3, 2019
0b1b344
Clean existing output files for those student jobs selected to run.
xyzisinus Sep 3, 2019
a49c447
Commit changes to build/admin files.
Sep 9, 2019
42fb51b
remove the unusual redis port mapping from run_jobs config file.
Sep 9, 2019
ae2799f
Make redis port available outside the container
Sep 9, 2019
4e6c345
Swap the order of waiting from pending to running and instance tagging.
Oct 8, 2019
defb37f
Add timed cleanup for untagged stale vms. Add tests
Oct 9, 2019
3d60ae0
Use proper timezone to log instance launch time.
Oct 9, 2019
50f31a5
Add exception handling to cleanup function.
Oct 9, 2019
870757f
Add python pytz package.
Oct 9, 2019
fc2dec6
Manually start the cleanup function from test script.
Oct 9, 2019
0ba4bae
ec2SSH is activated from multiple tango services, probably unnecessar…
Oct 9, 2019
d406a13
Remove tool's reference to redis and close redis's open door
Oct 18, 2019
96c9830
dump the content of an email in 2018 in a file.
xyzisinus Nov 9, 2019
a0a7434
Add the timestamp in the "major fixes" file.
xyzisinus Nov 9, 2019
3 changes: 3 additions & 0 deletions .gitignore
@@ -9,6 +9,9 @@ vmms/id_rsa*
courselabs/*
# config
config.py
output_gen
.gitignore


# Virtualenv
.Python
42 changes: 19 additions & 23 deletions Dockerfile
@@ -1,34 +1,27 @@
# Start with empty ubuntu machine
FROM ubuntu:15.04
FROM ubuntu

MAINTAINER Autolab Development Team "autolab-dev@andrew.cmu.edu"

# Setup correct environment variable
ENV HOME /root

# Change to working directory
WORKDIR /opt

# Move all code into Tango directory
ADD . TangoService/Tango/
WORKDIR /opt/TangoService/Tango
RUN mkdir volumes

WORKDIR /opt
RUN mkdir -p /opt/TangoFiles/volumes /opt/TangoFiles/courselabs /opt/TangoFiles/output

# Install dependencies
RUN apt-get update && apt-get install -y \
nginx \
curl \
git \
iputils-ping \
vim \
supervisor \
python-pip \
python-dev \
build-essential \
tcl8.5 \
wget \
libgcrypt11-dev \
libgcrypt11-dev \
zlib1g-dev \
apt-transport-https \
ca-certificates \
@@ -38,13 +31,10 @@ RUN apt-get update && apt-get install -y \
&& rm -rf /var/lib/apt/lists/*

# Install Redis
WORKDIR /opt
RUN wget http://download.redis.io/releases/redis-stable.tar.gz && tar xzf redis-stable.tar.gz
WORKDIR /opt/redis-stable
RUN make && make install
WORKDIR /opt/TangoService/Tango/

# Install Docker from Docker Inc. repositories.
RUN curl -sSL https://get.docker.com/ | sh
RUN make && make install

# Install the magic wrapper.
ADD ./wrapdocker /usr/local/bin/wrapdocker
@@ -53,23 +43,29 @@ RUN chmod +x /usr/local/bin/wrapdocker
# Define additional metadata for our image.
VOLUME /var/lib/docker

# Create virtualenv to link dependencies
# Install python dependencies
ADD ./requirements.txt /opt/TangoFiles/requirements.txt
WORKDIR /opt/TangoFiles
RUN pip install virtualenv && virtualenv .
# Install python dependencies
RUN pip install -r requirements.txt
RUN pip install pytz

RUN mkdir -p /var/log/docker /var/log/supervisor

# Move custom config file to proper location
RUN cp /opt/TangoService/Tango/deployment/config/nginx.conf /etc/nginx/nginx.conf
RUN cp /opt/TangoService/Tango/deployment/config/supervisord.conf /etc/supervisor/supervisord.conf
RUN cp /opt/TangoService/Tango/deployment/config/redis.conf /etc/redis.conf
ADD ./deployment/config/nginx.conf /etc/nginx/nginx.conf
ADD ./deployment/config/supervisord.conf /etc/supervisor/supervisord.conf
ADD ./deployment/config/redis.conf /etc/redis.conf

#JMB added for EC2 config
ADD ./deployment/config/boto.cfg /etc/boto.cfg
ADD ./deployment/config/746-autograde.pem /root/746-autograde.pem
RUN chmod 600 /root/746-autograde.pem

# Reload new config scripts
CMD ["/usr/bin/supervisord", "-c", "/etc/supervisor/supervisord.conf"]


# TODO:
# TODO:
# volumes dir in root dir, supervisor only starts after calling start once, nginx also needs to be started
# Different log numbers for two different tangos
# what from nginx forwards requests to tango
1 change: 1 addition & 0 deletions autodriver/Makefile
@@ -1,5 +1,6 @@
CC = gcc
CFLAGS = -W -Wall -Wextra
LDFLAGS = -pthread

OBJS = autodriver.o

25 changes: 25 additions & 0 deletions autodriver/README
@@ -0,0 +1,25 @@
To build a grading vm image for Autolab jobs:

Create a vm with a stock linux image
Copy autodriver.c and Makefile to the vm and compile them into autodriver
Copy autodriver to any common path and make it owned by root with the setuid bits set
For example: -rwsr-sr-x 1 root root /usr/bin/autodriver
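The install step above can be verified programmatically. A minimal sketch, assuming a POSIX system: the helper name is_setuid_root is hypothetical and not part of autodriver. It uses stat() to confirm the file is a regular file owned by uid 0 with the set-user-ID bit (S_ISUID) set; it does not check the setgid bit that also appears in the example listing.

```c
#include <stdbool.h>
#include <sys/stat.h>

/* Hypothetical helper (not from autodriver.c): returns true if `path` is a
 * regular file owned by root (uid 0) with the set-user-ID bit set.
 * Returns false if the file is missing, not a regular file, or differs. */
bool is_setuid_root(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return false;               /* missing or not accessible */
    if (!S_ISREG(st.st_mode))
        return false;               /* e.g. a directory */
    return st.st_uid == 0 && (st.st_mode & S_ISUID) != 0;
}
```

For example, is_setuid_root("/usr/bin/autodriver") should return true on an image prepared as described.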

Create the following users
autolab: The ssh/scp user tied to the selected key pair of your cloud account
autograde: The user that runs the TA's grader, starting from the top-level Makefile (see autodriver.c)
student: For students to use the exact same image for coding/testing
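Whether these accounts exist on the image can be checked with getpwnam(3). A small sketch, assuming the account names listed above; count_missing_users is a hypothetical helper, not code from this PR.

```c
#include <pwd.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: counts how many of the given account names are
 * missing from the user database, printing each missing name to stderr. */
size_t count_missing_users(const char *const names[], size_t n)
{
    size_t missing = 0;

    for (size_t i = 0; i < n; i++) {
        if (getpwnam(names[i]) == NULL) {
            fprintf(stderr, "missing user: %s\n", names[i]);
            missing++;
        }
    }
    return missing;
}
```

Usage: with `const char *const required[] = {"autolab", "autograde", "student"};`, count_missing_users(required, 3) should return 0 on a correctly prepared image.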

The sequence of grading using the above image is as follows:

The grading engine: scp the top-level Makefile, autograde.tar (both made by course staff)
and the student's submission to the grading vm.

The grading engine: ssh to the grading vm to run the autodriver program.

The grading vm: the autodriver program (running as root because of the setuid bit) starts
a child process (running as user autograde) to run "make" with the top-level Makefile.

The grading engine: scp the output file from the grading vm.
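The root-to-autograde step follows the usual POSIX fork/drop-privileges/exec pattern. The sketch below is a hypothetical illustration of that pattern only, not the actual autodriver.c; the function name run_make_as and its signature are assumptions. The child calls initgroups, setgid, then setuid (in that order, since the first two require root) before exec'ing make, so it cannot regain root afterwards.

```c
#define _GNU_SOURCE                 /* for initgroups() in <grp.h> */
#include <grp.h>
#include <pwd.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical sketch: run `make` in directory `dir` as account `user`.
 * Returns make's exit status, 127 if the child could not drop privileges,
 * change directory, or exec make, and -1 on errors in the parent. */
int run_make_as(const char *user, const char *dir)
{
    struct passwd *pw = getpwnam(user);
    if (pw == NULL) {
        fprintf(stderr, "unknown user: %s\n", user);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }

    if (pid == 0) {
        /* Child: supplementary groups, then gid, then uid. */
        if (initgroups(pw->pw_name, pw->pw_gid) != 0 ||
            setgid(pw->pw_gid) != 0 ||
            setuid(pw->pw_uid) != 0) {
            perror("drop privileges");
            _exit(127);
        }
        if (chdir(dir) != 0) {
            perror("chdir");
            _exit(127);
        }
        execlp("make", "make", (char *)NULL);
        perror("execlp make");      /* only reached if exec failed */
        _exit(127);
    }

    /* Parent: wait for the child and report its exit status. */
    int status;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In the flow above, the call would be along the lines of run_make_as("autograde", <directory holding the copied files>); the directory is left unspecified here because the README does not name it.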

