Add solutions for week 1
wmutschl committed Oct 19, 2023
1 parent 043909f commit 4f1de44
Showing 10 changed files with 677 additions and 1 deletion.
197 changes: 197 additions & 0 deletions exercises/gitkraken_quick_tour_solution.tex
@@ -0,0 +1,197 @@
\begin{enumerate}
\item \emph{git} is a version control system, i.e. a way to track changes to code, text, documents, data etc.
It lets you go back and forth between many different versions of the same file, and see a list of the differences.
Collaboration becomes (technically) very easy and straightforward as people can work on different files or different versions of the same file simultaneously
and afterwards merge their changes.

\emph{git} is the most popular version control system; it was created in 2005 to track the development of one of the world's largest open-source projects: the Linux kernel.
It is a \textbf{command line tool}, and at some point you should learn the commands on the Command Line Interface (CLI).
However, there are many graphical user interface (GUI) programs that make getting started with Git much easier
and integrate seamlessly with online collaboration platforms such as GitHub or GitLab.
Therefore, in this exercise I will focus on such a tool called GitKraken,
which I use daily and highly recommend.\footnote{Other GUI programs work very similarly,
see \url{https://en.wikipedia.org/wiki/Comparison_of_Git_GUIs} for a comparison of features.
I also recommend the built-in git functionality of Visual Studio Code.}
GitKraken offers a free trial of their paid license,
but also offers a free version for use on publicly-hosted repositories (which is fine for our purposes as we will mostly do stuff locally anyway).
Additionally, GitKraken also offers the Pro license FREE to students and teachers through the GitHub Student Developer Pack (\url{https://education.github.com/pack}).

While git and GitKraken are tools that you install on your computer,
GitHub is an online platform that provides a nice visual interface to help you manage your version-controlled projects remotely.
It is the largest git repository hosting service and has become by far the largest open-source collaboration site.
Another important online platform is GitLab, which you can also host yourself on your own computer or server.
For individuals, Gitea offers yet another way to self-host a stripped-down alternative to GitHub or GitLab.
I personally have accounts on GitHub (\url{https://github.com/wmutschl}) and GitLab (\url{https://gitlab.com/wmutschl}),
but also use self-hosted versions of GitLab (\url{https://git.dynare.org/wmutschl}) and Gitea (\url{https://git.mutschler.eu}) to mirror my projects.
\item A typical workflow looks like this:
\begin{itemize}
\item retrieve data and prepare it for estimation purposes
\item select a model framework, decide on certain hyperparameters and modeling choices and then run an estimation
\item prepare tables, graphs and reports
\end{itemize}
All of these tasks heavily rely on coding, i.e. putting text into some files that are then evaluated by software
that actually performs the tasks.
Moreover, we will see that estimating macroeconomic models requires a lot of trial and error and accordingly those files constantly change and need to be adapted.
\emph{git} enables you to track these changes as it gives you an organized revision history.
So you can experiment with your code, make changes to a project and always keep the ability to go back and forth between changes.
So stop naming files like \emph{2022-10-17-master-thesis-v2-final-now-really-final.tex}
and let \emph{git} do its magic for you by simply tracking the file \emph{thesis.tex} with all of its revision history.
\item Follow the instructions provided in the links or get in touch if you are struggling with the installation.
\item Follow the instructions provided in the links or get in touch if you are struggling with the installation.
\item Follow the instructions provided in the links or get in touch if you are struggling with the installation.
\item In GitKraken: Open a new tab, click \emph{Start a local repo}, then on the \emph{Init} tab select \emph{Local Only} and fill out the details.
Note that GitKraken automatically creates a first commit with a \texttt{README.md} file.
Inside every repository there is a hidden folder \texttt{.git}.
It contains everything done by \emph{git}, i.e. all the changes you will ever make.
Never delete this folder!
Also, putting a repository inside a cloud storage folder might corrupt this folder,
so best practice is to use a local folder on the disk.
We will cover how to push the repository to a so-called remote, which basically works like syncing,
but is much more robust and git-ier.
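For reference, a minimal command-line sketch of the same step in a Unix-like shell
(the folder name \texttt{my-project} is only a placeholder;
unlike GitKraken, \texttt{git init} does not create a first commit or a \texttt{README.md} for you):
\begin{verbatim}
mkdir my-project && cd my-project
git init      # creates the hidden .git folder
ls -a         # lists hidden entries, including .git
\end{verbatim}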
\item Now the benefits of using a GUI like GitKraken become evident,
as our changes are displayed in the \emph{Unstaged Files} area
and by clicking on the file we get a really pretty side-by-side comparison of all the changes.
We can now decide which lines we want to \texttt{stage} and \texttt{commit}.
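On the command line, roughly the same information is available as plain text
(a sketch; \texttt{<file>} is a placeholder for whichever file you modified):
\begin{verbatim}
git status         # lists modified, staged and untracked files
git diff <file>    # shows the unstaged changes line by line
\end{verbatim}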
\item The git model looks like the following diagram:\\
\begin{center}

\begin{tikzpicture}[mypostaction/.style 2 args={
decoration={
text align={
left indent=#1},
text along path,
text={#2}
},
decorate
}
]
\node[draw,
fill=Rhodamine!50,
minimum width=3.5cm,
minimum height=3cm,
text width=3cm,
text centered
] (unstaged) at (0,0){File changes in working directory (Unstaged Files)};

\node [draw,
fill=Goldenrod,
minimum width=3.5cm,
minimum height=3cm,
text width=3cm,
text centered,
right=1cm of unstaged
] (staged) {Staged files};

\node [draw,
fill=SpringGreen,
minimum width=3.5cm,
minimum height=3cm,
text width=3cm,
text centered,
right=1cm of staged
] (local) {Local repository};

\node [draw,
fill=SeaGreen,
minimum width=3.5cm,
minimum height=3cm,
text width=3cm,
text centered,
right=1cm of local
] (remote) {Remote repository (GitHub, GitLab) [optional]};

% Arrows with text label
\coordinate (unstageRoot) at (-0.5,2); \coordinate (stageRoot) at (4,2);
\draw[-latex, blue!20!white, line width=2ex] (unstageRoot) to[in=135,out=90] (stageRoot);
\path [postaction={mypostaction={1cm}{stage changes}},postaction={mypostaction={1.5cm}{git add},/pgf/decoration/raise=-3mm}](unstageRoot) to [in=180,out=3] (stageRoot);

\coordinate (stageRoot) at (4.5,2); \coordinate (localRoot) at (9,2);
\draw[-latex, blue!20!white, line width=2ex] (stageRoot) to[in=135,out=90] (localRoot);
\path [postaction={mypostaction={1cm}{commit changes}},postaction={mypostaction={1.5cm}{git commit},/pgf/decoration/raise=-3mm}](stageRoot) to [in=180,out=3] (localRoot);

\coordinate (localRoot) at (9.5,2); \coordinate (remoteRoot) at (14,2);
\draw[-latex, blue!20!white, line width=2ex] (localRoot) to[in=135,out=90] (remoteRoot);
\path [postaction={mypostaction={1cm}{push changes}},postaction={mypostaction={1.5cm}{git push},/pgf/decoration/raise=-3mm}](localRoot) to [in=180,out=3] (remoteRoot);
\end{tikzpicture}
\end{center}
You do your work in your \textbf{working directory}.
On the \textbf{stage} you collect all the changes that you want to save.
This is very powerful because sometimes it is just individual lines of code or text that you want to keep track of and not the whole file.
Once you've staged all the changes that you want to combine, it is time to collect these changes into a \textbf{commit}.
A commit is a permanent snapshot of the files that git tracks, stored in the \texttt{.git} directory. It is associated with a unique identifier (hash).
In other words, a commit is like a snapshot in time; you can always return to it and see what changes were made compared to any other commit.
On your local repository (i.e. on your local machine) you now have a nice versioned history.
However, if you want to collaborate with others or sync your repository to a specialized cloud provider you need to push these changes to a so-called remote repository,
typically on GitHub or GitLab, but any folder that you can access remotely might serve as a remote repository.
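The three arrows in the diagram correspond to three commands.
A minimal sketch, assuming a file called \texttt{thesis.tex}, a remote named \texttt{origin} and a branch named \texttt{main}
(the commit message is just an example):
\begin{verbatim}
git add thesis.tex                 # working directory -> stage
git commit -m "Add introduction"   # stage -> local repository
git push origin main               # local repository -> remote
\end{verbatim}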
\item Click on the file and select \texttt{Stage File} or add each line by clicking on the plus or minus signs to the left of each line.
Once you are happy with the file, click on the X to close the file-comparison window.
We no longer see any unstaged files and can proceed to write a commit message and then click on the big green button.
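A rough command-line counterpart is sketched below (\texttt{<file>} is a placeholder and the commit message is only an example).
Note that \texttt{git add -p} works interactively on hunks, i.e. blocks of neighboring lines, rather than on single lines:
\begin{verbatim}
git add -p <file>       # interactively choose which hunks to stage
git diff --staged       # review what is currently on the stage
git commit -m "Describe the change in one short sentence"
\end{verbatim}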
\item A \emph{good commit} typically does one discrete task or change only.
For example, you added a variable to the regression specification in the code, in the output and in the report.
Or you changed the name of a variable and updated it consistently across multiple scripts.
This enables you to make meaningful commit messages like \emph{Add year dummies to regression specification}
and you thus end up with a well organized repository.
This workflow needs some practice and everyone works slightly differently in this regard.
Nevertheless, try to group changes into small, meaningful tasks and provide good commit messages.
In my experience, having ten tiny commits is always preferable to one large commit.
Your future self and collaborators will thank you!

The question of what you should include in your commits
is also a matter of choice and preference.
Definitely include your code scripts, LaTeX and text files.
Data is also sometimes given as csv files, which are basically just text files.
Binary files (like Excel sheets, Word documents, PowerPoint slides) are a bit tricky to handle,
as git cannot show you meaningful line-by-line differences between versions.
It depends on the specific needs whether one should commit these files as well (e.g. for Excel files with data this obviously makes sense),
but I usually don't do this.
Note that GitHub blocks files larger than 100 MB and recommends keeping repositories small, ideally below 1 GB.
There is also a way to deal with large binary files called \texttt{Git Large File Storage (LFS)}, but we won't need this.
\item Right click on the initial commit and select \emph{Reset main to this commit - Soft}.
Click on the file in the staged files section and remove the last line from the stage.
Re-commit your stage by providing a meaningful commit message and hitting the green button.
Click on Stash to put the remaining changes into the stash.
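For comparison, a command-line sketch of these four steps
(the placeholders in angle brackets must be replaced by your own values; the commit message is only an example):
\begin{verbatim}
git reset --soft <hash-of-initial-commit>  # undo commit, keep changes staged
git restore --staged -p <file>             # interactively unstage hunks
git commit -m "Describe the change"        # re-commit what is left on the stage
git stash                                  # shelve the remaining changes
\end{verbatim}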
\item Simply click on Push and add the remote. On the left Panel click on REMOTE to see the current remote (usually named origin).
Note that you can add several remotes (say from different people) and compare the commits.
Remotes are also a nice backup of your code.
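On the command line this could look as follows
(a sketch; the URL is a placeholder, and the names \texttt{origin} and \texttt{main} are the usual defaults, not a requirement):
\begin{verbatim}
git remote add origin <url-of-your-remote-repository>
git push -u origin main   # -u makes origin/main the upstream of main
git remote -v             # list all configured remotes
\end{verbatim}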
\item Branches are arguably the most powerful part of \emph{git}.
By default you have a \textbf{main} branch,
but what if you want to do some experiments, re-write an estimation function from scratch, work on a new feature, etc.?
You could copy the whole folder and start working there, or you can use git, create a branch and make the changes there.
You can switch between branches, make commits to any branch, move them around, etc.
If your experiment doesn't work out, simply delete the branch.
If your experiments work out, commit them and merge them into the main branch.
Sometimes there will be conflicts which one needs to sort out,
but using GUI tools like GitKraken makes this very easy
as you have a pretty side-by-side comparison of changes.
They are especially useful for our purposes
as research is a highly nonlinear process, and this way of doing version control is much more similar to how we actually work
than the strictly linear version history that cloud storage providers offer.
Branches are also extremely powerful for collaboration
as different people can work on the same thing at the same time.

Select a so-called parent commit, where you want to create a new branch.
Note that this doesn't have to be the latest commit.
Click on the button \emph{Branch} and name it according to the exercise.
On the left panel, click on LOCAL to see an overview of all your branches.
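The command-line counterpart is sketched below; both placeholders are hypothetical,
and \texttt{<parent-commit>} can be a commit hash or an existing branch name:
\begin{verbatim}
git switch -c <branch-name> <parent-commit>  # create branch there, switch to it
git branch                                   # list all local branches
\end{verbatim}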
\item Create, copy and paste the three files into your repository.
Check for pasting errors and then \emph{Stage all changes} and commit them.
\item Run the commands and solve any errors you might get from LaTeX.
\item Follow the instructions in the exercise.
Note that there is a difference between \enquote{Ignore} and \enquote{Ignore and Stop Tracking}.
\enquote{Ignore} simply adds the file(type) to the \texttt{.gitignore} file so that new files with that name/type/whatever are not tracked.
To \enquote{Ignore and Stop Tracking} means to remove the file(s) from git version control:
they will no longer be in the repo (as of the commit that performs the \enquote{stop tracking}).
Basically, use \enquote{Ignore and Stop Tracking} if the file(s) you are ignoring never should have been in the repo in the first place.
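In plain git the two options roughly correspond to the following commands
(a sketch; the pattern \texttt{*.log} and the placeholder \texttt{<file>} are only examples):
\begin{verbatim}
echo "*.log" >> .gitignore   # "Ignore": new *.log files are not tracked
git rm --cached <file>       # "Stop Tracking": untrack, keep file on disk
git add .gitignore
git commit -m "Stop tracking generated files"
\end{verbatim}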
\item Make sure you are on the correct branch \emph{latex-exam-template}
and push this branch to GitHub.
Either right click on the commit or go to the left panel, click on PULL REQUESTS and on the green plus sign that appears.
Select the \emph{latex-exam-template} as the FROM REPO branch and \emph{main} as the TO REPO branch.
Enter a Title and Description and click on the green button.
Have a look at the pull request on GitHub.
As there are no conflicts, merge it and go back to GitKraken to see what happens in your repository.
You might need to \enquote{fetch origin} by right clicking on the origin remote.
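The git side of this step can be sketched on the command line as follows,
assuming the remote is named \texttt{origin};
the pull request itself is opened and merged on the GitHub website:
\begin{verbatim}
git switch latex-exam-template
git push -u origin latex-exam-template
# ... open and merge the pull request on GitHub ...
git fetch origin    # download the merged state of the remote
\end{verbatim}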
\item Double click on your local main branch and then click on pull,
which fast forwards your repo to the merged changes.
Then click on Pop to get back the work-in-progress (WIP) changes which were stashed on main.
Right click on the README.md file in the \emph{Unstaged Files} area and select \emph{Discard changes}.
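A command-line sketch of the same sequence, assuming the stash from the earlier exercise still exists:
\begin{verbatim}
git switch main
git pull                # fast-forwards main to the merged changes
git stash pop           # re-apply the stashed work-in-progress changes
git restore README.md   # discard the unstaged changes to README.md
\end{verbatim}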
\end{enumerate}
1 change: 1 addition & 0 deletions exercises/matlab_quick_tour_solution.tex
@@ -0,0 +1 @@
\lstinputlisting[style=Matlab-editor,basicstyle=\mlttfamily]{progs/matlab/quickTourMatlab.m}
87 changes: 87 additions & 0 deletions exercises/programming_languages_solution.tex
@@ -0,0 +1,87 @@
\begin{enumerate}
\item General purpose: C/C++, Fortran, Python, Excel.
Domain-specific: MATLAB, Julia, R, Mathematica, EViews.
\item Every program is a set of instructions, say to add two numbers.
Compilers and interpreters take human-readable code and convert it to computer-readable machine code.
In a compiled language, a compiler translates the source code ahead of time into machine code that the target machine executes directly.
In an interpreted language, the source code is not translated into machine code ahead of time.
Instead, a different program, aka the interpreter, reads and executes the code.
Some modern languages like Python can have both compiled and interpreted implementations,
but for simplicity's sake it is useful to keep in mind the distinction.

Compiled languages like Fortran, C or C++ are usually faster, more efficient and more powerful,
but they are harder to learn and harder to code in.
They also require a build step, i.e. they need to be compiled.
Interpreted languages like Python, R, Mathematica, MATLAB or Julia are slower,
but easier to learn and faster to code in.
Interpreters run through a program line by line and execute each command.
Interpreted languages tend to be very similar in their syntax,
but differ in best practices and concepts.

Interpreted languages were once significantly slower than compiled languages.
But with the development of just-in-time (JIT) compilation, that gap is shrinking.
MATLAB and Julia are two very prominent examples that make use of JIT compilation,
that is, they combine the best of both worlds.

You can also make use of e.g. Fortran or C++ code in MATLAB, R, Python or Julia;
that is, write very CPU-intensive tasks in a compiled language
and use them in an interpreted language.

\item Learning a programming language is a huge investment;
however, once one has knowledge of one, learning another one tends to be easier
as they are based on similar principles.
Try to stick with popular choices, as learning resources and communities
that help you learn the language are more plentiful for them;
e.g. googling for help is much easier for Python than for Fortran.
Often the project you are working on dictates which programming language you should use.
The general purpose languages can be used in many non-scientific applications,
so your investment might pay off in very different fields in the end.

In scientific computing, particularly in Macroeconomics,
we are often faced with CPU intensive problems
and need to prototype models and methods quickly.
An interpreted language like MATLAB or Julia that does just-in-time compilation
is therefore best suited for such tasks.
Moreover, having some basic knowledge of C++ is advisable
to write computationally intensive tasks in a compiled language
and reuse them as e.g. so-called MEX files in MATLAB.
However, the main determining factor is legacy code:
looking at the last 20--30 years of research in quantitative and computational Macroeconomics,
we see that most of it was, and still is, conducted in MATLAB,
whereas highly intensive tasks were programmed in Fortran.
So keep in mind that you need to understand this legacy codebase.
In the last couple of years, researchers in Macroeconomics have really been pushing Julia.
New developments like Machine Learning require you to invest in Python.
For writing scientific reports and papers you should get familiar with LaTeX and Markdown.

Another issue to consider is the license, cost and support of the language maintainers.
Most programming languages are free and open-source,
others like MATLAB are proprietary and are quite expensive
(free and open-source clones like Octave tend to be very slow unfortunately).
Regardless of the license, having a good governance structure,
i.e. a board, corporation or company driving the development of the language,
is very important for the sustainability of the language
and for your investment in a computer language.

Lastly, and very importantly, have a look at the toolset available for the languages.
Which Integrated Development Environment (IDE) do you like best?
Which code editor do you prefer?
How good are the debugging capabilities of your chosen environment?
Things like syntax highlighting, smart indentation, code linting, comparison tools,
handling of workspace, etc. are very important.
Some languages like MATLAB bring their own IDE in one big package and it works very well.
Others like Julia, Python or C++ can be neatly integrated in a variety of environments;
in fact Visual Studio Code has become the leading editor and environment for many languages,
but of course there are many other great choices depending on your needs and preferences.

So which computer languages should you devote your time to,
if you are interested in computational or quantitative macroeconomics?
\\
\textbf{Here is my opinionated advice:}
\begin{itemize}
\item Default languages (excellent knowledge): Julia and MATLAB
\item Data analysis and Machine Learning (advanced knowledge): R and Python
\item Heavy tasks (basic knowledge): C++ and Fortran
\item Scientific writing (advanced knowledge): LaTeX and Markdown
\end{itemize}
\end{enumerate}