---
title: UAlbany Web Archiving Program
subtitle: 'preserving albany.edu and more since 2012'
layout: one_column
permalink: /webarchives/
---
<style>
#webArchivesSearch .form-group {
/*text-align: center;*/
}
#webArchivesSearch .form-group .form-control {
/*display: inline;*/
}
#webArchivesSearch .btn {
font-size: 18px;
padding: 6px 16px;
}
</style>
<script>
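// Point the lookup form at the Archive-It Wayback listing for the URL the visitor typed.
function uaWayback(){
// The "*" in the prefix sits where a capture date would normally go.
var action_src = "https://wayback.archive-it.org/org-652/*/" + document.getElementById("uaWaybackURL").value.trim();
var your_form = document.getElementById('uaWayback');
// Runs on submit, so the browser navigates to the Wayback URL instead of this page.
your_form.action = action_src;
}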
</script>
<div class="col-md-12">
<p>Web Archives are preserved websites that can be "replayed" even if the original changes or goes offline!</p>
</div>
<div class="col-md-12">
<form class="form" id="uaWayback" onsubmit="uaWayback()">
<div class="form-group">
<label for="uaWaybackURL">Find a URL</label>
<div class="input-group input-group-lg">
<input type="text" id="uaWaybackURL" class="form-control" placeholder="https://">
<div class="input-group-append">
<button class="btn btn-primary input-lg" type="submit"><i class="fa fa-play" aria-hidden="true"></i></button>
</div>
</div>
<label for="uaWaybackURL">https://wayback.archive-it.org/org-652/*/...</label>
</div>
</form>
</div>
<div class="pb-2 mt-4 mb-2 border-bottom col-md-12">
<h3>Purpose</h3>
</div>
<div class="row">
<div class="col-md-4 order-md-last">
<div class="card">
<div class="card-body">
<h5 class="card-title">Search Web Archives</h5>
<form class="form-inline" id="webArchivesSearch" action="{{ site.url }}/webarchives/search">
<div class="form-group col-xs-12">
<div class="input-group">
<input class="form-control" name="query" id="exampleInputName2" placeholder="Search" type="text">
<div class="input-group-append">
<button class="btn btn-primary" type="submit"><i class="fa fa-search" aria-hidden="true"></i></button>
</div>
</div>
</div>
</form>
</div>
</div>
</div>
<div class="col-md-8 order-md-first">
<p>An important part of the mission of The M. E. Grenander Department of Special Collections and Archives is to collect, preserve, and provide access to the official records of the University at Albany. The University website is one of the primary sources of information about campus policies, administrative activities, curricular changes, news, and events. Much of this information can no longer be found in print, so in order to preserve these records for posterity or to meet our legal requirements, it is necessary to collect parts of the web.</p>
<p>The goal of the web archiving program is to preserve records that deserve to be retained for the long term in accordance with our <a href="{{ site.url }}/policy">collection development policy</a>. Primarily, this includes collecting the <a href="https://www.albany.edu">website of the University at Albany, SUNY</a> to ensure all permanent public records can be accessed in the future. Typically, we also collect the websites of organizations whose paper records we hold, as these groups continue to provide more and more documentary evidence online.</p>
<p>We are committed to preserving websites in their original form to preserve their original context and structure. We expect future researchers to get more value out of web archives than out of the same content printed to paper or flat PDF documents.</p>
<p>Websites are inherently transitory. As pages are edited and updates are made, older information is lost. Therefore, website archiving cannot be a one-time procedure; it must be done regularly in order to accurately capture the changing nature of web-based information.</p>
</div>
</div>
<div class="pb-2 mt-4 mb-2 border-bottom col-md-12">
<h3>How Websites are Collected</h3>
</div>
<div class="col-md-12">
<p>Websites that have been selected for the archives are harvested periodically using the Archive-It service from the Internet Archive. In a given web crawl, the <a href="https://github.com/internetarchive/heritrix3">Heritrix crawler</a> begins from a seed (a high-level domain, such as www.albany.edu) or set of seeds, and then automatically harvests successive layers of the website by following links from this seed. This process continues for a specified duration or until a specified number of documents have been harvested.</p>
<p>The frequency and depth of harvesting vary depending on the website. For example, Archive-It performs a daily shallow crawl of the top-level University domain (www.albany.edu) and a more comprehensive monthly crawl to attempt to capture the entire UAlbany web presence, including a longer list of University domains.</p>
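<p>The layer-by-layer harvesting described above can be sketched roughly as a breadth-first traversal. This is only an illustration and not how Heritrix is actually implemented: the <code>exampleSite</code> link map, the <code>crawl</code> function, and the <code>maxDocuments</code> and <code>maxDepth</code> options are all hypothetical names, the pages are an invented in-memory graph rather than real fetches, and the time limit mentioned above is left out for brevity.</p>

```javascript
// Hypothetical in-memory "web": each page maps to the links found on it.
// A real crawl fetches pages over the network; this sketch does not.
const exampleSite = {
  "www.albany.edu": ["www.albany.edu/news", "www.albany.edu/policies"],
  "www.albany.edu/news": ["www.albany.edu/news/2012", "www.albany.edu"],
  "www.albany.edu/policies": ["www.albany.edu/policies/records"],
  "www.albany.edu/news/2012": [],
  "www.albany.edu/policies/records": [],
};

// Breadth-first harvest: visit the seeds, then the pages they link to, and so on,
// stopping once maxDocuments pages are harvested or maxDepth layers of links are followed.
function crawl(site, seeds, { maxDocuments = Infinity, maxDepth = Infinity } = {}) {
  const harvested = [];
  const seen = new Set(seeds); // avoids harvesting the same page twice
  let layer = [...seeds];

  for (let depth = 0; depth <= maxDepth && layer.length > 0; depth++) {
    const nextLayer = [];
    for (const url of layer) {
      if (harvested.length >= maxDocuments) return harvested;
      harvested.push(url);
      for (const link of site[url] ?? []) {
        if (!seen.has(link)) {
          seen.add(link);
          nextLayer.push(link);
        }
      }
    }
    layer = nextLayer;
  }
  return harvested;
}

// A shallow crawl follows one layer of links; a comprehensive crawl has no depth limit.
const shallow = crawl(exampleSite, ["www.albany.edu"], { maxDepth: 1 });
const full = crawl(exampleSite, ["www.albany.edu"]);
console.log(shallow.length, full.length);
```

<p>In this toy graph the shallow crawl stops after the seed and the pages it links to directly, while the unrestricted crawl reaches every page. This loosely mirrors the difference between the daily and monthly crawls.</p>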
</div>
<div class="pb-2 mt-4 mb-2 border-bottom col-md-12">
<h3>How to Access Web Archives</h3>
</div>
<div class="col-md-12">
<p>There are several ways to access the University's web archives. If a given collection includes archives of an organization's website, the web archives will be included within that organization's <a href="{{ site.rooturl }}/description">collection page</a>.</p>
<br/>
<p><a href="https://wiki.albany.edu/display/SCA/How+Web+Archives+URLs+Work">More Details on how Web Archives URLs work</a></p>
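<p>As a rough illustration, the lookup box at the top of this page builds its link by appending the address you type to the prefix <code>https://wayback.archive-it.org/org-652/*/</code>. The sketch below shows the same string construction on its own. The helper name <code>buildWaybackUrl</code> is hypothetical and not part of any Archive-It library. As we understand it, the <code>*</code> stands in for a capture date, so the resulting link should list the available captures rather than open a single snapshot.</p>

```javascript
// Prefix used by the lookup form on this page; 652 matches the organization
// number in the UAlbany Archive-It page link.
const WAYBACK_PREFIX = "https://wayback.archive-it.org/org-652/*/";

// Hypothetical helper: returns the Wayback lookup URL for an original web address.
function buildWaybackUrl(originalUrl) {
  // Trim stray whitespace from pasted addresses; the address itself is appended unchanged.
  return WAYBACK_PREFIX + originalUrl.trim();
}

console.log(buildWaybackUrl(" https://www.albany.edu "));
// → https://wayback.archive-it.org/org-652/*/https://www.albany.edu
```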
<div class="col-md-12">
<div class="panel panel-primary" style="margin-top: 20px;">
<div class="panel-heading">
<h4 class="panel-title">Web Archives Collections</h4>
</div>
<div class="panel-body">
<ul>
<li><a href="https://archives.albany.edu/description/catalog/ua940">Website of the University at Albany, SUNY</a></li>
<li><a href="https://archives.albany.edu/description/catalog/ua200aspace_f171df81392544f76aacbab24d111666">University Senate</a></li>
<li><a href="https://archives.albany.edu/description/catalog/ua746aspace_91ae585a4c60a1b6210524cc0a4edcb7">UAlbany Sports</a></li>
<li><a href="https://archives.albany.edu/description/catalog/ua809aspace_e4a66f9cad6f5c3dce8c51a91eed676d">Albany Student Press</a></li>
<br/>
<li><a href="https://archives.albany.edu/description/catalog/apap043aspace_c5982b776cca73d2d2c086bb587b4f17">Business Council of New York State</a></li>
<li><a href="https://archives.albany.edu/description/catalog/apap104aspace_fd970763eb173cb0e231f3f565f1ca46">Environmental Advocates of New York State</a></li>
<li><a href="https://archives.albany.edu/description/catalog/apap331aspace_11decb1396d802acfd57fcd871991bf6">New York Civil Liberties Union</a></li>
<li><a href="https://archives.albany.edu/description/catalog/apap362aspace_feb98853a3f9b30d1c7d65537417a1c8">Parks &amp; Trails New York</a></li>
<li><a href="https://archives.albany.edu/description/catalog/apap361aspace_580384ecf25c510ba1ec8a9650daeb31">Pride Center of the Capital Region</a></li>
<br/>
<a href="https://archives.albany.edu/description/?f[access_subjects_ssim][]=Web+Archives">All Web Archives Collections</a><br/>
<a href="https://archive-it.org/organizations/652">UAlbany Archive-It page</a>
</ul>
</div>
</div>
</div>
</div>
<div class="pb-2 mt-4 mb-2 border-bottom col-md-12">
<h3>Limitations of Web Archiving</h3>
</div>
<div class="col-md-12">
<p>Like photographs or other historical documents, archived websites represent a static snapshot of information. They contain only the information that was available on the website at the time of capture. As a result, archived websites may contain outdated information, broken links, malfunctioning email addresses, or errors. Additionally, although the goal of web archiving is to create a complete snapshot, it is often impossible or infeasible to capture 100% of the content and functionality found on a complex webpage. Content may therefore be missing from an archived website because of JavaScript rendering problems, streaming media, dynamic forms, database-driven content, or robots.txt exclusions.</p>
<p>Columbia University Libraries have compiled a useful list of website design best practices that can facilitate web archiving by mitigating some of the problems listed above. Website owners are encouraged to take these practices into account and consider the value of long-term preservation when making web-design decisions. A link to these best practices can be found below.</p>
<p><a class="btn btn-primary btn-md" href="https://www.loc.gov/programs/web-archiving/for-site-owners/creating-preservable-websites/">Guidelines for Preservable Websites <span class="glyphicon glyphicon-new-window"></span></a></p>
</div>
<div class="pb-2 mt-4 mb-2 border-bottom col-md-12">
<h3>Privacy Issues</h3>
</div>
<div class="col-md-12">
<p>We are eager to hear from website owners who have concerns about content that has been included in our web archives. If you wish to discuss the removal of your website from our web archives, please <a href="https://albany.libwizard.com/f/contactus?i_have_a_questi=Special%20Collections%20%26%20Archives">contact us.</a></p>
</div>