Releases: httpreserve/tikalinkextract
Tikalinkextract 0.0.3
Tikalinkextract 0.0.3 adds the ability for users to use custom protocols.
This commit gives users the ability to specify custom protocols in an extensions file so that less well-known URI types can be identified by the link scanner. Examples might include those used by content management systems as internal mechanisms for accessing information.
An extensions file might look as follows:
{
    "Extensions": [
        "pw://",
        "info:ark/",
        "info:pronom/",
        "info:hdl/"
    ]
}
Using the extensions-test folder distributed with the tool, this can be tested as follows:
./tikalinkextract --file extensions-test/ -extensions "extensions.json" 2> /dev/null
where the output will look like:
extensions-protocols.txt, pw://somedata.dat
extensions-protocols.txt, info:ark/somedata.dat
extensions-protocols.txt, info:pronom/somedata.dat
extensions-protocols.txt, info:hdl/somedata.dat
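To illustrate the idea, here is a minimal Go sketch of how an extensions file like the one above could drive a prefix-based scan. It is not the tool's actual implementation: the names extensionList, parseExtensions and findCustomLinks are hypothetical, and the matching rule (a whitespace-separated token that starts with a configured prefix) is an assumption made for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// extensionList mirrors the layout of the extensions file shown above.
// The type name is hypothetical, chosen for this sketch.
type extensionList struct {
	Extensions []string `json:"Extensions"`
}

// parseExtensions decodes the JSON extensions document into a list of
// protocol prefixes.
func parseExtensions(data []byte) ([]string, error) {
	var list extensionList
	if err := json.Unmarshal(data, &list); err != nil {
		return nil, err
	}
	return list.Extensions, nil
}

// findCustomLinks returns every whitespace-separated token in text that
// starts with one of the given prefixes (an assumed matching rule).
func findCustomLinks(text string, prefixes []string) []string {
	var links []string
	for _, token := range strings.Fields(text) {
		for _, prefix := range prefixes {
			if strings.HasPrefix(token, prefix) {
				links = append(links, token)
				break // one match per token is enough
			}
		}
	}
	return links
}

func main() {
	config := []byte(`{"Extensions": ["pw://", "info:ark/", "info:pronom/", "info:hdl/"]}`)
	prefixes, err := parseExtensions(config)
	if err != nil {
		fmt.Println("cannot parse extensions:", err)
		return
	}
	// Sample text standing in for content extracted from a file.
	text := "see pw://somedata.dat and info:ark/somedata.dat for details"
	for _, link := range findCustomLinks(text, prefixes) {
		fmt.Printf("%s, %s\n", "extensions-protocols.txt", link)
	}
}
```

Plain prefix matching keeps the extensions file simple: adding a new protocol only means adding one more string to the list.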
For more information about tikalinkextract please take a look at my Open Preservation Foundation blog about it.
tle-0.0.2
Released to support my latest OPF blog. All releases contain Apache Tika 1.16 for maximum usability: https://tika.apache.org/download.html
Releases available for Windows and Linux.
Blog
Hyperlinks in your files? How to get them out using tikalinkextract