Determine the relation between two URLs.
Node.js >= 14 is required. To install, type this at the command line:
npm install url-relation
const URLRelation = require('url-relation');
const url1 = new URL('http://domain.com/');
const url2 = new URL('http://domain.com/#hash');
const options = {
  components: [URLRelation.HASH],
  ignoreComponents: true
};
if (URLRelation.match(url1, url2, options)) {
  // considered the same
}
component is the same as targetComponent.
ignoredComponents is the same as components. However, if its value is a non-empty array, it will also set ignoreComponents to true.
const URLRelation = require('url-relation');
const url1 = new URL('http://domain.com/');
const url2 = new URL('http://domain.com/#hash');
// `options` is an options object such as the one in the previous example
const relation = new URLRelation(url1, url2, options);
if (relation.upTo(URLRelation.HASH, [URLRelation.HASH])) {
  // considered the same
}
if (relation.upTo(URLRelation.PATH)) {
  // considered the same
}
It is simplest to use an option profile, but custom configurations are still possible.
Type: Array<Symbol>
Default value: []
A list of URL components for ignoreComponents. See URL Components for possible values.
Type: Object
Default value: {}
A map of protocol default ports for ignoreDefaultPort. Be sure to include the suffixed ":" in the key. Common protocols already have their ports removed.
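As a minimal sketch of how such a map might be consulted (the `hasDefaultPort` helper and the `custom:` entry are hypothetical and not part of url-relation), the lookup key is the URL's `protocol`, which always ends in ":":

```javascript
// Hypothetical user-supplied map; note the trailing ":" in the key
const defaultPorts = { 'custom:': 8080 };

// Hypothetical helper: true when the port is absent or equals the
// default listed for the URL's protocol
const hasDefaultPort = (url, ports) =>
  url.port === '' || Number(url.port) === ports[url.protocol];

console.log(hasDefaultPort(new URL('custom://domain.com:8080/'), defaultPorts));
// WHATWG URL already drops port 80 for http:, so `port` is an empty string here
console.log(hasDefaultPort(new URL('http://domain.com:80/'), defaultPorts));
```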
Type: Boolean or Function
Default value: true
When set to true or a function that returns true, a URL's components specified in components will be ignored during comparison.
Type: Boolean or Function
Default value: true
When set to true or a function that returns true, a URL's port that matches any found in defaultPorts will be ignored during comparison.
Type: Boolean or Function
Default value: Function
When set to true or a function that returns true, a URL's file name that matches any found in indexFilenames will be ignored during comparison.
Type: Boolean or Function
Default value: Function
When set to true or a function that returns true, a URL's empty query parameters (such as "?=") will be ignored during comparison. This option will be silently skipped if the input URLs do not support URLSearchParams.
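A rough sketch of the idea, assuming "empty" means that both the name and the value are empty strings (the `withoutEmptyQueries` helper is hypothetical, not the library's code):

```javascript
// Hypothetical helper: return a copy of the URL without parameters
// whose name and value are both empty
const withoutEmptyQueries = url => {
  const result = new URL(url.href);
  const kept = [...result.searchParams].filter(
    ([name, value]) => name !== '' || value !== ''
  );
  result.search = new URLSearchParams(kept).toString();
  return result;
};

console.log(withoutEmptyQueries(new URL('http://domain.com/?=&var=value')).href);
```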
Type: Boolean or Function
Default value: false
When set to true or a function that returns true, a URL's query parameters matching queryNames will be ignored during comparison. This option will be silently skipped if the input URLs do not support URLSearchParams.
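To illustrate, here is a sketch that removes matching parameters before a comparison. The `withoutQueryNames` and `matchesName` helpers and the sample names are hypothetical; only the string-or-RegExp shape of the list comes from this document:

```javascript
// Hypothetical list: exact names as strings, patterns as regular expressions
const queryNames = ['sessionid', /^utm_/];

// A string must match the name exactly; a RegExp is tested against it
const matchesName = (name, patterns) =>
  patterns.some(pattern =>
    pattern instanceof RegExp ? pattern.test(name) : pattern === name
  );

// Hypothetical helper: return a copy of the URL without the matching parameters
const withoutQueryNames = (url, patterns) => {
  const result = new URL(url.href);
  const kept = [...result.searchParams].filter(
    ([name]) => !matchesName(name, patterns)
  );
  result.search = new URLSearchParams(kept).toString();
  return result;
};

const url = new URL('http://domain.com/?utm_source=a&id=1&sessionid=x');
console.log(withoutQueryNames(url, queryNames).href);
```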
Type: Boolean or Function
Default value: Function
When set to true or a function that returns true, the order of unique query parameters will not distinguish one URL from another. This option will be silently skipped if the input URLs do not support URLSearchParams.
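One way to picture this is to sort both parameter lists by name before comparing them. The sketch below uses the standard `URLSearchParams.prototype.sort()`, which is stable, so repeated names keep their relative order; `sameQueryIgnoringOrder` is a hypothetical helper, not the library's implementation:

```javascript
// Hypothetical helper: compare two query strings after sorting by parameter name
const sameQueryIgnoringOrder = (url1, url2) => {
  const params1 = new URLSearchParams(url1.search);
  const params2 = new URLSearchParams(url2.search);
  params1.sort();
  params2.sort();
  return params1.toString() === params2.toString();
};

console.log(sameQueryIgnoringOrder(
  new URL('http://domain.com/?a=1&b=2'),
  new URL('http://domain.com/?b=2&a=1')
));
```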
Type: Boolean or Function
Default value: false
When set to true or a function that returns true, empty segment names within a URL's path (such as the "//" in "/path//to/") will be ignored during comparison.
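As a small sketch, ignoring empty segment names can be thought of as collapsing runs of slashes in the pathname. The `withoutEmptySegments` helper is hypothetical and only approximates the behavior described here:

```javascript
// Hypothetical helper: collapse consecutive slashes into a single slash
const withoutEmptySegments = pathname => pathname.replace(/\/{2,}/g, '/');

console.log(withoutEmptySegments(new URL('http://domain.com/path//to/').pathname));
```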
Type: Boolean or Function
Default value: Function
When set to true or a function that returns true, a URL's "www" subdomain will be ignored during comparison.
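A minimal sketch of the effect, assuming that only a leading "www." label is dropped (`withoutWWW` is a hypothetical helper):

```javascript
// Hypothetical helper: drop a leading "www." label from a hostname
const withoutWWW = hostname => hostname.replace(/^www\./, '');

const url1 = new URL('http://www.domain.com/');
const url2 = new URL('http://domain.com/');
console.log(withoutWWW(url1.hostname) === withoutWWW(url2.hostname));
```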
Type: Array<RegExp|string>
Default value: ['index.html']
A list of file names for ignoreIndexFilename.
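The following sketch shows how such a list could be matched against the last path segment. The helper names and the `default.*` pattern are hypothetical; only the string-or-RegExp shape of the list is taken from this document:

```javascript
// Hypothetical list mixing an exact name and a pattern
const indexFilenames = ['index.html', /^default\.[a-z]+$/];

// Last path segment, or '' when the path ends in "/"
const filenameOf = url =>
  url.pathname.slice(url.pathname.lastIndexOf('/') + 1);

// A string must equal the file name; a RegExp is tested against it
const isIndexFilename = (url, patterns) => {
  const filename = filenameOf(url);
  return patterns.some(pattern =>
    pattern instanceof RegExp ? pattern.test(filename) : pattern === filename
  );
};

console.log(isIndexFilename(new URL('http://domain.com/dir/index.html'), indexFilenames));
```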
Type: Array<RegExp|string>
Default value: []
A list of query parameters for ignoreQueryNames.
Type: Symbol
Default value: URLRelation.HASH
The last URL component to include in the relation, comparing from left to right. See URL Components for more info and for possible values.
When an option is defined as a Function, it must return true to be included in the custom filter:
const options = {
  ignoreIndexFilename: (url1, url2) => {
    // Only URLs with these protocols will have their index filename ignored
    const protocols = ['http:', 'https:'];
    return protocols.includes(url1.protocol) && protocols.includes(url2.protocol);
  }
};
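For illustration, a Boolean-or-Function option could be resolved along these lines. `isEnabled` is a hypothetical helper, and the only detail taken from this document is that a function option receives both URLs and must return true:

```javascript
// Hypothetical resolver: a Boolean is used as-is; a Function is called
// with both URLs and enables the option only when it returns true
const isEnabled = (option, url1, url2) =>
  typeof option === 'function' ? option(url1, url2) === true : option === true;

// Example filter: enable the option only when both protocols are equal
const sameProtocol = (url1, url2) => url1.protocol === url2.protocol;

const a = new URL('http://domain.com/');
const b = new URL('ftp://domain.com/');
console.log(isEnabled(true, a, b), isEnabled(sameProtocol, a, b));
```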
CAREFUL_PROFILE is useful for a URL to an unknown or third-party server that may not be configured according to specifications and common best practices.
COMMON_PROFILE, the default profile, is useful for a URL to a known server that you trust and expect to be correctly configured according to specifications and common best practices.
An example of checking for a trusted hostname:
const dynamicProfile = (url1, url2) => {
  const trustedHostnames = ['domain.com'];
  // Trusted: the hostname itself or any of its subdomains
  // (a bare endsWith() would also accept "evildomain.com")
  const isTrustedHostname = ({hostname}) => trustedHostnames.some(
    trusted => hostname === trusted || hostname.endsWith(`.${trusted}`)
  );
  const isTrusted = isTrustedHostname(url1) && isTrustedHostname(url2);
  return URLRelation[`${isTrusted ? 'COMMON' : 'CAREFUL'}_PROFILE`];
};
const url1 = new URL('http://domain.com/');
const url2 = new URL('http://domain.com/#hash');
const profile = dynamicProfile(url1, url2);
const custom = {
  ...URLRelation.COMMON_PROFILE,
  indexFilenames: ['index.html', 'index.php']
};
Or:
const extend = require('extend');
const custom = extend(true, {}, URLRelation.COMMON_PROFILE, { indexFilenames:['index.php'] });
             AUTH                  HOST                       PATH
            __|__                ___|___                 ______|______
           /     \              /       \               /             \
      USERNAME PASSWORD    HOSTNAME     PORT       PATHNAME          SEARCH HASH
       ___|__   __|___   ______|______   |   __________|_________   ___|___   |
      /      \ /      \ /             \ / \ /                    \ /       \ / \
foo://username:password@www.example.com:123/hello/world/there.html?var=value#foo
\__/                    \_/ \_____/ \_/     \_________/ \________/
 |                       |     |     |           |          |
PROTOCOL             SUBDOMAIN |    TLD      SEGMENTS    FILENAME
                               |
                            DOMAIN
The components of URLs are compared in the following order:
PROTOCOL
USERNAME
PASSWORD
AUTH
TLD
DOMAIN
SUBDOMAIN
HOSTNAME
PORT
HOST
SEGMENTS
FILENAME
PATHNAME
SEARCH
PATH
HASH
As you may have noticed, there are a few breaks in linearity:
- TLD is prioritized before DOMAIN because matching a domain on a different top-level domain is very uncommon (but still possible via ignoreComponents).
- SUBDOMAIN is prioritized after DOMAIN.
Other considerations:
- URLs with invalid domain names, reserved domains, unlisted TLDs or IP addresses that have been determined to have related HOSTNAME components will also have related TLD, DOMAIN and SUBDOMAIN components due to the above-mentioned comparison order only; not because they actually have those components.
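
The ordered comparison described above can be sketched in a simplified form. This is an assumption-laden illustration, not the library's code: `ORDER` and `relatedUpTo` are hypothetical, the list covers only the components that a WHATWG `URL` exposes directly (no TLD, DOMAIN, SUBDOMAIN, SEGMENTS or FILENAME), and no ignore options are applied:

```javascript
// Simplified, hypothetical order using WHATWG URL property names
const ORDER = [
  'protocol', 'username', 'password', 'hostname',
  'port', 'pathname', 'search', 'hash'
];

// True when every component from the left, up to and including
// `target`, is identical in both URLs
const relatedUpTo = (url1, url2, target) => {
  const last = ORDER.indexOf(target);
  if (last === -1) {
    throw new RangeError(`Unknown component: ${target}`);
  }
  return ORDER.slice(0, last + 1).every(name => url1[name] === url2[name]);
};

const url1 = new URL('http://domain.com/');
const url2 = new URL('http://domain.com/#hash');
console.log(relatedUpTo(url1, url2, 'search'), relatedUpTo(url1, url2, 'hash'));
```

Because the comparison stops at the target, two URLs that differ only in a later component (here, the hash) are still related up to any earlier one.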