
Duplicates removal

Please visit this project at duplicates-removal-ankihg.herokuapp.com

Runtime analysis

This implementation of duplicate removal is written in JavaScript and achieves O(n) time complexity.

A quest for linear time complexity

One approach to this algorithm could keep an array of found addresses, scanning it for each element of the input. In the best case, where every element repeats a single address, the found array stays tiny and each lookup is cheap, achieving O(n) (not accounting for the removal process). But the cost grows quadratically as the found array grows: with few or no duplicates, each element must be checked against nearly every previously found address, arriving at a worst case of O(n^2).
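A minimal sketch of this naive approach (the function name is hypothetical; it assumes the same node shape and sentinel head used in the final implementation below):

function removeDuplicatesNaive(head) {
  var found = [];                                     // plain array of addresses seen so far
  var currentNode = head;

  while (currentNode.next) {
    if (found.indexOf(currentNode.next.value) !== -1) // linear scan; O(found.length) per element
      currentNode.next = currentNode.next.next;
    else {
      found.push(currentNode.next.value);
      currentNode = currentNode.next;
    }
  }
}

Could an alternate data structure with more efficient lookups be introduced to track found addresses?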

Enter hashing

Taking advantage of the fact that the input is email addresses (Strings), I use a JavaScript key/value object as a hash. Hashing achieves an O(1) lookup, sparing the expense of iterating through all found elements for each input element.
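A small sketch of the lookup pattern (the address is made up):

var found = Object.create(null);   // bare object used as a hash
found['a@example.com'] = 1;        // constant time insert

if (found['a@example.com']) {      // constant time lookup; no iteration over found elements
  // duplicate detected
}

Now it's time to account for the removal process.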

Enter linked list

Removal from an array is expensive: worst case O(n) when removing from the front, since every subsequent element must shift down. A linked list achieves constant time removal by setting the preceding node's .next property to .next.next.
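Concretely, with the node shape assumed throughout ({ value: String, next: Node or null }) and made-up addresses:

var c = { value: 'c@example.com', next: null };
var b = { value: 'b@example.com', next: c };
var a = { value: 'a@example.com', next: b };

a.next = a.next.next;              // unlinks b in constant time; a now points directly to c

Let's see where these refactors have left us.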

Arrival at linear time complexity

Since both duplicate lookup and removal occur within iteration of the input addresses, cutting these processes to constant time dramatically reduces complexity. Time to step through the algorithm.

The parameter head is a sentinel node whose next property points to the node holding the first input address.

Before iterating over the input addresses, I initialize two objects: found, a bare JavaScript object that tracks found addresses, and currentNode, which tracks the position in the linked list and is initialized to head.

I move through each element of the linked list until a node with no next property is reached. Instead of handling currentNode on each iteration, I handle currentNode.next for easy removals. Since the initial currentNode is the sentinel head, whose next points to the first input address, this approach will not skip the first input element.

The currentNode.next.value is looked up in the found hash in constant time. If it is found, currentNode.next is removed from the linked list in constant time by setting currentNode.next to currentNode.next.next.

Otherwise, if the currentNode.next.value wasn't found, it is now marked on the found object and the currentNode is set to currentNode.next for the next iteration.

Because there is one iteration over n elements, with only constant time operations performed within, the runtime complexity of this implementation of duplicate removal is O(n).

function removeDuplicates(head) {

  var found = Object.create(null);              // empty object; does not inherit from Object.prototype, avoiding key collisions
  var currentNode = head;                       // tracks current node

  while (currentNode.next) {                    // iteration through n input elements

    if (found[currentNode.next.value])          // constant time lookup of found addresses
      currentNode.next = currentNode.next.next; // constant time removal of duplicates

    else {                                      // if it has not been found
      found[currentNode.next.value] = 1;        // put value on found hash
      currentNode = currentNode.next;           // go to next element
    }
  }
}
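A hypothetical usage sketch, assuming the list is built behind a sentinel head as described above (the addresses are made up):

var head = { next: null };                        // sentinel node preceding the first address
var addresses = ['a@example.com', 'b@example.com', 'a@example.com'];

var tail = head;
addresses.forEach(function (address) {            // build the linked list
  tail.next = { value: address, next: null };
  tail = tail.next;
});

removeDuplicates(head);

for (var node = head.next; node; node = node.next)
  console.log(node.value);                        // prints a@example.com, then b@example.com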

Design considerations

Since no design specifications conflict with doing so, the removal of duplicates is handled on the front end to simplify the architecture. Running this task on the client side before sending data to the server reduces data transmission and saves server resources. With an O(n) runtime complexity, it can handle a substantially large input without sacrificing user experience. The task may be better performed on the back end where the input size is very large or the client has other resource-intensive responsibilities that should take priority for user experience.

One peculiarity of the design is the use of a linked list for input and output of the duplicates-removal function but arrays for the data display with Angular. With its constant time removal, the linked list is clearly the better choice for the duplicates-removal problem, but Angular's ng-repeat directive, the typical approach for iterating over elements, works only on JavaScript collections such as arrays. With more time, I would write an Angular directive that displays linked lists directly.
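In the meantime, a small hypothetical helper bridges the two representations by flattening the deduplicated list into an array that ng-repeat can iterate:

function linkedListToArray(head) {
  var values = [];
  for (var node = head.next; node; node = node.next) // walk the list, skipping the sentinel head
    values.push(node.value);
  return values;
}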

I decided not to include validation for email addresses because I took the significance of that detail to be that the input consists of Strings, which can be looked up efficiently through hashing. Email validation could easily be added through Angular if users are allowed to input email addresses.

Front-end architecture

The front end is organized according to Angular's component-based architecture. I recently discovered this design pattern and appreciate it for its ease of navigation, reusability, and representation of data flow.
