Search Engine API designed to display links and relationships between your query and the rest of the internet.

stevenayers/opensearch

clamber


A fast, efficient web crawler that exposes an API for bidirectional path finding.

A distributed version is in progress. A standalone version is also available.

The infrastructure stack is purely AWS because I wanted a project where I could apply the technologies covered in the AWS DevOps Engineer Certification syllabus.

If I were to choose a stack based on what I believe to be the most appropriate, I would use the following:

  • AWS/Azure/GCP
  • Apache Kafka
  • Dgraph
  • Kubernetes (Preferably PaaS)
  • EFK
  • Prometheus, Alertmanager & Grafana
  • Drop the Page Store and store the Page data in Dgraph.

Software Design

Infrastructure Design

Stay tuned...

Endpoints

Search

Takes a url, depth, and allow_external_links, then checks the Page Database to see whether the information is already stored. If it is, the stored data is queried and returned. If not, a recursive crawl is initiated.
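The lookup-then-crawl flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `PageStore`, `LinkFetcher`, `Search`, and `crawl` are hypothetical names, the Page Database is replaced by an in-memory map, and the real HTTP fetch and HTML parsing are replaced by a stub function. It also assumes a cached page is returned regardless of the depth it was originally crawled at.

```go
package main

import "fmt"

// Page is one node of the link tree (field names are assumptions for this sketch).
type Page struct {
	URL   string
	Links []*Page
}

// LinkFetcher returns the URLs linked from a page.
// Hypothetical stand-in for the real HTTP fetch and HTML parse.
type LinkFetcher func(url string) []string

// PageStore is a hypothetical in-memory stand-in for the Page Database.
type PageStore map[string]*Page

// Search returns the stored page if one exists; otherwise it crawls and stores the result.
func Search(store PageStore, fetch LinkFetcher, url string, depth int) *Page {
	if page, ok := store[url]; ok {
		return page // already known: no crawl needed
	}
	page := crawl(fetch, url, depth, map[string]bool{})
	store[url] = page
	return page
}

// crawl follows links recursively. A depth of -1 means no limit;
// a depth of 0 records the page without following its links.
func crawl(fetch LinkFetcher, url string, depth int, seen map[string]bool) *Page {
	seen[url] = true
	page := &Page{URL: url}
	if depth == 0 {
		return page
	}
	next := depth
	if depth > 0 {
		next = depth - 1
	}
	for _, link := range fetch(url) {
		if seen[link] {
			continue // skip pages already visited to avoid cycles
		}
		page.Links = append(page.Links, crawl(fetch, link, next, seen))
	}
	return page
}

func main() {
	// A tiny fake site with a cycle back to the start page.
	site := map[string][]string{
		"https://example.com":       {"https://example.com/about", "https://example.com/contact"},
		"https://example.com/about": {"https://example.com"},
	}
	fetches := 0
	fetch := func(url string) []string { fetches++; return site[url] }

	store := PageStore{}
	root := Search(store, fetch, "https://example.com", -1)
	fmt.Println(root.URL, "links:", len(root.Links), "fetches:", fetches)

	// The second call is served from the store, so no further fetches happen.
	Search(store, fetch, "https://example.com", -1)
	fmt.Println("fetches after cached search:", fetches)
}
```

The `seen` map is what keeps an unlimited crawl (`depth` of -1) from looping forever on sites whose pages link back to each other.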

/search will take the following query parameters:

| Parameter | Type | Stability | Description |
|---|---|---|---|
| url | string | Tested | Starting URL for sitemap |
| depth | int | Tested | -1 is infinite. If you specified 10, that would be your max depth to crawl. |
| display_depth | int | Experimental | How deep a depth to return in JSON |
| allow_external_links | bool | Not Yet Implemented | Whether to crawl external links or not (not yet implemented) |
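As a rough illustration of how these parameters might be passed, the sketch below builds a /search URL with Go's standard `net/url` package. The `SearchParams` type, the `BuildSearchURL` helper, and the base address `http://localhost:8000` are assumptions made for this example; the README does not state the host, port, or HTTP method.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// SearchParams holds the four documented /search query parameters.
// The struct itself is hypothetical; only the parameter names come from the README.
type SearchParams struct {
	URL                string
	Depth              int // -1 means no depth limit
	DisplayDepth       int
	AllowExternalLinks bool
}

// BuildSearchURL appends the /search path and the encoded query string to base.
func BuildSearchURL(base string, p SearchParams) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/search"

	q := url.Values{}
	q.Set("url", p.URL)
	q.Set("depth", strconv.Itoa(p.Depth))
	q.Set("display_depth", strconv.Itoa(p.DisplayDepth))
	q.Set("allow_external_links", strconv.FormatBool(p.AllowExternalLinks))
	u.RawQuery = q.Encode() // Encode escapes values and sorts keys alphabetically

	return u.String(), nil
}

func main() {
	// The base address is an assumed local deployment, not a documented endpoint.
	target, err := BuildSearchURL("http://localhost:8000", SearchParams{
		URL:          "https://example.com",
		Depth:        1,
		DisplayDepth: 10,
	})
	if err != nil {
		fmt.Println("invalid base URL:", err)
		return
	}
	fmt.Println(target)
}
```

Building the query with `url.Values` rather than string concatenation ensures the nested `url` value is percent-encoded correctly.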

Sample response:

{
    "query": {
      "url": "https://example.com",
      "depth": 1, 
      "display_depth": 10,
      "allow_external_links": false
    },
    "status": {
      "message": "5 pages found at a depth of 1.",
      "code": "200"
    },
    "results": [
        {
            "URL": "https://example.com",
            "timestamp": "<time>",
            "links": [
                {
                    "URL": "https://example.com/about",
                    "timestamp": "<time>",
                    "links": []
                },
                {
                    "URL": "https://example.com/contact",
                    "timestamp": "<time>",
                    "links": []
                },
                {
                    "URL": "https://example.com/faq",
                    "timestamp": "<time>",
                    "links": []
                },
                {
                    "URL": "https://example.com/offices",
                    "timestamp": "<time>",
                    "links": []
                }
            ]
        }
    ]
}

url depth startUrl parentUrl
