Investigate Replacing Bonsai.io with AWS OpenSearch #97

Closed
windbeneathyourwings opened this issue Oct 19, 2021 · 1 comment

Collaborator

windbeneathyourwings commented Oct 19, 2021

1/26/2022

Bonsai cluster has been officially de-provisioned and account closed.
Demo URLs currently run on OpenSearch but are pointing at the API in the dev environment.
OpenSearch needs to be provisioned on prod and configured for fine-grained access control, and pages need to be imported.

Closing this since specific issues have been created for those things.


https://docs.aws.amazon.com/opensearch-service/latest/developerguide/what-is.html

————————————

Create AWS dev domain.
Unauthenticated users get a read-only policy.
Authenticated users can't be given completely open write access to routes; otherwise users could overwrite each other's routes.

Load from S3.
Or limit route writing based on whether a route already exists and whether the user is allowed to overwrite it.
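
The write rule in the notes above could look roughly like the check below. This is only a sketch: the `RouteDoc` shape, the `writeUserIds` field, and the `canWriteRoute` name are assumptions made for illustration, and a check like this would still have to be enforced somewhere the client can't bypass.

```typescript
// Hypothetical shape: the issue does not define how route ownership is stored.
interface RouteDoc {
  path: string;
  writeUserIds: string[]; // users allowed to overwrite this route
}

// Returns true when `userId` may write `path`:
//  - anonymous users (userId undefined) can never write
//  - a route that does not exist yet may be created by any authenticated user
//  - an existing route may only be overwritten by one of its listed writers
function canWriteRoute(
  userId: string | undefined,
  path: string,
  existing: RouteDoc[]
): boolean {
  if (!userId) {
    return false;
  }
  const match = existing.filter((r) => r.path === path)[0];
  if (!match) {
    return true;
  }
  return match.writeUserIds.indexOf(userId) !== -1;
}
```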


Created an OpenSearch dev cluster with 1 node on AWS using a t3.small instance with EBS storage.
Created a new master user: kibana-master. Use this user to authenticate curl requests and log in to the dashboard. In the future, IAM users can be explored. I tried to use an IAM user, but the user always failed to authenticate against Dashboards, so I used a master user instead. This got me into Dashboards.

Reference loading data using curl and master user.

https://docs.aws.amazon.com/opensearch-service/latest/developerguide/gsgupload-data.html

There are two main ways to populate the OpenSearch cluster. The first is to use curl as described in the document above. This approach will probably be taken for initial testing, and possibly for the initial migration from Bonsai, using the bulk upload option with a JSON file posted through curl.
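
For the bulk option, the file posted through curl has to be newline-delimited JSON: one action line followed by one document line per entry. A minimal sketch of producing that body is below; the `toBulkNdjson` name and the `{ id, source }` input shape are assumptions, not part of any existing code.

```typescript
// Builds the newline-delimited JSON body expected by the _bulk API:
// an action line followed by a document line for each entry, with a
// trailing newline after the last line.
function toBulkNdjson(
  index: string,
  docs: Array<{ id: string; source: Record<string, unknown> }>
): string {
  if (docs.length === 0) {
    return "";
  }
  const lines: string[] = [];
  for (const doc of docs) {
    lines.push(JSON.stringify({ index: { _index: index, _id: doc.id } }));
    lines.push(JSON.stringify(doc.source));
  }
  return lines.join("\n") + "\n";
}
```

The resulting string could be written to a file and posted to the domain's `_bulk` endpoint with curl's `--data-binary` option, authenticating as the master user, as in the AWS guide linked above.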

The second option is to use the sdk. The v3 sdk for node is referenced in a linked doc. There is still a question of allowing users of the application readonly access to certain indexes and authenticated users write access. Authenticated users will need to be able to write to specific indexes when creating panel pages.

I would also like to be able to easily tear down and recreate clusters to save costs. This is especially useful for dev, where the cluster only needs to be up during development, which is going to be a small portion of the time. At the very least I need a way to export data into files that can be easily reloaded using curl. Otherwise, I think the process of setting up the OpenSearch cluster again can be manual for now. I'm sure it can be automated using CloudFormation or the SDK, but I'm not sure I want to spend my time figuring that out at the moment.


Program has been created to export bonsai clips to json files that can be ingested into aws open search using curl. The program that does this is a separate node js repository under this org named bonsai2opensearch.


aws/aws-sdk-js-v3#2099
cars10/elasticvue#73

I think I've been able to successfully create a signed request and use the Angular HTTP client to post it to OpenSearch. The current OpenSearch API, although public (not in a VPC), is behind an API gateway. Using an API gateway seems to have been a failed attempt to get CORS working so that the browser/client could communicate directly with OpenSearch. The current issue is that the Access-Control-Allow-Origin headers don't exist on the response from OpenSearch, and API gateway restricts them from being added. OpenSearch has settings to enable CORS, but those settings are not exposed by AWS OpenSearch or the dashboard. Therefore, I think at this point a custom proxy will be needed instead of using API gateway or querying directly.
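
Whatever form the proxy takes, the piece that is missing today is the set of CORS headers on the response. A rough sketch of what the proxy would add is below. The `corsHeaders` name and the allow-list argument are assumptions; the header names are the standard CORS and SigV4 request headers.

```typescript
// Returns the CORS headers a proxy would add for `origin`, or an empty
// object when the origin is not in the (hypothetical) allow list.
function corsHeaders(
  origin: string | undefined,
  allowedOrigins: string[]
): Record<string, string> {
  if (!origin || allowedOrigins.indexOf(origin) === -1) {
    return {};
  }
  return {
    "Access-Control-Allow-Origin": origin,
    "Access-Control-Allow-Methods": "GET, POST, PUT, DELETE, OPTIONS",
    // Signed requests carry these headers, so the preflight must allow them.
    "Access-Control-Allow-Headers":
      "Authorization, Content-Type, X-Amz-Date, X-Amz-Security-Token, X-Amz-Content-Sha256",
    // Tell caches that the response varies by requesting origin.
    Vary: "Origin",
  };
}
```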

I think there are two main options for a proxy.

The first option is to use angular.json built in proxy.

https://stackoverflow.com/questions/37172928/angular-cli-server-how-to-proxy-api-requests-to-another-server

The second option is to add a proxy endpoint to express app using the below npm package.

https://www.npmjs.com/package/express-http-proxy

I think for simplicity I prefer the first option but we will see how that goes.
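
Either option comes down to mapping a local path onto the OpenSearch endpoint. A small sketch of that path rewrite is below, assuming the proxy is mounted under an `/opensearch` prefix (the prefix used by the URL that appears later in this thread). The `rewriteProxyPath` name is hypothetical. With the Angular CLI proxy this would correspond to a `pathRewrite` entry, and with express-http-proxy it could be called from the `proxyReqPathResolver` option, though neither has been tried here.

```typescript
// Strips the mount prefix so "/opensearch/classified_ads/_search" is
// forwarded upstream as "/classified_ads/_search". Returns undefined when
// the path is outside the prefix, so the caller can fall through to the app.
function rewriteProxyPath(
  requestPath: string,
  prefix: string
): string | undefined {
  if (requestPath === prefix) {
    return "/";
  }
  const withSlash = prefix + "/";
  if (requestPath.slice(0, withSlash.length) !== withSlash) {
    return undefined;
  }
  return requestPath.slice(prefix.length);
}
```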

Both options are only available when deploying the app as a Lambda. Luckily, I have been able to successfully deploy the app as a Lambda using Serverless. Having done that work, I'm hoping it's possible to "easily" use one of the aforementioned methods to proxy requests to OpenSearch through the server-side Express Node app running as a Lambda.

If I get the proxy working, automatically signing the request to OpenSearch in the proxy config/code itself is something that could be investigated. However, I think this would require auth state on the Lambda, and I'm not so sure that is working just yet.

I don't think I will be moving over to the Lambda app just yet, though. I think still serving from the CDN without Serverless is most optimized for the time being. To optimize the Lambda app, links to assets will need to use the CDN files. The linked JS assets should be pulled not from the Lambda but from the CDN. That is another story, though.


I have successfully created a proof of concept that queries Elastic. How I did so has been documented in the thread that led me down the path to solving the problem in the first place.

aws/aws-sdk-js-v3#2099

Next steps will be to allow anonymous users to query elastic. I think I just screwed up something simple in the identity provider. This worked before. Need to remove open search vpc domain because it is no longer needed. Also remove lambda proxy since the proxy now resides inside the application.

Create an OpenSearch CRUD adaptor. Once this is done, the OpenSearch CRUD adaptor can be used instead of REST. I believe this is effectively the last step of moving away from verti-go and Bonsai. Also need to save routes (panel pages) to an OpenSearch index when creating and updating panel pages. Effectively, the logic in the S3 hook now needs to be implemented on the front-end.
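
As a starting point for the adaptor, the CRUD operations map fairly directly onto the OpenSearch document and search REST endpoints. The sketch below only builds the request description and leaves signing and sending to the caller. The `CrudOp`, `RestRequest`, and `toOpenSearchRequest` names are assumptions and do not reflect the existing CRUD API in this project.

```typescript
type CrudOp = "create" | "read" | "update" | "delete" | "query";

interface RestRequest {
  method: "GET" | "PUT" | "POST" | "DELETE";
  path: string;
  body?: string;
}

// Maps a CRUD operation onto the OpenSearch REST endpoint it would call.
// "create" and "update" both use the index API, which replaces the whole
// document with the given id.
function toOpenSearchRequest(
  op: CrudOp,
  index: string,
  id?: string,
  payload?: unknown
): RestRequest {
  const base = "/" + encodeURIComponent(index);
  if (op === "query") {
    // An empty body matches all documents.
    return { method: "POST", path: base + "/_search", body: JSON.stringify(payload || {}) };
  }
  if (!id) {
    throw new Error("id is required for " + op);
  }
  const docPath = base + "/_doc/" + encodeURIComponent(id);
  switch (op) {
    case "create":
    case "update":
      return { method: "PUT", path: docPath, body: JSON.stringify(payload) };
    case "read":
      return { method: "GET", path: docPath };
    case "delete":
      return { method: "DELETE", path: docPath };
    default:
      throw new Error("unsupported operation");
  }
}
```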


The opensearch proxy has been deployed to api gateway as a lambda. However, something strange is happening. The proxy is failing to catch the request and initiating angular instead.

https://e4cq5a4vfc.execute-api.us-east-1.amazonaws.com/opensearch/classified_ads/_search

On localhost this works perfectly fine: hitting the URL directly returns a restricted response, since the request is not authenticated.

I identified the problem. There are two files.

main.server.ts
main.lambda.ts

In my attempts to just get this working with Lambda, I decided to copy the main code rather than share it. So I just needed to paste the proxy code into the Lambda file. In the future I should find a better means of sharing APIs between the main server and the Lambda. For now, though, we are just going to continue along our merry way and leave sharing APIs without duplicating code for another time.

I don't believe main.server.ts is being used on Lambda. It's just there to make local testing easier by running the server via Express instead of replicating the Lambda environment offline.
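
One possible way to stop duplicating the proxy wiring is a small shared function that both entry points call. This is only a sketch: `registerApiRoutes`, the `RouteRegistrar` interface, and the `/opensearch` mount path are assumptions, and the interface is declared locally so the example does not depend on the Express typings.

```typescript
type Handler = (req: unknown, res: unknown, next: () => void) => void;

// Minimal slice of an Express-style app, declared locally for the sketch.
interface RouteRegistrar {
  use(path: string, handler: Handler): void;
}

// Hypothetical shared module that main.server.ts and main.lambda.ts could
// both import, so an API added here shows up in both environments.
function registerApiRoutes(app: RouteRegistrar, openSearchProxy: Handler): void {
  // Register API routes before the Angular catch-all route; otherwise the
  // request falls through to server-side rendering.
  app.use("/opensearch", openSearchProxy);
}
```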

After copying the OpenSearch proxy code to main.lambda.ts, the dev environment (AWS Lambda) works as expected.

GET https://e4cq5a4vfc.execute-api.us-east-1.amazonaws.com/opensearch/classified_ads/_search

{"Message":"User: anonymous is not authorized to perform: es:ESHttpGet"}

Exactly what we want to see since only authenticated users can access the search service at the moment.


Replacing panel page list item search

At the moment, the verti-go end-point that searches for / discovers panel pages matching a route uses the Go template below to generate the JSON. It is effectively a Go template that builds dynamic JSON from request params. This needs to be replicated somehow in OpenSearch. I'm limited in what I can do at the moment in terms of using the body of a POST search request, since the API that I have built for CRUD doesn't really support that yet. Supporting it would require using params inside a body string rather than just a URL. I would need to create an interpreter that is able to replace params, or use a template engine, to build a body for a request. I consider this to be a separate issue, which I think I can work around with a simpler method to achieve an initial baseline migration to OpenSearch.

{{ define "panelpages" }}
{
  "query": {
    "bool": {
      "filter": [
        {
            "bool": {
                "must": [
                    {
                        "bool": {
                            "should": [
                                {
                                    "term": {
                                        "entityPermissions.readUserIds.keyword": {
                                            "value": "*"
                                        }
                                    }
                                },
                                {
                                    "term": {
                                        "entityPermissions.readUserIds.keyword": {
                                            "value": "{{ userId .Req }}"
                                        }
                                    }
                                },
                                {
                                    "term": {
                                        "entityPermissions.writeUserIds.keyword": {
                                            "value": "{{ userId .Req }}"
                                        }
                                    }
                                },
                                {
                                    "term": {
                                        "entityPermissions.deleteUserIds.keyword": {
                                            "value": "{{ userId .Req }}"
                                        }
                                    }
                                }
                            ]
                        }
                    }
                    {{ if $.Req.MultiValueQueryStringParameters.path }},
                    {
                        "bool": {
                            "should": [
                                {{ range $index, $value := $.Req.MultiValueQueryStringParameters.path }}{{ if ne $index 0 }},{{ end }}
                                    {
                                        "term": {
                                            "path.keyword": {
                                                "value": "{{ $value }}"
                                            }
                                        }
                                    }
                                {{ end }}
                            ]
                        }
                    }
                    {{ end }}
                ]
            }
        }
      ]
    }
  },
  "size": 1000
}
{{end}}

This query reduces the routes / panel pages prior to being processed on the front-end further by two main criteria.

The first is restricting results based on permissions.
The second is filtering against matched paths.
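
For comparison, the same query body can be assembled in TypeScript instead of a Go template. The sketch below mirrors the template's structure (the permission `should` clauses, the optional path `should` clauses, and `size: 1000`). The `buildPanelPageQuery` and `term` names are assumptions, and where the user id would come from on the client is left open.

```typescript
type Json = { [key: string]: unknown };

// Builds a single term clause: { term: { <field>: { value } } }.
function term(field: string, value: string): Json {
  const clause: Json = {};
  clause[field] = { value: value };
  return { term: clause };
}

// Mirrors the "panelpages" Go template: restrict by permissions, then
// optionally by any of the requested paths.
function buildPanelPageQuery(userId: string, paths: string[]): Json {
  const must: Json[] = [
    {
      bool: {
        should: [
          term("entityPermissions.readUserIds.keyword", "*"),
          term("entityPermissions.readUserIds.keyword", userId),
          term("entityPermissions.writeUserIds.keyword", userId),
          term("entityPermissions.deleteUserIds.keyword", userId),
        ],
      },
    },
  ];
  if (paths.length > 0) {
    must.push({ bool: { should: paths.map((p) => term("path.keyword", p)) } });
  }
  return { query: { bool: { filter: [{ bool: { must: must } }] } }, size: 1000 };
}
```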

Can I create a view in Elasticsearch that can be used as the main panel page list item end-point to generate a similar or identical result to what is here? If so, that seems like it would be the easiest path. I don't think I can easily translate query string params to this JSON structure using the limited CRUD or datasource API at the moment.

I think from what I have read open search provides much flexibility in terms of locking down searches now that integration with cognito has been achieved. In theory I think it is possible to lock down elastic to only a view that would be used to query and discover these routes for all normal and unauthenticated users.

I do need to somehow pluck the userId out of the token in OpenSearch. That may be a bit troublesome to achieve.

More investigation is needed regarding the search options that exist to rebuild this query in open search.

Open search supports URI searches. This would be directly compatible with crud query.

https://docs.aws.amazon.com/opensearch-service/latest/developerguide/searching.html#searching-uri
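
A URI search puts the whole query into the `q` parameter using query string syntax, so site and path matching could be expressed as below. This is a sketch under assumptions: the `site.keyword` field name is a guess modelled on `path.keyword` from the Go template, and `buildUriSearch` and `lucenePhrase` are hypothetical helpers.

```typescript
// Quotes a value as a phrase, escaping backslashes and double quotes so
// the value cannot break out of the quoted string.
function lucenePhrase(value: string): string {
  return '"' + value.replace(/\\/g, "\\\\").replace(/"/g, '\\"') + '"';
}

// Builds a URI search path that matches on site and path.
function buildUriSearch(index: string, site: string, path: string): string {
  const q =
    "site.keyword:" + lucenePhrase(site) +
    " AND path.keyword:" + lucenePhrase(path);
  return (
    "/" + encodeURIComponent(index) +
    "/_search?q=" + encodeURIComponent(q) +
    "&size=1000"
  );
}
```

Permissions are not part of this query, so it only covers the path-matching half of what the Go template does.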

Search templates are exactly what we want to use. However, I have not seen this feature mentioned in the AWS OpenSearch docs. Therefore, I need to first confirm it is supported on AWS.

https://opensearch.org/docs/latest/opensearch/search-template/

It is kind of buried and not mentioned in the docs but open search on aws DOES support _scripts (saved templates).

https://docs.aws.amazon.com/opensearch-service/latest/developerguide/supported-resources.html

An interesting note there is the support for 3 separate script languages, of which Mustache is the one used for search templates.

  • Painless
  • Lucene Expressions
  • Mustache
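
A stored Mustache template for the discovery query might look roughly like the sketch below: one body to save under `_scripts/<id>`, and one body to send to `_search/template` at query time. The template id, the `site` and `paths` param names, and the `site.keyword` field are assumptions, and the `{{#toJson}}` helper is assumed to behave on the AWS service as it does in the search template docs linked above.

```typescript
// Hypothetical template id.
const TEMPLATE_ID = "panelpage_discovery";

// Body to save under _scripts/<TEMPLATE_ID>.
const storedTemplate = {
  script: {
    lang: "mustache",
    // The source is a string so that {{#toJson}} can emit a JSON array.
    source:
      '{"query":{"bool":{"filter":[' +
      '{"term":{"site.keyword":"{{site}}"}},' +
      '{"terms":{"path.keyword":{{#toJson}}paths{{/toJson}}}}' +
      ']}},"size":1000}',
  },
};

// Body to send to _search/template: only the params travel with the request.
function templateSearchBody(site: string, paths: string[]) {
  return { id: TEMPLATE_ID, params: { site: site, paths: paths } };
}
```

With this approach the client never sends query DSL, which fits the idea of locking normal and unauthenticated users down to a single saved search.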

To be realistic all I really care about is the site, path, and mapping type at the moment. Exposing those three criteria to everyone has no real security threat.

  • id
  • site
  • path
  • panelPageId
  • !args[] - future addition to provide more granularity - is this needed now that it is part of rules?
  • rules - rules engine rules same as panel pages

Authentication and authorization rules can be added.

Selection rule examples:

  • device
  • screen size
  • role
  • args [has arg or arg =, contains, etc ] --- args can also be used inside other rules as value references, right?

Is this set of data best described as a route, a path, or something completely different? I don't like using the term route since it is already used in Angular. Are they aliases? An alias module already exists. They are kind of aliases for a component/entry way into the application.

I don't think the full alias entity needs to be stored in Elastic. This will probably be better off as split storage, with Elastic used for the initial reduction based on path matching, with the exception of roles and authentication. However, I don't know if there is much to gain there from a security perspective if it's not possible to access that type of info when building the query in the search script.

The id, site, and path will probably get me to where I need to go for now. The rest can be added later. None of those other features with rule selection currently exist anyway. It is probably best to think through some of those before implementation. I think it's best to create a working baseline first with the limited necessary features.

As I continue to think about this, just using the panel page index for now seems like the best approach. The data can be migrated to a new structure when we decide whether a more generic structure is necessary. For now, the migration can be kept simple. There isn't much functionality here anyway beyond reducing the set based on site and path matching in the search query, with the exception of permissions, which I think are safe to leave out or handle on the front-end if absolutely necessary.

Write panel page list items

Currently in verti-go, panel page list items are created in a background Lambda when panel pages are saved to S3. This process will need to be replaced. My initial inclination is to do this on the client. Authenticated users would need write access to the panel page list item index in OpenSearch. This is a bit of a security concern, though, since pre-existing pages should only be writable by users with write permissions to that panel page.

Below is the lambda responsible for indexing entities when added to s3. This lambda also indexes classified entities which are not relevant to this migration since the ULTIMATE goal is to be using panel pages everywhere.

I need to explore what type of path matching can be accomplished with a url query. If the findings are acceptable I think the route forward will be to implement the crud adaptor. I think the crud adaptor is going to be necessary anyway to support the indexing. Knocking it all out with the crud adaptor for open search would be nice.

package main

import (
	"context"
	"goclassifieds/lib/ads"
	"goclassifieds/lib/attr"
	"goclassifieds/lib/cc"
	"goclassifieds/lib/entity"
	"goclassifieds/lib/vocab"
	"os"
	"strings"

	"github.com/aws/aws-lambda-go/events"
	"github.com/aws/aws-lambda-go/lambda"
	session "github.com/aws/aws-sdk-go/aws/session"
	elasticsearch7 "github.com/elastic/go-elasticsearch/v7"
	"github.com/mitchellh/mapstructure"
	"github.com/tangzero/inflector"
)

func handler(ctx context.Context, s3Event events.S3Event) {

	elasticCfg := elasticsearch7.Config{
		Addresses: []string{os.Getenv("ELASTIC_URL")},
	}

	esClient, err := elasticsearch7.NewClient(elasticCfg)
	if err != nil {
		// Fail fast: without a client nothing below can be indexed.
		panic(err)
	}

	sess := session.Must(session.NewSession())

	for _, record := range s3Event.Records {

		pieces := strings.Split(record.S3.Object.Key, "/")

		pluralName := inflector.Pluralize(pieces[0])
		singularName := inflector.Singularize(pieces[0])

		entityManager := entity.NewDefaultManager(entity.DefaultManagerConfig{
			SingularName: singularName,
			PluralName:   pluralName,
			Index:        "classified_" + pluralName,
			EsClient:     esClient,
			Session:      sess,
			UserId:       "",
			Stage:        os.Getenv("STAGE"),
			BucketName:   os.Getenv("BUCKET_NAME"),
		})

		// Drop the trailing 8 characters of the file name to recover the entity id.
		id := pieces[1][0 : len(pieces[1])-8]
		ent := entityManager.Load(id, "default")

		if singularName == "ad" {
			ent = IndexAd(ent)
		} else if singularName == "panelpage" {
			ent = IndexPanelPage(ent)
		}

		entityManager.Save(ent, "elastic")
	}
}

func IndexAd(obj map[string]interface{}) map[string]interface{} {

	var item ads.Ad
	mapstructure.Decode(obj, &item)

	allAttrValues := make([]attr.AttributeValue, 0)
	for _, attrValue := range item.Attributes {
		attributesFlattened := attr.FlattenAttributeValue(attrValue)
		for _, flatAttr := range attributesFlattened {
			attr.FinalizeAttributeValue(&flatAttr)
			allAttrValues = append(allAttrValues, flatAttr)
		}
	}
	item.Attributes = allAttrValues

	for index, featureSet := range item.FeatureSets {
		allFeatureTerms := make([]vocab.Term, 0)
		for _, term := range featureSet.Terms {
			flatTerms := vocab.FlattenTerm(term, true)
			for _, flatTerm := range flatTerms {
				allFeatureTerms = append(allFeatureTerms, flatTerm)
			}
		}
		item.FeatureSets[index].Terms = allFeatureTerms
	}

	ent, _ := ads.ToEntity(&item)
	return ent

}

func IndexPanelPage(obj map[string]interface{}) map[string]interface{} {

	var item cc.PanelPage
	mapstructure.Decode(obj, &item)

	item.GridItems = make([]cc.GridItem, 0)
	item.Contexts = make([]cc.InlineContext, 0)
	item.Panels = make([]cc.Panel, 0)
	item.RowSettings = make([]cc.LayoutSetting, 0)

	ent, _ := cc.ToPanelPageEntity(&item)
	return ent

}

func main() {
	lambda.Start(handler)
}

It's at about this point that I begin asking myself whether it makes sense to be using panel page list items anymore, or whether it would be better to make this more generic, since that is where I would like to head anyway. There is another issue related to refactoring the routing discovery that should be strongly considered as part of this migration, since I think it is directly related. Panel pages don't really need to be indexed, only the parts of the panel pages that are used for discovery.
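
Indexing only the discovery parts could be as simple as projecting each panel page down to the fields listed earlier (id, site, path, panelPageId) before saving. The sketch below assumes a loose `PanelPageLike` input shape and assumes, for now, that the list item id is the same as the panel page id. Neither is confirmed by the existing code.

```typescript
// Loose input shape: only the fields needed for discovery are named.
interface PanelPageLike {
  id: string;
  site?: string;
  path?: string;
  [key: string]: unknown;
}

interface DiscoveryDoc {
  id: string;
  site: string;
  path: string;
  panelPageId: string;
}

// Keeps only what route discovery needs and drops everything else
// (grid items, panels, contexts, and so on never reach the index).
function toDiscoveryDoc(page: PanelPageLike): DiscoveryDoc {
  return {
    id: page.id,
    site: page.site || "",
    path: page.path || "",
    panelPageId: page.id,
  };
}
```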

Issues that might be relevant to a refactor.

#25
#34
#101

@windbeneathyourwings changed the title from "Investigate Replacing Bonsai.oi with AWS OpenSearch" to "Investigate Replacing Bonsai.io with AWS OpenSearch" on Oct 19, 2021
@windbeneathyourwings
Collaborator Author

awinmem

AW(s) OpenSearch

provides ngrx data service wrapping aws v3 client-opensearch

https://docs.aws.amazon.com/es_es/AWSJavaScriptSDK/v3/latest/clients/client-opensearch/index.html
