Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Comprehensive enhancements for Salesforce connector #1859

Closed
18 tasks done
danajuratoni opened this issue Oct 30, 2023 · 22 comments
Closed
18 tasks done

Comprehensive enhancements for Salesforce connector #1859

danajuratoni opened this issue Oct 30, 2023 · 22 comments

Comments

@danajuratoni
Copy link
Contributor

danajuratoni commented Oct 30, 2023

Acceptance criteria

  • Updated connector timeline
  • Removal of Global Admin as role requirement in Salesforce connector.
    • This is a sensitive role that customers prefer not to grant or outright refuse to grant. The connector should change to no longer require that role.
  • Salesforce objects are represented 1:1 in Elasticsearch docs (removal of current unique mapping system)
  • Custom object and custom field support
    • Ability to ingest all custom objects and their associated fields
    • Ability to index specified custom objects
    • Ability to index all custom fields for an object without explicitly specifying them all, to reduce the URI size and avoid this long URI error on the Salesforce side
      • In Workplace Search content source it is documented having to specify individual fields in the indexing rule in order to index custom object fields. However, URI Too Long error is returned when using indexing rules to specify hundreds of fields to be indexed from a custom object in Salesforce. The better approach is to add an option to configure that all fields from the custom object must be automatically indexed.
  • Discovery document with DLS for Salesforce
  • Add DLS support for Salesforce as defined in the discovery
  • Functional tests performed
  • DLS documentation for Salesforce
    • with several examples illustrating the syntax, how a basic use case works, limitations
  • Regression test suite performed for the release along with QA day for all supported OS in https://www.elastic.co/support/matrix
  • Performance test/ reports against the last stable release branch of the connector framework and in Elastic Cloud
  • Updated connector docs
  • Demo of the connector with DLS in latest builds in Elastic Cloud in GDrive
  • Sign off given from an engineer in Ingestion
    • Code backported to the correct version
    • Automated tests to 90% coverage
@navarone-feekery
Copy link
Contributor

Custom field support

I think this could be implemented with Advanced Sync Rules. We could set it up identically to WS Salesforce connector. We could do something like:

{
  "custom_fields": [
    {
    "include": "account",
      "filter_type": "object_type",
      "fields": [
        {
          "remote": "AccountNumber",
          "target": "title"
        },
        {
          "remote": "Phone",
          "target": "description"
        }
      ]
    }
  ]
}

@moxarth-rathod
Copy link
Contributor

We've one more option to fetch all fields including custom ones using SOQL query - SELECT FIELDS(ALL) FROM OBJECT. There would be a lot of custom fields and by using SOQL we can fetch them all.

As per this point - Ability to index all custom fields for an object without explicitly specifying them all, we are supposed to index all custom fields. So, if the requirement is to fetch all custom fields of an object, SOQL would be a good option.

Also, the RCF seems easy to configurable to user for specifying custom objects for this point - Ability to index specified custom objects

what's your view on it? @danajuratoni @navarone-feekery

@navarone-feekery
Copy link
Contributor

We've one more option to fetch all fields including custom ones using SOQL query - SELECT FIELDS(ALL) FROM OBJECT. There would be a lot of custom fields and by using SOQL we can fetch them all.
As per this point - Ability to index all custom fields for an object without explicitly specifying them all, we are supposed to index all custom fields. So, if the requirement is to fetch all custom fields of an object, SOQL would be a good option.

The downside/limitation to using SELECT FIELDS(ALL) is that it will literally fetch everything, even non-custom fields that we don't care about. I can see in the documentation that SELECT FIELDS(CUSTOM) might be better (link). It can even be combined with the fields we specifically request, e.g. SELECT Id, Name, FIELDS(CUSTOM) is viable syntax.

We currently map the result to an elastic doc format in this class, so including all custom fields will return some fields not included here that have a __c suffix. Without a 1:1 mapping for this, we would have to infer what the appropriate key on the elastic doc should be. For example if we have a custom field called someCustomField__c, for the elastic doc key we could either keep the suffix (e.g. someCustomField__c) or remove it (someCustomField).

Currently all elastic doc keys are snake case. @danajuratoni would you prefer if we convert the custom fields to snake case? The above someCustomField__c would become some_custom_field__c. A quick google search shows we can easily do this with regex (link).

@navarone-feekery
Copy link
Contributor

navarone-feekery commented Nov 9, 2023

Also, the RCF seems easy to configurable to user for specifying custom objects for this point - Ability to index specified custom objects

I want to avoid RCFs for this. This would be a dynamic JSON input that the user would need to configure. I think using Advanced Sync Rules is a better fit. The Kibana UI supports it and the JSON is much easier to edit. We can add custom objects to the example I made above, like this:

{
  "indexing": {
    "rules": [
      {
        "object": "account",
        "all_custom_fields": false,
        "fields": [
          {
            "remote": "AccountNumber",
            "target": "title"
          },
          {
            "remote": "Phone",
            "target": "description"
          }
        ]
      },
      {
        "object": "campaign",
        "all_custom_fields": true,
        "fields": []
      },
      {
        "object": "idea",
        "all_custom_fields": false,
        "fields": [
          {
            "remote": "IdeaNumber",
            "target": "idea_number"
          },
          {
            "remote": "Origin",
            "target": "origin"
          }
        ]
      }
    ]
  }
}

In this structure the object field contains the name of the object in Salesforce. It can be a standard or a custom object (a custom object's inclusion in this list means that it will be ingested).
all_custom_fields could be a boolean. If true we use the SELECT FIELDS(CUSTOM) SOQL syntax to get all fields. If it's false we use the values in fields to decide what fields to pull, and what key they should map to on the elastic doc.}

In the above example, account and campaign are standard objects that we already ingest. idea is a hypothetical custom object.

@navarone-feekery
Copy link
Contributor

Removal of Global Admin as role requirement in Salesforce connector.

I followed documentation here to configure the OAuth permissions for a custom app. We would need to figure out which of these permission(s) are the lowest required to do all of the operations we need.

@navarone-feekery
Copy link
Contributor

Discovery document with DLS for Salesforce

I had a look and found that we didn't have DLS set up for the Workplace Search Salesforce connector (list of ACL-enabled WS sources), so we would be starting from zero here.

@moxarth-rathod
Copy link
Contributor

The downside/limitation to using SELECT FIELDS(ALL) is that it will literally fetch everything, even non-custom fields that we don't care about. I can see in the documentation that SELECT FIELDS(CUSTOM) might be better (link). It can even be combined with the fields we specifically request, e.g. SELECT Id, Name, FIELDS(CUSTOM) is viable syntax.

Agreed, SELECT FIELDS(CUSTOM) was already on my radar. This query - SELECT Id, Name, FIELDS(CUSTOM) can be a good option to fit in our requirement. And we would need to apply the LIMIT - OFFSET for the pagination if we use FIELDS function.

@moxarth-rathod
Copy link
Contributor

I want to avoid RCFs for this. This would be a dynamic JSON input that the user would need to configure. I think using Advanced Sync Rules is a better fit.

I feel the same. Advanced sync rules would be more flexible for user to specify objects and fields.

@navarone-feekery I've drafted a schema

{
    "objects": [
        {
            "object": "Account",
            "object_type": "standard",
            "all_custom_fields": false,
            "custom_fields": [
                {
                    "field": "BirthYear",
                    "field_type": "custom"
                }
            ],
            “standard_fields”: [
                {
                    "field": "AccountNumber",
                    "field_type": "standard"
                }
            ]
        },
        {
            "object": "Connector",
            "object_type": "custom",
            "all_custom_fields": true,
            "custom_fields": [],
            “standard_fields”: [
                {
                    "field": "ConnectorName",
                    "field_type": "standard"
                }
            ]
        }
    ]
}

@moxarth-rathod
Copy link
Contributor

moxarth-rathod commented Nov 10, 2023

Ability to ingest custom objects

@navarone-feekery just a question, in which case all custom objects should index? Are we planning ingest all pre-defined objects (

RELEVANT_SOBJECTS = [
) + all custom objects if advanced sync rules are not enabled?

cc: @danajuratoni

@navarone-feekery
Copy link
Contributor

I've drafted a schema

@moxarth-elastic Thanks for thinking of a sample schema! I have a few concerns about it though.

In the custom_fields and standard_fields we probably want to allow the user to also define what key will be used in Elastic. That was what the target key was in my schema above. Also specifying object_type and field_type can be removed, as all custom objects and fields are suffixed with __c already.

I chose remote and target as key names because that's what we use in workplace search already. Configuration parity would help existing Salesforce users migrating to this connector. The keys refer to the same things but renaming them could be confusing.

I'm happy with having objects be the top-level key but I think we should keep fields remote and target for the above reasons.

Also thinking back I don't like all_custom_fields as it's not clear what it means. I think we should change it to ingest_all_custom_fields. WDYT?

Your sample with edits:

{
    "objects": [
        {
            "object": "Account",
            "ingest_all_custom_fields": false,
            "fields": [
                {
                    "remote": "BirthYear__c",
                    "target": "custom"
                },
                {
                    "remote": "AccountNumber",
                    "target": "account_number"
                }
            ]
        },
        {
            "object": "Connector__c",
            "ingest_all_custom_fields": true,
            "fields": [
                {
                    "remote": "ConnectorName",
                    "target": "connector_name"
                }
            ]
        }
    ]
}

@moxarth-rathod
Copy link
Contributor

moxarth-rathod commented Nov 10, 2023

In the custom_fields and standard_fields we probably want to allow the user to also define what key will be used in Elastic. That was what the target key was in my schema above. Also specifying object_type and field_type can be removed, as all custom objects and fields are suffixed with __c already.

Oh, i see. I've added a type just to add __c manually in the code so that user don't need to care about it. I guess, this is also fair if user will add object/fields with suffix __c 👍

I'm happy with having objects be the top-level key but I think we should keep fields remote and target for the above reasons.

Totally agreed, we should make this consistent with the WS connector.

Also thinking back I don't like all_custom_fields as it's not clear what it means. I think we should change it to ingest_all_custom_fields. WDYT?

+1. Even i was thinking the same, ingest_all_custom_fields is more clear ➕

The updated schema looks good to me 👍

@navarone-feekery
Copy link
Contributor

navarone-feekery commented Nov 13, 2023

Another point I've found out (this time about DLS):

Salesforce DLS exists for both Objects and Fields. This is an issue for us for two main reasons (cc @danajuratoni):

  1. We have never restricted access based on fields before, so we have no precedence for it. We would need to figure out how this works from scratch. I think this would be a serious project.
  2. We currently don't do a 1:1 mapping of Fields for Salesforce documents -> Elastic documents. We process some fields and combine others (the extent of this can be seen in the SalesforceDocMapper class). This mapping would need to be reverted to a 1:1 mapping for any connector with field-based access control enabled.

@navarone-feekery
Copy link
Contributor

Regarding Field Level Security (FLS), we do support it in Elasticsearch, just not in any connectors yet. Here's some extra documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/field-level-security.html

@navarone-feekery
Copy link
Contributor

@moxarth-elastic I've spoken with the team on this topic and there are a few points that we would like to see implemented/changed.

  1. The mappings are a problem with this connector and are blocking development of other features (Advanced Sync Rules and Field Level Security). I've added another checkpoint on this issue to remove the custom mappings that we currently have and instead have a 1:1 representation of Salesforce objects in Elastic. Some discovery will be needed to see the extent that this impacts things, but this is probably the first task that needs to be done so we don't continue to build on the connector in its current state.

  2. We've discussed a few schemas for Advanced Sync Rules here, but we haven't considered the possibility of allowing users to generate their own queries using SOQL or SOSL. Can you investigate that as well? I think that could be a lot more flexible than copying something that was done in Workplace Search.

@moxarth-rathod
Copy link
Contributor

I've added another checkpoint on this issue to remove the custom mappings that we currently have and instead have a 1:1 representation of Salesforce objects in Elastic.

Yes, i think this should be done because we never know what fields will be there in custom objects and might get missed while doing custom mapping. Even we've done a 1:1 representation in Jira connector due to custom fields' presence.

but we haven't considered the possibility of allowing users to generate their own queries using SOQL or SOSL. Can you investigate that as well?

Yes, SOQL can be a flexible approach. @navarone-feekery Do you have any draft schema in mind?

@moxarth-rathod
Copy link
Contributor

@navarone-feekery @danajuratoni For the SOQL query, there is a limitation i have recently come across that if we use FIELDS function (e.g. FIELDS(ALL) to fetch all fields or FIELDS(CUSTOM) to fetch all custom fields) then the maximum value we can set for LIMIT is 200 and OFFSET is 2000. So, ultimately we can fetch 2200 documents if fetch all standard/custom fields.

@moxarth-rathod
Copy link
Contributor

@navarone-feekery please refer this documentation to remove global admin permission role to fetch data from salesforce - https://docs.google.com/document/d/1HL2SNhdAvXN7SakBXdBN1AwjPhVkJpudhkbRVwElAv0/edit#heading=h.cjdfeh86h0i.

@moxarth-rathod moxarth-rathod self-assigned this Dec 21, 2023
@moxarth-rathod
Copy link
Contributor

@danajuratoni
Copy link
Contributor Author

@navarone-feekery please refer this documentation to remove global admin permission role to fetch data from salesforce - https://docs.google.com/document/d/1HL2SNhdAvXN7SakBXdBN1AwjPhVkJpudhkbRVwElAv0/edit#heading=h.cjdfeh86h0i.

@moxarth-elastic Were there any changes related to this needed in the connector? Which connector version is the first one that can be used without global admin permission role?

@moxarth-rathod
Copy link
Contributor

@moxarth-elastic Were there any changes related to this needed in the connector? Which connector version is the first one that can be used without global admin permission role?

There were no changes related to this in the connector. The current version can be used with the suggested approach in the doc.

@danajuratoni
Copy link
Contributor Author

cc: @navarone-feekery ticket is waiting on the engineer signoff before closing it

@navarone-feekery
Copy link
Contributor

Everything looks good, thanks heaps @moxarth-elastic, great work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants