Duplicate detection for CRM integrations #1347
Replies: 3 comments 1 reply
-
The main issue I see here is that the burden is on the user to define a lot of things. What counts as a duplicate would need to be defined per integration and per object. How do we also decide which values to check against existing data? Is it email, first name + last name, etc.? If so, those fields would have to be mapped and also required in the submission. We could maybe define these ourselves (contact requires email is easy enough), but for things like leads it gets tricky to pick a unique value. I don't want to make users define which fields are used for duplicate checks, as that just complicates the mapping process for novice users.

As your thoughts point out, there are a lot of things to consider, and I can already see the support requests coming in from people complaining that their submissions don't appear in the destination platform. Most integration platforms are smart enough to throw an error if a duplicate is created (and not allowed). This raises another point: sometimes duplicates are desired, which, as you say, means this should be opt-in. It's something I'll consider, but it'll be a fair bit of work to implement for every integration and every data object.
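To make the "fields must be mapped" concern concrete, here is a minimal sketch of a configuration-time check that reports which match rules a form's mapping can actually satisfy. The function name, the rule format, and the example rules are all assumptions made up for illustration; none of this is existing Formie code.

```php
<?php
declare(strict_types=1);

/**
 * Return the match rules that a form's field mapping fully covers.
 * Hypothetical helper: $rules is a list of field-name lists, and
 * $mappedFields is the list of CRM fields the form maps to.
 */
function usableMatchRules(array $rules, array $mappedFields): array
{
    return array_values(array_filter(
        $rules,
        // A rule is usable only if none of its fields are missing from the mapping
        fn(array $fields): bool => array_diff($fields, $mappedFields) === []
    ));
}

// Example rules, chosen only for illustration
$contactRules = [
    ['emailaddress1'],
    ['firstname', 'lastname', 'birthdate'],
];

// A form that maps name and email, but not date of birth
$usable = usableMatchRules($contactRules, ['firstname', 'lastname', 'emailaddress1']);
```

If no rule is usable, an integration could warn in the form settings that the duplicate check will never run, rather than failing silently at submission time.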
-
It is certainly an advanced function, and I totally get your viewpoint on additional settings and configuration. I'd be more inclined to treat this as an advanced function and make it less visible by default. Duplicate checks, for me, are for singular entities such as contacts and organisations, which should only exist once. Leads will legitimately be duplicated in many cases, so I don't think the check should apply to them. Much like a relational database, one contact can have many leads, so having multiple leads is absolutely right to me. Duplicates are sometimes required, which is why this should be an opt-in feature.

Scenario for creating many duplicates: you offer a range of courses, and each course page has a "register your interest" form which generates a lead. Each course page passes its own element ID to the form as a hidden field, so the course entry is captured against the form submission and sent to the CRM. Someone could be interested in several courses, so they submit three forms on three different course pages. You've now got three contact records with the same data, with each lead linked to a different one. You could argue for a single form with a dropdown of course options instead, but the idea is that you aren't presenting the course as a selectable field; you are populating it automatically, because you know which course page they are on. To have a single contact with all leads under it, these records now need to be merged.
With a duplicate check, the first form submission would create a record, because the check would return no match (assuming this person has never been seen before). The other two submissions would go through the same check, which would most likely prevent the creation and instead return the unique ID/GUID of the first contact record to link to. The lead data then goes to the original contact record, not a duplicate one.

Duplicate rules need to be very robust, for sure. I have looked at the rules active in our own CRM environment and I don't trust them enough, so even the official RetrieveDuplicates endpoint is out. If we were to do it, I'd build our own lookup query and err on the side of caution by requiring a minimum of three personally identifiable markers, e.g. First Name, Last Name, Email or First Name, Last Name, Date of Birth. Email alone would likely be too risky: what if two people shared the same email address but had different first and last names? This makes the duplicate check more robust at the cost of missing some simpler duplicates, but the objective is to reduce duplicates, not remove them entirely, given there will always be a data management requirement for any CRM.

Dynamics 365 CRM can outright stop duplicates with a simple header in the API request on a create. However, it's off by default, and likely for a good reason: in most cases you don't want to stop the data from being created, but you might want to check for an existing record first. That basically means running some form of search/lookup on the entity payload data before sending a POST to create. Of course, that is easier said than done across the various areas involved.
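The lookup-before-create flow from the course scenario can be sketched without any CRM at all. The following is a toy, in-memory illustration only: `FakeCrm`, `submitInterest()` and the sample data are invented for this sketch, and a real integration would replace the array scan with an API query.

```php
<?php
declare(strict_types=1);

// Hypothetical in-memory stand-in for a CRM; not Formie or Dynamics 365 code.
final class FakeCrm
{
    /** @var array<string, array<string, string>> contacts keyed by ID */
    public array $contacts = [];
    /** @var list<array{contactId: string, course: string}> */
    public array $leads = [];

    // Return the ID of the first contact matching every field of any rule, or null.
    public function findContact(array $payload, array $rules): ?string
    {
        foreach ($rules as $fields) {
            // Skip the rule unless the payload has a non-empty value for every field
            foreach ($fields as $field) {
                if (($payload[$field] ?? '') === '') {
                    continue 2;
                }
            }
            foreach ($this->contacts as $id => $contact) {
                $matches = true;
                foreach ($fields as $field) {
                    // Case-insensitive comparison of each marker
                    if (strcasecmp($contact[$field] ?? '', $payload[$field]) !== 0) {
                        $matches = false;
                        break;
                    }
                }
                if ($matches) {
                    return $id;
                }
            }
        }
        return null;
    }

    public function createContact(array $payload): string
    {
        $id = 'contact-' . (count($this->contacts) + 1);
        $this->contacts[$id] = $payload;
        return $id;
    }
}

// Reuse a matching contact if one exists, otherwise create one; always add the lead.
function submitInterest(FakeCrm $crm, array $contact, string $course, array $rules): string
{
    $contactId = $crm->findContact($contact, $rules) ?? $crm->createContact($contact);
    $crm->leads[] = ['contactId' => $contactId, 'course' => $course];
    return $contactId;
}

$rules = [
    ['firstname', 'lastname', 'emailaddress1'],
    ['firstname', 'lastname', 'birthdate'],
];
$crm = new FakeCrm();
$person = ['firstname' => 'Ada', 'lastname' => 'Lovelace', 'emailaddress1' => 'ada@example.com'];

foreach (['Maths', 'Physics', 'Computing'] as $course) {
    submitInterest($crm, $person, $course, $rules);
}

echo count($crm->contacts), ' contact, ', count($crm->leads), " leads\n"; // prints: 1 contact, 3 leads
```

Because every rule needs three markers, a second person sharing the same email address but with a different first name would still get their own contact record.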
This is possibly borderline bespoke business requirements, in which case extending the CRM integration class and building it into a custom integration is more likely the right route, but I thought I'd create a discussion just to gather thoughts. I'm primarily looking at it from a Dynamics 365 CRM angle, and I'm not sure how many Formie users extend the Dynamics 365 CRM integration beyond the out-of-the-box version. Other CRM products such as Salesforce and Pardot may also handle the problem better themselves, but I'll leave it here for anyone else to comment.
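For anyone extending the Dynamics 365 integration, two request headers in the Dynamics 365 Web API are relevant to this topic: `MSCRM.SuppressDuplicateDetection: false` opts a write into the server's own duplicate detection rules, and `If-Match: *` stops a PATCH from creating a record that doesn't exist. The sketch below only builds the header arrays; the function name and the `$mode` values are assumptions invented here, and how the headers are attached to a request depends on the HTTP client in use.

```php
<?php
declare(strict_types=1);

/**
 * Build extra headers for a Dynamics 365 Web API write request.
 * Hypothetical helper: the $mode strings exist only in this sketch.
 */
function dynamicsWriteHeaders(string $mode): array
{
    return match ($mode) {
        // Create: ask the server to apply its own duplicate detection rules
        'create-detect-duplicates' => ['MSCRM.SuppressDuplicateDetection' => 'false'],
        // PATCH: update only; fail rather than create if the record is missing
        'update-only' => ['If-Match' => '*'],
        // No extra behaviour: plain create or upsert
        default => [],
    };
}

$headers = dynamicsWriteHeaders('update-only');
```

Server-side detection is only as good as the rules published in the CRM environment, which is the reason given above for preferring a custom lookup.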
-
I've tested a working proof of concept. It is very opinionated towards Dynamics 365, given it's implemented directly against it, but perhaps it is a starting point for a future wider CRM feature, or at least provides some implementation ideas if this were to be added.

Here is an example of modifying the contact mapping logic to include a duplicate check while keeping the existing create-new-record logic:

if ($this->mapToContact) {
    $contactPayload = $contactValues;

    if ($this->duplicateContactCheck && (false !== ($duplicateRecord = $myService->findExistingRecord('contacts', $contactPayload)))) {
        $response = $this->deliverPayload($submission, "contacts($duplicateRecord)?\$select=contactid", $contactPayload, 'PATCH');
    } else {
        $response = $this->deliverPayload($submission, 'contacts?$select=contactid', $contactPayload);
    }

    if ($response === false) {
        return true;
    }

    $contactId = $response['contactid'] ?? '';

    if (!$contactId) {
        Integration::error($this, Craft::t('nottingham-college-module', 'Missing return contactid {response}. Sent payload {payload}', [
            'response' => Json::encode($response),
            'payload' => Json::encode($contactPayload),
        ]), true);

        return false;
    }
}

Adding the extra properties duplicateContactCheck and duplicateAccountCheck allows this behaviour to be limited to an entity and enabled on a per-form basis.

One benefit of the Dynamics 365 API is that you can send a PATCH request when targeting a specific record, so only the values present in the payload are modified, not the entire record, which is much safer. Given a match was found, some of the data is already going to be present on the contact record, so you can update it without risk, along with any other data passed over that wasn't already there. As far as I can tell, any null values in the mapping are stripped at this stage, so there is less risk of removing data accidentally. As an extra defence, you can prevent an upsert-type request from creating a record if it doesn't already exist.

The findExistingRecord() function then uses one or more match rules on properties that would potentially be present, although they are not guaranteed to be. So a check has to be done that all the properties for a rule are present before querying with them; if not, it returns false, given we can't reliably match the data without all of them. For safety, we must have a reasonable minimum of data present to make an accurate judgement on a duplicate.

Dynamics 365's own duplicate rules can afford to be looser, because the process there is generally: run a duplicate query, get results back, and have a human review the suggested records, some of which might not be duplicates. Given we are trusting a lookup result automatically, we have to be more careful. In findExistingRecord() we define match rules and check the payload to make sure the properties are present; if not all of the data is available, the rule is skipped, because the query would otherwise return results that are too wide.

public function findExistingRecord(string $entity, array $payload): bool|string
{
    // Properties to use for checking entities for duplicates
    $propertiesForMatching = [
        'contacts' => [
            ['firstname', 'lastname', 'birthdate'],
            ['firstname', 'lastname', 'emailaddress1'],
        ],
        'accounts' => [
            ['name', 'address1_postalcode'],
        ],
    ];

    $matchRules = $propertiesForMatching[$entity] ?? null;

    if (!$matchRules) {
        return false;
    }

    // Stays null if every rule is skipped or no rule finds a match
    $existingRecord = null;

    foreach ($matchRules as $fields) {
        $fieldsInPayload = count(array_intersect_key(array_flip($fields), $payload));

        // If we don't have all the properties for matching, skip the rule
        if ($fieldsInPayload !== count($fields)) {
            continue;
        }

        // Make sure we always query active records only
        $filter = ['statecode eq 0'];

        foreach ($fields as $field) {
            // Double any single quotes so values like O'Brien don't break the OData filter
            $value = str_replace("'", "''", (string)$payload[$field]);

            // Date values use a different comparison method
            if ($field === 'birthdate') {
                $filter[] = "Microsoft.Dynamics.CRM.On(PropertyName='$field',PropertyValue='$value')";
            } else {
                $filter[] = "$field eq '$value'";
            }
        }

        $response = $this->getCrmIntegration()?->request('GET', $entity, [
            'query' => [
                '$select' => implode(',', $fields),
                '$filter' => implode(' and ', $filter),
            ],
        ]);

        $existingRecord = $response['value'][0] ?? null;

        // Stop at the first rule that finds a match
        if ($existingRecord) {
            break;
        }
    }

    // GUID exists under key accountid, contactid etc...
    $entityId = rtrim($entity, 's') . 'id';

    return $existingRecord[$entityId] ?? false;
}

Using the handy

Further thoughts
Just some thoughts and testing.
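The filter-building step of the lookup can also be pulled out into a small pure function, which makes it testable without a CRM connection. This is a sketch under assumptions: `escapeODataString()` and `buildDuplicateFilter()` are names invented here, and unlike a plain key check it also treats empty strings as missing.

```php
<?php
declare(strict_types=1);

// Double single quotes, as OData string literals require.
function escapeODataString(string $value): string
{
    return str_replace("'", "''", $value);
}

/**
 * Return an OData $filter for one match rule, or null when the payload
 * lacks a non-empty value for any of the rule's fields.
 * Hypothetical helper; $dateFields lists fields compared with the On() function.
 */
function buildDuplicateFilter(array $fields, array $payload, array $dateFields = ['birthdate']): ?string
{
    // Only ever match against active records
    $clauses = ['statecode eq 0'];

    foreach ($fields as $field) {
        $value = trim((string)($payload[$field] ?? ''));

        // An incomplete rule would produce results that are too wide, so bail out
        if ($value === '') {
            return null;
        }

        $value = escapeODataString($value);
        $clauses[] = in_array($field, $dateFields, true)
            ? "Microsoft.Dynamics.CRM.On(PropertyName='$field',PropertyValue='$value')"
            : "$field eq '$value'";
    }

    return implode(' and ', $clauses);
}

echo buildDuplicateFilter(
    ['firstname', 'lastname', 'emailaddress1'],
    ['firstname' => 'Sean', 'lastname' => "O'Brien", 'emailaddress1' => 'sean@example.com']
), "\n";
// prints: statecode eq 0 and firstname eq 'Sean' and lastname eq 'O''Brien' and emailaddress1 eq 'sean@example.com'
```

Returning null for an incomplete rule lets the caller move on to the next rule, or skip the duplicate check entirely and create the record as normal.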
-
CRM systems have to deal with the data problem of duplicate records. Key entities such as contacts and organisations, which are intended to be singular, are often duplicated because of multiple data capture streams, particularly across multiple campaigns or when offering different products/services.
It could be advantageous for Formie to offer duplicate detection when mapping to types such as contacts and accounts/organisations, to prevent duplicate records being created.
The actual way to check for duplicates would differ for each CRM integration and the API methods it offers, but extending the existing payload/send payload logic to incorporate duplicate checking is one possible way to achieve it. The counter-argument is that data deduplication should happen on the CRM side, but you could respond that preventing duplicates from being created in the first place helps the overall process and creates less work on the CRM side.
The Dynamics 365 CRM example for checking for the potential duplicates:
Query the result of either method to determine whether a match was returned. Further checking may be needed to reduce false positives. If the record is deemed a duplicate, use the returned record and obtain its unique ID/GUID, which should then be used for any further relational data rather than creating a new record. This would only apply to certain entities; others, e.g. leads, are intended to be duplicated.
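As a rough illustration of using the returned GUID for relational data, the sketch below attaches a lead payload to an existing contact via the `@odata.bind` syntax that the Dynamics 365 Web API uses for lookups. The function name is invented here, and `parentcontactid` is assumed to be the lead's contact lookup; check the entity metadata of the target environment before relying on it.

```php
<?php
declare(strict_types=1);

/**
 * Attach a lead payload to an existing contact by GUID.
 * Hypothetical helper; assumes the lead's contact lookup is `parentcontactid`.
 */
function bindLeadToContact(array $leadPayload, string $contactId): array
{
    // Guard against malformed IDs ending up in the request body
    if (!preg_match('/^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$/i', $contactId)) {
        throw new InvalidArgumentException("Not a GUID: $contactId");
    }

    // Reference the existing record instead of creating a new contact
    $leadPayload['parentcontactid@odata.bind'] = "/contacts($contactId)";

    return $leadPayload;
}

$lead = bindLeadToContact(['subject' => 'Maths'], '00000000-0000-0000-0000-000000000001');
```

The same ID can be reused for every lead the person submits, which is what keeps all of them under a single parent contact.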
Further thoughts