Duplicate Detection That Actually Works — Before the Import, Not After

The duplicate problem nobody wants to talk about

Every advancement database has duplicates. That’s not controversial. What is controversial is how they got there — because in most cases, the answer is “we imported them.”

Not on purpose, obviously. But the standard import workflow has a structural blind spot: it checks whether a row is valid, but it doesn’t check whether that row already exists. So you upload 500 new phone numbers, 480 of them land on the right records, and 20 of them create brand-new contact records because the system couldn’t match them to anyone.

Now you have 20 phantom contacts. They’ll accumulate mail. They’ll show up in queries. Eventually someone will notice, merge them by hand, and wonder why the duplicate count never goes down.

Import Studio takes a different approach: detect duplicates before the write, not after.

How traditional duplicate detection works (and why it fails at import time)

Most CRM systems — including Dynamics 365 underneath Advance — have built-in duplicate detection rules. They fire when a record is created or updated through the UI. The problem is:

- They fire after the fact. The record already exists by the time the rule triggers. You’re now in cleanup mode.
- They’re designed for one-at-a-time entry. When you bulk-import 2,000 rows, you get 2,000 individual duplicate warnings, which most import processes suppress to avoid halting.
- They’re system-wide. You can’t customize the matching logic for a specific import scenario.

So the practical reality is: bulk imports bypass meaningful duplicate detection. Teams compensate by pre-deduplicating in Excel, which works until it doesn’t.

How Import Studio handles duplicates

Import Studio performs duplicate detection as a pipeline step — after validation, before the CRM write. Every row goes through the detector, and matches are surfaced in the pre-flight review so you can decide what to do before anything is written.

Here’s how it works, step by step:

1. Parse the row: read each column from your file
2. Validate the fields: check formatting, required values, data types
3. Detect duplicates: search CRM for existing records that match
4. Decide the action:
   - No match found → create a new record
   - Match found → update the existing record instead
   - Ambiguous match → flag for human review

The critical difference: steps 1–3 all happen before anything is written to your database. By the time a record is created or updated, Import Studio already knows whether the row matches an existing record, so a create is never an accidental duplicate.
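To make the ordering concrete, here is a minimal Python sketch of a plan-then-write pipeline. It is not Import Studio’s actual code: `plan_import`, `PlannedRow`, and the `validate` and `find_matches` callbacks are hypothetical names, and "ambiguous" is assumed here to mean "more than one match". The point it illustrates is that every row is classified before any write happens.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class PlannedRow:
    row_number: int
    data: dict
    action: str                      # "create", "update", "review", or "invalid"
    target_id: Optional[str] = None  # set only when the action is "update"
    problems: list = field(default_factory=list)


def plan_import(rows, validate: Callable, find_matches: Callable):
    """Classify every row without writing anything to the CRM."""
    plan = []
    for number, row in enumerate(rows, start=1):
        problems = validate(row)          # step 2: validate the fields
        if problems:
            plan.append(PlannedRow(number, row, "invalid", problems=problems))
            continue
        matches = find_matches(row)       # step 3: detect duplicates
        if not matches:                   # step 4: decide the action
            plan.append(PlannedRow(number, row, "create"))
        elif len(matches) == 1:
            plan.append(PlannedRow(number, row, "update", target_id=matches[0]))
        else:
            plan.append(PlannedRow(number, row, "review"))
    return plan


# Stand-in for the CRM: email address -> ids of existing contacts (made-up data).
existing = {"a@x.edu": ["C-1"], "b@x.edu": ["C-2", "C-3"]}
rows = [{"email": "a@x.edu"}, {"email": "new@x.edu"}, {"email": "b@x.edu"}, {"email": ""}]

plan = plan_import(
    rows,
    validate=lambda r: [] if r["email"] else ["email is required"],
    find_matches=lambda r: existing.get(r["email"], []),
)
print([p.action for p in plan])  # ['update', 'create', 'review', 'invalid']
```

Only after a human has looked at the plan would a separate write step act on the "create" and "update" rows.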

Two detection strategies

Import Studio supports two complementary approaches to finding duplicates:

1. Alternate Key Matching

You define a combination of fields that together uniquely identify a record. For example:

- Organization ID + First Name + Last Name
- External System ID
- Email Address + Zip Code

If the incoming row matches an existing record on all key fields, it’s treated as an update rather than a create. This is deterministic — it either matches or it doesn’t.

Example: Alternate Key Setup

Key fields (all must match for a row to be considered a duplicate):

- Organization ID
- First Name
- Last Name

When a match is found, you choose what happens:

- Update the existing record with the new data
- Skip the row entirely
- Flag it for a human to review

This gives you certainty: if all three fields match an existing record, Import Studio knows it’s the same person and updates them instead of creating a duplicate.
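As a rough illustration of why key matching is deterministic, the sketch below builds an in-memory index keyed on the three fields and looks up incoming rows against it. The column names (`org_id`, `first_name`, `last_name`) and helper functions are hypothetical, and the whitespace and case normalization is an assumption of this sketch, not documented Import Studio behavior.

```python
KEY_FIELDS = ("org_id", "first_name", "last_name")  # hypothetical column names


def key_of(record, key_fields=KEY_FIELDS):
    """Return the normalized key tuple, or None when any key field is blank."""
    parts = tuple(str(record.get(f) or "").strip().casefold() for f in key_fields)
    return parts if all(parts) else None


def build_index(existing_records, key_fields=KEY_FIELDS):
    """Map each complete key tuple to the ids of the records that carry it."""
    index = {}
    for rec in existing_records:
        key = key_of(rec, key_fields)
        if key is not None:
            index.setdefault(key, []).append(rec["id"])
    return index


def match_by_key(row, index, key_fields=KEY_FIELDS):
    """All key fields must agree; a row with a blank key field never matches."""
    key = key_of(row, key_fields)
    return index.get(key, []) if key is not None else []


crm = [
    {"id": "C-1", "org_id": "42", "first_name": "Mary", "last_name": "Chen"},
    {"id": "C-2", "org_id": "42", "first_name": "Bob", "last_name": "Johnson"},
]
index = build_index(crm)

print(match_by_key({"org_id": "42", "first_name": " mary ", "last_name": "CHEN"}, index))
# ['C-1']
print(match_by_key({"org_id": "42", "first_name": "Robert J.", "last_name": "Johnson"}, index))
# []
```

The second lookup returns nothing because "Robert J." is not "Bob". That is exactly the kind of near miss the second strategy is meant to catch.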

2. Column-Based Detection

For fuzzier scenarios — where you’re not sure the key fields will match exactly — you can designate detection columns. Import Studio queries CRM for records that match on those columns and flags them as potential duplicates for human review.

This is useful when:

- Names might be spelled slightly differently
- You’re importing from an external system with different ID schemes
- You want a human to confirm matches before anything is overwritten
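A sketch of the idea, under stated assumptions: an in-memory list stands in for the CRM query, the detection columns are compared after simple normalization, and a second set of "compare" columns is checked only to tell the reviewer what differs. The function and column names are illustrative, not Import Studio’s API.

```python
def norm(value):
    """Trim and case-fold so trivial formatting differences do not block a match."""
    return str(value or "").strip().casefold()


def find_candidates(row, existing_records, detection_columns, compare_columns):
    """Return existing records that agree with the row on every detection column.

    Nothing is updated here: each candidate only lists what matched and what
    differed, so a person can confirm or reject it.
    """
    wanted = [norm(row.get(c)) for c in detection_columns]
    if not all(wanted):  # a blank detection value cannot identify anyone
        return []
    candidates = []
    for rec in existing_records:
        if [norm(rec.get(c)) for c in detection_columns] == wanted:
            mismatched = [c for c in compare_columns
                          if norm(row.get(c)) != norm(rec.get(c))]
            candidates.append({
                "id": rec["id"],
                "matched_on": list(detection_columns),
                "mismatched": mismatched,
            })
    return candidates


crm = [{"id": "C-2", "org_id": "42", "first_name": "Bob", "last_name": "Johnson"}]
incoming = {"org_id": "42", "first_name": "Robert J.", "last_name": "Johnson"}

candidates = find_candidates(incoming, crm, ("org_id", "last_name"), ("first_name",))
print(candidates)
# [{'id': 'C-2', 'matched_on': ['org_id', 'last_name'], 'mismatched': ['first_name']}]
```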

What the duplicate report looks like

When Import Studio finds potential duplicates, it doesn’t just say “23 duplicates found.” It gives you a downloadable report showing exactly which incoming rows matched which existing records, and on which fields:

Row	Your File	Already in CRM	Matched On
14	John Smith, jsmith@univ.edu	John Smith, j.smith@univ.edu	Org ID + Last Name
87	Mary Chen, 617-555-0142	Mary Chen, 617-555-0142	Org ID + First + Last
203	Robert J. Johnson	Bob Johnson	Org ID + Last Name (first name mismatch)

Row 203 is the interesting one. The system found a match on Organization ID and Last Name, but the first names don’t match (“Robert J.” vs. “Bob”). A naive system would either skip it or overwrite it. Import Studio flags it so a human can decide.
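For readers who want to picture the report as data, here is a small sketch that writes the same four columns to CSV with Python’s standard `csv` module. The dictionary shape passed to `write_duplicate_report` is an assumption made for this example; the real report format may differ.

```python
import csv
import io


def write_duplicate_report(matches, out):
    """Write one CSV line per potential duplicate, using the report's four columns."""
    writer = csv.writer(out)
    writer.writerow(["Row", "Your File", "Already in CRM", "Matched On"])
    for m in matches:
        matched_on = " + ".join(m["matched_on"])
        if m.get("mismatched"):
            # Spell out what did NOT match so the reviewer sees it immediately.
            matched_on += " (" + ", ".join(m["mismatched"]) + " mismatch)"
        writer.writerow([m["row"], m["incoming"], m["existing"], matched_on])


matches = [
    {"row": 87, "incoming": "Mary Chen, 617-555-0142",
     "existing": "Mary Chen, 617-555-0142",
     "matched_on": ["Org ID", "First", "Last"]},
    {"row": 203, "incoming": "Robert J. Johnson", "existing": "Bob Johnson",
     "matched_on": ["Org ID", "Last Name"], "mismatched": ["first name"]},
]

buffer = io.StringIO()
write_duplicate_report(matches, buffer)
print(buffer.getvalue())
```

Values that contain commas, such as "Mary Chen, 617-555-0142", are quoted automatically by `csv.writer`, so the file opens cleanly in Excel.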

Create, Update, or Skip — you choose

For each import template, you configure what happens when a duplicate is detected:

Scenario	What Import Studio does
No match found	Creates a new record
Exact key match	Updates the existing record (upsert) by default; the template can skip or flag the row instead
Partial match	Flags for review — no automatic action
Multiple matches	Flags for review — ambiguous, needs human

This means your “create” imports won’t accidentally update, and your “update” imports won’t accidentally create. The operations are explicit, not inferred.
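The table translates almost directly into a small decision function. The sketch below is one possible reading, not the product’s logic: `decide` and its `on_exact_match` parameter are hypothetical, and treating "one exact match plus other partial matches" as a review case is a conservative assumption of this example.

```python
def decide(exact_matches, partial_matches, on_exact_match="update"):
    """Return "create", "update", "skip", or "review" for one incoming row.

    exact_matches:   ids of records that matched on every key field
    partial_matches: ids of records that matched on only some fields
    on_exact_match:  per-template choice for a clean exact match
    """
    if on_exact_match not in ("update", "skip", "review"):
        raise ValueError("on_exact_match must be 'update', 'skip', or 'review'")
    if not exact_matches and not partial_matches:
        return "create"            # no match found
    if len(exact_matches) == 1 and not partial_matches:
        return on_exact_match      # exact key match
    return "review"                # partial or multiple matches: no automatic action


print(decide([], []))                               # create
print(decide(["C-1"], []))                          # update
print(decide([], ["C-2"]))                          # review
print(decide(["C-1", "C-3"], []))                   # review
print(decide(["C-1"], [], on_exact_match="skip"))   # skip
```

Because "review" is the fallback for everything that is not clear-cut, a new scenario nobody anticipated ends up in front of a person rather than in the database.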

Duplicate detection + validation = clean imports

The real power comes from combining duplicate detection with the validation pipeline. Consider this scenario:

You’re importing 1,200 contact updates from an alumni event registration system.

Without Import Studio:

1. You upload the file
2. 1,180 rows succeed
3. 20 rows create new contacts (because the system couldn’t match them)
4. You discover the duplicates three weeks later during a mail merge
5. Someone spends a day merging records and fixing gift credit

With Import Studio:

1. You upload the file
2. Pre-flight check shows: 1,165 matched to existing records, 15 flagged as potential duplicates, 20 failed validation
3. You review the 15 flagged rows, confirm 12 are real matches (name variations), and mark the other 3 as genuinely new people
4. You fix the 20 validation errors in your source file
5. You run the import: the 1,180 rows that originally passed validation produce 1,177 updates (1,165 + 12) and 3 creates (the genuinely new people), the 20 corrected rows are matched the same way, and the result is 0 duplicates

The difference isn’t just fewer duplicates. It’s fewer surprises. Your data team isn’t spending Friday afternoon cleaning up Monday’s import.

A note on performance

“Doesn’t checking every row for duplicates make imports really slow?”

It depends on the detection strategy. Alternate key matching uses indexed lookups — it’s fast even at thousands of rows. Column-based detection does run a query per row, which is slower but still faster than cleaning up duplicates after the fact.

For large imports (5,000+ rows), Import Studio batches the detection queries and caches results. If your file has 200 rows with the same Organization ID, it queries that org once, not 200 times.
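The caching half of that idea is easy to sketch. In the example below, `fetch_org_records` is a hypothetical stand-in for one CRM query per organization, and matching on last name alone is a simplification to keep the example short. Batching several organizations into a single query is not shown.

```python
def detect_with_org_cache(rows, fetch_org_records):
    """Query each organization at most once, then match rows against the cached result."""
    cache = {}
    results = []
    for row in rows:
        org_id = row["org_id"]
        if org_id not in cache:                  # first row for this org: one query
            cache[org_id] = fetch_org_records(org_id)
        results.append([
            rec["id"] for rec in cache[org_id]
            if rec["last_name"].casefold() == row["last_name"].casefold()
        ])
    return results


# Fake CRM that records every query it receives (made-up data).
calls = []
crm_by_org = {"42": [{"id": "C-2", "last_name": "Johnson"}]}


def fetch_org_records(org_id):
    calls.append(org_id)
    return crm_by_org.get(org_id, [])


rows = [{"org_id": "42", "last_name": "Johnson"}] * 200 + [{"org_id": "7", "last_name": "Lee"}]
results = detect_with_org_cache(rows, fetch_org_records)
print(len(rows), "rows,", len(calls), "CRM queries")  # 201 rows, 2 CRM queries
```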

Where this is headed

Import Studio’s duplicate detection is currently available in DLight and is migrating to Corral Works. In Corral Works, we’re adding:

- Saved detection rules: define your matching criteria once, reuse across all templates
- Match confidence scoring: see how strong each potential match is, not just whether one exists
- Team review queues: route ambiguous matches to a colleague for a second opinion before import

Next in this series

This is part two of a three-part series on Import Studio:

1. Stop dreading data imports: What Import Studio is and how the five-step workflow eliminates guesswork
2. This post: How duplicate detection catches problems before they’re created
3. Building your own import templates: How to design reusable templates with validation rules, transformations, and custom logic

Import Studio is included in Corral Works. If duplicate records are a recurring pain point for your team, reach out for a demo.