Corral Works
Product

Duplicate Detection That Actually Works — Before the Import, Not After

DawsonSoft Team
#corral-works #import-studio #duplicate-detection #data-quality #ellucian-advance

The duplicate problem nobody wants to talk about

Every advancement database has duplicates. That’s not controversial. What is controversial is how they got there — because in most cases, the answer is “we imported them.”

Not on purpose, obviously. But the standard import workflow has a structural blind spot: it checks whether a row is valid, but it doesn’t check whether that row already exists. So you upload 500 new phone numbers, 480 of them land on the right records, and 20 of them create brand-new contact records because the system couldn’t match them to anyone.

Now you have 20 phantom contacts. They’ll accumulate mail. They’ll show up in queries. Eventually someone will notice, merge them by hand, and wonder why the duplicate count never goes down.

Import Studio takes a different approach: detect duplicates before the write, not after.


How traditional duplicate detection works (and why it fails at import time)

Most CRM systems — including Dynamics 365 underneath Advance — have built-in duplicate detection rules. They fire when a record is created or updated through the UI. The problem is:

  1. They fire after the fact. The record already exists by the time the rule triggers. You’re now in cleanup mode.
  2. They’re designed for one-at-a-time entry. When you bulk-import 2,000 rows, you get 2,000 individual duplicate warnings, which most import processes suppress so the job doesn’t halt.
  3. They’re system-wide. You can’t customize the matching logic for a specific import scenario.

So the practical reality is: bulk imports bypass meaningful duplicate detection. Teams compensate by pre-deduplicating in Excel, which works until it doesn’t.


How Import Studio handles duplicates

Import Studio performs duplicate detection as a pipeline step — after validation, before the CRM write. Every row goes through the detector, and matches are surfaced in the pre-flight review so you can decide what to do before anything is written.

Here’s how it works, step by step:

  1. Parse the row — read each column from your file
  2. Validate the fields — check formatting, required values, data types
  3. Detect duplicates — search CRM for existing records that match
  4. Decide the action:
    • No match found → create a new record
    • Match found → update the existing record instead
    • Ambiguous match → flag for human review

The critical difference: steps 1–3 all happen before anything is written to your database. By the time a record is created or updated, Import Studio already knows it’s not a duplicate.
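As a rough illustration of that ordering, here is a minimal Python sketch. It is not Import Studio’s code: the column names, the `crm_index` dictionary standing in for a CRM lookup, and the functions `plan_import` and `commit` are all hypothetical. The point it shows is that parsing, validation, and detection only build a plan; nothing is written until a separate commit step runs after review.

```python
import csv
import io

# Hypothetical sketch: the column names and the crm_index dict are stand-ins,
# not Import Studio's actual data model or API.
REQUIRED = ("org_id", "first_name", "last_name")

def validate(row):
    """Step 2: return a list of problems (empty if the row is valid)."""
    return [f"missing {col}" for col in REQUIRED if not (row.get(col) or "").strip()]

def detect(row, crm_index):
    """Step 3: look up existing record IDs for this row (stand-in for a CRM query)."""
    key = tuple(row[col].strip().casefold() for col in REQUIRED)
    return crm_index.get(key, [])

def plan_import(csv_text, crm_index):
    """Steps 1-4: parse, validate, detect, decide. Builds a plan and writes nothing."""
    plan = []
    reader = csv.DictReader(io.StringIO(csv_text))      # step 1: parse
    for line_no, row in enumerate(reader, start=2):      # line 1 is the header
        problems = validate(row)
        if problems:
            plan.append((line_no, "reject", problems))
            continue
        matches = detect(row, crm_index)
        if not matches:
            plan.append((line_no, "create", None))
        elif len(matches) == 1:
            plan.append((line_no, "update", matches[0]))
        else:
            plan.append((line_no, "review", matches))     # ambiguous: a human decides
    return plan

def commit(plan, write):
    """Runs only after pre-flight review: the first point anything is written."""
    for line_no, action, target in plan:
        if action in ("create", "update"):
            write(action, line_no, target)
```

Because `plan_import` never calls `write`, a reviewer can inspect the whole plan (including rejected and flagged rows) before anything touches the database.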

Two detection strategies

Import Studio supports two complementary approaches to finding duplicates:

1. Alternate Key Matching

You define a combination of fields that together uniquely identify a record.

If the incoming row matches an existing record on all key fields, it’s treated as an update rather than a create. This is deterministic — it either matches or it doesn’t.

Example: Alternate Key Setup

Key fields (all must match for a row to be considered a duplicate):

  1. Organization ID
  2. First Name
  3. Last Name

When a match is found, you choose what happens:

  • Update the existing record with the new data
  • Skip the row entirely
  • Flag it for a human to review

This gives you certainty: if all three fields match an existing record, Import Studio knows it’s the same person and applies the action you chose (update, skip, or flag) instead of creating a duplicate.
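A minimal sketch of alternate key matching, assuming existing CRM records can be loaded into an in-memory index keyed on the three fields above. The field names, the normalization (trimming and case-folding), and the `OnMatch` setting are illustrative assumptions, not Import Studio’s configuration format. Because the key is an exact tuple, the lookup is a single dictionary access: it either matches or it doesn’t.

```python
from enum import Enum

# Hypothetical sketch: field names and OnMatch are illustrative, not Import Studio's API.
class OnMatch(Enum):
    """What to do when a row matches an existing record."""
    UPDATE = "update"
    SKIP = "skip"
    FLAG = "flag"

KEY_FIELDS = ("org_id", "first_name", "last_name")

def alternate_key(record):
    # Normalize so "Chen " and "chen" compare equal; otherwise the match is exact.
    return tuple(str(record[f]).strip().casefold() for f in KEY_FIELDS)

def build_index(existing_records):
    """Map each alternate key to its record ID (assumes keys are unique in CRM)."""
    return {alternate_key(r): r["id"] for r in existing_records}

def resolve(row, index, on_match=OnMatch.UPDATE):
    """Return (action, record_id) for one incoming row."""
    record_id = index.get(alternate_key(row))
    if record_id is None:
        return ("create", None)
    return (on_match.value, record_id)
```

Passing `OnMatch.SKIP` or `OnMatch.FLAG` changes only what happens on a match; the matching itself stays deterministic.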

2. Column-Based Detection

For fuzzier scenarios — where you’re not sure the key fields will match exactly — you can designate detection columns. Import Studio queries CRM for records that match on those columns and flags them as potential duplicates for human review.
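Column-based detection can be sketched as a looser search: find existing records that agree with the incoming row on some of the designated detection columns, and record which columns agreed. In this sketch the column names, the `min_matches` threshold, and the in-memory `existing` list (standing in for the per-row CRM query) are assumptions for illustration; every candidate it returns would be flagged for review rather than acted on.

```python
# Hypothetical sketch: detection columns and min_matches are illustrative assumptions;
# a real implementation would query CRM instead of scanning an in-memory list.
DETECTION_COLUMNS = ("org_id", "first_name", "last_name", "email")

def norm(value):
    return (value or "").strip().casefold()

def find_candidates(row, existing, columns=DETECTION_COLUMNS, min_matches=2):
    """Return (record_id, matched_columns) for each existing record that agrees
    with the row on at least `min_matches` non-empty detection columns."""
    candidates = []
    for record in existing:
        matched = [c for c in columns
                   if norm(row.get(c)) and norm(row.get(c)) == norm(record.get(c))]
        if len(matched) >= min_matches:
            candidates.append((record["id"], matched))
    return candidates
```

Empty values are ignored so that two blank email fields don’t count as evidence of a duplicate.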

This is useful when:


What the duplicate report looks like

When Import Studio finds potential duplicates, it doesn’t just say “23 duplicates found.” It gives you a downloadable report showing exactly which incoming rows matched which existing records, and on which fields:

Row | Your File | Already in CRM | Matched On
14 | John Smith, jsmith@univ.edu | John Smith, j.smith@univ.edu | Org ID + Last Name
87 | Mary Chen, 617-555-0142 | Mary Chen, 617-555-0142 | Org ID + First + Last
203 | Robert J. Johnson | Bob Johnson | Org ID + Last Name (first name mismatch)

Row 203 is the interesting one. The system found a match on Organization ID and Last Name, but the first names don’t match (“Robert J.” vs. “Bob”). A naive system would either skip it or overwrite it. Import Studio flags it so a human can decide.
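A report like this is essentially one line per pair of incoming row and existing record. The sketch below writes one with Python’s standard `csv` module; the headers mirror the report columns, while the input shape (dicts with `matched` and `mismatched` field lists) and the display labels are hypothetical, not Import Studio’s export format. Listing mismatched fields, as for row 203, is what shows a reviewer why a pair was flagged instead of updated automatically.

```python
import csv
import io

# Hypothetical sketch: input shape and display labels are illustrative assumptions.
LABELS = {"org_id": "Org ID", "first_name": "First", "last_name": "Last Name"}

def matched_on(matched, mismatched):
    """Render e.g. 'Org ID + Last Name (first name mismatch)'."""
    text = " + ".join(LABELS.get(f, f) for f in matched)
    if mismatched:
        text += " (" + ", ".join(f.replace("_", " ") + " mismatch" for f in mismatched) + ")"
    return text

def write_report(matches):
    """Return CSV text with one line per potential duplicate."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["Row", "Your File", "Already in CRM", "Matched On"])
    for m in matches:
        writer.writerow([m["row"], m["incoming"], m["existing"],
                         matched_on(m["matched"], m["mismatched"])])
    return buf.getvalue()
```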


Create, Update, or Skip — you choose

For each import template, you configure what happens when a duplicate is detected:

Scenario | What Import Studio does
No match found | Creates a new record
Exact key match | Updates the existing record (upsert)
Partial match | Flags for review; no automatic action
Multiple matches | Flags for review; ambiguous, needs a human
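The rules in this table amount to a small decision function. In this sketch, “exact” means every alternate key field agreed and “partial” means only some did; those definitions and the `Candidate` type are assumptions made for illustration, not Import Studio’s internals.

```python
from dataclasses import dataclass

# Hypothetical sketch: Candidate and its `exact` flag are illustrative assumptions.
@dataclass
class Candidate:
    record_id: str
    exact: bool  # True if every alternate key field matched

def decide(candidates):
    """Map detection results to an action, following the scenario table."""
    if not candidates:
        return "create"   # no match found
    if len(candidates) > 1:
        return "review"   # multiple matches: ambiguous
    if candidates[0].exact:
        return "update"   # exact key match: upsert
    return "review"       # partial match: no automatic action
```

Checking for multiple matches before checking exactness means two exact matches still go to a human, since the system cannot know which record to update.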

This means your “create” imports won’t accidentally update, and your “update” imports won’t accidentally create. The operations are explicit, not inferred.


Duplicate detection + validation = clean imports

The real power comes from combining duplicate detection with the validation pipeline. Consider this scenario:

You’re importing 1,200 contact updates from an alumni event registration system.

Without Import Studio:

With Import Studio:

The difference isn’t just fewer duplicates. It’s fewer surprises. Your data team isn’t spending Friday afternoon cleaning up Monday’s import.


A note on performance

“Doesn’t checking every row for duplicates make imports really slow?”

It depends on the detection strategy. Alternate key matching uses indexed lookups — it’s fast even at thousands of rows. Column-based detection does run a query per row, which is slower but still faster than cleaning up duplicates after the fact.

For large imports (5,000+ rows), Import Studio batches the detection queries and caches results. If your file has 200 rows with the same Organization ID, it queries that org once, not 200 times.
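The caching idea can be sketched by grouping rows by Organization ID and issuing one lookup per distinct ID. Here `query_org` is a hypothetical stand-in for whatever CRM call fetches an organization’s existing contacts; the sketch only shows that 200 rows sharing an ID cost one query, not 200.

```python
from collections import defaultdict

# Hypothetical sketch: query_org stands in for a CRM lookup; not Import Studio's API.
def detect_batched(rows, query_org):
    """Return {row_index: existing contacts for that row's org}, querying each org once."""
    by_org = defaultdict(list)
    for i, row in enumerate(rows):
        by_org[row["org_id"]].append(i)

    results = {}
    for org_id, indexes in by_org.items():
        contacts = query_org(org_id)  # one CRM round trip per distinct org
        for i in indexes:
            results[i] = contacts
    return results
```

The per-row comparison (key or column matching) can then run in memory against each organization’s cached result set.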


Where this is headed

Import Studio’s duplicate detection is currently available in DLight and is migrating to Corral Works. In Corral Works, we’re adding:


Next in this series

This is part two of a three-part series on Import Studio:

  1. Stop dreading data imports — What Import Studio is and how the five-step workflow eliminates guesswork
  2. This post — How duplicate detection catches problems before they’re created
  3. Building your own import templates — How to design reusable templates with validation rules, transformations, and custom logic

Import Studio is included in Corral Works. If duplicate records are a recurring pain point for your team, reach out for a demo.
