
Why Dirty CRM Data Is Costing You Deals
Your sales team logs "Apple inc" in one record, "Apple Inc." in another, and "Apple Computer" in a third. Marketing can't segment by industry because half your records say "Tech," a quarter say "Technology," and the rest are blank. Your VP asks for a list of enterprise software companies, and you spend two hours manually cleaning duplicates and standardizing names before you can even answer the question. Meanwhile, your CRM—supposedly your single source of truth—is riddled with inconsistent company names, missing industry classifications, and outdated information that makes every report unreliable.
- Time saved: Reduces 2-3 hours of manual data cleaning per week to 10-15 minutes of AI-assisted enrichment with spot-checking
- Consistency gain: Standardizes company naming conventions and industry classifications across your entire database, eliminating reporting errors and segmentation headaches
- Cognitive load: Eliminates the frustrating detective work of "Is this the same company?" and frees sales ops to focus on pipeline analysis instead of data janitorial work
- Cost comparison: Prevents revenue leakage from missed opportunities. When reps can't find existing accounts due to naming inconsistencies, they create duplicates or miss cross-sell chances. Clean CRM data directly improves account-based targeting and win rates.
CRM enrichment is perfect for AI delegation because it combines pattern recognition (matching variant company names), data normalization (standardizing industry categories), and research synthesis (finding authoritative sources)—tasks where AI excels once you define the rules and quality standards.
Here's how to delegate this effectively using the 5C Framework.
Why This Task Tests Your Delegation Skills
Data enrichment reveals whether you understand delegation as specification, not just instruction. Even a competent junior analyst can't clean your CRM without knowing your naming conventions, your industry taxonomy, how to resolve conflicts when sources disagree, and what level of confidence counts as "good enough" versus "needs manual review."
This is delegation engineering, not prompt hacking. Just like training a new data analyst, you must specify:
- Authority hierarchy (when LinkedIn conflicts with Crunchbase, which source wins?)
- Standardization rules (how do you want "Inc.", "Incorporated", and "Corporation" handled?)
- Edge case protocols (what happens with subsidiaries, acquisitions, or holding companies?)
The 5C Framework forces you to codify data governance decisions into AI instructions. Master this SOP, and you've learned to delegate any data cleaning task—from email normalization to address standardization to contact deduplication.
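To make "standardization rules" concrete, here is a minimal Python sketch of one way to write a suffix rule down before handing it to an AI. The suffix map, the function names, and the choice of "Inc." and "Corp." as the target forms are all illustrative assumptions, not a recommended convention. The sketch only handles legal suffixes and a loose matching key; it does not fix capitalization or research anything.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical suffix map: replace with your own convention.
SUFFIXES = {
    "inc": "Inc.",
    "incorporated": "Inc.",
    "corp": "Corp.",
    "corporation": "Corp.",
    "llc": "LLC",
    "ltd": "Ltd.",
    "limited": "Ltd.",
}

def split_suffix(name: str) -> Tuple[str, Optional[str]]:
    """Return (core name, standardized suffix), or (name, None) if no known suffix."""
    cleaned = re.sub(r"\s+", " ", name.strip())
    # Trailing word, optionally preceded by a comma and followed by a period.
    match = re.match(r"^(.*?)[,\s]+([A-Za-z]+)\.?$", cleaned)
    if match and match.group(2).lower() in SUFFIXES:
        return match.group(1).strip(" ,"), SUFFIXES[match.group(2).lower()]
    return cleaned, None

def match_key(name: str) -> str:
    """Case- and punctuation-insensitive key used only to suggest duplicates."""
    core, _ = split_suffix(name)
    return re.sub(r"[^a-z0-9]+", " ", core.lower()).strip()

def group_candidates(records: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each matching key to the record IDs that share it (2 or more only)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for record_id, name in records:
        groups[match_key(name)].append(record_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}

records = [
    ("12345", "apple inc"),
    ("12346", "APPLE COMPUTER"),
    ("12347", "Apple, Inc."),
]
candidates = group_candidates(records)
```

With these three records, "apple inc" and "Apple, Inc." share a key, while "APPLE COMPUTER" stays separate. That is deliberate: whether it is the same entity is an edge case protocol decision for a human or a researched flag, not something a string rule should settle.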
Configuring Your AI for CRM Data Enrichment
| 5C Component | Configuration Strategy | Why it Matters |
|---|---|---|
| Character | Data quality analyst with sales operations background, trained in CRM hygiene best practices and B2B company research methodologies | Ensures AI applies data governance principles—understanding legal entity vs. brand name, recognizing parent-subsidiary relationships, and distinguishing authoritative sources from unreliable ones |
| Context | Your CRM platform, existing naming conventions, industry taxonomy (your categories vs. standard classifications like NAICS/SIC), data sources you trust, and tolerance for ambiguity | Different CRMs have different constraints—Salesforce account hierarchies work differently than HubSpot company records. Your industry categories must match your segmentation needs, not generic taxonomies |
| Command | Analyze messy company names and industries, research authoritative sources, standardize to your conventions, flag ambiguous cases requiring human review, provide confidence scores for each enrichment | Prevents AI from making authoritative-sounding guesses—you need enrichment with quality indicators so you can decide which automated changes to trust versus which need verification |
| Constraints | Use only specified data sources; never invent information; maintain original data in separate field for audit trail; flag subsidiaries and recently acquired companies; limit to active companies only (exclude defunct entities) | Stops AI from confidently standardizing "Apple Computer" when it might legitimately be a different company than Apple Inc., or from updating acquired companies to their new parent without flagging the change |
| Content | Provide examples of your standardized company names (showing how you handle legal suffixes, punctuation, capitalization), your industry category definitions with examples, and cases where you've made judgment calls on ambiguous situations | Teaches AI your organization's specific conventions—whether you use "Inc." or "Inc" (no period), how you categorize companies that span multiple industries, and your threshold for "Technology" vs. "Software" vs. "SaaS" |
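If you maintain these decisions in one place, the Context row can be generated rather than retyped. The following is a rough sketch under stated assumptions: `EnrichmentConfig`, `render_context`, the category names, and the placeholder company names are all hypothetical, and the output simply mirrors the "Our data standards" wording used in the template.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class EnrichmentConfig:
    """Hypothetical container for the decisions listed in the table above."""
    name_convention: str
    industries: Dict[str, List[str]]  # category -> example companies
    trusted_sources: List[str]        # highest priority first
    review_threshold: int = 80        # percent; below this, flag for review

def render_context(cfg: EnrichmentConfig) -> str:
    """Render the data standards as plain text for the prompt's context section."""
    sources = ", ".join(
        f"{rank}. {source}" for rank, source in enumerate(cfg.trusted_sources, 1)
    )
    lines = [
        "Our data standards:",
        f"- Company name format: {cfg.name_convention}",
        f"- Industry taxonomy: {', '.join(cfg.industries)}",
        f"- Trusted sources (in priority order): {sources}",
        "- Confidence threshold: Flag for manual review if confidence "
        f"is below {cfg.review_threshold}%",
        "Our industry definitions:",
    ]
    for category, examples in cfg.industries.items():
        lines.append(f"- {category}: e.g., {', '.join(examples)}")
    return "\n".join(lines)

# Placeholder values for illustration only.
cfg = EnrichmentConfig(
    name_convention="Official legal name, 'Inc.' not 'Incorporated'",
    industries={
        "SaaS": ["Example SaaS Co A", "Example SaaS Co B"],
        "Hardware": ["Example Hardware Co"],
    },
    trusted_sources=["Company's official website", "LinkedIn company page"],
)
context_text = render_context(cfg)
```

Keeping the configuration as data means the same source feeds both the prompt and any later checks on the AI's output.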
The Copy-Paste Delegation Template
<role>
You are a data quality analyst specializing in CRM enrichment and B2B company research. You understand the difference between legal entity names and brand names, recognize parent-subsidiary relationships, and know how to evaluate source authority when data conflicts. You prioritize accuracy over speed and flag ambiguity rather than guessing.
</role>
<context>
I need to enrich and standardize company records in our CRM. Our current data has inconsistent naming (legal suffixes, capitalization, abbreviations) and incomplete/inconsistent industry classifications.
Our data standards:
- Company name format: [Specify your convention, e.g., "Use official legal name with standard suffix format: 'Inc.' not 'Incorporated', capitalize properly, no extra punctuation"]
- Industry taxonomy: [Provide your categories and definitions, e.g., "We use these 12 primary industries: SaaS, Enterprise Software, Hardware, etc. - see definitions below"]
- Trusted sources (in priority order): [e.g., "1. Company's official website, 2. LinkedIn company page, 3. Crunchbase, 4. SEC filings for public companies"]
- Confidence threshold: [e.g., "Flag for manual review if confidence is below 80% or if sources significantly conflict"]
Our industry definitions:
[Paste your industry categories with 2-3 example companies for each to train the pattern]
</context>
<instructions>
Follow this sequence:
1. **Analyze input data** to identify:
- Company name variants that likely refer to the same entity (consider common misspellings, legal suffix variations, abbreviations)
- Missing or vague industry classifications (blank fields, generic entries like "Other" or "Services")
- Data quality red flags (all caps, obvious typos, outdated information)
2. **Research each company** using approved sources:
- Start with company's official website for legal name and self-described industry
- Cross-reference with LinkedIn company page for employee count and current status
- Check Crunchbase for funding stage and detailed categorization
- For public companies, verify with recent SEC filings
- Document which sources you used for each field
3. **Standardize company names** following these rules:
- Use the official legal entity name as it appears on the company website or SEC filings
- Apply consistent formatting for legal suffixes [follow your specified convention]
- Preserve official capitalization and punctuation (e.g., "eBay" not "Ebay", "T-Mobile" not "T Mobile")
- Flag subsidiaries and note parent company relationship
- If company has been acquired within the last 2 years, flag for review with acquisition date
4. **Classify industry** based on:
- Primary business model and revenue source (not just what they call themselves)
- Match to our defined taxonomy using the example companies as reference patterns
- If company spans multiple categories, choose primary based on revenue majority
- If unclear or company has pivoted recently, flag for manual classification
- Never use generic fallbacks like "Technology" or "Services" without specificity
5. **Generate output** with this structure for each record:
- **Original Name:** [As it appeared in CRM]
- **Standardized Name:** [Your recommended name following conventions]
- **Industry Classification:** [Your category assignment]
- **Confidence Score:** [0-100%, based on source agreement and clarity]
- **Data Sources:** [List sources used and note any conflicts]
- **Flags for Review:** [List any ambiguities, recent changes, or edge cases requiring human judgment]
- **Rationale:** [Brief explanation of classification decision, especially for non-obvious cases]
Quality control rules:
- If multiple name variants exist in CRM, identify potential duplicates and note the record IDs
- Never invent information—if industry is unclear after research, mark as "Needs Manual Classification" rather than guessing
- Preserve original messy data in your output so I can verify changes before implementing
- For any company with significant recent news (acquisition, bankruptcy, major pivot), include a note
Output as a structured table or CSV format ready for CRM import review.
</instructions>
<input>
Paste your messy CRM data below. Include these fields:
- Record ID (so we can match back to CRM)
- Current company name (as it appears in your system)
- Current industry classification (if any)
- Any additional context (recent notes, deal stage, etc.)
Format: CSV, TSV, or structured list
Example input:
Record ID: 12345
Company: apple inc
Industry: (blank)
Context: Enterprise deal, procurement contact
Record ID: 12346
Company: APPLE COMPUTER
Industry: Tech
Context: SMB deal, closed-lost 2023
Record ID: 12347
Company: Apple, Inc.
Industry: Consumer Electronics
Context: Strategic account, active opportunity
[PASTE YOUR CRM DATA HERE]
</input>
The Manager's Review Protocol
Before importing AI-enriched data into your CRM, apply these quality checks:
- Accuracy Check: Spot-check 10-15 enriched records by manually verifying against the cited sources—did AI correctly interpret company websites and LinkedIn pages, or did it misread context? Confirm legal name formatting matches your actual conventions (check a few you know well like Microsoft, Salesforce, etc.). Verify industry classifications against the actual business model, not just marketing language.
- Hallucination Scan: Ensure AI didn't invent company information that doesn't exist in cited sources or classify based on assumptions rather than research. Check that confidence scores genuinely reflect data quality—low scores should correspond to genuinely ambiguous cases. Verify any flagged parent-subsidiary relationships are current and accurate. Watch for AI "confidently" standardizing rare company names that might actually be correct as-is.
- Tone Alignment: Confirm industry classifications match your organization's segmentation strategy, not generic taxonomies. For instance, if you distinguish "SaaS" from "Enterprise Software" for targeting purposes, verify AI isn't collapsing these categories. Ensure naming conventions match your existing high-quality records—some orgs prefer "Google LLC" while others use "Google"—consistency matters more than absolute correctness.
- Strategic Fitness: Evaluate whether enriched data actually solves your business problem—can you now segment effectively for campaigns, generate accurate reports, and identify account relationships? Check if AI flagged the right edge cases (acquisitions, subsidiaries, industry ambiguity) that genuinely require sales ops judgment. Verify the enrichment adds value proportional to effort—if half the records need manual review anyway, the delegation framework needs adjustment.
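Part of these checks can be organized mechanically once the AI's output is in a structured form. The sketch below assumes each enriched record has been parsed into a dict with hypothetical keys `record_id`, `confidence` (0-100), and `flags` (a list of strings); the 80% threshold and the sample size of 10 come from the examples in this SOP, not from any fixed standard. It separates records that need human review, draws a spot-check sample from the rest, and reports what share landed in review.

```python
import random
from typing import Dict, List, Tuple

Record = Dict[str, object]

def triage(records: List[Record], threshold: int = 80) -> Tuple[List[Record], List[Record]]:
    """Split records into (auto-accept candidates, needs manual review)."""
    auto: List[Record] = []
    review: List[Record] = []
    for record in records:
        needs_review = record["confidence"] < threshold or bool(record["flags"])
        (review if needs_review else auto).append(record)
    return auto, review

def spot_check_sample(records: List[Record], k: int = 10, seed: int = 0) -> List[Record]:
    """Pick up to k records to verify by hand against the cited sources."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(records, min(k, len(records)))

def review_rate(auto: List[Record], review: List[Record]) -> float:
    """Share of records that still need manual review."""
    total = len(auto) + len(review)
    return len(review) / total if total else 0.0

# Illustrative records only; field names are assumptions.
enriched = [
    {"record_id": "12345", "confidence": 95, "flags": []},
    {"record_id": "12346", "confidence": 60, "flags": ["possibly a different entity"]},
    {"record_id": "12347", "confidence": 90, "flags": ["name variant of 12345"]},
]
auto, review = triage(enriched)
sample = spot_check_sample(auto)
rate = review_rate(auto, review)
```

A review rate around one half or higher is the signal described under Strategic Fitness: the instructions or examples in the template probably need tightening before the next batch. The sample is drawn from the auto-accept group because those are the records that would otherwise reach the CRM unexamined.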
When This SOP Isn't Enough
This SOP solves one-time CRM data cleaning projects, but sales operations teams typically face ongoing data decay—new records come in messy, reps bypass validation rules, and company information changes over time. The full 5C methodology covers automated enrichment workflows (integrating with CRM data entry to catch problems at source), continuous monitoring systems (detecting when company data drifts from standards), and multi-system synchronization (keeping data consistent across CRM, marketing automation, and data warehouses).
For cleaning a backlog of messy data, this template works well. For maintaining data quality at scale, preventing future decay, or building automated enrichment pipelines, you'll need the advanced delegation frameworks taught in Sorai Academy.