v1.0 · Now available

Business entity data,
finally standardized.

One API call normalizes company names across 100+ countries, scripts, and languages. Cyrillic, Chinese, Arabic — handled. Legal suffixes, acronyms, transliteration — automated.

# Normalize a company name
curl -X POST https://api.ambect.com/v1/normalize/company \
  -H "Authorization: Bearer $AMBECT_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name": "Microsoft Corporation", "country": "us"}'

# Response
{
  "data": {
    "canonical": "microsoft",
    "legal_type": "corp",
    "tokens": ["microsoft"]
  },
  "meta": {
    "pipeline": ["lowercase", "legal_suffix"],
    "confidence": 0.99,
    "ms": 1.4
  },
  "error": null
}
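
For callers who prefer Python over curl, the request and response above might be handled roughly as follows. This is a minimal sketch using only the standard library: the endpoint, headers, and field names are taken from the example, while `build_request`, `parse_envelope`, and the `min_confidence` threshold are hypothetical helpers, not part of any official client. It also assumes that any non-null `error` value means failure, since the error format is not shown here.

```python
import json
import urllib.request

API_URL = "https://api.ambect.com/v1/normalize/company"  # endpoint from the curl example


def build_request(name: str, country: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps({"name": name, "country": country}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def parse_envelope(raw: str, min_confidence: float = 0.9) -> dict:
    """Return the "data" object from a response body.

    Raises if the envelope reports an error or if the confidence is below
    a caller-chosen threshold (the threshold is an assumption of this sketch).
    """
    envelope = json.loads(raw)
    if envelope["error"] is not None:
        raise RuntimeError(f"normalization failed: {envelope['error']}")
    if envelope["meta"]["confidence"] < min_confidence:
        raise ValueError("confidence below threshold")
    return envelope["data"]


# Sample body mirroring the example response; no network call is made here.
SAMPLE = """{
  "data": {"canonical": "microsoft", "legal_type": "corp", "tokens": ["microsoft"]},
  "meta": {"pipeline": ["lowercase", "legal_suffix"], "confidence": 0.99, "ms": 1.4},
  "error": null
}"""

request = build_request("Microsoft Corporation", "us", "YOUR_KEY")
data = parse_envelope(SAMPLE)
print(request.get_method(), request.full_url)
print(data["canonical"], data["legal_type"])
```

To actually send the request, pass it to `urllib.request.urlopen(request)` and feed the decoded body to `parse_envelope`.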

100+ Countries supported
13 Pipeline stages
<5ms Median response time
99.9% Uptime SLA
3,000+ Synonym mappings

Capabilities

Everything your data pipeline
actually needs.

A multi-stage normalization engine that handles the edge cases your data pipeline encounters in the real world.

🌐

Global Script Support

Transliterate Cyrillic, Chinese, Japanese, Arabic, Hebrew, and more to Latin — automatically, by country.

⚖️

Legal Form Normalization

Strip and standardize legal suffixes across jurisdictions. "GmbH", "S.A.", "LLC", "株式会社" — all handled.
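
To make the idea concrete, here is a rough sketch of legal form stripping in Python. It is not the service's implementation: the two lookup tables hold only the four forms quoted above, the legal type codes (`"kk"`, `"sa"`, and so on) are invented for illustration, and `strip_legal_form` is a hypothetical name.

```python
from typing import Optional, Tuple

# Hypothetical lookup tables: written form -> illustrative legal type code.
LATIN_FORMS = {"gmbh": "gmbh", "s.a.": "sa", "llc": "llc"}
CJK_FORMS = {"株式会社": "kk"}


def strip_legal_form(name: str) -> Tuple[str, Optional[str]]:
    """Remove one known legal form and report which type it was.

    Returns (remaining name, legal type), or (name, None) if nothing matched.
    The remaining name is case-folded.
    """
    text = name.strip().casefold()
    # CJK forms are not space-separated and may lead or trail the name.
    for form, legal_type in CJK_FORMS.items():
        if text.startswith(form):
            return text[len(form):].strip(), legal_type
        if text.endswith(form):
            return text[:-len(form)].strip(), legal_type
    # Latin forms are matched only as a whole trailing token.
    tokens = text.split()
    if len(tokens) > 1 and tokens[-1] in LATIN_FORMS:
        return " ".join(tokens[:-1]).rstrip(","), LATIN_FORMS[tokens[-1]]
    return text, None


print(strip_legal_form("Siemens GmbH"))
print(strip_legal_form("ソニー株式会社"))
```

Matching Latin forms as whole tokens avoids clipping names that merely end in the same letters; a production table would also need per-jurisdiction variants and punctuation handling.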

🔤

Synonym Expansion

3,000+ curated synonym mappings. "Corp" = "Corporation" = "Korporeishn". Country-specific overrides included.
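
One plausible way to layer country-specific overrides on top of global mappings is a two-level lookup, sketched below in Python. The `"corp"` and `"korporeishn"` entries come from the example above; treating `"corporation"` as the canonical form, the `"co"` entries, and the function name `map_synonyms` are all assumptions made for illustration.

```python
from typing import Dict, List

# Global mappings. Choosing "corporation" as the canonical form is an assumption.
GLOBAL_SYNONYMS: Dict[str, str] = {
    "corp": "corporation",
    "korporeishn": "corporation",
    "co": "company",  # illustrative entry
}

# Invented country-specific override, for illustration only.
COUNTRY_SYNONYMS: Dict[str, Dict[str, str]] = {
    "de": {"co": "compagnie"},
}


def map_synonyms(tokens: List[str], country: str) -> List[str]:
    """Map each token to its canonical form, preferring country overrides."""
    overrides = COUNTRY_SYNONYMS.get(country.lower(), {})
    mapped = []
    for token in tokens:
        key = token.casefold()
        # Country override first, then global table, else keep the token.
        mapped.append(overrides.get(key, GLOBAL_SYNONYMS.get(key, key)))
    return mapped


print(map_synonyms(["Acme", "Corp"], "us"))
print(map_synonyms(["Acme", "Co"], "de"))
```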

📛

Acronym Resolution

Expand country-specific acronyms to canonical forms. "BMW" → "Bayerische Motoren Werke". 250+ entries.

🚫

Stop Word Removal

Eliminate noise tokens by entity type and country. 850+ stop word mappings across 80+ countries.

📊

Confidence Scoring

Every response includes a confidence score and the exact pipeline stages that fired — full observability.

How it works

A multi-stage pipeline,
not a regex.

Each stage is discrete, auditable, and country-aware. The pipeline fires only what's needed — and tells you exactly what ran.

Pipeline stages — POST /v1/normalize/company · input: "ソニー株式会社" · country: jp

transliterate → lowercase → punctuation → whitespace → legal_form → acronym_expand → stop_words → synonym_map → sort_tokens
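
The "fires only what's needed, and tells you what ran" behaviour can be approximated with a list of named stages, where a stage is recorded only if it changed the value. The Python sketch below uses toy stand-ins for four of the stage names; the stage bodies and the `run_pipeline` helper are assumptions for illustration, not the service's actual logic.

```python
import re
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[str], str]]

# Toy stand-ins for a few of the named stages, applied in order.
STAGES: List[Stage] = [
    ("lowercase", str.casefold),
    ("punctuation", lambda s: re.sub(r"[^\w\s]", " ", s)),
    ("whitespace", lambda s: " ".join(s.split())),
    ("sort_tokens", lambda s: " ".join(sorted(s.split()))),
]


def run_pipeline(name: str) -> Tuple[str, List[str]]:
    """Apply each stage in order and record the ones that changed the value."""
    fired: List[str] = []
    for stage_name, stage in STAGES:
        result = stage(name)
        if result != name:  # the stage "fired" only if it changed something
            fired.append(stage_name)
            name = result
    return name, fired


print(run_pipeline("Zeta,  Alpha Inc."))
print(run_pipeline("acme"))
```

The returned list plays the same role as the `pipeline` array in the response: an already-clean input such as `"acme"` yields an empty list, because no stage altered it.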