Business entity data,
finally standardized.
One API call normalizes company names across 100+ countries, scripts, and languages. Cyrillic, Chinese, Arabic — handled. Legal suffixes, acronyms, transliteration — automated.
# Normalize a company name
curl -X POST https://api.ambect.com/v1/normalize/company \
  -H "Authorization: Bearer $AMBECT_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name": "Microsoft Corporation", "country": "us"}'
# Response
{
  "data": {
    "canonical": "microsoft",
    "legal_type": "corp",
    "tokens": ["microsoft"]
  },
  "meta": {
    "pipeline": ["lowercase", "legal_suffix"],
    "confidence": 0.99,
    "ms": 1.4
  },
  "error": null
}

Everything your data pipeline
actually needs.
A multi-stage normalization engine that handles the edge cases your data pipeline encounters in the real world.
Global Script Support
Transliterate Cyrillic, Chinese, Japanese, Arabic, Hebrew, and more to Latin — automatically, by country.
Legal Form Normalization
Strip and standardize legal suffixes across jurisdictions. "GmbH", "S.A.", "LLC", "株式会社" — all handled.
Synonym Expansion
3,000+ curated synonym mappings. "Corp" = "Corporation" = "Korporeishn". Country-specific overrides included.
Acronym Resolution
Expand country-specific acronyms to canonical forms. "BMW" → "Bayerische Motoren Werke". 250+ entries.
Stop Word Removal
Eliminate noise tokens by entity type and country. 850+ stop word mappings across 80+ countries.
Confidence Scoring
Every response includes a confidence score and the exact pipeline stages that fired — full observability.
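One way a client might use the confidence score and the reported pipeline stages is sketched below in Python. It parses a body shaped like the response example earlier on this page and accepts the result only above a cutoff. The helper name accept_result and the 0.9 threshold are assumptions made for illustration, not documented Ambect behavior.

```python
import json

# Sample body mirroring the response example earlier on this page.
SAMPLE_RESPONSE = """
{
  "data": {"canonical": "microsoft", "legal_type": "corp", "tokens": ["microsoft"]},
  "meta": {"pipeline": ["lowercase", "legal_suffix"], "confidence": 0.99, "ms": 1.4},
  "error": null
}
"""

# Hypothetical cutoff chosen for this sketch; not a documented default.
MIN_CONFIDENCE = 0.9


def accept_result(body: str, min_confidence: float = MIN_CONFIDENCE):
    """Return (canonical, stages) when the result looks trustworthy, else None."""
    payload = json.loads(body)
    if payload.get("error") is not None:
        return None  # the API reported an error; nothing to accept
    meta = payload["meta"]
    if meta["confidence"] < min_confidence:
        return None  # below the cutoff; route to manual review instead
    return payload["data"]["canonical"], meta["pipeline"]


result = accept_result(SAMPLE_RESPONSE)
print(result)  # ('microsoft', ['lowercase', 'legal_suffix'])
```

Keeping the list of fired stages next to the canonical name makes it possible to audit later why two records were merged.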
A multi-stage pipeline,
not a regex.
Each stage is discrete, auditable, and country-aware. The pipeline fires only what's needed — and tells you exactly what ran.
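The idea of discrete, country-aware stages that report only what ran can be illustrated with a toy Python sketch. This is not Ambect's implementation: the suffix tables, the State type, and the normalize function are hypothetical, and only the stage names "lowercase" and "legal_suffix" come from the response example on this page.

```python
from typing import NamedTuple, Optional

# Toy per-country suffix tables, for illustration only.
SUFFIXES_BY_COUNTRY = {
    "us": {"corporation": "corp", "corp": "corp", "inc": "inc", "llc": "llc"},
    "de": {"gmbh": "gmbh", "ag": "ag"},
}


class State(NamedTuple):
    tokens: list
    legal_type: Optional[str]


def lowercase(state: State, country: str) -> State:
    return State([t.lower() for t in state.tokens], state.legal_type)


def legal_suffix(state: State, country: str) -> State:
    # Country-aware: only suffixes registered for this country are stripped.
    table = SUFFIXES_BY_COUNTRY.get(country, {})
    if len(state.tokens) > 1:
        last = state.tokens[-1].rstrip(".")
        if last in table:
            return State(state.tokens[:-1], table[last])
    return state


STAGES = [lowercase, legal_suffix]


def normalize(name: str, country: str) -> dict:
    state = State(name.split(), None)
    fired = []
    for stage in STAGES:
        new_state = stage(state, country)
        if new_state != state:  # a stage "fires" only if it changed something
            fired.append(stage.__name__)
            state = new_state
    return {
        "canonical": " ".join(state.tokens),
        "legal_type": state.legal_type,
        "tokens": state.tokens,
        "pipeline": fired,
    }


print(normalize("Microsoft Corporation", "us")["pipeline"])  # ['lowercase', 'legal_suffix']
print(normalize("Siemens AG", "us")["pipeline"])             # ['lowercase']
```

Because each stage is a plain function with the same signature, a stage can be tested on its own, and the fired list doubles as an audit trail.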
Pipeline stages — POST /v1/normalize/company · input: "ソニー株式会社" · country: jp
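For reference, the request named in this caption could be built from Python roughly as follows. This is a minimal sketch using only the standard library: the endpoint, headers, and body fields are taken from the curl example on this page, while the build_request helper and the "test-key" fallback are hypothetical. The network call is left commented out, and no response is shown because the page does not give one for this input.

```python
import json
import os
import urllib.request

API_URL = "https://api.ambect.com/v1/normalize/company"


def build_request(name: str, country: str, api_key: str) -> urllib.request.Request:
    # ensure_ascii=False keeps the Japanese characters as UTF-8 bytes
    # instead of \uXXXX escapes; both forms are valid JSON.
    body = json.dumps({"name": name, "country": country}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# "test-key" is a placeholder so the sketch runs without a real credential.
req = build_request("ソニー株式会社", "jp", os.environ.get("AMBECT_KEY", "test-key"))

# Sending is commented out so the sketch runs offline:
# with urllib.request.urlopen(req, timeout=10) as resp:
#     payload = json.load(resp)
```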