OCR for supplier invoices is one of the modules most requested by Cameroonian SMEs switching to SynkriaOps. The need is clear: reduce manual entry from 3-4 minutes per invoice to 30 seconds of visual validation.
But generalist “Anglo-Saxon” OCR performs poorly on Cameroonian invoices. Here is what we had to calibrate.
The photo quality problem
First field observation: 80% of invoices are photographed with a smartphone, not scanned. Consequences:
- Inconsistent lighting — often near a window, frequent backlight.
- Tilt — invoice rarely flat, sometimes hand-held.
- Reflections — on the glossy paper of hotel chains for example.
- Folds — especially on delivery slips collected in batch.
The OCR must apply perspective correction, denoising, and deskewing before attempting text extraction. Without this preprocessing, the error rate on numbers explodes.
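As a rough illustration of one of these steps, the sketch below implements only the denoising stage, as a 3x3 median filter in plain TypeScript. The `GrayImage` type and the `medianFilter3x3` name are hypothetical; the production pipeline relies on OpenCV (see the feature list further down), and perspective correction and deskew are not shown here.

```typescript
// Hypothetical helper: 3x3 median filter over a grayscale image (values 0-255).
// Illustrative only; not the SynkriaOps implementation.
type GrayImage = number[][];

function medianFilter3x3(img: GrayImage): GrayImage {
  const h = img.length;
  const w = h > 0 ? img[0].length : 0;
  // Work on a copy so the input image is left untouched.
  const out: GrayImage = img.map((row) => row.slice());
  for (let y = 0; y < h; y++) {
    for (let x = 0; x < w; x++) {
      const samples: number[] = [];
      for (let dy = -1; dy <= 1; dy++) {
        for (let dx = -1; dx <= 1; dx++) {
          // Clamp coordinates so border pixels reuse their nearest neighbours.
          const yy = Math.min(h - 1, Math.max(0, y + dy));
          const xx = Math.min(w - 1, Math.max(0, x + dx));
          samples.push(img[yy][xx]);
        }
      }
      samples.sort((a, b) => a - b);
      out[y][x] = samples[4]; // median of the 9 samples
    }
  }
  return out;
}
```

A median filter removes isolated speckles (dust, sensor noise) while keeping character edges sharper than a plain blur would, which matters for digits.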
The NIU format problem
The NIU (Unique Identification Number) is the Cameroonian equivalent of the
French SIRET: 14 alphanumeric characters (e.g., M101212345678A). It
identifies a taxpayer and must appear on every invoice.
A generalist OCR looks for an EU VAT (FR12345678901), a US EIN (12-3456789), or a French SIRET (123 456 789 00012). None of these patterns matches a Cameroonian NIU.
SynkriaOps uses a dedicated regex and a plausibility score (the first letter
is generally M or P depending on status). Without this explicit
extraction, tax audits become painful (the NIU is mandatory in the “supplier
identity” field of an accounting entry).
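A minimal sketch of such an extraction, assuming the NIU shape inferred from the example M101212345678A (one letter, twelve digits, one letter). The pattern, the `extractNiuCandidates` name, and the score weights are illustrative assumptions, not the SynkriaOps regex or an official specification.

```typescript
// Assumed shape, inferred from the example in the text:
// one letter, twelve digits, one letter.
const NIU_PATTERN = /\b[A-Z]\d{12}[A-Z]\b/g;

interface NiuCandidate {
  value: string;
  score: number; // 0..1, hypothetical plausibility scale
}

function extractNiuCandidates(ocrText: string): NiuCandidate[] {
  const matches: string[] = ocrText.toUpperCase().match(NIU_PATTERN) ?? [];
  return matches
    .map((value) => ({
      value,
      // The first letter is generally M or P; the 0.9 / 0.6 weights
      // are arbitrary values chosen for illustration.
      score: value[0] === "M" || value[0] === "P" ? 0.9 : 0.6,
    }))
    .sort((a, b) => b.score - a.score);
}
```

With this pattern, the EU VAT, EIN, and SIRET examples above produce no candidate, while `extractNiuCandidates("NIU: M101212345678A")` yields a single candidate with the higher score.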
The VAT at 19.25% problem
Cameroonian VAT stands at 19.25% (since 2005). Many OCRs pre-trained on French or German content calibrate their heuristics on 20% or 19%, and reject a VAT line at 19.25 as “probably not a VAT amount”.
Required adjustment: widen the VAT plausibility window to cover the common CEMAC rates:
| Country | Standard rate | Reduced rates |
|---|---|---|
| Cameroon | 19.25% | 0% (exemptions) |
| Gabon | 18% | 5% and 10% |
| Chad | 18% | — |
| Congo | 18.9% | 5% and 8% |
| Central African Republic | 19% | 5% |
| Equatorial Guinea | 15% | — |
Without this table, OCR “smooths” the VAT and loses precision on rounding (see next pitfall).
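The plausibility window can be sketched as a lookup against the rates in the table: compute the observed tax-to-net ratio and accept it only if it falls close to a known rate. The function name, the ISO country keys, and the 0.05-point tolerance are assumptions made for this example.

```typescript
// Standard and reduced rates from the table above, in percent,
// keyed by ISO 3166-1 alpha-2 country code.
const CEMAC_VAT_RATES: Record<string, number[]> = {
  CM: [19.25, 0],
  GA: [18, 10, 5],
  TD: [18],
  CG: [18.9, 8, 5],
  CF: [19, 5],
  GQ: [15],
};

// Returns the known rate closest to tax/net, or null when no rate
// lies within the tolerance (expressed in percentage points).
function matchVatRate(
  net: number,
  tax: number,
  country: string,
  tolerancePct = 0.05,
): number | null {
  const rates = CEMAC_VAT_RATES[country];
  if (!rates || net <= 0) return null;
  const observed = (tax / net) * 100;
  let best: number | null = null;
  for (const rate of rates) {
    const gap = Math.abs(observed - rate);
    if (gap <= tolerancePct && (best === null || gap < Math.abs(observed - best))) {
      best = rate;
    }
  }
  return best;
}
```

For example, `matchVatRate(100000, 19250, "CM")` returns 19.25, whereas a 20% line (`tax = 20000`) returns null and can be flagged for review. On very small amounts, rounding to whole francs may push the observed ratio outside a tight tolerance, so the window would likely need to scale with the invoice amount.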
The bilingual labels problem
In Cameroon, many invoices are primarily in French but carry English terms (“Invoice”, “Tax”, “Subtotal”), or Pidgin on wholesale delivery slips. The OCR must recognize all of the following:
- Montant HT ↔ Subtotal ↔ Net amount
- Taxe sur la valeur ajoutée ↔ VAT ↔ Sales tax
- Total TTC ↔ Grand total ↔ Amount due
Typical error: taking Subtotal for the gross amount and Total for the net
amount (full meaning inversion in some US-imported formats).
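One way to avoid this inversion is to map every label variant to a single canonical field before reading any amount. The sketch below uses only the equivalences listed above; the `AmountField` names and the `normalizeLabel` helper are hypothetical, and a production lexicon would be longer.

```typescript
type AmountField = "net" | "vat" | "gross";

// Synonyms taken from the equivalences above, stored lowercase and
// without accents so that OCR output missing diacritics still matches.
const LABEL_SYNONYMS: Record<AmountField, string[]> = {
  net: ["montant ht", "subtotal", "net amount"],
  vat: ["taxe sur la valeur ajoutee", "vat", "sales tax"],
  gross: ["total ttc", "grand total", "amount due"],
};

function normalizeLabel(raw: string): AmountField | null {
  const cleaned = raw
    .trim()
    .toLowerCase()
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "") // strip combining accents
    .replace(/[:.\s]+$/, "") // drop trailing punctuation such as " :"
    .replace(/\s+/g, " "); // collapse inner whitespace
  for (const field of Object.keys(LABEL_SYNONYMS) as AmountField[]) {
    if (LABEL_SYNONYMS[field].includes(cleaned)) return field;
  }
  return null;
}
```

Exact matching is deliberate here: a substring test on "total" would match both "Subtotal" and "Grand total", which is precisely the confusion to avoid. Unknown labels return null so that they can be surfaced to the reviewer instead of being guessed.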
What SynkriaOps does — and does NOT do
What SynkriaOps OCR does:
- Perspective + denoising + deskew preprocessing (pdfjs-dist + OpenCV)
- Text extraction per block with coordinates
- Dedicated patterns: NIU CM/GA/CG, VAT 19.25/18/15, XAF currency
- Confidence score per field
- Mandatory human validation before accounting entry (no auto-validation)
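To make the last two points concrete, here is a hypothetical data shape for an extracted field and a review queue in which every field reaches the human reviewer, with low-confidence fields listed first. The type names, the `buildReviewQueue` function, and the 0.85 threshold are assumptions made for illustration; the actual types in the codebase may differ.

```typescript
// Hypothetical shapes, not the SynkriaOps types.
interface BoundingBox {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface ExtractedField {
  name: string;
  value: string;
  confidence: number; // 0..1
  box: BoundingBox; // where the text block was found on the page
}

interface ReviewItem extends ExtractedField {
  needsAttention: boolean;
}

// No field is ever dropped or auto-accepted: the queue keeps all of
// them and only reorders so the least certain come first.
function buildReviewQueue(
  fields: ExtractedField[],
  attentionBelow = 0.85,
): ReviewItem[] {
  return fields
    .map((f) => ({ ...f, needsAttention: f.confidence < attentionBelow }))
    .sort((a, b) => a.confidence - b.confidence);
}
```

Keeping the bounding box with each field lets the validation screen highlight the source region on the invoice image, which is what makes a 30-second visual check realistic.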
What it does NOT do:
- Auto-validation of invoices > 1 million XAF (configurable threshold)
- Proprietary learning on each tenant’s data (privacy by design — we do not aggregate data across SMEs)
- Advanced fraud detection (forged signature, fake NIU) — that is the human accountant’s role
Field measurement
On 600 supplier invoices from the Douala pilot SME (see community testimonial), the net extraction rate is 100% (every expected field is extracted), with human validation maintained at 100% during the first 6 months. This is a deliberate discipline, not a technical limit.
Actual human corrections amount to 1.8% of fields over the period, typically a debit/credit inversion on a poorly calibrated credit note or a mis-weighted rounding line.
Technical details: apps/api/src/modules/ocr/. The service uses
@anthropic-ai/sdk 0.78.0 for the final LLM layer (structured extraction),
backed by in-house image preprocessing. The full pattern is documented with
E2E tests test/ocr-realtime.e2e-spec.ts and test/pdf-ocr-import.e2e-spec.ts.