MexCep exposes two separate but structurally parallel bulk-import pipelines: one creates beneficiaries, the other launches validations. Both share the same async loop (Symfony Messenger over the Redis imports stream) and the same UI contract — the user uploads a file, the system parses it, the client reviews the preview, edits dubious rows, and confirms the commit when satisfied.

Beneficiaries

POST /v1/beneficiaries/imports — multipart with file + parse_mode (template or free). Accepted on cookie or API key. Formats: csv, xls, xlsx, txt, pdf. 20 MB cap.
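As a rough client-side sketch, the limits above can be checked before the upload is sent. The helper name, the shape of the returned dictionary, and the decimal reading of "20 MB" (20,000,000 bytes) are assumptions made for illustration; the server remains the authority on what it accepts.

```python
from pathlib import PurePath

# Values taken from the endpoint description above.
ALLOWED_EXTENSIONS = {"csv", "xls", "xlsx", "txt", "pdf"}
ALLOWED_PARSE_MODES = {"template", "free"}
MAX_BYTES = 20 * 1000 * 1000  # assumption: "20 MB" read as decimal megabytes


def build_beneficiary_import_request(filename: str, size_bytes: int, parse_mode: str) -> dict:
    """Validate an upload locally and describe the multipart request (hypothetical helper)."""
    extension = PurePath(filename).suffix.lower().lstrip(".")
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported format: {extension or '(none)'}")
    if parse_mode not in ALLOWED_PARSE_MODES:
        raise ValueError(f"unknown parse_mode: {parse_mode}")
    if size_bytes > MAX_BYTES:
        raise ValueError("file exceeds the 20 MB cap")
    # Nothing is sent here; the dict only describes the request to make.
    return {
        "method": "POST",
        "path": "/v1/beneficiaries/imports",
        "fields": {"parse_mode": parse_mode},
        "file_field": "file",
        "filename": filename,
    }
```

The returned description can be handed to whatever HTTP client the integration already uses, with either the session cookie or the API key attached.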

The parser is hybrid manual+LLM: CsvParser / XlsxParser / PdfParser runs first for the structured formats or the standard template. When ParseResult::shouldEscalateToLlm returns true (typically a scanned PDF or unstructured free text), LlmAccountExtractor sends the file to Gemini 2.5 Flash and reparses the output through the same RowProcessor. The LLM never persists raw rows — every extracted account passes the same CLABE/Luhn checks as the manual path.
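The escalation flow described above can be sketched as follows. This is an illustrative Python model, not the actual PHP code: the ParseResult stand-in, the parse_upload function, and the callables passed to it are hypothetical and only mirror the roles of the structured parsers, LlmAccountExtractor, and RowProcessor.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ParseResult:
    """Stand-in for the real ParseResult; only the escalation flag is modelled."""
    rows: List[dict] = field(default_factory=list)
    escalate: bool = False

    def should_escalate_to_llm(self) -> bool:
        return self.escalate


def parse_upload(
    data: bytes,
    manual_parser: Callable[[bytes], ParseResult],
    llm_extractor: Callable[[bytes], List[dict]],
    row_processor: Callable[[dict], dict],
) -> List[dict]:
    """Run the manual parser first and call the LLM only on escalation."""
    result = manual_parser(data)
    if result.should_escalate_to_llm():
        raw_rows = llm_extractor(data)
    else:
        raw_rows = result.rows
    # Both paths converge on the same row processor, so LLM output
    # is validated exactly like manually parsed rows before it is kept.
    return [row_processor(row) for row in raw_rows]
```

Funnelling both paths through one processor is what keeps the LLM from being a way around validation: it can only propose rows, never accept them.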

The parser's terminal rows fall into five buckets (src/Beneficiaries/Import/Dto/ParsedRow.php):

  • valid: passes every check, ready for commit.
  • correctable: some field is invalid but the user can fix it in preview.
  • fatal: an error the user cannot fix (e.g. required fields missing).
  • duplicate_account: account_number already exists in the user's list.
  • duplicate_alias: alias collides with another active beneficiary of the same user.
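To make the five buckets concrete, here is a minimal classification sketch. Several things in it are assumptions rather than documented behavior: the order in which the checks are applied, treating account_number as the only required field, handling CLABE accounts only (the Luhn path for card numbers is left out), and exact-match alias comparison. The CLABE control-digit rule itself (weights 3, 7, 1 over the first 17 digits) is the standard one.

```python
from typing import Set

CLABE_WEIGHTS = (3, 7, 1)


def clabe_is_valid(clabe: str) -> bool:
    """Verify the control digit of an 18-digit CLABE."""
    if len(clabe) != 18 or not (clabe.isascii() and clabe.isdigit()):
        return False
    total = sum((int(d) * CLABE_WEIGHTS[i % 3]) % 10 for i, d in enumerate(clabe[:17]))
    return (10 - total % 10) % 10 == int(clabe[17])


def classify_row(row: dict, existing_accounts: Set[str], existing_aliases: Set[str]) -> str:
    """Assign one of the five buckets; the check order is an assumption."""
    account = row.get("account_number")
    alias = row.get("alias")
    if not account:
        return "fatal"  # required field missing, nothing for the user to edit
    if not clabe_is_valid(account):
        return "correctable"  # the user can fix the number in preview
    if account in existing_accounts:
        return "duplicate_account"
    if alias is not None and alias in existing_aliases:
        return "duplicate_alias"
    return "valid"
```

For example, a row with a well-formed CLABE that is already in the user's list lands in duplicate_account rather than valid, so it would be skipped at commit.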

Validations

POST /v1/validations/imports takes a multipart body with file + upload_type (file for CSV/XLSX/etc. or images for receipts) + parse_mode. It is cookie-only (routes use AuthMethodMiddleware::allow('cookie')); API keys are not accepted. Formats for upload_type=file: csv, xls, xlsx, txt, json, zip (text). Formats for upload_type=images: PNG/JPG/WebP sent individually or multi-selected (PHP receives them as file[] and the server packs them into a tmp ZIP), or a ZIP uploaded directly.
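The packing step for multi-selected images might look roughly like the sketch below. It is a Python illustration of the idea, not the server's PHP code: the function name, the index prefix used to avoid name collisions, and the acceptance of the .jpeg spelling are all assumptions.

```python
import io
import zipfile
from typing import Iterable, Tuple

# .jpeg is an assumed alias of the documented JPG format.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".webp")


def pack_images(files: Iterable[Tuple[str, bytes]]) -> bytes:
    """Pack (filename, content) pairs into one in-memory ZIP archive."""
    buffer = io.BytesIO()
    # Images are already compressed, so they are stored without deflating.
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as archive:
        for index, (name, payload) in enumerate(files):
            if not name.lower().endswith(IMAGE_EXTENSIONS):
                raise ValueError(f"not an accepted image format: {name}")
            # Prefix with the position so two files with the same name do not collide.
            archive.writestr(f"{index:04d}_{name}", payload)
    return buffer.getvalue()
```

After this step a multi-select upload and a directly uploaded ZIP can follow the same downstream path.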

For image imports the OCR runs in preview, not in commit. Each image goes through ImageSanitizer (EXIF/XMP/ICC strip) and then through the same runOcrPipeline that a single OCR call uses, which guarantees byte-for-byte parity between the batch and single-image paths. Rows whose tipo_comprobante is recognized and whose NormalizationService::normalize completes land in the valid bucket; the rest go to correctable or fatal. This pipeline has four buckets: valid, correctable, fatal, duplicate.

For each image batch the system allocates a sticky proxy (ProxyPoolService::allocateBatchSession) so the N Banxico requests share a single NAT session, and a slot semaphore (BatchSlotSemaphore, default cap 5 from bulk_validations.per_batch_concurrent) caps how many rows of the same batch may be in flight simultaneously — independent from the global per-user concurrency cap.
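The per-batch slot cap can be illustrated with a counting semaphore. The sketch below uses Python's asyncio.Semaphore in place of BatchSlotSemaphore; run_batch and the validate callable are hypothetical names, and the sticky proxy session and the global per-user cap are deliberately left out.

```python
import asyncio
from typing import Awaitable, Callable, List


async def run_batch(
    rows: List[str],
    validate: Callable[[str], Awaitable[str]],
    per_batch_concurrent: int = 5,  # mirrors the documented default cap
) -> List[str]:
    """Validate every row of one batch with at most N rows in flight."""
    slots = asyncio.Semaphore(per_batch_concurrent)

    async def guarded(row: str) -> str:
        # A row waits here until one of the batch's slots is free.
        async with slots:
            return await validate(row)

    # gather preserves input order in the returned list.
    return await asyncio.gather(*(guarded(row) for row in rows))
```

Because the semaphore is created per call, two batches never compete for each other's slots, which matches the idea of a cap scoped to a single batch.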

The preview → commit flow

Same for both pipelines:

  1. Upload: job is pending (or parsing once the worker claims it).
  2. Parse: job is preview_ready with rows binned. UI polls every 2 s.
  3. Preview: GET /v1/beneficiaries/imports/{id}/preview or GET /v1/validations/imports/{id}/preview, paginated and filterable by buckets[].
  4. Edit: PATCH .../rows/{row_id} re-runs RowProcessor over the edited fields; a row can jump from correctable to valid (or back).
  5. Commit: POST .../commit ingests only valid and correctable rows. The duplicate bucket is never ingested.
  6. Cancel: DELETE /v1/beneficiaries/imports/{id} (or its validations sibling), only for non-terminal jobs.
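From the client's side, steps 1 and 2 amount to a polling loop. The sketch below assumes a hypothetical fetch_status callable that returns the job's current status string; the 2-second interval comes from the text, while the max_polls limit and the function name are illustrative choices.

```python
import time
from typing import Callable

# Statuses in which the parse has not finished yet.
WAITING_STATES = ("pending", "parsing")


def wait_for_preview(
    fetch_status: Callable[[], str],
    sleep: Callable[[float], None] = time.sleep,
    interval_s: float = 2.0,
    max_polls: int = 150,  # assumption: give up after about five minutes
) -> str:
    """Poll until the job leaves pending/parsing and return the new status."""
    for _ in range(max_polls):
        status = fetch_status()
        if status not in WAITING_STATES:
            return status
        sleep(interval_s)
    raise TimeoutError("import did not leave pending/parsing in time")
```

The caller should still check the returned status: preview_ready means the preview endpoint can be fetched, while any other value means the job ended some other way.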

One active import per user

Both pipelines enforce a 1-in-flight rule scoped to user_id. ImportJobService::handleUpload checks for a sibling job in pending, parsing, preview_ready, or committing; if one exists, it replies 422 import_already_in_flight. The rule is per-pipeline (an in-flight beneficiary import does NOT block a validation import) — they use separate tables and locks.
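A minimal in-memory sketch of the rule follows. It is not ImportJobService::handleUpload itself: the job dictionaries, the 201 success status, and the response shapes are assumptions, and the real service relies on database tables and locks, which this single-threaded model omits.

```python
from typing import List, Tuple

# States that count as "in flight", as listed above.
ACTIVE_STATES = {"pending", "parsing", "preview_ready", "committing"}


def handle_upload(jobs: List[dict], user_id: int) -> Tuple[int, dict]:
    """Create a pending job unless the user already has an active one."""
    for job in jobs:
        if job["user_id"] == user_id and job["status"] in ACTIVE_STATES:
            return 422, {"error": "import_already_in_flight"}
    new_job = {"id": len(jobs) + 1, "user_id": user_id, "status": "pending"}
    jobs.append(new_job)
    return 201, new_job  # assumption: the success status code is not documented here


# One job list per pipeline keeps the rule per-pipeline.
beneficiary_jobs: List[dict] = []
validation_jobs: List[dict] = []
```

With separate lists, a second beneficiary upload from the same user is rejected while a validation upload from that user still goes through.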

When a job is stuck in parsing longer than expected (a dead worker that did not release the lock), validation imports have a guard that detects the stale lock and clears it on the next upload from the same user.
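The stale-lock guard could be modelled along these lines. Everything specific here is hypothetical: the 10-minute threshold, the locked_at field, and the choice to mark the stuck job as failed are placeholders, since the text does not state how the guard decides or what it does with the old job.

```python
from datetime import datetime, timedelta

STALE_AFTER = timedelta(minutes=10)  # hypothetical threshold, not a documented value


def clear_if_stale(job: dict, now: datetime) -> bool:
    """Release a parsing job whose lock is older than the threshold."""
    if job["status"] != "parsing" or job.get("locked_at") is None:
        return False
    if now - job["locked_at"] <= STALE_AFTER:
        return False  # the worker may still be alive
    job["status"] = "failed"  # hypothetical terminal status
    job["locked_at"] = None
    return True
```

Running such a check at the start of the next upload would let the one-in-flight rule pass again once the dead worker's job is no longer counted as active.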