MexCep exposes two separate but structurally parallel bulk-import pipelines: one creates beneficiaries, the other launches validations. Both share the same async loop (Symfony Messenger over the Redis imports stream) and the same UI contract — the user uploads a file, the system parses it, the client reviews the preview, edits dubious rows, and confirms the commit when satisfied.
Beneficiaries
POST /v1/beneficiaries/imports: multipart with file + parse_mode (template or free). Accepts cookie or API-key authentication. Formats: csv, xls, xlsx, txt, pdf. 20 MB cap.
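As a rough illustration of the constraints above, a client could pre-check a file before uploading it. This is only a sketch: the function name check_beneficiary_upload is hypothetical, and it assumes that "20 MB" means 20 MiB and that the format is judged by file extension, neither of which the server contract above confirms.

```python
from pathlib import PurePosixPath

# Values taken from the endpoint description; everything else is illustrative.
ALLOWED_EXTENSIONS = {"csv", "xls", "xlsx", "txt", "pdf"}
PARSE_MODES = {"template", "free"}
MAX_BYTES = 20 * 1024 * 1024  # assumption: "20 MB" is read as 20 MiB


def check_beneficiary_upload(filename: str, size_bytes: int, parse_mode: str) -> list:
    """Return a list of problems; an empty list means the upload looks acceptable."""
    problems = []
    ext = PurePosixPath(filename).suffix.lstrip(".").lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append("file exceeds the 20 MB cap")
    if parse_mode not in PARSE_MODES:
        problems.append(f"unknown parse_mode: {parse_mode}")
    return problems
```

A check like this only saves a round trip; the server remains the authority on what it accepts.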
The parser is hybrid manual+LLM: CsvParser / XlsxParser / PdfParser runs first for the structured formats or the standard template. When ParseResult::shouldEscalateToLlm returns true (typically a scanned PDF or unstructured free text), LlmAccountExtractor sends the file to Gemini 2.5 Flash and reparses the output through the same RowProcessor. The LLM never persists raw rows — every extracted account passes the same CLABE/Luhn checks as the manual path.
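For readers unfamiliar with the two checks named above, the following sketch shows the standard CLABE check-digit rule (weights 3, 7, 1 over the first 17 digits) and the standard Luhn checksum. It is not the RowProcessor code; the function names are hypothetical, and which fields each check applies to in MexCep is an assumption (CLABE for 18-digit accounts, Luhn for card numbers).

```python
def clabe_is_valid(clabe: str) -> bool:
    """Standard CLABE rule: 18 digits, last one is a weighted mod-10 check digit."""
    if len(clabe) != 18 or not (clabe.isascii() and clabe.isdigit()):
        return False
    weights = (3, 7, 1)
    # Each product is reduced mod 10 before summing, per the CLABE definition.
    total = sum((int(d) * weights[i % 3]) % 10 for i, d in enumerate(clabe[:17]))
    return (10 - total % 10) % 10 == int(clabe[17])


def luhn_is_valid(number: str) -> bool:
    """Standard Luhn checksum: double every second digit counting from the right."""
    if not (number.isascii() and number.isdigit()):
        return False
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0
```

Because both the manual and the LLM path funnel into the same checks, an account hallucinated by the LLM would have to satisfy the check digit by chance to reach the valid bucket.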
The parser's terminal rows fall into five buckets (src/Beneficiaries/Import/Dto/ParsedRow.php):
- valid: passes every check, ready for commit.
- correctable: some field is invalid but the user can fix it in preview.
- fatal: an error the user cannot fix (e.g. required fields missing).
- duplicate_account: account_number already exists in the user's list.
- duplicate_alias: alias collides with another active beneficiary of the same user.
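The five buckets can be pictured as a small classification step. The sketch below is illustrative only: the row shape (fatal_errors, field_errors) and the precedence between buckets are assumptions, since the source does not say which bucket wins when a row qualifies for more than one.

```python
from enum import Enum


class Bucket(str, Enum):
    VALID = "valid"
    CORRECTABLE = "correctable"
    FATAL = "fatal"
    DUPLICATE_ACCOUNT = "duplicate_account"
    DUPLICATE_ALIAS = "duplicate_alias"


def classify(row: dict, existing_accounts: set, active_aliases: set) -> Bucket:
    """Assign one bucket to a parsed row (hypothetical row shape and precedence)."""
    if row.get("fatal_errors"):        # unfixable, e.g. required fields missing
        return Bucket.FATAL
    if row.get("field_errors"):        # fixable by the user in preview
        return Bucket.CORRECTABLE
    if row["account_number"] in existing_accounts:
        return Bucket.DUPLICATE_ACCOUNT
    if row["alias"] in active_aliases:
        return Bucket.DUPLICATE_ALIAS
    return Bucket.VALID
```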
Validations
POST /v1/validations/imports — multipart with file + upload_type (file for CSV/XLSX/etc. or images for receipts) + parse_mode. Cookie-only (routes use AuthMethodMiddleware::allow('cookie')), no API key. file formats: csv, xls, xlsx, txt, json, zip (text). images formats: PNG/JPG/WebP individually, multi-select (PHP receives them as file[] and the server packs them into a tmp ZIP), or a ZIP directly.
For image imports the OCR runs in preview, not in commit: each image goes through ImageSanitizer (EXIF/XMP/ICC strip) and then through the same runOcrPipeline that a single OCR call uses, guaranteeing byte-for-byte parity with the single-call path. Rows whose tipo_comprobante is recognized and whose NormalizationService::normalize completes land in the valid bucket; the rest go to correctable or fatal. This pipeline has four buckets: valid, correctable, fatal, duplicate.
For each image batch the system allocates a sticky proxy (ProxyPoolService::allocateBatchSession) so the N Banxico requests share a single NAT session, and a slot semaphore (BatchSlotSemaphore, default cap 5 from bulk_validations.per_batch_concurrent) caps how many rows of the same batch may be in flight simultaneously — independent from the global per-user concurrency cap.
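The per-batch slot cap behaves like a counting semaphore. The sketch below uses asyncio to show the idea; it is not BatchSlotSemaphore itself (whose implementation the source does not describe), and run_batch and handle_row are hypothetical names.

```python
import asyncio


async def run_batch(rows, handle_row, per_batch_concurrent=5):
    """Process all rows, allowing at most per_batch_concurrent in flight at once."""
    slots = asyncio.Semaphore(per_batch_concurrent)

    async def guarded(row):
        # A row waits here until one of the batch's slots is free.
        async with slots:
            return await handle_row(row)

    # gather preserves input order in its result list.
    return await asyncio.gather(*(guarded(r) for r in rows))
```

Keeping this cap separate from the global per-user cap means one large batch cannot monopolize the sticky proxy session, while the user-level limit still applies on top.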
The preview → commit flow
Same for both pipelines:
- Upload → pending (or parsing once the worker claims it).
- Parse → preview_ready with rows binned. UI polls every 2 s.
- Preview → GET /v1/beneficiaries/imports/{id}/preview or GET /v1/validations/imports/{id}/preview, paginated and filterable by buckets[].
- Edit → PATCH .../rows/{row_id} re-runs RowProcessor over the edited fields; a row can jump from correctable to valid (or back).
- Commit → POST .../commit ingests only valid and correctable rows. The duplicate bucket is never ingested.
- Cancel → DELETE /v1/beneficiaries/imports/{id} (or its validations sibling), only for non-terminal jobs.
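From the client's side, the steps above amount to polling until the job leaves pending/parsing and then committing a subset of rows. The sketch below is a simplified model under stated assumptions: fetch_status stands in for whatever HTTP call returns the job status, max_polls is an invented safety limit, and the row dictionaries with a bucket key are a hypothetical shape.

```python
import time

COMMITTABLE = {"valid", "correctable"}  # per the Commit step above


def wait_for_preview(fetch_status, interval_s=2.0, max_polls=150, sleep=time.sleep):
    """Poll every interval_s seconds until the job leaves pending/parsing."""
    for _ in range(max_polls):
        status = fetch_status()
        if status not in ("pending", "parsing"):
            return status
        sleep(interval_s)
    raise TimeoutError("import did not leave pending/parsing in time")


def rows_to_ingest(rows):
    """Keep only the rows a commit would ingest; duplicates and fatals are dropped."""
    return [r for r in rows if r["bucket"] in COMMITTABLE]
```

Injecting sleep makes the loop testable without real delays; a production client would also handle HTTP errors, which are omitted here.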
One active import per user
Both pipelines enforce a 1-in-flight rule scoped to user_id. ImportJobService::handleUpload checks for a sibling job in pending, parsing, preview_ready, or committing; if one exists, it replies 422 import_already_in_flight. The rule is per-pipeline (an in-flight beneficiary import does NOT block a validation import) — they use separate tables and locks.
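The rule can be modeled as a lookup over the user's jobs in one pipeline. This sketch is not ImportJobService::handleUpload: it keeps both pipelines in one in-memory list with a pipeline field purely for illustration (the real system uses separate tables and locks), and the return shape is hypothetical.

```python
IN_FLIGHT = {"pending", "parsing", "preview_ready", "committing"}


def check_upload_allowed(jobs, user_id, pipeline):
    """Return (422, code) if the user already has an in-flight job in this pipeline."""
    for job in jobs:
        if (
            job["user_id"] == user_id
            and job["pipeline"] == pipeline
            and job["status"] in IN_FLIGHT
        ):
            return 422, "import_already_in_flight"
    return None  # no sibling job blocks the upload
```

A real implementation would need the check and the job insert to happen atomically (for example under a lock), otherwise two simultaneous uploads could both pass the check.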
When a job is stuck in parsing longer than expected (a dead worker that did not release the lock), validation imports have a guard that detects the stale lock and clears it on the next upload from the same user.
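A stale-lock guard of this kind usually compares the lock's age against a threshold. The sketch below shows one possible shape; the 10-minute threshold, the locked_at field, and the choice to mark the job failed are all assumptions, as the source states none of them.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(minutes=10)  # hypothetical; the source gives no threshold


def clear_stale_parsing_jobs(jobs, user_id, now):
    """Release parsing jobs of this user whose lock is older than STALE_AFTER."""
    cleared = []
    for job in jobs:
        if (
            job["user_id"] == user_id
            and job["status"] == "parsing"
            and now - job["locked_at"] > STALE_AFTER
        ):
            job["status"] = "failed"   # hypothetical terminal status
            job["locked_at"] = None    # lock released
            cleared.append(job["id"])
    return cleared
```

Running this at the start of the next upload from the same user, before the 1-in-flight check, would let the new upload proceed instead of being rejected by a job whose worker is gone.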