For Data Analysts
Efficiently clean and parse large datasets with complex Dutch text patterns, without writing regex from scratch.
Workflow
End-to-End Data Cleaning Workflow
From raw CSV exports to analysis-ready tables: RegExMakker handles the messy Dutch-text transformations that slow down your pipeline.
1. Import & Inspect
Paste a sample of your dataset (up to 5,000 rows) and RegExMakker highlights recurring patterns: phone numbers in +31 format, municipality names with diacritics, postal codes like 1017 AB, and IBAN variations. It flags fields that need regex-based cleaning before you commit to a full ETL run.
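To make the inspection step concrete, here is a minimal sketch in plain Python of the kind of patterns involved. It is not RegExMakker's own detection logic: the pattern names, the `inspect` helper, and the sample rows are assumptions made for illustration, and the expressions are deliberately loose (they do not validate IBAN checksums or phone number ranges).

```python
import re

# Illustrative patterns only (assumed, not taken from RegExMakker).
PATTERNS = {
    # +31 followed by nine digits, optionally separated by spaces or hyphens
    "phone_nl": re.compile(r"\+31[\s-]?\d(?:[\s-]?\d){8}"),
    # Four digits (no leading zero) and two capital letters, e.g. 1017 AB
    "postcode_nl": re.compile(r"\b[1-9]\d{3}\s?[A-Z]{2}\b"),
    # NL + 2 check digits + 4-letter bank code + 10 digits, spaces allowed
    "iban_nl": re.compile(r"\bNL\d{2}\s?[A-Z]{4}(?:\s?\d){10}\b"),
}


def inspect(rows):
    """Count how many rows contain at least one match for each pattern."""
    counts = {name: 0 for name in PATTERNS}
    for row in rows:
        for name, pattern in PATTERNS.items():
            if pattern.search(row):
                counts[name] += 1
    return counts


sample = [
    "Jan de Vries, Keizersgracht 1, 1017 AB Amsterdam, +31 6 12345678",
    "NL91 ABNA 0417 1643 00",
    "geen gegevens",
]
print(inspect(sample))
```

A per-pattern row count like this is a quick way to see which columns are worth cleaning before a full ETL run.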
2. Match & Refine
Use the interactive builder to compose patterns for Dutch-specific edge cases: compound words (gezaghebbendeinstantie), hyphenated terms (e-health), and regional spelling variants. The real-time preview shows every match, including zero-width assertions and named capture groups for downstream parsing.
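For readers new to these features, the sketch below shows named capture groups and zero-width assertions using Python's standard `re` module. The patterns and the `parse_postcode` helper are hypothetical examples, not output from the builder.

```python
import re

# Lookbehind and lookahead are zero-width: they check the surrounding
# characters without including them in the match.
POSTCODE = re.compile(
    r"(?<![\dA-Za-z])"        # not preceded by a digit or letter
    r"(?P<digits>[1-9]\d{3})"  # named group: the four digits
    r"\s?"
    r"(?P<letters>[A-Z]{2})"   # named group: the two letters
    r"(?![A-Za-z])"            # not followed by another letter
)

# Hyphenated terms: two or more letter runs joined by hyphens.
HYPHENATED = re.compile(r"\b[^\W\d_]+(?:-[^\W\d_]+)+\b")


def parse_postcode(text):
    """Return the named groups of the first postcode found, or None."""
    match = POSTCODE.search(text)
    return match.groupdict() if match else None


print(parse_postcode("Keizersgracht 1, 1017 AB Amsterdam"))
print(parse_postcode("ref 1017 ABC"))
print(HYPHENATED.findall("e-health in Noord-Holland"))
```

Named groups keep downstream parsing readable: later code can refer to `digits` and `letters` instead of positional indexes.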
3. Extract & Replace
Apply your regex across the entire dataset with one click. Extract structured fields (dates in dd-mm-jjjj, amounts in € notation) into new columns, or replace inconsistent entries with canonical values. Every transformation is logged so you can audit or roll back any step.
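As a rough illustration of extraction and logged replacement, the following sketch uses only the Python standard library. The function names, the log format (row index, before, after), and the patterns are assumptions for this example; they do not describe how RegExMakker stores its audit trail.

```python
import re
from decimal import Decimal

# dd-mm-jjjj dates and euro amounts such as "€ 1.234,56" (assumed formats).
DATE = re.compile(r"\b(?P<day>\d{2})-(?P<month>\d{2})-(?P<year>\d{4})\b")
AMOUNT = re.compile(r"€\s?(?P<whole>\d{1,3}(?:\.\d{3})*|\d+),(?P<cents>\d{2})")
POSTCODE = re.compile(r"\b([1-9]\d{3})\s*([A-Z]{2})\b")


def extract_fields(text):
    """Pull the first date and amount out of a free-text value."""
    fields = {}
    date = DATE.search(text)
    if date:
        fields["date_iso"] = f"{date['year']}-{date['month']}-{date['day']}"
    amount = AMOUNT.search(text)
    if amount:
        # Drop thousands separators, then switch the decimal comma to a point.
        whole = amount["whole"].replace(".", "")
        fields["amount"] = Decimal(f"{whole}.{amount['cents']}")
    return fields


def replace_logged(rows, pattern, repl, log):
    """Apply a substitution to every row and record each change made."""
    cleaned = []
    for index, row in enumerate(rows):
        new_row = pattern.sub(repl, row)
        if new_row != row:
            log.append((index, row, new_row))
        cleaned.append(new_row)
    return cleaned


log = []
print(extract_fields("Factuur 03-02-2024, totaal € 1.234,56"))
print(replace_logged(["1017AB", "1017 AB"], POSTCODE, r"\1 \2", log))
print(log)
```

Because each log entry keeps the original value, a step can be undone by writing the `before` value back to its row index.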
4. Export & Integrate
Export your cleaned data as CSV, JSON, or a Python/R snippet that reproduces the same regex pipeline. The generated script includes all compiled patterns, flags, and substitution logic, ready to drop into your Jupyter notebook or Airflow DAG.
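The exact shape of the generated script is not shown here, so the following is a hand-written stand-in that illustrates the general idea: compiled patterns, flags, and substitutions collected into one ordered pipeline. The `PIPELINE` list and `clean` function are hypothetical names.

```python
import re

# Hypothetical pipeline: (compiled pattern, replacement) pairs, applied in order.
PIPELINE = [
    # Collapse runs of whitespace into a single space.
    (re.compile(r"\s+"), " "),
    # Normalize postcodes to "1234 AB", accepting lowercase letters on input.
    (
        re.compile(r"\b([1-9]\d{3})\s?([a-z]{2})\b", re.IGNORECASE),
        lambda m: f"{m.group(1)} {m.group(2).upper()}",
    ),
    # Rewrite national mobile numbers (06...) to the +31 format.
    (re.compile(r"\b0(6)[\s-]?(\d{8})\b"), r"+31 \1 \2"),
]


def clean(value):
    """Run every substitution in the pipeline over a single string."""
    for pattern, repl in PIPELINE:
        value = pattern.sub(repl, value)
    return value.strip()


print(clean("  1017ab   Amsterdam, 06-12345678 "))
```

Since `clean` is a plain function with no external dependencies, it can be called from a notebook cell or wrapped in a task of a scheduled job without modification.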
Ready to automate your data cleaning? Start with a sample dataset or import your own CSV to see how fast RegExMakker can normalize Dutch text at scale.
Start Cleaning Your Dataset | View Sample Patterns