In 2025, there is perhaps no organization under the sun that isn’t making data-driven decisions, or at least claiming to. At Omnisend, data isn’t just a claim — it’s the foundation upon which the rest of the house is built.
DataOps & Insights is the source of truth for our decisions: operational reporting, predictive analytics, and self-serve data. Our CI/CD pipeline enforces quality so product teams can move fast without breaking trust.
However, after recently joining the team as a Software Engineer, I noticed a friction common to many data teams: the mechanics of the work were slowing down the logic. We needed to optimize not just our code, but our workflows. After extensive trials, we identified high-ROI applications for Large Language Models (LLMs) that compress our time-to-insight from days to minutes.
Here’s how we did it.
1. The analytics gap: Speed vs. quality
The real challenge in modern data engineering isn’t writing SQL — it’s closing the gap between a sharp business question and a trustworthy answer before the opportunity window closes.
At Omnisend, we realized that while our logic was sound, the “chores” of data modeling were creating a bottleneck.
The friction: Context switching and boilerplate
Building a robust data model requires constant context switching: jumping between dbt conventions, YAML configurations, testing suites, and documentation standards. We faced repetitive scaffolding tasks across every layer of our transformation pipeline (staging → dims → facts → marts).
Every context switch introduced a chance for error, and maintaining consistency across environments became increasingly fragile. We needed a way to automate the rigorous, repetitive parts of the job so our analysts could focus on architecture rather than typing.
The solution: Context-aware modeling with Cursor
We turned to Cursor, an AI-powered code editor. Unlike standard autocomplete tools, Cursor indexes our entire repository, allowing it to understand the specific context of our project structure, data lineage, and naming conventions.
We set up the environment to support the AI:
- Repo Indexing: Cursor indexed our data models and documentation, giving it a “map” of our data warehouse
- Guardrails and prompts: We established well-scoped prompts aligned with our SQL style
- Inline critiques: The AI flags anti-patterns — like CTEs that break incremental models or fan-out joins — before a Pull Request (PR) is even raised
Implementation: From hours to minutes
With these guardrails in place, the workflow shifted dramatically. When an analyst defines a business requirement, Cursor generates the initial data models in seconds. It selects appropriate source tables, generates files in correct project paths (staging/dims/facts), and even pulls column descriptions to auto-populate documentation.
What used to take hours of manual file creation is now done in minutes. The analyst’s role shifts from “writer” to “reviewer.”
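To make the scaffolding concrete, here is a minimal sketch of the kind of boilerplate the AI generates for us: given a source-table spec, it writes a dbt staging model and its matching `schema.yml` side by side. The spec, paths, and column names below are hypothetical, and the real workflow is driven by Cursor against our actual catalog rather than a hand-written dictionary.

```python
import textwrap
from pathlib import Path

# Hypothetical source spec; in practice this comes from the warehouse catalog.
SOURCE = {
    "source_name": "shop",
    "table": "orders",
    "columns": {
        "order_id": "Primary key of the order",
        "merchant_id": "Owning merchant",
        "created_at": "Order creation timestamp (UTC)",
    },
}

def scaffold_staging_model(spec: dict, root: Path) -> None:
    """Write a stg_<source>__<table>.sql model and its YAML docs together."""
    name = f"stg_{spec['source_name']}__{spec['table']}"
    cols = ",\n    ".join(spec["columns"])
    sql = textwrap.dedent(f"""\
        with source as (
            select * from {{{{ source('{spec['source_name']}', '{spec['table']}') }}}}
        )

        select
            {cols}
        from source
    """)
    yml_cols = "".join(
        f"      - name: {col}\n        description: \"{desc}\"\n"
        for col, desc in spec["columns"].items()
    )
    yml = f"version: 2\nmodels:\n  - name: {name}\n    columns:\n{yml_cols}"
    staging = root / "models" / "staging"
    staging.mkdir(parents=True, exist_ok=True)
    (staging / f"{name}.sql").write_text(sql)
    (staging / f"{name}.yml").write_text(yml)
```

The point is not the template itself but that every staging model lands in the right path with documentation already populated, so the reviewer only checks logic, not plumbing.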
A note on hallucinations: It’s important to be realistic: the tool isn’t perfect. It occasionally hallucinates, producing plausible-looking output that references columns or models that don’t exist. However, getting 90% of the work done instantly allows us to spend our energy on the final, critical 10% of validation.
The impact
- Development velocity: A 2-5x increase in model delivery speed by templating YAML and test creation
- Improved consistency: SQL and YAMLs now follow a strict standard, reducing data incidents.
- Better traversability: The AI enforces a consistent hierarchy (staging → dims → facts → marts), making the codebase easier to navigate, understand, and use when creating a new data model
2. Fewer review cycles, fewer incidents
Once modeling is done locally in Cursor and a PR is raised, the workflow shifts from creation to validation. This is where we hand the baton to Gemini Code Assist.
The challenge: The peer review bottleneck
Peer reviews are critical for quality, but they can become a bottleneck. A human reviewer — especially one from a different product group — might miss subtle deviations from our dbt style guide or fail to spot non-optimal BigQuery functions.
We faced common pain points:
- Context blindness: Struggling to understand cross-file context in large diffs
- Style drift: Inconsistent formatting making diffs harder to read
- Logic gaps: Missing subtle business logic breaks (e.g., attribution order changes) that look syntactically correct but are functionally wrong
The solution: Gemini Code Assist (with strict tuning)
We deployed Gemini Code Assist as our first line of defense. It summarizes diffs by intent, checks against a repo-specific style guide, and proposes concrete fixes.
However, out of the box, the AI was noisy. To make it useful, we had to set up the reviewer just like we set up the writer:
- Noise reduction: We tightened the .gemini/config.yaml to prioritize critical findings over nitpicks
- Context injection: We added a ./gemini/styleguide.md file containing our specific dbt conventions and governance checks
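For illustration, the noise-reduction settings look roughly like this. The key names below follow our understanding of the Gemini Code Assist repository config schema; verify them against the current documentation before copying.

```yaml
# .gemini/config.yaml — sketch, not a verbatim copy of our config
have_fun: false
code_review:
  comment_severity_threshold: HIGH   # surface critical findings, skip nitpicks
  max_review_comments: 10            # cap chatter per pull request
```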
Real-world optimization: The tale of three CTEs
The value of a second AI opinion became clear during a recent refactor. We had a model with three duplicated Common Table Expressions (CTEs).
Cursor (the writer) flagged them but suggested an “if it ain’t broken, don’t fix it” approach, warning that unioning might be slower.
Gemini (the reviewer) flagged the same duplication but recommended a concrete optimization: consolidating the three CTEs into a single union with one unnest/join.
We tested the Gemini-suggested refactor. The result was a ~50% reduction in runtime. This interplay is critical: the drafting AI prioritized speed, while the reviewing AI prioritized architecture.
The impact
- 30–40% fewer review cycles: Gemini catches syntax and style issues before a human sees them
- 15–25% reduction in logical errors: Fewer post-merge defects tied to inconsistent logic
- Automated governance: The assistant flags PII issues and validates source-of-truth tables automatically
3. Solving data discovery: The “where is X?” problem
As our Superset environment scaled to thousands of assets, it became a victim of its own success. A simple question like “Where can I find our monthly recurring revenue chart for M-segment clients?” required deep platform knowledge or a ping to the data team.
The solution: Embed, index, retrieve
We embedded a Chainlit chatbot directly into the Superset UI.
- Ingestion: A daily automated pipeline (via Dagster) extracts metadata from every dashboard and chart
- Indexing: Metadata is synced to a vector knowledge base on OpenAI
- Retrieval: Chainlit responds through the OpenAI Assistant API, returning ranked assets with direct links when available, or suggesting where results may be found.
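The retrieval step can be sketched as a similarity search over asset metadata. The snippet below is a self-contained toy: `embed` is a bag-of-words stand-in for the OpenAI embedding call used in production, and the asset records are invented examples of what the Dagster pipeline syncs from Superset.

```python
import math
import re
from collections import Counter

# Invented metadata records; the real pipeline syncs these from Superset daily.
ASSETS = [
    {
        "title": "MRR by client segment",
        "description": "Monthly recurring revenue split by client segments",
        "url": "/superset/dashboard/12/",
    },
    {
        "title": "Form activation funnel",
        "description": "Merchant form creation and activation events",
        "url": "/superset/dashboard/34/",
    },
]

def embed(text: str) -> Counter:
    """Bag-of-words stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_assets(question: str) -> list[dict]:
    """Return assets ranked by similarity to the question, best first."""
    q = embed(question)
    scored = [
        (cosine(q, embed(f"{a['title']} {a['description']}")), a) for a in ASSETS
    ]
    return [a for score, a in sorted(scored, key=lambda s: -s[0]) if score > 0]
```

Swapping the toy `embed` for real embeddings and the in-memory list for a vector store gives the production shape: the chatbot embeds the question, ranks assets, and replies with links.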
It all comes down to context
The power of this approach is understanding data relationships. A marketer recently asked: “How long, on average, does it take merchants to activate forms from the time of creation?”
No pre-built dashboard answered this. However, the Assistant analyzed the intent and correctly identified the relevant dataset and columns needed to calculate the answer. It transformed a “no results” dead end into a self-serve win.
The impact
- Silence is golden: A 25–40% drop in “Where is X?” pings to the DataOps team
- Forced hygiene: Because the bot relies on metadata, “undocumented” became “invisible,” incentivizing the team to adopt better documentation standards
4. Scaling EDA: 76 hours of video in minutes
Some of our most valuable data isn’t in a database — it’s in unstructured text, such as customer conversations. We recently had 76 hours of Quarterly Business Review (QBR) recordings — a goldmine of client feedback that was practically impossible to analyze manually.
The approach: Bypassing the context window
We used Cursor with Claude-4-Sonnet to build an iterative ETL pipeline for text.
- Context definition: We defined a prompt targeting specific topics (metrics, benchmarks, recommendations)
- Tool generation: Cursor generated a Python script to process 116 transcript files
- Iterative extraction: The script iterated through files, extracting relevant sentences into structured CSVs, which were then summarized
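The extraction step above boils down to a loop like the following sketch. The topic keywords and file layout are illustrative; the real script was generated by Cursor against our transcripts and a much richer prompt-derived topic list.

```python
import csv
import re
from pathlib import Path

# Illustrative topic keywords; the real list comes from the extraction prompt.
TOPICS = {
    "metrics": ["open rate", "click rate", "revenue"],
    "recommendations": ["recommend", "suggest", "should"],
}

def extract_sentences(transcript_dir: Path, out_csv: Path) -> int:
    """Scan every transcript, keep sentences that match a topic keyword,
    and write them to a structured CSV for downstream summarization."""
    rows = []
    for path in sorted(transcript_dir.glob("*.txt")):
        for sentence in re.split(r"(?<=[.!?])\s+", path.read_text()):
            for topic, keywords in TOPICS.items():
                if any(k in sentence.lower() for k in keywords):
                    rows.append(
                        {"file": path.name, "topic": topic, "sentence": sentence.strip()}
                    )
                    break  # one topic per sentence is enough here
    with out_csv.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["file", "topic", "sentence"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Because each file is processed independently, the 76 hours of recordings never have to fit in one context window; the LLM only sees the distilled CSV at the summarization stage.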
The impact
This approach gave us a blended view of qualitative and quantitative insights: frequency counts of topics alongside exemplar quotes.
More importantly, it democratized the workflow. Vytautas Jakštys, our Product Director, a non-technical leader, now uses this same method. He generates SQL from our dbt docs using Claude, then uses Cursor to analyze customer chats to understand the “why” behind the numbers.
Final thoughts on data as a conversation
We aren’t stapling AI onto our stack for show; we are baking it into how Omnisend asks, answers, and acts.
The result is a department that ships models faster, reviews code smarter, and lets everyone find trustworthy data without a guided tour. AI handles the mundane work — building new data models from business requirements, writing YAML documentation and tests, checking syntax and correct model use, validating and reviewing, and finding charts — clearing the runway for us to focus on the real question:
What’s the next step that moves us ahead?
The next step is to continue codifying our judgment into the markdown files: rules, guidelines, styles, and more. It’s an ever-evolving process. As new LLMs emerge, so do new prompting methods and approaches.
Most importantly, such workflows run mostly on well-curated metadata. Your AI is only as good as your documentation.
If you own a dataset, adopt the style guide and certify your assets. You aren’t just helping a human reader today, you’re making the assistant smarter for tomorrow.