Case Study: How to Build a 360° Customer Data Platform with Salesforce, Stripe, Segment, HubSpot, Airflow, dbt, and Snowflake

October 11, 2025

20 min

Head of Software Engineering

Vadym Shvydkyi

INSART’s tech quarterback. Oversees full-stack architecture from backend to data platforms. His team crafts fintech solutions at startup pace while keeping enterprise-grade quality and reliability front-and-center.

For fintech companies, data is both an opportunity and a liability. Every customer touchpoint — from onboarding and KYC checks to transactions and support tickets — generates data. But these data points often live in silos: CRMs, payment gateways, marketing tools, and internal databases all hold pieces of the truth.

Without integration, what emerges is a fragmented view: marketing sees engagement, operations see payments, and compliance sees KYC — yet nobody sees the whole customer.

At INSART, we believe that data integration is the foundation of customer intelligence. A unified “Customer 360°” view transforms disconnected datasets into actionable insight — the kind that drives retention, upsell, and lifetime value.

This case study describes how INSART builds such platforms — integrating Salesforce, Stripe, Segment, and HubSpot into a scalable data architecture using Airflow, dbt, and Snowflake, topped with Looker or Superset dashboards.

The result is a system that lets fintechs understand their users at every layer: financial, behavioral, and emotional.

The Challenge

Fintech products interact with customers across multiple systems:

Onboarding and KYC data sits in compliance tools.
Transactional data lives in Stripe or a proprietary ledger.
CRM data exists in Salesforce or HubSpot.
Behavioral analytics flow through Segment or Mixpanel.

Each source provides a partial story, updated at different times, with its own identifiers and data model.

For example, one customer might appear as:

user_id = 24681 in the product database,
contact_id = A023F00 in Salesforce,
customer_id = cus_9HG72 in Stripe, and
lead_id = 512 in HubSpot.

The goal of the Customer 360° project is to stitch these identities together, unify the data, and make it usable in analytics and personalization engines — all while maintaining compliance with GDPR and PCI DSS.

INSART’s Architectural Vision

INSART approaches Customer 360° systems as data products, not as static warehouses.

They must be:

Continuously updated — reflecting every customer action in near real time.
Reliable — data must be trusted for decision-making.
Secure — personal data must comply with GDPR.
Extensible — new tools (marketing automation, AI scoring) can plug in easily.

To achieve this, INSART employs a four-layer architecture:

Ingestion Layer — APIs and webhooks from all external systems.
Transformation Layer — normalization and modeling using Airflow + dbt.
Storage Layer — a warehouse (Snowflake or BigQuery) optimized for analytics.
Delivery Layer — dashboards and reverse-ETL feeds (Looker, Superset, or Hightouch).

Data Ingestion and Integration

The integration begins by connecting the data sources — each with its unique access pattern and refresh cycle.

Salesforce: Extracted daily through the Salesforce REST API or Bulk API. Objects like Accounts, Contacts, and Opportunities are incrementally synced using the SystemModstamp field for delta loads.
Stripe: Connected via the Stripe API with webhook ingestion for near real-time updates on transactions, subscriptions, and disputes.
Segment: Provides streaming customer events (page views, clicks, sessions) — these are consumed through Segment’s S3 destination or Kafka connector.
HubSpot: Synced through HubSpot’s CRM API, with incremental updates for marketing campaigns, email engagement, and customer support tickets.
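
The SystemModstamp delta load described for Salesforce can be sketched as follows. This is a minimal illustration, not INSART's actual extractor: the function names are hypothetical, and it assumes the API returns timestamps in the form 2025-10-11T08:30:00.000+0000.

```python
from datetime import datetime, timezone


def build_delta_soql(sobject: str, fields: list, watermark: datetime) -> str:
    """Build a SOQL query selecting rows modified after the watermark."""
    if watermark.tzinfo is None:
        raise ValueError("watermark must be timezone-aware")
    # SOQL datetime literals are unquoted ISO 8601 values.
    literal = watermark.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # Always select SystemModstamp so the next watermark can be derived.
    columns = ", ".join(dict.fromkeys(list(fields) + ["SystemModstamp"]))
    return (
        f"SELECT {columns} FROM {sobject} "
        f"WHERE SystemModstamp > {literal} "
        "ORDER BY SystemModstamp ASC"
    )


def next_watermark(rows: list, current: datetime) -> datetime:
    """Advance the watermark to the newest SystemModstamp in this batch."""
    # Assumed timestamp format; adjust if the API returns something else.
    stamps = [
        datetime.strptime(row["SystemModstamp"], "%Y-%m-%dT%H:%M:%S.%f%z")
        for row in rows
    ]
    return max(stamps + [current])
```

Persisting the returned watermark between runs (for example in an Airflow Variable) lets each run request only the records changed since the previous one.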

Each source data set is serialized into JSON, versioned with schema metadata, and stored in the Bronze layer of an S3 data lake (or GCS for GCP deployments).
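
One way to picture a Bronze-layer object is a JSON envelope that carries the records together with their schema metadata. The sketch below is illustrative only: the key layout, envelope fields, and function name are assumptions, not a documented INSART convention.

```python
import hashlib
import json
from datetime import datetime, timezone


def bronze_object(source: str, entity: str, records: list,
                  extracted_at: datetime) -> tuple:
    """Return (object_key, json_body) for one raw extract batch.

    extracted_at is assumed to be a UTC timestamp.
    """
    # Fingerprint of the field names, so later jobs can spot schema changes.
    field_names = sorted({name for record in records for name in record})
    schema_hash = hashlib.sha256(",".join(field_names).encode()).hexdigest()[:12]
    body = json.dumps(
        {
            "source": source,
            "entity": entity,
            "extracted_at": extracted_at.isoformat(),
            "schema": {"fields": field_names, "hash": schema_hash},
            "records": records,
        },
        sort_keys=True,
    )
    # Hypothetical partitioned layout: one prefix per source, entity, day, hour.
    key = (
        f"bronze/{source}/{entity}/"
        f"dt={extracted_at:%Y-%m-%d}/hour={extracted_at:%H}/"
        f"{extracted_at:%Y%m%dT%H%M%SZ}.json"
    )
    return key, body
```

Keeping the schema fingerprint next to the data means a changed upstream field shows up as a new hash without any need to parse the records themselves.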

Airflow orchestrates the ingestion DAGs:

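import pendulum
from airflow.decorators import dag

# extract_* and combine_sources are assumed to be @task functions defined elsewhere.
@dag(schedule_interval="@hourly", start_date=pendulum.datetime(2025, 1, 1, tz="UTC"),  # placeholder start date
     catchup=False, tags=["customer360", "ingestion"])
def customer360_ingestion():
    salesforce_data = extract_salesforce()
    stripe_data = extract_stripe()
    segment_data = extract_segment()
    hubspot_data = extract_hubspot()
    combine_sources([salesforce_data, stripe_data, segment_data, hubspot_data])

customer360_ingestion()  # instantiate the DAG so Airflow registers it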

Each Airflow task is launched in its own pod through the KubernetesPodOperator for scalability and isolation. Logs are streamed to CloudWatch or Stackdriver (now Google Cloud Logging), so every API call can be traced and audited.

Transformation and Modeling

Once raw data lands in the lake, INSART’s engineers apply dbt (data build tool) for modeling.

Here, each raw source table is transformed into structured, analytics-ready tables in Snowflake or BigQuery.

The process includes:

Data normalization: Mapping common fields like email, account_id, country, and created_at to unified naming conventions.
Entity resolution: Matching customer records across systems using probabilistic and deterministic methods (e.g., fuzzy matching on email and name, or exact match on external IDs).
Business modeling: Building derived models such as customers_master, transactions_enriched, engagement_summary, and customer_lifetime_value.
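
The entity-resolution step can be illustrated with a small sketch that tries deterministic rules first and falls back to a fuzzy name comparison. The record fields (external_id, email, name), the same-domain rule, and the 0.85 threshold are assumptions chosen for the example; a production matcher would be tuned against labeled data.

```python
from difflib import SequenceMatcher
from typing import Optional


def normalize_email(email: Optional[str]) -> Optional[str]:
    """Lowercase and trim an email so equal addresses compare equal."""
    if not email:
        return None
    return email.strip().lower()


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two names, ignoring case and extra spaces."""
    clean_a = " ".join(a.lower().split())
    clean_b = " ".join(b.lower().split())
    return SequenceMatcher(None, clean_a, clean_b).ratio()


def match_records(left: dict, right: dict, threshold: float = 0.85) -> Optional[str]:
    """Return 'deterministic', 'probabilistic', or None for a pair of records."""
    # Deterministic: a shared external ID or an identical normalized email.
    if left.get("external_id") and left.get("external_id") == right.get("external_id"):
        return "deterministic"
    l_email = normalize_email(left.get("email"))
    r_email = normalize_email(right.get("email"))
    if l_email and l_email == r_email:
        return "deterministic"
    # Probabilistic: same email domain plus a close name match.
    if l_email and r_email and l_email.split("@")[-1] == r_email.split("@")[-1]:
        if name_similarity(left.get("name", ""), right.get("name", "")) >= threshold:
            return "probabilistic"
    return None
```

Returning the match type rather than a bare boolean lets downstream models treat probabilistic links with more caution, for example by routing them to manual review.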

Example dbt model:

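-- models/customers_master.sql
-- Transactions and sessions are pre-aggregated so that joining both
-- to the same customer does not multiply rows and inflate the totals.
WITH tx AS (
  SELECT customer_id, COUNT(id) AS total_transactions, SUM(amount) AS total_spent
  FROM {{ ref('transactions') }}
  GROUP BY customer_id
),
sess AS (
  SELECT email, AVG(duration) AS avg_session_time
  FROM {{ ref('segment_sessions') }}
  GROUP BY email
)
SELECT
  COALESCE(sf.email, hs.email, st.email) AS email,
  sf.account_id,
  st.customer_id AS stripe_id,
  hs.lead_id AS hubspot_id,
  -- Earliest creation time across systems (columns assumed cast to timestamps).
  -- LEAST returns NULL if any argument is NULL, so fall back to the Salesforce date.
  LEAST(sf.created_date,
        COALESCE(st.created, sf.created_date),
        COALESCE(hs.created_at, sf.created_date)) AS first_seen,
  COALESCE(tx.total_transactions, 0) AS total_transactions,
  tx.total_spent,
  sess.avg_session_time
FROM {{ ref('salesforce_contacts') }} sf
LEFT JOIN {{ ref('stripe_customers') }} st ON sf.email = st.email
LEFT JOIN {{ ref('hubspot_leads') }} hs ON sf.email = hs.email
LEFT JOIN tx ON tx.customer_id = st.customer_id
LEFT JOIN sess ON sess.email = sf.email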

These models create a single source of truth: one row per customer, one version of every metric.

Validation and Data Quality

To ensure trust, every dbt model is validated using Great Expectations.

Typical checks include:

No duplicate emails or IDs.
Non-null account_id and email.
Transaction totals are positive.
Timestamps fall within valid ranges.
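
The checks listed above can be expressed in a few lines of plain Python. This sketch deliberately does not use the Great Expectations API; it only shows the logic such a suite would encode. The column names and the year-2000 lower bound are assumptions, and a total of zero is treated as valid here.

```python
from collections import Counter
from datetime import datetime, timezone


def validate_customers(rows: list, now: datetime) -> list:
    """Return human-readable failures; an empty list means the batch passes."""
    failures = []
    # No duplicate emails.
    emails = Counter(r.get("email") for r in rows if r.get("email"))
    for email, count in emails.items():
        if count > 1:
            failures.append(f"duplicate email: {email}")
    earliest = datetime(2000, 1, 1, tzinfo=timezone.utc)  # hypothetical lower bound
    for i, r in enumerate(rows):
        # Non-null account_id and email.
        if not r.get("email") or not r.get("account_id"):
            failures.append(f"row {i}: missing email or account_id")
        # Transaction totals must not be negative.
        if r.get("total_spent") is not None and r["total_spent"] < 0:
            failures.append(f"row {i}: negative total_spent")
        # Timestamps must fall within a valid range.
        first_seen = r.get("first_seen")
        if first_seen is not None and not (earliest <= first_seen <= now):
            failures.append(f"row {i}: first_seen out of range")
    return failures
```

A non-empty result is the kind of signal that would fail the pipeline run and feed the alerting path.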

Any data anomaly triggers Airflow alerts to Slack and creates a ticket in Jira for investigation.

This automated quality gate ensures that marketing automation and risk scoring models consume only verified data.

Warehousing and Performance

All transformed data lives in Snowflake (or BigQuery for GCP clients).

INSART’s engineers partition the warehouse into three logical zones:

Core schema — tables for unified entities (customers, transactions, sessions).
Analytics schema — derived aggregates like LTV, churn probability, engagement score.
Compliance schema — audit logs, data access history, and data retention policies.

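Performance tuning involves clustering large tables on high-selectivity fields that queries filter on (e.g., email, account_id), relying on the warehouse result cache for frequently repeated queries, and materializing heavy aggregations as dbt table or incremental models rather than views.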

Access control is enforced through Snowflake roles — analysts can query aggregated data, while sensitive fields (like emails or phone numbers) remain masked for privacy.

Dashboards and Data Delivery

INSART integrates Looker or Apache Superset directly with the warehouse.

These BI layers offer dynamic dashboards that visualize the entire customer journey:

Acquisition funnel: From first website visit (Segment) to converted user (Salesforce).
Revenue and retention: Stripe subscriptions, renewals, refunds.
Engagement patterns: Session frequency, app usage, and inactivity signals.
Marketing impact: HubSpot campaigns mapped to transactions and LTV.

Each dashboard is backed by parameterized SQL views, allowing stakeholders to filter by segment, geography, or customer type.

For marketing automation, INSART configures reverse ETL pipelines using tools like Hightouch or Census — automatically syncing enriched customer attributes back to Salesforce or HubSpot for hyper-personalized offers.

Example:

A user flagged as “High LTV + Inactive 30 days” triggers a reactivation email or app notification campaign.
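
The rule in this example can be sketched as a small filter that produces the rows a reverse ETL sync would push to the CRM. The field names, the segment label, and the LTV threshold of 1000 are hypothetical values chosen for illustration.

```python
from datetime import date, timedelta


def needs_reactivation(customer: dict, today: date,
                       ltv_threshold: float = 1000.0,
                       inactive_days: int = 30) -> bool:
    """True when a customer is high-LTV and has been inactive for the window."""
    last_active = customer.get("last_active")
    if last_active is None:
        return False
    is_high_ltv = customer.get("lifetime_value", 0.0) >= ltv_threshold
    is_inactive = (today - last_active) >= timedelta(days=inactive_days)
    return is_high_ltv and is_inactive


def reactivation_payload(customers: list, today: date) -> list:
    """Rows that a reverse ETL sync could write to the CRM as a segment flag."""
    return [
        {"email": c["email"], "segment": "high_ltv_inactive_30d"}
        for c in customers
        if needs_reactivation(c, today)
    ]
```

In practice the same condition is often written as a SQL view in the warehouse, with the reverse ETL tool syncing the view's rows on a schedule.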

Security and Compliance

Customer data, by definition, is sensitive. INSART embeds privacy controls at every layer.

Encryption: All data at rest in S3 or Snowflake is encrypted with AES-256, and data in transit is protected with TLS 1.3.
Tokenization: Sensitive PII is tokenized before it leaves the ingestion layer, using keys managed in AWS KMS or HashiCorp Vault.
Access management: Row-level security (RLS) ensures users see only permitted data.
Auditing: Access logs are stored in a dedicated compliance_audit schema for SOC 2 reporting.
Retention: Data older than the retention threshold is archived to cold storage and anonymized.
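
As an illustration of the tokenization control, the sketch below replaces PII values with deterministic keyed tokens using HMAC-SHA256. It is one possible approach, not necessarily the one INSART uses: the key is passed in directly here, whereas in practice it would be fetched from a key manager such as KMS or Vault, and the function names are hypothetical.

```python
import hashlib
import hmac


def tokenize(value: str, key: bytes) -> str:
    """Deterministic keyed token: same input and key always give the same token."""
    normalized = str(value).strip().lower().encode("utf-8")
    return "tok_" + hmac.new(key, normalized, hashlib.sha256).hexdigest()


def tokenize_record(record: dict, pii_fields: set, key: bytes) -> dict:
    """Copy a record, replacing the listed PII fields with tokens."""
    return {
        name: tokenize(value, key) if name in pii_fields and value is not None else value
        for name, value in record.items()
    }
```

Because the token is deterministic, records from different systems can still be joined on the tokenized email without exposing the address itself.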

This framework allows fintechs to remain compliant with GDPR, CCPA, and PCI DSS, even when data flows across multiple systems.

Advanced Analytics and Personalization

Once unified, the data becomes a playground for intelligence.

INSART often extends Customer 360° platforms with machine learning pipelines built on Databricks or SageMaker:

Churn prediction — logistic regression models trained on engagement and transaction data.
Customer segmentation — K-means clustering by spending behavior, product mix, and recency.
LTV modeling — gradient boosting to estimate future revenue.
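
Models like these need per-customer input features. A common starting point, assumed here rather than taken from INSART's pipelines, is recency, frequency, and monetary value (RFM) computed from the transaction history. The field names are hypothetical.

```python
from datetime import date


def rfm_features(transactions: list, today: date) -> dict:
    """Per-customer recency (days), frequency (count), and monetary (sum) values."""
    features = {}
    for tx in transactions:
        f = features.setdefault(
            tx["customer_id"],
            {"recency_days": None, "frequency": 0, "monetary": 0.0},
        )
        # Recency is the number of days since the most recent transaction.
        days_ago = (today - tx["date"]).days
        if f["recency_days"] is None or days_ago < f["recency_days"]:
            f["recency_days"] = days_ago
        f["frequency"] += 1
        f["monetary"] += tx["amount"]
    return features
```

The resulting dictionary can be turned into a feature matrix for clustering or used as inputs to a churn or LTV model.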

Outputs from these models are written back into Snowflake, then surfaced via dashboards or reverse ETL to Salesforce — where sales teams can act instantly on insights.

Example:

A customer with low engagement but high historic LTV is automatically tagged for a “personal outreach” workflow in Salesforce.

Example Architecture (Text Representation)

+-------------------+         +-------------------+
|   Salesforce CRM  |         |    Stripe API     |
+-------------------+         +-------------------+
          |                            |
          | REST / Webhooks            |
          +------------+---------------+
                       |
                +------v------+
                | Airflow DAG |
                +-------------+
                       |
         +-------------v-------------+
         |  S3 / GCS  (Bronze Layer) |
         +-------------+-------------+
                       |
                +------v------+
                |     dbt     |
                | (Transform) |
                +------+------+
                       |
               +-------v--------+
               |  Snowflake DB  |
               | (Gold Layer)   |
               +-------+--------+
                       |
        +--------------v--------------+
        | Looker / Superset Dashboards|
        | Reverse ETL (Hightouch)     |
        +-----------------------------+

This architecture allows both business and technical teams to operate from a single, trusted dataset.

Measurable Impact

A unified Customer 360° system transforms fintech operations in measurable ways:

+25–40% improvement in campaign ROI due to precise targeting.
+20% increase in LTV through personalized retention strategies.
Single source of truth — eliminates manual data reconciliation across systems.
Automated reporting — saves analysts 10–15 hours weekly.
Regulator-ready compliance — all customer interactions auditable in one system.

The result: fintech companies can finally see their customers not as records in separate systems, but as living, evolving relationships.

Delivery Methodology

INSART executes such integrations following a disciplined DataOps approach:

Discovery Phase (2–3 weeks)
Mapping data sources, identifying entity overlaps, and defining KPIs.
Implementation Phase (6–8 weeks)
Setting up ingestion, modeling, and BI dashboards.
Validation & Optimization Phase (2 weeks)
Ensuring performance, data quality, and stakeholder adoption.

Continuous integration ensures new data sources can be added with minimal disruption.

Lessons Learned

Data consistency is a process, not a product. Continuous validation and lineage tracking are essential.
Schema changes happen weekly. Automating ingestion with a schema registry prevents failures.
The customer is multidimensional. A financial customer’s story is as much about behavior, sentiment, and engagement as it is about transactions.
Analytics without trust is noise. Rigorous data quality enforcement turns analytics into decisions.
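
The lesson about schema changes can be made concrete with a small drift check run against each incoming record before it is loaded. This is a simplified stand-in for a real schema registry; the function name and the expected-schema format (field name mapped to a Python type) are assumptions.

```python
def schema_drift(expected: dict, record: dict) -> dict:
    """Compare one incoming record against the registered field types."""
    missing = sorted(name for name in expected if name not in record)
    unexpected = sorted(name for name in record if name not in expected)
    wrong_type = sorted(
        name
        for name, expected_type in expected.items()
        if name in record
        and record[name] is not None
        and not isinstance(record[name], expected_type)
    )
    return {"missing": missing, "unexpected": unexpected, "wrong_type": wrong_type}
```

An ingestion task could quarantine records with missing or mistyped fields while letting records with merely unexpected new fields through, so that an added upstream column does not halt the pipeline.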

The INSART Advantage

INSART’s strength lies in the combination of fintech domain depth and data engineering maturity.

We understand that in financial services, customer data is not just a marketing asset — it’s a regulated, risk-bearing entity.

Our teams blend:

ETL craftsmanship (Airflow, dbt, Great Expectations)
Warehouse optimization (Snowflake, BigQuery)
Fintech understanding (KYC, AML, transaction modeling)
Operational discipline (DevSecOps, Terraform, monitoring)

This allows us to deliver data platforms that both empower growth and withstand regulatory scrutiny.

Conclusion

A true Customer 360° system doesn’t just integrate data — it integrates understanding.

It aligns teams around the same version of the truth, fuels AI-driven personalization, and builds the foundation of customer trust.

INSART’s approach makes that possible.

By merging data from Salesforce, Stripe, Segment, and HubSpot, orchestrating ETL with Airflow and dbt, and warehousing in Snowflake, we create systems that turn fintech data into foresight.

Because in modern finance, knowing your customer means more than verifying identity — it means understanding their entire journey.
