Robo-advisors’ activity strongly depends on financial market data, which they need to process and analyze in order to suggest the best portfolio management approach for their clients. This data includes end-of-day prices, bid and ask quotes, split ratios, account data, etc. The efficiency of a particular robo-advisor depends on the number of securities covered (scope) and the historical depth (number of years) of data processed. The volume of data that needs to be stored thereby grows significantly.

Financial market data may be received from multiple sources, such as data aggregators and vendors. A number of companies provide market data and account data aggregation, and each of them support particular data formats. The following table contains a list of nine popular financial data providers, and information about the data types and formats that they provide.

Data Provider Data Formats Supported Data Types Provided


API Real-time, End-of-day, Historical data
Commodity Systems Inc.


ASCII, CSI, CSV, CSIM,  MetaStock, Excel


End-of-day, Historical data
CQG Inc.


ASCII, Excel Real-time, Historical data
Marketsource Online


ASCII End-of-day, Historical data


JSON; SFTP batch files

RESTful API; webhooks for real-time visibility into end user activity

Real-time, Historical data




Real-time, Historical data




Real-time, Historical data




End-of-day, Real-time data


ASCII, EzyCharts, MetaStock, Quicken, CSV End-of-day, Historical data

The most common formats are ASCII, CSV, MetaStock, and Excel. However, data may be delivered in other formats as well. Most data vendors provide APIs that give access to the market data. SOAP and REST protocols are generally used.

Thus, robo-advisors receive vast amounts of data of various formats. All this data needs to be collected, stored, transformed, governed, integrated, analyzed, and visualized.

To deal with it all, the Extract, Transform, Load (ETL) process is used.

What is ETL and why does a WealthTech project need it?

ETL is the process of loading data from the source systems to the data warehouse. The steps are as follows:

  • Extract
    The data is extracted from the data vendors and made accessible for processing in the current wealth-management project (e.g., a robo-advisor). An incremental extract (only modified and new records are extracted) or full extract may be performed. Because the volumes of wealth-management data can be immense, the incremental extract is prefered. The full extract may affect performance, response time, and other parameters of both the data vendor and the robo-advisor.
  • Transform
    To ensure the quality of the data in the data warehouse, the extracted data requires cleansing according to the following unification rules:

    1. Make identifiers unique;
    2. Convert data to a standardized format;
    3. Validate fields against each other.

    Data transformation also includes joining and deduplicating data from several market data providers; sorting; deriving new calculated values; etc.

  • Load
    The load step ensures that the data is loaded into the database, and that this process is performed correctly. In wealth-management projects, new data does not generally overwrite existing data; instead, new data is added in a historical form at regular intervals—on an hourly or daily basis.

Given the various formats of financial market data, the ETL process enables the transformation and merging of data received from different sources, and the easier integration with requirements of algorithms and logics of a robo-advisor platform.

To use as few resources as possible, the ETL process ensures that there is no data redundancy, and significant focus within the process is on data quality and data monitoring.

How can we provide ETL for WealthTech?

Due to the high volumes of data involved, the ETL process may take certain amount of time and cause delays before the data can be leveraged by a WealthTech platform. Thus, the data value is limited, as it is not usable for some time. However, Big Data solutions, such as Hadoop and Spark, enable us to change this situation in terms of aggregating, transforming, sorting, and analyzing data inside the frameworks.

Let’s take a look at how this works:

  • Real-time financial data is aggregated by multiple data vendors in different formats.
  • This data is extracted in real time by a robo-advisor.
  • The data is aggregated and transformed in real time inside a Big Data repository (e.g., Hadoop or Spark).
  • The financial market transformations are processed by a robo-advisor platform to perform efficient customer-portfolio management—that is, to make decisions about asset allocation, diversification, or rebalancing.
  • As the robo-advisor platform performs certain actions, such as selling or purchasing assets, the outcomes of these actions, among others, are captured by data aggregators.

ETL in a WealthTech platfrom

Because of the enormous volumes of financial market data and the need to manage data movement, transformation, and analysis, Hadoop and Spark bring a tremendous advantage for wealth-management platforms.

However, both these data frameworks have advantages and disadvantages.

Apache Hadoop


  • Hadoop can run on commodity hardware or in the cloud.
  • Hadoop has extensive functionality, including deriving data from any format and converting it into a single format; data warehousing; fraud detection; etc.
  • Hadoop has means to store large volumes of raw data without increasing expenses. Thus, if a robo-advisor changes its logics in the future, it can refer to the raw data.
  • Hadoop provides data processing and uses tools located on same servers as the data. Thus, processing operations are carried out faster.
  • By duplicating data and creating multiple copies, Hadoop protects against data failure and loss.
  • Clustering allows for fault tolerance.
  • Hadoop is scalable and gives the platform flexibility as the data lake grows.


  • In Hadoop, the security measures are disabled by default. To secure sensitive data, specific measures should be taken.
  • Hadoop runs more slowly that Spark because Spark works in-memory.
  • Hadoop requires highly skilled specialists to operate it.

Apache Spark


  • Spark is an in-memory platform, so it executes data processing more than 10 times faster than Hadoop does.
  • Fault tolerance is achieved using data replication to several nodes.
  • Spark supports many languages, such as Java, Scala, Python, etc.
  • Spark is highly configurable—it can thus be configured to work in the best way for each particular use case.
  • Spark supports graph execution, machine learning, and streaming functionalities.


  • Spark takes up significant system resources.
  • Spark has no file management system, so it needs to rely on other platforms.
  • If not configured correctly, Spark may have performance issues in a production environment.

Robo-advisors are platforms with rapidly growing data pools. To gather financial market data from multiple sources, store the data in reliable and secure way, and enable further processing for the platform’s purposes, Big Data frameworks appear to be the best solution. Hadoop and Spark are extremely popular; however, in order to make the best of them, you need skilled IT developers with experience in working with particular frameworks. While Hadoop is used for its ability to store increasingly large amounts of data, Spark is also becoming popular due to its compatibility with machine learning.

As open-source frameworks, both Hadoop and Spark have been widely implemented. Nevertheless, today there are several alternatives, and you may find a framework that suits your platform better than those mentioned here.

Integration Challenges in Fintech: ETL Processing

Integration Challenges in Fintech: ETL Processing

For technology platforms, data is the lifeblood. Let me show you what challenges you might encounter when integrating data streams from financial institutions and taking control of the process.

How to Avoid Overspending on Amazon Cloud

How to Avoid Overspending on Amazon Cloud

In this modern world of digital transformation—especially in fintech, where financial institutions are forced to adopt cloud for the sake of saving competitiveness with start-ups—overspending cloud budgets seem shocking. However,…

Problems that Waylay FinTech Platform Integration With Salesforce

Problems that Waylay FinTech Platform Integration With Salesforce

CRMs enable FinTechs to systematize client information and allow clients to apply proper communication strategies. Salesforce is one of the most popular options out there. In the article, we go…

When Salesforce Integration Becomes a Problem

When Salesforce Integration Becomes a Problem

B2B WealthTech software providers integrate with Salesforce to make their own platforms more attractive for existing and potential users. In today's article, we look through pitfalls in the Salesforce integration…

Pitfalls of Integrating WealthTech Platforms

Pitfalls of Integrating WealthTech Platforms

The industry requires all FinTech players to be flexible, integrable, and snap-in. In this article, we pay attention to integrating platforms through API because it’s the most widespread type among…

Databases Used by Robo-Advisors (Infographics)

Databases Used by Robo-Advisors (Infographics)

Robo-advisors’ functioning is based on algorithms that allow platforms to suggest investment strategies based on large amounts of data which is received from different sources. Thus, robo-advisors use various data…

Technologies Used by Robo-Advisors for Frontends (Infographics)

Technologies Used by Robo-Advisors for Frontends (Infographics)

To create frontends for robo-advisors, generally, frameworks, such as AngularJS and React, are used. Unlike most frontend frameworks, PHP performs server-side operations in addition to the client-side script. This is…

Technologies Used by Robo-Advisors for Backends (Infographics)

Technologies Used by Robo-Advisors for Backends (Infographics)

The efficiency of robo-advisors generally depends on algorithms selected. However, technology stack used for the software platforms has a significant impact on satisfying clients’ expectations such as the system performance,…