DataPipeline Announces 21 Million Yuan Series A Round, FreeS Fund Continues to Follow On | FreeS Fund Investment News

FreeS Fund · April 10, 2018

Break down data silos.

DataPipeline recently announced a RMB 21 million Series A round led by Matrix Partners China, with FreeS Fund, the investor from its angel round, following on.

Since its founding two years ago, DataPipeline has focused on providing customers with a one-stop platform and solution for data and application integration. It has already served several large enterprise clients, including Fortune 500 companies, across industries such as retail, manufacturing, banking, energy, and internet. Additionally, DataPipeline has established strategic partnerships with dozens of upstream and downstream industry partners.

What the Investors Say

Yizhou Zhu, Vice President, FreeS Fund

Email: zyz@freesvc.com

We have observed that digitalization and intelligent technologies are driving development and innovation across vertical industries such as retail and advanced manufacturing. Traditional data integration services fall short in architectural flexibility and in integrating multiple heterogeneous data sources, and they fail to meet enterprise demands. DataPipeline is focused on this space, is growing rapidly, and has the potential to become a leading new-generation data integration service provider.

DataPipeline has been dedicated to providing customers with a one-stop platform and solution for data and application integration. It helps enterprises connect structured, semi-structured, and unstructured data in the cloud, including cloud databases, microservices clusters, SaaS, and industrial applications. This bridges internal data silos and lets customers drive business and operational decisions more precisely through data analytics.

According to Forrester research, the global integration market reached $32 billion in 2017. Competition among vendors outside China is intensifying. MuleSoft went public in 2017, and in March 2018 Salesforce announced it would acquire the company at a premium, for about $6.5 billion. SnapLogic's CEO also signaled IPO intentions in 2017.

If we picture communication between enterprises as a web, each enterprise is a node on it. In the China market, integration projects today mainly address a single-point problem: connecting all the data silos inside one enterprise. Enterprise customers are therefore eager to find an integration product that is born for the cloud, more agile in connectivity, more reliable in performance, and stronger in real-time capability.

Unlike traditional data integration solutions, DataPipeline has conducted thorough research and optimization from the outset in product architecture, solution design, and user experience, making its products and solutions better aligned with the current needs of Chinese enterprise customers.

First is large data volume. The explosive year-over-year growth of enterprise big data poses severe challenges to traditional integration systems. Traditional ETL tools often fail high-concurrency performance tests or lack scalability, do not natively support distributed architectures, and cannot offer both real-time and batch processing. DataPipeline's product architecture was designed from the start for synchronizing very large data volumes, with concurrency and scalability orders of magnitude beyond traditional ETL tools. It can reliably transmit thousands of tables in parallel and hundreds of gigabytes of incremental data per day for a customer, with cumulative data transmitted exceeding tens of terabytes.

Second is real-time capability. As the variety of heterogeneous data sources and destinations grows, using traditional ETL tools or writing custom scripts brings high complexity and maintenance costs. A data flow typically goes through model design, coding, and testing before deployment, and the resulting long cycles easily block downstream data application development. DataPipeline supports automated data exchange across heterogeneous data sources and destinations, currently covering more than 20 mainstream sources and destinations. By parsing database replication logs to capture changes in both data and data definitions, it makes data synchronization tasks real-time and self-adaptive.
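The log-based approach described above can be illustrated with a minimal sketch. It is not DataPipeline's implementation: the event format, the `ReplicaTable` class, and the `replay` helper are all hypothetical, and a real replication log would need a dedicated parser. The sketch only shows the idea that replaying an ordered stream of data changes and data-definition changes keeps a destination in step with its source.

```python
from typing import Any, Dict, List

# Hypothetical change-event format, assumed for illustration only.
# Real replication logs are database-specific and need a dedicated parser.
Event = Dict[str, Any]


class ReplicaTable:
    """In-memory stand-in for a destination table keyed by primary key."""

    def __init__(self, columns: List[str]) -> None:
        self.columns = list(columns)
        self.rows: Dict[Any, Dict[str, Any]] = {}

    def apply(self, event: Event) -> None:
        op = event["op"]
        if op == "add_column":
            # Data-definition change: extend the schema and backfill with None.
            name = event["column"]
            if name not in self.columns:
                self.columns.append(name)
                for row in self.rows.values():
                    row[name] = None
        elif op in ("insert", "update"):
            # Upsert: start from the existing row (if any) and overlay new values.
            key = event["key"]
            row = self.rows.get(key, {c: None for c in self.columns})
            row.update(event["values"])
            self.rows[key] = row
        elif op == "delete":
            self.rows.pop(event["key"], None)
        else:
            raise ValueError(f"unknown operation: {op}")


def replay(table: ReplicaTable, events: List[Event]) -> ReplicaTable:
    """Apply events in log order so the replica converges on the source state."""
    for event in events:
        table.apply(event)
    return table


log = [
    {"op": "insert", "key": 1, "values": {"id": 1, "name": "Ann"}},
    {"op": "insert", "key": 2, "values": {"id": 2, "name": "Bob"}},
    {"op": "add_column", "column": "email"},
    {"op": "update", "key": 1, "values": {"email": "ann@example.com"}},
    {"op": "delete", "key": 2},
]
table = replay(ReplicaTable(["id", "name"]), log)
```

Because the schema change travels in the same ordered stream as the row changes, the destination adapts to the new column without a manual redeployment, which is the sense in which such a task is self-adaptive.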

Third is data quality. With traditional integration solutions, enterprise customers often cannot check data quality promptly once synchronization completes. They must spend considerable time tracing upstream data problems back from downstream data applications, and they lack data quality alerts and matching remediation measures. DataPipeline provides end-to-end data quality monitoring, including data status monitoring, alert queue management, and multi-dimensional data quality detection that requires no manually predefined rules, so customers need not worry about frequent error states degrading data quality.
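One way to picture quality detection without hand-written rules is a check that learns its baseline from history. The sketch below is an assumption for illustration, not DataPipeline's method: it flags a synchronization batch whose row count deviates sharply from recent batches, using the modified z-score based on the median absolute deviation. The function name `is_anomalous`, the threshold of 3.5, and the sample numbers are all hypothetical.

```python
from statistics import median
from typing import List


def is_anomalous(history: List[float], value: float, threshold: float = 3.5) -> bool:
    """Flag `value` if its modified z-score against `history` exceeds `threshold`."""
    if len(history) < 3:
        return False  # too little history to judge
    med = median(history)
    # Median absolute deviation: a spread estimate that resists outliers.
    mad = median([abs(x - med) for x in history])
    if mad == 0:
        # All historical values agree; any different value stands out.
        return value != med
    score = 0.6745 * abs(value - med) / mad
    return score > threshold


# Hypothetical row counts from the last five synchronization batches.
recent_row_counts = [1000, 1020, 980, 1010, 990]
normal_batch = is_anomalous(recent_row_counts, 1005)
suspicious_batch = is_anomalous(recent_row_counts, 200)
```

The same check could be applied per dimension, for example to null ratios or value ranges, with no threshold written by hand for each table.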

Fourth is agility and ease of use. The fixed transformations in traditional ETL solutions have become a constraint rather than an advantage. ETL jobs are difficult to maintain and reuse, which greatly reduces flexibility. Yet enterprise demands for business and data applications are changing rapidly, with data usage evolving from fixed data warehouse modeling toward exploratory data applications and AI applications. DataPipeline provides a moderate level of data cleansing, using built-in cleansing functions and cleansing APIs to build a flexible framework that lets customers process, analyze, and visualize data with more agility and freedom.

Currently, DataPipeline mainly provides data synchronization, data cleansing, data task management, error queue management, operations management, and user management functions. To lower the barrier for engineers, DataPipeline offers a visual configuration interface that lets users create a data synchronization task in five minutes without writing any code.

In terms of deployment models, DataPipeline supports hybrid cloud, multi-cloud, and on-premise private deployment. To make private deployment more cost-effective and efficient, DataPipeline uses container technology. To enhance security in non-private environments, DataPipeline applies encryption in hybrid cloud and multi-cloud deployments. For pricing, DataPipeline charges an annual fee based on the number of servers the customer's system occupies.

For enterprise IT managers and engineers, DataPipeline can significantly improve work efficiency. On one hand, it frees engineers from labor-intensive tasks, letting them focus on extracting value from data rather than getting bogged down in data connectivity issues, and devote more energy to meeting business demands. On the other hand, DataPipeline helps IT managers track data task status, error queues, data resource mining, and data assets precisely and in real time.

In 2017, DataPipeline assembled professionals from well-known companies and institutions including Google, Yelp, Amazon, Oracle, the Chinese Academy of Sciences, Huawei, Informatica, and Talend. They bring years of deep experience in R&D, product, and project management in the data industry, along with industry influence and a profound understanding of enterprise customer needs and pain points. DataPipeline also has R&D centers in Beijing and Nanjing, which lets it respond more quickly to customer needs nationwide.

With the Series A funding, DataPipeline will focus on strengthening its product, R&D, and marketing teams. It aims to raise customer satisfaction through greater depth and breadth in product R&D, build a more mature and efficient pre-sales and sales team, keep stepping up its market expansion, and deepen cooperation with upstream and downstream industry partners.