Data-as-a-Product (DaaP) Services

A Data-as-a-Product (DaaP) service provides end data that applications can consume directly. DaaP is built on a new data consumption model that separates the consumption of data from the raw data itself, thus enabling cloud computing for big-data applications.

  • Workflows for Tailoring the Data-Refine Process: We provide various data-refine algorithms (e.g., data cleaning, data summarization/compression) within cloud datacenters; these algorithms can only be executed by employing the abundant computing and storage resources available there. Users can select different algorithms for different purposes in a data-refine workflow.
  • Methods for Consuming Data Products: A data product is a higher-level data format than raw data, but with a smaller size. Existing data-processing methods must be adapted to this new input format rather than to raw data. For example, the parameters of a traditional data-mining algorithm take on different meanings and must be studied anew.
  • Big Data Sharing: The massive raw data, once settled in a datacenter, can be shared without any remote transmission, while data products can be transported from one site to another to serve various applications.
  • DaaP Applications: An application is implemented by data-refine modules in the cloud datacenter plus application-oriented data-processing modules on the user's server.
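As a concrete illustration of such a refine workflow, a user might chain a cleaning step and a summarization step into one pipeline. This is a minimal sketch only; the names `clean_nulls`, `summarize_stats`, and `RefineWorkflow` are hypothetical and not part of any published DaaP API:

```python
# Minimal sketch of a data-refine workflow: refine algorithms are
# plain functions composed in the order the user selects them.
# All module names here are illustrative assumptions.

def clean_nulls(records):
    """Data-cleaning step: drop records with missing field values."""
    return [r for r in records if all(v is not None for v in r.values())]

def summarize_stats(records):
    """Data-summary step: compress records into per-field averages."""
    if not records:
        return {}
    keys = records[0].keys()
    return {k: sum(r[k] for r in records) / len(records) for k in keys}

class RefineWorkflow:
    """Chains user-selected refine algorithms into one pipeline."""
    def __init__(self, *steps):
        self.steps = steps

    def run(self, raw_data):
        result = raw_data
        for step in self.steps:
            result = step(result)
        return result  # the data product handed to consumers

raw = [{"temp": 20.0, "load": 0.5},
       {"temp": None, "load": 0.7},
       {"temp": 22.0, "load": 0.9}]
workflow = RefineWorkflow(clean_nulls, summarize_stats)
product = workflow.run(raw)
```

The data product (`product`) is far smaller than the raw records, which is what makes it cheap to ship to a consuming application.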
The DaaP data consumption model resolves the massive-data problem and offers several advantages.

First, different applications use the massive raw data from different aspects, so no application needs the whole original dataset. The DaaP interface provides each application with its own data products. Data products are more portable than massive raw data and can therefore flexibly support remote usage.

Second, data-refine modules are designed as basic modules shared on the cloud datacenters, so applications can focus on designing their own application-oriented data-processing modules that consume data products for their purposes.
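This split between shared refine modules and per-application consumers might be sketched as below; the registry mechanism and all function names are assumptions for illustration, not the service's actual interface:

```python
# Sketch of the module split: shared refine modules live in a registry
# on the datacenter side; each application supplies only its own small
# consumer of the resulting data product. Names are illustrative.

SHARED_REFINE_MODULES = {}  # datacenter-side, reused by all applications

def register(name):
    """Publish a refine module so any application can select it."""
    def decorator(fn):
        SHARED_REFINE_MODULES[name] = fn
        return fn
    return decorator

@register("dedupe")
def dedupe(records):
    """A shared refine module: remove duplicate records."""
    seen, out = set(), []
    for r in records:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out

def refine(raw_data, selected):
    """Datacenter side: apply the modules the application selected."""
    for name in selected:
        raw_data = SHARED_REFINE_MODULES[name](raw_data)
    return raw_data

# Application side: a small, purpose-specific consumer of the product.
def count_records(product):
    return len(product)

raw = [{"id": 1}, {"id": 1}, {"id": 2}]
product = refine(raw, ["dedupe"])
n = count_records(product)
```

Only `product` crosses the network boundary; the raw records and the refine logic stay in the datacenter.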

Finally, the best algorithms can be reused and shared over the Internet, while the small size of data products reduces communication costs and promotes the growth of workflows and distributed computing.