Data as a Product (DaaP) Services
A Data-as-a-Product (DaaP) service provides end data that applications can consume directly.
DaaP is built on a new data consumption model that separates the consumption of data from the raw data itself,
and thus enables cloud computing for big data applications.
-
Workflows for Tailoring the Data Refinement Process
We provide various data refinement algorithms (e.g., data cleaning, data summarization/compression) within cloud datacenters;
these algorithms can only be run with abundant computing and storage resources. Users can select
different algorithms for different purposes in a data refinement workflow.
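
As a minimal sketch of such a workflow, the Python below chains user-selected refinement steps over a small in-memory dataset. The step names (`drop_incomplete`, `count_by`), the `run_workflow` helper, and the sample records are hypothetical and chosen only for illustration; a real datacenter deployment would run far heavier algorithms over far larger data.

```python
from collections import Counter
from typing import Callable, Iterable, List

# A refinement step maps a list of records to a (usually smaller) list.
RefineStep = Callable[[List[dict]], List[dict]]


def drop_incomplete(records: List[dict]) -> List[dict]:
    """Data cleaning: discard records that contain a missing (None) value."""
    return [r for r in records if all(v is not None for v in r.values())]


def count_by(field: str) -> RefineStep:
    """Data summary: replace the records with one count per distinct value."""
    def step(records: List[dict]) -> List[dict]:
        counts = Counter(r[field] for r in records)
        return [{field: value, "count": n} for value, n in sorted(counts.items())]
    return step


def run_workflow(records: List[dict], steps: Iterable[RefineStep]) -> List[dict]:
    """Apply the user-selected steps in order and return the data product."""
    for step in steps:
        records = step(records)
    return records


# Hypothetical raw data held in the datacenter.
raw = [
    {"city": "Oslo", "temp": 3.5},
    {"city": "Oslo", "temp": None},
    {"city": "Lima", "temp": 19.0},
    {"city": "Oslo", "temp": 4.1},
]

# The user picks a cleaning step followed by a summary step.
product = run_workflow(raw, [drop_incomplete, count_by("city")])
print(product)
```

A different purpose would select a different list of steps while reusing the same `run_workflow` function.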
-
Methods for Consuming Data Products
A data product is a higher-level data format than raw data, but smaller in size.
Existing data processing methods should be adapted to take this new format, rather than raw data, as input.
For example, the parameters of a traditional data mining algorithm may take on different meanings
and should be studied again.
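
To illustrate how a parameter can change meaning, the sketch below finds frequent items under a minimum-support threshold. It assumes a hypothetical data product that stores one count per distinct item; the function names and the sample data are illustrative only. On raw data the threshold counts records, whereas on the data product it must be compared with the stored count.

```python
from collections import Counter
from typing import Dict, List, Set


def frequent_from_raw(records: List[str], min_support: int) -> Set[str]:
    """On raw data, min_support is a number of individual raw records."""
    counts = Counter(records)
    return {item for item, n in counts.items() if n >= min_support}


def frequent_from_product(product: Dict[str, int], min_support: int) -> Set[str]:
    """On a summarized product, min_support is compared with each stored count."""
    return {item for item, weight in product.items() if weight >= min_support}


raw = ["a", "b", "a", "c", "a", "b"]
# Hypothetical data product: one count per distinct item.
product = dict(Counter(raw))

# Adapted method: same answer as on the raw data.
adapted = frequent_from_product(product, min_support=2)

# Unadapted method: treats each product entry as one raw record,
# so every item appears exactly once and nothing reaches the threshold.
naive = frequent_from_raw(list(product), min_support=2)

print(adapted, naive)
```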
-
Big Data Sharing
Once stored, the massive raw data can be shared in place without any remote transmission,
while data products can be transferred from one site to another to serve various
applications.
-
DaaP Applications
An application is implemented by data refinement modules in the cloud datacenter and
application-oriented data processing modules on the users' servers.
The DaaP data consumption model can address the big data problem and offers several advantages.
First, different applications use the massive raw data from different aspects,
so none of them needs the whole original data. The DaaP interface simply provides different applications
with different data products. Data products are more portable than massive raw data and
can therefore flexibly satisfy remote use.
Second, data refinement modules are designed as basic modules shared in the cloud datacenters,
so applications can focus on designing application-oriented data processing modules that
consume data products for their own purposes.
Finally, the best algorithms can be reused and shared on the Internet, while small data products
reduce communication costs and promote the growth of workflows and distributed computing.
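
The division of labor described above can be sketched as follows. This is an illustration under stated assumptions, not the actual DaaP interface: the `DaaPInterface` class, its `register` and `request` methods, the two refinement modules, and the sensor readings are all hypothetical. The raw data stays on the datacenter side, shared modules are registered once, and each application requests only the small product it needs.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

Reading = Tuple[str, float]  # (sensor id, value)
Module = Callable[[List[Reading]], Dict[str, float]]


class DaaPInterface:
    """Hypothetical datacenter-side interface; raw data never leaves it."""

    def __init__(self, raw: List[Reading]) -> None:
        self._raw = raw
        self._modules: Dict[str, Module] = {}

    def register(self, name: str, module: Module) -> None:
        """Share a refinement module so that any application can reuse it."""
        self._modules[name] = module

    def request(self, name: str) -> Dict[str, float]:
        """Run the named module next to the raw data and return its product."""
        return self._modules[name](self._raw)


def mean_per_sensor(raw: List[Reading]) -> Dict[str, float]:
    """Refinement module: average value per sensor."""
    grouped: Dict[str, List[float]] = {}
    for sensor, value in raw:
        grouped.setdefault(sensor, []).append(value)
    return {sensor: mean(values) for sensor, values in grouped.items()}


def peak_per_sensor(raw: List[Reading]) -> Dict[str, float]:
    """Refinement module: maximum value per sensor."""
    peaks: Dict[str, float] = {}
    for sensor, value in raw:
        peaks[sensor] = max(value, peaks.get(sensor, value))
    return peaks


raw = [("s1", 1.0), ("s1", 3.0), ("s2", 10.0), ("s2", 20.0), ("s2", 30.0)]
daap = DaaPInterface(raw)
daap.register("mean", mean_per_sensor)
daap.register("peak", peak_per_sensor)

# Two applications consume different products derived from the same raw data.
means = daap.request("mean")
peaks = daap.request("peak")
print(means, peaks)
```

Each returned product has one entry per sensor rather than one per reading, which is the property that makes it cheaper to send to a remote application than the raw data.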