Welcome to the CloverDX Walk-through
In this guide we will learn how to build, deploy, run, and monitor a complete automated data processing pipeline using CloverDX. Note: this guide uses the terms “CloverETL” and “CloverDX” interchangeably; both refer to (different versions of) the same product.
This trail demonstrates:
- Reading a simple CSV file (Lesson 1)
- Filtering and removing unwanted records (Lesson 2)
- Loading records into a Microsoft SQL Server relational database (Lesson 3)
- Transforming different input data formats into a single consistent output format (Lesson 4)
- Detecting the arrival of new files to be processed (workflow, Lesson 5)
- Setting up additional workflow steps (archiving input files and maintaining a log file, Lesson 6)
Automated Data Pipeline
An automated data pipeline is about removing the laborious manual processes involved at every stage of data processing. Data integration is not only about connecting data sources with data targets; it is also about automating the process: triggering runs based on events, scheduling, monitoring, troubleshooting, and running workflows that handle errors, cleanup, file handling, and so on.
Reading Data From a File
First of all, let's get familiar with the CloverDX Designer environment and read a single file.
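For illustration only, the kind of delimited input this first lesson reads might look like the snippet below; the field names and values here are hypothetical, not the trail's actual sample data.

```
date;amount;account;type
01/03/2015;125.50;CZ-4455;debit
02/03/2015;80.00;CZ-4455;credit
```

In the Designer, a file like this is read by a flat-file reader component (UniversalDataReader in CloverETL, renamed FlatFileReader in recent CloverDX versions) configured with metadata describing the delimiter and field types.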
Filtering and Writing Data
As a second step, we will filter the input data and write it to a file.
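To give a flavour of what that looks like: the filtering step is driven by a boolean CTL2 expression in a Filter (ExtFilter) component. A minimal sketch, reusing the hypothetical fields above and assuming amount is typed as a decimal in the metadata:

```
// Keep debit transactions with a positive amount;
// records failing the test can be routed to a second (rejected) output port.
$in.0.type == "debit" && $in.0.amount > 0
```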
Cleaning Up Data
Here, we will unify different input formats, start reading multiple input files, and keep a log of the records we choose to ignore.
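Unification like this is typically done in a Reformat component with a CTL2 transform. Below is a minimal sketch, with hypothetical field names and an assumed input date pattern:

```
//#CTL2
// Hypothetical cleansing transform: trim stray whitespace,
// normalize the type field, and parse the string date into a date field.
function integer transform() {
    $out.0.account = trim($in.0.account);
    $out.0.type    = lowerCase($in.0.type);
    $out.0.date    = str2date($in.0.date, "dd/MM/yyyy"); // assumed input pattern
    $out.0.amount  = $in.0.amount;
    return ALL; // send the record to all connected output ports
}
```

Records that fail to parse are the ones we would divert and capture in the log of ignored records.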
Writing to a Relational Database
Then, we’ll write the transactions into a database, mapping field names and performing type conversions as necessary.
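The field-name mapping and type conversions can again be sketched in CTL2, for example in a Reformat placed just before the database writer. The target column names below are assumptions, and amount is assumed to still arrive as a string:

```
//#CTL2
// Hypothetical mapping to the target table's columns.
function integer transform() {
    $out.0.AccountNumber   = $in.0.account;
    $out.0.TransactionDate = $in.0.date;                // maps to a DATETIME column
    $out.0.Amount          = str2decimal($in.0.amount); // string -> DECIMAL
    return ALL;
}
```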
Deploying to CloverDX Server
Now, we’ll deploy our project into a production environment and schedule its execution, moving it from CloverDX Designer to CloverDX Server. This is the first step in automating our data pipeline.
Orchestration with Jobflows
Finally, we’ll fully automate the execution of the data pipeline in CloverDX Server. We will watch for the arrival of new input files, process them using our graph, back up the input files and log the entire process.