
Pasteur: Simple & Powerful No-Code Data Preparation

Have you ever tried to modify data? This task looks rather simple, but it contains a lot of error-prone, manual and repetitive work. Let's fix that.
Sinchan Banerjee
Jul 30

Have you ever tried to modify data and found that it took more time than expected? The task looks rather simple, but it involves a lot of error-prone, manual, and repetitive work.

Data preparation is often the most time-consuming part of an analytical project. You can do it manually, but that is slow and error-prone. Some teams try to automate it with Alteryx, Excel macros, or Python, but these approaches take a long time to set up and often come with a steep learning curve. Even a simple task like extracting the day of the week can take far longer than expected.

The New York Times reports that “Data scientists, according to interviews and expert estimates, spend from 50 percent to 80 percent of their time mired in this more mundane labor of collecting and preparing unruly digital data, before it can be explored for useful nuggets.” Data has only grown more pervasive since then, and the data-wrangling problem hasn’t gotten any better.

What types of data would you like to wrangle?

It could be the Excel spreadsheet that powers your weekly sales meeting if you’re managing a sales team.  Maybe it’s that enormous Google sheet filled with feedback from the survey you’ve been running over the past month. If you’re a data scientist, it’s the feature engineering needed for machine learning.

Whether you are doing this in Excel, Google Sheets or Python, data prep is a laborious task that needs to be faster, simpler, and more powerful.

Repetitive tasks that follow clear rules are a perfect fit for automation, and we wanted to provide the easiest and most convenient way to do this. At Intersect Labs, we set out to build the world’s most advanced and easy-to-use data prep tool. We designed a flow that lets you transform data 11x faster, no matter how simple or complex the transformation is.

Introducing Pasteur: The No-Code Data Prep Tool You Deserve

Now you can create a sequence of modifications called a pipeline. Simply pick the “Transform” block you want from a gallery of transforms and add it to your data pipeline. Want to multiply two columns? Add the “Multiply Columns” block, done. Want to group by one column and sum the other columns’ values? Add the “Aggregation” transform block, done.

With over 50 data transform blocks available, you can build your complex data pipeline in a point-and-click fashion.

Let’s try a real data example.

Say we want to get insights from the data that we have on our customers' individual orders for 2020. We would like to quickly produce an overview of the performance for each month. So we’d like to:

  1. Group the orders by month;
  2. Get the number of sales and revenue per month.

Since this is a recurring task for us, we want to set up a pipeline that we can reuse in the future. Once the pipeline is set up, we can simply upload the next dataset and out come the results.

Loading your data in Pasteur

1) Set up an account on Pasteur if you haven't already

2) Create a new project

  • Go to Projects
  • Click on New Project 
  • Click on Select Data Set
  • Go to Upload CSV and select your CSV
  • Confirm columns and then press Confirm

You can also connect your data with Google Sheets, MongoDB or SQL databases.

Let's look at the data

We need the Order Date, Quantity, and Item Cost columns to calculate the monthly sales numbers.
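For readers who prefer to see the data in code, here is a rough pandas sketch of what loading such a CSV looks like. The column names come from the walkthrough; the sample rows and values are hypothetical, invented for illustration (this is not Pasteur’s actual implementation).

```python
import io

import pandas as pd

# Hypothetical sample of the orders CSV described above;
# dates are in European (day-first) format.
csv_data = """Order Date,Quantity,Item Cost
03/01/2020,2,9.99
15/02/2020,1,24.50
28/02/2020,4,5.25
"""

orders = pd.read_csv(io.StringIO(csv_data))
print(orders.columns.tolist())  # ['Order Date', 'Quantity', 'Item Cost']
```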

Setting up the pipeline

Now we can start appending transforms to the pipeline. First, we want to multiply two columns: “Quantity” and “Item Cost”. We add the ‘Multiply columns’ transform, select which columns to multiply, and give the new column a name, “Total Cost”.

Add the first transform block, ‘Multiply columns’, to our pipeline, which produces a new column.

If we click on ‘Apply Transform’ and set the Data Preview to ‘After’, we can see that the transform has added a new column, ‘Total Cost’, to the data.
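In code, this step amounts to an element-wise product of the two columns. A minimal pandas sketch, using hypothetical values (not Pasteur’s internals):

```python
import pandas as pd

# Hypothetical order data matching the columns in the walkthrough.
orders = pd.DataFrame({
    "Quantity": [2, 1, 4],
    "Item Cost": [9.99, 24.50, 5.25],
})

# Equivalent of the 'Multiply columns' transform: multiply Quantity
# by Item Cost into a new 'Total Cost' column.
orders["Total Cost"] = orders["Quantity"] * orders["Item Cost"]
print(orders["Total Cost"].tolist())  # [19.98, 24.5, 21.0]
```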

Now we would like to group the data by month. We can use the Order Date column (which is in European, day-first format) for this, and luckily, Intersect Labs can handle this without breaking a sweat. Add the ‘Extract from Date/Time column’ transform to the pipeline.

The Date/Time transform block automatically creates columns for the day, month, and year from the single date column. We can even extract the name of the day or month.

We just added 3 columns for the year, month and day.
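The same extraction can be sketched in pandas: parse the day-first dates, then pull out the parts. The sample dates below are hypothetical, and this is only an illustrative equivalent of the transform block:

```python
import pandas as pd

# Hypothetical European-format (day-first) order dates.
orders = pd.DataFrame({"Order Date": ["03/01/2020", "15/02/2020", "28/02/2020"]})

# Parse with dayfirst=True so 03/01/2020 means January 3rd, then
# extract year, month, day, and the month name, mirroring the
# 'Extract from Date/Time column' transform.
parsed = pd.to_datetime(orders["Order Date"], dayfirst=True)
orders["Year"] = parsed.dt.year
orders["Month"] = parsed.dt.month
orders["Day"] = parsed.dt.day
orders["Month Name"] = parsed.dt.month_name()

print(orders["Month"].tolist())       # [1, 2, 2]
print(orders["Month Name"].tolist())  # ['January', 'February', 'February']
```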

Lastly, let's group the orders by the month column.
This can be done with the “Aggregate Data” transform. First, we select which column to group our orders by; in our case, this is the “Month” column. Then, we select which column to aggregate, and how. We want to add up all the “Total Cost” values, so we select sum. There are plenty of other aggregations to choose from, such as the mean, standard deviation (std), maximum, and minimum. The mean gives us the average order size, which might be handy to include, so we select that one too.

Great! We now have our Total Sales per month and Average Sales per month from our sales data spreadsheet. Let’s export the dataset, make a nice graph, get some coffee, and make our manager happy.

Download the transformed spreadsheet.

Let’s open and plot the data

That’s it. In no time, we have set up a data transformation pipeline using Intersect Labs that can quickly surface insights for your sales meeting. The best part is that we can reuse this pipeline every time we have a new list of orders: simply apply it to a newly uploaded dataset and get new insights in under 5 minutes. Of course, this was just a simple example; there are over 50 transform blocks available for any data modification you need.
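To underline the reusability point, the whole three-step pipeline could be sketched as a single reusable pandas function that you apply to each new batch of orders. The function name and sample data are hypothetical; this is an illustrative sketch, not how Pasteur works internally:

```python
import pandas as pd

def monthly_sales_summary(orders: pd.DataFrame) -> pd.DataFrame:
    """Rough pandas equivalent of the pipeline built above:
    multiply columns, extract the month, then aggregate."""
    df = orders.copy()
    df["Total Cost"] = df["Quantity"] * df["Item Cost"]
    df["Month"] = pd.to_datetime(df["Order Date"], dayfirst=True).dt.month
    return df.groupby("Month")["Total Cost"].agg(["sum", "mean"]).reset_index()

# Reapply the same pipeline to a new (hypothetical) batch of orders.
new_orders = pd.DataFrame({
    "Order Date": ["05/03/2020", "20/03/2020"],
    "Quantity": [3, 2],
    "Item Cost": [10.00, 7.50],
})
result = monthly_sales_summary(new_orders)
print(result)
```

Like the Pasteur pipeline, the function encodes the steps once, so rerunning it on next month’s orders is a single call.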

Try out the Intersect Labs pipeline yourself using your own dataset.