A Nocoder’s Guide to Formatting Scraped Data in Browser Automation
Data storage and manipulation are essential for browser automation. Whether you’re auto-filling forms or extracting information from sites, it’s safe to assume you’ll be moving data from one place to another.
Proper data formatting is crucial to all types of browser automations, but particularly for web scraping. To ensure the quality of your final result, it’s key to know when and where to process scraped data. That can be tricky, because nocode workflows offer so many ways to chain tools together.
The truth is: there’s no one right way to do it. But we’re gathered here today to talk about why it’s so important and what you should consider when formatting data in your own workflows.
Why is Data Formatting Important in Browser Automation?
Formatting can often become an afterthought in automation. After all, isn’t it enough for the final result to be presentable?
Even use cases involving structured data scraping—which centers on accurate extraction and proper processing for use—can overlook formatting. But there are several reasons to give formatting the consideration it deserves, especially if you’re working with nocode tools. Let’s go over a few:
- Data Accuracy: Consistent formatting helps keep the information collected through browser automation accurate and reliable. Incorrectly formatted data can lead to misleading results.
- Data Analysis: Well-formatted data is easier to analyze and interpret. When extracted data is standardized, it becomes simpler to identify patterns, trends, and correlations.
- Data Integration: Clean data is more compatible with other systems and tools. Whether you need to import the data into a spreadsheet, a database, or another software application, properly formatted data can be integrated without tedious manual adjustments.
- Data Presentation: Formatting data correctly makes it more presentable and user-friendly. Clean and consistent formatting enhances readability, making it easier for others to understand and use the collected data.
By prioritizing proper data formatting in browser automation, you can improve the quality, reliability, and usefulness of what you collect.
How to Format Scraped Data with Nocode Tools
Automating a process without code often involves integrating the capabilities of several different apps. As data moves from one step to the next, it can change forms to suit your needs.
Scraped data usually needs tweaking before use. Depending on how the source site is structured, extracted information can be excessive, incomplete, or poorly organized.
When a data scraping workflow includes multiple tools, carefully laying out those tools and choosing when to process data results in cleaner, more accurate output. Most use cases are unique, so you will need to adapt the tips and guiding principles in this article to your own workflow.
Formatting Data with Spreadsheets and Databases
Spreadsheets and databases are built with data manipulation in mind. That means the possibilities for tweaking, transforming, and organizing your information are nearly endless.
Some nocode tools you may consider for this are:
- Airtable
- Baserow
- Google Sheets
- Microsoft Excel
- Notion
The biggest downside to formatting data with spreadsheets or databases is that the learning curve is often quite steep. Complex formulas can certainly be a language of their own! Syntax also varies from app to app, making it difficult to switch from one to another.
Even so, spreadsheets and databases are the most powerful nocode solutions to data transformation by far. When you’re fairly proficient with using one, you’ll find you can transform your data in a myriad of ways.
Related reading: Get a better look at formulas in action by reading 10 Airtable Formulas All No-coders Should Know!
Let’s take a look at some examples of formatting data using a spreadsheet or database.
Example 1. Extracting Job Title from String in Airtable
In this example, we will use data extracted with Browserbear and added to Airtable without any changes. Our goal is to separate the first part of the text from the second part following the hyphen.
Many databases (including Airtable) don’t have a split command that automatically isolates parts of a string based on a separator. This means you’ll need to work around it using formulas.
We’ve used a regular expression (regex) in this example, but there is more than one solution for each use case.
Using the formula REGEX_EXTRACT({Job Title},"^([^-]*) - (.*)"), we were able to extract the job title (Full Stack Engineer) from the string (Full Stack Engineer - Remote, Europe).
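For readers curious about what the pattern is doing, here is a minimal Python sketch that applies the same regular expression outside Airtable. The function name split_job_title and the fallback for strings without a separator are our own assumptions for illustration; they are not part of Airtable or Browserbear.

```python
import re

# Same pattern as the Airtable formula above:
# group 1 = text before " - ", group 2 = text after it.
PATTERN = re.compile(r"^([^-]*) - (.*)")

def split_job_title(raw: str) -> tuple[str, str]:
    """Return (job title, location) from a 'Title - Location' string."""
    match = PATTERN.match(raw)
    if match is None:
        # Assumption: with no usable separator, keep the whole string as the title.
        return raw.strip(), ""
    return match.group(1).strip(), match.group(2).strip()

title, location = split_job_title("Full Stack Engineer - Remote, Europe")
print(title)     # Full Stack Engineer
print(location)  # Remote, Europe
```

One thing the sketch makes visible: because the first group excludes hyphens, a title that contains one (such as Front-End Developer) will not match this pattern, so it is worth checking your scraped titles before relying on it.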
Example 2. Changing Capitalization Format in Airtable
Using the same set of scraped data, let’s use Airtable to make a string of text entirely uppercase.
The app has a formula specifically for this modification: UPPER.
That means the formula is relatively simple: UPPER({Company}). Using this, we are able to produce a string in all uppercase (STACKER) from a string in title case (Stacker).
Formatting Data with Zapier
Zapier allows you to integrate applications to build larger, more comprehensive workflows that might not be possible with a single app. But because not all apps process information in the same way, adjustments often need to be made before data is sent to another program.
Zapier supports four different types of data manipulation:
- Numbers (do math operations, reformat currencies, and more)
- Text (find & replace, capitalize, remove HTML, and more)
- Date / Time (change formatting or add/subtract time and more)
- Utilities (perform actions like “choose value from list” or “look up in table” and more)
Not all use cases are suitable for Zapier formatting because data transformation types are limited. The learning curve can also be steep, and too many actions can make your zap long and cumbersome.
On the other hand, because transformation takes place within your zap, it fits neatly into the automations you already have. That makes it helpful for minor and uncomplicated adjustments. Custom JavaScript and Python code can be included for more control, should you need it.
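If you do reach for custom code, a single code step can combine several small tweaks that would otherwise each need their own formatter action. The Python sketch below is only an illustration under stated assumptions: the field names job_title and company are hypothetical, and the sample dictionary stands in for the values a zap would map into the step. As we understand it, Zapier's Python code steps hand mapped values to your code as a dictionary called input_data and read the result from output, but check Zapier's documentation before relying on that.

```python
def transform(input_data: dict) -> dict:
    """Clean two scraped fields in one pass (key names are hypothetical)."""
    raw_title = input_data.get("job_title", "")
    company = input_data.get("company", "")
    # Keep only the text before the first "(" and trim surrounding whitespace.
    title = raw_title.split("(", 1)[0].strip()
    return {"job_title": title, "company": company.upper()}

# Local stand-in for the values a zap would pass to the step.
sample = {
    "job_title": "Remote Full Stack Engineer (Remote, Europe)",
    "company": "Stacker",
}
output = transform(sample)
print(output)
# {'job_title': 'Remote Full Stack Engineer', 'company': 'STACKER'}
```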
Related reading: Want more ideas on manipulating data with Zapier? Read 5 Ways to Transform Your Automation Data with Zapier Formatter!
Let’s take a look at some examples of formatting data using Zapier.
Example 1. Splitting Text in Zapier
In this example, we continue using the dataset extracted with Browserbear, adding it to Zapier without any changes. Our goal is to separate the first part of the text from the second part enclosed in parentheses.
Zapier has a text transform action that splits strings based on an indicated separator. It then returns your preferred segment.
Using the string Remote Full Stack Engineer (Remote, Europe) as the input, a separator of (, and the first segment index, we produced the following output: Remote Full Stack Engineer.
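The separator-plus-segment idea can be sketched in a few lines of Python. This is not Zapier's implementation; split_segment is a hypothetical helper, and trimming the whitespace left behind by the split is our own addition.

```python
def split_segment(text: str, separator: str, index: int = 0) -> str:
    """Split text on separator and return one trimmed segment.

    index follows Python conventions: 0 is the first segment, -1 the last.
    """
    segments = text.split(separator)
    return segments[index].strip()

print(split_segment("Remote Full Stack Engineer (Remote, Europe)", "("))
# Remote Full Stack Engineer
```

Asking for a segment that does not exist raises an IndexError here, so a real workflow would want to decide what to do when the separator is missing from a scraped value.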
Example 2. Changing Capitalization Format in Zapier
Now, let’s use the same set of scraped data to make the string containing the company name entirely uppercase.
Zapier has a text transform action that capitalizes every character in the text. The input value, Stacker, only has to be mapped to the setup field.
Formatting Data with Browserbear
Browserbear automates repetitive browser tasks like web scraping, site testing, and more. Web scraping is an especially strong use case because you can set up a data extraction task, create a JSON feed, then set up scrubbing transformations to produce exactly the output you need.
Transformations you can apply to your data include:
- Append
- Find and Replace
- Find Email
- Find Phone Number
- Lowercase
- Prepend
- Split By
- Strip
- Titlecase
- Truncate
- Uppercase
The limited transformation options mean that Browserbear might not be suitable for advanced data scrubbing needs. There might also be a learning curve to using the integrated feed features. When you have a grasp on setting them up, though, they come together quickly, allowing you to easily route your data to other apps.
Since Browserbear’s data transformation feature allows you to produce output that’s already formatted the way you want, it reduces the amount of manipulation needed in other steps of your workflows. Even if your process includes spreadsheets and zaps, you won’t need to add excessive formulas and actions to these steps—your data is tweaked from the moment it’s extracted online.
Bear Tip 🐻: To scrub / clean your JSON data feed in Browserbear, start by creating a data feed. You can then use the field builder to set up one or more transformations to produce custom output!
Let’s take a look at some examples of formatting data using Browserbear.
Example 1. Splitting Text in Browserbear
In this example, we’re scraping data using Browserbear and transforming it immediately within the app, so it does not need to be transferred to any other programs. The goal remains the same: separate the first part of the text from the second enclosed in parentheses.
When you create a custom data feed from task output, you can apply transformations to the fields. The feed starts out showing the raw scraped data.
Now, we choose Job Title as the field that needs to be modified. Setting up a transformation is just a matter of selecting Split By as the transformation type, adding the separator (, and selecting the segment index First.
Adding the transformation will yield the custom output we want (Remote Full Stack Engineer), which can then be viewed on the data feed’s page.
Example 2. Changing Capitalization Format in Browserbear
We’ll use the same set of data to make the string containing the company name entirely uppercase.
The data feed’s page shows the raw scraped data, and clicking the transformations button corresponding to the Company field will lead us to the setup page. We’ll set up a transformation by selecting Uppercase as the transformation type.
Returning to the data feed’s page, we can see the output STACKER in the custom output field.
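Each example above applies a single transformation, but transformations can also be stacked so that one step's output feeds the next. The Python sketch below shows that idea in general terms. It is not Browserbear's API: split_first and apply_all are hypothetical helpers, and the three steps are plain Python stand-ins loosely modeled on the Split By, Strip, and Uppercase options.

```python
from typing import Callable, Iterable

# A transformation is any function that takes a string and returns a string.
Transform = Callable[[str], str]

def split_first(separator: str) -> Transform:
    """Build a step that keeps the text before the first separator."""
    return lambda text: text.split(separator, 1)[0]

def apply_all(value: str, steps: Iterable[Transform]) -> str:
    """Run each step in order, feeding its output to the next."""
    for step in steps:
        value = step(value)
    return value

steps = [split_first("("), str.strip, str.upper]
print(apply_all("Remote Full Stack Engineer (Remote, Europe)", steps))
# REMOTE FULL STACK ENGINEER
```

Order matters in a stack like this: splitting before trimming leaves a trailing space for the trim step to remove, while trimming first would leave that space in the final output.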
Choose the Best Formatting Tool for Your Needs
Is there one “right” answer to how to format data? We don’t think so.
Using a spreadsheet or database like Airtable for formatting is best for complex use cases where significant data transformation is required. It’s also fantastic if you already have a database in the workflow.
Zapier’s formatting features make it a great option if you just have a few small tweaks to make before data is sent to the next app. But if there are too many transformations needed, your zap will quickly become long and cumbersome.
Browserbear’s built-in data massaging features make it best for web scraping use cases where you need to transform standard output slightly before using it in your final product. You can stack multiple transformations on top of each other until the custom output is exactly what you need.
There are so many options for formatting data without using any code. You can choose to transform information using different apps at different steps of the process. It all depends on your needs and what will help you reach your goal most effectively.