How to Clean Scraped Data with Roborabbit

Scrubbing your scraped information improves accuracy, streamlines downstream processes, and produces a high-quality data set. Here's how you can clean your extracted data with Roborabbit.
by Julianne Youngberg

    Raw data often needs cleaning and formatting before it can be used effectively. Depending on where you source the information, your data might be excessive, incomplete, or poorly organized. That’s why when you’re extracting data online, transforming the output to better suit your needs is a typical part of processing.

    Cleaned and formatted data is important because it:

    • Improves accuracy
    • Prepares information for analysis
    • Integrates more seamlessly with other systems and tools
    • Looks more presentable and user-friendly

    When working with data extracted using Roborabbit, you may find that simple transformations can improve the quality of your output and prepare it for use. This guide will explain when and how you can use Roborabbit's built-in features to scrub your output feed.

    When to Clean Structured Data

    Scrubbing structured data can be largely automated at different stages of the process, whether it's within a database, through task automation, or at the point of extraction. Here's a simplified overview of when you might consider each option:

    • Database : When complex data processing is required
    • Task automation tool : When a few tweaks are needed for a specific automated task
    • Web scraping tool : When standard output needs to be transformed slightly before being stored or used in a final product

    Knowing when to format extracted data is a matter of understanding your use case and the impact the decision has on workflow efficiency. For example, performing too many transformations in a single task automation may be more costly and time-consuming than a nested database formula. Similarly, using a database transformation for a minor adjustment could require more storage space than handling the data at the point of extraction. The best decision depends on the volume and specifics of your data cleaning requirements.

    How to Use Roborabbit to Scrub Data

    To clean data with Roborabbit, you have to create a custom feed that transforms your output based on your specifications, then delivers it along with the rest of your task run results so you can route it to the storage or task process of your choice.

    Step 1. Navigate to Feeds

    Let’s start from a task with output that needs cleaning.

    From the task page, scroll down to Feeds, which is listed under the Integrations section. Click Settings.

    Screenshot of Roborabbit task page with Feeds outlined in red

    Step 2. Create a Custom Feed

    Now, you should be on the Feeds page, where you can manage one or more output feeds from your task data.

    Click Create a New Feed.

    Screenshot of Roborabbit Feeds create a new feed

    Step 3. Add Fields to Builder

    You should now be on a page where you can set up your feed’s custom transformations and view the output.

    To apply transformations, you first need to add the fields that will be modified to the builder. Do this by specifying each field and choosing between text or number types. You can also add a target name if desired. Then, click Add Field to load it into the builder.

    Screenshot of Roborabbit feeds page with field builder outlined in red

    Alternatively, you can click Add All Fields to auto-add all of your standard output components to the builder.

    Screenshot of Roborabbit feed setup with red arrow pointing to add all fields
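    If it helps to picture what the builder is doing, here's a rough Python sketch of picking fields, choosing a type, and optionally renaming them. It's only a mental model, not Roborabbit's implementation: `FieldSpec`, `build_record`, and the sample record are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class FieldSpec:
    """Hypothetical stand-in for one row in the feed builder."""
    source: str                   # field name in the raw task output
    kind: str = "text"            # "text" or "number"
    target: Optional[str] = None  # optional name to use in the custom feed


def build_record(raw: Dict[str, Any], specs: List[FieldSpec]) -> Dict[str, Any]:
    """Copy the chosen fields into a new record, coercing type and renaming."""
    record: Dict[str, Any] = {}
    for spec in specs:
        value = raw.get(spec.source)
        if value is not None:
            value = float(value) if spec.kind == "number" else str(value)
        record[spec.target or spec.source] = value
    return record


# Made-up raw output with one field we choose to leave out of the feed.
raw_item = {"title": "Blue Mug", "price": "12.50", "sku": "BM 001"}
specs = [FieldSpec("title"), FieldSpec("price", kind="number", target="price_usd")]
print(build_record(raw_item, specs))
```

    Fields you don't add (here, `sku`) simply don't appear in the resulting record.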

    Step 4. Apply Transformation(s)

    Now, you can make adjustments to your output. Click on the transformation counter next to the field you want to modify.

    Screenshot of Roborabbit feed setup page with transformations outlined in red

    Set up your transformation by choosing the transformation type.

    Screenshot of Roborabbit feed transformation setup outlined in red

    Then, fill out any specifications that might be required. Click Add Transformation.

    Screenshot of Roborabbit feed transformation setup with step outlined in red

    You can stack multiple transformations until the custom output is exactly what you need.
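    Stacking works like a small pipeline: each transformation receives the result of the one before it, so order matters. Here's a minimal Python sketch of that idea. The `stack` helper and the sample price string are assumptions for illustration, not anything Roborabbit exposes.

```python
from functools import reduce
from typing import Callable, List

Transform = Callable[[str], str]


def stack(transforms: List[Transform]) -> Transform:
    """Return one function that applies each transform in order."""
    def run(value: str) -> str:
        return reduce(lambda acc, fn: fn(acc), transforms, value)
    return run


# Strip, drop the currency code, strip again, then prepend a symbol.
clean_price = stack([
    str.strip,
    lambda s: s.replace("USD", ""),
    str.strip,
    lambda s: "$" + s,
])

print(clean_price("  19.99 USD "))  # $19.99
```

    Moving the prepend step to the front would leave the leading spaces after the symbol, which is why it's worth checking the output after each change.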

    Hare Hint 🐰: Learn all about Roborabbit’s data transformation types and examples of when to use them in the next section of this guide!

    Step 5. Check Output

    It’s important to make sure all of your structured data is being processed correctly, so inspect your output by returning to the feed page and viewing the custom output section at the bottom.

    Screenshot of Roborabbit custom feed setup page with output outlined in red

    Another way to access this data is from the task’s Feeds page, where you will find your Feed URL.

    Screenshot of Roborabbit feeds page with feed url outlined in red

    Clicking this will lead to a JSON array of your custom output. This is also accessible via the Roborabbit API, allowing you to retrieve your cleaned feed instead of the raw extracted data.

    Screenshot of custom feed output as json array
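    Since the feed is a JSON array, you can read it with a few lines of code. The sketch below uses only Python's standard library. It assumes you paste in the Feed URL copied from the Feeds page and that the URL can be fetched directly; the function names and sample data are made up, and any authentication your account might need isn't covered here.

```python
import json
from typing import Any, Dict, List
from urllib.request import urlopen


def parse_feed(body: str) -> List[Dict[str, Any]]:
    """Parse a feed response body, expecting a JSON array of items."""
    data = json.loads(body)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array")
    return data


def fetch_feed(feed_url: str) -> List[Dict[str, Any]]:
    """Download the feed at feed_url and parse it."""
    with urlopen(feed_url, timeout=30) as response:
        return parse_feed(response.read().decode("utf-8"))


# Offline example with a made-up response body.
sample = '[{"title": "Blue Mug", "price": "$12.50"}]'
for item in parse_feed(sample):
    print(item["title"], item["price"])
```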

    Roborabbit Data Transformation Types

    Roborabbit currently supports 16 different data transformation types, as follows:

    Append String to Scraped Data

    The Append transformation option adds something to the end of your data, such as a description or reference number.

    Screenshot of Roborabbit custom feed append transformation

    Convert Integer to Float

    The Convert to Float transformation option converts an integer to a float, which can be helpful when you’re working with monetary amounts or want to maintain a consistent number of characters.

    Convert Float to Integer

    The Convert to Integer transformation option converts a float to an integer, making for cleaner output if decimals aren’t important.

    Screenshot of Roborabbit custom feed convert to integer transformation
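    For comparison, here's roughly what the two numeric conversions look like in Python. This is a sketch with made-up function names. Note that Python's `int()` drops the decimal part rather than rounding; whether Roborabbit rounds or truncates isn't stated in this guide, so check your output.

```python
from typing import Union

Number = Union[int, float, str]


def to_float(value: Number) -> float:
    """Convert an integer (or numeric string) to a float."""
    return float(value)


def to_integer(value: Number) -> int:
    """Convert a float (or numeric string) to an integer by dropping decimals."""
    return int(float(value))


print(to_float(12))           # 12.0
print(f"{to_float(12):.2f}")  # 12.00
print(to_integer(12.9))       # 12
print(to_integer("7.25"))     # 7
```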

    Prepend String to Scraped Data

    The Prepend transformation option adds something to the beginning of your data, such as a reference number or currency symbol.

    Screenshot of Roborabbit custom feed prepend transformation
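    Prepend and Append are mirror images of each other, and both boil down to string concatenation. A quick Python illustration (the function names and sample values are hypothetical):

```python
def prepend(value: str, prefix: str) -> str:
    """Add text to the beginning of a value."""
    return prefix + value


def append(value: str, suffix: str) -> str:
    """Add text to the end of a value."""
    return value + suffix


print(prepend("19.99", "$"))              # $19.99
print(append("Blue Mug", " (ref 1042)"))  # Blue Mug (ref 1042)
```

    Neither adds a space for you, so include one in the prefix or suffix if you need it.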

    Find and Replace Text in Scraped Data

    The Find and Replace transformation option locates all mentions of a string and replaces them with something else, helping to standardize formatting or remove unwanted terms.

    Screenshot of Roborabbit custom feed find and replace transformation
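    In Python terms, this behaves much like `str.replace`, which swaps every occurrence of the search text. The sketch below assumes an exact, case-sensitive match; this guide doesn't say whether Roborabbit's matching is case-sensitive, so treat that as an assumption.

```python
def find_and_replace(value: str, find: str, replacement: str) -> str:
    """Replace every exact occurrence of `find` with `replacement`."""
    return value.replace(find, replacement)


print(find_and_replace("In stock - ships in 2 days - free returns", " - ", "; "))
# In stock; ships in 2 days; free returns

# Replacing with an empty string removes the term entirely.
print(find_and_replace("NEW! Blue Mug", "NEW! ", ""))  # Blue Mug
```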

    Find Email in Scraped Data

    The Find Email transformation type locates strings following typical email formats, making it easy to extract contact information and compile mailing lists.

    Screenshot of Roborabbit custom feed find email transformation
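    A common way to do this kind of matching is with a regular expression. The pattern below is a deliberately simple assumption that covers everyday addresses. It is not the pattern Roborabbit uses, and it won't catch every valid email.

```python
import re
from typing import Optional

# Simplified pattern: local part, "@", domain, and a top-level domain.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_email(value: str) -> Optional[str]:
    """Return the first email-like string in the text, or None."""
    match = EMAIL_PATTERN.search(value)
    return match.group(0) if match else None


print(find_email("Contact us at support@example.com for help."))
# support@example.com
```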

    Find Phone Number in Scraped Data

    The Find Phone Number transformation type locates strings following typical phone number formats, assisting you in extracting contact information.

    Screenshot of Roborabbit custom feed find phone number transformation
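    Phone numbers vary a lot more than emails, so any pattern is a compromise. Here's a loose Python sketch that looks for a run of digits with common separators. The pattern is an assumption for illustration only; it can miss a leading parenthesis and may match other long digit strings.

```python
import re
from typing import Optional

# Optional "+", a digit, at least six digits/separators, ending on a digit.
PHONE_PATTERN = re.compile(r"\+?\d[\d\s().-]{6,}\d")


def find_phone_number(value: str) -> Optional[str]:
    """Return the first phone-like string in the text, or None."""
    match = PHONE_PATTERN.search(value)
    return match.group(0) if match else None


print(find_phone_number("Call +1 (555) 010-9999 before 5pm"))
# +1 (555) 010-9999
```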

    Transform Scraped Data into Lowercase

    The Lowercase transformation type transforms all text into lowercase letters, ensuring consistent formatting.

    Screenshot of Roborabbit custom feed lowercase transformation

    Transform Scraped Data into Uppercase

    The Uppercase transformation type transforms all text into uppercase letters, ensuring consistent formatting.

    Screenshot of Roborabbit custom feed uppercase transformation

    Transform Scraped Data into Titlecase

    The Titlecase transformation type transforms all text into title case, which can clear up inconsistencies and prepare headers.

    Screenshot of Roborabbit custom feed titlecase transformation
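    The three case transformations map neatly onto Python's built-in string methods, which makes for a handy comparison. One caveat: title-casing rules differ between tools, and this guide doesn't spell out how Roborabbit handles apostrophes or hyphens, so the behavior below is Python's, not necessarily Roborabbit's.

```python
import string

name = "wireless NOISE-cancelling headphones"

print(name.lower())  # wireless noise-cancelling headphones
print(name.upper())  # WIRELESS NOISE-CANCELLING HEADPHONES
print(name.title())  # Wireless Noise-Cancelling Headphones

# str.title() capitalizes after every non-letter, so apostrophes trip it up.
print("men's shoes".title())           # Men'S Shoes
# string.capwords() only capitalizes after whitespace.
print(string.capwords("men's shoes"))  # Men's Shoes
```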

    Split Scraped Data String By Separator

    The Split transformation type splits text using a separator, then returns the segment you specify, such as a first name or a last name.

    Screenshot of Roborabbit custom feed split by transformation
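    Here's the same idea sketched in Python. Picking the segment by position (`index`) is an assumption made for this example; check the transformation's setup form for how Roborabbit lets you choose which part to keep.

```python
def split_by(value: str, separator: str, index: int) -> str:
    """Split text on a separator and return the segment at `index`."""
    parts = value.split(separator)
    return parts[index]


full_name = "Ada Lovelace"
print(split_by(full_name, " ", 0))   # Ada
print(split_by(full_name, " ", -1))  # Lovelace
```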

    Strip White Space from Scraped Data

    The Strip transformation type removes whitespace from the beginning and end of a string, cleaning it up for use.

    Truncate Scraped Data

    The Truncate transformation type shortens text to be within a certain character limit, which can keep product titles or descriptions within display constraints or trim lengthy URLs.

    Screenshot of Roborabbit custom feed truncate transformation
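    In Python, the closest equivalent is a simple slice. This sketch cuts at the character limit and adds nothing; whether Roborabbit appends an ellipsis or avoids cutting mid-word isn't covered in this guide, so that part is an assumption.

```python
def truncate(value: str, limit: int) -> str:
    """Keep at most `limit` characters from the start of the text."""
    return value[:limit]


print(truncate("Handmade ceramic mug with bamboo lid", 20))
# Handmade ceramic mug
```

    Text that is already within the limit passes through unchanged.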

    Remove All Spaces from Scraped Data

    The Remove All Spaces transformation type removes all spaces from the string, which can be helpful when forming values like SKUs or unique identifiers that aren’t meant to have any whitespace.

    Screenshot of Roborabbit custom feed remove all spaces transformation
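    It's easy to mix this one up with Strip, so here's a side-by-side Python comparison using a made-up SKU. Strip only touches the ends, while Remove All Spaces also closes the gaps in the middle.

```python
raw_sku = "  BM 001 - XL  "

# Strip: only leading and trailing whitespace is removed.
stripped = raw_sku.strip()
print(repr(stripped))  # 'BM 001 - XL'

# Remove All Spaces: every space character is removed.
no_spaces = raw_sku.replace(" ", "")
print(repr(no_spaces))  # 'BM001-XL'
```

    In this sketch, `replace(" ", "")` leaves tabs and line breaks alone. Whether Roborabbit treats those as spaces isn't stated here.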

    Remove URL from Scraped Data

    The Remove URL transformation type removes any URL strings within your data, which can clean up extraneous links and keep core content clean and structured.
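    A rough Python equivalent uses a regular expression to drop anything that starts with `http://` or `https://`. Both the pattern and the extra step that tidies leftover double spaces are assumptions for this sketch, not a description of how Roborabbit detects URLs.

```python
import re

# Simplified pattern: a scheme followed by any run of non-space characters.
URL_PATTERN = re.compile(r"https?://\S+")


def remove_urls(value: str) -> str:
    """Remove URL-like strings, then collapse the whitespace left behind."""
    without_urls = URL_PATTERN.sub("", value)
    return " ".join(without_urls.split())


print(remove_urls("Great mug! See https://example.com/mug for details"))
# Great mug! See for details
```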

    Split Scraped Data String Into Array

    The Split Into Array transformation type splits a string into the items of an array, which is useful for lists, comma-separated values, or names.

    Screenshot of Roborabbit custom feed split into array transformation
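    Sketched in Python, this is a split followed by a little tidying. Trimming each item and skipping empty ones are choices made for this example; Roborabbit may or may not do the same, so inspect your feed output.

```python
from typing import List


def split_into_array(value: str, separator: str = ",") -> List[str]:
    """Split text on a separator, trimming items and skipping empty ones."""
    return [item.strip() for item in value.split(separator) if item.strip()]


print(split_into_array("red, green , blue"))  # ['red', 'green', 'blue']
print(split_into_array("S|M|L|XL", "|"))      # ['S', 'M', 'L', 'XL']
```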

    Cheat Sheet

    Command            | Description
    Append             | Add something to the end of your data
    Convert to Float   | Converts an integer to a float
    Convert to Integer | Converts a float to an integer
    Prepend            | Add something to the beginning of your data
    Find and Replace   | Find a string and replace it with something else
    Find Email         | Filters data matching typical email format
    Find Phone Number  | Filters data matching typical phone number format
    Lowercase          | Transform all text to lowercase
    Uppercase          | Transform all text to uppercase
    Titlecase          | Transform the first letter of every word to uppercase
    Split By           | Use a separator to split some text
    Strip              | Removes white spaces from the beginning and end of text
    Truncate           | Shortens the text to a specific number of characters
    Remove All Spaces  | Remove all spaces from your data
    Remove URL         | Removes URLs from your data
    Split Into Array   | Splits your data into an array using a separator

    Conclusion

    Properly cleaning and formatting your web-scraped data is a critical step in the data extraction process. By leveraging Roborabbit's built-in data transformation features, you can take messy, unstructured output and transform it into clean, usable data that is ready for analysis, integration, and presentation.

    From removing unwanted URLs to standardizing text formatting, Roborabbit has the tools to handle basic data cleaning requirements. By taking the time to properly scrub your scraped information, you'll improve accuracy, streamline downstream processes, and end up with a high-quality data set.

    About the author: Julianne Youngberg (@paradoxicaljul)
    Julianne is a technical content specialist fascinated with digital tools and how they can optimize our lives. She enjoys bridging product-user gaps using the power of words.
