How to Clean Data in Excel?

Cleaning data in Excel is an important step before performing analysis on that data. It removes unnecessary and junk characters and helps keep the data uniform.

Different Ways to Clean Data in Excel

In this article, we will learn how to clean data using the following methods.

1. Remove extra spaces in Excel

The TRIM function in Excel removes spaces from the start and end of the text. It also reduces multiple spaces between words to a single space. Syntax: TRIM(text), where text is the text, or a reference to the cell containing the text, from which you want to remove spaces.

Let us consider two text examples: the first contains extra spaces between the words, and the second has leading and trailing spaces.

Follow these steps to remove extra spaces:

  • Select the cell where you want to display the clean text.
  • Type =TRIM(, select the cell containing the text and complete the formula with ).
  • Press the Enter key to display the result.
  • Copy the formula for the entire list by dragging down the fill handle.
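
To make the behaviour of TRIM concrete, here is a minimal Python sketch that approximates it outside Excel. The function name excel_trim and the sample strings are our own assumptions for illustration, and the sketch only handles the regular space character.

```python
import re


def excel_trim(text):
    """Approximate Excel's TRIM for regular spaces (illustrative only)."""
    # Reduce every run of two or more spaces to a single space.
    collapsed = re.sub(r" {2,}", " ", text)
    # Remove the spaces left at the start and end of the text.
    return collapsed.strip(" ")


# Hypothetical sample values: one with extra inner spaces,
# one with leading and trailing spaces.
samples = ["Data   cleaning  in    Excel", "   leading and trailing   "]
for sample in samples:
    print(repr(excel_trim(sample)))
```

Printing with repr makes it easy to confirm that no spaces remain at either end of the text.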

2. Handle blank cells in Excel

Blank cells in a data set can cause errors or misleading results when you use built-in functions in Excel.

Let us consider a data set of students and their marks, in which the marks of some students are missing because they were absent. To handle such cases, we can fill those blank cells with the word Absent.

Follow these steps to find and fill the blank cells:

  • Select the entire list which contains blank cells.
  • Press the F5 key on the keyboard to launch the Go To dialog box.
  • Click on the Special… button at the bottom left to launch the Go To Special dialog box.
  • In the Go To Special dialog box, select the Blanks radio button and click on OK.
  • All the blank cells are now selected. Type Absent and press CTRL+ENTER to fill all blank cells with Absent.

Note: If you simply press the Enter key after typing Absent, the value is inserted only in the current cell.
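
The same fill-the-blanks idea can be sketched in Python. This is only an illustration under our own assumptions: the marks list is made-up sample data, the helper name fill_blanks is hypothetical, and a cell is treated as blank when it is None or an empty string.

```python
def fill_blanks(values, placeholder="Absent"):
    """Return a copy of values with blank cells replaced by placeholder."""
    # A cell counts as blank here if it is None or an empty string.
    return [placeholder if v is None or v == "" else v for v in values]


# Hypothetical marks column with two blank cells.
marks = [78, None, 91, "", 64]
filled = fill_blanks(marks)
print(filled)
```

Like CTRL+ENTER in Excel, the helper fills every blank cell in one pass and leaves existing values untouched.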

3. Change text case in Excel

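The PROPER function capitalizes the first letter of each word and converts the remaining letters to lowercase. Follow these steps to properly format names in Excel: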

  • Select the cell where you want to display the properly formatted names.
  • Type =PROPER(, select the cell containing the text you want to format and complete the formula with ).
  • Press the Enter key to display the result.
  • Copy the formula for the entire list by dragging down the fill handle.
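
As a rough illustration of what PROPER does to each name, the following Python sketch capitalizes any letter that follows a non-letter and lowercases the rest. The function name excel_proper and the sample names are assumptions made for this example; it approximates the Excel function rather than reproducing it exactly.

```python
def excel_proper(text):
    """Approximate Excel's PROPER (illustrative only)."""
    result = []
    prev_is_letter = False
    for ch in text:
        if ch.isalpha():
            # Uppercase a letter that starts a word, lowercase the others.
            result.append(ch.lower() if prev_is_letter else ch.upper())
            prev_is_letter = True
        else:
            # Spaces, digits and punctuation pass through unchanged.
            result.append(ch)
            prev_is_letter = False
    return "".join(result)


# Hypothetical names with inconsistent capitalization.
for name in ["jOHN sMITH", "mary-ann o'neil"]:
    print(excel_proper(name))
```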

Email addresses with inconsistent capitalization are difficult to read; converting all the letters to lowercase with the LOWER function makes them easier to read. Follow these steps to properly format email addresses:

  • Select the cell where you want to display the properly formatted emails.
  • Type =LOWER(, select the cell containing the text you want to format and complete the formula with ).
  • Press the Enter key to display the result.

Some text, such as company names, is written in all capital letters for emphasis. The UPPER function converts all the letters to uppercase. Follow these steps to properly format company names:

  • Select the cell where you want to display the properly formatted company names.
  • Type =UPPER(, select the cell containing the text you want to format and complete the formula with ).
  • Press the Enter key to display the result.
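
The LOWER and UPPER conversions used above map directly onto Python's built-in string methods. The short sketch below applies them to made-up sample values; the email addresses and company names are hypothetical.

```python
# Hypothetical sample data with inconsistent capitalization.
emails = ["John.Smith@Example.COM", "MARY@example.com"]
companies = ["ibm", "Nasa"]

# str.lower() plays the role of LOWER, str.upper() the role of UPPER.
clean_emails = [email.lower() for email in emails]
clean_companies = [company.upper() for company in companies]

print(clean_emails)
print(clean_companies)
```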

4. Remove duplicates in Excel

Duplicate data creates redundancy, can cause calculation errors, and increases file size. Follow these steps to remove duplicates in Excel:

  • Select the cells you want to check for duplicate values.
  • Click on Remove Duplicates in the Data Tools group on the Data tab.
  • In the Remove Duplicates dialog box that appears, select the columns from which you wish to remove duplicates and click OK.
  • A popup appears that tells us how many duplicate values were found and removed, and also the number of unique values that remain. Click OK.

In the final result, only the unique values remain in the selected cells.
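
The logic behind removing duplicates can be sketched in Python as follows. This is a simplified illustration, not a description of how Excel implements the feature: the helper name remove_duplicates and the sample rows are assumptions, and a row counts as a duplicate only when every value in it matches an earlier row.

```python
def remove_duplicates(rows):
    """Keep the first occurrence of each row and count the rows dropped."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # tuples are hashable, so they can go in a set
        if key not in seen:
            seen.add(key)
            unique.append(row)
    removed = len(rows) - len(unique)
    return unique, removed


# Hypothetical rows of (name, city) with one repeated entry.
rows = [("Asha", "Pune"), ("Ben", "Delhi"), ("Asha", "Pune"), ("Cara", "Goa")]
unique_rows, removed_count = remove_duplicates(rows)
print(f"{removed_count} duplicate values removed; {len(unique_rows)} unique values remain.")
```

The printed summary mirrors the two counts reported in the popup: how many duplicates were removed and how many unique values remain.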

5. Split Data into Columns in Excel

We can parse people’s names into first names and last names by using the Text to Columns feature in Excel. Follow these steps to split data into columns:

  • Select the cell containing the text you want to parse.
  • Click on the Text to Columns option in the Data Tools group on the Data tab.
  • In the Convert Text to Columns Wizard – Step 1 of 3 dialog box, check the Delimited radio button and click on Next.
  • Then in the Convert Text to Columns Wizard – Step 2 of 3 dialog box, check the Space check box under the Delimiters group and click on Next.
  • Finally, in the Convert Text to Columns Wizard – Step 3 of 3 dialog box, check the appropriate data format radio button under the Column data format group and click on Finish.

In the final result, the first names and last names appear in separate columns.
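
Splitting on a space delimiter can also be sketched in a few lines of Python. The helper name split_name and the sample names are hypothetical, and the sketch assumes each name contains exactly one space between the first and last name.

```python
def split_name(full_name):
    """Split a full name into parts using a single space as the delimiter."""
    return full_name.split(" ")


# Hypothetical names, each made up of a first name and a last name.
names = ["John Smith", "Mary Jones"]
for name in names:
    first, last = split_name(name)
    print(first, "|", last)
```

A name with a middle name would produce three parts, in the same way that a space-delimited split produces one column for each part of the text.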

Conclusion

In this article, we learned how to clean data in Excel for data analysis.