Nick is Head of Data Science at Evolve and is a master of new research technologies including text analytics and AI.
Many of us love to dig deep into our data and DIY the basics, and with the tools available today in programs like Excel it’s easy for everyone to get involved.
However, there are some common watch outs and failures so we’ve compiled a list of occasions where Excel’s tools can behave unexpectedly.
Avoid the blood, sweat, tears and swearing by learning from these experiences!
Excel’s Vlookup Function
One of the most powerful functions in Excel is being able to match and merge data from one source into another using Vlookup.
Here are some common ways which Vlookup can behave strangely:
Check your data for duplicates – at least one of your data sets needs to be unique for the function to work properly
Avoid issues with blank cells – Vlookup interprets blank cells as ‘0’. This is not the same thing as blank and can cause problems when merging numeric data, where there is a difference between 0 and missing. To avoid this, you may need an ‘if’ statement that tests if the length of the matched result is 0 or not
Excel's handling of CSV files
CSV (Comma Separated Values) files are one of the most common and open ways to share data between different sources.
It can be tempting to load CSV files into Excel, but be aware that Excel may change the format of CSV files, such that when you save the files, they are different!
Some common watch outs:
Mobile numbers preceded with a 0 (e.g. 0401001001) get the 0 dropped and Excel treats the value as a number! If you then attempt to load the file into a platform which can dial these numbers you’re likely to get a message that no valid mobile numbers are found, which can delay your activity while you resolve this issue
Excel will often change the format of dates. This can cause issues downstream if you’re trying to match dates back to other files. This is one to test, understand and avoid upfront as there is no single way to clean/treat the data
Excel may also interpret dates as US dates depending on your computer’s localisation option – check your defaults to confirm. If your date data doesn’t imply that it’s day/month/year (for instance, your data file contains only dates early in the month before the 13th), be on the lookout for Excel playing around with your data.
If you have a CSV file with ‘ID’ as cell A1, Excel does not recognise it as a CSV file, and instead will save it as another type of document called a SYLK. This will throw several errors upon trying to load the file, and may cause frustration.
Excel's filtering options
With a well-formed data set, Excel’s filtering options can be great to help quickly sort columns and filter out options. With large data files, Excel sometimes does not display an exhaustive list of options available in a column. Don’t rely on filter information to inform you about included values!
When using Excel filtering, be sure that you’ve squarely-selected all of the data in your workbook. If you miss some rows, they will not be included in any filtered operations such as sorting and filtering. Including too many rows may result in blanks being included as well.
Excel hidden information
There are all kinds of ways to hide data in Excel files such as hiding tabs. Many data analysis downstream processes may look for the first sheet of an Excel file; and you may not realise that the process is using the first sheet if it’s hidden. There are other ways of hiding data such as with formatting. When cleaning sample files from unknown sources, start with removing all formats and checking for hidden tabs.
Solid data analysis starts from the ground up.
Having trust in the data collection and data management allows us to report insights with confidence.
Seeing inconsistent results or behaviour in our primary tools can be unsettling.
Understanding some of the shortcomings and problems means we can adapt and continue to report with confidence.