Avoiding transcription errors and joining separate lists: a VLOOKUP primer

When you’re relatively new to Excel, learning about VLOOKUP blows your mind. Completely. It forever changes the way you think about managing data using Excel. Consider this a comprehensive introduction to VLOOKUP, its uses, and misuses.

Anatomy of a VLOOKUP

VLOOKUP is Excel’s shorthand for “vertical lookup.” VLOOKUP provides you a way to match data from one table to data in another without having to do any kind of sorting of the data. Let’s look at a diagram of its inner workings and an example.

The Parameters of VLOOKUP

VLOOKUP requires four parameters to work, each of which is mentioned in parentheses:

1. A value (lookup_value) that is common to both tables. As I discuss below, this is what Excel is going to look for in each table, so it’s important that these values be exactly the same in both of them.

2. A table of values where its going to look up stuff (table_array). This table must have the identifying values (sometimes called a foreign key) along its leftmost column.Most of the time, this range of cells should be the same for all of the rows in your table, so you will want to lock these cell references down. Save yourself a lot of inconvenience and used a named range. It’s a little easier on the eyes, too.

3. The column that has the values you want to look up in the foreign table (index_num). In the simplest case, this is just a number that indicates which column has the data you’re want to retrieve. In a future post, I’ll show you how to do something a little fancier (and safer) using the MATCH function.

4. The last parameter (range_lookup) should be set to FALSE if you want Excel to find exact matches to the value you’re looking up; if approximate matches are acceptable, set this to TRUE. In general, if you’re not sure what set this to, I recommend you use FALSE. Using approximate matches is a dangerous game, especially as your data set gets large and your ability to validate values gets cramped.

When Should (and Shouldn’t) You Use VLOOKUP?

In order for VLOOKUP to work correctly, only two things need to be true about your data:

1. The values that identify your data are exactly the same in both tables. Case sensitivity doesn’t matter: having “rEgIon 4″ in one table and “Region 4″ in another won’t mess with function’s output. Extra characters, though, will give you headaches: having “Region 4 ” (note the space at the end) in one table and “Region 4″ in the other will cause no match to be found. I use the Region-4-plus-a-space example because it’s caused me a good of annoyance in past.

2. There are no repeat entries of the identifying value in the target data set. Let me decode that statement. Say you have two entries for Region 3 in Table 2 because you revised total revenues after they were published. You also want to merge the data from Target Table 2 to the Source Table as shown in the picture. VLOOKUP has no way of distinguishing between the revised and original values, since all it knows is that it should find “Region 3″ in Table 2. It might find the right value by chance, but that depends on the order in which the values appear in the table. VLOOKUP chooses the first value associated with “Region 3″ without distinguishing between the old and new ones.

On the other hand, if you have multiple entries for “Region 3″ in a source table, one value for revenues in a target table, and you want each of them to have the revenue value from the target table (i.e., a many-to-one join), you’re golden. The important thing is that the target table from which you’re getting values does not have multiple values that could be joined to the source table. This is the type of mistake that you can’t catch by looking for calculation errors.

Scalability

The best part of VLOOKUP is that it scales well for lots of applications. If you’ve only got 10 or 20 values to join, it’s convenient; if you have 1000 values to join, it’s necessary. Plus, once you understand how VLOOKUP works, you can use the closely-related HLOOKUP to join values based on columns instead of rows.

If you want to see how all of this works in Excel, grab the workbook by clicking on this link.