Many spreadsheet users will know that when you need to pick out a value from a list, it's a good idea to use a lookup formula. There are numerous ways to do this, but which is the best?
Each formula or method has its advantages and disadvantages depending on the circumstances. It's hard to cover all of these circumstances, so we'll focus on one specific, yet common, situation.
Let's say you have a unique list of items, say "fruit". Each type of fruit has a numerical value associated with it — perhaps the quantity or price of the item. Somewhere else in your spreadsheet you want to enter the fruit item you're interested in and have a formula return the specific value for this fruit item. Let's further assume that the items are not sorted in any particular order.
In this table, we want to enter the name of the fruit in cell B10, and we want the value in cell C10 to automatically pick out the correct quantity in the source data.
So, now that we've set the scene, let's look at 7 different ways to do this in Excel.
Method 1 – SUMIF
SUMIF is a very useful way to do conditional sums. It lets you specify the range of items (i.e. the list of fruit names), the criteria (i.e. the particular fruit you want to find), and the result range (i.e. the quantities associated with the items). It does not require the list to be sorted in any order, but for this to work as a lookup each item in the list needs to be unique. Furthermore, it will only work for numerical values in the result range. If you wanted to return say the colour of the fruit, SUMIF would not work.
Method 2 – INDEX and MATCH
INDEX and MATCH is probably the method I use most often to perform exact lookups. For some reason it just makes the most sense to me, and has clear advantages over VLOOKUP and HLOOKUP. If the items in the list are not unique then it returns the associated value for the first one it finds. Although this indicates a problem with your source data, this seems better to me than SUMIF which would effectively aggregate all the items it finds. Furthermore, INDEX and MATCH works with non-numeric data, so would work fine in the situation discussed above regarding picking a colour out of the list. Finally, the ranges do not have to be contiguous, and there is a more advanced version of INDEX and MATCH which can match criteria in two dimensions i.e. rows and columns.
Method 3 – VLOOKUP
This is similar to the INDEX and MATCH approach but with one significant disadvantage — it requires you to specify the column number of the data you want returned (in this case the 4th column of the range). It is possible to calculate this column number dynamically, but this is hard, opaque and error-prone. Furthermore, the formula requires that the left-most column of the range contains the items you're searching. This is unnecessarily restrictive and is not required by the INDEX and MATCH approach.
Method 4 – DGET
This is a neat method and probably should get more use. By all accounts it is a computationally efficient method, and when working with arrays it allows for flexible criteria and boolean logic e.g. "apples" OR "bananas" (e.g. DSUM, DCOUNT, etc.). One drawback is that it requires a criteria range to be set up in a particular way (B10:B11 in the formula) and this may not suit your worksheet layout and may break later if you insert rows in the wrong place. In general, I would avoid this approach and still go with INDEX and MATCH.
Method 5 – Array IF
Arrays are incredibly powerful and intuitive when you really get them. Sometimes thinking about them can cause your brain to leak out of your ear though, so be careful. In this case, using an array IF is no better than SUMIF and has the disadvantage of being more advanced – so less skilled users of your model may be confused. Also arrays can only be entered by holding down Ctrl and Shft at the same time when hitting Enter. Again, this is not something less skilled users will know about — not that they should be messing with your formulas anyway.
Method 6 – Range Names
This assumes that you've created range names for your source data. You can do this easily / automatically by selecting your source data (B4:I7) and hitting Ctrl Shft F3, then choose the Left column option and hit Enter. This will create range names out of all item names. The INDIRECT formula is the way to get Excel to use the input value you type in B11 as if you'd written =Bananas in cell H11. This method is bad design! The main problem is that you now need to maintain and ensure the integrity of a whole lot of range names. If you later change the item Bananas in your source list to be Kiwi Fruit, you'll need to delete your Bananas range name and add one for Kiwi Fruit. There are so many things that can/will go wrong with this method, and we generally recommend not using range names at all in your models.
Method 7 – PivotTables
PivotTables are very useful for analysing data set up in a database format. They can certainly be used to lookup data as we're doing here, but they require you to create a PivotTable and ensure that it is being refreshed when your source data changes. This does not happen automatically like normal spreadsheet functions. It also requires you to use and understand the GETPIVOTDATA formula, which can be a bit fiddly. In this case, using a PivotTable approach is a bit like using a sledge-hammer to kill an ant, so it's not our preferred approach.
So, while none of these methods are foolproof, the winner is clearly method 2 – INDEX and MATCH. To further improve your exact lookups, you should 1) check your source items are unique, 2) use data validation to ensure the result item is from this list, 3) deal with potential errors with IFERROR or ISERROR. Finally, although not done in the above examples, remember to get your $ signs in the right places in case you need to copy and paste this formula to other cells.