When people come up to speed in C#, collection types present options. And, sometimes, these options confuse. Should you use C# IEnumerable, IList, or ICollection? And what’s an IQueryable, anyway?
This type of wondering generally brings you to the official documentation for one of the types. For example, look at the declaration of the IList<T> interface.
public interface IList<T> : ICollection<T>, IEnumerable<T>, IEnumerable
IList<T> inherits from IEnumerable<T>, so IEnumerable must be kind of a subset of IList. Right? And, for the sake of performance, we should probably use only what we need. A good rule of thumb then would be to see if you can easily use IEnumerable, and then switch to IList when you need to do something IEnumerable fails to do. Seems reasonable.
Well, it seems reasonable until you really, truly understand both types. But I’ll come back to that.
A Tale of Performance Woe
I think most people in the industry can relate to differences between behavior in development or test and in production. In fact, we’ve immortalized the sentiment with a ubiquitous catchphrase. It works on my machine!
When we think of this phrase, we usually think in terms of behavior. Everything worked when you ran it locally but then blew up when running on someone else’s machine. Oops. Turned out you referenced an environment variable set on your machine but not elsewhere.
But it can also apply to performance concerns. You code something up, run it against your local database, and all seems fine. So you ship it to pre-prod or even production. Only then do you discover some horrifyingly different behavior that you can’t explain with simple database scale.
After some pain and digging, you find yourself learning about something you’d never previously heard of called the N+1 problem. Apparently, your elegant data access code isn’t quite as elegant as it first seemed.
Digging Into C# IEnumerable, Arrays, and Lists
Let’s now get to the subject of true understanding that I mentioned a moment ago. I’ll start by explaining some collection types with which you may be more familiar. Consider, for starters, the humble array.
Arrays date back about as far as programming, and they represent grouped values. They have a declared, fixed capacity, so if you make a 4-element array, it will always store 4 elements. You can set those elements to different values, and you can iterate over the array if you so choose, performing an operation on each element.
As I mentioned, arrays are old. People have used them forever and, during that time, have bumped up against their limitations. The property of having fixed length causes annoyance, and people find it convenient to perform quick operations like sorting and filtering duplicates. So you wind up with heavier weight, more sophisticated types like List<T>, which implements the IList<T> interface. That interface demands that its implementations support operations such as adding, removing and clearing.
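To make the contrast concrete, here is a minimal sketch. The class and method names are mine, chosen only for illustration. It copies a fixed-size array into a List<T> and then uses the adding, removing, and clearing operations that the list interfaces require.

```csharp
using System;
using System.Collections.Generic;

public static class ArrayVersusList
{
    // Returns the element count observed at each step of the walkthrough.
    public static int[] CountsAfterEachStep()
    {
        int[] numbers = { 3, 1, 2, 1 };       // length is fixed at 4 for the array's lifetime
        var list = new List<int>(numbers);    // copies the array's elements into a list

        list.Add(5);                          // the list grows to 5 elements
        int afterAdd = list.Count;

        list.Remove(1);                       // removes the first element equal to 1
        int afterRemove = list.Count;

        list.Clear();                         // empties the list
        int afterClear = list.Count;

        return new[] { numbers.Length, afterAdd, afterRemove, afterClear };
    }

    public static void Main()
    {
        // Prints: 4, 5, 4, 0
        Console.WriteLine(string.Join(", ", CountsAfterEachStep()));
    }
}
```

The array never changes size. Only the list does.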
Many implementations of IList will involve arrays at their core, decorating them with convenience functionality. So you can think of them as convenient, heavyweight arrays.
But then if lists are heavyweight arrays, does that make IEnumerable<T>, with its single “GetEnumerator” method, something like a lighter weight array? No, it turns out. Not at all.
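If you have never called GetEnumerator yourself, the following sketch shows roughly what a foreach loop does on your behalf. The helper name Sum is my own. The only things it relies on are GetEnumerator, MoveNext, and Current.

```csharp
using System;
using System.Collections.Generic;

public static class ManualEnumeration
{
    // Walks any IEnumerable<int> by hand, the way foreach does behind the scenes.
    public static int Sum(IEnumerable<int> source)
    {
        int total = 0;
        using (IEnumerator<int> cursor = source.GetEnumerator())
        {
            while (cursor.MoveNext())     // returns false once the sequence is exhausted
            {
                total += cursor.Current;  // the element the cursor currently points at
            }
        }
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(Sum(new[] { 1, 2, 3 }));        // an array: prints 6
        Console.WriteLine(Sum(new List<int> { 4, 5 }));   // a list: prints 9
    }
}
```

Notice that nothing here asks for a count or an index. The caller can only ask for the next element, which is what leaves room for the behavior described in the next section.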
The Promise of IEnumerable
When using the C# IEnumerable construct, things get conceptually weird in short order. At least they do if you’re not used to how this stuff works.
In a sense, arrays and lists are tangible. You have a bunch of strings or integers sitting there in memory in a row, waiting for you to do things to them. Easy to work with and easy to reason about. When it comes to IEnumerable, however, you don’t necessarily have anything tangible. Instead, you have a promise that you’ll have things sitting there in a row later when you really need them. Perhaps you’ve heard the term lazy loading before (“don’t load until you have to”)? Well, IEnumerable<T> supports the related concept of deferred execution, which postpones computation until someone actually enumerates the result.
Let’s make this more tangible with a simple allegory. Let’s say that I’m a method, and my job is to return fruit. When I return a List of fruit to you, you ask me for fruit and I hand it to you in an orderly fashion. Here’s an apple, here’s an orange, etc. But when I return an IEnumerable of fruit to you, I hand you something else entirely. I hand you a note that says, “this note entitles you to some fruit — give me a call when you want to eat the first piece of fruit, and I’ll produce it at that time.”
Back in the programming world, you can think of this as a commitment or promise of sorts. In reality, all IEnumerable promises you is an underlying strategy for delivering the next element — a state machine if you want to get technical about it.
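Here is the fruit allegory as a small sketch. The FruitStand class and its Log are hypothetical names I made up so we can see when work actually happens. The deferred version uses a C# iterator (yield return), which the compiler turns into the state machine mentioned above.

```csharp
using System;
using System.Collections.Generic;

public static class FruitStand
{
    // Records each step so we can observe when the work is really done.
    public static readonly List<string> Log = new List<string>();

    // Eager: all the fruit exists before the method returns.
    public static List<string> GetFruitList()
    {
        Log.Add("list: producing all fruit");
        return new List<string> { "apple", "orange" };
    }

    // Deferred: none of this body runs until someone starts enumerating.
    public static IEnumerable<string> GetFruitLazily()
    {
        Log.Add("lazy: producing apple");
        yield return "apple";
        Log.Add("lazy: producing orange");
        yield return "orange";
    }

    public static void Main()
    {
        IEnumerable<string> note = GetFruitLazily();
        Console.WriteLine(Log.Count);                // prints 0: we only hold the note

        foreach (string fruit in note)
        {
            Console.WriteLine(fruit);                // each piece is produced on demand
        }
        Console.WriteLine(string.Join(" | ", Log));  // both "lazy" entries appear now
    }
}
```

Calling GetFruitLazily costs almost nothing. The cost shows up wherever the enumeration happens, which may be far from the call site.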
Back in the World of Databases and Performance
So let’s go back now to the narrative. You sit at your desk, implementing an MVC app over a database, and everything seems fine. You follow the rule of thumb I mentioned earlier, using IEnumerable<T> (or IQueryable<T>) when they have sufficient functionality. Everything goes along swimmingly as you develop and perform your testing.
When you do see some performance issues, they don’t crop up during database operations. Instead, they crop up as you cycle through records and build the pages you plan to return over the wire. So whatever slowness you notice must come from some part of the web framework itself. Your database queries return with lightning speed, so all is good!
Except, now you know that isn’t true. As you step through the call stack down toward the database calls and then back up again, you understand what you’re really getting. That call that seems to trigger a “SELECT *” and population of the results really just returns an assurance that it will make that call when the time comes. That method returning an IEnumerable says, “yep, here you go: one note promising that we’ll make a database call later when you need the data.”
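You can watch this happen without a real database. The sketch below uses a hypothetical FakeDatabase class of my own invention that simply counts simulated round trips. It stands in for whatever data access layer you actually use, and real providers will differ in the details.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for a database. It only counts "queries".
public class FakeDatabase
{
    public int QueryCount { get; private set; }

    // Deferred: the simulated query runs when the result is enumerated,
    // not when this method is called.
    public IEnumerable<string> GetCustomerNames()
    {
        QueryCount++;
        yield return "Ada";
        yield return "Grace";
    }
}

public static class DeferredQueryDemo
{
    // Returns the query count after the call, after Count(), and after a foreach.
    public static int[] Run()
    {
        var db = new FakeDatabase();

        IEnumerable<string> names = db.GetCustomerNames();
        int afterCall = db.QueryCount;       // still 0: we only hold the promise

        int howMany = names.Count();         // first enumeration runs the "query"
        int afterCount = db.QueryCount;

        foreach (string name in names) { }   // second enumeration runs it again
        int afterLoop = db.QueryCount;

        return new[] { afterCall, afterCount, afterLoop };
    }

    public static void Main()
    {
        // Prints: 0, 1, 2
        Console.WriteLine(string.Join(", ", Run()));
    }
}
```

The method call itself looks instantaneous, and the work lands in the page-building loop. The same sequence also gets re-evaluated every time something enumerates it.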
Keeping Performance Intact
With that knowledge in mind, you can have better antennae for the types of performance problems that seem small in dev, but come up huge in production. Deferred execution (and lazy loading) offer a nice way to put off potential performance hits until the absolute last minute. But, in doing that, they force a tradeoff upon you wherein they make execution harder to reason about. When you make extensive use of these patterns, you can lose track of where the bottlenecks really lie. So you must stay vigilant.
Also, bear in mind that not all IEnumerable implementations use deferred execution. After all, List<T> implements IEnumerable<T>, and its elements already sit in memory, so it doesn’t defer anything. Deferred execution is just a possibility, depending on which implementation you wind up with. So keep your eye out for it. And, speaking more broadly, make sure you understand exactly what IEnumerable is and what you’re signing up for when you use it. It’s not just a lighter weight list. This understanding will help you keep your code both performant and correct.
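One common way to take control of when the work happens is to materialize a deferred sequence with ToList(). The sketch below assumes a hypothetical Expensive helper that counts its own invocations. It compares enumerating a deferred LINQ query twice against enumerating a materialized list twice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MaterializeDemo
{
    public static int Evaluations;

    // Hypothetical expensive step. It counts how many times it runs.
    private static int Expensive(int x)
    {
        Evaluations++;
        return x * 10;
    }

    // Returns { evaluations with a deferred query, evaluations after ToList() }.
    public static int[] Run()
    {
        int[] source = { 1, 2, 3 };

        Evaluations = 0;
        IEnumerable<int> deferred = source.Select(x => Expensive(x));
        int first = deferred.Sum();           // runs Expensive three times
        int second = deferred.Sum();          // runs it three more times
        int deferredEvaluations = Evaluations;

        Evaluations = 0;
        List<int> materialized = source.Select(x => Expensive(x)).ToList(); // runs once, here
        int third = materialized.Sum();       // reads stored values only
        int fourth = materialized.Sum();
        int materializedEvaluations = Evaluations;

        return new[] { deferredEvaluations, materializedEvaluations };
    }

    public static void Main()
    {
        // Prints: 6, 3
        Console.WriteLine(string.Join(", ", Run()));
    }
}
```

Materializing is not free either, because it holds every element in memory at once. Treat it as a deliberate choice about where you pay the cost, not as a blanket fix.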