In my first post, I promised to keep my blog focused mainly on computer science topics. With that said, it's quite sad that my second post is going to break my promise. But I didn't morally bind myself to that promise, and I'm sure my readers won't mind anyway.
I enjoy using LINQ to Objects on collections to filter, sort, find and project collection data. The terse syntax that lambda expressions bring to the table makes my code feel more functional, and I like that feeling. Just recently, I was writing a method to pull some rows out of a DataTable based on a filter condition. I wanted to use the Where extension method on my DataTable to pull out the data I needed based on my filtering predicate. This is what Where looks like:
public static IEnumerable<TSource> Where<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
Where evaluates a predicate and yields each item from the source into an IEnumerable collection if the predicate returns true for the item. A use of this extension method might look like the following:
IEnumerable<string> filteredList = unfilteredList.Where(x => x == "searchString");
At the time when filteredList is enumerated, it will contain the elements from unfilteredList which equal "searchString" (note that I'm not using the LINQ query operators, although the example above would be easy to implement using them). The goal of Where here is to enumerate through every single item in list and yield some of the elements into a new list. Which is fine and dandy until your data is sorted. Where doesn't know anything about the data it's filtering, and it shouldn't. It's good to have that disconnect, it provides flexibility. But it would be nice if there was an implementation of Where that could take a hint about the data and know when to stop filtering. Let's call this new implementation WhereFew.
The goal of WhereFew is to enumerate through a sorted collection until it finds data matching a predicate and then stop enumerating when it can no longer find data matching the predicate. Since the signatures for Where and WhereFew match, the only detail you'll have to keep in mind is that the data to filter must be sorted and the filtering predicate must filter on the sorted field. Here is what this method looks like:
static IEnumerable<T> WhereFew<T>(this IEnumerable<T> collection, Func<T, bool> predicate)
{
bool searching = true;
foreach (T val in collection)
{
if (predicate.Invoke(val))
{
if (searching)
{
searching = false;
}
yield return val;
}
else if (!searching)
{
yield break;
}
}
}
If we followed the rules of this method correctly (data has to be sorted and the predicate filters based on the sorted field), then WhereFew will enumerate the list until the predicate is satisfied and keep going until the predicate cannot be satisfied. This has potential to save a lot of enumerating. Imagine if the list you are trying to filter has 100,000 elements, and your filter clause cares only about the first 20. If you were using Where, you would yield the first 20 elements and then enumerate the remaining 99, 980 elements without reason.
We can make this a little bit better yet. Wouldn't it be nice if WhereFew could identify the point in which it could never satisfy the predicate and stop enumerating at that time? Well, let's give it a shot:
static IEnumerable<T> WhereFew<T>(this IEnumerable<T> collection, Func<T, bool> predicate, Func<T, bool> terminatePredicate)
{
bool searching = true;
foreach (T val in collection)
{
if (terminatePredicate.Invoke(val))
{
yield break;
}
else if (predicate.Invoke(val))
{
if (searching)
{
searching = false;
}
yield return val;
}
else if (!searching)
{
yield break;
}
}
}
A slight modification has been made to WhereFew. It now accepts a predicate that tells WhereFew when to terminate enumerating. You'll have to supply this predicate yourself when calling WhereFew, but it's quite simple since you know how your data is sorted and you know what you're filtering for. Here is how we could use WhereFew with a terminate predicate:
List<string> unfilteredList = new List<string>() { "foobar", "foobar", "foo", "foo", "bar", "bar" };
IEnumerable<string> filteredList = unfilteredList.WhereFew(x => x == "foo", y => y.CompareTo("foo") > 0);
The extra lambda (our terminate predicate) needs to know how our data is sorted to know when to terminate. In this example, when it encounters anything after "foo" (which is what we're filtering by), it will force WhereFew to stop enumerating.
Well, there you have it. Hopefully next time you have some sorted data you need to filter in .Net, you will find this helpful. Enjoy!