LINQ to SQL and lifetime of objects, references to values


I came across an interesting bug with LINQ to SQL. Take a look at the code below, which is loosely translated from a LINQ to SQL query in a search engine I'm writing.

The goal of the query is to find any groups that contain the IDs "Joe", "Jeff", "Jim" in consecutive order.

Pay careful attention to the variables named localKeyword and localSequence. If you were to delete the declarations of these seemingly useless local variables and use the variables they are proxying (keyword and sequenceID) directly, you would find the query no longer works.

I'm still a beginner with LINQ to SQL, but it looks like it is passing all the locals as references. As a result, the query only sees the values the local variables hold at the moment the query is evaluated. In LINQ to SQL my query ended up looking like

SELECT * FROM INDEX ONE, INDEX TWO, INDEX THREE
  WHERE ONE.ID = 'Jim' and TWO.ID = 'Jim' and
    TWO.SEQUENCE = ONE.SEQUENCE + 2 and
    THREE.ID = 'Jim' and
    THREE.SEQUENCE = ONE.SEQUENCE + 2 and
    ONE.GROUP = TWO.GROUP and ONE.GROUP = THREE.GROUP

The query is of course paraphrased. What exactly is happening here? Is this a bug? I am asking mainly to better understand why this happens. You should find the code compiles in Visual Studio 2008.

using System;
using System.Collections.Generic;
using System.Text;
using System.Linq;

namespace BreakLINQ
{
    class Program
    {
        // Simple record used as the in-memory stand-in for the index table.
        public struct DataForTest
        {
            public int Sequence { get; set; }
            public string ID { get; set; }
            public string Group { get; set; }
        }
        static void Main(string[] args)
        {
            List<DataForTest> elements = new List<DataForTest>
            {
                new DataForTest() { Sequence = 0, ID = "John", Group="Bored" },
                new DataForTest() { Sequence = 1, ID = "Joe", Group="Bored" },
                new DataForTest() { Sequence = 2, ID = "Jeff", Group="Bored" },
                new DataForTest() { Sequence = 3, ID = "Jim", Group="Bored" },
                new DataForTest() { Sequence = 1, ID = "Jim", Group="Happy" },
                new DataForTest() { Sequence = 2, ID = "Jack", Group="Happy" },
                new DataForTest() { Sequence = 3, ID = "Joe", Group="Happy" },
                new DataForTest() { Sequence = 1, ID = "John", Group="Sad" },
                new DataForTest() { Sequence = 2, ID = "Jeff", Group="Sad" },
                new DataForTest() { Sequence = 3, ID = "Jack", Group="Sad" }
            };

            string[] order = new string[] { "Joe", "Jeff", "Jim" };
            int sequenceID = 0;
            var query = from item in elements
                        select item;
            foreach (string keyword in order)
            {
                if (sequenceID == 0)
                {
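                    // Copy the loop variable so the lambda captures a variable
                    // that belongs to this iteration only.
                    string localKeyword = keyword;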
                    query = from item in query
                            where item.ID == localKeyword
                            select item;
                }
                else
                {
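                    // Per-iteration copies: the query captures these variables,
                    // not the shared keyword and sequenceID.
                    string localKeyword = keyword;
                    int localSequence = sequenceID;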
                    query = from item in query
                            where (from secondItem in elements
                                   where secondItem.Sequence == item.Sequence + localSequence &&
                                         secondItem.ID == localKeyword
                                   select secondItem.Group).Contains(item.Group)
                            select item;
                }
                sequenceID++;
            }

            // The query is deferred; it only runs when it is enumerated here.
            foreach (DataForTest result in query)
            {
                Console.WriteLine("{0}, {1}, {2}", result.ID, result.Group, result.Sequence);
            }
        }
    }
}

After the code completes, evaluating the query should yield a single element: {"Joe", "Bored", 1}.


The reason this fails without the 'proxying' variables is that the lambda expressions in the LINQ query capture the variables themselves, not the values they hold at that moment. Without the proxies, every iteration of the loop references the same two variables (keyword, sequenceID), and because the query is not executed until it is enumerated, each of those references sees the same value: whatever is in those variables when the loop terminates (which is when you want us to evaluate 'query').
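
To make the capture behaviour concrete, here is a minimal sketch that is separate from the code in the question. The class name SharedCaptureSketch, the method CountMatches and the sample data are made up for illustration. The shared variable is deliberately declared outside the loop, so the effect should be the same regardless of compiler version (newer C# compilers give foreach a fresh loop variable per iteration, which VS 2008 does not).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration; not part of the original program.
public static class SharedCaptureSketch
{
    // Builds one deferred query per keyword, but every lambda captures the
    // same variable 'current', which is declared once outside the loop.
    public static List<int> CountMatches()
    {
        string[] ids = { "Joe", "Jeff", "Jim", "Jim" };
        string[] order = { "Joe", "Jeff", "Jim" };

        var queries = new List<IEnumerable<string>>();
        string current = ""; // a single variable shared by all iterations

        foreach (string keyword in order)
        {
            current = keyword;
            // Nothing is filtered yet: Where only records the lambda.
            queries.Add(ids.Where(id => id == current));
        }

        // The queries run here, after the loop, when 'current' is "Jim".
        // Each of the three queries therefore counts the "Jim" entries.
        return queries.Select(q => q.Count()).ToList(); // 2, 2, 2
    }
}
```

Calling SharedCaptureSketch.CountMatches() gives the same count three times, which mirrors the paraphrased SQL in the question where every comparison ended up using 'Jim'.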

The query behaves as expected with the proxies because the captured variables are uniquely declared per iteration of the loop; subsequent iterations do not modify the captured variables, because they are no longer in scope. The proxy variables are not useless at all. Furthermore, this behavior is by design; let me see if I can find a good reference link...
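
As a rough sketch of why the per-iteration copy helps, the following stand-alone example declares the captured variable inside the loop body. The names PerIterationCopySketch and CountMatches and the sample data are assumptions made for illustration and do not come from the question's code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration; not part of the original program.
public static class PerIterationCopySketch
{
    // Builds one deferred query per keyword; each lambda captures its own
    // 'localKeyword', because that variable is declared inside the loop body.
    public static List<int> CountMatches()
    {
        string[] ids = { "Joe", "Jeff", "Jim", "Jim" };
        string[] order = { "Joe", "Jeff", "Jim" };

        var queries = new List<IEnumerable<string>>();

        for (int i = 0; i < order.Length; i++)
        {
            // A new variable exists for every pass through the loop body,
            // so later iterations cannot overwrite what this lambda sees.
            string localKeyword = order[i];
            queries.Add(ids.Where(id => id == localKeyword));
        }

        // The queries still run only here, but each one reads its own copy.
        return queries.Select(q => q.Count()).ToList(); // 1, 1, 2
    }
}
```

Note that using order[i] directly inside the lambda would capture the single loop counter i instead, and by the time the queries run i would already equal order.Length.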