How to improve the atrocious performance of EF4 when listing an association?

advertisements

I'm working on movie database app using Entity Framework 4 (database first) and it is taking 30 seconds to load about 8,200 rows into a List. There are three tables involved and when I use .Include(), performance degrades even more -- almost three minutes to load 8,200 rows. This is painful. Given I'm learning a lot of technologies at once, I'm hoping there is a simple fix. Here are the details:

Table 1 - Videos

This is a large table with 31 columns about about 7,800 rows of videos. It uses a Guid as its primary key.

Table 2 - ActorsVideos (a junction table)

This table has two columns: (1) VideoID column, and (2) ActorID column. Both columns are Guid's and are foreign keys into the Video and Actor tables, respectively. This table uses a composite primary key where both columns act as the primary key. EF4 does not model this table; however, it creates a navigation property. This table allows the user to assign any number of actors to a movie.

Table 3 - Actors

Has 16 columns about about 400 rows. Again, the primary key is a Guid.

In the code, I'm reading about 10 columns in the Videos table and then I read columns from the associated Actors table.

The C# code looks something like this:

var videos = context.Videos
foreach (var video in videos)
{
    // retrieve 10 or so properties from 'video'

    if (video.Actors.Count > 0)
    {
        foreach (var actor in video.Actors)
        {
            // retrieve some properties on the actor
        }
    }
}

I've tried adding .Include("Actors") after context.Videos and, as stated above, performance went from terrible to horrific.

I've looked at the SQL that is generated with the Include and it's about 2K of text given the number of columns that are in the video table.

Do I have to split the video table using a Master/Detail pattern? My next step is to cache the actors table and avoid the navigation/association property altogether. Any other suggestions to make this faster? It should run in less than 5-6 seconds in my opinion.

EDIT: The database is SQL Server CE 3.5.


You're asking Entity Framework to load a video along with all of it's actors, then you're doing the filtering in application code. Generally, you're pulling way more data than you need to. I would have SQL Server (or whatever DB you're using) pre-filter for you:

var videos = context.Videos;
var results = from video in videos
              where video.Actors.count > 10
              group video.Actors by video.VideoID into grouping
          select new
          {
              video.VideoID,
              video.Actors
          };

foreach (var group in results)
{
    foreach (var actor in group.Actors)
    {
        // do stuff
    }
}

Loading ~8200 rows, along with their associated rows in the videos table should be extremely fast. I did some development at my job where I had to deal with a 70+ million row test data table with a 5 table join. This ran in something like half a minute.

But, the reason why it ran that much faster than what you're doing is because I was filtering inside of SQL Server. The equivalent "procedural" program using EF took several minutes because I was doing filtering AFTER I pulled the rows from the database.

Think of it this way: You're not only asking for every row in your database, but you're pulling in data that you don't even need to multiple times.