What is the best way to load multiple remote RSS feeds?


I'm working on a project where I need to load multiple (100+) remote RSS feeds, parse them, and query them for some keywords. This process is obviously time-consuming, and I'm looking for the best way to implement it.

My current implementation loads the feeds synchronously, because my asynchronous implementation using the TPL failed: too many tasks were created during the process, and it eventually threw an exception.
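For reference, one common way to keep using the TPL without launching every download at the same time is to cap how many run concurrently with `SemaphoreSlim`. This is a minimal sketch, not the asker's code: the `Throttled` class and `LoadAllAsync` method are hypothetical names, and the actual download is passed in as a delegate (for example a method like the `Load` method shown below). Every URL still gets a lightweight task, but at most `maxConcurrency` loads are in flight at once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: throttles concurrent loads with a SemaphoreSlim.
public static class Throttled
{
    // Runs `load` for every URL, but never lets more than `maxConcurrency`
    // calls be in flight at once. Results are returned in input order.
    public static async Task<TResult[]> LoadAllAsync<TResult>(
        IEnumerable<string> urls,
        Func<string, Task<TResult>> load,
        int maxConcurrency)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency, maxConcurrency))
        {
            var tasks = urls.Select(async url =>
            {
                // Wait asynchronously for a free slot before starting the load.
                await gate.WaitAsync().ConfigureAwait(false);
                try
                {
                    return await load(url).ConfigureAwait(false);
                }
                finally
                {
                    // Free the slot even if the load failed.
                    gate.Release();
                }
            }).ToList();

            // Task.WhenAll waits for every task and preserves input order.
            return await Task.WhenAll(tasks).ConfigureAwait(false);
        }
    }
}
```

Unlike fixed batches, a new download starts as soon as any slot frees up, so one slow feed does not hold back the rest of its batch.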

The async part for loading the remote feed looks like this:

/// <summary>
/// Loads the specified URL.
/// </summary>
/// <param name="url">The URL.</param>
/// <returns>The parsed feed, or <c>null</c> if the server returned a non-success status code.</returns>
/// <exception cref="ScanException">Unable to download rss feed from the specified url. Check the inner exception for more details.</exception>
protected async Task<XDocument> Load(string url)
{
    XDocument document = null;

    try
    {
        using (var client = new HttpClient())
        {
            HttpResponseMessage response = await client.GetAsync(url);

            if (response.IsSuccessStatusCode)
            {
                string content = await response.Content.ReadAsStringAsync();
                document = XDocument.Parse(content);
            }
        }
    }
    catch (Exception ex)
    {
        throw new ScanException(url, "Unable to download rss feed from the specified url. Check the inner exception for more details.", ex);
    }

    return document;
}
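Once a feed has been loaded as an `XDocument`, the keyword query mentioned at the start can be done with LINQ to XML. This is a sketch under stated assumptions: the feeds follow the RSS 2.0 layout (`rss/channel/item` with `title` and `description` children; Atom feeds use different element names), matching is a case-insensitive substring test, and `FeedQuery.FindMatchingTitles` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for querying a parsed RSS 2.0 feed.
public static class FeedQuery
{
    // Returns the titles of <item> elements whose title or description
    // contains any of the keywords (case-insensitive substring match).
    public static List<string> FindMatchingTitles(XDocument document, IEnumerable<string> keywords)
    {
        var terms = keywords.ToList();

        return document
            .Descendants("item")
            .Where(item =>
            {
                // The explicit (string) conversion yields null for a missing element.
                string text = ((string)item.Element("title") ?? "") + " " +
                              ((string)item.Element("description") ?? "");
                return terms.Any(k => text.IndexOf(k, StringComparison.OrdinalIgnoreCase) >= 0);
            })
            .Select(item => (string)item.Element("title") ?? "")
            .ToList();
    }
}
```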

I hope you can point me in the right direction so I can get this working well, performance-wise.

The final question is: What is the best way to load multiple remote RSS feeds?

Test code

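/// <summary>
/// Reads the feeds in batches, downloading up to <paramref name="batchSize"/> feeds concurrently.
/// </summary>
/// <param name="feeds">The feed URLs.</param>
/// <param name="storage">Where the parsed torrents are stored.</param>
/// <param name="batchSize">The number of feeds downloaded concurrently per batch.</param>
public void ReadFeedsByBatchAsync(string[] feeds, TorrentStorage storage, int batchSize = 8)
{
    var tasks = new List<Task<string>>(batchSize);
    var feedsLeft = feeds.Length;

    foreach (string feed in feeds)
    {
        // Start the download. The task has almost never completed at this
        // point, so its result must not be inspected until the batch is done.
        tasks.Add(this.client.GetStringAsync(feed));
        --feedsLeft;

        if (tasks.Count == batchSize || feedsLeft == 0)
        {
            var batchTasks = tasks.ToArray();
            tasks.Clear();

            // Block until the whole batch has finished; a failed download
            // surfaces here as an AggregateException.
            Task.WaitAll(batchTasks);

            // Every task has now run to completion, so Result does not block.
            foreach (var task in batchTasks)
            {
                XDocument document = XDocument.Parse(task.Result);
                var torrents = ProcessXmlDocument(document);

                storage.Store(torrents);
            }
        }
    }
}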


I have solved a similar issue in my fork of GitExtensions. I dispatch REST API calls in batches of 8: I create the tasks for a batch and then call Task.WaitAll on them before starting the next batch. It is a bit simplistic, but it does the job without complicating the code too much:

https://github.com/PombeirP/gitextensions/blob/BuildServerIntegration/Plugins/BuildServerIntegration/TeamCityIntegration/TeamCityAdapter.cs#L178.
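The batching idea above can also be written without blocking a thread by replacing `Task.WaitAll` with `await Task.WhenAll`. This is a minimal sketch, not the code from the linked repository: `Batched.LoadInBatchesAsync` is a hypothetical name, and the actual download is passed in as a delegate so the helper is independent of how each feed is fetched.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper: loads URLs in consecutive batches.
public static class Batched
{
    // Splits the URLs into batches of `batchSize`, starts every load in a
    // batch, and awaits the whole batch before starting the next one.
    public static async Task<List<TResult>> LoadInBatchesAsync<TResult>(
        IReadOnlyList<string> urls,
        Func<string, Task<TResult>> load,
        int batchSize = 8)
    {
        if (batchSize < 1)
            throw new ArgumentOutOfRangeException(nameof(batchSize));

        var results = new List<TResult>(urls.Count);
        for (int offset = 0; offset < urls.Count; offset += batchSize)
        {
            // Start up to batchSize loads; they run concurrently.
            Task<TResult>[] batch = urls
                .Skip(offset)
                .Take(batchSize)
                .Select(load)
                .ToArray();

            // Non-blocking counterpart of Task.WaitAll: the calling thread is
            // released while the batch is in progress.
            results.AddRange(await Task.WhenAll(batch).ConfigureAwait(false));
        }
        return results;
    }
}
```

Results come back in the same order as the input URLs, because `Task.WhenAll` preserves the order of the tasks it is given.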

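One thing I would suggest is reusing a single HttpClient instance. HttpClient is designed to be instantiated once and reused; creating (and disposing) a new instance for every request is unnecessary overhead and, under heavy load, can exhaust the available sockets.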
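A sketch of what reusing the client could look like, based on the question's `Load` method. Assumptions: `FeedLoader` and `LoadAsync` are hypothetical names, the 30-second timeout is an arbitrary placeholder, and the `ScanException` wrapping from the original is left out to keep the example self-contained. `GetAsync` can safely be called concurrently on one `HttpClient` instance.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;
using System.Xml.Linq;

// Hypothetical loader that shares one HttpClient across all requests.
public sealed class FeedLoader
{
    // Created once for the whole application and never disposed per request,
    // so its connection pool is reused between feeds.
    private static readonly HttpClient SharedClient = new HttpClient
    {
        Timeout = TimeSpan.FromSeconds(30) // assumption: tune for your feeds
    };

    private readonly HttpClient client;

    public FeedLoader() : this(SharedClient) { }

    // Lets a preconfigured (or test) client be injected instead.
    public FeedLoader(HttpClient client)
    {
        if (client == null)
            throw new ArgumentNullException(nameof(client));
        this.client = client;
    }

    // Returns null for a non-success status code, like the original Load.
    public async Task<XDocument> LoadAsync(string url)
    {
        // Dispose the response (not the client) once the body has been read.
        using (HttpResponseMessage response = await client.GetAsync(url).ConfigureAwait(false))
        {
            if (!response.IsSuccessStatusCode)
                return null;

            string content = await response.Content.ReadAsStringAsync().ConfigureAwait(false);
            return XDocument.Parse(content);
        }
    }
}
```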