What is the most efficient way to perform large and slow batch work on GAE


Say I have retrieved a list of objects from NDB. I have a method that updates the state of these objects, which I have to call every 15 minutes. Each update takes ~30 seconds due to API calls it has to make.

How would I go ahead and process a list of >1,000 objects?

Example of an approach that would be far too slow (note that `object` shadows the built-in, so a different name is used):

my_objects = [...]  # list of entities to process
for obj in my_objects:
    obj.process_me()  # takes around 30 seconds per entity
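A quick back-of-the-envelope calculation (using the numbers from the question: 1,000 entities at ~30 seconds each) shows why the serial loop cannot work: the total is over 8 hours of work per 15-minute cycle, and a single App Engine request would hit its deadline long before finishing.

```python
# Cost of the serial loop above, with the numbers from the question.
entities = 1000
seconds_per_entity = 30

total_seconds = entities * seconds_per_entity
total_hours = total_seconds / 3600.0

print(total_seconds)           # 30000 seconds of work...
print(round(total_hours, 1))   # ...about 8.3 hours per 15-minute cycle
```

So the work has to be split into many small requests (option 1) or fanned out in parallel (option 2).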

Two options:

  • you can run a task with a query cursor that processes only N entities per run. When those are done and more entities remain, the task enqueues another task, passing it the next query cursor.
    Resources: query cursor, tasks

  • you can run a mapreduce job that processes all entities matched by your query in parallel (this may require more resources).
    Simple tutorial: MapReduce on App Engine made easy
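The first option can be sketched as follows. This is a runnable illustration of the cursor-chaining control flow only: `fake_datastore`, `fetch_page`, and `enqueue` are stand-ins I made up so the example runs anywhere; real code would use NDB's `Query.fetch_page()` and the task queue (`taskqueue.add()` or `deferred.defer()`).

```python
# Cursor-chaining pattern: each "task" processes one page of N entities,
# then enqueues a follow-up task carrying the cursor for the next page.

BATCH_SIZE = 3
fake_datastore = list(range(10))  # stands in for the entities an NDB query returns
processed = []

def enqueue(task, cursor):
    """Stand-in for taskqueue.add() / deferred.defer()."""
    task(cursor)

def fetch_page(cursor, batch_size):
    """Stand-in for ndb.Query.fetch_page(batch_size, start_cursor=...)."""
    page = fake_datastore[cursor:cursor + batch_size]
    next_cursor = cursor + len(page)
    more = next_cursor < len(fake_datastore)
    return page, next_cursor, more

def process_batch(cursor=0):
    page, next_cursor, more = fetch_page(cursor, BATCH_SIZE)
    for entity in page:
        processed.append(entity)  # entity.process_me() in real code
    if more:
        # Each run stays well under the request deadline, then hands
        # the cursor to the next task in the chain.
        enqueue(process_batch, next_cursor)

process_batch()
print(processed)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Pick N so that N × 30 seconds fits comfortably inside a single task's deadline.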
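For the second option, the mapreduce library shards the query and runs your mapper over each shard concurrently. The snippet below is not the mapreduce API; it only illustrates the parallel fan-out idea with a thread pool, since the 30-second calls are API-bound and mostly wait on the network. `slow_process` is a hypothetical stand-in for `process_me()`.

```python
# Illustration of parallel fan-out: API-bound work parallelizes well
# because each call spends most of its time waiting on the network.
from concurrent.futures import ThreadPoolExecutor

def slow_process(entity):
    # Stand-in for the ~30-second API-bound process_me() call.
    return entity * 2

entities = list(range(8))
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(slow_process, entities))

print(results)  # [0, 2, 4, 6, 8, 10, 12, 14]
```

With enough shards, wall-clock time drops from hours to roughly (entities / shards) × 30 seconds.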