Selecting parents with different values ​​in a child column in SQLAlchemy ORM

advertisements

I'm trying to trick sqlalchemy into doing something of a weird query. I need to write a query that selects parent objects in a one-to-many database where the value of a data column on a child object differs from the value on other children of the same parent. I know I can easily do this with python after doing the query, but for the sake of performance I'd like to use sqlalchemy's api to do so if possible.

The following minimal example explains what I'm trying to do. Much of it is setup, so only the last two blocks of code are really relevant to the question.

from sqlalchemy import Column, Integer, Enum, String, ForeignKey, create_engine
from sqlalchemy.orm import relationship, sessionmaker
from sqlalchemy.ext.declarative import declarative_base
from collections import namedtuple

Base = declarative_base()

Data_t = namedtuple('Data_t', 'data1, data2, data3')("1", "2", "3")

class Parent(Base):
    __tablename__ = 'parents_table'

    id = Column(Integer, primary_key=True)
    children = relationship('Child')

    def __repr__(self):
        return ("Parent %s" % self.id)

class Child(Base):
    __tablename__ = 'children_table'

    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey('parents_table.id'))
    parent = relationship('Parent', back_populates='children')

    data = Column(String, Enum(*Data_t._asdict().values(), name='data'))

    def __init__(self, _parent, data):
        self.parent = _parent
        self.parent_id = _parent.id
        self.data = data

    def __repr__(self):
        return "Child %s: %s" % (self.id, self.data)

# Create session
engine = create_engine('sqlite:///:memory:', echo=False)
engine.connect()
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)
Session.configure(bind=engine)
session = Session()

# Create parents
parent1, parent2 = Parent(), Parent()
session.add(parent1)
session.add(parent2)
session.commit()

# Create children with differing data column values
for i in range(3):
    child = Child(parent1, str(i + 1))
    session.add(child)
# Create children with the same data column value
for i in range(3):
    child = Child(parent2, str(1))
    session.add(child)
session.commit()

# Run query - This query pulls all the contents of the parent table
# whereas I'd like to only select the parents where the children
# data values differ - i.e. I'd like to select parent 1, but not parent 2
for parent in session.query(Parent).all():
    print(parent)
    for child in parent.children:
        print("   ", str(child))

This code outputs the following:

Parent 1
    Child 1: 1
    Child 2: 2
    Child 3: 3
Parent 2
    Child 4: 1
    Child 5: 1
    Child 6: 1

I'd like to be able to select only parent 1, but I haven't found anything in sqlalchemy's documentation that seems like it's capable of producing such a query.


The key to the filter is the COUNT(DISTINCT ..) SQL clause:

from sqlalchemy import func
parents = (
    session
    .query(Parent)
    .outerjoin(Child)
    .group_by(Parent)
    .having(func.count(Child.data.distinct()) > 1)
).all()

for parent in parents:
    print(parent)
    for child in parent.children:
        print("   ", str(child))

Bonus: If you really need to fetch Child as well, and you would like to do it in just one SQL statement, you can add following option to the query, so that also the children are retrieved in the same query:

    .options(joinedload(Parent.children))  # from sqlalchemy.orm import joinedload