I'm trying to trick sqlalchemy into doing something of a weird query. I need to write a query that selects parent objects in a one-to-many database where the value of a data column on a child object differs from the value on other children of the same parent. I know I can easily do this with python after doing the query, but for the sake of performance I'd like to use sqlalchemy's api to do so if possible.
The following minimal example explains what I'm trying to do. Much of it is setup, so only the last two blocks of code are really relevant to the question.
from sqlalchemy import Column, Integer, Enum, String, ForeignKey, create_engine from sqlalchemy.orm import relationship, sessionmaker from sqlalchemy.ext.declarative import declarative_base from collections import namedtuple Base = declarative_base() Data_t = namedtuple('Data_t', 'data1, data2, data3')("1", "2", "3") class Parent(Base): __tablename__ = 'parents_table' id = Column(Integer, primary_key=True) children = relationship('Child') def __repr__(self): return ("Parent %s" % self.id) class Child(Base): __tablename__ = 'children_table' id = Column(Integer, primary_key=True) parent_id = Column(Integer, ForeignKey('parents_table.id')) parent = relationship('Parent', back_populates='children') data = Column(String, Enum(*Data_t._asdict().values(), name='data')) def __init__(self, _parent, data): self.parent = _parent self.parent_id = _parent.id self.data = data def __repr__(self): return "Child %s: %s" % (self.id, self.data) # Create session engine = create_engine('sqlite:///:memory:', echo=False) engine.connect() Base.metadata.create_all(engine) Session = sessionmaker(bind=engine) Session.configure(bind=engine) session = Session() # Create parents parent1, parent2 = Parent(), Parent() session.add(parent1) session.add(parent2) session.commit() # Create children with differing data column values for i in range(3): child = Child(parent1, str(i + 1)) session.add(child) # Create children with the same data column value for i in range(3): child = Child(parent2, str(1)) session.add(child) session.commit() # Run query - This query pulls all the contents of the parent table # whereas I'd like to only select the parents where the children # data values differ - i.e. I'd like to select parent 1, but not parent 2 for parent in session.query(Parent).all(): print(parent) for child in parent.children: print(" ", str(child))
This code outputs the following:
Parent 1 Child 1: 1 Child 2: 2 Child 3: 3 Parent 2 Child 4: 1 Child 5: 1 Child 6: 1
I'd like to be able to select only parent 1, but I haven't found anything in sqlalchemy's documentation that seems like it's capable of producing such a query.
The key to the filter is the
COUNT(DISTINCT ..) SQL clause:
from sqlalchemy import func parents = ( session .query(Parent) .outerjoin(Child) .group_by(Parent) .having(func.count(Child.data.distinct()) > 1) ).all() for parent in parents: print(parent) for child in parent.children: print(" ", str(child))
Bonus: If you really need to fetch
Child as well, and you would like to do it in just one
SQL statement, you can add following option to the query, so that also the children are retrieved in the same query:
.options(joinedload(Parent.children)) # from sqlalchemy.orm import joinedload