Double counting from a shared hierarchy with LEFT JOIN


Let's assume that I can only do this in SQLite.

I have two tables.

State {
  DistrictID: int
  SubdistrictID: int
  CityID: int
  name: varchar(36)
}

There's an obvious hierarchy in the State table. A State has a number of Districts, each District has a number of Subdistricts, and each Subdistrict has a number of Cities.

The other table holds the number of Students in each City, linked by Students.CityID = State.CityID.

Students {
  CityID: int
  number: int
}

I want to get the number of Students in District 221.

QUERY:
SELECT SUM(students.number)
FROM State
LEFT JOIN Students students ON State.CityID = students.CityID
WHERE State.DistrictID = 221
GROUP BY State.DistrictID

So far so good, this does work. I get the total number of students from across all the cities that are in District 221.

Here's the complication.

Some Cities are SHARED between Subdistricts. In this case, a District has one City that's shared across two Subdistricts. This is reflected in the State table.

State table
Row0: DistrictID: 221; SubdistrictID: 332; CityID: 554
Row1: DistrictID: 221; SubdistrictID: 332; CityID: 555
Row2: DistrictID: 221; SubdistrictID: 333; CityID: 554
Row3: DistrictID: 221; SubdistrictID: 333; CityID: 557

Rows 0 and 2 have the same City (554), shared across two Subdistricts, 332 and 333.

In this case, the above SQL query inflates the SUM() value, because the join matches City 554 once per State row and its Students are therefore counted TWICE.

How can I avoid this double counting without changing the table schema? The data model is technically flawed, but it reflects a real-world situation. I tried using DISTINCT, but it doesn't fit this purpose and doesn't work.
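
To make the double counting concrete, here is a minimal sketch that reproduces it with Python's built-in sqlite3 module and an in-memory database. The State rows are the four shown above; the student counts (10, 20, 30) are made-up values used only for illustration.

```python
import sqlite3

# In-memory SQLite database holding the two tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE State (DistrictID INT, SubdistrictID INT, CityID INT, name VARCHAR(36));
    CREATE TABLE Students (CityID INT, number INT);

    INSERT INTO State (DistrictID, SubdistrictID, CityID) VALUES
        (221, 332, 554),
        (221, 332, 555),
        (221, 333, 554),
        (221, 333, 557);

    -- Hypothetical student counts, chosen only for illustration.
    INSERT INTO Students (CityID, number) VALUES
        (554, 10),
        (555, 20),
        (557, 30);
""")

# The query from the question: City 554 appears in two State rows,
# so its Students row is joined (and summed) twice.
naive_total = conn.execute("""
    SELECT SUM(students.number)
    FROM State
    LEFT JOIN Students students ON State.CityID = students.CityID
    WHERE State.DistrictID = 221
    GROUP BY State.DistrictID
""").fetchone()[0]

print(naive_total)  # 70, although the three distinct cities only hold 10 + 20 + 30 = 60
```

With these assumed numbers the query returns 70 instead of 60, because the 10 students of City 554 are added once for Subdistrict 332 and once for Subdistrict 333.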

You can use select distinct to get one reference to each city in the district, before doing the join:

select sum(s.number)
from (select distinct cityid
      from state
      where districtid = 221
     ) c left join
     students s
     on s.cityid = c.cityid
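
As a quick check of this approach, the sketch below runs the deduplicating query for District 221 against an in-memory SQLite database through Python's sqlite3 module. The State rows come from the question; the student counts (10, 20, 30) are assumed values for illustration, not real data.

```python
import sqlite3

# In-memory SQLite database with the sample State rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE State (DistrictID INT, SubdistrictID INT, CityID INT, name VARCHAR(36));
    CREATE TABLE Students (CityID INT, number INT);

    INSERT INTO State (DistrictID, SubdistrictID, CityID) VALUES
        (221, 332, 554),
        (221, 332, 555),
        (221, 333, 554),
        (221, 333, 557);

    -- Hypothetical student counts, chosen only for illustration.
    INSERT INTO Students (CityID, number) VALUES
        (554, 10),
        (555, 20),
        (557, 30);
""")

# Reduce State to one row per city in the district first, then join,
# so each Students row can match at most one city reference.
fixed_total = conn.execute("""
    SELECT SUM(s.number)
    FROM (SELECT DISTINCT CityID
          FROM State
          WHERE DistrictID = 221
         ) c
    LEFT JOIN Students s ON s.CityID = c.CityID
""").fetchone()[0]

print(fixed_total)  # 60: cities 554, 555 and 557 are each counted once
```

Because the DISTINCT is applied to the city list before the join rather than to the joined rows, the shared City 554 contributes its students only once. Note that if no city in the district has a Students row, SUM() yields NULL (None in Python) rather than 0.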