Let's assume that I can only do this in SQLite.
I have two Tables.
State {
DistrictID: int
SubdistrictID: int
CityID: Int
name: varchar(36)
}
There's an obvious hierarchy in the State table. A State has a number of Districts, each District has a number of Subdistricts, and each Subdistrict has a number of Cities.
The other table is the number of Students in a City, where the Students.CityID = State.CityID
Students {
CityID: int
number: int
}
I want to get the total number of Students in District 221.
QUERY: SELECT sum(students.number) from State LEFT JOIN Students students ON State.CityID = students.CityID WHERE State.DistrictID = 221 GROUP BY State.DistrictID
So far so good, this does work. I get the total number of students from across all the cities that are in District 221.
Here's the complication.
Some Cities are SHARED between Subdistricts. In this case, a District has one City that's shared across two Subdistricts. This is reflected in the State table.
State Table
Row0: DistrictID: 221; SubDistrictID: 332; CityID: 554
Row1: DistrictID: 221; SubDistrictID: 332; CityID: 555
Row2: DistrictID: 221; SubDistrictID: 333; CityID: 554
Row3: DistrictID: 221; SubDistrictID: 333; CityID: 557
The Rows 0 and 2 have the same city (554) shared across two Subdistricts - 332 and 333.
In this case, the above SQL query inflates the SUM() value, because the same city is counted TWICE.
How can I solve this duplication without changing the schema of the table? The schema is technically faulty, but it reflects a realistic situation. I tried using distinct, but it doesn't seem to fit this purpose and doesn't work.
You can use select distinct to get one reference to each city in the district before doing the join:
select sum(s.number)
from (select distinct cityid
      from state
      where districtid = 221
     ) c left join
     students s
     on s.cityid = c.cityid
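
As a quick sanity check, here is a minimal sketch that runs both queries against an in-memory SQLite database from Python's standard sqlite3 module. The State rows are the four from the question; the student counts (100, 20, 3) are made up for illustration, and the sketch assumes Students has one row per city.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE State (DistrictID INT, SubdistrictID INT, CityID INT, name VARCHAR(36));
CREATE TABLE Students (CityID INT, number INT);

-- The four rows from the question: city 554 appears under two subdistricts.
INSERT INTO State (DistrictID, SubdistrictID, CityID) VALUES
  (221, 332, 554), (221, 332, 555), (221, 333, 554), (221, 333, 557);

-- Hypothetical student counts, not taken from the question.
INSERT INTO Students (CityID, number) VALUES (554, 100), (555, 20), (557, 3);
""")

# Original query: city 554 is joined once per State row, so it is counted twice.
naive = conn.execute("""
    SELECT sum(students.number)
    FROM State LEFT JOIN Students students ON State.CityID = students.CityID
    WHERE State.DistrictID = 221
    GROUP BY State.DistrictID
""").fetchone()[0]

# Deduplicate the city ids first, then join each city exactly once.
deduped = conn.execute("""
    select sum(s.number)
    from (select distinct cityid
          from state
          where districtid = 221
         ) c left join
         students s
         on s.cityid = c.cityid
""").fetchone()[0]

print(naive)    # 223 = 100 + 20 + 100 + 3
print(deduped)  # 123 = 100 + 20 + 3
```

With these made-up numbers, the difference between the two results is exactly the count for the shared city 554.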