Consult 2 different elements in the same field and the same table

advertisements

I'm very new to SQL and I don't know how to query 2 different items within the same field and the same table. I'm writing this in Excel VBA using SQL via oledb to attach to a PostGreSQL datasource

Basically I have 2 queries that I need to combine into one query. The first query is the primary group. I need to first find all those people with the code C10%. Then of those with C10 who also have the code R110%

The codes are in the srch table and the people names are in the person table, these are joined by master_id=p.entity_id

Here are the 2 queries I need to combine:

Dim DIAG As String
    DIAG = "SELECT DISTINCT master_id, eventdate, code, term, surname, forename " _
    & "FROM srch INNER JOIN person p ON master_id=p.entity_id " _
    & "WHERE code LIKE 'C10..%' " _
    & "ORDER BY master_id "

Dim DIAG As String
    DIAG = "SELECT DISTINCT master_id, eventdate, code, term, surname, forename " _
    & "FROM srch INNER JOIN person p ON master_id=p.entity_id " _
    & "WHERE code LIKE 'R110%' " _
    & "ORDER BY master_id "

The tables have 100 of rows and each has a master_id that identifies the person. Therefore row would be master_id = 1 and code = C10. The next could be master_id= 1 and code R110.

You are correct in that different codes cannot exist on the same row

Does this help.

person table
entity_id | surname | forename
1         | Smith   | John
2         | Mouse   | Mickey
3

srch table
master_id | code | term | eventdate
1         |C10   |  DM  | 01/01/2000
2         |R110  |  AL  | 01/01/2001
1         |R110  |  AL  | 01/01/2002

I need to find person 1

Result
master_id|code|term| eventdate |surname|forename
   1     |R110| AL | 01/01/2002| Smith | John


Since the column references aren't qualified, we can't tell which table each column reference refers to.


SUGGESTION: as an aid to future readers of the SQL, please consider qualifying all column references with the table name, or even better, a short (unique) table alias. Then the future reader won't be scouring the table definitions to figure out which table contains which column.) Also consider what is going to happen with this SQL statement when a column of the same name is added to another table used in the query.)


We'll just have to guess at which table contains which columns.

I'd suggest a pattern using a GROUP BY clause, rather than DISTINCT.

And I suggest computing an aggregate on the result of an expression that tests whether an included row satisfies the particular condition.

And then performing a test on the aggregate, to see if any rows existed (within the group) where the condition evaluated to TRUE.

As an example: (not tested)

SELECT s.master_id
     , s.eventdate
     , s.code
     , s.term
     , p.surname
     , p.forename
  FROM srch s
 INNER
  JOIN person p
    ON p.entity_id = s.master_id
 WHERE ( s.code LIKE 'C10..%' OR s.code LIKE 'R110%' )
 GROUP
    BY s.master_id
     , s.eventdate
     , s.code
     , s.term
     , p.surname
     , p.forename
HAVING SUM(CASE WHEN s.code LIKE 'C10..%' THEN 1 ELSE 0 END) > 0
   AND SUM(CASE WHEN s.code LIKE 'R110%'  THEN 1 ELSE 0 END) > 0
 ORDER BY s.master_id

The CASE expressions will evaluate to either 1 or 0, for each row. The SUM() aggregate will total the 1s and 0s up. The return from the aggregate will be greater than zero if there was any row (within a "group") that satisfied the condition.

If there are no rows (within the group) with code LIKE 'R110%', the SUM() will evaluate to zero, and the comparison will evaluate to FALSE, and the row will not be returned.

NOTE: The comparison of the aggregates is in a HAVING clause because the results from the aggregates is not available when the conditions in the WHERE clause are evaluated, when the rows are accessed.

FOLLOWUP

Doh! That query above isn't going to return rows. That's my bad. There's a reason we test against some test cases. It helps us identify doofus problems like the one in the query I suggested above.

It's impossible for that query to return any rows. The code column (as I hadn't really noticed) is in the GROUP BY clause. So at least one of the aggregate functions in the HAVING clause is guaranteed to evaluate to zero.

(My problem was that I hadn't noticed that code was a column in the GROUP BY. Doh!)


If all of the columns in the SELECT list need to match, except for the "code" column... (I hate to use an expensive correlated subquery when we don't have to...) we could add "EXISTS (correlated subquery).

If master_id is a foreign key reference to entity_id in person, and entity_id is the primary key of person... we could put off the join operation until after we had the results from srch.

Does term and event also need to match, or just the code? How we write the query depends on that...


Based on the responses in the comments, term and event_date don't need to match. We're looking for rows in srch for the same person (master_id) that have at least one row with the C10 code and at least one row with the R110 code.

Identifying those values of master_id follows the same pattern in the query above, using the GROUP BY and conditional tests on aggregates in the HAVING clause.

This is a query that should return the master_id values that have both a C10 code and an R110 code. This is something we can test... it doesn't return the whole resultset, in only gets us the master_id values we want to return:

SELECT r.master_id
  FROM srch r
 WHERE ( r.code LIKE 'C10..%' OR r.code LIKE 'R110%' )
 GROUP
    BY r.master_id
HAVING SUM(CASE WHEN r.code LIKE 'C10..%' THEN 1 ELSE 0 END) > 0
   AND SUM(CASE WHEN r.code LIKE 'R110%'  THEN 1 ELSE 0 END) > 0

Once we get that, we can use that query as an inline view... wrap in parens, assign an alias and reference it like it was a table. For example:

SELECT q.*
  FROM ( SELECT r.master_id
           FROM srch r
          WHERE ( r.code LIKE 'C10..%' OR r.code LIKE 'R110%' )
          GROUP
             BY r.master_id
         HAVING SUM(CASE WHEN r.code LIKE 'C10..%' THEN 1 ELSE 0 END) > 0
            AND SUM(CASE WHEN r.code LIKE 'R110%'  THEN 1 ELSE 0 END) > 0
       ) q

We should test whether PostgreSQL will run that. If we can't get that to run, there's no point in building on it. Once we confirm that runs, we can add a join to the srch table, get the rows that have a matching master_id with a C10 or R110 code.

SELECT q.master_id
     , s.code
     , s.term
     , s.event_date
  FROM ( SELECT r.master_id
           FROM srch r
          WHERE ( r.code LIKE 'C10..%' OR r.code LIKE 'R110%' )
          GROUP
             BY r.master_id
         HAVING SUM(CASE WHEN r.code LIKE 'C10..%' THEN 1 ELSE 0 END) > 0
            AND SUM(CASE WHEN r.code LIKE 'R110%'  THEN 1 ELSE 0 END) > 0
       ) q
  JOIN srch s
    ON s.master_id = q.master_id
   AND s.code LIKE 'R110%'

We can add a join to the person table, to retrieve the given name and surname by primary key lookup.

  JOIN person p
    ON p.entity_id = s.master_id

add the appropriate column references to the SELECT list. The original queries had DISTINCT keyword. We can add that, or we could add GROUP BY clause. Whichever.

We can reference master_id from either the inline view or the srch table, or even the entity_id from the person table. The join conditions guarantee us they will all be non-null and equal to each other.

And wind up with something like this (desk checked only, not tested):

SELECT s.master_id
     , s.code
     , s.term
     , s.event_date
     , p.surname
     , p.forename
  FROM ( SELECT r.master_id
           FROM srch r
          WHERE ( r.code LIKE 'C10..%' OR r.code LIKE 'R110%' )
          GROUP
             BY r.master_id
         HAVING SUM(CASE WHEN r.code LIKE 'C10..%' THEN 1 ELSE 0 END) > 0
            AND SUM(CASE WHEN r.code LIKE 'R110%'  THEN 1 ELSE 0 END) > 0
       ) q
  JOIN srch s
    ON s.master_id = q.master_id
   AND s.code LIKE 'R110%'
  JOIN person p
    ON p.entity_id = s.master_id
 GROUP
    BY s.master_id
     , s.code
     , s.term
     , s.event_date
     , p.surname
     , p.forename
 ORDER
    BY s.master_id
     , s.code

And, if I haven't made some other doofus mistake again, I expect this will return the result specified by OP. (I've attempted to provide a few comments along the way, about how we went about building the query.)

I'd be interested in finding out how big of smoke ball this one makes.

ANOTHER FOLLOWUP

As an alternative, since we don't need to return the C10 rows, the query could actually be a little simpler. The inline view could just return us master_id values related to the C10 codes, and we can dispense with the HAVING clause with tests on the aggregates. This should return a result equivalent to the one above, perhaps even a little faster:

SELECT s.master_id
     , s.code
     , s.term
     , s.event_date
     , p.surname
     , p.forename
  FROM ( SELECT r.master_id
           FROM srch r
          WHERE r.code LIKE 'C10..%'
          GROUP
             BY r.master_id
       ) q
  JOIN srch s
    ON s.master_id = q.master_id
   AND s.code LIKE 'R110%'
  JOIN person p
    ON p.entity_id = s.master_id
 GROUP
    BY s.master_id
     , s.code
     , s.term
     , s.event_date
     , p.surname
     , p.forename
 ORDER
    BY s.master_id
     , s.code