ibmi-brunch-learn

Announcement

Collapse
No announcement yet.

Quickest Way To Fetch First Row In A Group With Filter

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Quickest Way To Fetch First Row In A Group With Filter

    Here are "a few" sql statements that can be used as a reference to help explain what I'm trying to accomplish:


    PHP Code:
    -- customer master
    create table mylib
    .custmast (
        
    customerid for column custid numeric(10,0not null with default
        , 
    customername for column custname char(40not null with default
        , 
    customerstatus for column custstatus char(1not null with default 'A'
        
    primary key(custid)
    )
    rcdfmt custmastr;

    insert into mylib.custmast(custidcustnamevalues (1,'Customer 1');
    insert into mylib.custmast(custidcustnamevalues (2,'Customer 2');
    insert into mylib.custmast(custidcustnamevalues (3,'Customer 3');
    insert into mylib.custmast(custidcustnamevalues (4,'Customer 4');
    insert into mylib.custmast(custidcustnamevalues (5,'Customer 5');
    insert into mylib.custmast(custidcustnamevalues (6,'Customer 6');
    insert into mylib.custmast(custidcustnamevalues (7,'Customer 7');
    insert into mylib.custmast(custidcustnamevalues (8,'Customer 8');

    -- 
    customer detail
    create table mylib
    .custdtl (
        
    customerid for column custid numeric(10,0not null with default
        , 
    customerdtltype for column custdtltyp char(20)  not null with default
        , 
    customerdtlseq for column custdtlseq numeric(6,0not null with default
        , 
    customerdtldata for column custdtldta char(80not null with default
        , 
    primary key(custidcustdtltypcustdtlseq)
    )
    rcdfmt custdtlr;

    insert into mylib.custdtl(custidcustdtltypcustdtlseqcustdtldtavalues (1,'TYPE1',1,'XXXXXXX1');
    insert into mylib.custdtl(custidcustdtltypcustdtlseqcustdtldtavalues (1,'TYPE1',2,'XXXXXXX2');
    insert into mylib.custdtl(custidcustdtltypcustdtlseqcustdtldtavalues (1,'TYPE1',3,'XXXXXXX3');
    insert into mylib.custdtl(custidcustdtltypcustdtlseqcustdtldtavalues (1,'TYPE1',4,'XXXXXXX4');
    insert into mylib.custdtl(custidcustdtltypcustdtlseqcustdtldtavalues (1,'TYPE2',1,'XXXXXXX1');
    insert into mylib.custdtl(custidcustdtltypcustdtlseqcustdtldtavalues (1,'TYPE2',2,'XXXXXXX2');
    insert into mylib.custdtl(custidcustdtltypcustdtlseqcustdtldtavalues (1,'TYPE2',3,'XXXXXXX3');
    insert into mylib.custdtl(custidcustdtltypcustdtlseqcustdtldtavalues (1,'TYPE3',1,'XXXXXXX1');
    insert into mylib.custdtl(custidcustdtltypcustdtlseqcustdtldtavalues (1,'TYPE3',2,'XXXXXXX2');

    insert into mylib.custdtl(custidcustdtltypcustdtlseqcustdtldtavalues (2,'TYPE2',1,'XXXXXXX1');
    insert into mylib.custdtl(custidcustdtltypcustdtlseqcustdtldtavalues (2,'TYPE2',2,'XXXXXXX2');
    insert into mylib.custdtl(custidcustdtltypcustdtlseqcustdtldtavalues (2,'TYPE2',3,'XXXXXXX3');
    insert into mylib.custdtl(custidcustdtltypcustdtlseqcustdtldtavalues (2,'TYPE3',1,'XXXXXXX1');
    insert into mylib.custdtl(custidcustdtltypcustdtlseqcustdtldtavalues (2,'TYPE3',2,'XXXXXXX2');

    insert into mylib.custdtl(custidcustdtltypcustdtlseqcustdtldtavalues (3,'TYPE2',1,'XXXXXXX1');
    insert into mylib.custdtl(custidcustdtltypcustdtlseqcustdtldtavalues (3,'TYPE2',2,'XXXXXXX2');
    insert into mylib.custdtl(custidcustdtltypcustdtlseqcustdtldtavalues (3,'TYPE2',3,'XXXXXXX3');
    insert into mylib.custdtl(custidcustdtltypcustdtlseqcustdtldtavalues (3,'TYPE3',1,'XXXXXXX1');
    insert into mylib.custdtl(custidcustdtltypcustdtlseqcustdtldtavalues (3,'TYPE3',2,'XXXXXXX2'); 
    So, without debating the design, imagine customer detail having 2bil + rows. What is the theoretically fastest way to return custdtl rows when at least 1 row has, for example, 'TYPE1'. I only want 1 row per customerid as the result.

    When I run the following query, the result comes back VERY fast:

    PHP Code:
    select a.custidb.custdtltyp
    from 
        mylib
    .custmast a
            join mylib
    .custdtl b
                on a
    .custid b.custid
    where
         a
    .custid 1
         
    and b.custdtltyp in ('TYPE1','TYPE2','TYPE4')
    group by a.custidb.custdtltyp
    order by a
    .custidb.custdtltyp

    Remember, the actual data in production is 2bil+ rows.

    So, when I attempt to join these results over a larger set of custmast records (by not specifying the a.custid=1), the query takes forever. I must not be thinking like the DB2 SQE because in my eyes, I only care about the first matching row between custmast and custdtl. I tried using CTE's to "pre-filtering" the custdtl then joining back to custmast. I tried using row_number() over(partition order by) to get the first row. I'm not sure what else I can do to get SQE to quit scanning when the first row in custdtl is found by custid and custdtltyp.

    I hope this makes sense

  • #2
    I'd start with adding this clause:
    Code:
    fetch first 1 row only

    Comment


    • #3
      Have you tried using Visual Explain to see if it recommends any indexes? You could write the above without table a, does this run better?

      PHP Code:
      select b.custidb.custdtltyp
      from 
           mylib
      .custdtl b
      where
           b
      .custdtltyp in ('TYPE1','TYPE2','TYPE4')
      group by b.custidb.custdtltyp
      order by b
      .custidb.custdtltyp 

      Comment


      • #4
        Did you run your query through visual explain?
        Did you check whether there are any index advices for the query?
        Did you create these indexes?

        You need in either way and index build over CUSTID for the CUSTMAST table and an index built over CUSTDTLTYP and CUSTID for the CUSTDTL table.
        May be an additional EVI built over the CSTDTLTYP will also help to speed up your query.

        Birgitta
        Last edited by B.Hauser; March 27, 2019, 12:16 PM.

        Comment


        • #5
          Originally posted by jtaylor___ View Post
          I'd start with adding this clause:
          Code:
          fetch first 1 row only
          Using my query above as an example, where would you add the FETCH FIRST 1 ROW ONLY? It seems when I add that, the entire query returns 1 result (which is wrong).

          I can say the equivalent of what i'm looking for in RPG is a setll/reade then quit (immediately). If no row found, return null. I don't want SQE to scan every record for each cust, just the first one for that custid.

          Another case in point is I need to know when something is not found (i need null returned if a match is not found). So "where exists....fetch first row" is also questionable to be used as a join.

          Comment


          • #6
            Originally posted by Scott M View Post
            Have you tried using Visual Explain to see if it recommends any indexes? You could write the above without table a, does this run better?

            PHP Code:
            select b.custidb.custdtltyp
            from
            mylib
            .custdtl b
            where
            b
            .custdtltyp in ('TYPE1','TYPE2','TYPE4')
            group by b.custidb.custdtltyp
            order by b
            .custidb.custdtltyp 
            Yes. Visual explain absolutely wanted me to create an index over custdtl on custdtltyp. However, (for reasons not worth explaining), I cannot do that. That being said, the detail file IS ALREADY keyed by custid, custdtltyp. There are 12 custdtltyp's i'm concerned about. I tried utilizing what i call a "loose join" by joining a dummy table ( containing all of my custdtltyp's i'm interested in ) back to the custmast so that every custmast record will be repeated with each type:

            PHP Code:
            with tmp as (
            select a.*, b.dummytype
            from mylib
            .custmast a
                    join mylib
            .dummytbl b on a.custid a.custid
            )
            select tmp.*, c.*
            from tmp left join mylib.custdtl c on tmp.custid c.custid and tmp.dummytype c.custdtltyp
            where c
            .custdtltyp is null 
            This still runs slower than I would expect an RPG program to run. (I hope that made sense...I typed it on-the-fly)

            Comment


            • #7
              I don't see how to edit my above post, so I am continuing it here.


              Here is the scenario i was referring to:



              PHP Code:
              create table mylib.dummytype (
                  
              testtype char(20)  not null with default
                  , 
              primary key(testtype)
              )
              rcdfmt dummytyper;

              insert into mylib.dummytype(testtypevalues ('TYPE1');
              insert into mylib.dummytype(testtypevalues ('TYPE2');
              insert into mylib.dummytype(testtypevalues ('TYPE3');
              insert into mylib.dummytype(testtypevalues ('TYPE5');
              insert into mylib.dummytype(testtypevalues ('TYPE7');

              with tmp as (
              select a.*, b.testtype
              from 
                  mylib
              .custmast a
                      join mylib
              .dummytype b
                          on a
              .custid a.custid
              )
              select 
                  tmp
              .*, c.*
              from 
                  tmp 
                      left join mylib
              .custdtl c
                          on tmp
              .custid c.custid and tmp.testtype c.custdtltyp
              where
                  c
              .custdtltyp is null 
              This still runs A LONG TIME when queried over 2bil records.

              (answering other responses soon)


              Comment


              • #8
                Originally posted by TheZenbudda View Post

                Using my query above as an example, where would you add the FETCH FIRST 1 ROW ONLY? It seems when I add that, the entire query returns 1 result (which is wrong).

                I can say the equivalent of what i'm looking for in RPG is a setll/reade then quit (immediately). If no row found, return null. I don't want SQE to scan every record for each cust, just the first one for that custid.

                Another case in point is I need to know when something is not found (i need null returned if a match is not found). So "where exists....fetch first row" is also questionable to be used as a join.
                Assuming we're talking about the same query, it would go like this:
                Code:
                select a.custid, b.custdtltyp
                from
                    mylib.custmast a
                        join mylib.custdtl b
                            on a.custid = b.custid
                where
                     a.custid = 1
                     and b.custdtltyp in ('TYPE1','TYPE2','TYPE4')
                group by a.custid, b.custdtltyp
                order by a.custid, b.custdtltyp
                [COLOR=#FF0000]fetch first 1 row only[/COLOR]
                ;

                Comment


                • #9
                  Originally posted by TheZenbudda View Post

                  Using my query above as an example, where would you add the FETCH FIRST 1 ROW ONLY? It seems when I add that, the entire query returns 1 result (which is wrong).

                  I can say the equivalent of what i'm looking for in RPG is a setll/reade then quit (immediately). If no row found, return null. I don't want SQE to scan every record for each cust, just the first one for that custid.

                  Another case in point is I need to know when something is not found (i need null returned if a match is not found). So "where exists....fetch first row" is also questionable to be used as a join.
                  Ok - I'm still confused - the thread asks the quickest way to get the first row - which is 'FETCH FIRST 1 ROW ONLY'... I do realize though that it's often to ask a full question in a summary statement like that.

                  So - if I understand what you are saying you want...
                  1.) Verify that there is at least one record for each customer that has a corresponding detail....
                  --->>> The Join handles that as you won't get any records from custmast that doesn't have at least one record in CUSTDTL.
                  2.) You want to only have the first custdtl record for each given customer in custmast? Or just for a specified customer - ie custid=1 in the example given?

                  If it's taking a great deal of time to return a result, I suspect you need at least one index built - most likely over custdtl. Have you looked at the index advisor on these two tables? Especially after you do this query once?

                  Comment


                  • #10
                    Originally posted by TheZenbudda View Post

                    Yes. Visual explain absolutely wanted me to create an index over custdtl on custdtltyp. However, (for reasons not worth explaining), I cannot do that. That being said, the detail file IS ALREADY keyed by custid, custdtltyp.
                    The database engine is telling you how to speed it up, if you can't take its advice then it can't be the most efficient. Don't know if you have the capability to copy the data to another library to do a proof of concept, then maybe you would have ammunition for being able to create an index.

                    Comment


                    • TheZenbudda
                      TheZenbudda commented
                      Editing a comment
                      God has spoken. No indexes. God has demanded that I get creative and solve the problem without an index. God has spoken.

                  • #11
                    Originally posted by jtaylor___ View Post

                    Assuming we're talking about the same query, it would go like this:
                    Code:
                    select a.custid, b.custdtltyp
                    from
                    mylib.custmast a
                    join mylib.custdtl b
                    on a.custid = b.custid
                    where
                    a.custid = 1
                    and b.custdtltyp in ('TYPE1','TYPE2','TYPE4')
                    group by a.custid, b.custdtltyp
                    order by a.custid, b.custdtltyp
                    [COLOR=#FF0000]fetch first 1 row only[/COLOR]
                    ;
                    Ok, now remove a.cust = 1. It will only return 1 row. I was looking for a way to get SQE to return the 1st matching row in custdtltyp. I need all rows in custmast that don't have certain custdtltyp's (remembering custdtl has 2bil rows).

                    And..unfortunately..no indexes can be created.




                    Comment


                    • #12
                      Originally posted by Rocky View Post

                      Ok - I'm still confused - the thread asks the quickest way to get the first row - which is 'FETCH FIRST 1 ROW ONLY'... I do realize though that it's often to ask a full question in a summary statement like that.

                      So - if I understand what you are saying you want...
                      1.) Verify that there is at least one record for each customer that has a corresponding detail....
                      --->>> The Join handles that as you won't get any records from custmast that doesn't have at least one record in CUSTDTL.
                      2.) You want to only have the first custdtl record for each given customer in custmast? Or just for a specified customer - ie custid=1 in the example given?

                      If it's taking a great deal of time to return a result, I suspect you need at least one index built - most likely over custdtl. Have you looked at the index advisor on these two tables? Especially after you do this query once?
                      Yeah, cannot build an index over this file. after enough index advisaries, DB2 will create an MTI (which is good enough..even though it goes away after an IPL).

                      Comment


                      • #13
                        Originally posted by Rocky View Post

                        Ok - I'm still confused - the thread asks the quickest way to get the first row - which is 'FETCH FIRST 1 ROW ONLY'... I do realize though that it's often to ask a full question in a summary statement like that.

                        So - if I understand what you are saying you want...
                        1.) Verify that there is at least one record for each customer that has a corresponding detail....
                        --->>> The Join handles that as you won't get any records from custmast that doesn't have at least one record in CUSTDTL.
                        2.) You want to only have the first custdtl record for each given customer in custmast? Or just for a specified customer - ie custid=1 in the example given?

                        Ughhhh! I cannot edit my posts so I cannot go back and add to my previous response. Sorry.

                        Anyway, to answer your 2 response:

                        1. No. I need a left join so that I can detect the absence of the types. These are the records I need to see.
                        2. Almost. I want only SQE to examine 1 record per custid and custdtltyp.

                        Comment


                        • #14
                          Ok! I knew fetch first 1 row only was something i've been looking for, but i couldn't make it work in the joins in the main query (or even CTE queries). Then it dawned on me. Use "fetch first 1 row only" in a select subquery:

                          PHP Code:
                          with tmp as (
                              
                          select 
                                  a
                          .custida.custnamea.custstatus
                                  ( 
                                      
                          select b.custdtltyp
                                      from mylib
                          .custdtl b
                                      where 
                                          a
                          .custid b.custid
                                          
                          and b.custdtltyp in ('TYPE1','TYPE2','TYPE5','TYPE6')
                                      
                          fetch first 1 row only    
                                  
                          ) as type
                              from 
                                  mylib
                          .custmast a
                              where 
                                  a
                          .custstatus 'A'
                          )
                          select *
                          from tmp
                          where type is null

                          This forces SQE to attempt to retrieve only 1 row per type that has matching keys with custmast. Very much like doing a setll/reade then quit reading the detail file. I completely forgot that you can use a subquery in a select statement to accomplish this task (freeing up the complications behind using a join in the FROM clause of the main query).

                          Comment


                          • #15
                            Alternatively...

                            If all you're after is customers that don't have a corresponding detail records of a list of types....

                            Code:
                            SELECT A.CUSTID, A.CUSTNAME, A.CUSTSTATUS FROM CUSTMAST A EXCEPTION JOIN CUSTDTL B ON (A.CUSTID=B.CUSTID) AND (B.CUSTDTLTYP IN (''TYPE1','TYPE2','TYPE5','TYPE6'))
                            This will return data from custmast that DO NOT have any corresponding records in CUSTDTL.

                            Comment


                            • TheZenbudda
                              TheZenbudda commented
                              Editing a comment
                              Rocky, thanks for your response and sorry for the delay in getting back with you.

                              I think you found the "shortcut" method I was looking for. You are specifying the exception join but in the join condition you are specifying the "filter" in the join-on clause. When I did my exception join, I coded my filter in the where clause, which of course, would end up returning 0 rows. The solution I presented above still works, but seems a lot more cumbersome.
                          Working...
                          X