Here are "a few" sql statements that can be used as a reference to help explain what I'm trying to accomplish:
So, without debating the design, imagine customer detail having 2bil + rows. What is the theoretically fastest way to return custdtl rows when at least 1 row has, for example, 'TYPE1'. I only want 1 row per customerid as the result.
When I run the following query, the result comes back VERY fast:
Remember, the actual data in production is 2bil+ rows.
So, when I attempt to join these results over a larger set of custmast records (by not specifying the a.custid=1), the query takes forever. I must not be thinking like the DB2 SQE because in my eyes, I only care about the first matching row between custmast and custdtl. I tried using CTE's to "pre-filtering" the custdtl then joining back to custmast. I tried using row_number() over(partition order by) to get the first row. I'm not sure what else I can do to get SQE to quit scanning when the first row in custdtl is found by custid and custdtltyp.
I hope this makes sense
PHP Code:
-- customer master
create table mylib.custmast (
customerid for column custid numeric(10,0) not null with default
, customername for column custname char(40) not null with default
, customerstatus for column custstatus char(1) not null with default 'A'
, primary key(custid)
)
rcdfmt custmastr;
insert into mylib.custmast(custid, custname) values (1,'Customer 1');
insert into mylib.custmast(custid, custname) values (2,'Customer 2');
insert into mylib.custmast(custid, custname) values (3,'Customer 3');
insert into mylib.custmast(custid, custname) values (4,'Customer 4');
insert into mylib.custmast(custid, custname) values (5,'Customer 5');
insert into mylib.custmast(custid, custname) values (6,'Customer 6');
insert into mylib.custmast(custid, custname) values (7,'Customer 7');
insert into mylib.custmast(custid, custname) values (8,'Customer 8');
-- customer detail
create table mylib.custdtl (
customerid for column custid numeric(10,0) not null with default
, customerdtltype for column custdtltyp char(20) not null with default
, customerdtlseq for column custdtlseq numeric(6,0) not null with default
, customerdtldata for column custdtldta char(80) not null with default
, primary key(custid, custdtltyp, custdtlseq)
)
rcdfmt custdtlr;
insert into mylib.custdtl(custid, custdtltyp, custdtlseq, custdtldta) values (1,'TYPE1',1,'XXXXXXX1');
insert into mylib.custdtl(custid, custdtltyp, custdtlseq, custdtldta) values (1,'TYPE1',2,'XXXXXXX2');
insert into mylib.custdtl(custid, custdtltyp, custdtlseq, custdtldta) values (1,'TYPE1',3,'XXXXXXX3');
insert into mylib.custdtl(custid, custdtltyp, custdtlseq, custdtldta) values (1,'TYPE1',4,'XXXXXXX4');
insert into mylib.custdtl(custid, custdtltyp, custdtlseq, custdtldta) values (1,'TYPE2',1,'XXXXXXX1');
insert into mylib.custdtl(custid, custdtltyp, custdtlseq, custdtldta) values (1,'TYPE2',2,'XXXXXXX2');
insert into mylib.custdtl(custid, custdtltyp, custdtlseq, custdtldta) values (1,'TYPE2',3,'XXXXXXX3');
insert into mylib.custdtl(custid, custdtltyp, custdtlseq, custdtldta) values (1,'TYPE3',1,'XXXXXXX1');
insert into mylib.custdtl(custid, custdtltyp, custdtlseq, custdtldta) values (1,'TYPE3',2,'XXXXXXX2');
insert into mylib.custdtl(custid, custdtltyp, custdtlseq, custdtldta) values (2,'TYPE2',1,'XXXXXXX1');
insert into mylib.custdtl(custid, custdtltyp, custdtlseq, custdtldta) values (2,'TYPE2',2,'XXXXXXX2');
insert into mylib.custdtl(custid, custdtltyp, custdtlseq, custdtldta) values (2,'TYPE2',3,'XXXXXXX3');
insert into mylib.custdtl(custid, custdtltyp, custdtlseq, custdtldta) values (2,'TYPE3',1,'XXXXXXX1');
insert into mylib.custdtl(custid, custdtltyp, custdtlseq, custdtldta) values (2,'TYPE3',2,'XXXXXXX2');
insert into mylib.custdtl(custid, custdtltyp, custdtlseq, custdtldta) values (3,'TYPE2',1,'XXXXXXX1');
insert into mylib.custdtl(custid, custdtltyp, custdtlseq, custdtldta) values (3,'TYPE2',2,'XXXXXXX2');
insert into mylib.custdtl(custid, custdtltyp, custdtlseq, custdtldta) values (3,'TYPE2',3,'XXXXXXX3');
insert into mylib.custdtl(custid, custdtltyp, custdtlseq, custdtldta) values (3,'TYPE3',1,'XXXXXXX1');
insert into mylib.custdtl(custid, custdtltyp, custdtlseq, custdtldta) values (3,'TYPE3',2,'XXXXXXX2');
When I run the following query, the result comes back VERY fast:
PHP Code:
select a.custid, b.custdtltyp
from
mylib.custmast a
join mylib.custdtl b
on a.custid = b.custid
where
a.custid = 1
and b.custdtltyp in ('TYPE1','TYPE2','TYPE4')
group by a.custid, b.custdtltyp
order by a.custid, b.custdtltyp
;
So, when I attempt to join these results over a larger set of custmast records (by not specifying the a.custid=1), the query takes forever. I must not be thinking like the DB2 SQE because in my eyes, I only care about the first matching row between custmast and custdtl. I tried using CTE's to "pre-filtering" the custdtl then joining back to custmast. I tried using row_number() over(partition order by) to get the first row. I'm not sure what else I can do to get SQE to quit scanning when the first row in custdtl is found by custid and custdtltyp.
I hope this makes sense
Comment