Deleted records setll reade

  • Deleted records setll reade

    I feel like I have read this before on here, but I cannot seem to find it. We have an older file that has 50 million records in it and 10 million deleted records; we are going to get this cleaned up. We also have a program that does a SETLL and READE (both keyed) on it twice for each customer and then writes out a record for each customer. This was performing horribly. I copied the file to a test library to get rid of the deleted records, ran the program, and it performed well.
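
    (As an aside, record and deleted-record counts like those above are the kind of thing DSPFD reports per member; MYLIB/MYFILE below is a placeholder name, not the poster's real file:)

        DSPFD FILE(MYLIB/MYFILE) TYPE(*MBR)  /* member description, including  */
                                             /* current and deleted rcd counts */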

    The question is: I thought that if you used a key on SETLL and READE, it would skip all the deleted records. Am I incorrect on this?
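
    (For concreteness, here is a minimal free-form RPG sketch of the kind of keyed SETLL/READE loop being described; CUSTHIST and custKey are made-up names standing in for the real file and key field:)

        **free
        ctl-opt dftactgrp(*no);

        // Keyed physical file (stand-in name)
        dcl-f CUSTHIST keyed usage(*input);

        dcl-s custKey char(6) inz('ABC123');  // stand-in customer key

        setll (custKey) CUSTHIST;      // position to the first record for this key
        reade (custKey) CUSTHIST;      // read the next record with an equal key
        dow not %eof(CUSTHIST);
           // ... process this customer's record, write the output record ...
           reade (custKey) CUSTHIST;
        enddo;

        *inlr = *on;
        return;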

  • #2
    I've not heard that. Tables with lots of deleted rows can perform poorly. If you change the table to REUSEDLT(*YES), that may help in the future after the reorg. And only do this if you don't use OVRDBF EOFDLY (which is not commonly used).
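
    (Something like the following, with MYLIB/MYFILE as placeholder names: RGZPFM does the cleanup, and CHGPF REUSEDLT(*YES) makes the database reuse deleted-record slots on future inserts so the gaps don't build back up:)

        RGZPFM FILE(MYLIB/MYFILE)                 /* remove the deleted records */
        CHGPF  FILE(MYLIB/MYFILE) REUSEDLT(*YES)  /* reuse deleted slots on     */
                                                  /* future inserts             */

    Note that with REUSEDLT(*YES) new records no longer always arrive at the end of the file, which is why it conflicts with techniques such as OVRDBF EOFDLY that depend on arrival order.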



    • #3
      AFAIK keyed access will skip the deleted rows while unkeyed access will not.

      Birgitta



      • #4
        Originally posted by jj_dahlheimer View Post
        The question is: I thought that if you used a key on SETLL and READE, it would skip all the deleted records. Am I incorrect on this?
        I believe you are correct... the deleted records will be 'skipped'. Something in the back of my brain is telling me that when a program deletes a record, it fills the record with X'FF', which naturally moves that record to the end of a group of records when the file is read in keyed order.

        I'm thinking that your performance problem comes from the fact that when the program looks for key 'ABC123', those records are scattered throughout the 50 million records, so the program reads relative record number 10,000, then RRN 1,000,493, then 48,279,483, then 26,485... and so on and on. When the file is copied to a different library and thereby reorganized, and your program looks for key 'ABC123', all of those records could have been moved to consecutive relative record numbers, which would greatly improve the access time for each group of records.

        Of course, this explanation is only strictly true if 'ABC123' is the key of the base physical file itself. Even if it is not, you could still see a performance increase simply because the file has been reorganized and all the records (and logical indexes) have been cleaned up.

        (Edit: My simplistic explanation does not take 'record blocking' into account... in my example above, your first read not only loads RRN 10,000 into the file buffer, it might also load relative records 10,001 through 11,000. So, under the covers, the machine is also reading 1,000 'extra' records that your program does not use for each READE your loop processes.)

        Best Regards,

        Fred Williams



        • #5
          Either way we are going to get the deleted records cleaned up; I was just surprised at the performance difference. There are no logicals involved, it is just a keyed physical file. Running against the file with the deleted records, it only got through about 6,000 customer records; without the deleted records, it got through about 1.3 million. Thanks.



          • #6
            Even if a READx statement "skips" a deleted record and the physical file is in key sequence, blocks of records are likely to contain deleted-record 'gaps', so physical blocks will be transferred more often than needed. The physical "reads" often matter more for performance than the logical ones, and they definitely take longer than moving to a different position in a large I/O buffer.
            Tom

            There are only two hard things in Computer Science: cache invalidation, naming things and off-by-one errors.

            Why is it that all of the instruments seeking intelligent life in the universe are pointed away from Earth?



            • #7
              I second Fred's explanation. If you used the CPYF command to copy the file to the test library and didn't specify FROMRCD(1), then not only did the deleted records get removed, but the records were copied in keyed sequence, so the resulting test file was in keyed order. If you want to try the test again and isolate the performance effect of just removing the deleted records, use the FROMRCD(1) parameter on the CPYF command; it will copy the records in the physical sequence of the production file, without the deleted records.
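
              (In command form, with PRODLIB/TESTLIB/MYFILE as placeholder names:)

                  /* From-file is keyed and FROMRCD is not given: records are */
                  /* copied in keyed order, and deleted records are dropped.  */
                  CPYF FROMFILE(PRODLIB/MYFILE) TOFILE(TESTLIB/MYFILE) +
                       CRTFILE(*YES) MBROPT(*REPLACE)

                  /* FROMRCD(1): records are copied in arrival (RRN) order,   */
                  /* still dropping the deleted records.                      */
                  CPYF FROMFILE(PRODLIB/MYFILE) TOFILE(TESTLIB/MYFILE) +
                       CRTFILE(*YES) MBROPT(*REPLACE) FROMRCD(1)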



              • #8
                Bingo, it was not the deleted records. I didn't understand Fred's explanation until I tested and re-read his response a few times. So is a reorg the only way to get those keys grouped back together?



                • #9
                  It's not the only way, but it's the most practical (it does it in one step instead of multiple steps). To get the records into a physical sequence that matches the key, you have to use the KEYFILE(*FILE) parameter on the RGZPFM command; otherwise the deleted records are removed but the physical sequence of the data is not changed.
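
                  (For example, with MYLIB/MYFILE as placeholder names:)

                      RGZPFM FILE(MYLIB/MYFILE)                 /* drops deletes only   */
                      RGZPFM FILE(MYLIB/MYFILE) KEYFILE(*FILE)  /* drops deletes and    */
                                                                /* resequences the data */
                                                                /* by the PF's own key  */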

                  Originally posted by Whitecat27 View Post
                  I believe you are correct... the deleted records will be 'skipped'. Something in the back of my brain is telling me that when a program deletes a record, it fills the record with X'FF', which naturally moves that record to the end of a group of records when the file is read in keyed order.
                  Actually, the contents of a deleted record are unchanged; it's just an internal flag that is set on. Otherwise, some utilities that can "undelete" a record would not work. All of the access paths are updated to remove the pointers to that record.



                  • #10
                    Originally posted by Brian Rusch View Post
                    Actually, the contents of a deleted record are unchanged; it's just an internal flag that is set on. Otherwise, some utilities that can "undelete" a record would not work. All of the access paths are updated to remove the pointers to that record.
                    Brian, thanks for the reminder... way back in my S/34 and (early) S/36 days the deleted records were filled with X'FF'; however, that is no longer true, which is why we can undelete records now-a-days... whilst posting, I'd forgotten that point.

                    Speaking of olde tyme stuff... if y'all remember the olde tyme command 'KEYSORT (filename)'... that existed because our venerable (now iSeries) machines kept the physical data on the disk drives wherever it landed, and IBM kept a separate area on the drive(s) holding the locations of the actual records in keyed order. So what ended up happening was that when a KEYSORT command ran, that key area of the disk was updated so that all the records for a key (i.e. 'ABC123') appeared to be grouped together. Many people actually had their nightly procedures KEYSORT all their main production files... I can only imagine the confusion and complaints that would happen now-a-days if we all processed the main production files through CPYF each night.

                    (Ahhh yes, the good old days) LOL!

                    So... Brian's explanation of using CPYF, and its end result, is really very, very similar to KEYSORT... with the difference being that CPYF moves the actual records, whereas KEYSORT only changed the area of the disk that held the file keys... or something like that, anyway.


                    Best Regards,

                    Fred Williams

