ibmi-brunch-learn

Missing records at 4:30 AM


  • #16
    Well Red,

    What is the status of your investigation?
    Right now we have given you a lot of ideas where and how to search but given the vague
    description of your system we are reaching the limit of how to locate the cause of the problem.

    I fully understand that you don't want to publish it - seen from a security point of view and also from company rules.

    But it would be interesting to hear what you have found out so far.
    And when you find it, I would be very interested to hear what was the cause of the problem.
    One might even learn something new.

    Regards
    Peder



    • #17
      Peder,
      It has not been resolved, hence no update.
      Nothing has worked that has been suggested either here or internally from my colleagues or externally from friends.

      Very peculiar, comes and goes; no errors, no details, nothing. Just doesn't work all the time.
      Very simple process.

      Red.
      PS: I usually give thanks and praise upon a solution.
      Everyday's a school day, what grade are you in?



      • #18
        Maybe you could supply some more details?
        Are you retrieving the records via an RPG read or via SQL or something else (opnqryf etc etc)?



        • #19
          Perhaps modify the process to do some additional logging. E.g. the first thing it could do is copy ALL of the source data to another table, so you can verify that it was/was not all present.

          You could also journal the source tables so you can see if/when the source records were added/updated?



          • #20
            Gentlemen,
            I greatly appreciate your time, patience and possibilities presented and certainly do not want to alienate anyone but;
            There is no commitment control involved that I've added.
            There is no SQL involved.
            There are no file locks due to back ups or anyone being on the system except for the one user running this.
            It's only RPGLE (RPGIV).
            It's a simple process:
            A master control file determines clients to be used and begins with ...
            1. program "A" is called - it checks for required records.
            2. if no records exist, program "B" is called - it reads a client master file to get the clients selected in the control file.
            3. for each client, items are selected from a master item file and checked for certain values - if they pass scrutiny, they are selected and added to a work file.
            4. when program "B" ends, the records in the work file are retrieved by program "A" and loaded to a sub file.

            This is where the intermittent issue rears its head....
            Not all the required clients from the control file have been selected - there is nothing that I can see that could prevent this situation.
            As I stated, it's an intermittent issue.

            I've exhausted all obvious possibilities and came to you.
            Never have I encountered an issue like this in over 18 years of coding.

            This will be my last update until it happens again as it hasn't happened since I initiated the original post.
            I really thought the user was doing something strange, such as pressing an F key at the wrong time, but until the user sees the sub file load, the entire process is out of their reach.

            Red.
            Everyday's a school day, what grade are you in?



            • #21
              Good luck Red.
              I hope you find the cause and look forward to hearing about it.

              Remember the quote from Sherlock Holmes:
              How often have I said to you that when you have eliminated the impossible, whatever remains, however improbable, must be the truth?

              But gremlins in the system are always a pain in the ....

              Regards
              Peder



              • #22
                Just to round off the problem.

                Have you checked the indexes used? Do they point to the right file in the right library?
                For example, IndexA and IndexB should both point to LIB1/FILE, but IndexB actually points to FALSELIB/FILE.

                Another participant in this forum mentioned something about sequence of the received data.
                Could there be a difference between the "4:30" run and the following runs?
                You should check the sequence of the data - perhaps the F-specification in the program
                is missing the keyword that requests keyed sequence.

                Check also for a missing SETLL, or an incorrectly initialized key.

                Not to mention - do the programs end with *INLR set on?
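
A minimal free-form RPG sketch of those three checks (keyed access, explicit positioning with a freshly set key, and ending with *INLR on). The file CLIENTMST, record format CLIENTR, key field CMCLIENT and parameter ctlClient are hypothetical names, not taken from Red's programs.

```rpgle
**free
// Hypothetical names throughout: CLIENTMST, CLIENTR, CMCLIENT, ctlClient.
// KEYED makes the program read by key instead of arrival sequence.
dcl-f CLIENTMST usage(*input) keyed;

// The key arrives as a parameter, so it is set explicitly on every call
// and never left over from a previous call.
dcl-pi *n;
  ctlClient like(CMCLIENT) const;
end-pi;

setll ctlClient CLIENTR;       // position to the first record for this key
reade ctlClient CLIENTR;       // read only records whose key matches
dow not %eof(CLIENTMST);
  // ... check the items for this client ...
  reade ctlClient CLIENTR;
enddo;

*inlr = *on;                   // close files and reset storage on return
return;
```

Without the SETLL, a second call in the same job could start reading from wherever the previous call left the file cursor, which is one way a run can silently see fewer records.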

                Regards
                Peder



                • john.sev99
                  john.sev99 commented
                  It was me who mentioned the return order simply because we have had this issue recently after installing some group PTFs.
                  The issue initially came from CPYTOIMPF, where intermittently the records would not be returned in arrival sequence. There is now an ORDERBY parameter on this command which was not there in earlier OS releases, so our programs weren't using it. It was intermittent, as changing a record (change, add, delete) would nearly always correct the problem. The cause is that IBM are changing SQL to be more standard, and SQL doesn't guarantee the return order of records unless you specifically tell it. At any rate, our quick fix was to change the default for that parameter to *ARRIVAL until our programs are changed to use this new parameter.

                  This problem also affects a simple RUNQRY QRYFILE(x/x), as Query is using the SQE engine now: the same intermittent issue of records returned in a strange order. Unfortunately, there isn't a fix for this, as Query offers no means to sort records by RRN/arrival sequence. This will also affect OPNQRYF. Although IBM admitted that not being able to specify arrival sequence was an issue, their response was simply "working as designed" and "raise an RFE" (https://www.ibm.com/developerworks/r...e&CR_ID=131314).
                  Last edited by john.sev99; September 12, 2019, 02:52 PM.

              • #23
                Originally posted by redvan
                My program requires some basic data input; client# & date.
                Is there a date format issue? MM/DD/YY as opposed to DD/MM/YY? Maybe the date associated with a particular client is not as expected.

                Another possibility, has a program ended with LR off? It sounds like a file has been left open, perhaps by whatever program populates the files that your user's program only retrieves partial data from. Maybe the program your user is experiencing the problem with isn't the problem, the problem is with the program(s) populating the files his program reads. Check programs which have recently changed.



                • #24
                  Do you have something like sharing open data paths?
                  OVRDBF myFile SHARE(*YES)

                  This terrible thing was very popular 20+ years ago but the problem was
                  that many programmers didn't know what they were dealing with and
                  how they should program in order to implement this in a proper way.



                  • #25
                    maybe some additional suggestions :
                    - is program B closed with an *inlr = *on as Peder Udesen suggested ?
                    - when writing to the workfile in program B ... can u define the workfile with block(*no) so there is no buffering ?
                    - maybe u can use the QUSRMBRD-API to read the number of records in the workfile :
                    * in program A before calling program B
                    * in program A after calling program B
                    * in program B just before returning
                    and write that output with timestamp to a logging-file
                    - put some trigger-on-insert-and-delete on the workfile to see when a record is added (timestamp) and write it to a logging file too
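
A rough sketch of the QUSRMBRD record-count idea in free-form RPG. The library and file names (DATALIB/WORKFILE) are hypothetical, and the position of the record count in format MBRD0200 (offset 140, so position 141 in the data structure) is quoted from memory of the API documentation and should be verified before relying on it.

```rpgle
**free
// Prototype for the Retrieve Member Description API (required parameters only).
dcl-pr QUSRMBRD extpgm('QUSRMBRD');
  receiver    char(256) options(*varsize);
  receiverLen int(10)   const;
  formatName  char(8)   const;
  qualFile    char(20)  const;   // file name (10) + library name (10)
  member      char(10)  const;
  overrides   char(1)   const;   // '0' = ignore file overrides
end-pr;

// Only the field needed here is mapped.
// Assumption: "number of current records" sits at offset 140 of MBRD0200.
dcl-ds mbrd qualified len(256);
  currentRecords int(10) pos(141);
end-ds;

dcl-s recordsNow int(10);

// DATALIB/WORKFILE is a hypothetical name for the work file.
QUSRMBRD(mbrd : %size(mbrd) : 'MBRD0200'
       : 'WORKFILE  DATALIB   ' : '*FIRST' : '0');
recordsNow = mbrd.currentRecords;

// Write the program name, the step, recordsNow and %timestamp()
// to a logging file of your own here.

*inlr = *on;
return;
```

Calling this at the three points listed above gives a timestamped count to compare against what the sub file finally shows.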

                    or ... just wake up at 4:30 and do some debugging ... but i would prefer the solution that lets you sleep



                    • #26
                      Ok,
                      Explain why it doesn't work at 4:30 but does at 8:00 AM, 2:00 PM, a half hour later and beyond throughout the day and evening.

                      Same user, same workstation, same program.....

                      Read my posts regarding program flow and what the process doesn't contain.

                      Red
                      Everyday's a school day, what grade are you in?



                      • #27
                        Here's the thing red. There is nothing previously wrong with your process as you have described it to us. There could be a number of subtle reasons why. But without having all your source code in front of us to analyse, we can't see subtle things like that. All we can do is suggest possibilities, or suggest ways to debug.

                        If there's nothing obvious, then maybe something you assume must be working, isn't working. So test that everything is working.

                        For example, you say:
                        • "There are no file locks due to back ups or anyone being on the system except for the one user running this."
                        • "Not all the required clients from the control file have been selected - there is nothing that I can see that could prevent this situation"

                        So I figured, let's prove that is the case. So I suggested that you add journalling to that control file, so you have a log of all changes, so you can prove the records are present. Then we know that the table content is fine.


                        Also, something just occurred to me. If I understand properly your program B has a loop that is reading through clients from the control file, but sometimes it is not finding all the records? What happens if there is an error on a specific read? Does it go into MSGW, log an error and continue, ignore the error and skip to the next record, etc?
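
As an illustration of that last question, here is a minimal sketch of a read loop that makes an I/O error visible instead of letting it end the loop quietly. CTLFILE and CTLR are hypothetical names for the control file and its record format; how Red's program B actually reads the file is not known.

```rpgle
**free
// Hypothetical names: CTLFILE (control file), CTLR (its record format).
dcl-f CTLFILE usage(*input) keyed;

dcl-s readStatus int(10);

setll *loval CTLR;              // start from the first key
dow *on;
  read(e) CTLR;                 // (e) traps the error instead of halting the job
  if %error();
    readStatus = %status(CTLFILE);
    // Log readStatus with a timestamp here, then stop:
    // a partial run is now recorded rather than silent.
    leave;
  endif;
  if %eof(CTLFILE);
    leave;
  endif;
  // ... select the items for this client ...
enddo;

*inlr = *on;
return;
```

If the existing loop treats "error" and "end of file" the same way, a single failed read would produce exactly the symptom described: fewer clients than expected and no message anywhere.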



                        • Vectorspace
                          Vectorspace commented
                          "There is nothing previously wrong" - I meant to say "There is nothing obviously wrong"

                      • #28
                        i have been reading your program flow in your previous post(s)

                        Code:
                        A master control file determines clients to be used and begins with ...
                        1. program "A" is called - it checks for required records.
                        2. if no records exist, program "B" is called - it reads a client master file to get the clients selected in the control file.
                        3. for each client, items are selected from a master item file and checked for certain values - if they pass scrutiny, they are selected and added to a work file.
                        4. when program "B" ends, the records in the work file are retrieved by program "A" and loaded to a sub file.
                        i can't guarantee that no other process is involved here too which manipulates the data of the masterfile or workfile

                        so that's why i suggested you to :
                        - add some triggers (to the master-file and the work file) to see which program makes the insert/update/delete
                        - add some logging in the programs A and B (to see which steps are taken) :
                        * is program B called from program A ?
                        * what is number of records of the master-file/work file in every step ?

                        maybe u can even use some logging with a little program C (SQLRPGLE)
                        (if the masterfile is too big just add a where-clause to select only the clients that are going to be copied)

                        Code:
                        d PROGRAM_C       pi                
                        d  pprog                         1
                        d  pstep                         2 0
                        
                        exsr sr_start;
                        exsr sr_master;
                        exsr sr_work;
                        exsr sr_end;
                        
                        begsr sr_master;
                          exec sql
                            insert into yourlib/logfile
                            select :pprog, :pstep, 'MASTER', index_from_master, current timestamp
                            from datalib/masterfile;
                        endsr;
                        
                        begsr sr_work;
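                          // snapshot the work file, not the master file again
                          // (datalib/workfile and index_from_work are placeholder names)
                          exec sql
                            insert into yourlib/logfile
                            select :pprog, :pstep, 'WORK', index_from_work, current timestamp
                            from datalib/workfile;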
                        endsr;
                        
                        begsr sr_start;
                          // one marker row per call, not one row per master record
                          exec sql
                            insert into yourlib/logfile
                            values(:pprog, :pstep, 'START', '', current timestamp);
                        endsr;
                        
                        begsr sr_end;
                          // one marker row per call, not one row per master record
                          exec sql
                            insert into yourlib/logfile
                            values(:pprog, :pstep, 'END', '', current timestamp);
                        endsr;
                        in both program A and B add some logging with this program

                        program A
                        Code:
                        program_C('A':1);
                        
                        setll key lworkfile;
                        if not %equal;
                          program_C('A':2);
                        
                          program_B();
                        endif;
                        
                        program_C('A': 3);
                        
                        exsr fill_dspf;
                        
                        *inlr = *on;
                        return;
                        program B
                        Code:
                        program_C('B':1);
                        
                        exsr fill_work;
                        
                        program_C('B':2);
                        
                        *inlr = *on;
                        return;
                        if u combine that with those suggested triggers we can be a step further in analysing
                        then we can base our analysis on real content instead of guessing what might be going on ...

                        you keep us up-to-date ?
                        Last edited by pet0etie; September 16, 2019, 06:34 AM.



                        • #29
                          You could do one thing.
                          Try to change the force ratio to 1 for the files that are changed in your programs.
                          Especially the master control file. As I understand your description this file is essential in the program flow
                          and one of the first files to be used.

                          Be aware that changing this value might have an impact on performance. Remember to write down the original values.
                          CHGPF FILE(MYFILE) FRCRATIO(1)

                          But if changing it fixes the problem then the cause is buffering. We have been writing about this earlier.



                          • #30
                            Originally posted by Peder Udesen
                            You could do one thing.
                            Try to change the force ratio to 1 for the files that are changed in your programs.
                            Especially the master control file. As I understand your description this file is essential in the program flow
                            and one of the first files to be used.

                            Be aware that changing this value might have an impact on performance. Remember to write down the original values.
                            CHGPF FILE(MYFILE) FRCRATIO(1)

                            But if changing it fixes the problem then the cause is buffering. We have been writing about this earlier.

                            Or, if this is the only process using the file, just use FEOD in your programs. That avoids the performance hit that FRCRATIO(1) would impose on other processes using that master file.
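
A minimal sketch of the FEOD approach in free-form RPG, assuming a hypothetical output file WORKFILE with record format WORKR. It forces whatever program B has buffered out to the database before control returns to program A.

```rpgle
**free
// Hypothetical names: WORKFILE (work file), WORKR (its record format).
dcl-f WORKFILE usage(*output);

// ... set the record fields for a selected item ...
write WORKR;

// Push any records still held in the job's output buffer to the
// database. The file stays open, and other jobs are unaffected
// because the file's FRCRATIO is left alone.
feod WORKFILE;

*inlr = *on;
return;
```

Declaring the file with block(*no), as suggested earlier in the thread, is the other program-level way to keep written records from sitting in a buffer.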
                            I'm not anti-social, I just don't like people -Tommy Holden

