ibmi-brunch-learn

Massive QZDASOINIT jobs being closed

  • Massive QZDASOINIT jobs being closed

    Hello people. For about a month we have been seeing the following random behavior: the QZDASOINIT jobs (the jobs that handle remote database connections to the iSeries) begin ending en masse without any apparent error.

    This is what DSPLOG shows (just a snippet; there are many more entries):

    Code:
    [...]
    Job 344542/QUSER/QZDASOINIT ended on 29/08/11 at 14:39:42;
    Job 350212/QUSER/QZDASOINIT ended on 29/08/11 at 14:39:42;
    Job 349765/QUSER/QZDASOINIT ended on 29/08/11 at 14:39:43;
    Job 349498/QUSER/QZDASOINIT ended on 29/08/11 at 14:39:43;
    Job 347698/QUSER/QZDASOINIT ended on 29/08/11 at 14:39:43;
    Job 350321/QUSER/QZDASOINIT ended on 29/08/11 at 14:39:43;
    Job 347542/QUSER/QZDASOINIT ended on 29/08/11 at 14:39:43;
    Job 350220/QUSER/QZDASOINIT ended on 29/08/11 at 14:39:44;
    Job 348716/QUSER/QZDASOINIT ended on 29/08/11 at 14:39:44;
    Job 347337/QUSER/QZDASOINIT ended on 29/08/11 at 14:39:45;
    Job 346273/QUSER/QZDASOINIT ended on 29/08/11 at 14:39:45;
    Job 345156/QUSER/QZDASOINIT ended on 29/08/11 at 14:39:46;
    Job 348535/QUSER/QZDASOINIT ended on 29/08/11 at 14:39:46;
    Job 345171/QUSER/QZDASOINIT ended on 29/08/11 at 14:39:46;
    Job 347451/QUSER/QZDASOINIT ended on 29/08/11 at 14:39:47;
    Job 350210/QUSER/QZDASOINIT ended on 29/08/11 at 14:39:47;
    Job 345169/QUSER/QZDASOINIT ended on 29/08/11 at 14:39:47;
    [...]
    When you press F1 on any one of them, it says:

    Code:
    Message ID . . . . . . :   CPF1164       Severity . . . . . . . :   00
    Message type . . . . . :   Completion
    Date sent  . . . . . . :   29/08/11      Time sent  . . . . . . :   14:39:42

    Message . . . . :   Job 344542/QUSER/QZDASOINIT ended on 29/08/11 at
      14:39:42; 1 seconds used; end code 0

                                                X.
    Cause . . . . . :   Job 344542/QUSER/QZDASOINIT completed on 29/08/11 at
      14:39:42 after it used 1 seconds processing unit time. The job had
      ending code 0. The job ended after 1 routing steps with a secondary
      ending code of 0.  The job ending codes and their meanings are as
      follows:
         0 - The job completed normally.
    Basically, it says that the job ended with code 0 (ended successfully) and used 1 second of CPU time. What is that X doing so far away from the end code 0? I have seen that X in all the logs.

    Is something wrong, or are there so many users that the system is closing connections? Is there anything I can check or do? Thanks in advance.

    EDIT: I forgot to say that the system gets awfully slow when this happens and our users start calling us. It happens at a random hour every 1 to 3 days.
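
One way to quantify how bursty the endings are is to bucket the DSPLOG lines by timestamp. The following is only a sketch: `JobEndCounter` and `CountPerSecond` are hypothetical names, and it assumes the log has been copied to text in the layout shown above (job number, user, and job name separated by slashes, with an hh:mm:ss time later on the same line). The pattern does not depend on the message language.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class JobEndCounter
{
    // Matches "<6-digit job number>/<user>/QZDASOINIT ... hh:mm:ss".
    // Assumption: each completion message fits on a single line of text.
    private static readonly Regex EndLine =
        new Regex(@"\b\d{6}/\w+/QZDASOINIT\b.*?(\d{2}:\d{2}:\d{2})");

    // Returns how many QZDASOINIT completion lines carry each timestamp,
    // ordered by time. Lines that do not match are ignored.
    public static SortedDictionary<string, int> CountPerSecond(IEnumerable<string> lines)
    {
        var counts = new SortedDictionary<string, int>(StringComparer.Ordinal);
        foreach (string line in lines)
        {
            Match m = EndLine.Match(line);
            if (!m.Success) continue;

            string time = m.Groups[1].Value;
            int n;
            counts.TryGetValue(time, out n); // n stays 0 when the key is new
            counts[time] = n + 1;
        }
        return counts;
    }
}
```

Feeding it the lines of a saved log (for example via `System.IO.File.ReadLines`) gives one count per second, which makes it easier to compare one incident with the next.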

  • #2
    Other things I have checked:

    QSERVER has *NOMAX maximum jobs on the subsystem.
    QZDAINIT (NOT QZDASOINIT; I don't know why I can't find that one) has 1 initial job, 3 additional jobs, *NOMAX maximum jobs, and a maximum of 1 use per job.

    I don't know anything else because I am not the admin, just a programmer.


    • #3
      This is still happening to us every 1 or 2 days. The QUSER jobs show a * ROLLBACK message and then an EOJ message.

      The reason displayed is something like a host server communications error on recv() - length, but we have no idea how to diagnose this issue.


      • #4
        fjleon,

        We are experiencing the same problem. Were you ever able to figure out what was causing the problem and/or find a resolution?

        Stan


        • #5
          We still have the issue, and we think it's related to a security program installed on the iSeries called BSAFE. We also suspect that an ASP.NET web app on a different subnet may not be closing its connections properly.


          • #6
            I have used BSafe before and it did not cause this on our box, although yours might be configured differently.

            • #7
              Well, if you manage to fix it, I would like to know what the solution was, since we still don't know and even IBM tech support hasn't found the problem.


              • #8
                fjleon,
                I'm sorry for not responding sooner. I can only relay our observations and the steps we took to correct the problem.

                Observations:
                1. As you mentioned earlier, the issue came to our attention when the system would start to slow down dramatically for no apparent reason. This happened once a month, on the first business day of the month, for three consecutive months. We thought that we had some kind of rogue program in our end-of-month job stream that was killing the system.
                2. By reviewing the system history log, we also found that a large number of QZDASOINIT jobs were ending at the same time.
                3. Found a QZDASOINIT job log with the following messages:
                A connection with a remote socket was reset by that socket.
                Host server communications error occurred on recv() - length.
                Error running database host server. Error code 2.
                4. Opened a PMR with IBM and sent performance data captured with IBM Performance Tools for i5/OS to IBM for analysis.
                5. IBM's analysis revealed a major chunk of seize contention at the time of the slowdown, but did not reveal which job was the holder of the seizes.

                Steps taken to resolve problem:
                1. IBM recommended we apply the latest cumulative, HIPER, and performance PTFs. Then we set up a Job Watcher trace to capture more detailed information. Instructions were also given to set up PEX traces for high CPU and busy drives (not to be run at the same time).
                2. Started monitoring the increase in QZDASOINIT jobs throughout the day. We started the day at 150-200 jobs and ended the day with 2500-3000.
                3. We run our daily backups at 22:00. I reviewed the QZDASOINIT jobs that were in the system before the backup started. Looking at the files that were open, I found that the vast majority of the jobs had an entry for the same stored procedure (QSYS2/SYSROUTINE rrn). This procedure was called from an ASP.NET program that is used by all of our intranet applications to authenticate the user and determine the level of access.
                4. Review of the ASP.NET program revealed that three separate connections were being opened each time the program was called. Also, in most instances, the logic to close the connections was not being executed.
                5. Corrected the ASP.NET program logic, and the number of QZDASOINIT jobs dropped to 18-20. System performance improved markedly.
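
The thread does not show the corrected program, so the following is only a sketch of the general shape such a fix could take: one connection per call instead of three, released by a `using` block on every exit path. `TrackedConnection` and `AuthCheck` are hypothetical names. `TrackedConnection` is a stand-in that merely counts open instances so the effect can be observed without a database.

```csharp
using System;

// Hypothetical stand-in for a database connection. It only counts how many
// instances are currently open (not thread-safe; for demonstration only).
public sealed class TrackedConnection : IDisposable
{
    public static int OpenCount { get; private set; }
    private bool _disposed;

    public TrackedConnection() { OpenCount++; }

    public void Dispose()
    {
        if (_disposed) return; // Dispose may safely be called more than once
        _disposed = true;
        OpenCount--;
    }
}

public static class AuthCheck
{
    // A single connection serves every lookup. The using block releases it
    // whether the method returns early, runs to the end, or a lookup throws.
    public static bool Authenticate(string user, Func<TrackedConnection, string, bool>[] lookups)
    {
        using (var conn = new TrackedConnection())
        {
            foreach (var lookup in lookups)
            {
                if (!lookup(conn, user)) return false;
            }
            return true;
        }
    }
}
```

With a real ADO.NET provider the same structure applies with the provider's connection class in place of `TrackedConnection`. The point is that the close logic no longer depends on a code path that might be skipped.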

                We have not experienced any system slowdowns since the program correction was put in place. We were never able to identify the root cause of the QZDASOINIT jobs ending at the same time. We put the program correction in place before the Job Watcher trace could capture an "event" (the term we used for a system slowdown).
                fjleon, your remark about the ASP.NET program is what made us look hard at the QZDASOINIT files.

                Hope this helps.
                Stan


                • #9
                  I am so glad my post was able to help you find the solution! Now I am very interested to see what changes you made to your ASP.NET code. We have a "design bug" in that we open a connection to the iSeries server for every visitor that comes to our main company webpage. A bad idea, but it worked properly for almost 9 months before our problem began.

                  We run code in Application_Start (global.asax) that connects to the iSeries database. At the end we Close() the connection object and set it to null.
                  In Session_Start we also open a connection, and when the session ends we close, dispose, and null it. The issue is that even when we do this, most of the time the QZDASOINIT job stays up.

                  We are on OS 5.4 on the iSeries and there are currently more than 300 QZDASOINIT jobs running! We are using ASP.NET 4 with Visual Studio 2010 and have the latest IBM Client Access DLL.

                  Some sample code:

                  Application start:
                  Code:
                  System.Collections.Generic.Dictionary<int, string> dicSucursales = new System.Collections.Generic.Dictionary<int, string>();
                  //CESP07
                  cmd = new IBM.Data.DB2.iSeries.iDB2Command("consulta_ctraltab", System.Data.CommandType.StoredProcedure, As400Conexion);
                  drCtraltab = cmd.ExecuteReader();
                  while (drCtraltab.Read())
                      dicSucursales.Add(drCtraltab.GetiDB2Integer(0), drCtraltab.GetiDB2VarChar(1));
                  drCtraltab.NextResult();
                  Application.Add("Sucursales", dicSucursales);
                  dicSucursales = null;

                  drCtraltab.Close();
                  drCtraltab = null;

                  As400Conexion.Close();
                  As400Conexion = null;
                  Session start:
                  Code:
                  Session["financiamientoexitoso"] = false;
                  Session.Add("As400Conexion", null);
                  Session.Add("FinanciamientoCliente", null);
                  IBM.Data.DB2.iSeries.iDB2Connection As400Conexion = new IBM.Data.DB2.iSeries.iDB2Connection(ConfigurationManager.ConnectionStrings["iSeries"].ConnectionString);
                  As400Conexion.Open();
                  Session["As400Conexion"] = As400Conexion;
                  Session end:
                  Code:
                  // or SQLServer, the event is not raised.
                  try
                  {
                      IBM.Data.DB2.iSeries.iDB2Connection As400Conexion;
                      As400Conexion = (IBM.Data.DB2.iSeries.iDB2Connection)Session["As400Conexion"];
                      if (As400Conexion != null)
                      {
                          As400Conexion.Close();
                          As400Conexion.Dispose();
                          As400Conexion = null;
                      }
                      Session["As400Conexion"] = null;
                  }
                  catch (Exception)
                  {

                  }
                  Some code from one webpage:

                  Code:
                  if (!Page.IsPostBack)
                  {
                      As400Conexion = Utilitario.ChequearConexion((iDB2Connection)Session["As400Conexion"]);
                  }
                  That utility class has this:
                  Code:
                  public static iDB2Connection ChequearConexion(iDB2Connection As400Conexion)
                  {
                      if (As400Conexion == null || As400Conexion.State != ConnectionState.Open)
                      {
                          if (As400Conexion == null)
                          {
                              As400Conexion = new iDB2Connection(ConfigurationManager.ConnectionStrings["iSeries"].ConnectionString);
                              As400Conexion.Open();
                          }
                          if (As400Conexion.State != ConnectionState.Open)
                          {
                              As400Conexion.Dispose();
                              As400Conexion = new iDB2Connection(ConfigurationManager.ConnectionStrings["iSeries"].ConnectionString);
                              As400Conexion.Open();
                          }
                      }
                      return As400Conexion;
                  }
                  Most of the heavy queries are done with stored procedures right on the iseries server.
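
An alternative to keeping a connection in Session is to open one per unit of work and dispose it right away, so that cleanup does not depend on Session_End firing (ASP.NET raises that event only for in-process session state). The sketch below is written against the `System.Data.Common` base classes and is not tested against the IBM provider. `SucursalLookup` and `connectionFactory` are hypothetical names, and it assumes that `iDB2Connection` derives from `DbConnection` and that the procedure returns an integer column followed by a character column, as the Application start code above suggests.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.Common;

public static class SucursalLookup
{
    // Copies (int key, string name) rows from any data reader into a dictionary.
    public static Dictionary<int, string> ReadAll(IDataReader reader)
    {
        var result = new Dictionary<int, string>();
        while (reader.Read())
        {
            result[reader.GetInt32(0)] = reader.GetString(1);
        }
        return result;
    }

    // connectionFactory is an assumption, e.g.
    //   () => new iDB2Connection(connectionString)
    // The connection, command, and reader are all disposed when the method
    // exits, including when the query throws.
    public static Dictionary<int, string> Load(Func<DbConnection> connectionFactory, string procedureName)
    {
        using (DbConnection conn = connectionFactory())
        {
            conn.Open();
            using (DbCommand cmd = conn.CreateCommand())
            {
                cmd.CommandText = procedureName;
                cmd.CommandType = CommandType.StoredProcedure;
                using (DbDataReader reader = cmd.ExecuteReader())
                {
                    return ReadAll(reader);
                }
            }
        }
    }
}
```

A page would then call `SucursalLookup.Load(factory, "consulta_ctraltab")` when it needs the data instead of pulling a long-lived connection out of Session. Whether this reduces the number of QZDASOINIT jobs in practice depends on the provider's connection pooling settings, which would need to be checked.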
