Using the read() API with VARCHAR


  • Using the read() API with VARCHAR

    I'm trying to use the read() API to load the entire contents of a stream file into a variable defined as varchar(2000000). It works fine if I define the variable as char(2000000).

    Basically, I'm missing the first 4 bytes of data when I pass the address of the varchar variable. I know this has something to do with the first 2 or 4 bytes containing the length, but I don't know how to account for it. I've tried several methods to no avail.
    Code:
    dcl-s StmfData varchar(2000000);

    len = read(fd:%addr(StmfData):%size(StmfData)+4);
    len = read(fd:%addr(StmfData:*data):%size(StmfData));
    len = read(fd:%addr(StmfData):%size(StmfData));
    None of these work. The first and last options produce the data with the first 4 characters missing. The middle option produces no data.

    I need the stream file data in a variable to call Qc3CalculateHash to calculate the MD5 hash of the document.

    I thought about just defining it as a fixed-length character variable, but it's so large.

  • #2
    Your first two examples are dangerous. In both, you have told read() that the buffer is 4 bytes longer than it actually is, so the system can overwrite areas of memory that come after your variable. That is unsafe. Your last example, at least, is not dangerous, though it writes data into the length portion of the field, not just the data portion, so it won't work the way you want.

    The proper approach requires 3 steps:

    1) Set the field to its maximum length. If you don't do this, you'll only get the data that fits into the variable at the length it had before the read() call.
    2) Call read() with %addr(field:*data) and %len(field:*max).
    3) Use the len returned from read() with the %len() BIF to set the new length.

    Code:
    %len(stmfData) = %len(stmfData:*MAX);
    len = read(fd: %addr(stmfData:*data): %len(stmfData:*MAX));
    if len < 0;
       len = 0;
    endif;
    %len(stmfData) = len;
    I'd wrap that logic in a subprocedure so you don't have to repeat it every time you read the file.
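
    A minimal sketch of such a subprocedure, for illustration only. It assumes the usual prototype for the IFS read() API and a caller variable declared exactly as varchar(2000000). The name readStmf and its return convention are hypothetical and do not come from this thread.

    ```rpgle
    // Assumed prototype for the IFS read() API.
    dcl-pr read int(10) extproc('read');
       fildes int(10) value;
       buf pointer value;
       nbyte uns(10) value;
    end-pr;

    // readStmf (hypothetical name): read from an already-open descriptor
    // into a VARCHAR and leave its length set to the number of bytes read.
    // Returns the byte count, or -1 if read() failed.
    dcl-proc readStmf;
       dcl-pi *n int(10);
          fd int(10) value;
          data varchar(2000000);
       end-pi;
       dcl-s len int(10);

       // 1) Expand to the maximum length so the whole data area is usable.
       %len(data) = %len(data : *max);

       // 2) Read into the data portion only, never the length prefix.
       len = read(fd : %addr(data : *data) : %len(data : *max));

       // 3) Shrink to what was actually read (empty on error).
       if len < 0;
          %len(data) = 0;
          return -1;
       endif;
       %len(data) = len;
       return len;
    end-proc;
    ```

    A caller would then write something like len = readStmf(fd : stmfData); and check for -1. Because the parameter is passed by reference without options(*varsize), the caller's variable has to match the varchar(2000000) definition.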

    FYI: On a VARCHAR field with a 4-byte length prefix, %addr(stmfData:*data) is equivalent to %addr(stmfData)+4, and %len(stmfData:*MAX) is equivalent to %size(stmfData)-4. If the prefix is 2 bytes, it's the same thing, but +/- 2. On other data types like UCS-2 or Graphic, %size() counts bytes while %len() counts characters, so %size() is twice %len(:*MAX) plus the size of the prefix. Using %addr(:*data) and %len(:*max) figures that out for you, making your code more understandable and less prone to mistakes. In code written before those features existed, you'll see examples where people manually add or subtract the prefix size.
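
    The layout described above can be checked with a small test program. This is only a sketch: the variable names are made up for illustration, and it assumes a VARCHAR declared larger than 65535 bytes, which gets a 4-byte prefix.

    ```rpgle
    ctl-opt dftactgrp(*no);

    dcl-s big varchar(2000000);      // over 65535, so the prefix is 4 bytes
    dcl-s p pointer;
    dcl-s prefix uns(10) based(p);   // overlays the first 4 bytes of big

    big = 'ABCDEFGH';
    p = %addr(big);                  // address of the prefix, not the data

    // The prefix holds the current length, which is why read() into
    // %addr(big) overwrites the length instead of filling the data.
    dsply ('Length prefix holds ' + %char(prefix));

    // Manual arithmetic versus the built-in forms.
    if %addr(big : *data) = %addr(big) + 4;
       dsply 'Data starts 4 bytes past %addr';
    endif;
    if %len(big : *max) = %size(big) - 4;
       dsply '%len(:*max) is %size minus 4';
    endif;

    *inlr = *on;
    return;
    ```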


    • #3
      Thanks... I knew one of my examples was likely writing to memory that it shouldn't (so I didn't do it other than to debug). Right now it's working with a fixed-length field of 1 MB. The file I'm reading should never approach that size.

      So my question is whether I should use VARCHAR at all? If I set the field length to its max before using it, do I still get the performance gain of VARCHAR versus CHAR?


      • #4
        If you don't use VARCHAR, then you'll get basically the same performance by using %SUBST every time you use the CHAR version of the field. But having to use %SUBST all the time would be inconvenient and possibly error-prone.

        So before changing to use CHAR, I would do a bit of performance testing to see if setting the full VARCHAR to blanks is really an issue.

        Try the program below. Call it with increasing values, say 1000, 10000, 100000, until it reports that it took at least one second.

        On my machine, calling this program with 10000 iterations took a bit less than a second, meaning each setting of the VARCHAR field to all blanks took about 0.0001 seconds. For me, that cost would be too small to justify making my code ugly and error-prone by using %SUBST everywhere, although I guess it would depend on how many times the value was actually going to be used.

        Code:
                ctl-opt dftactgrp(*no);
                // packed(15:5) lets you pass a numeric literal on a command-line CALL
                dcl-pi *n;
                   iters_parm packed(15:5) const;
                end-pi;
                dcl-s iters int(10);
                dcl-s i int(10);
                dcl-s fld varchar(2000000);
                dcl-s t_start timestamp;
                dcl-s t_end timestamp;
                dcl-s seconds packed(5 : 2);
                iters = iters_parm;
                t_start = %timestamp();
                // Each assignment sets the VARCHAR to its maximum length, all blanks
                for i = 1 to iters;
                   fld = *blanks;
                endfor;
                t_end = %timestamp();
                // *ms is microseconds, so divide by 1000000 to get seconds
                seconds = %diff(t_end : t_start : *ms) / 1000000;
                dsply ('That took ' + %char(seconds) + ' seconds');
                return;


        • #5
          Originally posted by gwilburn
          So my question is whether I should use VARCHAR at all? If I set the field length to its max before using it, do I still get the performance gain of VARCHAR versus CHAR?
          Well, you don't keep it at max, do you? In my post, I set it to max before the read, and set it to the proper length after the read so that it ended up with the proper length.

          Assuming your program utilizes the string after the part where you set the length to the proper length, you should get a performance benefit.

          An even bigger advantage to using varchar is just that it makes your code so much simpler and more elegant, not worrying about all the %TRIMs and other junk that goes with fixed-length strings.


          • #6
            Good info!

            No, I do not keep it at the max length. I later pass it (as a pointer) to other subprocedures that translate it to ASCII and calculate the MD5 hash.

            I really appreciate the explanation.
