ibmi-brunch-learn

Vendor requires file in UTF-8 encoding of file created in QTEMP


  • Vendor requires file in UTF-8 encoding of file created in QTEMP

    Hi,
    We are sending a customer address list to a vendor via sFTP. They have requested UTF-8 encoding. What is the easiest way to accomplish this?
    We are on V7R3. The RPG program extracts data to a long 200 byte alpha string formatted to the vendor's requirements.
    There is some data that has special characters, such as the Polish city of Rzeszow. The letter "o" is actually Latin small letter O with acute. It is showing for the vendor as "xF3". On the iSeries it is hex "CE".
    I tried a few different things, like creating a stream file with CCSID of 1208 on the IFS, or setting the definition of the field to CCSID(*UTF8). But no matter what, the vendor says it is not UTF-8, but rather is ISO 8859-15.
    Any suggestions or questions are welcome.
    Thanks
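The byte values quoted in this post can be reproduced off-platform. The following is a minimal Python sketch, not part of the original program. It assumes the EBCDIC data is in CCSID 37 (Python codec `cp037`), which the post does not state; the helper name `hex_bytes` is made up for illustration.

```python
def hex_bytes(text: str, encoding: str) -> str:
    """Return the encoded bytes of text as space-separated uppercase hex."""
    return text.encode(encoding).hex(" ").upper()


o_acute = "ó"  # Latin small letter O with acute, as in "Rzeszów"

ebcdic = hex_bytes(o_acute, "cp037")       # EBCDIC, CCSID 37 is an assumption
latin9 = hex_bytes(o_acute, "iso8859_15")  # ISO 8859-15
utf8 = hex_bytes(o_acute, "utf-8")         # UTF-8

print("EBCDIC:", ebcdic)
print("ISO 8859-15:", latin9)
print("UTF-8:", utf8)
```

A single byte F3 on the vendor's side is what ISO 8859-15 uses for this letter. A UTF-8 file would carry the two bytes C3 B3 instead.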

  • #2
    How are you creating the file and writing the data to it?



    • #3
      Hi Scott,
      Thanks for replying, I know you are an expert in this area! The file was created in the CL and is not externally described:
      CrtPF FILE(QTEMP/Address) RcdLen(200) Size(1000000)
      There is a CCSID option on the CrtPF command, but 01208 was not an option.



      • #4
        Creating a physical file in QTEMP is a strange thing to do if your goal is to have a UTF-8 file in the IFS. Can you explain why you are doing this? Is it needed for other processes?

        After you've created this file, you must be doing something to convert it to a UTF-8 file in the IFS so that you can send it with SFTP. Can you explain how this is being done?



        • #5
          Sorry I was not clear...getting it into UTF-8 is the requirement of the vendor, but the program was not designed or written with that as a goal. We found out only after the first test files were created and sent, when the vendor complained about the encoding.
          I know I may have to rewrite some of this to make it work. Just hoping for suggestions about the most straightforward way to get it done. Basically, we have to create extracts of our iSeries database files, format the data, and deliver it in UTF-8 format. I don't need to send it to the IFS, that was just one thing I tried that didn't work. How do you usually approach this?
          Thanks for your patience and time!



          • #6
            To get UTF-8 data into a flat file, you could create the file with CCSID(*HEX) and put UTF-8 data into it.



            • #7
              Couldn't you do something like this?

              CRTPF FILE(QTEMP/XX) RCDLEN(200)
              CPYTOIMPF FROMFILE(XX) TOSTMF('/tmp/xx.txt') STMFCCSID(1208) RCDDLM(*CRLF)

              You don't tell us if the vendor expects an IFS file or an as/400 file.

              regards
              Peder



              • #8
                Yeah, we really need more information to help you. We need to know the character set of the data after it has been written to the file, and how you know that was done correctly. Then we need to know how you are converting the file to make it compatible with sftp, or whether you are trying to copy the data directly from the PF (which seems unlikely). It's very hard to help you when you don't provide complete information.



                • #9
                  Hi,
                  Thanks all for your responses!
                  We are sending an address list to our vendor. They have never even heard of an iSeries/AS400. I'm unsure of what kind of system they have.

                  Each field has to be wrapped in double-quotes, and comma-separated with no padding or extra spaces. If the field is null, then a single comma should be sent. There is a line-feed at the end. The file needs to be called "address.csv". The encoding must be UTF-8. There is a header row, also wrapped in double-quotes, comma-separated. I won't list every field, but this is how it should look:

                  "Customer","Address 1","Address 2","City","State"
                  "ABC","1 Main Street","Floor 3","NewYork","New York"
                  "XYZ","2 Main Street",,"New York","New York"

                  I tried Barbara's suggestion, but it did not seem to work. Peder's suggestion was one approach I took previously that also did not work.
                  Right now the program is set the way Barbara suggested. The file in QTEMP has a CCSID of 65535. I defined a field in the D-spec:

                  D @D01 s 200 CCSID(*UTF8)

                  I read each record from our iSeries database, trim it and add the quotes and concatenate it all into @D01. Then use an EXCPT statement to move it into the file.

                  Then it gets Ftp'd to our network, as "address.csv". I'm starting to wonder if the FTP needs to be done in binary?

                  When I open it in Notepad, I can do a "Save As", and I can see it is already prefilled with "ANSI", but even changing this to UTF-8 doesn't work. They told me I am sending them ISO 8859-15.
                  I am sending them other files as well, but only this file, which has some special characters for foreign cities, seems to cause an error on their system. They have to remove the special characters.

                  Happy to answer any further questions, and again, thanks for the time.
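The layout rules described in this post (every field in double quotes, nothing between the commas for a null field, a line feed at the end, UTF-8 bytes) can be sketched as below. This is a Python illustration of the format only, not the RPG program. The function names are hypothetical, and two choices are assumptions the post does not confirm: blank fields are treated as null, and embedded double quotes are doubled.

```python
from typing import Optional, Sequence


def format_row(fields: Sequence[Optional[str]]) -> str:
    """Build one CSV line: quoted fields, empty slot for null, LF at the end."""
    parts = []
    for value in fields:
        trimmed = "" if value is None else value.strip()
        if trimmed == "":
            # Assumption: None and all-blank values both count as null.
            parts.append("")
        else:
            # Assumption: embedded quotes are escaped by doubling them.
            parts.append('"' + trimmed.replace('"', '""') + '"')
    return ",".join(parts) + "\n"


def build_csv(header: Sequence[str], rows: Sequence[Sequence[Optional[str]]]) -> bytes:
    """Return the whole file content as UTF-8 bytes."""
    text = format_row(header) + "".join(format_row(row) for row in rows)
    return text.encode("utf-8")


sample = build_csv(
    ["Customer", "Address 1", "Address 2", "City", "State"],
    [["XYZ", "2 Main Street  ", None, "New York", "New York"]],
)
print(sample.decode("utf-8"), end="")
```

Building the lines by hand like this is what makes the bare `,,` for a null field possible.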



                  • #10
                    IMHO, an easier method is to create a file with fields in it rather than a flat file. It should have fields named "Customer", "Address1", "address2", "city" and "state". Then use CPYTOIMPF to create the CSV; it will handle adding the quotes, inserting the commas, and putting the CRLF at the end. You can control the CCSID, so you can make it generate the CSV file in UTF-8.

                    If you need greater control over the format, don't use a flat file. Create a stream file directly, this gives you absolute control, so you can set it up perfectly.

                    You previously said you were using sftp to transfer the file. This last message says FTP, which is a very different tool. With FTP, you will definitely need to use binary mode. Creating the file as UTF-8 but then telling FTP to translate it with ASCII mode really defeats the purpose of making it UTF-8 to begin with. This would be a non-issue with sftp, which doesn't support anything else aside from binary mode.
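To see why a translating transfer defeats a UTF-8 file, here is a rough Python simulation. It is only a model, not FTP itself: it assumes the translation step treats the source bytes as EBCDIC CCSID 37 (`cp037`) and converts them to ISO 8859-1, which may not match the CCSIDs configured on a given system. The function names are hypothetical.

```python
def ascii_mode_transfer(data: bytes, source: str = "cp037", target: str = "latin-1") -> bytes:
    """Model a translating transfer: read bytes as `source`, re-encode as `target`."""
    return data.decode(source).encode(target)


def binary_mode_transfer(data: bytes) -> bytes:
    """Model a binary transfer: the bytes arrive unchanged."""
    return data


ebcdic_file = "Rzeszów".encode("cp037")  # data stored as EBCDIC (assumed CCSID 37)
utf8_file = "Rzeszów".encode("utf-8")    # data already stored as UTF-8

from_ebcdic = ascii_mode_transfer(ebcdic_file)  # becomes single-byte ISO 8859 text
mangled = ascii_mode_transfer(utf8_file)        # UTF-8 bytes misread as EBCDIC
intact = binary_mode_transfer(utf8_file)        # UTF-8 bytes preserved

print(from_ebcdic.hex(" "))
print(mangled.hex(" "))
print(intact.hex(" "))
```

Under these assumptions, the translated EBCDIC file ends up with the single byte F3 for the accented letter, and only the binary transfer of a file that is already UTF-8 delivers UTF-8.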

                    With regards to Notepad -- I'm not sure what this has to do with anything? But, if you want Notepad to detect the file as UTF-8 you need to have a Byte Order Mark (BOM) at the start of the file. This is the only way Notepad knows that it is UTF-8 vs. ASCII. This may not be the case with the actual application you're running this with, however, and some people frown upon the use of a BOM with UTF-8. You should find out from your vendor whether a BOM is desired or not. If you wish to use a BOM, you'll probably have to create the stream file -- I don't think it's possible to add a BOM another way. (Or, at least, I don't know how...)
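For reference, the UTF-8 BOM is the three bytes EF BB BF at the very start of the file. A small Python sketch of writing and detecting it follows; the helper names are made up for illustration, and whether the vendor wants a BOM is still something to confirm with them.

```python
import codecs


def encode_with_bom(text: str) -> bytes:
    """Encode as UTF-8 with a leading byte order mark."""
    return text.encode("utf-8-sig")


def encode_without_bom(text: str) -> bytes:
    """Encode as plain UTF-8, no byte order mark."""
    return text.encode("utf-8")


def has_utf8_bom(data: bytes) -> bool:
    """True if the data starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


line = '"ABC","Rzeszów"\n'
print(encode_with_bom(line)[:3].hex(" "))
print(has_utf8_bom(encode_with_bom(line)), has_utf8_bom(encode_without_bom(line)))
```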



                    • #11
                      Thanks Scott! On my first pass at this, I had exactly what you described, which was an externally described file, using CPYTOIMPF. But it couldn't handle their null requirement. When an address was blank, it was sending "". So from the example above it sent:

                      "XYZ","2 Main Street","","New York","New York"
                      instead of
                      "XYZ","2 Main Street",,"New York","New York"

                      That's why I switched to a flat file. I've never worked with a stream file before, so I'll have to research that.

                      As far as the FTP, I apologize. There are 2 FTP jobs. The first moves it from the iSeries to our MFT (managed file transfer) server. That is done using FTP. The actual file transfer to the vendor is sFTP.

                      When I tried changing the FTP from the iSeries to binary, the file on the network was all hex characters, not readable.

                      I only mentioned Notepad because that was the only software I have that allows "Save As" to a specific file encoding. Is there another way to look at a file and see what the encoding is?



                      • #12
                        Generally speaking, the computer doesn't know what the encoding of a file is. It only knows that it is a series of bytes.

                        Which byte values get written to the file is determined by the application. So a particular program might write the correct bytes for UTF-8, or some flavor of EBCDIC, or some flavor of ASCII to the file based on how that particular application has been written. Likewise, different applications might interpret it based on how the interpreting application has been written.

                        IBM i (please don't call it iSeries again) has a method of storing a CCSID in the descriptor of the file. This feature doesn't exist on most other computer platforms. But, you can store the CCSID of the file at the object-level, which is handy because now the OS knows how the file was encoded. However, it's important to understand that this is just a descriptive label that is set by an application. If the application sets it wrong (or it is left to the user, who may not know what they are doing) then the label might not match the actual contents. As an analogy... my mother used to make jars of strawberry jam. Once she accidentally put a label on the jar that said "raspberry". Nobody could tell that it was wrong until it was eaten. Likewise, if you write the bytes for an ISO-8859-15 file and label it with the CCSID for UTF-8, that does not actually make it a UTF-8 file. Someone trying to view it using the CCSID placed on the file descriptor might view it incorrectly.

                        So the best solution, if you don't already know what the encoding is, is NOT to simply trust the CCSID label. Instead, take a look at the hex values of the characters and compare them against a chart showing what they should be for a given encoding.

                        With regards to getting "all hex characters" after FTPing in binary mode, this likely means that the file was created as EBCDIC and you're trying to view it as UTF-8 (or maybe ASCII) which will certainly not work. Create the file as UTF-8 to begin with, and transfer THAT in binary mode and it should work.
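One way to look at the bytes rather than the label is sketched below in Python. It lists the hex values of the non-ASCII bytes and checks whether the data decodes as strict UTF-8. The function names are hypothetical and the EBCDIC sample assumes CCSID 37. Note that a file containing only plain ASCII characters also passes the UTF-8 check, so the test is only informative when accented characters are present.

```python
from typing import List


def non_ascii_hex(data: bytes) -> List[str]:
    """Return the hex values of all bytes outside the 7-bit ASCII range."""
    return [f"{byte:02X}" for byte in data if byte >= 0x80]


def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


samples = {
    "utf-8": "Rzeszów".encode("utf-8"),
    "iso8859-15": "Rzeszów".encode("iso8859_15"),
    "ebcdic 37 (assumed)": "Rzeszów".encode("cp037"),
}

for label, data in samples.items():
    print(label, non_ascii_hex(data), is_valid_utf8(data))
```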



                        • #13
                          Thanks, Scott. When you say "create the file as UTF-8 to begin with", I think that's where I keep getting stuck. When I do the CRTPF, UTF8 is not a CCSID option. If I try anything other than *HEX, I get the error:
                          Message . . . . : CCSID value not allowed with FILETYPE(*DATA).
                          Cause . . . . . : If FILETYPE(*DATA) is specified, the only value allowed on
                          the CCSID parameter is *HEX or 65535.
                          Are you suggesting abandoning the CRTPF altogether?



                          • #14
                            I already suggested two alternatives for creating the file as UTF-8

                            1. Create a PF, then copy it with CPYTOIMPF (or CPYTOSTMF if you insist on doing it yourself with a flat file)
                            2. Create the stream file directly.

                            Transfer the output of one of those in binary mode.
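The idea behind the second option (write the UTF-8 bytes yourself, then move the file untouched) can be illustrated in Python as follows. This is only a sketch of the concept, not the RPG or IFS code the thread has in mind. The function name and the temporary path are hypothetical; the file name "address.csv" and the line-feed terminator come from the requirements posted earlier.

```python
import os
import tempfile
from typing import Iterable


def write_utf8_file(path: str, lines: Iterable[str]) -> None:
    """Write each line as UTF-8 bytes followed by a single line feed."""
    # Binary mode: no newline or code page translation happens on write.
    with open(path, "wb") as handle:
        for line in lines:
            handle.write(line.encode("utf-8") + b"\n")


path = os.path.join(tempfile.mkdtemp(), "address.csv")
write_utf8_file(path, ['"Customer","City"', '"ABC","Rzeszów"'])

# Read the file back in binary to confirm what was actually stored.
with open(path, "rb") as handle:
    written = handle.read()

print(written.hex(" "))
```

Reading the file back in binary and checking for C3 B3 confirms the content is UTF-8 before any transfer step is involved.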



                            • #15
                              Thanks, Scott. I will research the direct stream. I couldn't use the CPYTOIMPF because of the nulls. Have a great weekend!
