ibmi-brunch-learn

Announcement

Collapse
No announcement yet.

Replace the ellipsis character in text string

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Replace the ellipsis character in text string

    We receive PC data files (tab-delimited text) to import items into our system. Sometimes the item descriptions will contain invalid characters (CR, LF, etc.). So I have a procedure written to escape these characters.

    I recently added the ellipsis character to list (constants defined in the program)... looking at the stream file using Notepad++ on my PC in hex, and viewing the same file on the IFS (in hex via wrklnk), it appears to be x'85'. But adding this removes lower case "e". The CCSID on the IFS file is 1252.

    In the EBCDIC table, "e" = x'85'.

    This conversion and CCSID always confuses the heck out of me... is this possible to escape in RPG?

  • #2
    Re: Replace the ellipsis character in text string

    Not familiar with this to provide an answer, but a quick question...have you put it in debug to see the hex value of the field being parsed?

    Comment


    • #3
      Re: Replace the ellipsis character in text string

      yes... but I'm relatively new to RDi debug (1.5 years)... couldn't figure out how to view the variables content in hex.

      Comment


      • #4
        Re: Replace the ellipsis character in text string



        Search on "4.3.10 Adding a memory monitor" in the document.

        I think that's what you're looking for...I am still trying to lobby for RDi here.

        Comment


        • #5
          Re: Replace the ellipsis character in text string

          way too much work compared to the green screen debugger... there has to be an easier way.

          The real issue is that the "ellipsis" character is not a valid EBCDIC character, so how do you "escape" that in RPG?

          Comment


          • #6
            Re: Replace the ellipsis character in text string

            You could use SQL function TRANSLATE to replace unwanted hexadecimal character values with blanks.

            Code:
              // Update Fields to replace any invalid hexadecimal character values
                exec sql
                  Update QTEMP.FILE1
                   Set FIELD1 =
                       Translate(FIELD1, ' ',
                            X'000102030405060708090A0B0C0D0E0F10111213141516171819+
                              1A1B1C1D1E1F202122232425262728292A2B2C2D2E2F30313233+
                              3435363738393A3B3C3D3E3F41424344454647484951525354+
                              565758596263646566676869707172737475767778808A8B8C8D+
                              8E8F909A9B9C9D9E9FA0AAABACADAEAFB0B1B2B3B4B5B6B7B8B9+
                              BABBBCBDBEBFCACBCCCDCECFDADBDCDDDEDFE1EAEBECEDEEEF'
                       );
            Jim

            Comment


            • #7
              Re: Replace the ellipsis character in text string

              Jim... that is basically what I'm doing with an RPG procedure, except I'm removing them altogether. Thanks for the suggestion... didn't know that was an option with SQL

              Comment


              • #8
                Re: Replace the ellipsis character in text string

                Greg, yes, SQL will allow it. You can squeeze out the invalid characters by using an empty space as the 2nd parameter.

                Comment


                • #9
                  Re: Replace the ellipsis character in text string

                  EBCDIC is not just one thing, it is an entire family of character encodings, there are dozens (hundreds?) of tables called "EBCDIC" and each one is different.

                  To deal with this, someone took all of the different character encodings (all of the EBCDIC varieties, ASCII varieties and Unicode varieties) and assigned each one a number. That is what a CCSID is.

                  Hopefully with that description, it's not so confusing?

                  Comment


                  • #10
                    Originally posted by Jim_IT View Post
                    Re: Replace the ellipsis character in text string

                    Greg, yes, SQL will allow it. You can squeeze out the invalid characters by using an empty space as the 2nd parameter.

                    Revisiting this yet again... the SQL function works when replacing characters with a space... but when I remove the space I get the following error (from Run Sql Scripts).

                    Code:
                    Select NOTES,  
                    Translate(NOTES, '',
                                    X'000102030405060708090A0B0C0D0E0F101112131415161718191A1B1C1D1E1F202122232425262728292A2B2C2D2E2F303132333435363738393A3B3C3D3E3F41424344454647484951525354565758596263646566676869707172737475767778808A8B8C8D8E8F909A9B9C9D9E9FA0AAABACADAEAFB0B1B2B3B4B5B6B7B8B9BABBBCBDBEBFCACBCCCDCECFDADBDCDDDEDFE1EAEBECEDEEEF')
                    From template.bwtest;
                    I would like to simply remove them.

                    Message: [SQL0171] Argument 02 of function TRANSLATE not valid

                    Last edited by gwilburn; March 12, 2018, 07:48 AM.

                    Comment


                    • #11
                      Originally posted by Jim_IT View Post
                      Re: Replace the ellipsis character in text string

                      You could use SQL function TRANSLATE to replace unwanted hexadecimal character values with blanks.

                      Code:
                       // Update Fields to replace any invalid hexadecimal character values
                      exec sql
                      Update QTEMP.FILE1
                      Set FIELD1 =
                      Translate(FIELD1, ' ',
                      X'000102030405060708090A0B0C0D0E0F10111213141516171819+
                      1A1B1C1D1E1F202122232425262728292A2B2C2D2E2F30313233+
                      3435363738393A3B3C3D3E3F41424344454647484951525354+
                      565758596263646566676869707172737475767778808A8B8C8D+
                      8E8F909A9B9C9D9E9FA0AAABACADAEAFB0B1B2B3B4B5B6B7B8B9+
                      BABBBCBDBEBFCACBCCCDCECFDADBDCDDDEDFE1EAEBECEDEEEF'
                      );
                      Jim
                      FWIW - this is NOT a good list of non-display characters. Some of the hex values above include perfectly good characters (i.e left bracket '[' is x'BA')

                      Comment


                      • #12
                        You could translate all the non-display characters to a random one of the non-display characters, say x'00', and then replace that one character with an zero-length string.

                        You can do this in RPG with %XLATE and %SCANRPL.

                        Code:
                        dcl-c non_display
                        X'000102030405060708090A0B0C0D0E0F10111213141516171819+
                        ...
                        BABBBCBDBEBFCACBCCCDCECFDADBDCDDDEDFE1EAEBECEDEEEF';
                        
                        dcl-s all_x_00(256) inz(*allx'00');
                        dcl-s empty_string varchar(1) inz; // init value is zero-length
                        
                        temp = %xlate(non_display : all_x_00 : string);
                        temp = %scanrpl (x'00' : empty_string : temp);
                        I think it's also possible to do this with the SQL TRANSLATE and REPLACE functions, but I'm not expert in those.

                        Comment


                        • #13
                          Originally posted by Barbara Morris View Post
                          You could translate all the non-display characters to a random one of the non-display characters, say x'00', and then replace that one character with an zero-length string.

                          You can do this in RPG with %XLATE and %SCANRPL.

                          Code:
                          dcl-c non_display
                          X'000102030405060708090A0B0C0D0E0F10111213141516171819+
                          ...
                          BABBBCBDBEBFCACBCCCDCECFDADBDCDDDEDFE1EAEBECEDEEEF';
                          
                          dcl-s all_x_00(256) inz(*allx'00');
                          dcl-s empty_string varchar(1) inz; // init value is zero-length
                          
                          temp = %xlate(non_display : all_x_00 : string);
                          temp = %scanrpl (x'00' : empty_string : temp);
                          I think it's also possible to do this with the SQL TRANSLATE and REPLACE functions, but I'm not expert in those.
                          Barbara - thank you. Never thought of that.

                          I guess my biggest question is what are the "non-display" characters for a 5250 screen? Currently, I'm removing x'00' through x'3F' and x'41'.

                          Comment


                          • #14
                            If a user types ... (3 periods), MS Word (and probably Excel) automatically replaces those 3 periods as a single character ellipsis (unless you disable that autoformat feature). For CCSID 37, that ASCII ellipsis gets imported/pasted as EBCDIC x'15'. That's been my experience anyway. Green screens don't like characters below X'20. Replace the X'15' with another character like a blank or 3 periods.

                            Ringer

                            Comment


                            • #15
                              Originally posted by CRinger400 View Post
                              If a user types ... (3 periods), MS Word (and probably Excel) automatically replaces those 3 periods as a single character ellipsis (unless you disable that autoformat feature). For CCSID 37, that ASCII ellipsis gets imported/pasted as EBCDIC x'15'. That's been my experience anyway. Green screens don't like characters below X'20. Replace the X'15' with another character like a blank or 3 periods.

                              Ringer
                              That's exactly what is happening. These special characters originate from various OE systems outside of our own (i.e. independent sales rep agency software)... users of that software are often using copy/paste to add notes/comments to an order. The data comes to us as Tab-Delimited Text or CSV. My pgms (typically) use CPYFRMIMPF to move them into database files before importing them into our ERP.

                              The ellipsis character is just one example of the garbage we see. That particular character will lock up a workstation because it can't be displayed. I know in ASCII it's x'85'.

                              So by replacing all characters below x'3F' I should eliminate that character along with many others

                              Comment

                              Working...
                              X