ibmi-brunch-learn

How to Remove HTML tags from a string


  • How to Remove HTML tags from a string

    Hi guys, I know how to remove or replace individual words in a string, but how do I remove all of the HTML tags from a long string?

    Thank you!

  • #2
    Assuming you want to do it in RPG, I'm not aware of any canned options for doing it.
    Basically, you would search for a '<' and copy the substring from the start point up to that position to the output. Then ignore everything until the next '>' and set the new start point to the character after the '>'. Rinse and repeat.
    Be sure to use a Varchar field to receive the output.
    You'll need to decide what to replace the tags with, of course.
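
The steps above can be sketched roughly as follows. This is a minimal sketch in Python rather than RPG, purely to show the shape of the loop. The function name `strip_tags`, the `replacement` parameter, and the choice to keep an unterminated '<' as ordinary text are assumptions made for illustration, not part of the original suggestion.

```python
def strip_tags(text: str, replacement: str = "") -> str:
    """Copy everything outside '<' ... '>' pairs to the output."""
    out = []      # plays the role of the varying-length output field
    start = 0     # current start point in the input
    while start < len(text):
        lt = text.find("<", start)
        if lt == -1:
            # No more tags: copy the remainder and stop.
            out.append(text[start:])
            break
        # Copy the text between the start point and the '<'.
        out.append(text[start:lt])
        gt = text.find(">", lt + 1)
        if gt == -1:
            # Assumption: an unterminated '<' is kept as ordinary text.
            out.append(text[lt:])
            break
        # Whatever the tag should be replaced with (nothing by default).
        out.append(replacement)
        # New start point is the character after the '>'.
        start = gt + 1
    return "".join(out)


print(strip_tags("<p>Hi</p><br>"))
print(strip_tags("a <b>bold</b> move"))
```

Passing a single space as `replacement` is one way to stop words on either side of a tag from running together.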


    • lxb0007 commented
      I am new to RPG. Can I use a regular expression, something like i = %scan('</[^a-zA-Z]/>':stringname); ?

    • JonBoy commented
      You cannot do what you are asking as such. You can use regex within RPG but not within the %Scan. If you google for using the regex APIs in RPG you will find a number of examples.
      You could also use regex within SQL and embed that in the RPG.
      That said, the task you asked about is pretty trivial to code in any language, and knowledge of regex within the RPG community is somewhat limited, so by using it you may be creating a bit of a maintenance nightmare.

  • #3
    RPG does not support regular expressions directly, but SQL does, so you may be able to use embedded SQL with the REGEXP_REPLACE function.
    But I assume it would be really difficult to define a pattern that removes all of the HTML.
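
To make the regex idea and its difficulty concrete, here is a minimal sketch using Python's `re` module instead of SQL. The pattern `<[^>]*>` and the names `TAG_PATTERN` and `strip_tags_regex` are assumptions chosen for illustration. A similar pattern could presumably be handed to REGEXP_REPLACE, but that has not been tried here.

```python
import re

# "A '<', then any run of characters that are not '>', then a '>'."
TAG_PATTERN = re.compile(r"<[^>]*>")


def strip_tags_regex(text: str) -> str:
    """Remove every substring that looks like <...>."""
    return TAG_PATTERN.sub("", text)


# Works for simple markup.
print(strip_tags_regex('<p class="x">Hi</p>'))

# Goes wrong when a '>' appears inside a quoted attribute value:
# the match ends at the first '>', so part of the tag is left behind.
print(strip_tags_regex('<a title="a > b">link</a>'))
```

The second call shows the kind of case that makes a fully general pattern hard to write.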

    Birgitta


    • lxb0007 commented
      Thanks, Birgitta!

  • #4
    In reply to JonBoy's post (#2):
    Thank you JonB


    • #5
      In reply to JonBoy's post (#2):

      This is my code. Thank you again
      begsr DltHtml;
        Indx = 1;
        Mglen = %len(%trim(MSG));
        HtmlTag = *blanks;
        dow Indx <= Mglen;
          StrP = 0;
          EndP = 0;
          clear HtmlTag;
          StrP = %scan('<':MSG);
          EndP = %scan('>':MSG);
          if (StrP > 0 AND EndP > 0);
            Hlength = EndP - StrP + 1;
            HtmlTag = %subst(MSG:i:Hlength);
            MSG =%scanrpl('HtmlTag':'':MSG);
            Indx = EndP;
            Mglen = %len(%trim(MSG));
          else;
            leave;
          ENDIF;
        enddo;
        MSG = %trim(MSG);
      endsr;


      • #6
        What about something like this?

        Code:
        <script>
             var x = document.getElementById("whatever");
             var y = Number(x);
             if (y < 12) {
                 // do something interesting
             }
        </script>
        Seems to me that the < sign in the JavaScript code could cause problems.
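
To see what goes wrong, here is a small sketch in Python (not RPG) of a scan-for-'<'-then-'>' stripper run against similar input. The names `naive_strip` and `SAMPLE` are made up for this illustration, and the script is squeezed onto one line with a trailing paragraph added so the result is easy to read.

```python
def naive_strip(text: str) -> str:
    """Drop everything from each '<' through the next '>'."""
    out, start = [], 0
    while True:
        lt = text.find("<", start)
        gt = text.find(">", lt + 1) if lt != -1 else -1
        if lt == -1 or gt == -1:
            # No complete '<' ... '>' pair left: keep the remainder.
            out.append(text[start:])
            return "".join(out)
        out.append(text[start:lt])
        start = gt + 1


SAMPLE = (
    "<script>"
    'var x = document.getElementById("whatever");'
    "var y = Number(x);"
    "if (y < 12) { /* do something interesting */ }"
    "</script>"
    "<p>Hi</p>"
)

# The comparison operator is taken for the start of a tag, so the
# script source leaks into the output and is cut off mid-statement.
print(naive_strip(SAMPLE))
```

Only the `Hi` at the end is wanted. Everything before it is JavaScript source that a tag-by-tag scan has no way to recognise as script content.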

        I'm curious about some of the less obvious things in your code. I haven't tried to run it or figure out all of the ways the code could work, but some things look suspicious to me.

        For example, you have this:
        Code:
          Mglen = %len(%trim(MSG));
        Can you explain why you are doing %trim, above? It seems to me that this would cause problems.

        Take this example:
        Code:
        MSG = '                    <p>Hi</p><br>';
        That's 20 spaces followed by 13 characters of HTML. MgLen would be 13 because the %TRIM removes the spaces. The %scan('>') would find position 23. It'd remove the <p>, but then exit the loop because Indx is going to be larger than 13, wouldn't it? It seems to me that you shouldn't be using a %TRIM within %LEN.
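
The arithmetic can be checked with a short Python sketch. It is not a translation of the RPG: `state_after_first_pass` is a hypothetical helper that assumes the first tag is removed correctly and only reports the two values the loop condition compares, using 1-based positions as %scan does.

```python
def state_after_first_pass(msg: str) -> tuple[int, int]:
    """Return (Indx, Mglen) as they would stand after one loop pass."""
    str_p = msg.find("<") + 1            # like %scan('<':MSG), 1-based
    end_p = msg.find(">") + 1            # like %scan('>':MSG), 1-based
    # Assumption: the tag itself is removed correctly.
    msg = msg[:str_p - 1] + msg[end_p:]
    indx = end_p                         # Indx = EndP
    mglen = len(msg.strip())             # Mglen = %len(%trim(MSG))
    return indx, mglen


indx, mglen = state_after_first_pass(" " * 20 + "<p>Hi</p><br>")
print(indx, mglen)       # positions in the untrimmed string vs. trimmed length
print(indx <= mglen)     # the loop condition: False means the loop exits
```

Indx is measured against the untrimmed string while Mglen is a trimmed length, so the two cannot be compared meaningfully.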

        Another suspicious bit of code is this:

        Code:
        MSG =%scanrpl('HtmlTag':'':MSG);
        This removes the literal string 'HtmlTag' from the MSG variable everywhere that it occurs. That isn't what you wanted to do, is it?!
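
The same mix-up is easy to reproduce in any language. Here is a tiny Python illustration, where `str.replace` stands in for %scanrpl and the variable names are invented for the example:

```python
msg = "<p>Hi</p><br>"
html_tag = "<p>"   # the tag that was just located

# Quoted: searches for the seven letters H-t-m-l-T-a-g, finds nothing.
literal_result = msg.replace("HtmlTag", "")

# Unquoted: searches for the contents of the variable.
variable_result = msg.replace(html_tag, "")

print(literal_result)
print(variable_result)
```

Note that both `str.replace` and %scanrpl replace every occurrence of the search string, not just the first.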

        Also, why is this a subroutine? Wouldn't this be better as a subprocedure that can be called from any program?


        • #7
          In reply to Scott Klement's post (#6):

          Thank you, Scott! You are right. I should drop the %trim inside the loop and trim once after the loop ends. I should also pass the variable rather than a quoted literal:
          Code:
          MSG =%scanrpl(HtmlTag:'':MSG);
          ^-^


          • #8
            If calling a PHP script is an option, use the strip_tags( ) function.
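
In the same spirit of leaning on a library instead of hand-rolling the scan, here is a rough sketch using `html.parser` from Python's standard library. It is not a port of strip_tags() and is not claimed to behave the same way. The class name `TextExtractor`, the helper `html_to_text`, and the decision to discard the contents of script and style elements are all assumptions made for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping anything inside <script> or <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only keep text that is not inside a skipped element.
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


print(html_to_text("<p>Hi</p><script>if (y < 12) { }</script><br>"))
```

Because the parser knows where a script element starts and ends, the comparison operator inside it is not mistaken for a tag.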

            Ringer
