Tuesday, June 18, 2013

A PL/SQL String Tokenizer


I was surprised that I couldn't find a simple string function like a tokenizer in PL/SQL.  I used DBMS_UTILITY.NAME_TOKENIZE for a while, but found that it limits the tokens to the maximum length of an Oracle name which is 32 characters.  The following PL/SQL is a string tokenizer that doesn't have this limitation.

To use the function, cut-and-paste the function 'tokenize_string' into your PL/SQL.  You will also need to grab the type 'list_t' which is used in the return clause to communicate the parsed string back to the caller.  Increase the hardcoded string limit of 50, used in the type definition and the constant C_BUF_SIZE, if needed.

DECLARE

  TYPE list_t IS TABLE OF VARCHAR2(50) INDEX BY PLS_INTEGER;
  
  v_list_outer list_t; -- used in the test case
  
  FUNCTION tokenize_string( p_string VARCHAR2, 
                            p_delimiter VARCHAR2 DEFAULT ',' ) 
     RETURN list_t
  AS
    C_BUF_SIZE CONSTANT NUMBER := 50;    
    i NUMBER := 1; -- string char position counter
    j NUMBER := 1;  -- counter of list elements
    v_buf VARCHAR2(50);
    v_list list_t;
  BEGIN
    -- until the end of the string or buffer exceeded
    LOOP
        IF SUBSTR(p_string, i, 1) = p_delimiter THEN
            v_list(j) := v_buf;  -- save the accrued chars
            j := j + 1;
            v_buf := ''; -- reset the buffer
        ELSIF SUBSTR(p_string, i, 1) IS NULL THEN
            v_list(j) := v_buf;  -- grab the last item
            EXIT;
        ELSE
            v_buf := v_buf || SUBSTR(p_string, i, 1);    
        END IF;
        
        IF i >= (C_BUF_SIZE) THEN        
            RAISE_APPLICATION_ERROR(20000, 'limit string to ' || 
                C_BUF_SIZE || ' 50 chars');
        END IF;

        i := i + 1;    
    END LOOP;
    RETURN v_list;
  END tokenize_string;
  
BEGIN
    
    v_list_outer := tokenize_string(NULL);
    
    v_list_outer := tokenize_string('');
    
    v_list_outer := tokenize_string('coke,pepsi,7-up');
    
    DBMS_OUTPUT.PUT_LINE('*** outputting results');
    FOR k IN v_list_outer.FIRST .. v_list_outer.LAST
    LOOP
        DBMS_OUTPUT.PUT_LINE( '    ' || v_list_outer(k) );
    END LOOP;
    
    BEGIN
      v_list_outer := 
tokenize_string('123456789012345678901234567890123456789012345678901');
    EXCEPTION
        WHEN OTHERS THEN
            DBMS_OUTPUT.PUT_LINE('*** handled overrun with exc');
    END;
END;  

I tested for a NULL string, empty string, and maximum buffer size.  If the maximum buffer size is exceeded, an application exception is thrown.

No comments: