<?xml version="1.0" encoding="iso-8859-1" standalone="no" ?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
          "file:///usr/share/sgml/docbook/dtd/xml/4.1.2/docbookx.dtd">

<!-- $Id: runtime.xml,v 1.3 2002/08/22 09:42:32 jhein Exp $ -->

<article>
<title>Calculate Import/Export runtimes</title>
<articleinfo>
<author><firstname>Jochen</firstname> <surname>Hein</surname></author>
    <copyright>
      <year>2002</year>
      <holder>Jochen Hein</holder>
    </copyright>
    <legalnotice>
      <para>This program is released under the GNU General Public License.</para>
    </legalnotice>
    <releaseinfo>$Id: runtime.xml,v 1.3 2002/08/22 09:42:32 jhein Exp $</releaseinfo>
</articleinfo>

<section>
<title>Import/Export runtimes</title>

<para>When you do an OS/DB migration with the SAP tools (<command>R3SETUP</command>
and, internally, <command>R3load</command>), you can easily get runtimes longer
than a day.  If you test the migration beforehand, you can analyze
the export and import logs with this small tool.  You get a runtime
for each table class, so you know which <command>R3load</command> processes
must be started first (the longest-running ones) in order to
get the shortest overall runtime possible.  For information on how this has
to be done, see SAP note xxx.</para>

<para>Analyzing the logs manually is cumbersome and, especially if
you had to restart the import a couple of times, not easy.  This
script reads all the log files and stores each filename with its start and end
times in a CSV file.  You can then import the file into Excel.  To generate
the CSV file, cd into your export or import directory and call:</para>

<programlisting>
awk -f runtime.awk SAP*.log > runtime.csv
</programlisting>

<para>This reads all the <command>R3load</command> logs and stores
the start and end time of each run (including restarts) in
the file <filename>runtime.csv</filename>, which you can import
into Excel.  All you have to do is format the computed
columns as date/time.  If you have restarted the import, you
can add up all the partial runtimes and use the sum
for your export and import planning.</para>

<para>If you have any questions or comments, feel free to contact me
at <email>jochen@jochen.org</email>.  You may find an updated version
      on <ulink url="http://www.jochen.org/"/>.</para>

</section>

<section>
<title>The code</title>

<para>The code has been written in AWK, a Unix scripting language
that is well suited for manipulating text files.  As you will see,
nothing magic has been done here.</para>

<para>The main code of the program is structured as follows:

<programlisting>
<?lp-file id="mainlisting" file="runtime.awk"?>
<?lp-options preserve-newlines="no"?>
<?lp-section-id?>Main Listing<?lp-section-id-end?> 
<?lp-code?>
#!/usr/bin/awk -f
# $Id: runtime.xml,v 1.3 2002/08/22 09:42:32 jhein Exp $
# This program is released under the GNU General Public License
# You may find updates under http://www.jochen.org/
<?lp-ref?>Convert Column in Excel column<?lp-ref-end?> 
<?lp-ref?>Compute the runtime by Excel formula<?lp-ref-end?> 
<?lp-ref?>testcases for column<?lp-ref-end?> 
<?lp-ref?>Initialize variables<?lp-ref-end?> 
<?lp-ref?>Format date and time for Excel<?lp-ref-end?> 
<?lp-ref?>Store a start time for this run<?lp-ref-end?> 
<?lp-ref?>Store the end time for this run<?lp-ref-end?> 
<?lp-ref?>finally, end the file<?lp-ref-end?> 

<?lp-code-end?>
<?lp-options preserve-newlines="yes"?>
</programlisting>
</para>


<!-- #!/usr/bin/nawk -f
#
# Usage: nawk -f runtime.awk SAP*.log > runtime.csv
# -->

<para>The <command>R3load</command> log files contain the date and time in an easily
sorted format.  Excel (and other spreadsheets) import these as
numbers, which is not quite what we want.  We take a given
date string (e.g. 20020731152423) and mangle it into an ISO-like
format (e.g. 2002-07-31 15:24:23).  When Excel reads that from
a CSV file, it does exactly what we want.</para>

<programlisting>
<?lp-section-id?>Format date and time for Excel<?lp-section-id-end?> =
<?lp-code?>
# convert the date and time for Excel
function format_date(dat)
{
  return substr(dat,1,4) "-" substr(dat,5,2)  "-" substr(dat,7,2) " " \
         substr(dat,9,2) ":" substr(dat,11,2) ":" substr(dat,13,2);
}
<?lp-code-end?>
</programlisting>

<para>Just to be sure that the above code does what we want it to
do, we'll add some test cases and examples here.  Eventually this gives us
a test suite to avoid regressions.</para>

<programlisting>
<?lp-section-id?>Format date and time for Excel<?lp-section-id-end?> =
<?lp-code?>
function check_format_date()
{
}

<?lp-code-end?>
</programlisting>
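
<para>As a possible starting point for such a test case, the following
stand-alone sketch repeats the body of format_date and compares its result
with the expected string.  The helper name check_date, the second sample
timestamp and the output file <filename>format_check.txt</filename> are
assumptions made for this illustration; they are not part of
<filename>runtime.awk</filename>.</para>

<programlisting>
awk '
# Same body as format_date above, repeated so the snippet runs on its own.
function format_date(dat)
{
  return substr(dat,1,4) "-" substr(dat,5,2)  "-" substr(dat,7,2) " " \
         substr(dat,9,2) ":" substr(dat,11,2) ":" substr(dat,13,2);
}
# Hypothetical helper: report a mismatch, stay silent otherwise.
function check_date(dat, expect,    result)
{
  result = format_date(dat);
  if ( result != expect ) {
    printf "Error: date %s, exp %s, result %s\n", dat, expect, result;
  }
}
BEGIN {
  check_date("20020731152423", "2002-07-31 15:24:23");
  check_date("19991231235959", "1999-12-31 23:59:59");
  print "format_date checks done";
}' > format_check.txt
cat format_check.txt
</programlisting>

<para>If both checks pass, the only line printed is the final
"format_date checks done" message.</para>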

<para>Excel has no numbers for columns, but uses letters.  So we
have to convert the column number we want to reference into the
Excel format: column "1" is "A", "26" is "Z", "27" is "AA" and so on.
This function only produces one- and two-letter names (up to column 702, "ZZ"),
and might be rewritten in the future.  On the other hand, if you restarted
the import that many times, your runtime calculation will be screwed anyway, so better
get your import running in one go.</para>

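    <para>The algorithm is as follows: we subtract one so that counting starts
    at zero.  The remainder modulo 26 then selects the last letter and, if the
    integer quotient is at least one, the quotient selects the leading letter.</para>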
<programlisting>
<?lp-section-id?>Convert Column in Excel column<?lp-section-id-end?> =
<?lp-code?>
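# Calculate the column name (valid for columns 1 to 702, "A" to "ZZ")
function column(col,    rem, div, ret)
{
  col = col - 1;
  rem = col % 26;
  div = int(col / 26);

  # ret is local and starts out empty, so a leading letter
  # from an earlier call cannot leak into this result
  ret = "";
  if ( div >= 1) {
    ret = sprintf("%c", 64+div);
  }

  return ret sprintf("%c", 65+rem);
}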
<?lp-code-end?>
</programlisting>
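
<para>If more than two letters were ever needed, the conversion could be
written as a loop.  The following is only a sketch of such a rewrite, not
the code used by <filename>runtime.awk</filename>; the function name
column_name and the output file <filename>column_names.txt</filename> are
made up for this example.</para>

<programlisting>
awk '
# Hypothetical replacement for column(): works for any column count.
function column_name(col,    name)
{
  name = "";
  while ( col > 0 ) {
    col = col - 1;                              # shift this digit to 0..25
    name = sprintf("%c", 65 + col % 26) name;   # prepend the next letter
    col = int(col / 26);                        # continue with the rest
  }
  return name;
}
BEGIN {
  print column_name(1), column_name(26), column_name(27), \
        column_name(52), column_name(703);
}' > column_names.txt
cat column_names.txt
</programlisting>

<para>This prints "A Z AA AZ AAA".  Subtracting one inside the loop is what
makes "Z" be followed by "AA" rather than by "BA".</para>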

<para>Again, to be sure, we need to create some test cases.</para>

<programlisting>
<?lp-section-id?>testcases for column<?lp-section-id-end?> =
<?lp-code?>
function check_column(col, expect)
{
  result = column(col);
  if ( result != expect ) {
    printf "Error: col %s, exp %s, result %s\n", col, expect, result;
  }
}

function check_columns()
{
  check_column(1, "A");
  check_column(2, "B");
  check_column(26, "Z");
  check_column(27, "AA");
  check_column(28, "AB");
}

function check_format()
{
}

function check_regression()
{
  check_columns();
  check_format();
}

<?lp-code-end?>
</programlisting>

<programlisting>
<?lp-section-id?>Compute the runtime by Excel formula<?lp-section-id-end?> =
<?lp-code?>
# fill Excel cell with the runtime formula (end time - start time)
function compute_runtime(run)
{
  return "=" column(run*3) filenum+1 "-" column(run*3-1) filenum+1;
}
<?lp-code-end?>
</programlisting>
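
<para>To make the cell arithmetic concrete: column A holds the filename, and
each run adds three cells (start time, end time, runtime).  The first row of
the CSV file is empty, so log file number filenum ends up in row filenum+1.
The stand-alone sketch below repeats column and compute_runtime with explicit
parentheses and prints the formulas for runs 1, 2 and 9 of the first log
file.  The output file <filename>formulas.txt</filename> is an assumption made
for this example.</para>

<programlisting>
awk '
function column(col,    rem, div, ret)
{
  col = col - 1;
  rem = col % 26;
  div = int(col / 26);
  ret = "";
  if ( div >= 1 ) {
    ret = sprintf("%c", 64+div);
  }
  return ret sprintf("%c", 65+rem);
}
# end time cell minus start time cell, both in row filenum+1
function compute_runtime(run)
{
  return "=" column(run*3) (filenum+1) "-" column(run*3-1) (filenum+1);
}
BEGIN {
  filenum = 1;                  # first log file, spreadsheet row 2
  print compute_runtime(1);     # start in B, end in C
  print compute_runtime(2);     # start in E, end in F
  print compute_runtime(9);     # start in Z, end in AA
}' > formulas.txt
cat formulas.txt
</programlisting>

<para>The three lines printed are "=C2-B2", "=F2-E2" and "=AA2-Z2".</para>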

<para>Before anything else, we initialize some variables that
help us track files and restarts and record whether we have found
a start and an end time (still, if you changed the logs manually
or some weird crash happened, we will be screwed anyway).</para>

<programlisting>
<?lp-section-id?>Initialize variables<?lp-section-id-end?> =
<?lp-code?>
BEGIN { 
      if ( regression_test == 1 ) { check_regression(); exit; }
      ok=0; run=0; file = ""; filenum=0; start=0; }
<?lp-code-end?>
</programlisting>

<para>When we find a start time in the log, we store that as a potential
start time.  This may be a correct start time, but that can only
be checked later.  If we find a new start time, but no end time,
we only use the latest time.  This may invalidate all computations,
but it is the best we can do.  Remember, if the <command>R3load</command>
crashed, you have other problems to worry about.</para>

<para>If we started to read a new log file, we finish the last
line (with CR/LF to get a DOS file).  We add one to the count
of files, so we can generate the right references for Excel
formulas later on.  We store the new filename so we can repeat
the check for the next logfile...</para>


<programlisting>
<?lp-section-id?>Store a start time for this run<?lp-section-id-end?> =
<?lp-code?>
/^#START:/ {
        if ( file != FILENAME ) {
                run = 0;
                filenum = filenum + 1;
                printf "\r\n\"%s\";", FILENAME;
                file = FILENAME;
        }
        ok = 1;
        maybe_start = format_date($2);
        };
<?lp-code-end?>
</programlisting>

<para>Finally, we may find the end time of an <command>R3load</command> run.
It is only valid for computations if we saw a start time earlier (in that
case the variable ok holds a "1").  If so, we write the latest
start time (stored in maybe_start), a ";", the formatted end time
and the Excel formula that computes the runtime for this file and restart.
maybe_start is then cleared, so we start anew.</para>

<para>The detection of start and end times in the logs is not bullet-proof,
but works pretty well for runs with no restarts and for simple restarts.  If
your log files are beyond repair, retry your export or import to get
useful times.</para>

<programlisting>
<?lp-section-id?>Store the end time for this run<?lp-section-id-end?> =
<?lp-code?>
/^#STOP:/ {
        if ( ok == 1 ) {
                # count only runs that are written, so that the
                # formula references match the cells in this row
                run = run + 1;
                ok = 0;
                printf "\"%s\";", maybe_start; maybe_start = "";
                printf "\"%s\";\"%s\";", format_date($2), compute_runtime(run); };
        };
<?lp-code-end?>
</programlisting>

<para>The program is finished, only one thing remains:  The CSV file
must be ended with a newline, so Excel is not overly confused.  Happy hacking.</para>

<programlisting>
<?lp-section-id?>finally, end the file<?lp-section-id-end?> =
<?lp-code?>

END { print "\r\n"; }
<?lp-code-end?>
</programlisting>
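
<para>To watch the rules above work together without a real migration, you
can feed a condensed copy of them two hand-made log files.  Everything in
this sketch is made up for illustration: the file names
<filename>SAPDEMO1.log</filename>, <filename>SAPDEMO2.log</filename> and
<filename>runtime_demo.csv</filename>, the timestamps, and the assumption
that only the #START: and #STOP: lines of a log matter.  The second file
simulates one restart.</para>

<programlisting>
printf '#START: 20020731152423\n#STOP: 20020731163000\n' > SAPDEMO1.log
printf '#START: 20020731152500\n#STOP: 20020731160000\n' > SAPDEMO2.log
printf '#START: 20020731170000\n#STOP: 20020731180000\n' >> SAPDEMO2.log

awk '
function format_date(dat)
{
  return substr(dat,1,4) "-" substr(dat,5,2)  "-" substr(dat,7,2) " " \
         substr(dat,9,2) ":" substr(dat,11,2) ":" substr(dat,13,2);
}
function column(col,    rem, div, ret)
{
  col = col - 1; rem = col % 26; div = int(col / 26);
  ret = "";
  if ( div >= 1 ) { ret = sprintf("%c", 64+div); }
  return ret sprintf("%c", 65+rem);
}
function compute_runtime(run)
{
  return "=" column(run*3) (filenum+1) "-" column(run*3-1) (filenum+1);
}
# a new file starts a new CSV line; remember the latest start time
/^#START:/ {
  if ( file != FILENAME ) {
    run = 0; filenum = filenum + 1;
    printf "\r\n\"%s\";", FILENAME;
    file = FILENAME;
  }
  ok = 1; maybe_start = format_date($2);
}
# a stop after a start yields start time, end time and formula
/^#STOP:/ {
  if ( ok == 1 ) {
    run = run + 1; ok = 0;
    printf "\"%s\";\"%s\";\"%s\";", maybe_start, format_date($2), compute_runtime(run);
  }
}
END { print "\r\n"; }
' SAPDEMO1.log SAPDEMO2.log > runtime_demo.csv

# strip the carriage returns for display on a Unix terminal
cat runtime_demo.csv | tr -d '\r'
</programlisting>

<para>The line for <filename>SAPDEMO1.log</filename> ends with the formula
"=C2-B2".  The line for <filename>SAPDEMO2.log</filename> carries two groups
of three cells, with the formulas "=C3-B3" and "=F3-E3", one per partial
run.</para>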


</section>

</article>

