As I noted in the first article in this little series, one of the really great things about Visual FoxPro is that there are usually several different ways of doing the same thing. Unfortunately, this is also one of its worst features because, if there were only one way of doing something, you would not have to think too much about it. One of the things that I notice whenever I am working on code (my own or someone else's) is how easily we developers fall into patterns of doing things. Once we have figured out a way to tackle some particular type of operation, we tend to stick to it unquestioningly.

I had occasion, recently, to revisit some of the most basic commands in Visual FoxPro and to examine their performance critically. I found some of the results surprising and, in the hope that you will too, I have documented the results of my research.

How the Tests Were Run

There are several problems in trying to assess the relative performance of Visual FoxPro commands and functions. The solution that I adopted was to set up my tests as individual functions, and then to use a random number generator to determine which test to call. By running the individual tests many thousands of times, in random sequences, the effects of caching and pre-compilation are effectively cancelled out. I used the same basic methodology for all tests and won't bother to mention it again.
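A minimal sketch of such a dispatch harness might look like this (the test-function names RunTest1..RunTest3 and the results table are illustrative, not the actual code used):

```foxpro
*** Sketch of the random-dispatch harness described above.
*** RunTest1..RunTest3 and the "results" table are hypothetical names.
FOR lnPass = 1 TO 10000
  lnPick = INT( RAND() * 3 ) + 1           && pick one of three tests at random
  lcTest = 'RunTest' + TRANSFORM( lnPick )
  lnTime = EVALUATE( lcTest + '()' )       && each test function returns its elapsed time
  INSERT INTO results ( testname, elapsed ) VALUES ( lcTest, lnTime )
NEXT
```

Because each call is chosen at random, no single test benefits systematically from a warm cache or from having just been compiled.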

For example, the code used for macro expansion test #5 (indirect EVAL() for an object value) was:

loObj = CREATEOBJECT( 'textbox' )
loObj.Value = "this is a test string value"
lcObj = "loObj"
lnSt = SECONDS()
FOR lnReps = 1 TO 10000
  luVal = EVALUATE( lcObj + '.value' )
NEXT
lnEn = SECONDS()

(Note: 10,000 iterations was enough to be meaningfully measurable; the result was then averaged for each test.) The results of each test were written out to a table and then aggregated and averaged to produce the figures shown in this article.

Name Expressions vs. Macro Substitution

One of the great strengths of Visual FoxPro is its ability to use indirection at run time. All experienced Visual FoxPro developers know that using a name expression (i.e. the EVALUATE() function) is 'faster' than macro substitution (the & operator) for doing this. But does it really matter? It seems to me that there are three basic scenarios in which we may want to use indirection when working with data:

  • Use a reference to get a value: e.g. retrieving the contents of a field using only its name

  • Use a reference to address an object: e.g. getting a property value

  • Use a reference to find some specific item of data: e.g. in a LOCATE command

The following table shows the results of carrying out the same operation in three different ways: first using traditional VFP macro substitution, then using the EVALUATE() function indirectly (e.g. EVAL( lcFieldName )), and finally using direct evaluation (e.g. EVAL( FIELD( nn ) )).
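By way of illustration, the three forms look like this (a sketch assuming the current work area holds an open table):

```foxpro
lcFieldName = FIELD( 1 )          && get the name of the first field
*** 1. traditional macro substitution
luVal = &lcFieldName
*** 2. indirect EVALUATE() of a name held in a variable
luVal = EVALUATE( lcFieldName )
*** 3. direct evaluation of the expression itself
luVal = EVALUATE( FIELD( 1 ) )
```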

Relative Performance of Macro Expansion and Name Expressions

| Test name                            | Total runs | Total time (sec) | Avg (ms) |
|--------------------------------------|-----------:|-----------------:|---------:|
| Macro substitution for table value   |    107,000 |          14.8230 |   0.1385 |
| Indirect EVAL() for table value      |    142,000 |          11.6680 |   0.0822 |
| Direct EVAL() for table value        |     96,000 |           6.4920 |   0.0676 |
| Macro substitution for object value  |  1,090,000 |          13.7840 |   0.0126 |
| Indirect EVAL() for object value     |  1,040,000 |           8.7900 |   0.0085 |
| Direct EVAL() for object value       |  1,130,000 |           9.1930 |   0.0081 |
| Macro substitution in LOCATE         |  1,010,000 |           8.1700 |   0.0081 |
| Indirect EVAL() in LOCATE            |  1,140,000 |           9.1250 |   0.0080 |
| Direct EVAL() in LOCATE              |  1,140,000 |           9.1980 |   0.0081 |

The first thing to note is that no method is intrinsically slow when considered in isolation (the worst case shows a difference of less than a tenth of a millisecond between the best and worst methodology; hardly detectable!). However, when a method is executed repeatedly (inside a tight loop, for example), it is clear that EVAL() has a significant performance advantage when dealing with either tables or objects, so the received wisdom that EVAL() is faster than macro expansion is well founded.

Interestingly, there is little detectable difference between direct and indirect evaluation, and my personal preference, if only for readability, is therefore to use the indirect method. In other words, I will accept the (very small) performance hit to retain code like this:

lcField = FIELD( lnCnt )
luVal = EVALUATE( lcField )

in preference to the less obvious:

luVal = EVALUATE( FIELD( lnCnt ))

The main conclusion, therefore, is that there is no benefit in using macro substitution when EVAL() is available as an option. A secondary point is that as the number of repetitions increases, the difference rapidly becomes significant, so EVAL() should always be used when processing repetitively.

Conditional Evaluation

When it comes to evaluating a condition and switching control depending on the result, VFP 9.0 has almost an embarrassment of riches. We now have four different ways of evaluating a condition, ranging from the traditional IF…ELSE…ENDIF, through the IIF() function and the DO CASE…ENDCASE statement, to the new ICASE() function. The question, therefore, is: does it matter which we use in any given situation? The speed of execution of these constructs was compared using a standardized multiple-level test of the general form:

IF lnTest = 1
  lnRes = 1
ELSE
  IF lnTest = 2
    lnRes = 2
  ELSE
    IF lnTest = 3
      lnRes = 3
    ELSE
      lnRes = 4
    ENDIF
  ENDIF
ENDIF
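For comparison, the same four-way test expressed with the other constructs looks like this (a sketch; the benchmark functions may have differed in detail):

```foxpro
*** DO CASE version
DO CASE
  CASE lnTest = 1
    lnRes = 1
  CASE lnTest = 2
    lnRes = 2
  CASE lnTest = 3
    lnRes = 3
  OTHERWISE
    lnRes = 4
ENDCASE

*** nested IIF() version
lnRes = IIF( lnTest = 1, 1, IIF( lnTest = 2, 2, IIF( lnTest = 3, 3, 4 )))
```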

Relative Performance of Conditional Testing

| Test name                                  | Total runs | Total time (sec) | Avg (ms) |
|--------------------------------------------|-----------:|-----------------:|---------:|
| DO CASE level 1                            |    282,000 |           0.6910 |   0.0025 |
| DO CASE level 2                            |    242,000 |           0.7600 |   0.0031 |
| DO CASE level 3                            |    275,000 |           1.1820 |   0.0043 |
| DO CASE otherwise                          |    247,000 |           0.9900 |   0.0040 |
| ICASE level 1                              |    313,000 |           0.4020 |   0.0013 |
| ICASE level 2                              |    281,000 |           0.4210 |   0.0015 |
| ICASE level 3                              |    300,000 |           0.5610 |   0.0019 |
| ICASE otherwise                            |    302,000 |           0.5300 |   0.0018 |
| IF…ELSE level 1                            |    203,000 |           0.3100 |   0.0015 |
| IF…ELSE level 2                            |    235,000 |           0.6430 |   0.0027 |
| IF…ELSE level 3                            |    213,000 |           0.7610 |   0.0036 |
| IF…ELSE otherwise                          |    227,000 |           0.8510 |   0.0037 |
| IIF level 1                                |    213,000 |           0.2700 |   0.0013 |
| IIF level 2                                |    236,000 |           0.4240 |   0.0018 |
| IIF level 3                                |    209,000 |           0.4200 |   0.0020 |
| IIF otherwise                              |    222,000 |           0.5100 |   0.0023 |
| Level 2 of nested DO CASE inside DO CASE   |     67,000 |           0.4300 |   0.0064 |
| Level 2 of nested IF…ELSE inside IF…ELSE   |     56,000 |           0.1400 |   0.0025 |

Once again, the first thing to note is that nothing here is exactly slow. However, there are some differences, and a couple of anomalies, that are worth noting.

One thing that stands out clearly is that DO CASE is invariably the slowest way of handling this sort of test, although it is also the most readable. Perhaps oddly, the fastest way to execute a simple evaluation of this type is the ICASE() function, though this, like IIF(), is not relevant when a block of code must follow the conditional test. The main problem with both the IIF() and ICASE() functions is, of course, the readability of the code. For example, the ICASE() that duplicates the IF…ELSE ladder above is:

lnRes = ICASE( lnTest = 1, 1, lnTest = 2, 2, lnTest = 3, 3, 4 )

which is hardly comprehensible without considerable effort. Of course, the commonest scenario for this type of logic is inside an SQL query, where speed of execution is, rightly, paramount.

The anomaly I referred to above concerns the expectation that, with all methods tested, execution time should increase as you go further down the levels. After all, it makes sense that three tests should take longer than two, and two longer than one. However, in both DO CASE and ICASE() the OTHERWISE condition executes faster than the level which immediately precedes it. Not much faster, admittedly, but undeniably and consistently faster. I first noticed this phenomenon back in VFP 7.0, and it is interesting that it persists not only in the DO CASE statement in VFP 9.0 but also in the new ICASE(). What it means I do not know, but it is an observed phenomenon.

The conclusions here are, first, that if you can get away with it, you should use the functions in preference to the plain commands. Second, if you are doing this repeatedly, an IF…ELSE ladder is likely to give you better performance than a DO CASE, especially if the nesting exceeds one level. The caveat is that the more complex the code becomes, the harder it is to read and understand when using an IF…ELSE ladder or the functions. My personal preference is to include the CASE structure as a comment in my code, but actually execute the logic using one of the other methods.
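The pattern I mean looks something like this (a sketch of the idea, not code from the tests):

```foxpro
*** DO CASE                            && the readable version, kept as a comment
***   CASE lnTest = 1  =>  lnRes = 1
***   CASE lnTest = 2  =>  lnRes = 2
***   CASE lnTest = 3  =>  lnRes = 3
***   OTHERWISE        =>  lnRes = 4
*** ENDCASE
lnRes = ICASE( lnTest = 1, 1, lnTest = 2, 2, lnTest = 3, 3, 4 )
```

The comment documents the intent in the readable form, while the single ICASE() line does the actual work at the faster speed.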

Looping Through Data

One of the most fundamental operations in almost every data-centric application is the need to retrieve a set of records and then process them sequentially. In Visual FoxPro there are basically three ways of doing this: the original Xbase DO WHILE command, the newer SCAN command and, newest of all (though still pretty ancient), the FOR…NEXT loop. There are, of course, many variants and variations, but these three constructs provide the basis for all of them. In the next batch of tests I evaluated the speed with which each record in a large name-and-address table could be visited and a specific column value retrieved (and, for the purposes of the test, immediately discarded).

Each of the three constructs was tested unconditionally (i.e. start at the first record and proceed to the last) and for a set of records meeting a specific condition (in this case, addresses in the state of Ohio).

The reason for this particular test is that it represents the two commonest scenarios. However, unlike the other comparisons made so far, the code required to perform the operation is completely different for each option.

 

*** DO WHILE: unconditional
GO TOP
DO WHILE NOT EOF()
  luRes = cState
  SKIP
ENDDO

*** DO WHILE: conditional
=SEEK( 'OH', 'bigtable', 'cstate' )
DO WHILE cState == 'OH' AND NOT EOF()
  luRes = cState
  SKIP
ENDDO

*** SCAN: unconditional
GO TOP
SCAN
  luRes = cState
ENDSCAN

*** SCAN FOR: conditional
GO TOP
SCAN FOR cState == 'OH'
  luRes = cState
ENDSCAN

*** SEEK() and SCAN REST WHILE: conditional
=SEEK( 'OH', 'bigtable', 'cstate' )
SCAN REST WHILE cState == 'OH'
  luRes = cState
ENDSCAN

*** FOR…NEXT using GOTO: unconditional
lnRex = RECCOUNT( 'bigtable' )
FOR lnCnt = 1 TO lnRex
  GOTO (lnCnt)
  IF EOF()
    EXIT
  ENDIF
  luRes = cState
NEXT

*** FOR…NEXT using GOTO: conditional
lnRex = RECCOUNT( 'bigtable' )
FOR lnCnt = 1 TO lnRex
  GOTO (lnCnt)
  IF EOF()
    EXIT
  ENDIF
  IF NOT cState == 'OH'
    LOOP
  ENDIF
  luRes = cState
NEXT

*** FOR…NEXT using SKIP: unconditional
lnRex = RECCOUNT( 'bigtable' )
FOR lnCnt = 1 TO lnRex
  SKIP
  IF EOF()
    EXIT
  ENDIF
  luRes = cState
NEXT

The table used has 125,000 records, so it is moderately large for this type of operation. The results may surprise you; they certainly surprised me!

Relative Performance of the Looping Tests

| Test name                                      | Total runs | Max (sec) | Min (sec) | Avg (sec) |
|------------------------------------------------|-----------:|----------:|----------:|----------:|
| DO WHILE NOT EOF() with no controlling index   |         26 |    0.6110 |    0.4510 | 0.5160769 |
| SCAN…ENDSCAN with no controlling index         |         29 |    0.3610 |    0.3310 | 0.3536667 |
| FOR…NEXT with SKIP with no controlling index   |         19 |    1.1220 |    0.4500 | 0.5746250 |
| FOR…NEXT with GOTO with no controlling index   |         26 |    0.6600 |    0.4710 | 0.5691667 |
| DO WHILE NOT EOF() with controlling index      |         21 |    2.7040 |    2.4240 | 2.5595714 |
| SCAN…ENDSCAN with controlling index            |         25 |    2.9640 |    2.3530 | 2.5836154 |
| FOR…NEXT with SKIP with controlling index      |          9 |    3.0840 |    2.4140 | 2.6314444 |
| FOR…NEXT with GOTO with controlling index      |         26 |    0.9510 |    0.4810 | 0.6214500 |
| SCAN FOR cState = 'OH' with no index set       |         26 |    0.0710 |    0.0600 | 0.0680000 |
| SCAN FOR cState = 'OH' with index set          |         19 |    0.1510 |    0.1100 | 0.1208571 |
| FOR…NEXT with test for cState = 'OH'           |         16 |    0.6300 |    0.5410 | 0.5867500 |
| SEEK() & DO WHILE cState = 'OH'                |         19 |    0.0910 |    0.0700 | 0.0777500 |
| SEEK() & SCAN WHILE cState = 'OH'              |         29 |    0.0900 |    0.0700 | 0.0735000 |

There are a number of things to note in this table.

First, when processing all records in a table, the impact of a controlling index is significant for any loop control that relies on the SKIP command, whether explicitly (DO WHILE and FOR…NEXT with SKIP) or implicitly (SCAN…ENDSCAN).

Second, SCAN FOR is also slower when a controlling index is set. In these tests there was an optimizable index on the column in question and, even though SCAN FOR is optimizable, the addition of the controlling index slowed the process noticeably.

Third, the controlling index does not matter when we use SEEK() and a WHILE clause with either a SCAN or a DO WHILE. It makes little difference which we use, as both deliver essentially the same performance, and we usually need a controlling index anyway when using WHILE as the scope.

The following conclusions can be drawn from these results:

  • When processing all records, ensure that there is no controlling index if you are using a construct that relies on SKIP. Probably the best generalization is to use a FOR…NEXT loop with GOTO, since it is essentially unaffected by the presence or absence of a controlling index

  • The most efficient way to handle selective processing depends on whether an index suitable for use with SEEK() exists. If it does, use a combination of SEEK() and a WHILE scope; otherwise, use SCAN FOR with no controlling index set on the table
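The two recommended patterns, side by side (table, tag and field names as in the listings above; the SET ORDER call is an illustrative way of ensuring no controlling index is active):

```foxpro
*** when an index tag on cState exists: position with SEEK(), then take the REST scope
=SEEK( 'OH', 'bigtable', 'cstate' )
SCAN REST WHILE cState == 'OH'
  luRes = cState                 && process the record
ENDSCAN

*** when no suitable index exists: drop any controlling index, then SCAN FOR
SET ORDER TO 0 IN bigtable
SCAN FOR cState == 'OH'
  luRes = cState                 && process the record
ENDSCAN
```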

4 Responses to Writing Better Code (Part 2)

  • Craig Boyd says:

    Andy,

    Nice job on this and also Part 1. I was surprised by the results you put up for Otherwise and also the For…Next & Goto. Learned some new things today. Thanks.

  • patpedersen says:

    Andy,

    Excellent post.  Thanks for sharing your results.

  • Andy,

    Great stuff. I remember you covering some of this at last year’s SWFox and seeing it here again reminded me to do something about it.

    Thanks for all you do,

    Stewart

  • John Terburg says:

    In the SCAN…ENDSCAN you use a SEEK followed by a SCAN without the "REST" scope. That makes no sense, as a SCAN will start from the first record by default. To take advantage of the preceding SEEK you should really use "SCAN REST FOR…"

    Yes, you are absolutely correct. Thank you for pointing that out. I think the code got a little mangled by the web page; in the original script that was used to generate the timings it was, as you say, a SCAN REST — Andy
