PDA

View Full Version : If Parser



Colin
08-03-2013, 05:05 PM
OK so i've been working a little on one of my side projects and i have come to rewriting the if parser, i'm looking for ideas, suggestions and your thoughts on the best way of performing the following task:

the code must be able to output assembly code from complex if/or/and statements. (the output is pseudo since it actually outputs machine code, but for the purpose of ideas :))

so imagining there is already many functions available, such as getnexttoken, getnextmathsop, etc etc...

so a sample if statement:



if (((edx == 1) or (edx == 7)) or (eax == 5) or ((eax == 10) and (ebx == 10))) {
//the code
}


should be able to output the following or similar as long as the output is correct of course:



cmp ebx, 10
jne or1
cmp eax, 10
je thecode
or1:
cmp eax, 5
je thecode
or2:
cmp edx, 1
je thecode
cmp edx, 7
jne exitif
thecode:
//the code
exitif:
//no code was executed


I have not began to write the new code since i am first looking for ideas on the best approach, hope someone can help me to make this work good. Thanks to anyone willing to help :)

pstudio
08-03-2013, 08:37 PM
Well, I'm no expert on assembly and compiler writing, but if I were you, I would reduce the if statement to use just on je/jne.

Basically your example is build like this:

if (( boolean or boolean ) or boolean or (boolean and boolean)) {
...
}

There is no need to test every boolean when you can combine them with logical operations in to one single boolean and then test if that boolean is true or false.

( boolean or boolean ) or boolean or ( boolean and boolean)
=> boolean or boolean or boolean // 1st reduction
=> boolean // 2nd reduction

This of course means that you are making a full evaluation of the if expression. If you want lazy evaluation you will only need to make the 1st reduction (sequentially as needed) and then test on each boolean.

Colin
09-03-2013, 01:41 AM
hmm, I'm not sure i understand your idea, could you elaborate?

pstudio
09-03-2013, 07:36 AM
Ok, I'll give it a try :)

In your example of assembly output you are using lazy evaluation in reverse ordering. That means that you are checking each comparison from right to left and you jump straight to the code block as soon as you can guarentee that the if statement is true.
Now I don't know the full scope of your If Parser language, but I suggest making a full evaluation of the if expression. In other words, in stead of testing each boolean comparison, you just calculate one single boolean value which you then test on whether to see if you should execute the code block or not. Now if your language only allows for simple boolean comparision like testing variables against immeduate values, you don't actual need full evaluation. Full evaluation is only needed if your comparisons may contain side effects. However full evalution is a nice language property IMHO and it reduces the number of branching operations to 1.

What I tried to write before was basically how the code could be seen from the parsers view point. Basically any comparision like 'eax == 10' or 'eax < 12' results in a boolean value. In the end all your if statement is doing is testing if a single boolean value is true or false to see if some code should be executed. I suggest you calculate that single boolean value and test that in stead of testing all the smaller boolean values.

Let me try and give a pseudo-code example of how your if statement could look like when using full evaluation:



//(((edx == 1) or (edx == 7)) or (eax == 5) or ((eax == 10) and (ebx == 10)))

r0 := edx = 1 // edx == 1
r1 := edx = 7 // edx == 7
r0 := r0 or r1 // (edx == 1) or (edx == 7)
r1 := eax = 5 // eax == 5
r0 := r0 or r1 // ((edx == 1) or (edx == 5)) or (eax == 5)
r1 := eax = 10 // eax == 10
r2 := ebx = 10 // ebx == 10
r1 := r1 and r2 // (eax == 10) and (ebx == 10)
r0 := r0 or r1 // ((edx == 1) or (edx == 7)) or (eax == 5) or ((eax == 10) and (ebx == 10))

jne r0 true :label: // test if r0 is true or false
// the code
:label:
// no code was executed


As you can see the code will at most make one jump while your method introduces more jumps the more complex your if statement is. I have no idea if my method is faster than yours on average, but I do know that branches are considered slow and it is good to reduce the number of them.

This method will however require that you implement boolean variables since booleans are saved in temporary registers. If you haven't implemented boolean variables already you'll have to judge if it's worth doing so. It depends on among other things whether you need boolean variables other places in your code and if you need/want full evaluation. I suppose it could be a good/fun exercise to implement boolean variables and logic in a language :)

Colin
09-03-2013, 12:42 PM
Hello pstudio, thanks for your reply, your idea is very good, my language already supports if etc and many data types, whiles, fors, repeats, pretty much everything of a modern language, however for my purpose it kind of defeats the point of the language. In your example variables would need to be changed or at the least memory would be used to create a temporary storage for doing the comparison, the idea really is to allow the programmer to know and understand exactly what his code will produce at the same time making it easier to write.

In theory this idea is pretty good but the registers would be changed without the programmer knowing defeating the point and would produce alot of extra code :)

all of the following:


r0 := edx = 1 // edx == 1


are still cmp's.

so imagine

//al as r0
al = (edx = 1);



cmp edx, 1
setz al


setz instruction basically does the following: set al if the compare is equal


Overall the overhead of a few small jne/je/jnc/jc etc is worth it over memory operations.


My current if parser is just a forward generator, a brute force recursive procedure. Thanks again for your interest and input. I hope to hear some more good ideas :)

just to show what i look for exactly... my code can handle complex if/while/for statements already so:



if ((eax != 1) and ((ecx == 2) or ((edx == 3) and (edi == 0))) and (ebx =< 22) and (esi == 12)) {
ShowMessageA('Sample', 'Hello World!');
}


this already is assembled into machine code correct, which is equivalent to:



cmp eax,00000001h
jz L00403041
cmp ecx,00000002h
jz L00403028
cmp edx,00000003h
jnz L00403041
cmp edi,00000000h
jnz L00403041
L00403028:
cmp ebx,00000016h
ja L00403041
cmp esi,0000000Ch
jnz L00403041
push _Hello_World_
push _Sample_
call SUB_L004030A6
L00403041:


so that is works is not a problem, but i just would like to see the different ideas on how they believe it should be done, and if someone comes up with an idea that will be faster/better for optimization that would be superb :)