PDA

View Full Version : Expression to ASM



Colin
16-05-2011, 12:18 PM
hi there, this is not related to game development, .. i am writing a compiler, so far the compiler supports many features, but is missing 1 vital thing, the ability to accept expressions, i have tried to come up with a way to do this but i always run into problems, the language allows for writing in assembly and higher level in the same code scope so what i need is a way to output assembly code (my assembler will assemble the code afterwards)

eax = myvar2 + (edx + ecx + (5 * edx)) - 5 xor myvar1;

this sort of thing, i need a way to output equiv assembly opcodes/operations. can anyone help me with this problem? Thanks in advance

Ñuño Martínez
16-05-2011, 01:47 PM
This is an advanced topic. The best is to build a recursive analyzer. May be Crenshaw's paper (http://compilers.iecc.com/crenshaw/) helps you.

code_glitch
16-05-2011, 03:29 PM
Hmm.. That is some very interesting stuff there Nuno... If I have time I might look into the possibility of a compiler designed to do graphics in very high speed or something to that effect. Ie. No bloat from libs as they are run at HW speed which would be nice indeed. Good luck on your compiler. If I was to evaluate expressions I would do something like:

1) find the number of brackets
2) for each bracket 'level' or sub brackets work those out and feed that back into the processor until the bracket level reaches 0
3) In each process I would identify what is a number and what is not (ie numeric first char or otherwise) and sub in values where required.
4) Process each part in BODMAS order (of if statements or similar)

Don't know if it helps but if nicely implemented it might run well. I'm working on an interpreter for rapid prototyping and debugging of my APIs and programs so I'm getting some similar challenges. Hence the use of the document Nuno posted: take interpreter and feed into binary. Done.

circular
17-05-2011, 08:26 AM
One thing to take into account is operator priority. Let's say you have the expression
1 + 2*4 + 8

If you analyse from left to right, rirst you encounter 1, then '+', then 2. So you temporarily conclude that it is an addition of 1 and 2 :
+(1, 2)

Then you find '*' and 4, which have a higher priority, so multiplication is included into the '+' and combined with the 2 :

+(1, *(2,4))

Then you find '+' and 8, which has a lower priority, so it gets out of the multiplication. It has the same proprity as the first '+', and it is generally assumed that operations are made from left to right, so it gets out of the last addition too. Thus, the full expression is combined with the 8 :

+( +(1, *(2,4)), 8 )

Let's check :

*(2,4) = 8
+(1, 8 ) = 9
+(9, 8 ) = 17

Notice that to implement it this way, you need to remember the current position in the expression, which can be a value token, an operation token, or nothing if there is nothing parsed, or if you get to the global level with a low priority.

A parenthesis is to be treated as a whole, until the corresponding end bracket is found. You can implement it recursively, using the same function to return the subexpression result, treated as a value token.

Let's try with :
1+ 2*(4+1) + 3

tree : none
current operation : nil
potential left operand : nil

input: + 2*(4+1) + 3

tree : token(1)
current operation : nil
potential left operand : token(1)

input: *(4+1) + 3

=> combine left and right operands

tree : +(1,2)
current operation : +(1,2)
potential left operand : current operation

=> '*' has a higher prority, so get inside

current operation : +(1,2)
potential left operand : token(2)

=> after reading '*', do a recursive call that returns +(4,1)

input : +3

=> combine left and right operands, which gives *(2, +(4,1))

=> replace token in current operation

tree : +(1, *(2, +(4,1) ) )
current operation : *(2, +(4,1))
potential left operand : current operation

=> '+' has a lower priority and is from left to right so

current operation : nil
potential left operand : +(1, *(2, +(4,1) ) ) = tree

=> combine left and right operands

input : none
tree : +( +(1, *(2, +(4,1) ) ), 3)
current operation : tree
potential left operand : tree

Note that the current operation may not be the whole tree, so you need to loop to search the parent.

Then there are two cases :
- either there is no current operation and one potential left operand : you can decide to raise an error or do a function call
- or there is a current operation, which can have a result or not (an assignment has the lowest priority and has no result)

To handle parameters of functions, you need to handle the parenthesis differently depending on the situation. If you have a potential left operand and have no operator, then it is a parameter list. If you've just read an operator, then it is not a parameter list.

Hope this helps

Colin
17-05-2011, 09:36 AM
hi there, thanks for replies, i have decided on first my compiler will convert the expression to an RPN (reverse polish notation), this will remove the problem of () and procedence of operators, my only problem now is the best way to output the opcodes, because i will be limited to registers since the programmer could also use them, maybe an option would be to use sse, but tbh i would rather not.

btw the problem with recursion is if the programmer has a very large expression, it would use large stack space, and maybe even an overflow. as for the compiler, i am already far beyond the basics, the compiler already has many features and very usable, you can check it out if you like - http://www.codeziron.com

farcodev
28-05-2011, 04:36 AM
good luck ! ;)

chronozphere
28-05-2011, 08:33 AM
Nice. Reverse Polish notation would also be my approach; It's really elegant.

Can't wait to hear more from your compiler project. :)

Colin
29-05-2011, 10:16 AM
thanks farcodev,

@chronozphere:
i finished with the rpn converter, i have also came up with a way to assemble the RPN to machine code without it interupting the programmers code (altho i have not implemented this yet and will not for a while - since it needs alot of work), using 5 registers and some stack space for any other vars needed (not using push and pop), check the link back in other post if you want to download and check it out, it contains a massive changelog, i have not added much documentation yet, but what i have written is on the website under docuementation, there is also some samples posted, syntax and changes are open for sugestion and ideas too.

At some point i wish to write a small 3d engine that can be suplied with the compiler, i will be writing a newton header and few other things i have in mind, overall the language is moving forward very good - i converted one of my c algorithms to ziron and it went down from 1048ns to 276ns (average), also one of my delphi apps went from 2000ms down to around 500ms (dont remember exactly this), basically the language adds much less garbage, and since most of it is just plain assembly language you get what you write, but even after i add more high level syntax, i plan to keep it adding just a minimum amount of code.

the language has an inline macro system, class support, structs, dll, exe output (32bit for now only, i plan to add 64bit at a later date), anyways thanks for the interest and help.

VilleK
30-05-2011, 03:16 PM
About expressions the hardest part (imo) is parsing and generating code for conditionals like "if(a || b && c)" etc where you need to generate compare-and-branch code. There are many good books on the topic of compiler construction, I found lots of good examples in this one (http://www.amazon.com/Compiling-NET-Common-Language-Runtime/dp/0130622966).

Generating native code with register allocation is another topic that is very advanced. There is the alternate path that Adobe has chosen which is to work with LLVM (http://llvm.org/) instead and let it generate high quality code for several different architectures.

You should publish examples of the timings you talk about because it can gather interest to your compiler if it actually does perform better than a C-compiler. Frankly I doubt it but I hold my judgment until I see it ;)

Looks like an interesting project, good luck with it!

chronozphere
30-05-2011, 03:38 PM
Wow .. again, this is really interesting. If I had more time, I could indulge into this field of expertise aswell.

You are using a separate assembler right? Because writing plain IA-32 machine code is sheer hell. Also, do you allready use a linker or does it only generate a single module with all the right addresses directly in it? If this project goes any deeper, you could write a nice article about it. Any plans of showing us the source? ;)

Also your timings are very suprising to me. I also doubt that it will be significantly faster than C, but I am very curious about it's performance anyway (and the things you have to give up in order to squeeze more code into the same nanoseconds).

Colin
30-05-2011, 06:05 PM
@ VilleK my assembler/compiler already handles if, while, repeat and for statements, these are the simple things, they are just cmp and conditional jmp statements, check here to see documentation on control structure http://www.codeziron.com/index.php?page=documentation (all documentation is unfinished) if you mean multiple inline comparisons then yes it is a bit more complicated, i plan to add that next, it still is much more simple than maths expressions since you are not changing registers and so on (which is my main problem - since i dont want it to affect the users code) as for .NET - it is the destruction of programming......i would never advise anyone to even touch any language that is .NET (especially for games)

as for checking it why, why dont you? it is available for download @ http://www.codeziron.com and of course it is faster than c since it is primarily assembly lol.....c is a nice language but it is still just a compiler and compilers can often struggle to generate optimised code, humans can optimise code far better than any compiler (if they know what they are doing of course)

@ chronozphere, i do not use a seperate assembler, i am writing the assembler, that is what my project really is, it started as just a regular assembler, but now it is becoming its own language, with high level code etc... my compiler also has a built in linker, compile speeds are also very fast (but they will be much much faster once i fix all bugs etc) and machine code is actually quite easy once you get the hang of it

here is a sample of ziron code for creating a dialog from resource (very simple sample)



program WIN32GUI 'Sample1';

#resource 'dialog.RES';

//the included RTL files
#include 'ziron32.zir';
#include 'smm32.zir';

#include 'comctl32.zir';
#include 'kernel32.zir';
#include 'user32.zir';
#include 'messages.zir';

function dlgFunc(DWord hDlg; DWord uMsg; DWord wParam; DWord lParam) {
if (uMsg == WM_CLOSE) {
EndDialog(hDlg, 0);
} elseif (uMsg == WM_INITDIALOG) {
ShowWindow(hDlg, SW_SHOWNORMAL);
}
return false;
}

DWord hInstance = GetModuleHandle(nil);
DialogBoxParam(hInstance, 1000, 0, @dlgFunc, 0);
ExitProcess(0);


anyways check it out.