Programming Homework Help

Use Python to write a tokenizer

I have mentioned in class that for the Core Tokenizer, you may use (or borrow from) Java's or Python's tokenizer library, or even tokenizers you might find online, as long as you document it appropriately. In this note, though, I want to describe a particular way in which you can write your own Tokenizer from scratch. And, in fact, the approach I describe is precisely how Java's tokenizer library works.

The key is to write your Tokenizer as a Finite State Machine (FSM). You have most likely seen FSMs in an ECE course, but they are easily simulated in software. All you need is an integer variable whose value tells you what the current state of the FSM is. The FSM itself will be a loop containing a "select" statement or a bunch of "if" statements. In each iteration, depending on the value of the CurrentState variable and the next "input" value, you change CurrentState to the value corresponding to the next state, and then you start the next iteration. This continues until CurrentState reaches a value representing the "final state" of the FSM you are simulating, at which point the loop terminates. By the way, in many cases it is convenient to split the CurrentState variable into two (or more) pieces, since one piece might decide when the loop should terminate while another might have to do (only) with the behavior of the switch statement inside the loop, etc.
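To make the idea concrete, here is a minimal sketch of simulating an FSM in software, following the scheme above: an integer state variable and a loop with a chain of "if" tests. The machine itself is a toy of my own choosing (it accepts strings of one or more 'a's followed by one or more 'b's); the state names and numbering are illustrative, not anything required by Core.

```python
def accepts_a_plus_b_plus(s):
    """Simulate a small FSM that accepts strings matching a+b+."""
    # The states, encoded as integers as described in the note.
    START, IN_A, IN_B, REJECT = 0, 1, 2, 3
    state = START
    for ch in s:
        # Each iteration: look at CurrentState and the next input
        # character, and move to the corresponding next state.
        if state == START:
            state = IN_A if ch == "a" else REJECT
        elif state == IN_A:
            if ch == "a":
                state = IN_A
            elif ch == "b":
                state = IN_B
            else:
                state = REJECT
        elif state == IN_B:
            state = IN_B if ch == "b" else REJECT
        else:
            state = REJECT  # REJECT is a trap state
    return state == IN_B  # IN_B is the only accepting state
```

Note that the loop body does nothing but map (state, input) pairs to new states; that table-like structure is exactly what makes FSMs so easy to write by hand.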

So how do you write your Tokenizer as an FSM? For convenience, let me assume that we have a single whitespace between each pair of tokens. The termination of the FSM's (main) loop should obviously happen when the CurrentState variable's value represents "end-of-file", which you are supposed to represent by 33. What should the body of the loop look like? It is a bit more complex than just looking at the next input character. The very next input character will, however, tell you what type of token the next token is. If it is a lowercase letter, it is a keyword; if it is an uppercase letter, it is an identifier; if it is a digit, it is an integer. Based on that and other details about what the various legal tokens in Core are, you should be able to write the body of the loop.
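Here is one way the loop body could look, as a hedged sketch rather than a complete Core tokenizer: peek at the next character to decide the token's type, then consume characters until the token ends. The classification rules (lowercase letter starts a keyword, uppercase an identifier, digit an integer) and the end-of-file code 33 come from this note; everything else, including the function name and returning (type, text) pairs, is my own illustration.

```python
EOF_STATE = 33  # the code the note assigns to end-of-file

def tokenize(text):
    """Scan text into (token_type, token_text) pairs, FSM-style."""
    tokens = []
    i = 0
    n = len(text)
    while True:
        # Skip the whitespace between tokens.
        while i < n and text[i].isspace():
            i += 1
        if i >= n:
            tokens.append((EOF_STATE, ""))  # reached the end-of-file state
            break
        start = i
        ch = text[i]
        if ch.islower():          # lowercase letter => keyword
            while i < n and text[i].islower():
                i += 1
            tokens.append(("keyword", text[start:i]))
        elif ch.isupper():        # uppercase letter => identifier
            while i < n and (text[i].isupper() or text[i].isdigit()):
                i += 1
            tokens.append(("identifier", text[start:i]))
        elif ch.isdigit():        # digit => integer
            while i < n and text[i].isdigit():
                i += 1
            tokens.append(("integer", text[start:i]))
        else:                     # anything else: treat as a special symbol
            i += 1
            tokens.append(("symbol", ch))
    return tokens
```

For a real Core tokenizer you would still have to handle multi-character symbols, illegal characters, and the exact identifier rules, but the shape of the loop would stay the same.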

You really should try to write your Tokenizer in this manner because this is precisely what actual Tokenizers do. The one difference is that practical Tokenizers are not written by hand. Instead, you write the token grammar as a regular expression, and the Tokenizer library functions generate the Tokenizer given the grammar as input. By the way, when designing the body of your FSM's loop, you may also find it useful to first write the BNF grammar of all tokens and see if that can guide the design of your loop.
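The "grammar in, tokenizer out" idea can be sketched in a few lines using Python's `re` module: each token class is written as a regular expression, and the regex engine (itself an FSM under the hood) does the scanning. The token classes mirror the ones in this note; the specific patterns are my own assumptions, not the official Core token grammar.

```python
import re

# Each token class as a regular expression. Order matters: the first
# alternative that matches wins.
TOKEN_SPEC = [
    ("KEYWORD",    r"[a-z]+"),          # lowercase letters => keyword
    ("IDENTIFIER", r"[A-Z][A-Z0-9]*"),  # uppercase letter, then caps/digits
    ("INTEGER",    r"[0-9]+"),          # digits => integer
    ("SYMBOL",     r"[^\sA-Za-z0-9]"),  # any other non-space character
]

# Combine the classes into one master pattern with named groups.
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def regex_tokenize(text):
    """Return (token_class, token_text) pairs; whitespace is skipped."""
    return [(m.lastgroup, m.group()) for m in MASTER.finditer(text)]
```

This is essentially what tokenizer-generator libraries do for you, just with better error handling and a real state-machine compilation step instead of calling `re` at scan time.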

You might ask, if you tried what I described above, whether it would be possible to write the *parser* in the same way. The answer is "no". The reason is that FSMs can implement loops but they cannot implement general recursion. And we need that even for parsing such a simple language as Core (because, for example, a <stmt> may be an <if> statement which contains one or two <stmt seq>s, and the <stmt seq> in turn contains <stmt>s).
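To see the recursion concretely, here is a tiny recursive-descent sketch for a made-up fragment in the spirit of Core (the token names and grammar are my own illustration, not the actual Core grammar): `parse_stmt` and `parse_stmt_seq` call each other, which is exactly the mutual recursion an FSM cannot express.

```python
def parse_stmt(tokens, i):
    """Parse one <stmt> starting at index i; return the index past it."""
    if tokens[i] == "if":
        # The body of an "if" is a <stmt seq> -- a recursive call.
        i = parse_stmt_seq(tokens, i + 1)
        assert tokens[i] == "end", "expected 'end'"
        return i + 1
    else:
        # A simple statement; "skip" stands in for assignments etc.
        assert tokens[i] == "skip", "expected 'skip'"
        return i + 1

def parse_stmt_seq(tokens, i):
    """Parse one or more <stmt>s until 'end' (or end of input)."""
    i = parse_stmt(tokens, i)       # ...and each <stmt seq> contains <stmt>s
    while i < len(tokens) and tokens[i] != "end":
        i = parse_stmt(tokens, i)
    return i
```

An "if" nested inside another "if" drives `parse_stmt` arbitrarily deep, with the call stack remembering how many `end`s are still owed; a fixed set of FSM states cannot keep that count.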

