hexdump - code review?

by **johannes.krampf** » Fri Dec 17, 2010 4:56 pm

I've implemented a simple hexdump program, imitating the behavior of hexdump -C. I became a victim of Bug #262, but fortunately the bug reporter mentioned the yield key word - which turned out to be a nicer solution for my problem than implementing IEnumerator.

My first impression: Code written in Cobra looks mostly nice and interoperability with .NET is good. There are multiple places to look for documentation (Cobra web site, MSDN) and it took me a while to find answers. I miss the REPL and introspection from other languages.

Here's my program if somebody has the time to have a look at it. I'm thankful for any suggestions for improvements to help me learn Cobra.

class Hexdump

    def main
        args = CobraCore.commandLineArgs
        if args.count < 2
            binary_name = args[0].split(c'/').toList.get(-1)
            print r'Usage: [binary_name] \[filename\]'
            return
        line_chars = ''
        for n, byte in FileByteStream(args[1]).numbered
            line_chars += if(0x1f < byte < 0xff, Convert.toChar(byte), c'.').toString
            if (n % 16) == 0
                line = String.format('{0:x8}', n)
                print '[line]  ' stop
            hex = String.format('{0:x2}', byte)
            print '[hex] ' stop
            if (n+1) % 16 == 0
                print ' |[line_chars]|'
                line_chars = ''
            else if (n+1) % 8 == 0
                print ' ' stop
        n += 1
        num = 3*(16 - (n % 16)) + if(0 < (n % 16) <= 8, 1 , 0)
        last_line = if(num == 0, '', String(c' ',num) + ' |[line_chars]|')
        print last_line
        print String.format('{0:x8}', n)

class FileByteStream
    var _input

    cue init(filename as String)
        base.init
        try
            _input = File.openRead(filename)
        catch ioe as IOException
            print 'I/O Error: [ioe.message]'

    def read as int*
        current = _input.readByte
        while current <> -1
            yield current
            current = _input.readByte
        yield break

    def numbered as KeyValuePair<of int, int>*
        n = 0
        for byte in .read
            yield KeyValuePair<of int, int>(n, byte)
            n += 1

by **hopscc** » Sat Dec 18, 2010 7:48 am

Yeah - that looks pretty good cobrawise.

You probably dont want the 'r' prefix on the line

Code: Select all: print r'Usage: [binary_name] \[filename\]'

since that will display the line verbatim without substituting the value of 'binary_name'.
( in fact the compiler tells you this with a warning about 'binary_name' being set but not used on first compilation - at least it did for me ).

(The rest isnt really cobra specific - just general pickyness)

For cmdline utilities like this I usually handle the optarg processing in the main and put the file processing in a called method - this makes it easier to add or handle variants like handling pipes and std Input and see what the main processing path is.
Call it processFile or somesuch

Looking at the processing portion (the bit in the loop) the code there is pretty cluttered, mixing accumulation, calculation and output in a non ordered manner - I'd reorder these to follow the order the output will be presented in (offset, hexPart, asciiPart) and try and do all the output
(print) in the one place. In this case you'll get a bit of a perceived boost in display speed/smoothness as well

There are a lot of compound operations packed into some of the lines .
Theres nothing wrong with this as such and for small amounts of code like this
but it does tend to make reading/understanding the code more difficult (harder) than it need be
since you are not being charged by the line theres no real need to cram as much into a line as possible
- I'd tend to split these up onto separate chunks - 1 to 2 max 'concepts'/'operations' per line

The last blob doing the final line padding is pretty opaque - perhaps change to using the String padLeft/Right calls to clarify
what you're doing - theres a couple of small omissions/errors here as well

Trailing Comments to signpost steps and progress usually doesnt hurt

Just my preference but I usually make main explicitly shared unless it is/needs an automatically generated instance class - that tends to force helper methods to be shared as well

Cant fault the FileByteStream class

Doing all that gives something like this:

# Hexdump program in cobra like hexdump -C  
class Hexdump

    def main is shared
        args = CobraCore.commandLineArgs
        if args.count < 2
            binary_name = args[0].split(c'/').toList.get(-1)
            print 'Usage: [binary_name] \[filename\]'
            return

        .processFile(args[1])   
        
    def processFile(filename as String) is shared
        hexPart = ''    
        asciiPart = ''
        for n, byte in FileByteStream(filename).numbered
            if (n % 16) == 0    # file Ofset in hex
                line = String.format('{0:x8}', n)
                hexPart += '[line]  '
            hex = String.format('{0:x2}', byte)
            hexPart += '[hex] '     #  accumulate hex chars space separated
            asc =  '.'
            if 0x1f < byte < 0xff
                asc = Convert.toChar(byte).toString
            asciiPart += asc
            if (n+1) % 16 == 0      # Display a full line
                print '[hexPart] |[asciiPart]|'
                hexPart = ''
                asciiPart = ''
            else if (n+1) % 8 == 0  # extra space in middle of hex
                hexPart += ' '

        n += 1
        if n % 16
            # space pad both parts of partially filled last line 
            hpad = 3*(16 - (n % 16)) + if(0 < (n % 16) <= 8, 1 , 0)
            hexPart += ' '.padLeft(hpad)
            apad = (16 - (n % 16))
            asciiPart += ' '.padLeft(apad)
            print '[hexPart] |[asciiPart]|'

        print String.format('{0:x8}', n)        #file byteCount

class FileByteStream
    var _input

    cue init(filename as String)
        base.init
        try
            _input = File.openRead(filename)
        catch ioe as IOException
            print 'I/O Error: [ioe.message]'

    def read as int*
        current = _input.readByte
        while current <> -1
            yield current
            current = _input.readByte
        yield break

    def numbered as KeyValuePair<of int, int>*
        n = 0
        for byte in .read
            yield KeyValuePair<of int, int>(n, byte)
            n += 1

Not particularly significantly different.
There'd probably be a bit more of a performance boost from using StringBuilders rather than Strings for the line accumulation parts perhaps at the cost of a little clarity.

by **johannes.krampf** » Sat Dec 18, 2010 11:17 am

Hi hopscc,

thanks a lot for the review. Your rewritten code is clearer and I'll take your suggestions to heart. The idea of separating command line handling and real functionality is good, the code is more nicely structured this way.

The r-prefix for the string is indeed wrong. I only found out how to disable string interpolation while writing the comment and missed the r when editing the code.

Are you still interested in sample programs? If so, I'd write proper comments and submit it. Where would I submit?

by **Charles** » Sat Dec 18, 2010 1:42 pm

My turn:

"""
HexDump program in Cobra.

Produces output like hexdump -C  
"""


class HexDump

    def main
        args = CobraCore.commandLineArgs
        if args.count <> 2
            binary_name = Path.getFileName(args[0])
            print 'Usage: [binary_name] \[filename\]'
            return
        .processFileNamed(args[1], 16)
        
    def processFileNamed(filename as String, width as int)
        require
            width % 2 == 0  # width must be even
        test
            pass  # to-do
        body
            halfWidth = width // 2
            hexPart = asciiPart = ''    
            for n, byte in FileByteStream(filename).read.toList.numbered
                if (n % width) == 0  # file offset in hex
                    line = '[n:x8]'
                    hexPart += '[line]  '
                hex = '[byte:x2]'
                hexPart += '[hex] '  #  accumulate hex chars space separated
                asc =  '.'
                if 0x1f < byte < 0xff
                    asc = Convert.toChar(byte).toString
                asciiPart += asc
                if (n+1) % width == 0  # display a full line
                    print '[hexPart] |[asciiPart]|'
                    hexPart = ''
                    asciiPart = ''
                else if (n+1) % halfWidth == 0  # extra space in middle of hex
                    hexPart += ' '

            n += 1
            if n % width
                # space pad both parts of partially filled last line 
                hpad = 3*(width - (n % width)) + if(0 < (n % width) <= halfWidth, 1, 0)
                hexPart += ' '.padLeft(hpad)
                apad = width - (n % width)
                asciiPart += ' '.padLeft(apad)
                print '[hexPart] |[asciiPart]|'

            print '[n:x8]'  # file byte count


class FileByteStream

    var _input as FileStream

    cue init(filename as String)
        base.init
        try
            _input = File.openRead(filename)
        catch ioe as IOException
            print 'I/O Error: [ioe.message]'

    def read as int*
        while true
            current = _input.readByte
            if current == -1, break
            yield current
        yield break

-- When I compile the original code, there is this warning:

hexdump1.cobra(6): warning: The value of variable "binary_name" is never used.

Which ties back to hopscc's comment about the use of the "r" prefix in a string with substitution. I just wanted to point out that it's good to heed warnings due to their utility.

-- This code:

binary_name = args[0].split(c'/').toList.get(-1)

should be:

binary_name = Path.getFileName(args[0])

Now it's cross platform, easier to read and less error prone.

-- I agree with hopscc comment about splitting main.

-- I don't agree with favoring "is shared" as hopscc does. I still see shared/static overused in real project code leading to well known problems including the inability to override methods in subclasses, increased difficulty in testing, and a subsequent bias to make shared/static variables which are effectively global variables which have their own well known problems. I don't use shared/static unless I have a "damn good reason" to. Consider "shared" methods to be guilty until proven innocent.

I might post a longer article about this at some point with more examples and explanation.

-- Since the program only processes one argument, I changed the check "if args.count < 2" to "if args.count <> 2". Of course, another approach would be to dump all arguments passed.

-- I see a lot of "16" and "8" in the program. I factored these out. In the future, maybe the 16 could even be a command line argument.

-- I changed the name of "processFile" to "processFileNamed". When a method takes the name of something rather than an object of that something, then I prefer to put "Named" in the method. If I was passing a file object then I would call it "processFile". This is a pure style preference, of course.

-- I thought the entire "def numbered" method could be removed since we have it in the Cobra std lib. So then change this:

for n, byte in FileByteStream(filename).numbered

to this:

for n, byte in FileByteStream(filename).read.numbered

but it turns out that utility method is only on IList, not IEnumerable. So for now, I have a .toList in there:

for n, byte in FileByteStream(filename).read.toList.numbered

when the std lib supports .numbered on anything enumerable, the .toList can be removed.

-- Instead of:

String.format('{0:x8}', n)

I prefer:

'[n:x8]'

-- The .read method repeats a statement. Although it's not a lot of code repetition, I prefer the version that does not, seen above.

-- I don't like "random casing":

# file Ofset in hex
# accumulate hex chars space separated
# Display a full line
# file byteCount

There is no rhyme or reason here.

-- FileByteStream has:

var _input

which is untyped. It works okay, but the code will be faster (and you will get more error checking) if it is typed:

var _input as FileStream

Another benefit is that reading the code 6 months later can be easier.

-- You'll notice that I stubbed out a potential test case, but left it blank for now. I will post separately on that.

-- BTW, no matter whose version I run, I get inconsistent widths in lines like so:

Code: Select all: 000000d0 00 80 00 00 00 02 00 00 00 00 00 00 03 00 00 00 |...............| 000000e0 00 00 10 00 00 10 00 00 00 00 10 00 00 10 00 00 |................|

Interestingly, when I pasted the output from Mac Terminal to the TextMate editor, a non-ASCII character was revealed. I guess Terminal won't print this character. I tried "hexdump -C" and it does not have this problem:

Code: Select all: 000000d0 00 80 00 00 00 02 00 00 00 00 00 00 03 00 00 00 |................| 000000e0 00 00 10 00 00 10 00 00 00 00 10 00 00 10 00 00 |................|

Nor does it ever display high bit characters. Also "hexdump -C" will print an asteriks on a line by itself on occasion, but I haven't researched what that's about.

-- For a small utility, I might not take things this far, but this is illustrative of important principles for the larger programs that are harder to develop and maintain. Thanks for stimulating this discussion.

by **Charles** » Sat Dec 18, 2010 1:55 pm

Some more comments on other issues:

-- The forums are generally a good place to submit sample programs like you have done here.

-- I agree there are many places to look for information and that the situation could be improved. I have some plans in this area. Among them are improving Cobra's "doc lib" feature to be more mature, then apply it to the standard libs and then integrate that output with our web site here. Additionally, it would be great to automatically link from those library docs to any forum posts and wiki pages that mention a class name or method name.

It will take some time to get there.

Additional suggestions are welcome.

I also encourage people to edit the wiki as they discover things. There could be a wiki page on "mkbundle" for example with some brief notes and then links to the forum discussion and a Mono page on it.

by **Charles** » Sat Dec 18, 2010 1:57 pm

I'll break out a new thread on the web site. Let's keep this thread focused on hexdump.

by **johannes.krampf** » Mon Dec 20, 2010 12:59 pm

Thanks for the review. I'm particularly glad to know about the '[n:x8]' for string formatting, which removed one of the ugliest part of the code. (The other being Convert.toChar(byte).toString)

I agree with factoring out constants, I should have thought of that myself. Thanks for your naming tips, they seem very sensible to me.

Honestly, I'm surprised to see that I accidentally created a dynamic variable. I kind of expected the compiler to infer the type for me, that's what I get for dabbling in Haskell.

Regarding the character display: 0x80 is the ASCII delete character, which can't be displayed. That's my error, the condition should probably read 0x1f < byte < 0x80

by **Charles** » Mon Dec 20, 2010 8:59 pm

Okay, new version:

"""
HexDump program in Cobra.

Produces output like hexdump -C  
"""


class HexDump

    def main
        args = CobraCore.commandLineArgs
        if args.count <> 2
            binary_name = Path.getFileName(args[0])
            print 'Usage: [binary_name] \[filename\]'
            CobraCore.exit(1)
        for i in 1 : args.count
            .dumpFileNamed(args[i], 16)

    def dumpFileNamed(filename as String, width as int)
        require width % 2 == 0
        .dumpBytes(FileByteStream(filename).read, width)

    def dumpBytes(bytes as int*, width as int)
        require
            width % 2 == 0  # width must be even
        test
            cases = [
                ['aoeu', '00000000  61 6f  65 75  |aoeu|\n00000004\n'],
                ['aoe\nasdf', '00000000  61 6f  65 0a  |aoe.|\n00000004  61 73  64 66  |asdf|\n00000008\n'],
            ]
            for input, expected in cases
                sw = StringWriter()
                print to sw
                    bytes = for b in Encoding.ascii.getBytes(input) get b to int
                    HexDump().dumpBytes(bytes, 4)
                actual = sw.toString.replace('\r', '')
                assert actual == expected, input
        body
            halfWidth = width // 2
            hexPart = asciiPart = ''    
            for n, byte in bytes.toList.numbered
                if (n % width) == 0  # file offset in hex
                    line = '[n:x8]'
                    hexPart += '[line]  '
                hex = '[byte:x2]'
                hexPart += '[hex] '  #  accumulate hex chars space separated
                asciiPart += if(0x1f < byte < 0x80, Convert.toChar(byte).toString, '.')
                if (n+1) % width == 0  # display a full line
                    print '[hexPart] |[asciiPart]|'
                    hexPart = asciiPart = ''
                else if (n+1) % halfWidth == 0  # extra space in middle of hex
                    hexPart += ' '

            n += 1
            if n % width
                # space pad both parts of partially filled last line 
                hpad = 3*(width - (n % width)) + if(0 < (n % width) <= halfWidth, 1, 0)
                hexPart += ' '.padLeft(hpad)
                apad = width - (n % width)
                asciiPart += ' '.padLeft(apad)
                print '[hexPart] |[asciiPart]|'

            print '[n:x8]'  # file byte count


class FileByteStream

    var _input as FileStream

    cue init(filename as String)
        base.init
        try
            _input = File.openRead(filename)
        catch ioe as IOException
            print 'I/O Error: [ioe.message]'

    def read as int*
        while true
            current = _input.readByte
            if current == -1, break
            yield current
        yield break

-- In .main, I changed "return" to "CobraCore.exit(1)" which is normal practice to indicate an error with a command line program.

-- In .main, the command line arguments are looped through so you can dump multiple files, same as "hexdump -C".

-- I made the correction from "0xff" to "0x80" discussed earlier.

-- I tightened up some code:

asciiPart += if(0x1f < byte < 0x80, Convert.toChar(byte).toString, '.')
...
hexPart = asciiPart = ''

-- Most prominently, I added unit testing for the core functionality. Note that isolation is a key characteristic of unit tests. This lead me to break out the reading of files from the processing of their contents, hence the new .dumpBytes method containing the core functionality. Interestingly, the code is now more reusable as you can call this method with a stream of bytes from any source. It's not uncommon for unit testing to stimulate this kind of improvement.

-- The unit testing shows my typical approach of looping through multiple cases containing input(s) and expected answer. The loop culminates with:

assert actual == expected, input

Final version? Ready for Samples?

by **Charles** » Tue Dec 21, 2010 12:36 am

Posted:

http://cobra-language.com/samples/HexDump/

hexdump - code review?

hexdump - code review?

Re: hexdump - code review?

Re: hexdump - code review?

Re: hexdump - code review?

Re: hexdump - code review?

Re: hexdump - code review?

Re: hexdump - code review?

Re: hexdump - code review?

Re: hexdump - code review?

Who is online