Forums

hexdump - code review?

General discussion about Cobra. Releases and general news will also be posted here.
Feel free to ask questions or just say "Hello".

hexdump - code review?

Postby johannes.krampf » Fri Dec 17, 2010 4:56 pm

I've implemented a simple hexdump program, imitating the behavior of hexdump -C. I became a victim of Bug #262, but fortunately the bug reporter mentioned the yield key word - which turned out to be a nicer solution for my problem than implementing IEnumerator.

My first impression: Code written in Cobra looks mostly nice and interoperability with .NET is good. There are multiple places to look for documentation (Cobra web site, MSDN) and it took me a while to find answers. I miss the REPL and introspection from other languages.

Here's my program if somebody has the time to have a look at it. I'm thankful for any suggestions for improvements to help me learn Cobra.

class Hexdump

def main
args = CobraCore.commandLineArgs
if args.count < 2
binary_name = args[0].split(c'/').toList.get(-1)
print r'Usage: [binary_name] \[filename\]'
return
line_chars = ''
for n, byte in FileByteStream(args[1]).numbered
line_chars += if(0x1f < byte < 0xff, Convert.toChar(byte), c'.').toString
if (n % 16) == 0
line = String.format('{0:x8}', n)
print '[line] ' stop
hex = String.format('{0:x2}', byte)
print '[hex] ' stop
if (n+1) % 16 == 0
print ' |[line_chars]|'
line_chars = ''
else if (n+1) % 8 == 0
print ' ' stop
n += 1
num = 3*(16 - (n % 16)) + if(0 < (n % 16) <= 8, 1 , 0)
last_line = if(num == 0, '', String(c' ',num) + ' |[line_chars]|')
print last_line
print String.format('{0:x8}', n)

class FileByteStream
var _input

cue init(filename as String)
base.init
try
_input = File.openRead(filename)
catch ioe as IOException
print 'I/O Error: [ioe.message]'

def read as int*
current = _input.readByte
while current <> -1
yield current
current = _input.readByte
yield break

def numbered as KeyValuePair<of int, int>*
n = 0
for byte in .read
yield KeyValuePair<of int, int>(n, byte)
n += 1
johannes.krampf
 
Posts: 11
Location: Suisse Romande

Re: hexdump - code review?

Postby hopscc » Sat Dec 18, 2010 7:48 am

Yeah - that looks pretty good cobrawise.

You probably dont want the 'r' prefix on the line
Code: Select all
print r'Usage: [binary_name] \[filename\]'

since that will display the line verbatim without substituting the value of 'binary_name'.
( in fact the compiler tells you this with a warning about 'binary_name' being set but not used on first compilation - at least it did for me ).

(The rest isnt really cobra specific - just general pickyness)

For cmdline utilities like this I usually handle the optarg processing in the main and put the file processing in a called method - this makes it easier to add or handle variants like handling pipes and std Input and see what the main processing path is.
Call it processFile or somesuch

Looking at the processing portion (the bit in the loop) the code there is pretty cluttered, mixing accumulation, calculation and output in a non ordered manner - I'd reorder these to follow the order the output will be presented in (offset, hexPart, asciiPart) and try and do all the output
(print) in the one place. In this case you'll get a bit of a perceived boost in display speed/smoothness as well

There are a lot of compound operations packed into some of the lines .
Theres nothing wrong with this as such and for small amounts of code like this
but it does tend to make reading/understanding the code more difficult (harder) than it need be
since you are not being charged by the line theres no real need to cram as much into a line as possible
- I'd tend to split these up onto separate chunks - 1 to 2 max 'concepts'/'operations' per line

The last blob doing the final line padding is pretty opaque - perhaps change to using the String padLeft/Right calls to clarify
what you're doing - theres a couple of small omissions/errors here as well

Trailing Comments to signpost steps and progress usually doesnt hurt

Just my preference but I usually make main explicitly shared unless it is/needs an automatically generated instance class - that tends to force helper methods to be shared as well

Cant fault the FileByteStream class :)

Doing all that gives something like this:

# Hexdump program in cobra like hexdump -C  
class Hexdump

def main is shared
args = CobraCore.commandLineArgs
if args.count < 2
binary_name = args[0].split(c'/').toList.get(-1)
print 'Usage: [binary_name] \[filename\]'
return

.processFile(args[1])

def processFile(filename as String) is shared
hexPart = ''
asciiPart = ''
for n, byte in FileByteStream(filename).numbered
if (n % 16) == 0 # file Ofset in hex
line = String.format('{0:x8}', n)
hexPart += '[line] '
hex = String.format('{0:x2}', byte)
hexPart += '[hex] ' # accumulate hex chars space separated
asc = '.'
if 0x1f < byte < 0xff
asc = Convert.toChar(byte).toString
asciiPart += asc
if (n+1) % 16 == 0 # Display a full line
print '[hexPart] |[asciiPart]|'
hexPart = ''
asciiPart = ''
else if (n+1) % 8 == 0 # extra space in middle of hex
hexPart += ' '

n += 1
if n % 16
# space pad both parts of partially filled last line
hpad = 3*(16 - (n % 16)) + if(0 < (n % 16) <= 8, 1 , 0)
hexPart += ' '.padLeft(hpad)
apad = (16 - (n % 16))
asciiPart += ' '.padLeft(apad)
print '[hexPart] |[asciiPart]|'

print String.format('{0:x8}', n) #file byteCount

class FileByteStream
var _input

cue init(filename as String)
base.init
try
_input = File.openRead(filename)
catch ioe as IOException
print 'I/O Error: [ioe.message]'

def read as int*
current = _input.readByte
while current <> -1
yield current
current = _input.readByte
yield break

def numbered as KeyValuePair<of int, int>*
n = 0
for byte in .read
yield KeyValuePair<of int, int>(n, byte)
n += 1


Not particularly significantly different.
There'd probably be a bit more of a performance boost from using StringBuilders rather than Strings for the line accumulation parts perhaps at the cost of a little clarity.
hopscc
 
Posts: 632
Location: New Plymouth, Taranaki, New Zealand

Re: hexdump - code review?

Postby johannes.krampf » Sat Dec 18, 2010 11:17 am

Hi hopscc,

thanks a lot for the review. Your rewritten code is clearer and I'll take your suggestions to heart. The idea of separating command line handling and real functionality is good, the code is more nicely structured this way.

The r-prefix for the string is indeed wrong. I only found out how to disable string interpolation while writing the comment and missed the r when editing the code.

Are you still interested in sample programs? If so, I'd write proper comments and submit it. Where would I submit?
johannes.krampf
 
Posts: 11
Location: Suisse Romande

Re: hexdump - code review?

Postby Charles » Sat Dec 18, 2010 1:42 pm

My turn:

"""
HexDump program in Cobra.

Produces output like hexdump -C
"""


class HexDump

def main
args = CobraCore.commandLineArgs
if args.count <> 2
binary_name = Path.getFileName(args[0])
print 'Usage: [binary_name] \[filename\]'
return
.processFileNamed(args[1], 16)

def processFileNamed(filename as String, width as int)
require
width % 2 == 0 # width must be even
test
pass # to-do
body
halfWidth = width // 2
hexPart = asciiPart = ''
for n, byte in FileByteStream(filename).read.toList.numbered
if (n % width) == 0 # file offset in hex
line = '[n:x8]'
hexPart += '[line] '
hex = '[byte:x2]'
hexPart += '[hex] ' # accumulate hex chars space separated
asc = '.'
if 0x1f < byte < 0xff
asc = Convert.toChar(byte).toString
asciiPart += asc
if (n+1) % width == 0 # display a full line
print '[hexPart] |[asciiPart]|'
hexPart = ''
asciiPart = ''
else if (n+1) % halfWidth == 0 # extra space in middle of hex
hexPart += ' '

n += 1
if n % width
# space pad both parts of partially filled last line
hpad = 3*(width - (n % width)) + if(0 < (n % width) <= halfWidth, 1, 0)
hexPart += ' '.padLeft(hpad)
apad = width - (n % width)
asciiPart += ' '.padLeft(apad)
print '[hexPart] |[asciiPart]|'

print '[n:x8]' # file byte count


class FileByteStream

var _input as FileStream

cue init(filename as String)
base.init
try
_input = File.openRead(filename)
catch ioe as IOException
print 'I/O Error: [ioe.message]'

def read as int*
while true
current = _input.readByte
if current == -1, break
yield current
yield break



-- When I compile the original code, there is this warning:

hexdump1.cobra(6): warning: The value of variable "binary_name" is never used.

Which ties back to hopscc's comment about the use of the "r" prefix in a string with substitution. I just wanted to point out that it's good to heed warnings due to their utility.


-- This code:
binary_name = args[0].split(c'/').toList.get(-1)

should be:
binary_name = Path.getFileName(args[0])
Now it's cross platform, easier to read and less error prone.


-- I agree with hopscc comment about splitting main.


-- I don't agree with favoring "is shared" as hopscc does. I still see shared/static overused in real project code leading to well known problems including the inability to override methods in subclasses, increased difficulty in testing, and a subsequent bias to make shared/static variables which are effectively global variables which have their own well known problems. I don't use shared/static unless I have a "damn good reason" to. Consider "shared" methods to be guilty until proven innocent.

I might post a longer article about this at some point with more examples and explanation.


-- Since the program only processes one argument, I changed the check "if args.count < 2" to "if args.count <> 2". Of course, another approach would be to dump all arguments passed.


-- I see a lot of "16" and "8" in the program. I factored these out. In the future, maybe the 16 could even be a command line argument.


-- I changed the name of "processFile" to "processFileNamed". When a method takes the name of something rather than an object of that something, then I prefer to put "Named" in the method. If I was passing a file object then I would call it "processFile". This is a pure style preference, of course.


-- I thought the entire "def numbered" method could be removed since we have it in the Cobra std lib. So then change this:
for n, byte in FileByteStream(filename).numbered
to this:
for n, byte in FileByteStream(filename).read.numbered
but it turns out that utility method is only on IList, not IEnumerable. So for now, I have a .toList in there:
for n, byte in FileByteStream(filename).read.toList.numbered

when the std lib supports .numbered on anything enumerable, the .toList can be removed.


-- Instead of:
String.format('{0:x8}', n)
I prefer:
'[n:x8]'


-- The .read method repeats a statement. Although it's not a lot of code repetition, I prefer the version that does not, seen above.


-- I don't like "random casing":

# file Ofset in hex
# accumulate hex chars space separated
# Display a full line
# file byteCount

There is no rhyme or reason here.


-- FileByteStream has:
var _input

which is untyped. It works okay, but the code will be faster (and you will get more error checking) if it is typed:
var _input as FileStream

Another benefit is that reading the code 6 months later can be easier.


-- You'll notice that I stubbed out a potential test case, but left it blank for now. I will post separately on that.


-- BTW, no matter whose version I run, I get inconsistent widths in lines like so:
Code: Select all
000000d0  00 80 00 00 00 02 00 00  00 00 00 00 03 00 00 00  |.€..............|
000000e0  00 00 10 00 00 10 00 00  00 00 10 00 00 10 00 00  |................|

Interestingly, when I pasted the output from Mac Terminal to the TextMate editor, a non-ASCII character was revealed. I guess Terminal won't print this character. I tried "hexdump -C" and it does not have this problem:
Code: Select all
000000d0  00 80 00 00 00 02 00 00  00 00 00 00 03 00 00 00  |................|
000000e0  00 00 10 00 00 10 00 00  00 00 10 00 00 10 00 00  |................|

Nor does it ever display high bit characters. Also "hexdump -C" will print an asteriks on a line by itself on occasion, but I haven't researched what that's about.


-- For a small utility, I might not take things this far, but this is illustrative of important principles for the larger programs that are harder to develop and maintain. Thanks for stimulating this discussion.
Charles
 
Posts: 2515
Location: Los Angeles, CA

Re: hexdump - code review?

Postby Charles » Sat Dec 18, 2010 1:55 pm

Some more comments on other issues:

-- The forums are generally a good place to submit sample programs like you have done here.

-- I agree there are many places to look for information and that the situation could be improved. I have some plans in this area. Among them are improving Cobra's "doc lib" feature to be more mature, then apply it to the standard libs and then integrate that output with our web site here. Additionally, it would be great to automatically link from those library docs to any forum posts and wiki pages that mention a class name or method name.

It will take some time to get there.

Additional suggestions are welcome.

I also encourage people to edit the wiki as they discover things. There could be a wiki page on "mkbundle" for example with some brief notes and then links to the forum discussion and a Mono page on it.
Charles
 
Posts: 2515
Location: Los Angeles, CA

Re: hexdump - code review?

Postby Charles » Sat Dec 18, 2010 1:57 pm

I'll break out a new thread on the web site. Let's keep this thread focused on hexdump.
Charles
 
Posts: 2515
Location: Los Angeles, CA

Re: hexdump - code review?

Postby johannes.krampf » Mon Dec 20, 2010 12:59 pm

Thanks for the review. I'm particularly glad to know about the '[n:x8]' for string formatting, which removed one of the ugliest part of the code. (The other being Convert.toChar(byte).toString)

I agree with factoring out constants, I should have thought of that myself. Thanks for your naming tips, they seem very sensible to me.

Honestly, I'm surprised to see that I accidentally created a dynamic variable. I kind of expected the compiler to infer the type for me, that's what I get for dabbling in Haskell.

Regarding the character display: 0x80 is the ASCII delete character, which can't be displayed. That's my error, the condition should probably read 0x1f < byte < 0x80
johannes.krampf
 
Posts: 11
Location: Suisse Romande

Re: hexdump - code review?

Postby Charles » Mon Dec 20, 2010 8:59 pm

Okay, new version:
"""
HexDump program in Cobra.

Produces output like hexdump -C
"""


class HexDump

def main
args = CobraCore.commandLineArgs
if args.count <> 2
binary_name = Path.getFileName(args[0])
print 'Usage: [binary_name] \[filename\]'
CobraCore.exit(1)
for i in 1 : args.count
.dumpFileNamed(args[i], 16)

def dumpFileNamed(filename as String, width as int)
require width % 2 == 0
.dumpBytes(FileByteStream(filename).read, width)

def dumpBytes(bytes as int*, width as int)
require
width % 2 == 0 # width must be even
test
cases = [
['aoeu', '00000000 61 6f 65 75 |aoeu|\n00000004\n'],
['aoe\nasdf', '00000000 61 6f 65 0a |aoe.|\n00000004 61 73 64 66 |asdf|\n00000008\n'],
]
for input, expected in cases
sw = StringWriter()
print to sw
bytes = for b in Encoding.ascii.getBytes(input) get b to int
HexDump().dumpBytes(bytes, 4)
actual = sw.toString.replace('\r', '')
assert actual == expected, input
body
halfWidth = width // 2
hexPart = asciiPart = ''
for n, byte in bytes.toList.numbered
if (n % width) == 0 # file offset in hex
line = '[n:x8]'
hexPart += '[line] '
hex = '[byte:x2]'
hexPart += '[hex] ' # accumulate hex chars space separated
asciiPart += if(0x1f < byte < 0x80, Convert.toChar(byte).toString, '.')
if (n+1) % width == 0 # display a full line
print '[hexPart] |[asciiPart]|'
hexPart = asciiPart = ''
else if (n+1) % halfWidth == 0 # extra space in middle of hex
hexPart += ' '

n += 1
if n % width
# space pad both parts of partially filled last line
hpad = 3*(width - (n % width)) + if(0 < (n % width) <= halfWidth, 1, 0)
hexPart += ' '.padLeft(hpad)
apad = width - (n % width)
asciiPart += ' '.padLeft(apad)
print '[hexPart] |[asciiPart]|'

print '[n:x8]' # file byte count


class FileByteStream

var _input as FileStream

cue init(filename as String)
base.init
try
_input = File.openRead(filename)
catch ioe as IOException
print 'I/O Error: [ioe.message]'

def read as int*
while true
current = _input.readByte
if current == -1, break
yield current
yield break


-- In .main, I changed "return" to "CobraCore.exit(1)" which is normal practice to indicate an error with a command line program.

-- In .main, the command line arguments are looped through so you can dump multiple files, same as "hexdump -C".

-- I made the correction from "0xff" to "0x80" discussed earlier.

-- I tightened up some code:
asciiPart += if(0x1f < byte < 0x80, Convert.toChar(byte).toString, '.')
...
hexPart = asciiPart = ''


-- Most prominently, I added unit testing for the core functionality. Note that isolation is a key characteristic of unit tests. This lead me to break out the reading of files from the processing of their contents, hence the new .dumpBytes method containing the core functionality. Interestingly, the code is now more reusable as you can call this method with a stream of bytes from any source. It's not uncommon for unit testing to stimulate this kind of improvement.

-- The unit testing shows my typical approach of looping through multiple cases containing input(s) and expected answer. The loop culminates with:
assert actual == expected, input


Final version? Ready for Samples?
Charles
 
Posts: 2515
Location: Los Angeles, CA

Re: hexdump - code review?

Postby Charles » Tue Dec 21, 2010 12:36 am

Charles
 
Posts: 2515
Location: Los Angeles, CA


Return to Discussion

Who is online

Users browsing this forum: No registered users and 11 guests