Regex sample

Posted: Wed Mar 25, 2009 6:26 am
by Caligari
Well, I've had occasion to make a small app using Regex in the last week or so. Here's what I have at the moment. It is more like a sample than a how-to in its current form. Advice and comments welcome.

"""
FindWords

This tool can be used to find which files in a directory tree contain any of a list of words.

The list of words to search for is provided by default in a file called "wordlist.txt" (located
in the same directory as the executable) with one regex per line.

A description of further options is available using the "-help" command-line argument.
"""

use System.Text.RegularExpressions

class FindWords

    var wordsFilename as String is shared
    var searchDirectory as String is shared
    var onlyNames as bool is shared
    var showLines as bool is shared
    var words as SortedDictionary<of String, Regex> is shared
    var startDir as DirectoryInfo is shared
    var listFile as FileInfo is shared

    def showSyntax is shared
        appName = CobraCore.commandLineArgs[0]
        if appName.lastIndexOf(r"\") > 0
            appName = appName[appName.lastIndexOf(r"\")+1:]
        if appName.lastIndexOf(".") > 0
            appName = appName[:appName.lastIndexOf(".")]
        print "Syntax:\n[appName] \[searchDirectory\] \[-l listFilename\] \[-n|-names] \[-s|-summary] \[-h|-help]"
        print "  -l listFilename  read the regexes from listFilename instead of wordlist.txt"
        print "  -n|-names        only check file and directory names"
        print "  -s|-summary      only show files, not lines within files"
        print "  -h|-help         show this help text"


    def main is shared

        .wordsFilename = Path.getDirectoryName(CobraCore.exePath) + r"\wordlist.txt"
        .searchDirectory = r".\"
        .onlyNames = false
        .showLines = true
        .words = SortedDictionary<of String, Regex>()

        args = CobraCore.commandLineArgs

        lookingForList = false
        foundDir = false
        argError = false

        if args.count > 1
            for arg in args[1:]
                if arg == "-h" or arg == "-help"
                    .showSyntax
                    return

                else if arg == "-n" or arg == "-names" or arg == "-name"
                    .onlyNames = true
                    if lookingForList
                        print "unable to determine filename for list of words, expected '-l filename'"
                        lookingForList = false
                        argError = true

                else if arg == "-s" or arg == "-summary"
                    .showLines = false
                    if lookingForList
                        print "unable to determine filename for list of words, expected '-l filename'"
                        lookingForList = false
                        argError = true

                else if arg == "-l"
                    lookingForList = true

                else if arg.startsWith("-")  # also safe for an empty argument
                    print "unrecognized argument [arg]"
                    argError = true

                else
                    if lookingForList
                        .wordsFilename = arg.toString
                        lookingForList = false

                    else if not foundDir # must be the directory to look in
                        .searchDirectory = arg.toString
                        foundDir = true
                    else # already found directory
                        print "more than one directory provided on the command line, expected 'directory'"
                        argError = true

        if argError
            print
            .showSyntax
            return

        try
            wordsFile = StreamReader(.wordsFilename)
        catch ioe as IOException
            print 'I/O Error with [.wordsFilename]: [ioe.message]'
            return
        success
            wordLine = wordsFile.readLine

            while wordLine
                wordMatch = Regex.match(wordLine, r".+")

                if wordMatch.success
                    .words[wordMatch.toString] = Regex(wordMatch.toString, RegexOptions.Compiled | RegexOptions.IgnoreCase)

                wordLine = wordsFile.readLine

            wordsFile.close

        print "\nFound [.words.count] search words.\n"

        .listFile = FileInfo(.wordsFilename)
        .startDir = DirectoryInfo(.searchDirectory)
        curDir = .startDir

        .checkDirectory(curDir)

        print "\nDone."


    def checkDirectory(curDir as DirectoryInfo) is shared
        if not curDir.exists
            print "Unable to find search directory [curDir.fullName]"
            return

        # print "Checking [curDir.fullName]..."

        foundWords = List<of String>()

        for word, wordMatch in .words
            # check directory name
            dirNameCheck = wordMatch.match(curDir.name)

            if dirNameCheck.success
                foundWords.add(word)

        if foundWords.count
            print "[curDir.fullName]: directory name may have [foundWords]"

        # check files

        for subFile in curDir.getFiles
            if curDir.fullName == .startDir.fullName and subFile.fullName == .listFile.fullName
                continue
            .checkFile(subFile, curDir)

        for subDir in curDir.getDirectories
            .checkDirectory(subDir)

    def checkFile(curFile as FileInfo, curDir as DirectoryInfo) is shared
        if not curFile.exists
            print "Unable to find file [curFile.fullName]"
            return

        # check filename
        foundWords = List<of String>()

        for word, wordMatch in .words
            fileNameCheck = wordMatch.match(curFile.name)

            if fileNameCheck.success
                foundWords.add(word)

        if foundWords.count
            print "[curFile.fullName]: file name may have [foundWords]"

        if .onlyNames
            return

        # check contents
        try
            openFile = StreamReader(curFile.fullName)
        catch ioe as IOException
            print " I/O Error with [curFile.fullName]: [ioe.message]"
            return
        success

            if .showLines
                curLine = openFile.readLine
                lineNum = 1

                while curLine
                    foundWords = List<of String>()

                    for word, wordMatch in .words
                        curLineCheck = wordMatch.match(curLine)

                        if curLineCheck.success
                            foundWords.add(word)

                    if foundWords.count
                        print "[curFile.fullName]([lineNum]) may have [foundWords]"

                    curLine = openFile.readLine
                    lineNum += 1

            else # not show lines
                curLine = openFile.readToEnd
                foundWords = List<of String>()

                for word, wordMatch in .words
                    curLineCheck = wordMatch.match(curLine)

                    if curLineCheck.success
                        foundWords.add(word)

                if foundWords.count
                    print "[curFile.fullName] may have [foundWords]"

            openFile.close

- Caligari

Re: Regex sample

Posted: Thu Mar 26, 2009 12:51 am
by jonathandavid
Great example, thanks!


Here are my 2 cents:

1) In the long "if / else if / ..." statement inside the main method, a branch statement would probably be more idiomatic. Also, in comparisons like

if arg == "-h" or arg == "-help"


I think it's better to use

if arg in ["-h",  "-help"]
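
A minimal sketch of what suggestion 1 could look like, assuming Cobra's `branch` statement accepts string cases and `on ... or ...` for several values. `ArgSketch` and `describe` are hypothetical names; the categories simply mirror the `if` chain in `main`.

```cobra
class ArgSketch
    """
    Hypothetical helper: classifies one command-line argument the way the
    if / else if chain in FindWords.main does, but using branch.
    """

    def describe(arg as String) as String is shared
        kind = 'value'   # plain values: the list filename or the search directory
        branch arg
            on '-h' or '-help'
                kind = 'help'
            on '-n' or '-names' or '-name'
                kind = 'names'
            on '-s' or '-summary'
                kind = 'summary'
            on '-l'
                kind = 'list'
            else
                # anything else that starts with a dash is an unknown option
                if arg.startsWith('-')
                    kind = 'unrecognized'
        return kind

    def main is shared
        for arg in ['-h', '-names', '-l', 'words.txt', '-x']
            print '[arg] -> [.describe(arg)]'
```

If you would rather keep the `if` chain, the `arg in [...]` form above reads just as well for each condition.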


2) As for the following fragment:

try
    openFile = StreamReader(curFile.fullName)
catch ioe as IOException
    print " I/O Error with [curFile.fullName]: [ioe.message]"
    return
success

    if .showLines
        curLine = openFile.readLine
        lineNum = 1

        while curLine
            foundWords = List<of String>()

            for word, wordMatch in .words
                curLineCheck = wordMatch.match(curLine)

                if curLineCheck.success
                    foundWords.add(word)

            if foundWords.count
                print "[curFile.fullName]([lineNum]) may have [foundWords]"

            curLine = openFile.readLine
            lineNum += 1

    else # not show lines
        curLine = openFile.readToEnd
        foundWords = List<of String>()

        for word, wordMatch in .words
            curLineCheck = wordMatch.match(curLine)

            if curLineCheck.success
                foundWords.add(word)

        if foundWords.count
            print "[curFile.fullName] may have [foundWords]"

    openFile.close



I would have probably written it like this:


try
    openFile = StreamReader(curFile.fullName)

    if .showLines
        curLine = openFile.readLine
        lineNum = 1

        while curLine
            foundWords = List<of String>()

            for word, wordMatch in .words
                curLineCheck = wordMatch.match(curLine)

                if curLineCheck.success
                    foundWords.add(word)

            if foundWords.count
                print "[curFile.fullName]([lineNum]) may have [foundWords]"

            curLine = openFile.readLine
            lineNum += 1

    else # not show lines
        curLine = openFile.readToEnd
        foundWords = List<of String>()

        for word, wordMatch in .words
            curLineCheck = wordMatch.match(curLine)

            if curLineCheck.success
                foundWords.add(word)

        if foundWords.count
            print "[curFile.fullName] may have [foundWords]"

catch ioe as IOException
    print " I/O Error with [curFile.fullName]: [ioe.message]"

catch e as Exception
    print "Generic error: [e.message]"

finally
    openFile.close
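
One caveat with this version: if the StreamReader constructor itself throws, `openFile` never refers to an open reader, so the `finally` clause has nothing valid to close and would probably fail on its own. A minimal sketch of one way around that, opening the reader first and only then entering `try`/`finally`. `SafeRead` and `countLines` are hypothetical names, and counting lines just stands in for the real per-line regex checks.

```cobra
use System.IO

class SafeRead

    def countLines(path as String) as int is shared
        """
        Hypothetical helper: returns the number of lines in the file at path,
        or -1 if the file cannot be opened or read.
        """
        count = -1
        try
            # If this constructor throws, no reader exists and nothing needs closing.
            reader = StreamReader(path)
            try
                count = 0
                line = reader.readLine
                while line <> nil
                    count += 1
                    line = reader.readLine
            finally
                # Only reached after a successful open, so close is always safe here.
                reader.close
        catch ioe as IOException
            print ' I/O Error with [path]: [ioe.message]'
            count = -1
        return count

    def main is shared
        print .countLines('wordlist.txt')
```

Cobra's `using` statement (modelled on C#'s) is another way to get the same guarantee with less nesting.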

Re: Regex sample

Posted: Thu Mar 26, 2009 6:12 am
by Charles
Also, stuff like:
.wordsFilename = Path.getDirectoryName(CobraCore.exePath) + r"\wordlist.txt"
.searchDirectory = r".\"

Should be using Path.combine() and Path.directorySeparatorChar so it will work on any platform, such as Mono + Mac.
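
A minimal sketch of that change, assuming the executable's directory still comes from `CobraCore.exePath` as in the original program. `PathSketch` and `wordsFilePath` are hypothetical names. `Path.combine` inserts the current platform's separator, and `'.'` already names the current directory everywhere, so neither path needs a hard-coded backslash.

```cobra
use System.IO

class PathSketch

    def wordsFilePath(baseDir as String) as String is shared
        """
        Hypothetical helper: the word-list path inside baseDir.
        Path.combine joins the parts with the separator of the current platform.
        """
        return Path.combine(baseDir, 'wordlist.txt') to !

    def main is shared
        # Same location as the original: next to the executable
        exeDir = Path.getDirectoryName(CobraCore.exePath) to !
        print .wordsFilePath(exeDir)
        # '.' is the current directory on every platform, so no separator is needed
        print DirectoryInfo('.').fullName
```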

Re: Regex sample

Posted: Thu Mar 26, 2009 6:25 am
by jonathandavid
Chuck wrote:
Also, stuff like:
.wordsFilename = Path.getDirectoryName(CobraCore.exePath) + r"\wordlist.txt"
.searchDirectory = r".\"

Should be using Path.combine() and Path.directorySeparatorChar so it will work on any platform, such as Mono + Mac.



Wouldn't "/" be a valid path separator on all platforms? I mean, "\" is obviously the preferred way under Windows, but "/" seems to work as well (at least on XP).
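
In .NET, `Path.altDirectorySeparatorChar` is `/` on Windows, and on Unix-like systems both separators are `/`, which is why `/` tends to be accepted by the file APIs everywhere. A small sketch that prints both values on the current platform; `SeparatorSketch` is a hypothetical name. `Path.combine` still avoids having to choose a separator at all.

```cobra
use System.IO

class SeparatorSketch

    def main is shared
        # '\' on Windows, '/' on Unix-like systems (including Mono on the Mac)
        print 'directorySeparatorChar:    [Path.directorySeparatorChar]'
        # '/' on both Windows and Unix-like systems
        print 'altDirectorySeparatorChar: [Path.altDirectorySeparatorChar]'
```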

Re: Regex sample

Posted: Thu Mar 26, 2009 4:18 pm
by Caligari
I meant to use the Path separator and building stuff, but it slipped my mind. I've already got a new version here.

The other suggestions are great, and I'll incorporate some or all and get a new version together on the weekend, if I can find the time.

- Caligari

Re: Regex sample

Posted: Fri Mar 27, 2009 5:21 am
by jonathandavid
One more thing. If I'm not mistaken, the following

.words[wordMatch.toString] = Regex(wordMatch.toString, RegexOptions.Compiled | RegexOptions.IgnoreCase)


could be expressed more concisely as:

.words[wordMatch.toString] = Regex(wordMatch.toString, RegexOptions(Compiled, IgnoreCase))
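
A small sketch for checking that the two spellings agree, assuming Cobra's `EnumType(Member, Member)` literal combines flags the same way `|` does, which is what the suggestion relies on. `OptionsSketch` is a hypothetical name.

```cobra
use System.Text.RegularExpressions

class OptionsSketch

    def main is shared
        piped = RegexOptions.Compiled | RegexOptions.IgnoreCase
        # Cobra's enum literal form suggested above
        listed = RegexOptions(Compiled, IgnoreCase)
        assert piped == listed
        # IgnoreCase lets a lower-case word match upper-case text
        pattern = Regex('secret', listed)
        print pattern.isMatch('Top SECRET file')
```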

Re: Regex sample

Posted: Sun Apr 05, 2009 11:00 pm
by Charles
Caligari wrote:
I meant to use the Path separator and building stuff, but it slipped my mind. I've already got a new version here.

The other suggestions are great, and I'll incorporate some or all and get a new version together on the weekend, if I can find the time.

- Caligari

Anything new to post?

Also, did you want this included in the Cobra Samples?

Re: Regex sample

Posted: Mon Apr 06, 2009 5:39 pm
by Caligari
Sorry, a project deadline at work has left me little brain space over the last few weeks. I hope to get a revised version done this week, or over Easter.

- Caligari

Re: Regex sample

Posted: Mon Apr 06, 2009 7:09 pm
by Charles
No problem. Thanks for the update.

Re: Regex sample

Posted: Mon Apr 20, 2009 5:25 pm
by Caligari
Here is a possible finished version of the Regular Expression sample I was working on.

It uses the list "concated" method, so it will not run for anyone until that method is added to Cobra.
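
Until a list `concated` method is available, the same effect can be had with the standard .NET `List` API. A minimal sketch, assuming `concated` is meant to return a new list holding the items of both lists (its exact Cobra signature isn't shown in this thread); `ConcatSketch` and `joined` are hypothetical names.

```cobra
class ConcatSketch

    def joined(a as List<of String>, b as List<of String>) as List<of String> is shared
        """
        Hypothetical stand-in for concated: a new list with the items of a
        followed by those of b. Neither input list is modified.
        """
        combined = List<of String>(a)   # copy of the first list
        combined.addRange(b)            # append the second list
        return combined

    def main is shared
        print .joined(['a', 'b'], ['c'])
```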

If there are any other issues, let me know. I'm happy to revise it further if needed.

- Caligari