Regex sample


Postby Caligari » Wed Mar 25, 2009 6:26 am

Well, I've had occasion to make a small app using Regex in the last week or so. Here's what I have at the moment. It is more like a sample than a how-to in its current form. Advice and comments welcome.

"""
FindWords

This tool can be used to find which files in a directory tree contain any of a list of words.

The list of words to search for is provided by default in a file called "wordlist.txt" (located
in the same directory as the executable) with one regex per line.

A description of further options is available using the "-help" command-line argument.
"""

use System.Text.RegularExpressions

class FindWords

    var wordsFilename as String is shared
    var searchDirectory as String is shared
    var onlyNames as bool is shared
    var showLines as bool is shared
    var words as SortedDictionary<of String, Regex> is shared
    var startDir as DirectoryInfo is shared
    var listFile as FileInfo is shared

    def showSyntax is shared
        appName = CobraCore.commandLineArgs[0]
        if appName.lastIndexOf(r"\") > 0
            appName = appName[appName.lastIndexOf(r"\")+1:]
        if appName.lastIndexOf(".") > 0
            appName = appName[:appName.lastIndexOf(".")]
        print "Syntax:\n[appName] \[searchDirectory\] \[-l listFilename\] \[-n|-names] \[-s|-summary] \[-h|-help]"
        print " -n|names only check file and directory names"
        print " -s|summary only show files, not lines within files"
        print " -h|help show this help text"


    def main is shared

        .wordsFilename = Path.getDirectoryName(CobraCore.exePath) + r"\wordlist.txt"
        .searchDirectory = r".\"
        .onlyNames = false
        .showLines = true
        .words = SortedDictionary<of String, Regex>()

        args = CobraCore.commandLineArgs

        lookingForList = false
        foundDir = false
        argError = false

        if args.count > 1
            for arg in args[1:]
                if arg == "-h" or arg == "-help"
                    .showSyntax
                    return

                else if arg == "-n" or arg == "-names" or arg == "-name"
                    .onlyNames = true
                    if lookingForList
                        print "unable to determine filename for list of words, expected '-l filename'"
                        lookingForList = false
                        argError = true

                else if arg == "-s" or arg == "-summary"
                    .showLines = false
                    if lookingForList
                        print "unable to determine filename for list of words, expected '-l filename'"
                        lookingForList = false
                        argError = true

                else if arg == "-l"
                    lookingForList = true

                else if arg[0] == "-"
                    print "unrecognized argument [arg]"
                    argError = true

                else
                    if lookingForList
                        .wordsFilename = arg.toString
                        lookingForList = false

                    else if not foundDir # must be the directory to look in
                        .searchDirectory = arg.toString
                        foundDir = true
                    else # already found directory
                        print "more than one directory provided on commandline, expected 'directory'"
                        argError = true

        if argError
            print
            .showSyntax
            return

        try
            wordsFile = StreamReader(.wordsFilename)
        catch ioe as IOException
            print 'I/O Error with [.wordsFilename]: [ioe.message]'
            return
        success
            wordLine = wordsFile.readLine

            while wordLine
                wordMatch = Regex.match(wordLine, r".+")

                if wordMatch.success
                    .words[wordMatch.toString] = Regex(wordMatch.toString, RegexOptions.Compiled | RegexOptions.IgnoreCase)

                wordLine = wordsFile.readLine

            wordsFile.close

        print "\nFound [.words.count] search words.\n"

        .listFile = FileInfo(.wordsFilename)
        .startDir = DirectoryInfo(.searchDirectory)
        curDir = .startDir

        .checkDirectory(curDir)

        print "\nDone."


    def checkDirectory(curDir as DirectoryInfo) is shared
        if not curDir.exists
            print "Unable to find search directory [curDir.fullName]"
            return

        # print "Checking [curDir.fullName]..."

        foundWords = List<of String>()

        for word, wordMatch in .words
            # check directory name
            dirNameCheck = wordMatch.match(curDir.name)

            if dirNameCheck.success
                foundWords.add(word)

        if foundWords.count
            print "[curDir.fullName]: directory name may have [foundWords]"

        # check files

        for subFile in curDir.getFiles
            if curDir.fullName == .startDir.fullName and subFile.fullName == .listFile.fullName
                continue
            .checkFile(subFile, curDir)

        for subDir in curDir.getDirectories
            .checkDirectory(subDir)

    def checkFile(curFile as FileInfo, curDir as DirectoryInfo) is shared
        if not curFile.exists
            print "Unable to find file [curFile.fullName]"
            return

        # check filename
        foundWords = List<of String>()

        for word, wordMatch in .words
            fileNameCheck = wordMatch.match(curFile.name)

            if fileNameCheck.success
                foundWords.add(word)

        if foundWords.count
            print "[curFile.fullName]: file name may have [foundWords]"

        if .onlyNames
            return

        # check contents
        try
            openFile = StreamReader(curFile.fullName)
        catch ioe as IOException
            print " I/O Error with [curFile.fullName]: [ioe.message]"
            return
        success

            if .showLines
                curLine = openFile.readLine
                lineNum = 1

                while curLine
                    foundWords = List<of String>()

                    for word, wordMatch in .words
                        curLineCheck = wordMatch.match(curLine)

                        if curLineCheck.success
                            foundWords.add(word)

                    if foundWords.count
                        print "[curFile.fullName]([lineNum]) may have [foundWords]"

                    curLine = openFile.readLine
                    lineNum += 1

            else # not show lines
                curLine = openFile.readToEnd
                foundWords = List<of String>()

                for word, wordMatch in .words
                    curLineCheck = wordMatch.match(curLine)

                    if curLineCheck.success
                        foundWords.add(word)

                if foundWords.count
                    print "[curFile.fullName] may have [foundWords]"

            openFile.close


- Caligari
Caligari
 
Posts: 33

Re: Regex sample

Postby jonathandavid » Thu Mar 26, 2009 12:51 am

Great example, thanks!


Here are my 2 cents:

1) In the long "if / else if / ..." statement inside the main method, a branch statement would probably be more idiomatic. Also, in comparisons like

if arg == "-h" or arg == "-help"


I think it's better to use

if arg in ["-h",  "-help"]
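
A minimal sketch of that membership test in isolation, assuming a standalone program. The class name `ArgDemo` and the helper `isHelp` are made up for illustration and are not part of the posted sample:

```cobra
class ArgDemo

    def isHelp(arg as String) as bool is shared
        # True when arg is any accepted spelling of the help switch.
        return arg in ['-h', '-help']

    def main is shared
        # Hypothetical arguments, hard-coded so the sketch runs on its own.
        for arg in ['-h', '-help', '-x']
            if .isHelp(arg)
                print '[arg]: show help'
            else
                print '[arg]: not a help switch'
```

Adding another spelling then means adding one list element instead of another `or` clause.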


2) As for the following fragment:

try
    openFile = StreamReader(curFile.fullName)
catch ioe as IOException
    print " I/O Error with [curFile.fullName]: [ioe.message]"
    return
success

    if .showLines
        curLine = openFile.readLine
        lineNum = 1

        while curLine
            foundWords = List<of String>()

            for word, wordMatch in .words
                curLineCheck = wordMatch.match(curLine)

                if curLineCheck.success
                    foundWords.add(word)

            if foundWords.count
                print "[curFile.fullName]([lineNum]) may have [foundWords]"

            curLine = openFile.readLine
            lineNum += 1

    else # not show lines
        curLine = openFile.readToEnd
        foundWords = List<of String>()

        for word, wordMatch in .words
            curLineCheck = wordMatch.match(curLine)

            if curLineCheck.success
                foundWords.add(word)

        if foundWords.count
            print "[curFile.fullName] may have [foundWords]"

    openFile.close



I would have probably written it like this:


try
    openFile = StreamReader(curFile.fullName)

    if .showLines
        curLine = openFile.readLine
        lineNum = 1

        while curLine
            foundWords = List<of String>()

            for word, wordMatch in .words
                curLineCheck = wordMatch.match(curLine)

                if curLineCheck.success
                    foundWords.add(word)

            if foundWords.count
                print "[curFile.fullName]([lineNum]) may have [foundWords]"

            curLine = openFile.readLine
            lineNum += 1

    else # not show lines
        curLine = openFile.readToEnd
        foundWords = List<of String>()

        for word, wordMatch in .words
            curLineCheck = wordMatch.match(curLine)

            if curLineCheck.success
                foundWords.add(word)

        if foundWords.count
            print "[curFile.fullName] may have [foundWords]"

catch ioe as IOException
    print " I/O Error with [curFile.fullName]: [ioe.message]"

catch e as Exception
    print "Generic error: [e.message]"

finally
    openFile.close
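
One caveat with a `finally` that closes the reader: if the `StreamReader` constructor itself throws, `openFile` has not been assigned by the time `finally` runs. A possible alternative, sketched here on the assumption that Cobra's `using` statement disposes the reader on exit much as C#'s does, is to let `using` own the cleanup. The names `UsingDemo` and `printFirstLine` are hypothetical:

```cobra
class UsingDemo

    def printFirstLine(path as String) is shared
        try
            # 'using' disposes the reader when the block exits,
            # so no explicit close or finally is needed.
            using reader = StreamReader(path)
                print reader.readLine
        catch ioe as IOException
            print 'I/O Error with [path]: [ioe.message]'

    def main is shared
        # 'wordlist.txt' is only an example path; a missing file
        # is reported by the catch block instead of crashing.
        .printFirstLine('wordlist.txt')
```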
jonathandavid
 
Posts: 159

Re: Regex sample

Postby Charles » Thu Mar 26, 2009 6:12 am

Also, stuff like:
.wordsFilename = Path.getDirectoryName(CobraCore.exePath) + r"\wordlist.txt"
.searchDirectory = r".\"

Should be using Path.combine() and Path.directorySeparatorChar so it will work on any platform, such as Mono + Mac.
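
A rough sketch of what that could look like, assuming the same `CobraCore.exePath` call as in the posted sample. The class name `PathDemo` and the local variable names are illustrative only:

```cobra
class PathDemo

    def main is shared
        exeDir = Path.getDirectoryName(CobraCore.exePath)
        # Path.combine inserts the platform's own separator between the parts.
        wordsFilename = Path.combine(exeDir, 'wordlist.txt')
        # A bare '.' names the current directory without needing a separator.
        searchDirectory = '.'
        print wordsFilename
        print DirectoryInfo(searchDirectory).fullName
        print 'separator here is [Path.directorySeparatorChar]'
```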
Charles
 
Posts: 2515
Location: Los Angeles, CA

Re: Regex sample

Postby jonathandavid » Thu Mar 26, 2009 6:25 am

Chuck wrote: Also, stuff like:
.wordsFilename = Path.getDirectoryName(CobraCore.exePath) + r"\wordlist.txt"
.searchDirectory = r".\"

Should be using Path.combine() and Path.directorySeparatorChar so it will work on any platform, such as Mono + Mac.



Wouldn't "/" be a valid path separator on all platforms? I mean, "\" is obviously the preferred way under Windows, but "/" seems to work as well (at least on XP).
jonathandavid
 
Posts: 159

Re: Regex sample

Postby Caligari » Thu Mar 26, 2009 4:18 pm

I meant to use the Path separator and building stuff, but it slipped my mind. I've already got a new version here.

The other suggestions are great, and I'll incorporate some or all and get a new version together on the weekend, if I can find the time.

- Caligari
Caligari
 
Posts: 33

Re: Regex sample

Postby jonathandavid » Fri Mar 27, 2009 5:21 am

One more thing. If I'm not mistaken, the following

.words[wordMatch.toString] = Regex(wordMatch.toString, RegexOptions.Compiled | RegexOptions.IgnoreCase)


could be expressed more concisely as:

.words[wordMatch.toString] = Regex(wordMatch.toString, RegexOptions(Compiled, IgnoreCase))
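
A small self-contained check of the combined-enum form, assuming it behaves as described above. The class name `EnumDemo` and the sample strings are made up for illustration:

```cobra
use System.Text.RegularExpressions

class EnumDemo

    def main is shared
        # Both options are combined in one enum call.
        wordMatch = Regex('cobra', RegexOptions(Compiled, IgnoreCase))
        # IgnoreCase lets the lowercase pattern match uppercase text.
        assert wordMatch.isMatch('Hello COBRA')
        assert not wordMatch.isMatch('python')
        print 'ok'
```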
jonathandavid
 
Posts: 159

Re: Regex sample

Postby Charles » Sun Apr 05, 2009 11:00 pm

Caligari wrote: I meant to use the Path separator and building stuff, but it slipped my mind. I've already got a new version here.

The other suggestions are great, and I'll incorporate some or all and get a new version together on the weekend, if I can find the time.

- Caligari

Anything new to post?

Also, did you want this included in the Cobra Samples?
Charles
 
Posts: 2515
Location: Los Angeles, CA

Re: Regex sample

Postby Caligari » Mon Apr 06, 2009 5:39 pm

Sorry, project deadline at work has left me little brain space over the last few weeks. I hope to get a revised version done this week, or over Easter.

- Caligari
Caligari
 
Posts: 33

Re: Regex sample

Postby Charles » Mon Apr 06, 2009 7:09 pm

No problem. Thanks for the update.
Charles
 
Posts: 2515
Location: Los Angeles, CA

Re: Regex sample

Postby Caligari » Mon Apr 20, 2009 5:25 pm

Here is a possible finished version of the Regular Expression sample I was working on.

It uses the list "concated" method, so will not run for anyone until that is added to Cobra.

If there are any other issues, let me know. I'm happy to revise it further if needed.

- Caligari
Attachments
FindWords.cobra
Regular Expression Sample
(12.16 KiB) Downloaded 1895 times
Caligari
 
Posts: 33
