I may have mentioned this before, but I use it so often that I think it deserves its own post. I am often (as in very often) faced with a single field that has lots of disparate bits of data embedded within it. A perfect example is the cs-uri-query field in IIS logs, especially where Exchange is concerned. Here is an example:
ae=Folder&t=IPF.Appointment&id=LgAAAACZfhLxcD%2bnRZZG0OyEvOwcAQC3gm%2fVeZsgSIJcyGKb0MppAEwcvx5xAAAC&yr=2015&mn=3&dy=15&Initial+Budget>>Conn:1,HangingConn:0,AD:60000/60000/0%,CAS:90000/85862/7%,AB:60000/60000/0%,RPC:90000/87538/5%,FC:$null/0,Policy:DefaultThrottlingPolicy_67df4d2e-9f19-4270-9b8e-dec43773104b,Norm&v=14.3.174.1&mbx=MBX1.CONTOSO.COM&sessionId=891d047b451141b0a6406ca854de41b9&prfltncy=134&prfrpccnt=15&prfrpcltncy=78&prfldpcnt=1&prfldpltncy=15&prfavlcnt=0&prfavlltncy=0&End+Budget>>Conn:1,HangingConn:0,AD:60000/60000/0%,CAS:90000/85738/8%,AB:60000/60000/0%,RPC:90000/87463/5%,FC:$null/0,Policy:DefaultThrottlingPolicy_67df4d2e-9f19-4270-9b8e-dec43773104b,Norm[Resources:(Mdb)ABC-DAG09-DB32(Health:-1%,HistLoad:0),]
That is a lot of information for a single field, and not very review-friendly. For the sake of my theoretical needs, I really just want the mailbox server being used, the session ID, and the throttling policy being applied. I have no real need for this data right now; I'm just picking these three at random for the example, hence the term "theoretical need".
It is easy to extract each piece of information we want into its own field, for example with EXTRACT_PREFIX() and EXTRACT_SUFFIX(). We could probably use SUBSTR(), but that would be much messier: we'd be working with fixed positions and string lengths, which break the moment the string we are dealing with changes.
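To see why fixed positions are fragile, here is a small Python sketch. Python is used only to illustrate the idea outside LPS; the two strings are hypothetical variants of a fragment of the example above (the `x=1` parameter is made up), and the offset and length were measured by hand against the first string. The position-based function stands in for a SUBSTR() approach, the marker-based one for the EXTRACT approach.

```python
# Two hypothetical variants of a fragment of the cs-uri-query example.
# The second has one extra (made-up) parameter in front.
OLD = "v=14.3.174.1&mbx=MBX1.CONTOSO.COM&sessionId=891d047b451141b0a6406ca854de41b9"
NEW = "x=1&" + OLD

# Offset and length measured by hand against OLD (SUBSTR-style extraction).
START, LENGTH = 17, 16


def by_position(query: str) -> str:
    """Fixed-position slice, analogous to a SUBSTR() approach."""
    return query[START:START + LENGTH]


def by_marker(query: str) -> str:
    """Text between '&mbx=' and the next '&', analogous to the EXTRACT approach."""
    return query.partition("&mbx=")[2].partition("&")[0]


for label, query in (("old", OLD), ("new", NEW)):
    print(label, by_position(query), by_marker(query))
```

With OLD both functions return `MBX1.CONTOSO.COM`. With NEW the position-based slice returns `mbx=MBX1.CONTOSO`, while the marker-based version still returns `MBX1.CONTOSO.COM`. A server name of a different length (say MBX10) would break the fixed length in the same way.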
With the EXTRACT functions, we can use marker strings to grab exactly what we want. LPS has a shortcut that inserts a nearly ready-to-use combination of the two functions: press SHIFT+F12 and the following is inserted into the LPS query window (yes, this shortcut exists because I grew extremely tired of typing it out by hand!):
EXTRACT_PREFIX(EXTRACT_SUFFIX(FieldName, 0, 'StartChar'), 0, 'EndChar') as FieldAlias
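We only need to replace the tokens to fit our data. (The 0 in each call is a zero-based occurrence index, so 0 means the first occurrence of the marker.) Here are the tokens and their meanings: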
FieldName - The name of the field we want to extract data from (in this case cs-uri-query)
StartChar - The starting character(s) we want to use as a marker to tell LPS where our data begins.
EndChar - The ending character(s) we want to use as a marker to tell LPS where our data ends.
FieldAlias - What we want our custom column to be named.
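To make the nesting concrete, here is a rough Python model of what the template does, run against a fragment of the query string above. The helper names are hypothetical, and the behavior when a marker is missing (None for a missing start marker, the unchanged text for a missing end marker) is an assumption, not something taken from LPS documentation. The inner call keeps everything after the start marker; the outer call keeps everything before the end marker.

```python
from typing import Optional


def extract_suffix0(value: Optional[str], marker: str) -> Optional[str]:
    """Model of EXTRACT_SUFFIX(value, 0, marker): text after the first marker.
    Assumption: a missing marker yields None (an empty field in LPS)."""
    if value is None:
        return None
    _, found, rest = value.partition(marker)
    return rest if found else None


def extract_prefix0(value: Optional[str], marker: str) -> Optional[str]:
    """Model of EXTRACT_PREFIX(value, 0, marker): text before the first marker.
    Assumption: a missing marker leaves the text unchanged."""
    if value is None:
        return None
    head, _, _ = value.partition(marker)
    return head


# A fragment of the cs-uri-query example from this post.
fragment = "v=14.3.174.1&mbx=MBX1.CONTOSO.COM&sessionId=891d047b451141b0a6406ca854de41b9"

inner = extract_suffix0(fragment, "&mbx=")  # everything after '&mbx='
outer = extract_prefix0(inner, "&")         # ...up to the next '&'
print(inner)
print(outer)
```

Here `inner` is `MBX1.CONTOSO.COM&sessionId=891d047b451141b0a6406ca854de41b9` and `outer` is `MBX1.CONTOSO.COM`, which is what the MBXServer column in the query below is meant to produce.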
Since I want three items, I need three of these lines, one for each piece of data I want to extract (mailbox server, session ID and throttling policy). I'll fill them in; compare them with the example query string above to see why I chose each marker:
SELECT TOP 1000
EXTRACT_PREFIX(EXTRACT_SUFFIX(cs-uri-query, 0, '&mbx='), 0, '&') as MBXServer,
EXTRACT_PREFIX(EXTRACT_SUFFIX(cs-uri-query, 0, '&sessionId='), 0, '&') as [Session ID],
EXTRACT_PREFIX(EXTRACT_SUFFIX(cs-uri-query, 0, 'Policy:'), 0, ',') as Policy
FROM '[LOGFILEPATH]'
WHERE cs-uri-stem LIKE '%ev.owa%'
AND cs-username IS NOT NULL
*The WHERE clause above makes sure only requests that contain the query string data I expect are returned. Without it, we'd get back rows with empty fields, because the data we want to extract doesn't exist in every request in the IIS log. Here is an example result of the query above:
MBXServer | Session ID | Policy
---|---|---
MBX2.CONTOSO.COM | 8667ec4f36ao4660a3d8382de9dfbafe | DefaultThrottlingPolicy_45df4d2e-9f19-4270-9b8e-deu22273104b
MBX1.CONTOSO.COM | 8667ec4f36ao4660a3d8382de9dfbafe | DefaultThrottlingPolicy_54df4d2e-9f19-4270-9b8e-deu22273104b
MBX3.CONTOSO.COM | 86e44db93ec245e0b64f3feb2a68feeb | DefaultThrottlingPolicy_54df4d2e-9f19-4270-9b8e-deu22273104b
MBX1.CONTOSO.COM | 86e44db93ec245e0b64f3feb2a68feeb | DefaultThrottlingPolicy_54df4d2e-9f19-4270-9b8e-deu22273104b
Neat! Now this is something I can work with.
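If you want to check the marker choices outside LPS, the following Python sketch applies the same three start/end pairs to the example cs-uri-query value from the top of this post. `extract_between` is a hypothetical helper that only models the nested EXTRACT call with index 0; it is not part of LPS, and its handling of missing markers is an assumption.

```python
from typing import Optional


def extract_between(value: Optional[str], start: str, end: str) -> Optional[str]:
    """Rough model of EXTRACT_PREFIX(EXTRACT_SUFFIX(value, 0, start), 0, end).
    Assumptions: first occurrence of each marker; a missing start marker
    yields None; a missing end marker keeps the rest of the string."""
    if value is None:
        return None
    _, found, rest = value.partition(start)
    if not found:
        return None
    return rest.partition(end)[0]


# The cs-uri-query example from the top of this post, split for readability.
cs_uri_query = (
    "ae=Folder&t=IPF.Appointment&id=LgAAAACZfhLxcD%2bnRZZG0OyEvOwcAQC3gm%2fVeZsgSIJcyGKb0MppAEwcvx5xAAAC"
    "&yr=2015&mn=3&dy=15&Initial+Budget>>Conn:1,HangingConn:0,AD:60000/60000/0%,"
    "CAS:90000/85862/7%,AB:60000/60000/0%,RPC:90000/87538/5%,FC:$null/0,"
    "Policy:DefaultThrottlingPolicy_67df4d2e-9f19-4270-9b8e-dec43773104b,Norm"
    "&v=14.3.174.1&mbx=MBX1.CONTOSO.COM&sessionId=891d047b451141b0a6406ca854de41b9"
    "&prfltncy=134&prfrpccnt=15&prfrpcltncy=78&prfldpcnt=1&prfldpltncy=15&prfavlcnt=0"
    "&prfavlltncy=0&End+Budget>>Conn:1,HangingConn:0,AD:60000/60000/0%,"
    "CAS:90000/85738/8%,AB:60000/60000/0%,RPC:90000/87463/5%,FC:$null/0,"
    "Policy:DefaultThrottlingPolicy_67df4d2e-9f19-4270-9b8e-dec43773104b,"
    "Norm[Resources:(Mdb)ABC-DAG09-DB32(Health:-1%,HistLoad:0),]"
)

# One (start marker, end marker) pair per column, as in the LPS query above.
columns = {
    "MBXServer": ("&mbx=", "&"),
    "Session ID": ("&sessionId=", "&"),
    "Policy": ("Policy:", ","),
}

row = {name: extract_between(cs_uri_query, start, end)
       for name, (start, end) in columns.items()}
for name, value in row.items():
    print(f"{name}: {value}")
```

Note that `Policy:` appears twice in this string, once in the Initial Budget section and once in the End Budget section; with index 0 the first one wins. In this example both carry the same policy name, so the choice makes no difference.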
Being an Exchange Escalation Engineer, I used Exchange data from my lab, but this technique works with any text data in LPS, not just IIS logs. We could just as easily use it on a CSV log or a plain text file. Consider the following line of text, and imagine it sits in a text file somewhere:
foo=bar+bar=001,sometext||hello
We can use what we have learned to turn that line into useful columns of data. In this example I'm using the TEXTLINELOG log type to search my example text file:
SELECT TOP 1000
EXTRACT_PREFIX(EXTRACT_SUFFIX(Text, 0, 'foo='), 0, '+') as Foo,
EXTRACT_PREFIX(EXTRACT_SUFFIX(Text, 0, 'bar='), 0, ',') as Bar,
EXTRACT_PREFIX(EXTRACT_SUFFIX(Text, 0, '||'), 0, '+') as SomeText
FROM '[LOGFILEPATH]'
Result:
Foo | Bar | SomeText
---|---|---
bar | 001 | hello
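The same Python model reproduces this text-file example. Look at the SomeText column: there is no `+` after `||hello`, so the end marker is never found. The sketch assumes the rest of the line is returned in that case, because that is what the result above shows; treat it as an assumption about LPS behavior rather than documented fact. `extract_between` is again a hypothetical helper, not an LPS function.

```python
from typing import Optional


def extract_between(value: Optional[str], start: str, end: str) -> Optional[str]:
    """Rough model of EXTRACT_PREFIX(EXTRACT_SUFFIX(value, 0, start), 0, end).
    Assumptions: first occurrence of each marker; a missing start marker
    yields None; a missing end marker keeps the rest of the string."""
    if value is None:
        return None
    _, found, rest = value.partition(start)
    if not found:
        return None
    return rest.partition(end)[0]


line = "foo=bar+bar=001,sometext||hello"

# Same (start, end) marker pairs as the TEXTLINELOG query above.
columns = {
    "Foo": ("foo=", "+"),
    "Bar": ("bar=", ","),
    "SomeText": ("||", "+"),  # '+' never follows 'hello', so the rest is kept
}

result = {name: extract_between(line, start, end)
          for name, (start, end) in columns.items()}
print(result)
```

The `bar=` marker is matched at `bar=001`, not inside `foo=bar+`, because the text there is `bar+` rather than `bar=`; that is why the Bar column gets `001`.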
When data mining all types of data with LPS, I find myself using this technique much of the time. It's just too handy to be able to extract data and view it the way I need to see it, which is the entire point: get to the data and represent it in a meaningful way. Just remember to press SHIFT+F12 to insert the template.