OffBeatMammal

Searching for monkeys in Cyberspace

Using ASP.NET to restrict access to images

clock June 30, 2006 00:35 by author offbeatmammal
One thing I love about ASP.NET 2.0 and the whole evolution of the Microsoft web platform isn't the "big picture" (stability, performance, interoperability and all that good stuff) but quite often it's the little gems which as a developer suddenly make my life so much easier.

I recently had a fairly simple problem to solve... an existing site that had some images that should only be available to members not casual browsers (yes, I know. right-click, save-as and distribute the file... however once a user does that we're not responsible for who sees it... it wasn't about DRM or restriction... it was about protecting us from accusations of making the content available directly to minors (because, after all, the internet like TV makes a great babysitter)

Now, one of the great things about the combination of IIS6 is that you can create an "application" that allows you to apply custom filters (which you can write in Visual Studio 2005) and use that to control access to any file you like - checking cookies, database settings and a whole host of other information to decide if you want to show the image or something else.

It's surprisingly easy and performance is not impacted noticeably if you keep the code clean and tight

The sample below works on any image which includes "secure" in the filename. It looks for a MembershipID cookie and if it's there displays the image. If not, it replaces it with a default image.... but you could extend it to check user levels against a datasource, or insist on having a valid referrer which matches the current site (to stop leechers) etc....

Imports System
Imports System.Web

Public Class HTTPjpegAccess
    Implements IHttpHandler

    Public Sub ProcessRequest(ByVal context As HttpContext) Implements IHttpHandler.ProcessRequest
        Dim MembershipID As String
        Try
            MembershipID = context.Request.Cookies("MembershipID").ToString
        Catch ex As Exception
            MembershipID = ""
        End Try

        Dim FileName As String = context.Server.MapPath(context.Request.FilePath)
        If LCase(FileName).IndexOf("secure") > 0 Then
            If MembershipID <> "" Then
                ' they are a member
            Else
                FileName = System.IO.Path.GetDirectoryName(FileName) & "/restricted.jpg"
            End If
        Else
            ' it's not a secured image
        End If

        context.Response.ContentType = "image/JPEG"
        context.Response.WriteFile(FileName)

    End Sub

    Public ReadOnly Property IsReusable() As Boolean Implements IHttpHandler.IsReusable
        Get
            Return True
        End Get
    End Property
End Class

Once the code above has been modified to suit your filenames and/or cookies etc place it (the .dll) in the /bin directory for your site.

In the directory you wish to protect you need to set up a web.config to tell it what handler to use for the filetype - in this case the .jpgs. You will also need to place the default (in this case "restricted.jpg") image in the directory as the failover.

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
    <system.web>
        <httpHandlers>
            <add verb="*" path="*.jpg" type="HTTPjpegAccess.HTTPjpegAccess, HTTPjpegAccess"/>
        </httpHandlers>
    </system.web>
    
</configuration>

A great article that covers a lot of the theory about how this works, and what else it can do is here at DevX, and also this Microsoft Knowledgebase article



Performance Tuning ASP/SQL Server applications

clock June 4, 2006 00:31 by author offbeatmammal
Developers and DBAs have traditionally had a sometimes antagonistic relationship. DBAs getting upset with developers who don't tweak their applications to avoid putting any strain on the precious databases, and developers being frustrated by the sometimes esoteric rules imposed on them by how SQL works.

I've been lucky having worked as both a developer (on COBOL and Focus on IBM mainframes down to hand-crafting assembler on 8 bit micro-processors) and a DBA (on DB2, Datacom/DB and SQL Server) so can see both sides of the equation. And I've come to the realisation having worked with 'pure' DBAs and 'pure' developers that the world is a much better place if you do have a bit of an understanding of the requirements and limitations on both sides of the fence, and you're willing to make a bit of an effort to tune both the DB and the code to support the end goal - stable, efficient applications.

How busy was that server!Over the years as traffic has grown on my pet project from 1 million page impressions a day to 2 then to 3.4 and currently a peak of over 6 million in a day (690,000 in one hour), with pretty much the same number of servers handling the load, optimising the code and the database becomes more and more critical. Of course each year the hardware also gets better for the same price - this year with AMD Opteron processors and 64 bit versions of Windows 2003 Server and SQL Server 2005 we have much more raw computing power available to throw at the problem.

A lot of performance tuning is very application / environment specific, but over the years there are a few tricks which I've applied to my favourite site and also a number of other projects, either building from scratch or going in fire-fighting to work out why response times where so slow.

Because I'm a nice person.... here they are to share. If you find any of them useful, or have any suggestions to improve them please feel free to share in the comments below.

Indexing - Easy to miss, devastating if not done right.

When you're developing database structures and code side-by-side it's very common to add new queries, a quick select or an update without checking that SQL is going to be able to optimise it without a bit of extra help. Imagine if the table has 100,000 rows and you're doing an update on a column that's not the primary key (why are you doing that in the first place ?!) or isn't part of an index .... The poor old SQL Server has to sift through every one of those rows to find the one you wanted to update (maintaining a lock on the table in addition to the CPU, RAM and Disk resource required for the trawl). Index optimisation can be assisted (with sufficient test data) through use of the Enterprise Manager SQL Tuning Wizard, but in most cases a manual tuning exercise identifying queries and either re-structuring them to support the data, or adding indices to support their requirements is beneficial. Note: Don't go overboard on index creation.... Every additional index brings with it overhead in terms of data storage and the resource requirements to maintain those indices.

Locking – Sometimes it's good to share.

SQL Server is, by default, fairly cautious. While this is often good, especially with rapidly changing data and transactions, it can be very resource intensive for simple data retrieval where it is either known that the data won't be changing, or it's okay to use the last completed status rather than worry about the values of a currently updating record. This is often referred to as a dirty read.

Telling SQL Server not to observe locks on a table is simple – add "with (nolock)" after the table name in the from clause (eg "select email from tMember with (nolock) where member_id = 1")

Don't nest queries on the same connection - Sometimes it's not so good to share

A lot of time in an application it's easy to just create one connection to a database and use that for all communications. While it makes for a quick and easy programming model, it's not making best use of the database and communications resources. In a typical web application the ASP pages communicate with the SQL Server via a TCP connection. Each one of those connections requires a dedicated port which is held open (and locked) for as long as SQL and the operating system believe it's still needed. In a simple query model where all requests are sequential this doesn't cause a huge problem as the system is able to keep track of resources and almost all the assumptions it makes are correct. If however you nest queries on the same connection that all starts to get confused.

rs = con.execute("select email, first_name, last_name from tMember with (nolock) where member_id = 1")
is okay as it's obvious to the database when you've finished with the connection
rs = con.execute("select email, first_name, last_name from tMember with (nolock) order by member_id")
While not rs.eof
	rs2 = con.execute("select answer from tCompetitionEntry with (nolock) where comp_id = 1 and member_id = " & rs("member_id").value
	....
	rs.movenext
End While
is bad because SQL and the underlying operating system can never be sure exactly when you've finished with the query... so leave it to timeout rather than free it up for re-use nice and quickly.

The solution is very simple... open a second database connection for the nested transaction

rs = con.execute("select email, first_name, last_name from tMember with (nolock) order by member_id")
While not rs.eof
	rs2 = con2.execute("select answer from tCompetitionEntry with (nolock) where comp_id = 1 and member_id = " & rs("member_id").value
	         ^
	....
	rs.movenext
End While
- just don't forget for each level of nesting you add a new level of database connection. It's quite easy when you have nested includes, function and subroutine calls and (especially dangerous) recurssion to not track which database connection you are using, but for a high traffic, database intensive site getting this wrong can be pretty catastrophic.
Release resources - some things just want to be free

Further to eliminating nested queries on a connection there are a number of other things that you can do to help SQL and the operating system free up resources to make for a much smoother and stable environment.

Always close connections yourself, as soon as you can (and don't rely on timeouts and garbage collection to do your housekeeping).

rs = con.execute("select count(*) from tCompetitionEntry with (nolock) where comp_id = 1 and member_id = " & uMemberId")
if rs(0).value > 0 then
	' they have already entered so bounce them to an error page
	response.redirect("error.aspx?err=" & server.urlencode("Sorry, you have already entered this competition"))
else
.....
while not having any huge issue with it is however bad because it leaves the database connection hanging around. Better to include a specific con.close before the redirect. In fact I always use a small subroutine to ensure the database connection(s) have been closed before performing the redirection (makes sure I don't forget!)
...
	' they have already entered so bounce them to an error page
	page_redirect("error.aspx?err=" & server.urlencode("Sorry, you have already entered this competition"))
...

Sub page_redirect(whereTo)
	closeDB()	' call to subroutine to ensure all database connections are closed
	response.redirect(whereTo)
End sub

Move the data to the application and free the connection. While it's quick and easy to keep a connection open while you loop through and process the data you need to be aware that in doing so you have tied up the connection for the entire duration of the processing.

rs = con.execute("select email, first_name, last_name from tMember with (nolock) order by member_id")
While not rs.eof
	....
	rs.movenext
End While
In this day and age RAM is pretty much a commodity and it's very cost effective to add a GB or 2 of RAM to a server. Certainly better than having to upgrade your entire infrastructure because it's easier than optimising some code. To take advantage of the available memory simply dump the data from the query into an array, release the connection and then process the data (although of course, then monitoring memory usage to ensure you're not thrashing to disc is important)
rs = con.execute("select email, first_name, last_name from tMember with (nolock) order by member_id")
memberArray = rs.getRows()
rs = nothing
for memberLoop = 0 to ubound(memberArray,2)
	....
next

TCP Time_Wait and Max User (Ephemeral) Ports By default Windows uses the conventional BSD range of 1024 to 5000 for its anonymous (ephemeral) port range. This places a limit on the number of concurrent sessions you can have active at any one time which for a high traffic site this can be a significant issue. You can increase the port range that is used for anonymous ports to approximately 20,000 ports by modifying the MaxUserPort registry key (this parameter controls the maximum port number that is used when an application requests any available user port from the system). You can set only the upper bound of the ephemeral port range. To modify the MaxUserPort registry key, follow these steps:

  • Start Registry Editor (Regedt32.exe). Please note that you should backup your registry and any important files on your computer before editing the registry.
  • Locate the MaxUserPort key in the registry:
    HKEY_LOCAL_MACHINE\ SYSTEM\ CurrentControlSet\ Services\ Tcpip\ Parameters
  • On the Edit menu, click Add Value, and then add the following registry value:
    Value Name: MaxUserPort
    Data Type: REG_DWORD
    Value: 65534 (for example)
    Valid Range: 5000-65534 (decimal)
    Default: 0x1388 (5000 decimal)
  • Quit Registry Editor.

When a connection is closed it stays in the TIME_WAIT state for a certain length of time in case the system encounters a situation where it is better to re-use it rather than create a new connection. As most of the connections will be from new users wanting new data this however it's an optimal situation for a hightly dynamic website. By default this TIME_WAIT state is maintained for 240 seconds. While a connection is in the TIME_WAIT state, the socket pair cannot be reused. For more information, see RFC 793.

You can change the timeout on the connections from 240 seconds (the default) to any length from 30 seconds to 300 seconds. Use the TcpTimedWaitDelay registry parameter to change this value (this parameter determines the length of time that a connection stays in the TIME_WAIT state when the connection is being closed). To do this:

  • Start Registry Editor (Regedt32.exe)
  • locate the following key in the registry, and then modify the value
    HKEY_LOCAL_MACHINE\ System\ CurrentControlSet\ services\ Tcpip\ Parameters
    Value Name:TcpTimedWaitDelay
    Value Type: REG_DWORD-time in seconds
    Valid Range: 30-300 (decimal)
    Default: 0xF0 (240 decimal)
  • More information available here.
Select * - sometimes wishing on a star isn't good

It's very simple when writing a query to take the easy route and ask the database for all the columns in a query – by using "select * from tMember (with nolock)". This innocent piece of coding can have a lot of undesired results. Firstly in terms of performance – not only does SQL Server have to go away and find out what those column names are before it can execute the query, it also has to pull back more data that you potentially need. It's also possible that any code which relies on the sequence those rows are returned in will break when a new column is added.

Far better to craft SQL Selections that return exactly what you need (with a minimal overhead) by specifically defining the required columns (eg "select first_name, last_name, email from tMember with (nolock) where member_id = 1" is a lot better for the system than "select * from tMember where member_id = 1" in so many ways)

OPTION (FAST n) - For when bigger isn't better

Before executing a query SQL Server has no real way of knowing how large the returned set of data will be, so it has to allocate an estimated buffer. Although it's estimates are good (and get better as it works with the data) there are many occasions where a little human intervention and knowledge of the application can really help.

If you know that a query will only every return one row, or a hundred rows, or for that matter 100,000 rows why not tell SQL Server up-front so it can try and marshal it's resources in the most efficient way.

Note: There can be a dark side to this hint... Although it optimises for the number of rows specified if your query returns more than the declared number of rows overall performance will be compromised, so only use this if you're sure of the size of the return set, or you add a bit of margin

Note: Option Fast is similar to the SQL7 hint FASTFIRSTROW which only optimised for single row returns, but FASTFIRSTROW will not be supported in future versions of SQL

Stored Procedures vs ad hoc queries – It's always better if you know what you're doing

If you send an ad hoc query to SQL Server it has to attempt to optimise it (based on what it knows about the database, the tables, existing indices etc, assisted by hints and help you've already given it) and then execute it. It has to do that every time the query is submitted.

Stored Procedures are a way in which a series of SQL statements (and associated logic) can be packaged together with input and output parameters to perform a specific function. While requiring a little more forethought on behalf of the developer and DBA Stored Procedures are great for SQL Server as it only has to perform the optimisation step once when the query is defined, and not every time it is executed. Those small savings can add up over time.

The other bonus of using Stored procedures is code maintainability. If you make all DB references via Stored Procedures then when you need to make a change to the database you don't have to spend as much time checking every dynamic SQL statement to see if it breaks - check the Stored Procedures referencing the table in question and work from there....

Sadly stored procedures are often an area where the demarcation really rears its head, with DBAs not wanting to let the developers near their database, and the programmers not wanting a DBA dictating how they develop their queries. But it's also one of the best places to forge some common ground and shared understanding of whats possible, desirable and practical.

It is strongly recommended that you do not create any stored procedures using sp_ as a prefix. SQL Server always looks for a stored procedure beginning with sp_ in this order:

  • The stored procedure in the master database.
  • The stored procedure based on any qualifiers provided (database name or owner).
  • The stored procedure using dbo as the owner, if one is not specified.
Therefore, although the user-created stored procedure prefixed with sp_ may exist in the current database, the master database is always checked first, even if the stored procedure is qualified with the database name.
Important: If any user-created stored procedure has the same name as a system stored procedure, the user-created stored procedure will never be executed.
Database Housekeeping – It's easier to find things when they're organised

Over time, just like the filesystem on your PC, any database will become fragmented. The data in tables and indices scattered around the structure of the database file, and even the physical files underlying the database scattered around the hard drive(s). This all adds overhead to every database operation. Sadly tidying this up is not a quick, or easy task – but if scheduled regularly can be made more predictable, and certainly less painful.

There are a number of DBCC tasks which can be run within SQL Server. Please check the documentation before diving in and trying these on a live database.

  • DBCC INDEXDEFRAG Is one of the more useful utilities as it optimises the target index.
  • DBCC SHOWCONTIG can be used to show what tables and indices are in need of defragmentation.
These two DBCC commands can be used together to identify trouble-spots and rectify them.
Externally to the database itself the physical health of the machine (fragmentation of the hard drive, sufficient RAM etc) can all contribute to smooth running, or poor performance. Use of SQL Profiler and the Windows Performance Monitor tools is highly recommended to examine all projects on a regular basis – in new development to make sure there are no unexpected poor performers, in old projects just to make sure you can't eke out a little more performance or improve stability and in maintenance projects just to make sure the latest changes have not impacted the database unduly.
Is that it? It all sounds so simple....

Taken on their own any of thes techniques have value. Used in combination they can have some pretty good results. Obviosuly it's always better to design and build your application from the ground-up to include these guidelines and any other site specific performance rules... but always remember there's no one panacea that's guaranteed to make your application perform perfectly, and you can never stop researching other/better solutions to problems you think you've got covered. Often you may have to try a number of combinations and review code, database and environment to find the answer to the problem. And it's not always obvious. Although these hints are mostly based around development work I have had an experience where a website was performing terribly, even though the database server was at less than 15% utilisation and the front-end webservers where all below 20% utilisation.... the problem turned out to be a faulty network switch in the infrastructure between them (so... developers and DBAs... remember, play nice with the networking gurus as well as each other!)



Upload and resize and image in ASP.NET

clock November 19, 2005 00:02 by author offbeatmammal

For a project I've just finished I needed to be able to upload and resize (within certain parameters) an image. The code I was working with was a patch of Classic ASP to make it work in an ASP.NET environment and remove the reliance on some third party components that where not supported inthe new environment.

As usual I googled to find some samples (why reinvent the wheel) but most of them didn't quite fit my requirements (either problems with maintaining aspect ratio, not placing upper limits on the width or height of the thumbnail or - the most persistant problem - ending up giving me thumbnails that where bigger than the original image)

So I ended up having to roll my own (it's good for the soul anyway) ... hopefully to save anyone else the pain I've added some comments and made it available for you do use how you like.... just click here for the ASP.NET File Upload and Resize Sample (live demo and source download).

...or you can just grab the code from here.

  1: <%@ Page Trace="False" Language="vb" aspcompat="false" debug="true" validateRequest="false"%>
  2: <%@ Import Namespace=System.Drawing %>
  3: <%@ Import Namespace=System.Drawing.Imaging %>
  4: <%@ Import Namespace=System %>
  5: <%@ Import Namespace=System.Web %>
  6: <SCRIPT LANGUAGE="VBScript" runat="server">
  7: const Lx = 200 ' max width for thumbnails
  8: const Ly = 240 ' max height for thumbnails
  9: const upload_dir = "/upload_resize_test/" ' directory to upload file
 10: const upload_original = "sample" ' filename to save original as (suffix added by script)
 11: const upload_thumb = "thumb" ' filename to save thumbnail as (suffix added by script)
 12: const upload_max_size = 25 ' max size of the upload (KB) note: this doesn't override any server upload limits
 13: dim fileExt ' used to store the file extension (saves finding it mulitple times)
 14: dim newWidth, newHeight as integer ' new width/height for the thumbnail
 15: dim l2 ' temp variable used when calculating new size
 16: dim fileFld as HTTPPostedFile ' used to grab the file upload from the form
 17: Dim originalimg As System.Drawing.Image ' used to hold the original image
 18: dim msg ' display results
 19: dim upload_ok as boolean ' did the upload work ?
 20: </script>
 21: <%
 22: randomize() ' used to help the cache-busting on the preview images
 23: upload_ok = false
 24: if lcase(Request.ServerVariables("REQUEST_METHOD"))="post" then
 25:  fileFld = request.files(0) ' get the first file uploaded from the form (note:- you can use this to itterate through more than one image)
 26:  if fileFld.ContentLength > upload_max_size * 1024 then
 27:   msg = "Sorry, the image must be less than " & upload_max_size & "Kb"
 28:  else
 29:   try
 30:    originalImg = System.Drawing.Image.FromStream(fileFld.InputStream)
 31:    ' work out the width/height for the thumbnail. Preserve aspect ratio and honour max width/height
 32:    ' Note: if the original is smaller than the thumbnail size it will be scaled up
 33:    If (originalImg.Width/Lx) > (originalImg.Width/Ly) Then
 34:     L2 = originalImg.Width
 35:     newWidth = Lx
 36:     newHeight = originalImg.Height * (Lx / L2)
 37:     if newHeight > Ly then
 38:      newWidth = newWidth * (Ly / newHeight)
 39:      newHeight = Ly
 40:     end if
 41:    Else
 42:     L2 = originalImg.Height
 43:     newHeight = Ly
 44:     newWidth = originalImg.Width * (Ly / L2)
 45:     if newWidth > Lx then
 46:      newHeight = newHeight * (Lx / newWidth)
 47:      newWidth = Lx
 48:     end if
 49:    End If
 50: 
 51:             Dim thumb As New Bitmap(newWidth, newHeight)
 52: 
 53:             'Create a graphics object
 54:             Dim gr_dest As Graphics = Graphics.FromImage(thumb)
 55: 
 56:    ' just in case it's a transparent GIF force the bg to white
 57:    dim sb = new SolidBrush(System.Drawing.Color.White)
 58:    gr_dest.FillRectangle(sb, 0, 0, thumb.Width, thumb.Height)
 59: 
 60:             'Re-draw the image to the specified height and width
 61:    gr_dest.DrawImage(originalImg, 0, 0, thumb.Width, thumb.Height)
 62: 
 63:    try
 64:     fileExt = System.IO.Path.GetExtension(fileFld.FileName).ToLower()
 65:     originalImg.save(Server.MapPath(upload_dir & upload_original & fileExt), originalImg.rawformat)
 66:     thumb.save(Server.MapPath(upload_dir & upload_thumb & fileExt), originalImg.rawformat)
 67:     msg = "Uploaded " & fileFld.FileName & " to " & Server.MapPath(upload_dir & upload_original & fileExt)
 68:     upload_ok = true
 69:    catch
 70:     msg = "Sorry, there was a problem saving the image."
 71:    end try
 72:    ' Housekeeping for the generated thumbnail
 73:    if not thumb is nothing then
 74:     thumb.Dispose()
 75:     thumb = nothing
 76:    end if
 77:   catch
 78:    msg = "Sorry, that was not an image we could process."
 79:   end try
 80:  end if
 81: 
 82:  ' House Keeping !
 83:  if not originalImg is nothing then
 84:   originalImg.Dispose()
 85:   originalImg = nothing
 86:  end if
 87: 
 88: end if
 89: %>
 90: <html>
 91: <head>
 92: <title>ASP.NET File Upload and Resize Sample</title>
 93: <META NAME="Description" CONTENT="ASP.NET File Upload and Resize Sample (Hybrid VB.NET)">
 94: <META NAME="Keywords" CONTENT="ASP.NET, ASP, NET, VB, VBScript, Image, Upload, Resize, Thumbnail, Constrain, Filesize, File, Size, Free">
 95: <META NAME="Copyright" CONTENT="Rufan-Redi Pty Ltd 2005">
 96: <META NAME="Author" CONTENT="System developed by Jeremy at http://www.Rufan-Redi.com">
 97: </head>
 98: <body>
 99: 
100: <p><b>Hybrid ASP.NET File Upload and Resize Sample (VB.NET)</b>
101: <br>Upload and resize a GIP/JPG/PNG images, ensuring filesizes are optimum.</p>
102: 
103: <form enctype="multipart/form-data" method="post" runat="server">
104: <table>
105: <tr><td>Select the file to upload:</td><td><input type="file" name="upload_file"></td></tr>
106: <tr><td colspan=2>Max upload size <%=upload_max_size%>Kb, gif/jpg/png only</td></tr>
107: <tr><td colspan=2><input type="submit" value="Upload"></td></tr>
108: </table>
109: </form>
110: 
111: <%
112: if upload_ok then
113: %>
114: <table>
115: <tr>
116: <td valign=top><img src="<%=upload_dir & upload_original & fileExt & "?" & rnd()%>"></td>
117: <td valign=top><img src="<%=upload_dir & upload_thumb & fileExt & "?" & rnd()%>"></td>
118: </tr>
119: </table>
120: <%
121: else
122:  response.write(msg)
123: end if
124: %>
125: </body>
126: </html>


Getting access to an ASP.NET uploaded file

clock November 16, 2005 23:59 by author offbeatmammal

Sometimes in ASP.NET you just want to get access to the content of a text file uploaded from a form. I wanted to use it in conjunction with my nifty RegEx CSV Processor but... here it is to share

1:  <%
2:  ' upload a text file (eg a csv) and make the content available as an array of lines  
3:  ' Hybrid ASP.NET (VB) Solution  
4:  Dim fileFld
5:  Dim byteLoop, fileText
6:  
7:  If lcase(Request.ServerVariables("REQUEST_METHOD"))="post" Then
8:     fileFld = request.files(0)
9:     If fileFld.filename = "" Then
10:          ' no file to process  
11:     Else         12:         Dim intFileLen As Integer = fileFld.ContentLength ' find out how long the file is
13:         Dim b(intFileLen) As Byte ' and create an array big enough to hold it
14:          ' move the uploaded file into the byte array  
15:         fileFld.InputStream.Read(b, 0, intFileLen)
16:           ' Copy the byte array into a string.  
17:          For byteLoop = 0 To intFileLen-1
18:             fileText = fileText & Chr(b(byteLoop))
19:          Next byteLoop
20:          ' split the file into an array  
21:         fileLine = Split(fileText,Chr(13))
22:     End If
23:  End If
24:  %>
25:  
26:  .....
27:  
28:  <form enctype="multipart/form-data" method="post" runat="server">
29:  <table>
30:  <tr><td>Select the file to upload:</td><td><input type="file" name="upload_file"></td></tr>
31:  <tr><td colspan="2"><input type="submit" value="Upload">
32:  </table>
33:  </form>
34:  %>


Using a RegEx (Regular Expression) to read a CSV (Comma Seperated Variable) file

clock November 16, 2005 23:50 by author offbeatmammal
Had to solve one of those tricky little problems today... a user uploading a CSV and splitting the fields into a database.

Played around with the ADO CSV driver but... it's very picky about the quality of the data and ended up being harder work than a simple CSV import function should be...

A quick search (Google is easier than thinking after all) revealed a bunch of regex to handle csv input... the only trouble being that most of the solutions where partial...

So... here's my contribution... handles CSVs and copes with quoted items.

1:  <%
2:  ' Split a line of a CSV into array cells  
3:  ' support quoted (") entries with internal commas  
4:  ' Hybrid ASP.NET (VB) Solution  
5:  
6:  dim csvRegEx As Regex = New RegEx(",(?=(?:[^\""]*\""[^\""]*\"")*(?![^\""]*\""))")
7:  dim source_str
8:  dim csvArray, loopArray
9:  
10:  ' load source_str - this can be from a text file, or an uploaded file  
11:  
12:  csvArray = csvRegEx.Split(source_str)
13: ' Strip off quotes (if the entry was quoted)
14:  For loopArray = 0 To UBound(csvArray)
15:     if csvArray(loopArray) <> "" Then
16:         If Left(csvArray(loopArray),1) = Chr(34) And Right(csvArray(loopArray),1) = Chr(34) Then
17:             csvArray(loopArray) = trim(Mid(csvArray(loopArray),2,Len(csvArray(loopArray))-2))
18:         End if
19:     End if
20:  Next
21:  %>


Search

Calendar

<<  September 2010  >>
SuMoTuWeThFrSa
2930311234
567891011
12131415161718
19202122232425
262728293012
3456789

Sign in

Twitter


    follow OffBeatMammal at http://twitter.com


    Amazon Store


     
    Donate unused CPU cycles with BOINC Stats and Account Management from BOINCStats.com



    Blogroll

    Archive

    Tags

    Categories


    Disclaimer

    The opinions expressed herein are my own personal opinions and do not represent my employer's view in anyway.

    © Copyright 2010