Lazycoder

13 Apr 2004

Strongly typed objects and collections vs. performance

Strongly Typed Objects – I’ve lost a battle, but will win the war.

Plip, whom I’ve never met but have heard about (he was “on the runway as we speak” the Saturday afternoon I hung out with Wally McClure and watched the NCAA game), has a series of articles detailing how to create strongly typed objects and collections of those objects. The idea is that you use them instead of DataSets or possibly DataReaders. I’ve implemented quite a few strongly typed objects, but I’ve never created a collection for them unless I was sure it wouldn’t hold very many. Why? Performance. I’m usually developing web-based applications. I avoid creating a large number of objects unless I’m sure that a) I can cache them somewhere, b) they are really tiny objects.

Right now, in the system I’m developing, I have a “Patient” object. It makes sense to create a “PatientList” collection and use that for binding to DataGrids, Repeaters, and whatnot. The problem is that the data import has only been running for about 4 months and we already have 2700+ patients in it, so a typical PatientList collection would hold about 900 or so Patient objects for a given user. That seems like an awful lot of overhead just to be able to type “lblPatientName.Text = patient.Name;”.
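For anyone who hasn’t built one, here is a minimal sketch of the kind of collection being discussed, assuming it derives from System.Collections.CollectionBase. The Patient class and its two properties are hypothetical stand-ins; the real class surely carries more fields, and Plip’s articles may build theirs differently.

```csharp
using System;
using System.Collections;

// Hypothetical Patient class; only an ID and a name are assumed here.
public class Patient
{
    public int Id { get; }
    public string Name { get; }

    public Patient(int id, string name)
    {
        Id = id;
        Name = name;
    }
}

// Strongly typed wrapper over CollectionBase, which keeps its items
// in an internal ArrayList and implements IList for data binding.
public class PatientList : CollectionBase
{
    // Typed indexer, so callers never cast from object themselves.
    public Patient this[int index]
    {
        get { return (Patient)List[index]; }
        set { List[index] = value; }
    }

    // Typed Add; returns the index at which the patient was stored.
    public int Add(Patient patient)
    {
        return List.Add(patient);
    }

    // Reject anything that is not a Patient, even when the caller
    // goes through the untyped IList interface.
    protected override void OnValidate(object value)
    {
        base.OnValidate(value);
        if (!(value is Patient))
        {
            throw new ArgumentException("Only Patient objects are allowed.", "value");
        }
    }
}
```

The per-item cost is one Patient instance plus one reference slot in the underlying ArrayList, so for 900 patients the question is mostly how big each Patient is and how often the list gets rebuilt.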

Has anyone else implemented a strongly typed object collection with a large number of elements? Have you seen any performance hits? How does it perform? Is it worth it?

Update: I apparently decided to spam Plip’s comment section for this post. Whoops, sorry about that. Half the .text sites seem to have trackback enabled and half don’t, and you never know which half you’re dealing with.

Filed under: .NET
Comments (4) Trackbacks (2)
  1. I’m sorry that you had to endure Wally sulking after his team was annihilated ;)

    With large collections the system becomes … interesting. I’ve not spent enough time optimizing my stuff to be able to tell you at what object sizes and in-memory footprint things get slow.

    In reality all you’re dealing with are ArrayLists, so if you find some performance metrics on them, you should have something pretty close.

    In my opinion, if you have x thousand entries, do they really need to be in memory, or can you get away with just the items you need? Can you rework your queries to cut down on size, or only half populate your objects? If you just need the patient name and Id, just populate those items.

    I’d love to run some perf tests but doubt I’ll have the time soon.

  2. Perhaps I’m missing something (wow, I’ve started a lot of blog comments like that today… ughhh), but if you are gonna cache the data, won’t size be a problem regardless of whether it’s in a custom collection or a DataSet? I’d actually think the custom collection would have less overhead.

    If you aren’t caching them, either way you have to repopulate either a DataSet or a collection of objects from a DataReader. I wouldn’t guess which would be worse.

    What we’ve discovered from our own data is a 20/80 rule, where 20% of records are used 80% of the time. In other words, simply caching [the best] ~500 patients would probably be ideal.

  3. Phil,
    That’s essentially what I’m doing now: just retrieving the items I need. Right now I’m just using a DataReader or a DataSet, depending on the result set size and what I want to do with the data once it gets down to the client. To use a strongly typed collection I have to create new classes representing the subsets of the patient’s data that I need for a particular interface, e.g. the patient’s name and id for one class, perhaps their name, id and primary care physician in another. I still need to create the DataSet/DataReader and populate those classes, so I’m not really saving any work or lowering the overhead. In fact, I’m creating more work for my middle tier and decreasing performance by creating the DataSet/DataReader and then creating a new object to hold the contents of each row. Unless I set up a very elaborate caching scheme to cache the individual objects and share them across user boundaries, I have to do this on each page load for each user. I could cache a Patient object for every patient in the database, but we’re adding about 40 new patients a week. Pretty soon we’ll be talking about some real numbers at that rate. And that’s just the patients: each patient row in the database has multiple lab rows associated with it (somewhere around 500-2500 rows per patient), plus multiple medications, multiple physicians, appointments, etc. The overhead of caching all of these strongly typed collections would be enormous.

    In the case of large result sets, I think the scenario with the best performance is probably using a DataReader to populate the list (e.g. Repeater, DataGrid, DataList) and then creating the strongly typed object for an item only when it is selected (e.g. the user clicks on a hyperlink and a new “Patient” object is created and populated). I will be doing some tests on strongly typed collections of both value and reference types to see what the overhead is, if any.

    I think in a WinForms application, where you have a concept of “State”, strongly typed collections work great.

  4. Karl,
    Yeah, the size will be a problem, but not if I can cache all of the possible objects individually within the application’s scope. Since classes are all reference types, I’m just passing around copies of references when I pass them into methods or retrieve them from the cache. Say two of my users both have access to patient #12: I create the patient object for #12 once and cache it. Whenever any of my users want to work on patient #12, I retrieve the object from the cache. I can create the cache at application startup, and add to or remove from it during the life of the application as I see fit. That creates a lot of overhead when I start the application up but removes it during the important part, when the users are using it. That brings in a lot of other threading and locking issues though.

    By creating a strongly typed collection of objects specific to the user and caching that, I end up with redundant objects… I think. My understanding is that if I create an object during a page_load event, and two separate users load the page and each create a new patient object for patient #12, I end up with two separate copies of the patient object. Ideally, if I wanted to use strongly typed collections, I’d create one that allowed me to create a subset (e.g. PatientList patientList.selectPatients(int[] PatientIDs)) and cache a list of ALL the patients in the database. The tricky thing about that is I have to be mindful of updates, deletions, and additions of patients and make sure to sync the cache after any of those events. That adds enough overhead to the operations that it probably outweighs the benefits of caching the data in the first place!

    I had this same discussion with Java guys back when I was doing some JSP work. If you do everything in a very OOP manner, you end up with hundreds and thousands of objects sitting around in memory waiting to get GC’d that may or may not get used. .NET doesn’t really have a robust object management system for caching, locking, and all that yet like J2EE does. We have to write all that code ourselves. I’m really, really waiting for Microsoft, or someone, to come out with a JBoss-type server.
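To make the shared-cache idea from the reply to Karl concrete, here is a rough sketch of an application-wide cache that hands every user the same Patient instance for a given ID. Everything in it is an assumption for illustration: the PatientCache name, the load delegate standing in for the database read, and the use of ConcurrentDictionary, which only arrived in later versions of .NET (4.0 onward) and takes care of the basic locking.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical Patient class; only an ID and a name are assumed here.
public sealed class Patient
{
    public int Id { get; }
    public string Name { get; }

    public Patient(int id, string name)
    {
        Id = id;
        Name = name;
    }
}

// Hypothetical application-scoped cache: one Patient instance per ID,
// shared by every user who asks for that patient.
public sealed class PatientCache
{
    private readonly ConcurrentDictionary<int, Patient> _patients =
        new ConcurrentDictionary<int, Patient>();

    private readonly Func<int, Patient> _load;

    // 'load' is a placeholder for whatever reads one patient from the database.
    public PatientCache(Func<int, Patient> load)
    {
        _load = load;
    }

    // Returns the cached instance, loading it on first request.
    // Under contention the loader may run more than once for the same ID,
    // but only one result is stored and returned to all callers.
    public Patient Get(int patientId)
    {
        return _patients.GetOrAdd(patientId, _load);
    }

    // Builds a per-user subset out of the shared instances.
    public List<Patient> SelectPatients(int[] patientIds)
    {
        var result = new List<Patient>(patientIds.Length);
        foreach (int id in patientIds)
        {
            result.Add(Get(id));
        }
        return result;
    }

    // Call after an update or delete so the next Get reloads from the database.
    public bool Invalidate(int patientId)
    {
        return _patients.TryRemove(patientId, out _);
    }
}
```

Two users asking for patient #12 get the same reference, and a per-user list costs only one reference per patient. The sync problem does not go away: every update, delete, and insert path still has to remember to call something like Invalidate, and the Patient objects themselves need to be immutable or otherwise safe to share across requests.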

