| By Warren MacEvoy | Article Rating: |
|
| January 5, 2005 12:00 AM EST | Reads: |
20,862 |
Let me begin with a philosophical rant. There is a motto from scientific computing that carries over to many areas of computer science:
The gains made by better algorithms almost always outstrip the gains from better hardware.
I've frequently seen where algorithm improvements pay by factors of ten to tens of thousands in CPU time. One change I made in a numerical algorithm improved CPU requirements by a factor of 50,000: from weeks on a super computer to minutes on a workstation.
Any business-savvy engineer knows that algorithm improvements come at a price: the engineer's time. Striking that balance makes software systems move forward rather than stagger to a halt in bloat and dysfunction. It also helps to use people who actually know what they are doing - knowing how to compile code doesn't make you a software engineer any more than knowing how to spell makes you a writer. End of rant.
On to (rant-related) business. On most Web sites, think of how many times a data source will be used to retrieve the same data and produce the same content over and over again. Most successful services deliver a highly redundant amount of information to their users. For example, the JDJ Web site will deliver this (same) content to perhaps a hundred thousand users. If the servers are overtaxed, customers will experience significant delays or malfunctions.
There are several useful solutions to this. Well-configured caching proxy servers come to mind, although server-side scripting makes this difficult. Buying more hardware will eventually fix the problem, which may be the correct business solution.
But what about asking programmers to be a little more lazy?
The source for the LazyFileOutputStream can be downloaded from http://bpp.sourceforge.net/download/bpp-0.8.5b/src/bpp/LazyFileOutputStream.java. It acts just like a regular FileOutputStream except that if created on a file that already exists, it "reads" the data from the file instead of writing it. The stream compares what is already in the file with what you are currently writing to it. If at any point it sees that there is a difference in the data you are writing this time compared to what is already there, the stream automatically switches to a write mode that writes over the remainder of the file with the changes.
The upshot is, if your program generates the same output twice, the output file is unmodified the second time (leaving the original modification date). First, by simply changing FileOutputStream to LazyFileOutputStream, any downstream processing can use timestamp information on the files to check if they need to do anything at all. If the timestamp hasn't changed, neither have the contents.
But wait, there's more! In addition to the standard close(), the LazyFileOutputStream also supports abort(). This method effectively states "I'm done now, leave the rest of the file alone." The remainder of the file will be the same, even without reproducing it. This means that if you determine at an early point in the processing of the file that it's going to turn out the same, you can simply abort() to leave it alone. It's similar to the idea of not changing the modification dates on files that are rewritten with the same data, but allows you to save CPU time for the current process step as well as downstream processing.
Certain engines produce part of the template before you can conveniently intervene to decide if you really need to regenerate it. By opening up the output as a Lazy file, you can just abort() early and have the old version with the the old modification time around for downstream processing.
Okay, rant concluded and point made: CPUs around the planet are spinning through the same data tens of thousands of times, producing the same content tens of thousands of times. Instead of buying great big servers to manage this, a smart caching policy based on lazy file writers and some modification time testing could save some sites that same wild-sounding factor of 50,000, without having to buy 50,000 new servers.
Anecdote #1
There is a certain technical advantage to this style of writing data as well: most storage devices are easier to read from than write to, adhering to the 80/20 rule - 80% of file access will be reads, 20% will be writes. The LazyFileOutputStream takes advantage of that for the many files that are simply rewritten with the same content.
Anecdote #2
There must be a few curled toes out there saying to themselves, "Why not LazyFileWriter?" There are good technical reasons for the OutputStream: the logic of the data written must be checked in its raw "byte" format for the idea to work correctly, and you can always wrap this in an OutputStreamWriter followed by a BufferedWriter, which is what I recommend.
Now I'm even done with the anecdotes. Have a nice day.
Published January 5, 2005 Reads 20,862
Copyright © 2005 SYS-CON Media, Inc. — All Rights Reserved.
Syndicated stories and blog feeds, all rights reserved by the author.
More Stories By Warren MacEvoy
Warren D. MacEvoy is Asst Professor of Comp Science in the department of Computer Science, Mathematics & Statistics at Mesa State College, Grand Junction, Colorado.
![]() |
Infernoz 01/16/05 10:16:45 PM EST | |||
It sounds weird, but may work, provided that the old file content is in memory, if not, you may loose any speed benefit to massive disk seek/read time delays. I've found that just opening/close files can be very costly too e.g. for mux/demux file processing, so it can be helpful to use an LRU cache for any Buffered (min 64Kbyte) File Readers/Writers, to reduce all of the I/O costs, this idea could be extended to support random access read/write caching. |
||||
- It's the Java vs. C++ Shootout Revisited!
- Patterns for Building High Performance Applications
- Asynchronous Logging Using Spring
- Java for Programmers (2nd Edition)
- Cross-Platform Mobile Website Development – a Tool Comparison
- Three Buzzwords That Every CIO Hears but One They Should Listen To
- Write Once Run Anywhere or Cross Platform Mobile Development Tools
- Immersing into JavaScript Frameworks
- Workday Reportedly Prepping to Go Public
- Cloud Expo New York: The Java EE 7 Platform - Developing for the Cloud
- Book Review: Sams Teach Yourself Java in 24 Hours
- OpenOffice.com Lives
- Book Excerpt: Introducing HTML5
- Adobe Sends Flex to the Apache Foundation
- Five Years Waiting for JRE 7: Is It Justified? (Part 1)
- Book Excerpt: Java Application Profiling Tips and Tricks
- i-Technology in 2012: Five Industry Predictions
- It's the Java vs. C++ Shootout Revisited!
- Patterns for Building High Performance Applications
- OpenXava 4.3: Rapid Java Web Development
- The Next Web Architecture
- Asynchronous Logging Using Spring
- Java for Programmers (2nd Edition)
- Is Write Once Run Anywhere Ever Going to Be a Reality?
- A Cup of AJAX? Nay, Just Regular Java Please
- Java Developer's Journal Exclusive: 2006 "JDJ Editors' Choice" Awards
- JavaServer Faces (JSF) vs Struts
- The i-Technology Right Stuff
- Rich Internet Applications with Adobe Flex 2 and Java
- Java vs C++ "Shootout" Revisited
- Bean-Managed Persistence Using a Proxy List
- Reporting Made Easy with JasperReports and Hibernate
- Creating a Pet Store Application with JavaServer Faces, Spring, and Hibernate
- Why Do 'Cool Kids' Choose Ruby or PHP to Build Websites Instead of Java?
- What's New in Eclipse?
- i-Technology Predictions for 2007: Where's It All Headed?


















