Thursday, October 26, 2006

"Graying of the membership" – Attracting younger railfans

[Comments regarding a Railroad and Locomotive Historical Society discussion.]

The American Photographic History Society, as an example, never understood or fully transitioned to the internet, and so it was sad, but not at all unexpected, that one day they just disappeared.

But for the R&LHS, pessimistic comments that lament the "graying of the membership" and "despair the fate of railfanning" represent an unfortunate misdiagnosis.

Try typing RAILROAD HISTORY into Google – our website is on the first page – but good luck trying to find the R&LHS website in the search results.

Our family's website – a celebration of railroad ancestor, CPRR First Assistant Chief Engineer, Lewis Metzler Clement, illustrated by period photographs – was anticipated to be an extremely obscure narrow topic, but instead has attracted 1.5 million visitors. No young railfans? – You've got to be kidding! Probably a thousand students visit our website a day.

But how many visitors has the R&LHS website attracted? Is there enough there to attract members and younger potential members to visit and to keep coming back? (This is not intended to be critical of the hard work of the people who have created the website, which is excellent for what it is, but to point out what it needs to become in order to attract new members.)

How about this alternative hypothesis? The Society's membership is graying because its website doesn't meet the needs of the non-member "16 to 50" demographic. Younger people use the internet, not the library, to find information, and the R&LHS's valuable content is mostly offline or restricted, invisible to them. Fortunately this is easy to fix, but only if a new way of thinking and some simple changes are embraced.

What is needed to revive the Society's website and draw in younger members is massive amounts of world class, compelling, authoritative, and freely available content that will attract links from other websites (which is how Google ranks are determined), and as a result, lots of visitors – just what has been produced by the Society and is sitting in bound volumes on library shelves. At present, not only is this actual full content mostly offline, but even the link to the Railroad History index from the Railroad History page, which hints at what incredible riches are presently inaccessible, now gives a file not found error. That's too bad, since there is no doubt that what the Society has chosen not to put online is likely far superior to most existing online resources. Not just the public is affected – we've never been able to read any of the Railroad History articles about transcontinental railroad history that have long been noted at the end of our want list, as all those musty past Railroad History volumes are completely inaccessible to us as well.

To be attractive to younger potential future members, the RLHS website, and especially the home page, needs to consist of railroad history information accessible right there, not primarily about railroad history information available offline or elsewhere, and not primarily information about the Society. The opposite approach, with very limited and restricted "members only" railroad history content, ensures obscurity. Instead, how about, for example, hosting the RLHS discussion group on the website, and also including online every page of all 85 yearly volumes of Railroad History in an easily accessible format that facilitates linking to the articles, scanned at 600 dpi to preserve photographic quality, with OCR (text behind page images) to make every word of it indexed by Google, drawing people to the Society and its website? Here is one of many examples of a page linking to such a searchable scanned journal article.

If Google can put 11,000,000 books online, RLHS ought to be able to manage to put just 85 volumes of Railroad History onto the Society's website. With "do it yourself" volunteer labor, the cost should be negligible: Scan; OCR; Web.

Looking to the future, any Society that wants to prosper, grow, and attract younger members needs to "get" the internet and transition to producing its newsletters, journals, etc. in electronic form that can be searched, delivered, and made available online without the need for conversion and at close to zero reproduction and delivery cost. Online writings, of course, can continue to be offered on paper as needed, at a higher cost, via high quality print on demand technology.


From: "Wyatt, Kyle"
Subject: Funding CPRR Museum Web Site

I found your recent posting to the R&LHS List very timely. We were just discussing a planned presentation at the upcoming Association of Railway Museums/Tourist Railway Assn conference titled "Mommy, What's a Train? Engaging 21st Century Audiences."

So one question that occurred to me is how you fund your wildly successful web site. The most frequent response to the suggestion to expand the web site is the cost of doing so.

There are actually several parts to the question:
1. Purchasing and maintaining bandwidth
2. Editing content and updating

I'd welcome your thoughts.

Kyle K. Wyatt
Curator of History & Technology
California State Railroad Museum
111 "I" Street
Sacramento, CA 95814
(916) 324-7660

10/27/2006  
Thanks so much for your kind words about the CPRR Museum website.

The CPRR Museum is riding an amazing technological wave in computing and telecommunications, with cost driven by Moore's Law (and similar improvements in storage and telecommunications) and usefulness driven by Metcalfe's Law.

The open secret to funding our project is that the effect of these laws has been to drive the cost of the technology needed surprisingly close to zero. When we started, we were always running out of storage and bandwidth, so we had to purchase more and more, eventually spending the grand sum of $100/month. But despite adding new content as fast as we could, the storage and bandwidth provided grew even faster, so we actually had to downgrade our service as a result of excess capacity. Purchasing and maintaining outsourced ultra-reliable bandwidth and online storage currently costs us less than fifty dollars a month, providing us with a large and exponentially growing excess capacity.

As a result we would be more than pleased to arrange to host on our website any content that CSRM or R&LHS is able to scan, at no charge (and at no incremental cost to us)! Remarkably, tens of thousands of additional pages or images would actually be no problem for our website to accommodate.

Professional quality scanners and cameras with capability far beyond what is needed for a website likewise have become incredibly inexpensive, usually in the $200-$500 range. We have also been careful to use simple standard well established software methods which help us to avoid the delays, costs, complexity, compatibility, and reliability issues that typically plague large custom software projects. For example, our pages are all pre-computed, and there is no database software on our server or computational overhead needed to serve our web pages. So ten thousand extra visitors to our website on a particular day poses no performance problem.

A bricks and mortar institution typically has huge overhead costs for facilities and labor which can also be made to vanish in the virtual world. For example, our many generous donors have given us access to all the wonderful historic content shown online, and all of the considerable labor needed to create, update, and operate the website has been volunteered by family members.

10/27/2006  
Subject: Great input, R&LHS Digest 1408!

You make some really good points. Why don't you volunteer to help make some of these things happen? Sounds as if you have the skills.

—Russ Davies

10/27/2006  
Thanks for your comments. We're doing exactly this and have already put thousands of 19th century images and thousands of pages of primary source content online at the CPRR Museum website.

We would be more than pleased to arrange to host on our website, at no charge to the Society, any content that R&LHS is able to scan.

If any advice is needed about how to scan the Railroad History volumes, please let us know.

10/27/2006  
Fifty dollars per month is the total expense for outsourcing the server, storage space, and bandwidth (which is easily affordable for an individual, much less a large institution).

Adding Google search capability to a website is free.

Google will index all the individual documents such as html pages and pdf files (unless you tell it not to do so, which you can customize using a metatag or robots.txt file). The Google search boxes on a particular website are actually just a convenience. You can search ANY specific website using Google – just add an additional search term consisting of "site:" followed by the website's domain name to limit a Google search to that website, in case you dislike the search provided by the site. One important exception is that if the site uses a database to present the web pages, it sometimes may prevent Google from indexing the website, which is another reason to avoid a server database driven website design. (Paradoxically, it seems like the more important the collection, the more likely that they have made this blunder.)

For example, to find something in the CPRR discussion group, include in your Google search a "site:" term for our domain name plus the term "discussion". You can also restrict the search based on the URL, for example, by including the "site:" term together with the term "inurl:CPRR_Discussion_Group".
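As a rough illustration of this kind of restricted search, the following sketch builds a Google search URL that combines ordinary search terms with Google's "site:" and "inurl:" operators. The function name is made up for this example, and example.org is only a placeholder domain, not the address of any website mentioned here.

```python
from urllib.parse import urlencode


def google_search_url(terms, site=None, inurl=None):
    """Build a Google search URL, optionally limited to one website.

    `site` and `inurl` are turned into Google's "site:" and "inurl:"
    search operators; both are optional.
    """
    parts = list(terms)
    if site:
        parts.append(f"site:{site}")
    if inurl:
        parts.append(f"inurl:{inurl}")
    # urlencode takes care of escaping spaces and colons in the query.
    return "https://www.google.com/search?" + urlencode({"q": " ".join(parts)})


# example.org is a placeholder domain used only for illustration.
url = google_search_url(["discussion"], site="example.org",
                        inurl="CPRR_Discussion_Group")
print(url)
```

Typing the same terms directly into the Google search box has the same effect; the URL form is just convenient when you want to link to a ready-made search.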

Nothing very exotic is needed as far as hardware or software to create a website. Dreamweaver is very helpful software for constructing web pages if you speak html, because it shows a split screen with the code and the rendered page both shown, letting you quickly go back and forth making changes and observing how they look.

We use a digital camera and four specialized scanners with OCR software as follows:

For stereoviews, prints, and transparencies (our older scanner is now superseded by one of the following: Epson Perfection 4990 PHOTO, V700 PHOTO, V750-M PRO); for books that can't be flattened without damaging the binding; for stacks of individual pages (including double sided scans); for oversized material such as portions of large maps; and for Optical Character Recognition.

For the future, we'll eventually need a better 35 mm scanner such as the Nikon 5000. We also need a client side database driven exhibit generator that would make it easy to add another picture to an existing exhibit, be able to create an exhibit based on a search, and to automatically read IPTC captions included in image files and put the captions on customized pages. Programming database software can get very complex, and we have not found the right software for us yet. (This refers to using database software on your computer to create a static website that does not need database software on the server.) The software that we currently use to generate exhibits is extremely easy to use, but because we customize the appearance of the exhibit as an additional step, we can't easily go back and make changes.

10/28/2006  
We're puzzled by the comment: "Once you're in the business of doing this, it's not so easy to host the website, and once the volume of material and the number of hits reaches a critical mass, it's not so cheap."

While $50/month now gives us more than we can use (outsourced for ease of maintenance), for example, for $219/month you can host half a million pages of searchable page images and have bandwidth sufficient to deliver more than six million pages per month, and this capacity will likely double every 1-2 years.

It seems incredibly easy and inexpensive to host a website at today's prices, and the web server capacity almost surely will grow over time faster than your ability to make use of it.

These online storage and bandwidth estimates are based on the following (please advise if you believe the following approximate calculations to be incorrect – but don't wait too long, as the prices and capacities are constantly improving, typically with price reductions appearing at least once a week; if you don't believe this, arrange to be notified each time the price of one of the disk drives currently being offered for sale changes):

This worst case estimate is calculated based on the 251 page pdf of CALIFORNIA: FOR HEALTH, PLEASURE, AND RESIDENCE. BY CHARLES NORDHOFF, displayed at 300 dpi and modified to be searchable taking up 149.6 megabytes, and $219/month purchasing 320 gigabytes of online storage and 4 terabytes of data transfer per month. This is based on the use of a pair of 160 GB hard drives which currently cost $59 each (one time purchase cost, shipping included); if 750 GB drives which are currently available for $345 each were substituted, the capacity would be more than 2 million scanned page images.

More examples of searchable pdf page image railroad books (and their file sizes) are available.

Moreover, if, as a best case, text is used instead of scanned page images (for example, the 160 page The Railroad Photographs of Alfred A. Hart, Artist, by Mead B. Kibbey, California State Library Foundation, 1995, taking up 1.8 megabytes), then the capacity for $219/month is more than 25 million pages, with data transfer of 300 million pages/month.
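For anyone who wants to check the arithmetic, here is a short sketch that reproduces these estimates from the file sizes and page counts quoted above. It assumes decimal units (1 GB = 1000 MB) and that the full capacity of both drives is used for storage; the function and variable names are just for illustration.

```python
MB_PER_GB = 1000           # decimal units, an assumption of this sketch
MB_PER_TB = 1000 * 1000


def pages_that_fit(capacity_mb, file_mb, file_pages):
    """Pages that fit in a capacity, given one sample file's size and page count."""
    mb_per_page = file_mb / file_pages
    return int(capacity_mb / mb_per_page)


storage_mb = 2 * 160 * MB_PER_GB   # pair of 160 GB drives = 320 GB
transfer_mb = 4 * MB_PER_TB        # 4 TB of data transfer per month

# Worst case: the 251 page searchable page-image pdf taking up 149.6 MB.
scanned_stored = pages_that_fit(storage_mb, 149.6, 251)
scanned_served = pages_that_fit(transfer_mb, 149.6, 251)
scanned_big_drives = pages_that_fit(2 * 750 * MB_PER_GB, 149.6, 251)

# Best case: the 160 page text version taking up 1.8 MB.
text_stored = pages_that_fit(storage_mb, 1.8, 160)
text_served = pages_that_fit(transfer_mb, 1.8, 160)

print(f"page images stored:            {scanned_stored:,}")
print(f"page images served per month:  {scanned_served:,}")
print(f"page images on 750 GB drives:  {scanned_big_drives:,}")
print(f"text pages stored:             {text_stored:,}")
print(f"text pages served per month:   {text_served:,}")
```

Under these assumptions the results come out at roughly half a million stored page images, more than six million page images delivered per month, more than two million page images on the larger drives, and more than 25 million stored text pages, in line with the rounded figures quoted.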

Also note that server capacity is growing exponentially, while a scanning project adds content linearly with time, so you should expect that, for a constant cost, the percentage of capacity utilized will actually decline over time.
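To see how exponential capacity growth overtakes linear content growth, here is a small sketch. All of the numbers in it are made up for illustration (10 units of content now, 10 more added each year, 100 units of capacity now, capacity doubling every 2 years); only the shape of the result matters.

```python
def utilization(years, used_now, added_per_year, capacity_now, doubling_years):
    """Fraction of capacity in use after `years`, for a constant monthly cost.

    Content grows linearly; capacity doubles every `doubling_years`.
    """
    used = used_now + added_per_year * years
    capacity = capacity_now * 2 ** (years / doubling_years)
    return used / capacity


# Hypothetical figures, chosen only to illustrate the trend.
for year in range(0, 11, 2):
    share = utilization(year, used_now=10, added_per_year=10,
                        capacity_now=100, doubling_years=2)
    print(f"year {year:2d}: {share:.1%} of capacity used")
```

With these particular numbers the share in use rises for the first couple of years and then falls steadily, because the doubling eventually outpaces any fixed rate of scanning.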

10/28/2006  
... On databases, of course most large institutions have based their computer-based cataloging around a database of one sort or another. For myself, I regularly take results of a search on CSRM's database and copy it into a Word document for my own use. I just find it easier that way. I assume a Word or Excel based spreadsheet would lend itself to a Google-type search pretty readily, and a database can fairly readily be downloaded into a text-based document. ...


10/28/2006  
When database software is used, it should be to create the type of static file website that would lend itself to a Google-type search and that should be what is put on the web (i.e., in computer speak, the website should be compiled, not interpreted). The website user should not need to do this manually. Having the database do the work just once when the website is updated (using a client workstation), instead of constantly (using a server), has numerous advantages. The website that is actually online is much simpler and less brittle (and consequently low maintenance), the computational load on the server is dramatically decreased so that capacity to respond to an unexpectedly large number of requests is much better, and the server equipment and software can be much less expensive yet give superior performance. The web content is visible whenever Google is used, not just when the website's internal search capability is used.
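A minimal sketch of this "compile once on the workstation" idea follows. It is not the software we use; the table layout, the sample records, and the function names are all hypothetical. A small database is read once, and one plain HTML file per record is written, with each file named after its content so that its address does not depend on the database.

```python
import html
import re
import sqlite3
import tempfile
from pathlib import Path


def slugify(title):
    """Derive a stable file name from the content's title, not from a row ID."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def build_site(conn, out_dir):
    """Write one static HTML file per record, plus an index page linking to them."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    links = []
    for title, caption in conn.execute(
            "SELECT title, caption FROM exhibits ORDER BY title"):
        name = slugify(title) + ".html"
        page = (f"<html><head><title>{html.escape(title)}</title></head>"
                f"<body><h1>{html.escape(title)}</h1>"
                f"<p>{html.escape(caption)}</p></body></html>")
        (out / name).write_text(page, encoding="utf-8")
        links.append(f'<li><a href="{name}">{html.escape(title)}</a></li>')
    index = "<html><body><ul>" + "".join(links) + "</ul></body></html>"
    (out / "index.html").write_text(index, encoding="utf-8")
    return [p.name for p in sorted(out.glob("*.html"))]


# Hypothetical sample data standing in for a real catalog database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE exhibits (title TEXT, caption TEXT)")
conn.executemany("INSERT INTO exhibits VALUES (?, ?)", [
    ("First Sample Exhibit", "Placeholder caption for the first record."),
    ("Second Sample Exhibit", "Placeholder caption for the second record."),
])

pages = build_site(conn, tempfile.mkdtemp())
print(pages)
```

The resulting folder of plain files is what would be uploaded to the web server; the database itself never leaves the workstation, and the server needs nothing beyond the ability to serve files.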

To be avoided is the situation where the website does not actually exist, is ever changing, and is uniquely created in response to each inquiry, so that other websites may be prevented from linking to the content, or all the links from other websites become broken each time the database is "improved."

It's also important to avoid bizarre library speak design constructs like "container listing" that accompany committee designed "best practice" nightmarishly complex databases that automatically produce extremely boring websites devoid of a soul, displaying images that look bad because they are intentionally not optimized for human viewing.

Some examples of websites that have exceptional content, but due to database issues have exhibited such flaws, are the Online Archive of California, the Library of Congress, and the National Archives. To be clear, what is being criticized is not programming errors, but bad intentional design decisions that the institutions would likely vigorously defend.

For example, the Library of Congress has to maintain two different web address schemes, one which is evanescent, produced when a search is done, and an entirely separate web address scheme that makes it possible to link to specific content, but not to a search result. The situation at the NARA website was so bad that we had to reproduce their web content on our server because it was impossible to link to their search result web pages.

10/28/2006  
Many thanks. Another question. What types of updates/changes break a link, and what types of them will keep a link intact? Clearly you want to be able to update, but also you would prefer not to break links. (I note UP broke most links to their historical material site when they redid the site - and removed many items formerly available.)


10/29/2006  
Once content is put online, and has been extensively linked to from other websites, it really should remain permanently. Updates should not result in large numbers of prior links not working.

This can be accomplished either by leaving the old content intact, or by automatically forwarding the old no longer existing location to the corresponding new location.
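The forwarding approach can be sketched as follows. The example keeps a simple table of old and new locations and prints one permanent (301) redirect rule per moved page. The rule syntax is that of the Apache web server's mod_alias "Redirect" directive; the use of Apache, the paths, and the example.org domain are all assumptions made for illustration.

```python
def redirect_rules(moved, base_url):
    """Return one permanent-redirect line per moved page.

    `moved` maps an old URL path to its new URL path. The lines follow the
    syntax of Apache's mod_alias `Redirect` directive; using Apache at all
    is an assumption of this sketch.
    """
    rules = []
    for old_path, new_path in sorted(moved.items()):
        rules.append(f"Redirect 301 {old_path} {base_url}{new_path}")
    return rules


# Hypothetical old and new locations, for illustration only.
moved_pages = {
    "/old/exhibit1.html": "/exhibits/first-sample-exhibit.html",
    "/old/exhibit2.html": "/exhibits/second-sample-exhibit.html",
}

rules = redirect_rules(moved_pages, "https://example.org")
for line in rules:
    print(line)
```

Keeping the table of moved pages as a permanent record means that every reorganization adds to the list of forwards rather than breaking links that other websites already rely on.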

Websites that have their web addresses created automatically by database software make it much too easy for an inconsiderate webmaster to do an update that is enormously disruptive, making a huge mess for others who have relied upon the prior website structure for their work.

Disrupting the web is very poor practice that makes content unnecessarily obscure, damages other websites, violates the golden rule, and wastes huge amounts of time as disappointed users encounter broken links and as other webmasters are forced to make time consuming and tedious repairs in an attempt to make their websites conform to the thoughtless wholesale changes.

Fortunately, sometimes the damage can be undone by using a web archive to restore links to copies of deleted material.

10/29/2006  
I'll add a couple of comments, based on my experience with the reference library at the California State Railroad Museum, which also provides access to the R&LHS National collection. First, yes, there is a great deal of access available to the R&LHS collection. While maintained as a separate collection, access is thoroughly integrated with access to CSRM's own collections – a point I consider a major asset and a good thing (as far as it goes for both). I think maintaining and promoting that integrated access is really valuable for researchers. The more separate sites a researcher has to visit (and to think of in the first place), the more difficult the research is. Think of university library on-line catalogs where you can search for a book at a large number of libraries in a single search, with results giving you all the different libraries that have that title.

From the technological/internet access perspective, the CSRM collection shares many of the limitations that the R&LHS collection has (but perhaps not quite all of them). CSRM does have a database with search engine available through its web site. As a not infrequent user of said search engine, I will say I regularly find it most frustrating and difficult to use. For public access, I think we need a much simpler and more direct paradigm for searching files – something along the lines of a Google search of a text (likely HTML) document. Keep it simple.

One thought I've had to help people locate information is to post user guides to individual collections, as a really good way to summarize the contents of a collection. These should be set up so that you can search within the user guide, and also can search across all the user guides.

Have you ever done a Google search and had PDF documents come up in the search? If you click to open the selection as HTML text instead of as a PDF, you lose the graphics, but each of your Google search terms is highlighted in the text, each in a different color. It makes for a really effective way to search a document.

Of course, the other vital step is to post the actual information on the web, not just the method to identify what you want. My knowledge of actually setting up and maintaining web sites is pretty limited, but I know how vital it is to get the information out there. I frequently Google to research a topic. (And yes, I'm part of the graying membership of the R&LHS.)

To effectively search, it is often important to have reference material available to help suggest relevant search terms. For instance, posting one of the guides to past and present railroad names would be helpful. To search, you have to have access to the information that will let you find what you are looking for.

As a frequent user of the US Patent Office web site, I find it most frustrating that you can only access the older patents (1970s and earlier, as I recall) if you already have the patent number. Many historical sources give the patent date, not the number. There is a Patent Office site with cataloged patents that allows you to search under numerous classifications, but they are often not systematic. I've often found many interesting patents that way, but also have not found specific patents that I know exist, because they are filed under a slightly different category. Every year the Patent Office published the Patent Digest, listing all the new patents for the year. If all the back issues were scanned and posted on the web, that would provide a workable way to locate specific patents, and also a convenient way to scan and survey for patents of interest that you might not otherwise think of and find. (Any good researcher knows that serendipity plays a vital role in researching a topic – tripping across things you never thought to look for.) There is a freeware downloadable viewer (linked on the site) that you need to look at actual patents. The Patent Classification System is available online.

To close, I really think that both the R&LHS and CSRM need to get much more of their material accessible on the web. The resources in Sacramento are wonderful, but it just isn't easy for most people to get there and spend the time necessary to really go through a subject. And it is unrealistic to think that staff (and volunteers) can do the research for someone else and fully answer all requests. The standard is that staff will do the "nickel's worth of research" on a request, but if information is not easily accessible and requires digging very far to find, then the best that staff can do is to suggest the directions that the requesting person might look to find their answer. Putting the actual material on the web (not just the finding aids, although they also help) allows the inquiring person to do their own digging from home. Most requests then are for hard copies of things like photos when a person wants a better copy than is available on the web.


[from the R&LHS Newsgroup.]

10/29/2006  
Be aware that business-as-usual institutional websites can be ridiculously expensive, costing millions of dollars.

For example, the budget to "support the establishment of the [California Digital Library] and its initial collections" was $11,500,000.

Similarly, the Library of Congress Information Technology budget states that "one-time funding of $1.299 million is also needed to implement a single integrated search function ... " and "$1 million is needed to cover ... actual and projected maintenance costs" just for a single year.

Yes, you read that right, $1.3 million additional just to update search capability – what Google does profitably much better for free.

You can put content online first and deal with organizing and categorizing it later, since Google can automatically search machine readable content just fine with no manual cataloging required. Once the information is accessible online, the Society's website could be in the forefront, but anyone else in the world who has the time and interest can and will contribute to help catalog the content using their web links.

When a high quality book scanner costs only $219, how about just ordering one and immediately getting to work scanning? (600 dpi, please, to preserve the pictures.) That's exactly what we did in creating portions of the CPRR Museum website.

It is gratifying to see so much enthusiasm among R&LHS members for putting Railroad History online. With the Society's permission as copyright owner, if someone will scan all the original undamaged volumes of Railroad History, keep backup copies, and donate to us a set of DVDs containing the high resolution journal page images with file names in numerical sequence (either as individual tiff or jpeg images, or as pdf files), we would be pleased to volunteer, in keeping with our website's policies, to both do the OCR and to host the full set online on our webserver at no cost to the R&LHS.
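On file names in numerical sequence: one simple way to make sure the page images sort in the right order on any computer is to pad the volume and page numbers with leading zeros. The naming pattern below is purely hypothetical, offered only as a sketch of the idea, and is not a required format.

```python
def page_filename(volume, page, ext="tif"):
    """Build a zero-padded file name so that alphabetical order equals page order.

    The "rrhistory_v..._p..." pattern is a made-up convention for this example.
    """
    return f"rrhistory_v{volume:03d}_p{page:04d}.{ext}"


# Without padding, "p10" sorts before "p2"; with padding, order is preserved.
unpadded = sorted(["p10", "p2", "p1"])
padded = sorted(page_filename(1, page) for page in (10, 2, 1))

print(unpadded)
print(padded)
```

Any consistent scheme will do, so long as sorting the file names alphabetically puts the pages in reading order.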

10/29/2006  
I appreciate your comments. I'm still getting my arms around web addresses. If I read you right, updating info on a web site will automatically break old links to that web site – although this can be mitigated by providing a forward from the old link address. Perhaps the way to handle sites that are expected to be updated is to set up a link that forwards as the permanent outside link address, and then just modify the forwarding address as the site is updated. ...

So, turning in a different direction, have you ever had to change the host of your site? And what does that do to web addresses?


10/30/2006  
Updating a website will only break all the links if the website is poorly designed by archivists using advanced database software. This should never happen, but unfortunately major institutions often create hugely expensive websites that use database software that does break the links, which is unacceptable behavior, in violation of web standards. This often happens when web locations are named based on the software technology used, instead of the content presented, so the naming is very brittle. Even worse, they may create a temporary web address for each search result which will always become broken immediately after its first use. We have in the past encountered these severe problems with the Online Archive of California, the California State Railroad Museum, the Library of Congress American Memory Project, and the National Archives and Records Administration websites, but practically nowhere else. ...

When you own your own domain name, if you change the host server being used for the website (which changes the numerical internet protocol address of the website), all you need to do is change the IP numerical address recorded on your domain name servers to point to the new location. So long as you keep both the new and the old webservers online during the day or so that it takes to propagate the change among the various DNS servers on the web, the switchover will be entirely transparent and without any interruption of service.
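If you want to watch the changeover from your own computer, a small check like the following sketch can help. It asks the local resolver which address a domain name currently points to and compares it with the new server's address. The domain and IP addresses shown are reserved documentation placeholders, and the demonstration uses a stand-in lookup table so that it runs without a network connection. Note that this only reports what your own resolver sees; other DNS servers may still be handing out the old address.

```python
import socket


def points_to(hostname, expected_ip, resolve=socket.gethostbyname):
    """Return True if the resolver currently maps `hostname` to `expected_ip`.

    `resolve` defaults to a real DNS lookup; a stand-in can be passed for testing.
    """
    try:
        return resolve(hostname) == expected_ip
    except OSError:
        # Lookup failed (for example, no network or unknown name).
        return False


# Stand-in for DNS, using placeholder values reserved for documentation.
fake_dns = {"example.org": "198.51.100.20"}


def fake_resolve(name):
    return fake_dns[name]


switched = points_to("example.org", "198.51.100.20", resolve=fake_resolve)
still_old = points_to("example.org", "192.0.2.10", resolve=fake_resolve)
print(switched, still_old)
```

Once the check reports the new address consistently for a day or two, the old server can be retired.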

10/30/2006  
Re: Graying of the membership

Thank you for your thoughts on the RLHS membership. Very worthwhile.

Also, thank you for that superb CPRR website. You have done a LOT of work.

Carl Rodolf
PCC archivist and treasurer

12/22/2006  
Subject: "Best kept secret"

It is disheartening to read the comment that "the implication that R&LHS is like a 'best kept secret' just doesn't resonate ... "

A Google search indicates that despite billions of web pages online, only 5 websites link to the R&LHS website.

Having so few links to the RLHS website also results in web search obscurity, as search results are ranked based on links to the website.
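To illustrate why incoming links matter, here is a simplified, textbook-style link ranking calculation in the spirit of PageRank. It is only a sketch: real search engines use many additional signals, and the tiny "web" of page names below is invented for the example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by incoming links using a simplified PageRank-style iteration.

    `links` maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    rank = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / len(pages) for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score equally to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / len(pages)
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Invented example: three sites link to "well_linked", only one to "obscure".
web = {
    "site_a": ["well_linked"],
    "site_b": ["well_linked"],
    "site_c": ["well_linked", "obscure"],
    "well_linked": [],
    "obscure": [],
}

ranks = link_rank(web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page:12s} {score:.3f}")
```

In this toy example the page with three incoming links ends up with a clearly higher score than the page with one, which is the effect described above.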

To make the R&LHS less hidden, as previously discussed, this discussion group, the full content of the past "Railroad History" volumes, the full existing website content, etc. must all be freely accessible and searchable online.

A Society that keeps almost all of its most compelling content offline and has "members only" online content in an internet age relegates itself to being a "best kept secret."

Instead, the R&LHS could so easily be highly visible if it would just stop hiding its treasures.

1/14/2007  
From: "Schuyler Larrabee"

RLHS members, if you don't do anything else today, read the ... message above.

There is a very generous offer contained therein, about 1/3 of the way down. Hosting our Railroad History text at no charge. ...

Whenever I mention "R&LHS" [Railroad and Locomotive Historical Society] to other train folks I know, they say: "Huh? What's that." We are a secret society.


[from the R&LHS Newsgroup.]

1/16/2007  
From: "Adrian Ettlinger"

There is effectively no limitation to what we could put on the [] website. If someone can convert the material to digital form, I can post it.

[from the R&LHS Newsgroup.]

1/16/2007  
Google has now provided the ability for websites that use a database to get their content properly indexed so that their content can be searched.

4/30/2007  

