WA Secretary of State Blogs

Internet Librarian, Day 1 – Sunday, Oct 25 2009 – #IL2009

Technically a pre-conference, Sunday was well spent attending the Searcher’s Academy. Several speakers covered a range of search-related topics, many of them very useful. Tons of useful resources were mentioned (and are noted below), so read carefully!

Speakers and topics covered include:

  • Chris Sherman, Executive Editor, SearchEngineLand.com
  • Mary Ellen Bates, Owner, Bates Information Service
    “Digging into the Deep Web”
  • Mary Ellen Bates
    “Searching the Collaborative Web”
  • Marcy Phelps, Principal, Phelps Research
    “Cost-Effective Searching: Online Strategies for Tough Times”
  • Mary Ellen Bates
    “Hidden Tools and Features of the Major Search Engines”
  • Marcy Phelps
    “Business 2.0”
  • Doris Small Helfer (writes for Searcher magazine)
    “Sensational Science Sites”
  • Marcy Phelps
    “U.S. Government Sources”
  • Gary Price, Publisher, ResourceShelf.com
    “Legal Resources”
  • Gary Price
    “Ready Reference”


Complete notes for the Searcher’s Academy available after the cut.

Searcher’s Academy: Searching 2.0

Chris Sherman, Executive Editor, SearchEngineLand.com

seismic changes in search over past few years
four popular search engines left

google – guardian of status quo
– went from innovator to conservative in 10 years
– though still doing neat things in Google Labs
– “show options” adds faceting to searches (“advanced search options”)
— altavista did the exact same thing 11 years ago
– public data available through google, including graphs
– google labs doing lots of cool stuff
— fast flip – google news with thumbnails of front pages
— great for visually geared searchers, visual recognition good for recall purposes
– google city tours – nice walking tour, 5-6 notable landmarks, sometimes walking too exhaustive
– google squared – shows data in a table, creates fields “squares” and fills them with info (very cool)
– google trends – shows hot searches
– google insights for search – shows search patterns, geographies, trends for a subject
– google visualization api
— motion chart, charts data over time and then animates it
– google flu trends – world map, narrow down to US or by state, based on flu-related queries

yahoo search – emperor’s new clothes
– microsoft competition, salvation or sell-out?
– microsoft does the heavy lifting, yahoo sells ads
– yahoo will innovate more on the browsing side of things, what happens “before and after search”

bing – assimilator as innovator
– microsoft pursues acquisition as a key strategy; acquired 128 companies and stakes in over 60
– Bing Is Not Google (BING) – no other explanation for name provided
– uses powerset as a semantic search engine, basically puts wikipedia into bing (but later could do more)
— searches all mentions of search, e.g. einstein, in wikipedia and collates them into a single article
– travel, breaks search of a location down into subjects, hotels, sights, etc
– xrank – sort of like trends, shows top celebrities, musicians, etc … what are people interested in or searching for?
– as good or better than Google based on chris’ testing, shows parts of the web that google does not

wolfram alpha – describes itself as NOT a search engine
– constantly expanding set of data, calculator, “computational knowledge engine”
– operates on “curated” data, closed-source data sets – managed and arranged by specialists
– but how do you calculate answers to ambiguous questions?
— e.g. best hotel in monterey, best laptop to buy, etc…
– has a ways to go, still can’t deal with many natural language queries
— really best at comparing the sets of data it has within it
– search for “who’s the fairest of them all?” now shows Snow White (3 weeks ago had null return)
– search for “when will i die?” shows average life expectancy worldwide
– keeping an eye on the searches people are doing, and adding data when search trends appear
– can do very complex computational analysis

real time search follies
– social media is a fad

twitter twaddle
– twitter is not a search engine
– great for customer service, public relations, or reputation management
problems
– currently the playground of the technorati and pop-culture obsessed kids
– cannot scale for quality search
– difficult to monetize
– tweets are so short that it is difficult to place results in context and get good information out of them

facebook follies
– geocities of the 21st century
– great for connecting individuals, problematic for corporations or other large orgs
– coming soon: legal woes when friends run afoul of disclosure and compliance laws

video search
– roughly where the web was about 15 years ago
– heavily reliant on titles and metadata
– ok with non-fiction conversion of speech to text
– nearly worthless with dramas, or videos that include humor, irony, body language
— these things don’t transmit searchable data for the search engine
videosurf
– best search engine for video out there right now
– does try to break down the video and convert speech to text
– can search by actor or famous line, etc.
blinkx
– download “pico”, lets you create alerts for specific video searches, will download them in the background
everyzing
– another one worth trying

semantic search
– the holy grail of search?
– search now, we type in some keywords and hope we hit the jackpot
– what we really want is for search engines to read our mind, e.g. understand our intent for the search
– people used to use search engines to find their way around the web, now they use it more to buy things and to find information
— e.g. transactional and informational searches are becoming much more popular than navigational searches
– tools in search now: pagerank, popularity, keyword, links to site pages, time spent on page, content freshness (300+ functions overall)
– semantic search requires true understanding of language and meaning
– shift from search engine to answer engine, e.g. semantic search focuses on delivering answers
— the key is “disambiguation”, of both user queries and web content
— heavy emphasis on natural language processing of meaning and intention
– why now?
— semantic search is time-consuming and costly, regular search is “good enough”
— requires lots of computational power and storage
— competitive landscape has changed, and the price of computing has become much cheaper
– semantic search is NOT:
— natural language processing (NLP)
— a replacement for navigational queries
— a replacement for transactional queries
— the semantic web (where objects have knowledge of themselves embedded and can transmit information about themselves)
– many people offering semantic search are just refining search sets into categories
– northern light, vivisimo — not true semantic search, just categorization

some semantic players
– powerset, hakia, trueknowledge, wolframalpha, gopubmed, lexee, deepdyve
– powerset aggregates wikipedia data and gives you the portion of the article that is relevant to your search
– powerset “Factz” great for ready reference questions ; but still just based on wikipedia info (and freebase)
– hakia – some queries return “resumes”, special sets of info ; also offers credible sites recommended by librarians
– hakia still has really bad results for some queries (but still in beta)
– trueknowledge – probably the first true “answer engine”
— building semantic networks, api returns structured results to machine-based queries
— will actually take your question and return an answer
– wolfram alpha
— can combine lots of different data types, very addictive!
— lots of computational power
– gopubmed
— excellent disambiguation of queries and categorization of search results
— only works with structured corpora, and very busy page
– lexee
— far deeper than a general search engine, but erratic and relies on wikipedia
– deepdyve
— good semantic understanding, excellent “more like this” refinement
— confusing UI

how will Google et al use semantic search?
– folded into existing technologies
– providing superior results to “long tail” queries
– provide personalization without keeping personal data
– better results for new and dynamic content

google semantic search efforts
– suggestions for related queries
– longer snippets for queries that are longer than three words
– not really true semantic search, more a combo of brute force applied to their vast amounts of data

biggest benefits of SS will be for advertisers

SS is interesting, not a panacea, not the next big thing, but one of them

second tier search engines (ask[jeeves] etc) not doing enough to keep up with first-tier search engines
– many specialized search engines still helpful for some topics
– nobody yet fully cracked the “invisible web” problem

fun searchy tools to check out
– factual, xobni, nearby {iphone}, facesearc, fansnap, spotify (when available), kerosene and A Match

few major search engine players, likely to stay that way
competition among the major players has increased, which is good for everyone, will drive innovation
advertising can decrease as it becomes more refined, e.g. better advertising will improve effectiveness and reduce the need for advertising

q&a

how do you classify google scholar?
– google not working on developing google scholar much anymore, still useful but not much effort going into it
– focus has been diverted over to google book search, scholar will likely be incorporated into that
– google basically going into the publishing business (also the music business); stream whole songs from google search set
– google also going to develop their own phone to push the android OS

——–

Mary Ellen Bates, Owner, Bates Information Service
“Digging into the Deep Web”

slides available at batesinfo.com/il2009

google.com/translate_s
– translate your search into another language, searches in that language, translates results back for you

Wolfram|Alpha
– search-and-compute engine
– indexes systematic knowledge (data)
– works with factual queries, searches templates by topic, good disambiguation
– offers very cool comparisons, e.g. search for a given distance, like the length of a marathon
– “earthquakes near japan” — taps into earthquakes database
— shows a picture of earthquakes in japan over last two months; can change to last 10 years
— also shows a timeline around when these earthquakes occurred (in a graph)
— great disambiguation, also GREAT visualization of information! (e.g. will assume Hawaii as state, but also offer as island)

DeepDyve
– federated search, some free and some pay-per-view
– primarily medical and patent searching
– can handle large chunks of text
– can search on highlighted text
– can “save” results from list, create a web bibliography as you go

Scirus
– indexes science-related sites only
– primarily free, nice advanced search
– good features for refining search
– neat limits, like only searching “scientist’s homepages”
– NOT the first place to go for open-access journals

Deep Web Technologies
– Biznar, Mednar, ScienceResearch, Worldwide Science Web
– four federated search tools (on steroids), search multiple portals and databases
– good clustering of results
– can search subsets by topic or source
– shows relative relevance using stars
– can create bibliography of selected hits

Biznar
– semi-random collection of business resources
– refine by popular topics, e.g. swine flu, pandemics, etc
– can limit by resource sets, e.g. google news or scholar, etc.
– use as a brainstorming search engine; expand your search concepts, very helpful
– very effective faceting approaches

exalead.com/search/
– one of the only deep web search engines providing a “near” function (searches within 16 words)
– near function useful when searching for new concepts (i.e. not a “subject” yet), like epidemic preparedness
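
A rough sketch of what a “near” operator does, assuming simple word tokenization and the 16-word window mentioned above. The function name within_n_words and the sample sentence are made up for illustration; this is not how exalead actually implements it.

```python
import re

def within_n_words(text, term_a, term_b, max_gap=16):
    """Return True if term_a and term_b occur within max_gap words of each other."""
    # Lowercase and split into simple word tokens (an assumption about tokenization)
    words = re.findall(r"[a-z0-9']+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    # Any pair of positions close enough counts as a "near" match
    return any(abs(a - b) <= max_gap for a in pos_a for b in pos_b)

# Hypothetical sample text
sample = ("State agencies are updating their epidemic plans, "
          "and preparedness drills start next month.")
print(within_n_words(sample, "epidemic", "preparedness"))
```

The two terms do not have to be adjacent or in a fixed order, which is why this kind of search helps with concepts that don't have a settled phrase yet.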

Mednar
– medical research, over 60 sources
– very similar to Biznar (i.e. same engine)

ScienceResearch
– searches science.gov, OAIster, WorldWideScience, e-Print network

Worldwide Science Web
– federation of 61 national science portals and databases

——–

Mary Ellen Bates
“Searching the Collaborative Web”

– the concise nature of Twitter actually makes it an excellent search tool
— the ability of people to distill their thoughts down into something small

Blog Search
– technorati, still by far the best blog search engine
– most powerful way to find the best stuff when searching blogs
– authority filter, like relevance ranking for blogs – looks at number of people linking to the blog
— can limit searches to high authority blogs
– see who blogged about your site, read links to a blog, lots of advanced search options
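
A toy sketch of the authority idea, assuming authority is simply a count of inbound links. The blog names, link counts, and the high_authority function are invented for illustration; this is not Technorati's actual formula.

```python
# Hypothetical blogs mapped to the number of other sites linking to them
inbound_links = {"blog-a": 412, "blog-b": 7, "blog-c": 95}

def high_authority(link_counts, min_links):
    """Keep blogs with at least min_links inbound links, most-linked first."""
    kept = [(name, n) for name, n in link_counts.items() if n >= min_links]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

print(high_authority(inbound_links, min_links=50))
```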

Facebook
– only searches friends and exhibitionists
– useful for IDing interest groups
– limited to status updates from the last 30 days
– ID upcoming conferences
– “Even people who don’t use the web have figured out Facebook.”
– tap into groups and see their concerns, interests, questions, etc.
– a way of finding resources more than anything else, find people that have something in common

LinkedIn
– “Where you apply the “grown-up” filter to Facebook.”
– can search throughout LinkedIn
– cool limiting tools: within x miles of location, in an industry, in one of your groups
– search your extended network for experts in various fields, can contact them for specific information
– search within a group, or industry, within a location – “find an expert on the ground”
– find people using criteria you might not have normally thought of

Looking for Discussions
– boardreader.com – signal-to-noise ratio drops fast
– yahoo groups – can only search group descriptions
– google groups search – can search non-google groups too

Chofter.com
– easy search-tool comparison site
– web, image, video, news, blog, books, technical search
– tabs within each category

OneRiot
– crawls links that people share on SNS’s
– avoids the noise problem
– returns web pages, not tweets, Diggs, etc.
– sort by: realtime (most current), pulse (most popular)
– very cool!

Twitter
– twitter searches that work
— monitor live action, “swine flu” or near:denver, within:25mi
— find resources, semantic search, filter:links (only displays results with links in them)
— get highlights of a conference, e.g. #il2009, compile tweets right after a conference
– search at twitter.com
— only searches last 7-9 days
— supports hashtags and @searches
— real-time refresh
— good advanced search tools (is.gd/3UWsf)
– search using Google
— site:twitter.com
— blunt-instrument searching
— deep archive
– only twitter search supports hashtags, but google may support this soon
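
A small sketch of how the twitter operators above (near:, within:, filter:links) and the Google site: trick could be put together as query strings. The helper names build_twitter_query and google_twitter_url are hypothetical, and the sketch only builds strings; it doesn't call any search service.

```python
from urllib.parse import urlencode

def build_twitter_query(terms, near=None, within=None, links_only=False):
    """Combine search terms with the operators mentioned in the notes."""
    # Quote multi-word terms so they are searched as phrases
    parts = [f'"{t}"' if " " in t else t for t in terms]
    if near:
        parts.append(f"near:{near}")
    if within:
        parts.append(f"within:{within}")
    if links_only:
        parts.append("filter:links")
    return " ".join(parts)

def google_twitter_url(query):
    """Blunt-instrument alternative: restrict a Google search to twitter.com."""
    return "https://www.google.com/search?" + urlencode({"q": f"site:twitter.com {query}"})

print(build_twitter_query(["swine flu"], near="denver", within="25mi", links_only=True))
print(google_twitter_url("il2009"))
```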

——–

Marcy Phelps, Principal, Phelps Research
“Cost-Effective Searching: Online Strategies for Tough Times”

phelpsresearch.com/il2009/

– in “everything is free on web” culture, cost-effective searching is a big deal
— we have to justify our expenses as searching experts

what’s the problem?
– budgets are down and costs are up
– high price doesn’t equal answers, just because we spend more doesn’t mean we get more
– more information out there to search ; infoglut

cost = time + expenses
– know what your hourly rate is; searching cost includes your time spent
– cost-effective is not the same as free; having to spend a lot of time sorting through free resources might be less cost effective than getting a direct answer at cost
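
To make “cost = time + expenses” concrete, here is a minimal sketch that prices time at an hourly rate. The function name search_cost and all the numbers are hypothetical, chosen only to show how a fee-based source can come out cheaper than a free one.

```python
def search_cost(hours, hourly_rate, expenses=0.0):
    """cost = time + expenses, with time priced at the searcher's hourly rate."""
    return hours * hourly_rate + expenses

# Hypothetical comparison: three hours sifting free sources
# versus half an hour plus a $40 fee-based report
free_route = search_cost(hours=3.0, hourly_rate=50.0)
paid_route = search_cost(hours=0.5, hourly_rate=50.0, expenses=40.0)
print(free_route, paid_route)
```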

what’s really cost effective?
– depends on various factors
— what’s your area of expertise; costs spent in your area of expertise might be money better spent
— who are your clients; have the products that support your clientele specifically
— how specialized is the request; do they need general or deep information
— what’s your time frame; you may have to spend more to get the right info in an hour than if you had days

– combine strategy and tools that match expertise and time
– always balance time costs vs search costs

strategies for balancing time costs and search costs
– know when free is good enough (and when it’s not)

– zapdata.com
— d&b data on 14 million US businesses
— can create and filter search lists
— includes industry reports (using SIC codes); good for market analysis

– USPTO.gov
— search patents and trademarks
— includes guides, manuals, and other resources
— use for overview of market and competitive information
— leave the serious patent searching to the professionals!

– edgar.sec.gov
— securities and exchange commission
— search SEC filings for information about:
— markets; competitors; management bios
— morningstar document research 10k wizard (fee-based alternative; lots more options)

– medical searching
— pubmed (http://snipurl.com/s5g8c) – abstracts
— pubmed central (http://snipurl.com/s661i) – all full-text and digitized
— clinicaltrials.gov – lots of filter options

use power searching tools and techniques

– clustering
– clustering very useful to get overview of subject, shows different aspects of a search
— biznar.com
— good facets, e.g. limit to 2009, by topic, etc.
— carrot2 (snipurl.com/s5d1p)
— includes a cluster for market research reports
— clusty.com
— more consumer-oriented, good for more popular searches

– verticals
– specialized tool; by content type, by medium, not as broad but much deeper
— pandia powersearch, directory of search tools – pandia.com/powersearch
— zibb.com (business specific) – includes real-time web for trending topics
— UMich Government Documents Center: Statistics (snipurl.com/s66jr)

Why should I care about business resources:
– “In just about every topic you can think of, business cares.”

Science Verticals
– searching very similar in the 3 listed, but content is very different, check out publications list
– scienceresearch.com
– scitopia.org – has handy icon to show publisher
– WorldWideScience.org

Free Aggregators
– websites that do all the collecting for us and put it all in one place (big time-saver!)
– scholarly electronic publishing bibliography
— not so much for searching, but great for skimming and browsing
– Alltop – aggregates news content (Guy Kawasaki) – can track all sorts of trends, from beer to libraries

Social web tools
– addictomatic.com
— latest buzz on any topic; breaks down into boxes, includes twitter, digg, youtube, blog search, etc
– whostalkin.com
— social media search
– tweetdeck.com (software download)
— good for tracking twitter, hashtags plus save searches on whatever you like

Take advantage of technology
– google trends
— break down trends by geography (down to the city level)
– silobreaker.com
— news with analytical results
— hotspots shows places named in stories on the searched topic
— e.g. in what countries is this a hot topic?
– google news archive
— really handy for creating timelines
— find out WHEN something was a hot topic
– trendrr
— will tell you if a topic was hot, and when
— tells you different sources, # of blog posts, tweets, etc
— just gives you the analytics, not the content
– yahoo pipes
— create custom feeds on a particular topic
— look for fumsi article on yahoo pipes (coming out in a couple weeks)

fumsi.com
– all articles are free!

cost-effective searching summary
– balance time costs vs search costs, know your hourly rate!
– learn about low-cost and free alternatives – know when they’re good enough and when they’re not
– become a super-searcher, know your tools and all their small hidden features
– use technology to your advantage

resources
– pay now or pay later: exposing the hidden cost of free – snipurl.com/s68a9
– when it comes to a fork in the road, take it! – snipurl.com/s68p5
– hints and tips for cost-effective searching on dialog – snipurl.com/s686n

——–

Mary Ellen Bates
“Hidden Tools and Features of the Major Search Engines”

how to determine your hourly rate
– your salary * 1.3 (includes building, energy, benefits, etc)
— SLA median salary is $72k
– divide that by the number of weeks that you actually work
— remove hours that you are out of office, not working, at conferences, etc.
— just hours AT THE DESK
– divide that weekly figure by 40 hours per week
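
The steps above can be sketched as a quick calculation. The 1.3 multiplier, the 40-hour week, and the $72k median salary come from the talk; the 46 weeks worked is an assumed figure, and the function name hourly_rate is just for illustration.

```python
def hourly_rate(salary, weeks_worked, overhead=1.3, hours_per_week=40):
    """Loaded salary divided by weeks actually worked, then by hours per week."""
    loaded_salary = salary * overhead          # adds building, energy, benefits, etc.
    weekly_cost = loaded_salary / weeks_worked
    return weekly_cost / hours_per_week

# SLA median salary from the talk; 46 weeks at the desk is an assumption
rate = hourly_rate(72_000, weeks_worked=46)
print(round(rate, 2))
```

Fewer weeks at the desk push the rate up, which is the point of counting only the time actually worked.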

fastflip
– browse news sources visually
– skim by broad topic, most popular
– part of google labs

experimental search (google)
– tied to your google account
– comment, move to top, move to bottom
– google is LEARNING FROM YOU

show options
– facet your google search
– filter by media, date restriction
– includes related searches, wonder wheel
– extract more text, images or prices, with each snippet

google squared
– generates table of facts
– you can edit the table, compare to Wolfram|Alpha
– for more, see: snipr.com/rgp8j

bing
– extra features offered are determined by your search
— different refining options for different search types!

——–

Marcy Phelps
“Business 2.0”

– samepoint.com – conversation search engine
— organizes results by type of content
— get the most up to date information
— includes wikis, real-time web, news sources, etc.

– yahoo news RSS
— subscribe to feeds in broad categories
— create your own feeds
— includes photo thumbnails

– ibizradio.com
— business podcast directory and search engine
— find podcasts and podcast posts
— includes podcast statistics
— technology isn’t there yet for searching within the podcasts tho’ (which would be quite helpful)

– technorati
— still best tool for searching blogs
— use the authority refinement to find the top blogs on your search

– hellometro.com
— very detailed to neighborhood and zipcode levels
— parks, local bands/artists, crime stats, history and more
— also try outside.in

——–

Doris Small Helfer (writes for Searcher magazine)
“Sensational Science Sites”

– general interdisciplinary freely available science sites
— arxiv.org – open access site for over 500,000 e-prints; physics, math, compsci, quantitative biology, quantitative finance and statistics
— invention dimension – web.mit.edu/invent/ – potential inventors and innovators website directed mainly to students
— who invented what, who holds patents, inventor’s handbook of faq, how to invent, browsable
— OAIster – oaister.org – u. michigan; great open access source of digital resources
— science.gov; gateway to over 1950 scientific websites with a lot of authority!
— scirus.com – owned by Elsevier – won many awards; more than 450 million scientific items indexed to date
— great advanced search features and limits – attempt to be open source from elsevier
— scitopia.org – searches digital libraries of 21 leading science and technology societies
— many available full-text, can be clustered by author, topic, etc.
— shodor.org – computational science education, science and math – modeling and simulation technologies – animation
— worldcat.org – a good source wherever, whenever … broadly helpful, good for ILL
— worldwidescience.org – global science gateway connecting users to national and international scientific databases and portals
— includes information from 15 international member organizations

– physics websites
— physics.org – very authoritative, websites okayed by Institute of Physics – includes browse, faq, suggest titles, physics games, and more

– mathematics websites
— florida state university department of mathematics
— www.math.fsu.edu/virtual/index.php

– geology websites
— geology.com – produced by a professor at Mansfield University

– chemistry websites
— general chemistry online, antoine.frostburg.edu/chem/senese/101/index.shtml

– biology websites
— university of arizona biology project, www.biology.arizona.edu
— lots of activities and illustrations, useful for anyone interested in biology

– astronomy websites
— astrobiology life in the universe – astrobiology.nasa.gov; up-to-date and comprehensive info

——–

Marcy Phelps
“U.S. Government Sources”

– usa.gov “connect with government” tools
— social web tools and resources
— gadgets, blogs, feeds, images, and more

– bls geographic guide
— economic statistics
— displays data available by area
— easy way to identify local-level data

– american factfinder, factfinder.census.gov
— decennial census, economic census, population estimates, annual surveys
— tables, maps, narratives

– doing business in international markets
— snipurl.com/s75e3
— comes from export.gov, via Department of State

– GPOaccess.gov
— migrating to federal digital system in 2009(?); snipurl.com/s7nhs
— documents from all three branches of government

——–

Gary Price, Publisher, ResourceShelf.com
“Legal Resources”

– justia and RECAP; justia.com / recapthelaw.org
— court dockets (sometimes actual filings), legal blog search, and more

– NetrOnline.com
— specifically good for assessor records, state/county court URLs, and other public records

– GovTrack.us — tracks federal legislation

MULTIMEDIA

– the oyez project / Metavid, oyez.org / metavid.org/wiki/category:media_categories
— listen to supreme court cases being argued
— open archive of US Congress – searchable or browse by bill name, congressperson, and other categories

– openregs.com
— for federal regulations from the federal register

– bonus: the great government data mash-ups from the “Apps for America Contest”

——–

Gary Price
“Ready Reference”

– encyclopedia.com
— free online encyclopedia, features premier titles like Columbia Encyclopedia, Oxford’s World Ency, and the Ency of World Biography
— also has a helpful online dictionary (actually multiple dictionaries)

– domaintools.com
— WHOIS lookup (free) plus fee-based services including historical domain info

– ticTOCs
— free table of contents for over 12,000 scholarly journals

– country files from BBC monitoring

– Intute.ac.uk

– BONUS: clicker.com – tv guide for high quality online video


