The National Archives Labs

16/06/11 UPDATE – Improving search

We have further developed our new taxonomy, which allows users to filter search results by subject category.


An automatic categorising tool tags individual document descriptions with terms from a list of subject categories (see below). Subject categorisation is no longer limited to collection-level descriptions; it now extends to metadata for the files, boxes and bundles held by The National Archives. We achieved this by building Boolean queries that identify which records should be tagged under each category.
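
As a rough illustration of how Boolean queries can drive this kind of tagging, the Python sketch below applies one query per subject category to a document description. The `BooleanQuery` class, the `RULES` table and every search term in it are hypothetical, invented for this example; the queries actually used are not reproduced in this post.

```python
import re
from dataclasses import dataclass, field


@dataclass
class BooleanQuery:
    """Hypothetical rule: every `must` term (AND), at least one
    `should` term when any are given (OR), and no `must_not` term (NOT)."""
    must: set = field(default_factory=set)
    should: set = field(default_factory=set)
    must_not: set = field(default_factory=set)

    def matches(self, tokens):
        if not self.must <= tokens:
            return False
        if self.should and not (self.should & tokens):
            return False
        return not (self.must_not & tokens)


# Illustrative rules only; the terms are assumptions, not the real queries.
RULES = {
    "Piracy and privateering": BooleanQuery(
        should={"pirate", "pirates", "piracy", "privateer", "privateers"}
    ),
    "Railways": BooleanQuery(should={"railway", "railways"}, must_not={"model"}),
    "Navy": BooleanQuery(should={"navy", "naval", "admiralty"}),
}


def tokenize(description):
    """Lower-case the description and split it into a set of words."""
    return set(re.findall(r"[a-z]+", description.lower()))


def tag(description):
    """Return every category whose query matches, in alphabetical order."""
    tokens = tokenize(description)
    return sorted(cat for cat, query in RULES.items() if query.matches(tokens))
```

With these illustrative rules, `tag("Admiralty report on pirates captured off Jamaica")` returns both `Navy` and `Piracy and privateering`, so a single description can pick up several tags.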

We have now exported the taxonomy and plugged it into a user interface, so testing and tuning can begin. So far, we have created over 10 million tags across The National Archives Catalogue data. Our framework allows a single record to carry multiple tags, so testing will focus on the breadth and depth of tagging.
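
To show how multiple tags per record could feed a filtered search, here is a minimal Python sketch that counts records per subject category and narrows a result list to the selected categories. The record references, titles and the `facet_counts` and `filter_by_categories` helpers are all hypothetical; this is an assumption about how such an interface might work, not a description of the one under test.

```python
from collections import Counter

# Made-up records; references and titles are placeholders, not catalogue data.
RECORDS = [
    {"ref": "EX 1", "title": "Ship's log", "tags": {"Navy", "Piracy and privateering"}},
    {"ref": "EX 2", "title": "Medal roll", "tags": {"Navy", "Medals"}},
    {"ref": "EX 3", "title": "Company accounts", "tags": {"Railways"}},
]


def facet_counts(records):
    """Count how many records carry each subject category."""
    counts = Counter()
    for record in records:
        counts.update(record["tags"])
    return counts


def filter_by_categories(records, selected):
    """Keep only the records tagged with every selected category."""
    wanted = set(selected)
    return [record for record in records if wanted <= record["tags"]]
```

Here `facet_counts(RECORDS)` counts `Navy` twice because two records carry that tag, and `filter_by_categories(RECORDS, ["Navy", "Medals"])` keeps only the record tagged with both.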

We welcome your feedback on these developments.

HEADER                             SUBJECT CATEGORY
Culture                            Archives and libraries
Culture                            Art, architecture and design
Culture                            Events and exhibitions
Culture                            Literature
Culture                            Museums and galleries
Culture                            Performing arts
Culture                            Photography and film
Culture                            Radio and television
Culture                            Sports
Demographic                        Census
Demographic                        Migration
Demographic                        Nationality
Demographic                        Population
Demographic                        Refugees
Emancipation                       Chartism
Emancipation                       Electoral reform
Emancipation                       Slavery
Emancipation                       Votes for women
Faith and belief                   Religions
Faith and belief                   Religious discrimination and persecution
Faith and belief                   Witchcraft
Governance                         Communism
Governance                         Democracy
Governance                         Devolution
Governance                         Disasters and emergencies
Governance                         Fascism
Governance                         Intelligence
Governance                         Royalty
Health and social welfare          Disability
Health and social welfare          Disease
Health and social welfare          Education
Health and social welfare          Hospitals
Health and social welfare          Housing
Health and social welfare          Medicine
Health and social welfare          Mental illness
Health and social welfare          National Health Service
Health and social welfare          Poverty
Health and social welfare          Sewerage
Health and social welfare          Welfare
Industry                           Coal
Industry                           Construction industries
Industry                           Farming
Industry                           Fishing
Industry                           Forestry
Industry                           Iron, steel and metals
Industry                           Labour
Industry                           Manufacturing
Industry                           Mining and quarrying
Industry                           Nuclear energy
Industry                           Oil and gas
Industry                           Renewable energies
International                      Aid and development
International                      Conflict
International                      Disarmament
International                      International
International                      Merchant seaman
International                      Piracy and privateering
International                      Treaties and alliances
Land and property                  Common land
Land and property                  Conveyancing
Land and property                  Crown lands and estates
Land and property                  Landed estates
Land and property                  Manors
Land and property                  Maps and plans
Land and property                  Planning
Land and property                  Royal Parks
Law and order                      Conscientious objection
Law and order                      Crime
Law and order                      Internment
Law and order                      Litigation
Law and order                      Pardons
Law and order                      Policing
Law and order                      Prisons
Law and order                      Public disorder
Law and order                      Transportation
Law and order                      Treason and rebellion
Military                           Air Force
Military                           Badges and insignia
Military                           Medals
Military                           Military personnel
Military                           Navy
Military                           Operations, battles and campaigns
Military                           Regiments and Corps
Military                           Weapons
Money                              Banking
Money                              Debt
Money                              Government finances
Money                              Inflation
Money                              National debt
Money                              Pay and pensions
Money                              Taxation
Money                              Tithes
Money                              Trade and commerce
Science, technology and invention  Communications
Science, technology and invention  Computing
Science, technology and invention  Research
Science, technology and invention  Resources
Society                            Charities
Society                            Children
Society                            Clothing
Society                            Diaries
Society                            Food and drink
Society                            Freemasons
Society                            Friendly societies
Society                            Hunting
Society                            Marriage and divorce
Society                            Mutual societies
Society                            Race relations
Society                            Rationing
Society                            Sex and gender
Society                            Travel and tourism
Society                            Wills and probate
Transport                          Air transport
Transport                          Canals and river transport
Transport                          Railways
Transport                          Road transport
Transport                          Shipping

Comments (15)

  • Dick Lane

    I have made one visit to Kew with helpful and very good results.
    I do struggle with online research, as it seems difficult to find the right keywords, e.g. I wanted a Trade Directory for 1877 for East India Merchants in Leadenhall St but could not find a way in!
    Hope this will make it easier.

  • Colin Hartwright

    I have made several visits to TNA and one of the pleasures of my visits has been dealing with the most pleasant and helpful staff. Security, reception and research all first rate.
    But I find the online facility somewhat daunting at times. I don’t think it is very intuitive. Once you get into the swing it is very good, only I seem to have to re-learn the technique each time I go on. Maybe the ‘re-vamp’ will make it a little easier for folk like me!

  • Guy

    Quite a daunting list – but seems to be a good summary of TNA’s key subjects. A few initial comments: Will there be definitions, e.g. what does “Research” mean in terms of the records but also for researchers? I think that piracy and privateers should be separate as they were different – one legal and the other illegal. Please don’t use abbreviations, e.g. PoW = prisoners of war – but then PoWs could be military personnel. I can imagine how difficult it is to create subject headings for such a wide variety of records covering 1000 years of history and an empire.

  • Simon Wilson

    David – really interested to see the proposals for extending the search, we are also looking at moving towards facet searching as part of our work with the Fedora digital repository.

    Are these facets an integrated part of the catalogue? We are looking to develop our EAD finding aids, and using subject and dates within the EAD context would appear to be the most obvious approach – is this what you have done?

    There is an option to import the UK Archival Thesaurus into CALM but it appears that you have already identified a fixed number of broad terms to use.

    If memory serves correctly the catalogue data is held slightly differently than with Record Offices using CALM and creating EAD-based finding aids.

    Simon Wilson
    Hull University Archives

  • Elizabeth Tippins

    Impossible to find a “personal name” with just a name and a date and no county!

  • Jacqui Kirk

    I’m all in favour of making searching simpler, but don’t forget to enable us to search under the class number as well, e.g. WO 97. Those of us who have already mastered the old system’s reference numbers and its categories shouldn’t have to go all round the houses for information we know is there somewhere, because we looked at it six months ago, but can’t find now.

    Access2Archives for instance is now so difficult to use since we are no longer able to search by repository that it is almost useless.

  • Cliff Falconer

    Do you guys have any idea how difficult captcha codes are for the visually impaired?

    Most of us simply stop using the resource as soon as we come up against these damn things.

    Regards,
    Cliff

  • John Lewis

    Your example is interesting, but the 1st Duke of Wellington was Arthur Wellesley. Will “Wellington” find entries for Wellesley?

  • Sarah

    This is a good start. Presumably multiple categories within a facet can be assigned if relevant. I agree with the earlier post – abbreviations must be avoided. I don’t think examples are useful in subject headings and suggest replacing Resources (e.g. Water) with “Energy Resources” and/or “Natural Resources”.

  • Lkenneth

    Interesting article! Thanks. I agree with Cliff: reading captchas is hard, especially some of the Google ones, and my eyes are fine.

  • Sean O'Connor

    Captchas are NOT accessible to some categories of people who may wish to use the service; principally, people with combined sight and hearing impairment will not be able to decipher the captcha information correctly. However, set against that, scanned documents taken from original handwritten material present the same (or similar) barriers.

  • Mick Trumpeter

    Anything to improve the search is welcomed. I have been trying to use this site off and on for two years or more, and I have only ever found ONE item I have searched for, and it cost me £2 to see if it was the correct one. This is not the National Archives but the Bank of National Secrets. If you want information look somewhere else.

    The National Archives reply:

    Mick – thank you for your feedback. We are sorry to hear you have had problems with the search. Have you seen our research guidance on the website? http://www.nationalarchives.gov.uk/records/research-guide-listing.htm You can search the guides by topic, which may help you to find what you’re looking for.

  • Brian Talbot

    I agree with Guy with regard to this being quite a daunting list, but it is so interesting, if not ironic, to see terms such as debt, poverty and banking appearing in your cloud graphic above.

    I think it was a wise move to free subject categories so they are no longer limited to “collection level descriptions”.

    10 million individual tags is quite an accomplishment, so well done to all involved.
