Search Engine Optimization Articles, Tips and Advice


To understand Google’s algorithm and how latent semantic indexing (LSI) is applied, you first need to understand where it all comes from.

It may all sound weird at first, but once you try putting it together it starts to make more sense.
First and foremost, Google uses artificial intelligence (AI) and linguistics within its algorithm. The whole purpose is for the search engine to learn and adapt, drawing on engineering and computer science.
Think about the evolution of video games over the past twenty years: compare the Atari system of 1983 to the Xbox of 2006. This is a great example of the use of artificial intelligence.

The search engines are evolving, and the only way to evolve is to communicate, adapt and learn. Google’s algorithm uses artificial intelligence to communicate with its searchers, learn from its results and queries, and adapt to become more intelligent over time.

The best way to achieve top search engine rankings, particularly in Google, is to communicate, learn and adapt… hmmm, that sounds familiar, right?

To communicate, you simply need to stay informed. To stay informed, you need to know what things like latent semantic indexing, natural language processing, ontology and synonymy are, so that you can determine how to learn and adapt.

Artificial Intelligence

LSI helps the search engine understand the intent of the user’s query in order to return relevant information and results.

Various forms of artificial intelligence are used in Google’s algorithm to provide better, more accurate search results to its visitors. This includes grouping and categorizing its database based on language, geography, topics, relation to other pages and more.

This grouping is determined by an artificial intelligence technique called ontology.

In computer science, and as applied to latent semantic indexing, ontology is the use of data to identify relationships, such as apple to computer versus apple to pie. This allows Google to ensure it does not show computer-related sites when someone searches for apple pie, and vice versa.
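The apple example above can be sketched in a few lines of Python. This is only an illustration of choosing a meaning from the surrounding words, not Google’s actual method; the `SENSE_CONTEXTS` lists and the `guess_sense` function are hypothetical and hand-written for this example.

```python
# Hypothetical, hand-written context lists; a real system would learn
# these relationships from data rather than hard-code them.
SENSE_CONTEXTS = {
    "apple (computer)": {"computer", "mac", "laptop", "software", "iphone"},
    "apple (food)": {"pie", "recipe", "fruit", "baking", "orchard"},
}


def guess_sense(query: str) -> str:
    """Return the sense whose context words overlap most with the query."""
    words = set(query.lower().split())
    scores = {
        sense: len(words & context)
        for sense, context in SENSE_CONTEXTS.items()
    }
    best = max(scores, key=scores.get)
    # With no overlapping context words there is nothing to decide on.
    return best if scores[best] > 0 else "unknown"


print(guess_sense("apple pie recipe"))     # apple (food)
print(guess_sense("apple laptop repair"))  # apple (computer)
```

A query with no context words at all, such as “apple” on its own, returns "unknown" in this sketch, which is the situation where the surrounding content matters most.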

Search engines use artificial intelligence to determine which synonyms and polysemes apply to your site.  When synonymy is used within your site, it helps to build the relevance of your pages for specific phrases.  The use of polysemes within your pages lowers the relevance of your page for specific phrases.

Synonymy (multiple words with the same meaning): Google has identified specific phrases it considers synonyms; these synonyms are then treated as the same and are counted towards your density and frequency for a specific phrase. For example, if you used the words car, auto, automobile and vehicle in your page, your keyword count would be 4 for the search “car”, and the page would be more relevant than if you used only car and cars as your keyword phrases.
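The car example above can be written as a short Python sketch. The `SYNONYMS` table and the `keyword_count` function are assumptions made up for illustration; the synonym group simply copies the words from the example and is not Google’s actual data.

```python
import re

# Assumed synonym group copied from the example above (illustrative only).
SYNONYMS = {"car": {"car", "auto", "automobile", "vehicle"}}


def keyword_count(text: str, keyword: str) -> int:
    """Count words in the text that belong to the keyword's synonym group."""
    group = SYNONYMS.get(keyword, {keyword})
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(1 for token in tokens if token in group)


page = "Buy a car today. Our auto lot has every automobile and vehicle you need."
print(keyword_count(page, "car"))  # 4
```

Without the synonym group, the same page would only score 1 for “car”, which is the difference the paragraph above describes.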

Polysemy (single words with multiple meanings): such words are closely related to, and often loosely called, homographs. Polysemes create potential issues with relevancy; when polysemes are used, Google relies more heavily on the surrounding content and the context in which the phrase is used to determine what the synonyms are.

An example of this would be the use of the word “vehicle”.

If you use the phrase vehicle to describe your motorcycles for sale, you will be considered less relevant. Why?

Vehicle is a polyseme, and its synonyms have been determined as auto, car, automobile and automotive. Therefore, to be more relevant you would need to use phrases such as Honda, Suzuki, bike, Harley-Davidson and cycle, which are predefined synonyms for motorcycle.

Neural networks are a form of pattern recognition used in Google’s algorithm; they help to identify what are considered natural patterns versus unnatural patterns.

What is considered a natural pattern is compiled on a per-keyword basis across all trust rank websites and the top ten websites showing up for that particular phrase. A bell curve of acceptable, natural patterns is then created and applied to the neural network pattern recognition area within Google’s algorithm.
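As a rough illustration of the bell curve idea above, the sketch below checks whether a page’s keyword density falls within a chosen number of standard deviations of a set of reference pages. The `is_natural` function, the `max_z` threshold of 2.0 and the density figures are all assumptions invented for this example; they do not come from Google.

```python
from statistics import mean, stdev


def is_natural(density, reference_densities, max_z=2.0):
    """Return True if density lies within max_z standard deviations
    of the reference pages' average density."""
    mu = mean(reference_densities)
    sigma = stdev(reference_densities)
    if sigma == 0:
        # All reference pages are identical, so only an exact match fits.
        return density == mu
    return abs(density - mu) / sigma <= max_z


# Hypothetical keyword densities (in percent) for ten reference pages.
top_ten = [1.8, 2.1, 2.4, 1.9, 2.2, 2.0, 2.3, 1.7, 2.5, 2.1]

print(is_natural(2.2, top_ten))  # True
print(is_natural(6.0, top_ten))  # False
```

A page at 2.2% sits close to the average of the reference set, while a page at 6.0% falls far outside the curve and would be treated as an unnatural pattern in this sketch.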

With LSI, ontology allows Google to build concepts and relationships through words and phrases.
In order to use ontology and apply it to LSI, Google must determine which words are synonyms and which are polysemes.

When determining which search results are relevant, the algorithm runs through the database of synonyms and ensures all are included, then runs through the list of polysemes and ensures they are all removed.

This process is called data clustering and classification.
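The include-then-remove step described above might look something like the following Python sketch. The `INCLUDE` and `EXCLUDE` word lists reuse the motorcycle example from earlier in the article, and the `filter_pages` function and sample pages are hypothetical; this is a simplification, not a description of how Google implements it.

```python
import re

# Hypothetical word lists for a "motorcycle" search, based on the
# vehicle example earlier in the article.
INCLUDE = {"motorcycle", "bike", "cycle", "honda", "suzuki"}
EXCLUDE = {"car", "auto", "automobile", "automotive"}


def tokens(text):
    """Split text into a set of lowercase words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def filter_pages(pages):
    """Keep pages that use an included synonym and no excluded term."""
    kept = []
    for name, text in pages.items():
        words = tokens(text)
        if words & INCLUDE and not words & EXCLUDE:
            kept.append(name)
    return kept


pages = {
    "a": "Used Honda bike for sale",
    "b": "Automotive parts for your car",
    "c": "Suzuki cycle and car trailer",
}
print(filter_pages(pages))  # ['a']
```

Page "c" shows the limitation of such a blunt filter: it is dropped because it mentions “car”, even though it is mostly about motorcycles.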

The end result is a set of search results determined through information retrieval with latent semantic indexing applied, in an attempt to ensure the most relevant results are shown.

Most people think Google uses a stemming technology; however, it opted to use n-gram models, which have proven more effective for word matching, sequence matching and comparisons. In September 2006, Google made its n-gram database public; it is built from one trillion words gathered from public web pages. N-gram models help Google’s algorithm identify words that are dependent on one another, or words that are often found together to complete a phrase.

The database is available for purchase from the Linguistic Data Consortium: http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13
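To make the n-gram idea more concrete, here is a minimal Python sketch that counts two-word sequences (bigrams) in a sentence. The `ngrams` helper and the sample text are made up for illustration and have no connection to Google’s database or tooling.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


text = "search engine optimization tips for search engine rankings"
counts = Counter(ngrams(text.split(), 2))

print(counts.most_common(1))  # [(('search', 'engine'), 2)]
```

Because “search” and “engine” appear together twice, that pair stands out as words that depend on one another, which is the kind of signal the paragraph on n-gram models describes.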

1st on the List Promotion Inc has been a leading search engine optimization services company since 1997, providing professional SEO consulting services to clients throughout the United States and Canada.

For more information about 1st on the List search engine optimization services call Toll Free at 1-888-262-6687 or email our SEO consultant.

Posted by Angela on Wednesday, August 20th, 2008
