To understand how Google’s algorithm works and how LSI (Latent Semantic Indexing) is applied, you first need to understand where it all comes from.

These concepts may sound strange at first, but once you start putting them together they begin to make a lot more sense.

First and foremost, Google uses artificial intelligence (AI) and linguistics within its algorithm. The whole purpose is for the search engine to learn and adapt, drawing on engineering and computer science.

Think about the evolution of video games over the past twenty years. Compare the Atari system of 1983 to the Xbox of 2006 and you’ll see a great example of how the use of artificial intelligence has matured over time.

Just like popular gaming systems, the search engines are also evolving by communicating with, adapting to and learning from their users. Google’s algorithm uses artificial intelligence to communicate with its searchers, learn from its results and queries, and adapt to become more intelligent over time.

The best way to achieve top search engine rankings, particularly in Google, is to communicate, learn and adapt… doesn’t this sound familiar?

To communicate, you simply need to stay informed. To stay informed, you need to become aware of how artificial intelligence works through concepts such as latent semantic indexing, natural language processing, ontology and synonymy. Once you are aware, you will be well positioned to determine how to learn and adapt.

Google’s Artificial Intelligence

LSI helps the search engine understand the intent of the user’s query in order to return relevant information and results.

Google’s algorithm uses various forms of artificial intelligence to provide better, more accurate search results to its visitors. This includes grouping and categorizing its database based on language, geography, topics, relation to other pages and more. This grouping is driven by a technique called ontology.

What is Ontology?

Within computer science, and as applied to latent semantic indexing, an ontology uses data to identify relationships between concepts, such as apple to computer versus apple to pie. This allows Google to avoid showing computer-related sites when someone searches for apple pie, and vice versa.
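
To make the apple example concrete, here is a minimal sketch of how related words could steer a query towards one concept or the other. The concept map, its word lists and the function name guess_concept are all invented for illustration; this is not Google’s actual data or method.

```python
# Hypothetical, hand-made concept map; a real ontology would be far larger.
CONCEPTS = {
    "apple (computer)": {"computer", "mac", "laptop", "software", "iphone"},
    "apple (food)": {"pie", "recipe", "fruit", "bake", "orchard"},
}

def guess_concept(query):
    """Return the concept whose related words overlap most with the query."""
    words = set(query.lower().split())
    scores = {name: len(words & related) for name, related in CONCEPTS.items()}
    best = max(scores, key=scores.get)
    # With no supporting context at all, refuse to guess.
    return best if scores[best] > 0 else None

print(guess_concept("apple pie recipe"))
print(guess_concept("apple laptop"))
```

A query with no supporting words, such as “apple” on its own, returns None in this sketch because there is nothing to tell the two concepts apart.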

The Difference Between Synonymy and Polysemy

Search engines use artificial intelligence to determine which synonyms and polysemes apply to your site.  When synonymy is used within your site, it helps to build the relevance of your pages for specific phrases.  The use of polysemes within your pages lowers the relevance of your page for specific phrases.

What is Synonymy?

Synonymy means multiple words with the same meaning. Google has identified specific phrases it considers synonyms; these synonyms are then treated as the same and are counted towards your density and frequency for a specific phrase. For example, if you used the words car, auto, automobile and vehicle in your page, your keyword count would be 4 for the search “car”, and the page would be more relevant than if you used only car and cars as your keyword phrases.
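
The counting described above can be sketched in a few lines. The synonym group is taken from the car example; the SYNONYMS table and the keyword_count function are hypothetical names used only for illustration, and the real synonym lists are not public.

```python
import re

# Hypothetical synonym group, copied from the car example above.
SYNONYMS = {"car": {"car", "auto", "automobile", "vehicle"}}

def keyword_count(text, keyword, use_synonyms=True):
    """Count occurrences of a keyword, optionally counting its synonyms too."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if use_synonyms:
        accepted = SYNONYMS.get(keyword, {keyword})
    else:
        accepted = {keyword}
    return sum(1 for token in tokens if token in accepted)

page = "Buy a car, auto, automobile or vehicle today."
print(keyword_count(page, "car"))                      # synonyms counted
print(keyword_count(page, "car", use_synonyms=False))  # exact word only
```

With synonyms counted, the sample page scores 4 for “car”; counting the exact word only, it scores 1.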

What is Polysemy?

Polysemy means a single word with multiple meanings; such words are also called homographs. Polysemes create potential issues with relevancy: when polysemes are used, Google relies more heavily on the surrounding content and the context in which the phrase is used to determine what the synonyms are.

An example of this would be the use of the word “vehicle”.

If you use the word vehicle to describe your motorcycles for sale, you will be considered less relevant. Why?

Vehicle is a polyseme, and the synonyms determined for vehicle are auto, car, automobile and automotive. Therefore, to be more relevant you would need to use phrases such as Honda, Suzuki, Bike, Harley-Davidson and cycle, which are predefined synonyms for motorcycle.
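
The motorcycle example above can be sketched as a count of the surrounding words that support each meaning. The term lists come from that example; the SENSE_TERMS table and the supporting_terms function are assumptions made for illustration, not a description of Google’s implementation.

```python
import re

# Hypothetical term lists based on the vehicle example above.
SENSE_TERMS = {
    "car": {"auto", "car", "automobile", "automotive"},
    "motorcycle": {"honda", "suzuki", "bike", "harley", "davidson", "cycle"},
}

def supporting_terms(page_text):
    """Count how many words on the page support each meaning of 'vehicle'."""
    tokens = re.findall(r"[a-z]+", page_text.lower())
    return {
        sense: sum(1 for token in tokens if token in terms)
        for sense, terms in SENSE_TERMS.items()
    }

page = "Every vehicle we sell is a Honda or Suzuki bike, from cruiser to sport cycle."
print(supporting_terms(page))
```

On this sample page, four surrounding words support the motorcycle meaning and none support the car meaning, which is the kind of context the page needs to supply.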

The Google Neural Network

A neural network is a form of pattern recognition used in Google’s algorithm to help identify what are considered natural patterns versus unnatural patterns.

What counts as a natural pattern is compiled on a per-keyword basis across all trust rank websites and the top ten websites showing up for that particular phrase. A bell curve of acceptable, natural patterns is then created and applied to the neural network pattern recognition area within Google’s algorithm.

With LSI, ontology allows Google to build concepts and relationships through words and phrases. To use ontology and apply it to LSI, Google must determine which words are synonyms and which are polysemes.
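
As a rough illustration of the bell curve idea, the sketch below compares one page’s keyword density against a set of reference pages and flags values far from the average. The sample densities, the cut-off of two standard deviations and the function name is_natural are all assumptions; the real thresholds and signals are not public.

```python
from statistics import mean, stdev

def is_natural(value, reference_values, max_z=2.0):
    """Return True if value lies within max_z standard deviations of the mean."""
    mu = mean(reference_values)
    sigma = stdev(reference_values)
    if sigma == 0:
        return value == mu
    return abs(value - mu) / sigma <= max_z

# Made-up keyword densities (percent) for the top ten pages of one phrase.
top_ten_density = [2.1, 1.8, 2.5, 2.0, 2.3, 1.9, 2.2, 2.4, 2.0, 2.1]

print(is_natural(2.2, top_ten_density))  # close to the reference pages
print(is_natural(8.0, top_ten_density))  # far outside the curve
```

A page at 2.2 percent sits comfortably inside this sample curve, while a page at 8 percent falls well outside it and would look unnatural by this measure.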

Data Clustering and Classification

When determining which search results are relevant, the algorithm would run through the database of synonyms and ensure all are included, then run through the list of polysemes and ensure they are all removed.

This process is called data clustering and classification.
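
The include-then-remove step can be pictured with a very simple filter. This is only a sketch of that one step, not a real clustering or classification algorithm, and the sample pages, word lists and the function name filter_results are invented for illustration.

```python
def filter_results(pages, synonyms, excluded_terms):
    """Keep pages that mention a synonym and none of the excluded terms."""
    kept = []
    for title, text in pages:
        words = set(text.lower().split())
        if words & synonyms and not words & excluded_terms:
            kept.append(title)
    return kept

# Made-up pages for the search "apple pie".
pages = [
    ("Grandma's recipe", "classic apple pie recipe with fresh fruit"),
    ("Laptop review", "the new apple laptop runs fast software"),
    ("Orchard guide", "pick your own apple fruit this autumn"),
]
synonyms = {"pie", "recipe", "fruit"}
excluded_terms = {"laptop", "software", "computer"}

print(filter_results(pages, synonyms, excluded_terms))
```

In this sample, the two food pages are kept and the laptop page is dropped.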

The end result is a set of search results determined through information retrieval with latent semantic indexing applied, in an attempt to ensure the most relevant results are shown.

Most people think Google uses a stemming technology; however, it opted to use n-gram models, which have proven more effective for word matching, sequence matching and comparisons. In September, Google made its n-gram database public; it is built from one trillion words gathered from public web pages. N-gram models help Google’s algorithm identify words that are dependent on one another, or words that are often found together to complete a phrase.

The database is available for purchase at the Linguistic Data Consortium.
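
To show what an n-gram model counts, here is a minimal sketch that finds which word pairs (2-grams) appear together most often in a piece of text. The sample sentence and the function name ngrams are made up for illustration; Google’s database is built from vastly more text.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

text = "search engine optimization helps search engine rankings"
counts = Counter(ngrams(text.split(), 2))

# Word pairs that occur more than once are candidates for a phrase.
print([pair for pair, count in counts.items() if count > 1])
```

In this sentence, “search engine” is the only pair that occurs more than once, which marks it as two words that tend to belong together.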

1st on the List has been a leading search engine optimization services company since 1997, providing professional SEO consulting services to clients throughout the United States and Canada.
