892. Alien Dictionary#

Difficulty: Hard

Link to Problem: To see the Alien Dictionary problem on LintCode, click here!


There is a new alien language which uses the latin alphabet. However, the order among letters are unknown to you. You receive a list of non-empty words from the dictionary, where words are sorted lexicographically by the rules of this new language. Derive the order of letters in this language.

Constraints:

  • You may assume all letters are in lowercase.

  • The dictionary is invalid, if string a is prefix of string b and b is appear before a.

  • If the order is invalid, return an empty string.

  • There may be multiple valid order of letters, return the smallest in normal lexicographical order.

  • The letters in one string are of the same rank by default and are sorted in Human dictionary order.

Probelm Explanation:#

The problem at hand involves determining the order of letters in a new alien language that uses the Latin alphabet. However, the order of letters in this alien language is unknown. To solve this problem, we are given a list of non-empty words from the dictionary, where the words are sorted lexicographically based on the rules of the alien language. The goal is to derive the correct order of letters in this new language.

Here’s a more detailed explanation of the problem:

  1. The alphabet: The new alien language uses the Latin alphabet, which consists of lowercase letters.

  2. Dictionary: We are provided with a list of non-empty words. These words are sorted in lexicographical order based on the rules of the alien language. The order of letters in the words reflects the correct order in the new language.

  3. Invalid order: The problem specifies that the dictionary is considered invalid if there exists a situation where one string ‘a’ is a prefix of another string ‘b,’ and ‘b’ appears before ‘a’ in the dictionary. This condition ensures that there are no inconsistencies in the order of letters.

  4. Multiple valid orders: It’s possible that there are multiple valid orders of letters that satisfy the provided dictionary. However, we are instructed to return the smallest valid order in normal lexicographical order.

To solve this problem, you need to analyze the given list of words and determine the correct order of letters in the alien language while adhering to the specified constraints. You should return the order as a string in lexicographical order if it’s valid, and an empty string if the order is invalid.

The key to solving this problem is to construct a directed graph of letter relationships based on the order information provided by the words in the dictionary. Then, perform a topological sort on this graph to derive the order of letters, ensuring that the order is consistent and valid according to the given constraints.

Solution:#

Here’s a Python function to implement this algorithm:

from collections import defaultdict, deque

def alienOrder(words):
    # Create an adjacency list to represent the graph
    graph = defaultdict(list)
    
    # Create a dictionary to store the in-degrees of each letter
    in_degree = {}
    
    # Initialize in-degrees to 0 for all letters
    for word in words:
        for char in word:
            if char not in in_degree:
                in_degree[char] = 0
    
    # Build the graph and update in-degrees
    for i in range(1, len(words)):
        word1, word2 = words[i-1], words[i]
        min_length = min(len(word1), len(word2))
        
        # Compare characters in the two words
        j = 0
        while j < min_length and word1[j] == word2[j]:
            j += 1
        
        if j < min_length:
            # If word1[j] is lexicographically before word2[j], add an edge
            graph[word1[j]].append(word2[j])
            in_degree[word2[j]] += 1
            
            # Break the loop to avoid further comparisons
            break
    
    # Perform topological sorting using Kahn's algorithm
    result = []
    queue = deque([char for char in in_degree if in_degree[char] == 0])
    
    while queue:
        char = queue.popleft()
        result.append(char)
        
        for neighbor in graph[char]:
            in_degree[neighbor] -= 1
            if in_degree[neighbor] == 0:
                queue.append(neighbor)
    
    # Check if there is a valid order
    if len(result) < len(in_degree):
        return ""
    
    return "".join(result)

Explanation:#

The code is designed to determine the order of letters in an unknown alien language that uses the Latin alphabet. It takes a list of non-empty words from a dictionary as input, where the words are sorted lexicographically based on the rules of this alien language.

The code follows these main steps:

  1. Create a data structure to represent the graph of letters and their relationships in the alien language. It uses a defaultdict to store a list of letters that come after each letter. Additionally, it uses a dictionary called in_degree to keep track of the in-degrees (the number of letters that precede a letter) for each letter.

  2. Initialize the in-degrees of all letters to 0. This is done by iterating through all the words and characters in the words, adding each character to the in_degree dictionary with an initial in-degree of 0.

  3. Build the graph by comparing adjacent words in the dictionary. For each pair of adjacent words, the code finds the first differing character between the two words. If such a character exists, it means there’s an order relationship between the letters represented by these characters. The code updates the graph and in-degrees accordingly.

  4. After constructing the graph, the code performs a topological sorting of the letters using Kahn’s algorithm. It starts by initializing an empty result list and a queue containing letters with in-degrees of 0. The algorithm repeatedly removes a letter with an in-degree of 0 from the queue, adds it to the result list, and updates the in-degrees of its neighbors. This process continues until the queue is empty.

  5. Finally, the code checks if the topological sorting was successful by comparing the length of the result list with the number of unique letters in the input. If the lengths differ, it means there’s a cycle in the graph, and the order is invalid. In such a case, an empty string is returned. Otherwise, the result list is joined together to form the smallest order of letters in normal lexicographical order, and it is returned as the output.

The code provides a way to determine the order of letters in the alien language while handling various edge cases, including checking for invalid order conditions.

Test cases:#

Here’s how you can use this solution:

# Example 1
words1 = ["wrt", "wrf", "er", "ett", "rftt"]
print(alienOrder(words1))  # Output: "wertf"
wrtef
# Example 2
words2 = ["z", "x"]
print(alienOrder(words2))  # Output: "zx"
zx

Time and Space Complexity Analysis#

Let’s analyze the time and space complexity of the provided code for determining the order of letters in an alien language.

Time Complexity:

  1. Building the graph: The code iterates through the list of words and compares adjacent words. In the worst case, where all words have the same prefix, it takes O(N) time to build the graph, where N is the total number of characters in all the words.

  2. Topological sorting: Performing a topological sort on the graph takes O(V + E) time, where V is the number of unique letters (vertices) and E is the number of relationships between letters (edges) in the graph.

Overall, the time complexity is O(N + V + E), where N is the total number of characters in the words, V is the number of unique letters, and E is the number of order relationships between letters.

Space Complexity:

  1. Graph and In-degrees: The code uses data structures to represent the graph and in-degrees. The space complexity for these data structures is O(V + E), where V is the number of unique letters, and E is the number of order relationships between letters in the words.

  2. Result List: The result list stores the order of letters, which can have a maximum size of V, where V is the number of unique letters.

  3. Queue: The space used by the queue during topological sorting is also O(V).

Overall, the space complexity is O(V + E) due to the data structures used for graph representation and the space needed for storing the result and the queue.

In the worst case, where there are many unique letters and many order relationships, the space complexity is dominated by the number of unique letters and their relationships.

In summary, the time complexity of the code is O(N + V + E), and the space complexity is O(V + E). The actual performance will depend on the specific input data, such as the number of unique letters and the relationships between them in the alien language.

Challenging Exercises:#

  1. Multiple Valid Orders: Extend the problem to handle cases where there are multiple valid orders of letters in the alien language. Modify the code to return all valid orders instead of just one. Be mindful of performance.

  2. Detect Invalid Dictionary: Given a dictionary of words, write a function to detect whether the dictionary is valid or not based on the constraints mentioned in the problem statement.