Working with Blocks in Objective-C

Avoiding Strong Reference Cycles with the __weak keyword and Using the __block keyword

When using blocks in Objective-C, there is a real risk of creating a strong reference cycle (also called a retain cycle). This can happen when a block captures a strong reference to self. In other words, if you create a block inside one of a class's methods and the block needs to call a method or read a property of that class, then the block captures the object instance itself (self) and keeps it alive for as long as the block exists. It sounds more confusing than it is, so let's illustrate it with an example from linguistics, and hopefully things will become clearer.

Let's create a new Cocoa class that inherits from NSObject. We'll call this class Tokenizer because it will be responsible for breaking up a text into its individual words. "Tokenization" is the process of breaking a string up into words (tokens). Here is the Tokenizer.h header file:

#import <Foundation/Foundation.h>

@interface Tokenizer : NSObject

@property (atomic,strong) NSString* text;

@end

There is only one property exposed in the header file: the text that we want to tokenize. Now we'll go ahead and define an initializer inside the Tokenizer.m implementation file.

#import "Tokenizer.h"

@implementation Tokenizer

-(id)initWithText:(NSString*)text{

    self = [super init];
    
    if(self){
        _text = text;
    }
    
    return self;
}

@end

This initializer is pretty self-explanatory. It simply accepts a string argument and uses that string to initialize the text property.

Before we go any further, we'll declare two readonly properties in a class extension, placed in Tokenizer.m above the @implementation block:

@interface Tokenizer ()

@property (readonly)NLTokenizer* tokenizer;
@property (readonly)NSRange textRange;

@end

The textRange property is a convenience that represents the range of the entire text to be tokenized. The tokenizer property is an instance of NLTokenizer, which is provided by Apple's NaturalLanguage framework. Here's how the getters for these two properties are implemented:

// Builds a word-level tokenizer for the current text.
// Note: a new NLTokenizer is created on every access.
-(NLTokenizer *)tokenizer{
    NLTokenizer* tokenizer = [[NLTokenizer alloc] initWithUnit:NLTokenUnitWord];
    [tokenizer setString:_text];
    return tokenizer;
}

// The range covering the whole text: location 0, length equal to the text's length.
-(NSRange)textRange{
    return NSMakeRange(0, [_text length]);
}

For the tokenizer, we initialize an NLTokenizer with the NLTokenUnitWord unit, since we plan to break the text up into words, and we set the tokenizer's string property to the text that we plan to tokenize. Because this is a computed getter, every access to self.tokenizer creates a fresh NLTokenizer. Likewise, textRange simply returns a range that starts at location 0 and whose length is the length of the text, so it covers the whole string.
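If you'd like something to compare NLTokenizer's output against that needs only Foundation, NSString's own block-based enumerateSubstringsInRange:options:usingBlock: method with the NSStringEnumerationByWords option also splits a string into words. This is a different API from the one used in this article, and WordsInString is a hypothetical helper name; treat it as a sketch, and don't assume its word boundaries match NLTokenizer's exactly.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: splits text into words using Foundation only
// (an alternative to NLTokenizer, not the approach used in this article).
NSArray<NSString *> *WordsInString(NSString *text) {
    // `words` is mutated inside the block but never reassigned,
    // so it can be captured normally (no __block needed).
    NSMutableArray<NSString *> *words = [NSMutableArray array];
    [text enumerateSubstringsInRange:NSMakeRange(0, text.length)
                             options:NSStringEnumerationByWords
                          usingBlock:^(NSString *substring, NSRange substringRange,
                                       NSRange enclosingRange, BOOL *stop) {
        // substring is the word itself; it is only nil when the
        // NSStringEnumerationSubstringNotRequired option is passed.
        if (substring != nil) {
            [words addObject:substring];
        }
    }];
    return [words copy];
}
```

For example, WordsInString(@"As Dorian makes its way") yields the five words "As", "Dorian", "makes", "its", and "way".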

Now let's implement a tokenize method, whose job is to tokenize the text and return an array of strings containing all of the words in it:

-(NSArray<NSString*>*)tokenize{
    
    Tokenizer* __weak weakSelf = self;
    
    NSMutableArray<NSString*>* tokensArray = [[NSMutableArray alloc] init];
    
    [self.tokenizer enumerateTokensInRange:self.textRange usingBlock:^(NSRange tokenRange, NLTokenizerAttributes flags, BOOL * _Nonnull stop) {
        
        // Extract the word that the tokenizer found at tokenRange.
        NSString* token = [weakSelf.text substringWithRange:tokenRange];
        [tokensArray addObject:token];
    }];
    
    return tokensArray;
}

The main thing to notice in this code snippet is where we define weakSelf:

Tokenizer* __weak weakSelf = self;

What's the point of this? By declaring a weak reference to self, we let the block passed to enumerateTokensInRange:usingBlock: capture a weak reference to the current Tokenizer instance (not to the class itself). That instance owns the text we want to tokenize, which is why we need to reach it from inside the block. Unlike a strong reference, a __weak reference does not keep the object alive, and ARC sets it to nil once the object is deallocated.
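To see what "weak" means in practice, here is a small sketch (ARC assumed; WeakReferenceBecomesNil is a hypothetical name) that points a __weak variable at an object, lets the only strong reference go out of scope, and checks that the weak variable has been reset to nil.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (compile with ARC enabled, e.g. -fobjc-arc).
// Returns YES if a __weak variable was automatically reset to nil
// after the last strong reference to its object went away.
BOOL WeakReferenceBecomesNil(void) {
    __weak NSObject *weakObject = nil;
    @autoreleasepool {
        NSObject *strongObject = [[NSObject alloc] init];
        weakObject = strongObject;
        // While strongObject is in scope, the weak variable still points at the object.
        if (weakObject != strongObject) {
            return NO;
        }
    }
    // strongObject has gone out of scope, so the object was deallocated
    // and ARC zeroed the weak variable.
    return weakObject == nil;
}
```

This zeroing is why code that messages weakSelf never touches freed memory: once the object is gone, weakSelf is nil, and a message sent to nil returns nil.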

What if inside the block we instead had the following code?

NSString* token = [self.text substringWithRange:tokenRange];
[tokensArray addObject:token];

If we used self.text, the block would capture a strong reference to self, that is, to the whole Tokenizer instance, not just its text property. A strong reference cycle forms only when the relationship goes both ways: self owns the block (directly, or through an object it owns) while the block owns self. Neither object's reference count can then drop to zero, so ARC never deallocates them, and their memory leaks. In this particular method, enumerateTokensInRange:usingBlock: runs the block synchronously and does not hold on to it after it returns, so using self here would not actually create a cycle; the __weak reference is a defensive habit. It becomes essential when a block is stored, for example in a property or as a completion handler kept by an object that self retains.
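For contrast, here is a sketch of a situation where capturing self does create a strong reference cycle: the object keeps its block in a copy property, and the block refers back to self. TickHandlerOwner, its methods, and OwnerSurvives are hypothetical names used only for this illustration, and ARC is assumed. The weak version also shows a common companion pattern: promoting weakSelf to a strong local (strongSelf) at the start of the block, so the object cannot disappear halfway through the block's body. Clang will likely warn about the strong capture in installStrongHandler; that warning is exactly the problem being demonstrated.

```objc
#import <Foundation/Foundation.h>

// Hypothetical class used only to illustrate retain cycles (ARC assumed).
@interface TickHandlerOwner : NSObject
@property (nonatomic, copy) void (^handler)(void);  // the object owns its block
@property (nonatomic) NSInteger ticks;
- (void)installStrongHandler;
- (void)installWeakHandler;
@end

@implementation TickHandlerOwner

- (void)installStrongHandler {
    // self -> handler -> self: a strong reference cycle, so neither is ever freed.
    self.handler = ^{
        self.ticks += 1;
    };
}

- (void)installWeakHandler {
    TickHandlerOwner * __weak weakSelf = self;
    self.handler = ^{
        // Promote to a strong local for the duration of the block;
        // if the object has already been deallocated, do nothing.
        TickHandlerOwner *strongSelf = weakSelf;
        if (strongSelf == nil) {
            return;
        }
        strongSelf.ticks += 1;
    };
}

@end

// Returns YES if the owner outlives the last outside strong reference to it,
// which is what happens when its stored block captures self strongly.
BOOL OwnerSurvives(BOOL useWeakCapture) {
    __weak TickHandlerOwner *observer = nil;
    @autoreleasepool {
        TickHandlerOwner *owner = [[TickHandlerOwner alloc] init];
        if (useWeakCapture) {
            [owner installWeakHandler];
        } else {
            [owner installStrongHandler];
        }
        owner.handler();  // run the stored block once
        observer = owner;
    }
    return observer != nil;
}
```

OwnerSurvives(NO) leaks the object (the cycle keeps it alive), while OwnerSurvives(YES) lets it be deallocated normally.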

Before we go further, let's refactor the Tokenizer.h header file so that it exposes the initializer and the tokenize method that we just defined (as well as the tokenizer accessor):

#import <Foundation/Foundation.h>
#import <NaturalLanguage/NaturalLanguage.h>

NS_ASSUME_NONNULL_BEGIN

@interface Tokenizer : NSObject

@property (atomic,strong) NSString* text;
-(id)initWithText:(NSString*)text;
-(NLTokenizer*)tokenizer;
-(NSArray<NSString*>*)tokenize;
@end

NS_ASSUME_NONNULL_END

Make sure that the NaturalLanguage framework is imported at the top of the file with the appropriate import statement:

#import <NaturalLanguage/NaturalLanguage.h>

We've implemented a tokenize method in our Tokenizer class. We will shortly test it out and make sure everything is in working order. However, before we do that, let's continue to extend the functionality of our Tokenizer class and in the process gain some more insight into the magic behind blocks.

We now have a good working understanding of how and when to use the __weak keyword when capturing a reference to self inside a block. Let's add another method to our Tokenizer class called -(NSInteger)averageWordCount. Despite its name, this method returns the average word length: the mean number of characters per word in the sample text, rounded down to a whole number. We'll also go back to the Tokenizer.h header file and add the method's signature there:

@interface Tokenizer : NSObject

@property (atomic,strong) NSString* text;
-(id)initWithText:(NSString*)text;
-(NLTokenizer*)tokenizer;
-(NSArray<NSString*>*)tokenize;
-(NSInteger)averageWordCount;
@end

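Back in the Tokenizer.m implementation file, here is how we go about implementing this method:

-(NSInteger)averageWordCount{
    
    NSArray<NSString*>* tokens = [self tokenize];
    
    // Avoid dividing by zero when the text contains no words.
    if (tokens.count == 0) {
        return 0;
    }
    
    // __block lets the block below modify this local variable.
    __block NSUInteger sum = 0;
    
    [tokens enumerateObjectsUsingBlock:^(NSString * _Nonnull token, NSUInteger idx, BOOL * _Nonnull stop) {
        
        sum += [token length];
        
    }];
    
    // Integer division: the result is truncated to a whole number of characters.
    return (NSInteger)(sum / tokens.count);
}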

There are two interesting things to notice in this code snippet. The first is the __block keyword. By default, a block captures a local variable by copying its value at the moment the block is created, and that copy is read-only inside the block: assigning to it is a compile-time error. Declaring the variable with __block instead makes the block and the surrounding code share the same storage, so changes made inside the block are visible outside it. Here, sum is used to add up the lengths of all the words in the tokens array.
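A small sketch that makes the difference visible: the first hypothetical function captures a plain local, which the block copies when it is created, and the second declares the local with __block so the block and the function share it.

```objc
#import <Foundation/Foundation.h>

// Hypothetical demo: a plain local is captured by value.
NSInteger CapturedSnapshot(void) {
    NSInteger value = 1;
    NSInteger (^readValue)(void) = ^NSInteger{
        return value;  // the block holds its own copy, taken when the block was created
    };
    value = 2;         // changes the local, not the block's copy
    return readValue();
}

// Hypothetical demo: a __block local is shared between the block and the function.
NSInteger SharedThroughBlockStorage(void) {
    __block NSInteger value = 1;
    void (^increment)(void) = ^{
        value += 1;    // without __block, this assignment would not compile
    };
    increment();
    increment();
    return value;
}
```

CapturedSnapshot returns 1, because the block still sees the value it captured, while SharedThroughBlockStorage returns 3, because both calls to the block updated the shared variable.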

The other interesting thing to notice is the enumerateObjectsUsingBlock: method, which can be called on any NSArray. It runs a block once for each element of the array, passing in the element, its index, and a stop pointer that you can set to YES to end the enumeration early. NSArray has no built-in equivalents of Swift's map, filter, and reduce, but enumerateObjectsUsingBlock: can be used to write them by hand.
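As a sketch of that idea, here are hand-written map, filter, and reduce functions built on enumerateObjectsUsingBlock:. The names TKMap, TKFilter, and TKReduce are made up for this example; they are not part of Foundation. Note that reduce needs __block for its accumulator because the block reassigns it, whereas map and filter only mutate their result array and can capture it normally.

```objc
#import <Foundation/Foundation.h>

// Hypothetical map: applies `transform` to every element and collects the results.
NSArray *TKMap(NSArray *array, id (^transform)(id obj)) {
    NSMutableArray *result = [NSMutableArray arrayWithCapacity:array.count];
    [array enumerateObjectsUsingBlock:^(id obj, NSUInteger idx, BOOL *stop) {
        [result addObject:transform(obj)];
    }];
    return [result copy];
}

// Hypothetical filter: keeps only the elements for which `predicate` returns YES.
NSArray *TKFilter(NSArray *array, BOOL (^predicate)(id obj)) {
    NSMutableArray *result = [NSMutableArray array];
    [array enumerateObjectsUsingBlock:^(id obj, NSUInteger idx, BOOL *stop) {
        if (predicate(obj)) {
            [result addObject:obj];
        }
    }];
    return [result copy];
}

// Hypothetical reduce: folds the elements into one value, starting from `initial`.
id TKReduce(NSArray *array, id initial, id (^combine)(id accumulator, id obj)) {
    __block id accumulator = initial;  // reassigned inside the block, so __block is required
    [array enumerateObjectsUsingBlock:^(id obj, NSUInteger idx, BOOL *stop) {
        accumulator = combine(accumulator, obj);
    }];
    return accumulator;
}
```

For instance, mapping the words @[@"As", @"Dorian", @"makes", @"its"] to their lengths gives @[@2, @6, @5, @3], filtering for words longer than three characters keeps @"Dorian" and @"makes", and reducing the lengths with addition gives 16.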

We are ready to test our Tokenizer class, specifically the tokenize and averageWordCount methods that we just implemented. To do that, we'll need some sample text. I'll be using text from news articles, which I've exposed as read-only class properties on a MockData class (accessed as [MockData article1], and so on).
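The MockData class itself isn't listed in this article. If you want to follow along, here is a minimal hypothetical stand-in that declares one read-only class property; the placeholder string is not the article text, so substitute any sample text you like.

```objc
#import <Foundation/Foundation.h>

// Hypothetical stand-in for the article's MockData class.
// The real article text is not reproduced; replace the placeholder with your own sample.
@interface MockData : NSObject
// A read-only class property: accessed as [MockData article1] or MockData.article1.
@property (class, nonatomic, readonly) NSString *article1;
@end

@implementation MockData
// Class properties are never auto-synthesized, so the getter is written by hand.
+ (NSString *)article1 {
    return @"Replace this placeholder with the text of a news article.";
}
@end
```

Additional articles can be added the same way, one class property and one class getter each.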

Finally, if you are defining this Tokenizer class within an Xcode project, go back into the ViewController.m file (importing Tokenizer.h and the header that declares MockData at the top) and try out the following code in the viewDidLoad method to test the Tokenizer class that we've just defined:

- (void)viewDidLoad {
    [super viewDidLoad];
    // Do any additional setup after loading the view.
    
    NSString* text1 = [MockData article1];
    
    Tokenizer* tokenizer = [[Tokenizer alloc] initWithText:text1];
    
    NSArray<NSString*>* tokens = [tokenizer tokenize];
    
    NSLog(@"Here are all the word tokens in article1: \n");
    
    [tokens enumerateObjectsUsingBlock:^(NSString * _Nonnull string, NSUInteger idx, BOOL * _Nonnull stop) {
        
        // Pass the token as an argument, never as the format string itself.
        NSLog(@"%@", string);
        
    }];
    
    NSLog(@"\n");
    
    NSInteger averageWordCount = [tokenizer averageWordCount];
    
    // NSInteger needs %ld (with a cast to long) to print correctly on 64-bit platforms.
    NSLog(@"The average word count is: %ld", (long)averageWordCount);
}

Now go ahead and run the program. You should be able to see the following output in the console:

2019-09-24 00:58:43.012730-0400 RegexParserDemo[2346:75464] Here are all the word tokens in article1:
2019-09-24 00:58:43.012978-0400 RegexParserDemo[2346:75464] As
2019-09-24 00:58:43.013224-0400 RegexParserDemo[2346:75464] Dorian
2019-09-24 00:58:43.013554-0400 RegexParserDemo[2346:75464] makes
2019-09-24 00:58:43.013641-0400 RegexParserDemo[2346:75464] its
.
.
.
2019-09-24 00:58:43.091109-0400 RegexParserDemo[2346:75464] The average word count is: 4

The beauty and brilliance of blocks never ceases to amaze!
