What is a string?
- A string is a sequence of characters.
- When we write “hello world!”, what the computer sees is this:
h | e | l | l | o |   | w | o | r | l | d | ! |
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |

j | a | c | k |   | s | m | i | t | h |
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
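The picture above can be reproduced in code. This is a minimal sketch that assumes the sample text "hello world!" from the slide; the built-in enumerate pairs each character with its position.

```python
# Pair every character of the string with its index.
text = "hello world!"
pairs = list(enumerate(text))

for index, char in pairs:
    # repr() makes the space at index 5 visible as ' '
    print(index, repr(char))
```

The last pair printed is 11 '!', which matches the final column of the table.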
String Indexing
string | f | o | o | b | a | r |
position | 0 | 1 | 2 | 3 | 4 | 5 |

string | b | o | b |
position | 0 | 1 | 2 |

string | a | l | i | c | e |
position | 0 | 1 | 2 | 3 | 4 |

string | h | e | l | l | o |
position | 0 | 1 | 2 | 3 | 4 |
A Word About String Indexing
- Indexing starts from 0.
- Indexing ends at len() - 1.
>>> s = 'rakesh'
>>> s[0]
'r'
>>> s[1]
'a'
>>> s[2]
'k'
>>> s[3]
'e'
>>> s[4]
's'
>>> s[5]
'h'
>>> s[6]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: string index out of range
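The rule that the last valid index is len(s) - 1 can be checked directly. The sketch below assumes the same string 'rakesh'; the helper name char_at is hypothetical and only illustrates catching the IndexError instead of letting the program stop.

```python
def char_at(s, i):
    """Return s[i], or None when i is out of range (hypothetical helper)."""
    try:
        return s[i]
    except IndexError:
        return None

s = 'rakesh'
last_index = len(s) - 1     # 5
print(s[last_index])        # h
print(char_at(s, 6))        # None
```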
First Set of Questions on Strings
1. Declare a string variable
2. Print the third letter from that string
3. Print the length of the string variable
Q: If the last index of a string is 15, what is its length?
Answer: 16 (length = last index + 1, because indexing starts from 0)
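One possible answer to the three questions, as a sketch. The variable name city and its value are made up for illustration; any string works the same way.

```python
city = "Bengaluru"        # 1. declare a string variable

third_letter = city[2]    # 2. the third letter sits at index 2
print(third_letter)       # n

length = len(city)        # 3. length of the string
print(length)             # 9

# The last index is always one less than the length.
print(length - 1)         # 8
```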
What is a slice and why is it needed?
- With indexing: we can access a character in the string.
- What if I ask you to extract just “Jack” for me?
- What about just “Smith”?
- What about “Junior”?
J | a | c | k |   | S | m | i | t | h |   | J | u | n | i | o | r |
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
Building a slice
- For Jack: starting index is: 0, ending index is: 3
- So, the slice is: [0:4]
- For Smith: starting index is: 5, ending index is: 9
- So, the slice is: [5:10]
- For Junior: starting index is: 11, ending index is: 16
- So, the slice is: [11:17]
>>> x = "Jack Smith Junior"
>>> x[0:4]
'Jack'
>>> x[5:10]
'Smith'
>>> x[11:17]
'Junior'
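Because a slice excludes its end index, the slice end is always the last wanted index plus one. A small sketch of that rule; the function name extract is hypothetical.

```python
def extract(text, first, last):
    """Return the characters from index first through index last, inclusive."""
    return text[first:last + 1]

x = "Jack Smith Junior"
print(extract(x, 0, 3))     # Jack
print(extract(x, 5, 9))     # Smith
print(extract(x, 11, 16))   # Junior
```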
J | a | c | k |   | S | m | i | t | h |   | J | u | n | i | o | r |
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
Demonstrating Negative Step For a Slice
- Let us say that this time we set step = -1
- Then, the traversing of the string would happen from right to left.
- Now, let us say start index: 15
- Now, let us say end index: 5
- This will traverse the string from right to left: from index 15 down to index 6, because the end index (5) is excluded.
>>> x[15:5:-1]
'oinuJ htim'
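For in-range, non-negative indices, a slice visits the same positions that range(start, end, step) produces. The sketch below assumes the string "Jack Smith Junior" from the earlier example and rebuilds x[15:5:-1] one index at a time.

```python
x = "Jack Smith Junior"

# The indices visited by x[15:5:-1]: 15 down to 6, end index 5 excluded.
indices = list(range(15, 5, -1))
print(indices)   # [15, 14, 13, 12, 11, 10, 9, 8, 7, 6]

rebuilt = "".join(x[i] for i in indices)
print(rebuilt)                  # oinuJ htim
print(rebuilt == x[15:5:-1])    # True
```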
J | a | c | k |   | S | m | i | t | h |   | J | u | n | i | o | r |
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
One more example: For Step > 1
- Now, let us say our step = 2
- This time it will not visit every index; it advances by 2 to pick the next index.
- And, start index is: 0
- End index is: 17 (Why 17? Because slicing is exclusive of end index.)
J | a | c | k |   | S | m | i | t | h |   | J | u | n | i | o | r |
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
>>> x[0:17:2]
'Jc mt uir'
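The same idea works for any positive step. A short sketch, again assuming the string "Jack Smith Junior"; the step of 3 is an extra illustration that is not in the original slides.

```python
x = "Jack Smith Junior"

step_two = x[0:17:2]     # indices 0, 2, 4, ..., 16
step_three = x[0:17:3]   # indices 0, 3, 6, 9, 12, 15

print(step_two)          # Jc mt uir
print(step_three)        # Jkmhuo

# Leaving start and end out gives the same result as 0 and len(x).
print(x[::2] == step_two)   # True
```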
What is a Slice?
- A slice has three parts.
- First part is: start index
- Second part is: end index
- Third part is: step (optional)
- Representation: [start index: end index: step]
Default values
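- Start index: 0 when step is positive.
- End index: len(mylist) or len(mystr) when step is positive.
- Step: 1
- A negative step means the direction of traversal is from right to left. The defaults then flip: the start becomes the last index and the traversal runs through the first character.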
Slice (example)
j | a | c | k |   | s | m | i | t | h |
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
If the string “jack smith” is given to you, what values can the start index take?
Answer: 0, 1, 2, ..., 7, 8, 9
Let’s assume that step = 1.
What values can the end index take?
Answer: it depends on the start index. An end index that is not greater than the start index gives an empty string.
>>> x = 'jack smith'
>>> x[5:9]
'smit'
>>> x[5:2]
''
Problems on slice expansion.
- What is the expansion of [2:7]
- Start index is 2. End index is 7. Step takes the default value: 1 --> [2:7:1]
- What is the expansion of [:2]
- Start index is not given. Takes the default of 0. End index is given 2 --> [0:2:1]
- What is the expansion of [3:]
- One colon is there. Step is optional. So step takes the default value. Start index is given to be 3. End index takes the default value: len() --> [3:len():1]
- What is the expansion of [:]
- Read as: End to end. All the default values will be there --> [0:len():1]
- What is the expansion of [::-1]
- This comes in handy when you want to reverse a string.
- Since the start index and the end index are both missing, the slice goes end to end. And since the step is negative, we traverse the string from right to left.
- But it is not equal to [0:len(x):-1], which returns an empty string. With a negative step the default start is the last index, not 0.
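Python can report these expansions itself. The built-in slice object has an indices(length) method that returns the concrete (start, end, step) for a sequence of that length. A sketch, assuming the 10-character string 'jack smith':

```python
x = 'jack smith'
n = len(x)  # 10

print(slice(2, 7).indices(n))             # (2, 7, 1)
print(slice(None, 2).indices(n))          # (0, 2, 1)
print(slice(3, None).indices(n))          # (3, 10, 1)
print(slice(None, None).indices(n))       # (0, 10, 1)

# With a negative step the defaults flip: start at the last index (9)
# and stop after index 0. The -1 here is a range()-style stop value,
# not a negative index that can be typed into x[9:-1:-1].
reverse_expansion = slice(None, None, -1).indices(n)
print(reverse_expansion)                  # (9, -1, -1)

rebuilt = "".join(x[i] for i in range(*reverse_expansion))
print(rebuilt == x[::-1])                 # True
print(x[0:n:-1] == '')                    # True
```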
String Slicing
- String is a sequence of characters.
- x = 'jack smith'
- x[0] # j
- x[1] # a
- x[2] # c
- Q: What if you want to take out a substring?
- Substring: shorter string from a longer string.
- Syntax: x[start index : end index : step] # exclusive of end index
- Returns the string from “start index” to “end index - 1”.
- x[0:2] # slicing: start index 0, end index 2, returns 'ja'
What is the output of the following slicing-based code?
- x = "jack smith"
- print(x[2])
- print(len(x))
- print(x[0:2])
- print(x[5:8])
- print(x[5:100])
- # Python is lenient about the end index.
- # If you give an end index larger than the last index, the slice simply stops at the end of the string.
Solution:
j | a | c | k |   | s | m | i | t | h |
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
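Reading the answers off the table above, the five print statements can be worked through as follows. The variable names are introduced here only so that each result has a label.

```python
x = "jack smith"

third = x[2]          # index 2
length = len(x)       # indices run 0..9
first_two = x[0:2]    # indices 0 and 1
middle = x[5:8]       # indices 5, 6, 7
clamped = x[5:100]    # end index larger than the string, so it stops at the end

print(third)          # c
print(length)         # 10
print(first_two)      # ja
print(middle)         # smi
print(clamped)        # smith
```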
Problem on index and slicing
- Q: Print all the characters from even indices (inc. 0) of your name.
- Q: Print all the characters from odd indices of your name.
- Q: Print every second character of your name.
- name1 = "Ashish Jain" # Ahs an
- name2 = "Sonia and Johanth"
- temp = name1[::2] # Start index, end index, step (default value of step is 1)
- print(temp)
- temp3 = name2[1::2]
- print(temp3)
Solution on index and slicing
>>> x = 'vikash gupta'
>>> x[0:len(x):2]
'vks ut'
>>>
>>> x[1:len(x):2]
'iahgpa'
>>>
Follow-up Question
- Q: What is the difference between using a single colon and a double colon?
- If there is only one colon while slicing a string:
- You have to assume that step is not mentioned and it is equal to 1 (the default value).
- name[0::1] # end index is maximum value possible
- name[::2] # start index: 0, end index: max value, step: 2
- name[:] # start index: 0, end index: len(), step: 1
>>> x
'vikash gupta'
>>> x[:]
'vikash gupta'
>>> x[::]
'vikash gupta'
Second Set of Questions on Strings
- 1. Reverse the string
- 2. Check if a string is a palindrome.
- Note: Palindrome is a string that is spelled the same way forward and backward. For example: mam, madam, malayalam.
Solutions
>>> x
'vikash gupta'
>>> x[::-1]
'atpug hsakiv'
>>> y = 'madam'
>>> y == y[::-1]
True
>>> z = 'malayalam'
>>> z == z[::-1]
True
>>>
>>> x == x[::-1]
False
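The two solutions can be wrapped into small reusable functions. This is a sketch; the function names reverse and is_palindrome are hypothetical, and lowercasing the input before the check is an added assumption so that "Madam" also counts.

```python
def reverse(s):
    """Return s with its characters in reverse order."""
    return s[::-1]

def is_palindrome(s):
    """Return True if s reads the same forward and backward."""
    return s == reverse(s)

print(reverse('vikash gupta'))          # atpug hsakiv
print(is_palindrome('madam'))           # True
print(is_palindrome('vikash gupta'))    # False

# Assumption: ignore case by lowercasing first.
print(is_palindrome('Madam'))           # False
print(is_palindrome('Madam'.lower()))   # True
```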
More on Strings
Some Commonly Used String Methods
- count(): Returns the number of times a specified value occurs in a string
- startswith(): Returns True if the string starts with the specified value
- endswith(): Returns True if the string ends with the specified value
- isalpha(): Returns True if all characters in the string are alphabetic
- isdigit(): Returns True if all characters in the string are digits
- isspace(): Returns True if all characters in the string are whitespace
- islower(): Returns True if all characters in the string are lower case
- isupper(): Returns True if all characters in the string are upper case
- lower(): Converts a string into lower case
- upper(): Converts a string into upper case
- split(): Splits the string at the specified separator and returns a list
- splitlines(): Splits the string at line breaks and returns a list
- strip(): Returns a trimmed version of the string (leading and trailing whitespace removed)
- zfill(): Pads the string with 0s at the beginning until it reaches the specified width
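A quick tour of the methods listed above, as a sketch. The sample strings are made up for illustration.

```python
s = "  Hello World  "
trimmed = s.strip()                  # whitespace removed from both ends
print(trimmed)                       # Hello World

print(trimmed.count("l"))            # 3
print(trimmed.startswith("Hello"))   # True
print(trimmed.endswith("World"))     # True
print(trimmed.lower())               # hello world
print(trimmed.upper())               # HELLO WORLD
print(trimmed.islower())             # False
print("ABC".isupper())               # True

print("abc".isalpha())               # True
print("2024".isdigit())              # True
print("   ".isspace())               # True

print(trimmed.split(" "))            # ['Hello', 'World']
print("a\nb".splitlines())           # ['a', 'b']
print("42".zfill(5))                 # 00042
```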
Note About String in Python
- A passage about Python strings from the book Learning Python (O'Reilly, 5e), p. 191:
- Strictly speaking, Python strings are categorized as immutable sequences, meaning that the characters they contain have a left-to-right positional order and that they cannot be changed in place. In fact, strings are the first representative of the larger class of objects called sequences that we will study here. Pay special attention to the sequence operations introduced in this post, because they will work the same on other sequence types we’ll explore later, such as lists and tuples.
- Note: All string methods return new values. They do not change the original string.
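Immutability is easy to observe. The sketch below uses the made-up sample value "spam"; it shows that assigning to an index fails, that a method returns a new string, and that a "changed" string is really built from slices.

```python
s = "spam"

try:
    s[0] = "S"              # strings cannot be changed in place
    error_name = None
except TypeError as err:
    error_name = type(err).__name__
print(error_name)           # TypeError

t = s.upper()               # returns a new string; s is untouched
print(s, t)                 # spam SPAM

# To "change" a character, build a new string from slices.
u = "S" + s[1:]
print(u)                    # Spam
```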
Table 7-1. Common string literals and operations
Operation | Interpretation |
S = '' | Empty string |
S = "spam's" | Double quotes, same as single |
S = 's\np\ta\x00m' | Escape sequences |
S = """...multiline...""" | Triple-quoted block strings |
S = r'\temp\spam' | Raw strings (no escapes) print(S) # \temp\spam |
B = b'sp\xc4m' | Byte strings in 2.6, 2.7, and 3.X print(B) # b'sp\xc4m' |
U = u'sp\u00c4m' | Unicode strings in 2.X and 3.3+ print(U) # spÄm |
S1 + S2 | Concatenate |
S * 3 | repeat |
S[i] | Index |
S[i:j] | slice |
len(S) | length |
"a %s parrot" % 'kind' | String formatting expression print("a %s parrot" % 'kind') # a kind parrot |
"a {0} parrot".format('kind') | String formatting method in 2.6, 2.7, and 3.X |
S.find('pa') | String methods (see ahead for all 43): search print('a parrot'.find('pa')) # 2 |
S.rstrip() | remove whitespace from end print("!" + " okay ".rstrip() + "!") # ! okay! |
S.strip() | remove whitespace from beginning and end print("!" + " okay ".strip() + "!") # !okay! |
S.replace('pa', 'xx') | replacement print("parrot".replace('pa', 'xx')) # xxrrot |
S.split(',') | split on delimiter |
S.isdigit() | content test |
S.lower(), S.upper() | Case conversion print("Parrot".lower()) # parrot print("parrot".upper()) # PARROT |
S.endswith('spam') | end test print("is this yours".endswith("yours")) # True print("my parrot".startswith("my")) # True |
'spam'.join(strlist) | delimiter join |
S.encode('latin-1') | Unicode encoding |
B.decode('utf8') | Unicode decoding, etc. |
for x in S: print(x) | Iteration |
'spam' in S | membership |
[c * 2 for c in S] | list comprehension to create a new list |
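map(ord, S) | Apply a function to each item; in Python 3, map returns an iterator, so wrap it in list() to see the values: list(map(ord, "hello")) # [104, 101, 108, 108, 111] list(map(lambda x: 10*x, [1,2,3,4])) # [10, 20, 30, 40] |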
re.match('sp(.*)am', line) | Pattern matching: library module |
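A few rows of the table, run under Python 3 as a sketch. The sample values are made up for illustration; only standard built-ins and the re module are used.

```python
import re

S = "spam"

joined = ",".join(["a", "b", "c"])        # delimiter join
parts = "a,b,c".split(",")                # split on delimiter
doubled = [c * 2 for c in S]              # list comprehension
codes = list(map(ord, "hello"))           # map returns an iterator in Python 3

print(joined)     # a,b,c
print(parts)      # ['a', 'b', 'c']
print(doubled)    # ['ss', 'pp', 'aa', 'mm']
print(codes)      # [104, 101, 108, 108, 111]

# Encoding turns str into bytes; decoding turns bytes back into str.
encoded = "sp\u00c4m".encode("latin-1")
decoded = encoded.decode("latin-1")
print(encoded)    # b'sp\xc4m'
print(decoded)    # spÄm

# Pattern matching with the re library module.
match = re.match("sp(.*)am", "spXYZam")
captured = match.group(1)
print(captured)   # XYZ

print("pa" in S, S + "!", S * 2)   # True spam! spamspam
```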