Friday, April 14, 2023

Ch 5 - Strings in Python

What is a string?

  • A string is a sequence of characters.
  • When we write “hello world!”, what the computer sees is this:

string    h  e  l  l  o     w  o  r  l  d  !
position  0  1  2  3  4  5  6  7  8  9  10 11

string    j  a  c  k     s  m  i  t  h
position  0  1  2  3  4  5  6  7  8  9

String Indexing

string    f  o  o  b  a  r
position  0  1  2  3  4  5

string    b  o  b
position  0  1  2

string    a  l  i  c  e
position  0  1  2  3  4

string    h  e  l  l  o
position  0  1  2  3  4

A Word About String Indexing

  • Indexing starts from 0
  • Indexing ends at len() - 1
>>> s = 'rakesh'
>>> s[0]
'r'
>>> s[1]
'a'
>>> s[2]
'k'
>>> s[3]
'e'
>>> s[4]
's'
>>> s[5]
'h'
>>> s[6]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: string index out of range
>>>

First Set of Questions on Strings

1. Declare a string variable

2. Print the third letter from that string

3. Print the length of the string variable

Q: If the last index of a string is 15, what is its length?

Answer: 16

Q: If the length is 7, what is the last index?
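One possible way to work through the questions above is sketched here. The example string "python" is an arbitrary choice, not part of the original exercise. The sketch also shows the general relationship behind the two Q&A items: the last index is always the length minus 1, so a length of 7 means a last index of 6.

```python
# Hypothetical example string; any non-empty string behaves the same way.
s = "python"

# Task 2: the third letter lives at index 2, because indexing starts at 0.
third_letter = s[2]
print(third_letter)       # t

# Task 3: len() gives the number of characters.
length = len(s)
print(length)             # 6

# The last valid index is always len(s) - 1.
last_index = length - 1
print(s[last_index])      # n

# Going one past the last index raises IndexError.
try:
    s[length]
except IndexError as exc:
    error_message = str(exc)
    print(error_message)  # string index out of range
```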

What is a slice and why is it needed?

  • With indexing: we can access a character in the string.
  • What if I ask you to extract just “Jack” for me?
  • What about just “Smith”?
  • What about “Junior”?

string    J  a  c  k     S  m  i  t  h     J  u  n  i  o  r
position  0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16

Building a slice

  • For “Jack”: the first character is at index 0 and the last is at index 3.
  • The end index of a slice is exclusive, so the slice is: [0:4]
  • For “Smith”: the first character is at index 5 and the last is at index 9.
  • So, the slice is: [5:10]
  • For “Junior”: the first character is at index 11 and the last is at index 16.
  • So, the slice is: [11:17]
>>> x = "Jack Smith Junior"
>>> x[0:4]
'Jack'
>>> x[5:10]
'Smith'
>>> x[11:17]
'Junior'
>>>  
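The "last index plus 1" rule used above can be wrapped in a small helper. This is only a sketch: the function name `extract` and its inclusive `first`/`last` parameters are made up for illustration and are not part of Python.

```python
def extract(text, first, last):
    """Return the characters from index `first` through index `last`, inclusive."""
    # A slice excludes its end index, so add 1 to include `last`.
    return text[first:last + 1]


x = "Jack Smith Junior"
jack = extract(x, 0, 3)
smith = extract(x, 5, 9)
junior = extract(x, 11, 16)
print(jack, smith, junior)   # Jack Smith Junior
```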

Demonstrating Negative Step For a Slice

  • Let us say that this time we set step = -1.
  • Then the string is traversed from right to left.
  • Let the start index be 15.
  • Let the end index be 5.
  • This traverses the string from right to left, from index 15 down to index 6, because the end index is exclusive.
>>> x[15:5:-1]
'oinuJ htim'
>>>     
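To see which positions the slice [15:5:-1] visits, it can help to list them with range(), which follows the same start/end/step rules when all three values are given and lie inside the string. The variable names below are illustrative only.

```python
x = "Jack Smith Junior"

# The indices visited by x[15:5:-1]: from 15 down to 6 (5 is excluded).
visited = list(range(15, 5, -1))
print(visited)        # [15, 14, 13, 12, 11, 10, 9, 8, 7, 6]

# Joining the characters at those indices rebuilds the slice by hand.
by_hand = "".join(x[i] for i in visited)
print(by_hand)        # oinuJ htim

# Another route to the same result: take the forward slice 6..15, then reverse it.
forward_then_reversed = x[6:16][::-1]
print(forward_then_reversed)   # oinuJ htim
```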

One more example: For Step > 1

  • Now, let us say our step = 2.
  • This time it will not visit every index; it moves forward 2 positions at a time.
  • The start index is: 0
  • The end index is: 17 (Why 17? Because slicing is exclusive of the end index.)

>>> x[0:17:2]
'Jc mt uir'
>>>     
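A quick way to check a stepped slice is to count how many characters it should pick. The sketch below assumes a positive step and start/end values inside the string; under those assumptions the count is ceil((end - start) / step).

```python
import math

x = "Jack Smith Junior"
start, end, step = 0, 17, 2

picked = x[start:end:step]
print(picked)                        # Jc mt uir

# For a positive step and in-range indices, the slice picks
# ceil((end - start) / step) characters.
expected_count = math.ceil((end - start) / step)
print(expected_count, len(picked))   # 9 9

# Pair each picked character with the index it came from.
pairs = [(i, x[i]) for i in range(start, end, step)]
print(pairs[0], pairs[-1])           # (0, 'J') (16, 'r')
```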

What is a Slice?

  • A slice has three parts.
  • First part is: start index
  • Second part is: end index
  • Third part is: step (optional)
  • Representation: [start index: end index: step]

Default values

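  • Start index: 0 when step is positive.
  • End index: len(mylist) or len(mystr) when step is positive.
  • Step: 1
  • A negative step means the direction of traversal is from right to left. In that case the defaults flip: the start index defaults to the last index, and the end index defaults to just before index 0.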
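The defaults can be inspected directly with the built-in slice object, whose indices() method reports the concrete (start, end, step) Python would use for a sequence of a given length. The string "jack smith" is used here only as a sample.

```python
text = "jack smith"   # 10 characters, indices 0..9
n = len(text)

# No parts given: start 0, end len(text), step 1.
forward = slice(None, None, None).indices(n)
print(forward)     # (0, 10, 1)

# Only a positive step given: start and end keep their usual defaults.
skipping = slice(None, None, 2).indices(n)
print(skipping)    # (0, 10, 2)

# Negative step: start becomes the last index. The end value -1 here means
# "just before index 0" in range() terms, not the negative index -1.
backward = slice(None, None, -1).indices(n)
print(backward)    # (9, -1, -1)
```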

Slice (example)

string    j  a  c  k     s  m  i  t  h
position  0  1  2  3  4  5  6  7  8  9

If the string “jack smith” is given to you, what are all the possible values that the start index can take?

Answer: 0, 1, 2, ..., 7, 8, 9

Let’s assume that step = 1.

What are all the possible values that the end index can take?

Answer: it depends on the start index. For a non-empty result, the end index must be greater than the start index.

>>> x = 'jack smith'
>>> x[5:9]
'smit'
>>> x[5:2]
''

Problems on slice expansion.

  • What is the expansion of [2:7]?
  • Start index is 2. End index is 7. Step takes the default value: 1 --> [2:7:1]
  • What is the expansion of [:2]?
  • Start index is not given, so it takes the default of 0. End index is given as 2 --> [0:2:1]
  • What is the expansion of [3:]?
  • There is only one colon, so the optional step takes its default value. Start index is given as 3. End index takes the default value: len() --> [3:len():1]
  • What is the expansion of [:]?
  • Read as: end to end. All the default values apply --> [0:len():1]
  • What is the expansion of [::-1]?
  • This comes in handy when you want to reverse a string.
  • Since neither the start index nor the end index is given, the slice goes end to end. And since the step is negative, we traverse the string from right to left.
  • But it is not equal to [0:len(x):-1], which is empty: with a negative step, the start index defaults to the last index and the end index defaults to just before index 0.
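The expansions above can be checked mechanically. The following sketch compares each short form with its expanded form on the sample string "jack smith", and shows why [::-1] cannot be written as [0:len(x):-1].

```python
x = "jack smith"
n = len(x)

# Each short form on the left expands to the full form on the right.
print(x[2:7] == x[2:7:1])    # True
print(x[:2] == x[0:2:1])     # True
print(x[3:] == x[3:n:1])     # True
print(x[:] == x[0:n:1])      # True

# [::-1] reverses the string ...
reversed_x = x[::-1]
print(reversed_x)            # htims kcaj

# ... but [0:n:-1] is empty: starting at index 0 and moving left,
# there is nothing to visit.
wrong_expansion = x[0:n:-1]
print(repr(wrong_expansion)) # ''

# Writing None for a missing part lets Python pick the right default.
same_as_reverse = x[None:None:-1]
print(same_as_reverse == reversed_x)   # True
```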

String Slicing

  • A string is a sequence of characters.
  • x = 'jack smith'
  • x[0] # j
  • x[1] # a
  • x[2] # c
  • Q: What if you want to take out a substring?
  • Substring: a shorter string taken from a longer string.
  • Syntax: x[start index : end index : step] # exclusive of end index
  • Returns the string from “start index” to “end index - 1”.
  • x[0:2] # slicing: start index: 0, end index: 2

What is the output of the following slicing-based code?

  • x = "jack smith"
  • print(x[2])
  • print(len(x))
  • print(x[0:2])
  • print(x[5:8])
  • print(x[5:100])
  • # Python is forgiving about the end index of a slice.
  • # If you give an end index larger than the last index, the slice simply stops at the end of the string instead of raising an error.

Solution:

string    j  a  c  k     s  m  i  t  h
position  0  1  2  3  4  5  6  7  8  9
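Reading the answers off the index table, the five print statements from the question should produce the values shown in the comments below. The intermediate variable names are added here only to make each step easy to follow.

```python
x = "jack smith"

third_char = x[2]
print(third_char)    # c

length = len(x)
print(length)        # 10

first_two = x[0:2]
print(first_two)     # ja

middle = x[5:8]
print(middle)        # smi

# The end index 100 is past the end of the string, so the slice stops at the end.
to_the_end = x[5:100]
print(to_the_end)    # smith
```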

Problem on index and slicing

  • Q: Print all the characters from even indices (inc. 0) of your name.
  • Q: Print all the characters from odd indices of your name.
  • Q: Print every second character of your name.
  • name1 = "Ashish Jain" # Ahs an
  • name2 = "Sonia and Johanth"
  • temp = name1[::2] # Start index, end index, step (default value of step is 1)
  • print(temp)
  • temp3 = name2[1::2]
  • print(temp3)

Solution on index and slicing

>>> x = 'vikash gupta'
>>> x[0:len(x):2]
'vks ut'
>>> x[1:len(x):2]
'iahgpa'
>>>
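The same idea can be packaged as a reusable function. This is a sketch with a made-up name, `even_and_odd_chars`, applied to the two names from the problem statement; the loop-based version is included only to confirm what the step-2 slices are doing.

```python
def even_and_odd_chars(name):
    """Return (characters at even indices, characters at odd indices)."""
    evens = name[::2]    # start 0 (default), step 2
    odds = name[1::2]    # start 1, step 2
    return evens, odds


evens1, odds1 = even_and_odd_chars("Ashish Jain")
print(evens1)   # Ahs an
print(odds1)    # sihJi

evens2, odds2 = even_and_odd_chars("Sonia and Johanth")
print(evens2)   # SnaadJhnh
print(odds2)    # oi n oat

# Equivalent explicit loop for the even positions.
loop_evens = "".join(ch for i, ch in enumerate("Ashish Jain") if i % 2 == 0)
print(loop_evens == evens1)   # True
```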

Follow-up Question

  • Q: What is the difference between using a single colon and a double colon?
  • If there is only one colon while slicing a string:
  • The step is not given, so it takes the default value of 1.
  • name[0::1] # end index takes its default value, len(name)
  • name[::2] # start index: 0, end index: len(name), step: 2
  • name[:] # start index: 0, end index: len(name), step: 1

>>> x
'vikash gupta'
>>> x[:]
'vikash gupta'
>>> x[::]
'vikash gupta'

Second Set of Questions on Strings

  • 1. Reverse the string
  • 2. Check if a string is a palindrome.
  • Note: Palindrome is a string that is spelled the same way forward and backward. For example: mam, madam, malayalam.

Solutions

>>> x
'vikash gupta'
>>> x[::-1]
'atpug hsakiv'
>>> y = 'madam'
>>> y == y[::-1]
True
>>> z = 'malayalam'
>>> z == z[::-1]
True
>>> x == x[::-1]
False
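The comparison above can be turned into a small function. The name `is_palindrome` is illustrative, and lowercasing the input first is an added assumption (so that "Madam" also counts); the interactive session above compares the strings exactly as typed.

```python
def is_palindrome(text):
    """Return True if `text` reads the same forward and backward, ignoring case."""
    normalized = text.lower()              # so 'Madam' and 'madam' compare equal
    return normalized == normalized[::-1]  # [::-1] is the reversed string


print(is_palindrome("madam"))          # True
print(is_palindrome("Malayalam"))      # True
print(is_palindrome("vikash gupta"))   # False
```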

More on Strings

Some Commonly Used String Methods

count(): Returns the number of times a specified value occurs in a string

startswith(): Returns True if the string starts with the specified value

endswith(): Returns True if the string ends with the specified value

  • Usage: Form validation of an email ID

isalpha(): Returns True if all characters in the string are alphabetic

  • Usage: Form validation

isdigit(): Returns True if all characters in the string are digits

  • Usage: Form validation

isspace(): Returns True if all characters in the string are whitespaces

  • Usage: Form validation

islower(): Returns True if all characters in the string are lower case

isupper(): Returns True if all characters in the string are upper case

lower(): Converts a string into lower case

upper(): Converts a string into upper case

  • Used in palindrome check.

split(): Splits the string at the specified separator, and returns a list

  • Usage: File processing

splitlines(): Splits the string at line breaks and returns a list

  • Usage: File processing

strip(): Returns a trimmed version of the string

zfill(): Pads the string on the left with 0s until it reaches the specified width

  • Usage: Left-padding a string with 0s (e.g., a phone number)
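A few of the methods listed above, tried on small sample values. All the input strings here are invented for illustration; the checks are far too simple to serve as real form validation.

```python
email = "  User@Example.com  "    # hypothetical form input

cleaned = email.strip().lower()   # trim whitespace, then normalize case
print(cleaned)                    # user@example.com
print(cleaned.count("@"))         # 1
print(cleaned.endswith(".com"))   # True

print("rakesh".isalpha())         # True
print("2023".isdigit())           # True
print("   ".isspace())            # True

fields = "jack,smith,junior".split(",")
print(fields)                     # ['jack', 'smith', 'junior']

lines = "first line\nsecond line".splitlines()
print(lines)                      # ['first line', 'second line']

padded = "42".zfill(5)            # pad on the left with 0s up to width 5
print(padded)                     # 00042
```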

Note About String in Python

  • A note about Python strings from the book Learning Python (O'Reilly, 5e), p. 191:
  • Strictly speaking, Python strings are categorized as immutable sequences, meaning that the characters they contain have a left-to-right positional order and that they cannot be changed in place. In fact, strings are the first representative of the larger class of objects called sequences that we will study here. Pay special attention to the sequence operations introduced in this post, because they will work the same on other sequence types we’ll explore later, such as lists and tuples.
  • Note: All string methods return new values. They do not change the original string.
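Immutability is easy to observe in practice. The short sketch below (with an arbitrary sample string) shows that assigning to a position fails, that a method call leaves the original untouched, and that "changing" a string really means building a new one.

```python
s = "jack"

# Item assignment is not allowed on a string.
try:
    s[0] = "J"
except TypeError as exc:
    error = str(exc)
    print(error)    # 'str' object does not support item assignment

# Methods return a new string; the original is untouched.
upper_s = s.upper()
print(upper_s)      # JACK
print(s)            # jack

# To "change" a string, build a new one and rebind the name.
s = "J" + s[1:]
print(s)            # Jack
```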

Table 7-1. Common string literals and operations

Operation                       Interpretation
S = ''                          Empty string
S = "spam's"                    Double quotes, same as single
S = 's\np\ta\x00m'              Escape sequences
S = """...multiline..."""       Triple-quoted block strings
S = r'\temp\spam'               Raw strings (no escapes)
                                print(S) # \temp\spam
B = b'sp\xc4m'                  Byte strings in 2.6, 2.7, and 3.X
                                print(B) # b'sp\xc4m'
U = u'sp\u00c4m'                Unicode strings in 2.X and 3.3+
                                print(U) # spÄm
S1 + S2                         Concatenate
S * 3                           Repeat
S[i]                            Index
S[i:j]                          Slice
len(S)                          Length
"a %s parrot" % 'kind'          String formatting expression
                                print("a %s parrot" % 'kind') # a kind parrot
"a {0} parrot".format('kind')   String formatting method in 2.6, 2.7, and 3.X
S.find('pa')                    String methods (see ahead for all 43): search
                                print('a parrot'.find('pa')) # 2
S.rstrip()                      Remove whitespace from end
                                print("!" + " okay ".rstrip() + "!") # ! okay!
S.strip()                       Remove whitespace from beginning and end
                                print("!" + " okay ".strip() + "!") # !okay!
S.replace('pa', 'xx')           Replacement
                                print("parrot".replace('pa', 'xx')) # xxrrot
S.split(',')                    Split on delimiter
S.isdigit()                     Content test
S.lower(), S.upper()            Case conversion
                                print("Parrot".lower()) # parrot
                                print("parrot".upper()) # PARROT
S.endswith('spam')              End test
                                print("is this yours".endswith("yours")) # True
                                print("my parrot".startswith("my")) # True
'spam'.join(strlist)            Delimiter join
S.encode('latin-1')             Unicode encoding
B.decode('utf8')                Unicode decoding, etc.
for x in S: print(x)            Iteration
'spam' in S                     Membership
[c * 2 for c in S]              List comprehension to create a new list
map(ord, S)                     Apply a function to each character (in 3.X, wrap in list() to see the values)
                                list(map(ord, "hello")) # [104, 101, 108, 108, 111]
                                list(map(lambda x: 10*x, [1,2,3,4])) # [10, 20, 30, 40]
re.match('sp(.*)am', line)      Pattern matching: library module
